Faster implementation of blobFromImages for cpu nchw output #26127
Faster implementation of `blobFromImage` and `blobFromImages` for the case where HWC `cv::Mat` images are converted to an NCHW `cv::Mat` on the CPU.
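For readers unfamiliar with the two layouts, the repacking this case performs can be sketched as below. This is a minimal illustration, not the code from this patch: `hwcToChw` is a hypothetical helper, it works on plain `std::vector` buffers rather than `cv::Mat`, and it leaves out the scaling, mean subtraction, channel swapping and resizing that `blobFromImage` can also apply.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of OpenCV and not the implementation in this PR.
// Repacks one interleaved HWC image (e.g. BGRBGRBGR...) into planar CHW order
// (all of channel 0, then all of channel 1, ...), converting to float on the way.
std::vector<float> hwcToChw(const std::vector<unsigned char>& hwc,
                            std::size_t height, std::size_t width,
                            std::size_t channels)
{
    assert(hwc.size() == height * width * channels);

    std::vector<float> chw(hwc.size());
    const std::size_t plane = height * width;  // elements per output channel plane

    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const std::size_t pixel = y * width + x;
            for (std::size_t c = 0; c < channels; ++c) {
                // HWC index: pixel * channels + c
                // CHW index: c * plane + pixel
                chw[c * plane + pixel] = static_cast<float>(hwc[pixel * channels + c]);
            }
        }
    }
    return chw;
}
```

An NCHW blob for a batch is then the CHW blocks of the individual images stored one after another. The strided reads and writes in the inner loop are what make this conversion sensitive to how the loops are organized.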
Running time on my PC, in milliseconds:
**blobFromImage**
```
image size (HxWxC)   old (ms)   new (ms)   speed-up
32x32x3                 0.008      0.002       4.0x
64x64x3                 0.021      0.009       2.3x
128x128x3               0.164      0.037       4.4x
256x256x3               0.728      0.158       4.6x
512x512x3               3.310      0.628       5.2x
1024x1024x3            14.503      3.124       4.6x
2048x2048x3            61.647     28.049       2.2x
```
**blobFromImages**
```
input size (NxHxWxC)   old (ms)   new (ms)   speed-up
16x32x32x3                0.122      0.041       3.0x
16x64x64x3                0.790      0.165       4.8x
16x128x128x3              3.313      0.652       5.1x
16x256x256x3             13.495      3.127       4.3x
16x512x512x3             58.795     28.127       2.1x
16x1024x1024x3          251.135    121.955       2.1x
16x2048x2048x3         1023.570    487.188       2.1x
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake