mirror of
https://github.com/opencv/opencv.git
synced 2025-06-20 10:00:51 +08:00
8035aade11
204 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
7a9ce585f0 | core(ocl): fix POWN OpenCL implementation | ||
![]() |
2fb786532a
|
Merge pull request #27257 from fengyuentau:4x/hal_rvv/flip_opt
hal_rvv: further optimized flip #27257 Checklist: - [x] flipX - [x] flipY - [x] flipXY ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake |
||
![]() |
325e59bd4c
|
Merge pull request #27229 from fengyuentau:4x/hal_rvv/transpose
HAL: implemented cv_hal_transpose in hal_rvv #27229 Checklists: - [x] transpose2d_8u - [x] transpose2d_16u - [ ] ~transpose2d_8uC3~ - [x] transpose2d_32s - [ ] ~transpose2d_16uC3~ - [x] transpose2d_32sC2 - [ ] ~transpose_32sC3~ - [ ] ~transpose_32sC4~ - [ ] ~transpose_32sC6~ - [ ] ~transpose_32sC8~ - [ ] ~inplace transpose~ ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake |
||
![]() |
1b3db545a3
|
Merge pull request #27145 from fengyuentau:4x/core/copyMask-simd
core: further vectorize copyTo with mask #27145 Merge with https://github.com/opencv/opencv_extra/pull/1247. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake |
||
![]() |
8207549638
|
Merge pull request #26991 from fengyuentau:4x/core/norm2hal_rvv
core: improve norm of hal rvv #26991 Merge with https://github.com/opencv/opencv_extra/pull/1241 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake |
||
![]() |
2090407002
|
Merge pull request #26999 from GenshinImpactStarts:polar_to_cart
[HAL RVV] unify and impl polar_to_cart | add perf test #26999 ### Summary 1. Implement through the existing `cv_hal_polarToCart32f` and `cv_hal_polarToCart64f` interfaces. 2. Add `polarToCart` performance tests 3. Make `cv::polarToCart` use CALL_HAL in the same way as `cv::cartToPolar` 4. To achieve the 3rd point, the original implementation was moved, and some modifications were made. Tested through: ```sh opencv_test_core --gtest_filter="*PolarToCart*:*Core_CartPolar_reverse*" opencv_perf_core --gtest_filter="*PolarToCart*" --perf_min_samples=300 --perf_force_samples=300 ``` ### HAL performance test ***UPDATE***: Current implementation is no more depending on vlen. **NOTE**: Due to the 4th point in the summary above, the `scalar` and `ui` test is based on the modified code of this PR. The impact of this patch on `scalar` and `ui` is evaluated in the next section, `Effect of Point 4`. Vlen 256 (Muse Pi): ``` Name of Test scalar ui rvv ui rvv vs vs scalar scalar (x-factor) (x-factor) PolarToCart::PolarToCartFixture::(127x61, 32FC1) 0.315 0.110 0.034 2.85 9.34 PolarToCart::PolarToCartFixture::(127x61, 64FC1) 0.423 0.163 0.045 2.59 9.34 PolarToCart::PolarToCartFixture::(640x480, 32FC1) 13.695 4.325 1.278 3.17 10.71 PolarToCart::PolarToCartFixture::(640x480, 64FC1) 17.719 7.118 2.105 2.49 8.42 PolarToCart::PolarToCartFixture::(1280x720, 32FC1) 40.678 13.114 3.977 3.10 10.23 PolarToCart::PolarToCartFixture::(1280x720, 64FC1) 53.124 21.298 6.519 2.49 8.15 PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 95.158 29.465 8.894 3.23 10.70 PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 119.262 47.743 14.129 2.50 8.44 ``` ### Effect of Point 4 To make `cv::polarToCart` behave the same as `cv::cartToPolar`, the implementation detail of the former has been moved to the latter's location (from `mathfuncs.cpp` to `mathfuncs_core.simd.hpp`). #### Reason for Changes: This function works as follows: $y = \text{mag} \times \sin(\text{angle})$ and $x = \text{mag} \times \cos(\text{angle})$. The original implementation first calculates the values of $\sin$ and $\cos$, storing the results in the output buffers $x$ and $y$, and then multiplies the result by $\text{mag}$. However, when the function is used as an in-place operation (one of the output buffers is also an input buffer), the original implementation allocates an extra buffer to store the $\sin$ and $\cos$ values in case the $\text{mag}$ value gets overwritten. This extra buffer allocation prevents `cv::polarToCart` from functioning in the same way as `cv::cartToPolar`. Therefore, the multiplication is now performed immediately without storing intermediate values. Since the original implementation also had AVX2 optimizations, I have applied the same optimizations to the AVX2 version of this implementation. ***UPDATE***: UI use v_sincos from #25892 now. The original implementation has AVX2 optimizations but is slower much than current UI so it's removed, and AVX2 perf test is below. Scalar implementation isn't changed because it's faster than using UI's method. #### Test Result `scalar` and `ui` test is done on Muse PI, and AVX2 test is done on Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz. `scalar` test: ``` Name of Test orig pr pr vs orig (x-factor) PolarToCart::PolarToCartFixture::(127x61, 32FC1) 0.333 0.294 1.13 PolarToCart::PolarToCartFixture::(127x61, 64FC1) 0.385 0.403 0.96 PolarToCart::PolarToCartFixture::(640x480, 32FC1) 14.749 12.343 1.19 PolarToCart::PolarToCartFixture::(640x480, 64FC1) 19.419 16.743 1.16 PolarToCart::PolarToCartFixture::(1280x720, 32FC1) 44.155 37.822 1.17 PolarToCart::PolarToCartFixture::(1280x720, 64FC1) 62.108 50.358 1.23 PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 99.011 85.769 1.15 PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 127.740 112.874 1.13 ``` `ui` test: ``` Name of Test orig pr pr vs orig (x-factor) PolarToCart::PolarToCartFixture::(127x61, 32FC1) 0.306 0.110 2.77 PolarToCart::PolarToCartFixture::(127x61, 64FC1) 0.455 0.163 2.79 PolarToCart::PolarToCartFixture::(640x480, 32FC1) 13.381 4.325 3.09 PolarToCart::PolarToCartFixture::(640x480, 64FC1) 21.851 7.118 3.07 PolarToCart::PolarToCartFixture::(1280x720, 32FC1) 39.975 13.114 3.05 PolarToCart::PolarToCartFixture::(1280x720, 64FC1) 67.006 21.298 3.15 PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 90.362 29.465 3.07 PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 129.637 47.743 2.72 ``` AVX2 test: ``` Name of Test orig pr pr vs orig (x-factor) PolarToCart::PolarToCartFixture::(127x61, 32FC1) 0.019 0.009 2.11 PolarToCart::PolarToCartFixture::(127x61, 64FC1) 0.022 0.013 1.74 PolarToCart::PolarToCartFixture::(640x480, 32FC1) 0.788 0.355 2.22 PolarToCart::PolarToCartFixture::(640x480, 64FC1) 1.102 0.618 1.78 PolarToCart::PolarToCartFixture::(1280x720, 32FC1) 2.383 1.042 2.29 PolarToCart::PolarToCartFixture::(1280x720, 64FC1) 3.758 2.316 1.62 PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 5.577 2.559 2.18 PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 9.710 6.424 1.51 ``` A slight performance loss occurs because the check for whether $mag$ is nullptr is performed with every calculation, instead of being done once per batch. This is to reuse current `SinCos_32f` function. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake |
||
![]() |
2a8d4b8e43
|
Merge pull request #27000 from GenshinImpactStarts:cart_to_polar
[HAL RVV] reuse atan | impl cart_to_polar | add perf test #27000 Implement through the existing `cv_hal_cartToPolar32f` and `cv_hal_cartToPolar64f` interfaces. Add `cartToPolar` performance tests. cv_hal_rvv::fast_atan is modified to make it more reusable because it's needed in cartToPolar. **UPDATE**: UI enabled. Since the vec type of RVV can't be stored in struct. UI implementation of `v_atan_f32` is modified. Both `fastAtan` and `cartToPolar` are affected so the test result for `atan` is also appended. I have tested the modified UI on RVV and AVX2 and no regressions appears. Perf test done on MUSE-PI. AVX2 test done on Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz. ```sh $ opencv_test_core --gtest_filter="*CartToPolar*:*Core_CartPolar_reverse*:*Phase*" $ opencv_perf_core --gtest_filter="*CartToPolar*:*phase*" --perf_min_samples=300 --perf_force_samples=300 ``` Test result between enabled UI and HAL: ``` Name of Test ui rvv rvv vs ui (x-factor) CartToPolar::CartToPolarFixture::(127x61, 32FC1) 0.106 0.059 1.80 CartToPolar::CartToPolarFixture::(127x61, 64FC1) 0.155 0.070 2.20 CartToPolar::CartToPolarFixture::(640x480, 32FC1) 4.188 2.317 1.81 CartToPolar::CartToPolarFixture::(640x480, 64FC1) 6.593 2.889 2.28 CartToPolar::CartToPolarFixture::(1280x720, 32FC1) 12.600 7.057 1.79 CartToPolar::CartToPolarFixture::(1280x720, 64FC1) 19.860 8.797 2.26 CartToPolar::CartToPolarFixture::(1920x1080, 32FC1) 28.295 15.809 1.79 CartToPolar::CartToPolarFixture::(1920x1080, 64FC1) 44.573 19.398 2.30 phase32f::VectorLength::128 0.002 0.002 1.20 phase32f::VectorLength::1000 0.008 0.006 1.32 phase32f::VectorLength::131072 1.061 0.731 1.45 phase32f::VectorLength::524288 3.997 2.976 1.34 phase32f::VectorLength::1048576 8.001 5.959 1.34 phase64f::VectorLength::128 0.002 0.002 1.33 phase64f::VectorLength::1000 0.012 0.008 1.58 phase64f::VectorLength::131072 1.648 0.931 1.77 phase64f::VectorLength::524288 6.836 3.837 1.78 phase64f::VectorLength::1048576 14.060 7.540 1.86 ``` Test result before and after enabling UI on RVV: ``` Name of Test perf perf perf ui ui ui orig pr pr vs perf ui orig (x-factor) CartToPolar::CartToPolarFixture::(127x61, 32FC1) 0.141 0.106 1.33 CartToPolar::CartToPolarFixture::(127x61, 64FC1) 0.187 0.155 1.20 CartToPolar::CartToPolarFixture::(640x480, 32FC1) 5.990 4.188 1.43 CartToPolar::CartToPolarFixture::(640x480, 64FC1) 8.370 6.593 1.27 CartToPolar::CartToPolarFixture::(1280x720, 32FC1) 18.214 12.600 1.45 CartToPolar::CartToPolarFixture::(1280x720, 64FC1) 25.365 19.860 1.28 CartToPolar::CartToPolarFixture::(1920x1080, 32FC1) 40.437 28.295 1.43 CartToPolar::CartToPolarFixture::(1920x1080, 64FC1) 56.699 44.573 1.27 phase32f::VectorLength::128 0.003 0.002 1.54 phase32f::VectorLength::1000 0.016 0.008 1.90 phase32f::VectorLength::131072 2.048 1.061 1.93 phase32f::VectorLength::524288 8.219 3.997 2.06 phase32f::VectorLength::1048576 16.426 8.001 2.05 phase64f::VectorLength::128 0.003 0.002 1.44 phase64f::VectorLength::1000 0.020 0.012 1.60 phase64f::VectorLength::131072 2.621 1.648 1.59 phase64f::VectorLength::524288 10.780 6.836 1.58 phase64f::VectorLength::1048576 22.723 14.060 1.62 ``` Test result before and after modifying UI on AVX2: ``` Name of Test perf perf perf avx2 avx2 avx2 orig pr pr vs perf avx2 orig (x-factor) CartToPolar::CartToPolarFixture::(127x61, 32FC1) 0.006 0.005 1.14 CartToPolar::CartToPolarFixture::(127x61, 64FC1) 0.010 0.009 1.08 CartToPolar::CartToPolarFixture::(640x480, 32FC1) 0.273 0.264 1.03 CartToPolar::CartToPolarFixture::(640x480, 64FC1) 0.511 0.487 1.05 CartToPolar::CartToPolarFixture::(1280x720, 32FC1) 0.760 0.723 1.05 CartToPolar::CartToPolarFixture::(1280x720, 64FC1) 2.009 1.937 1.04 CartToPolar::CartToPolarFixture::(1920x1080, 32FC1) 1.996 1.923 1.04 CartToPolar::CartToPolarFixture::(1920x1080, 64FC1) 5.721 5.509 1.04 phase32f::VectorLength::128 0.000 0.000 0.98 phase32f::VectorLength::1000 0.001 0.001 0.97 phase32f::VectorLength::131072 0.105 0.111 0.95 phase32f::VectorLength::524288 0.402 0.402 1.00 phase32f::VectorLength::1048576 0.775 0.767 1.01 phase64f::VectorLength::128 0.000 0.000 1.00 phase64f::VectorLength::1000 0.001 0.001 1.01 phase64f::VectorLength::131072 0.163 0.162 1.01 phase64f::VectorLength::524288 0.669 0.653 1.02 phase64f::VectorLength::1048576 1.660 1.634 1.02 ``` ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake |
||
![]() |
e30697fd42
|
Merge pull request #27002 from GenshinImpactStarts:magnitude
[HAL RVV] impl magnitude | add perf test #27002 Implement through the existing `cv_hal_magnitude32f` and `cv_hal_magnitude64f` interfaces. **UPDATE**: UI is enabled. The only difference between UI and HAL now is HAL use a approximate `sqrt`. Perf test done on MUSE-PI. ```sh $ opencv_test_core --gtest_filter="*Magnitude*" $ opencv_perf_core --gtest_filter="*Magnitude*" --perf_min_samples=300 --perf_force_samples=300 ``` Test result between enabled UI and HAL: ``` Name of Test ui rvv rvv vs ui (x-factor) Magnitude::MagnitudeFixture::(127x61, 32FC1) 0.029 0.016 1.75 Magnitude::MagnitudeFixture::(127x61, 64FC1) 0.057 0.036 1.57 Magnitude::MagnitudeFixture::(640x480, 32FC1) 1.063 0.648 1.64 Magnitude::MagnitudeFixture::(640x480, 64FC1) 2.261 1.530 1.48 Magnitude::MagnitudeFixture::(1280x720, 32FC1) 3.261 2.118 1.54 Magnitude::MagnitudeFixture::(1280x720, 64FC1) 6.802 4.682 1.45 Magnitude::MagnitudeFixture::(1920x1080, 32FC1) 7.287 4.738 1.54 Magnitude::MagnitudeFixture::(1920x1080, 64FC1) 15.226 10.334 1.47 ``` Test result before and after enabling UI: ``` Name of Test orig pr pr vs orig (x-factor) Magnitude::MagnitudeFixture::(127x61, 32FC1) 0.032 0.029 1.11 Magnitude::MagnitudeFixture::(127x61, 64FC1) 0.067 0.057 1.17 Magnitude::MagnitudeFixture::(640x480, 32FC1) 1.228 1.063 1.16 Magnitude::MagnitudeFixture::(640x480, 64FC1) 2.786 2.261 1.23 Magnitude::MagnitudeFixture::(1280x720, 32FC1) 3.762 3.261 1.15 Magnitude::MagnitudeFixture::(1280x720, 64FC1) 8.549 6.802 1.26 Magnitude::MagnitudeFixture::(1920x1080, 32FC1) 8.408 7.287 1.15 Magnitude::MagnitudeFixture::(1920x1080, 64FC1) 18.884 15.226 1.24 ``` ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake |
||
![]() |
eefa327f30
|
Merge pull request #27042 from fengyuentau:4x/core/normDiff_simd
core: vectorize normDiff with universal intrinsics #27042 Merge with https://github.com/opencv/opencv_extra/pull/1242. Performance results on Desktop Intel i7-12700K, Apple M2, Jetson Orin and SpaceMIT K1: [perf-normDiff.zip](https://github.com/user-attachments/files/19178689/perf-normDiff.zip) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake |
||
![]() |
524d8ae01c |
impl exp and log | add log perf test
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn> |
||
![]() |
57a78cb9df
|
Merge pull request #26941 from GenshinImpactStarts:lut_hal_rvv
Impl hal_rvv LUT | Add more LUT test #26941 Implement through the existing `cv_hal_lut` interfaces. Add more LUT accuracy and performance tests: - **Accuracy test**: Multi-channel table tests are added, and the boundary of `randu` used for generating test data is broadened to make the test more robust. - **Performance test**: Multi-channel input and multi-channel table tests are added. Perf test done on - MUSE-PI (vlen=256) - Compiler: gcc 14.2 (riscv-collab/riscv-gnu-toolchain Nightly: December 16, 2024) ```sh $ opencv_test_core --gtest_filter="Core_LUT*" $ opencv_perf_core --gtest_filter="SizePrm_LUT*" --perf_min_samples=300 --perf_force_samples=300 ``` ```sh Geometric mean (ms) Name of Test scalar ui rvv ui rvv vs vs scalar scalar (x-factor) (x-factor) LUT::SizePrm::320x240 0.248 0.249 0.052 1.00 4.74 LUT::SizePrm::640x480 0.277 0.275 0.085 1.01 3.28 LUT::SizePrm::1920x1080 0.950 0.947 0.634 1.00 1.50 LUT_multi2::SizePrm::320x240 2.051 2.045 2.049 1.00 1.00 LUT_multi2::SizePrm::640x480 2.128 2.134 2.125 1.00 1.00 LUT_multi2::SizePrm::1920x1080 7.397 7.380 7.390 1.00 1.00 LUT_multi::SizePrm::320x240 0.715 0.747 0.154 0.96 4.64 LUT_multi::SizePrm::640x480 0.741 0.766 0.257 0.97 2.88 LUT_multi::SizePrm::1920x1080 2.766 2.765 1.925 1.00 1.44 ``` This optimization is achieved by loading the entire lookup table into vector registers. Due to register size limitations, the optimization is only effective under the following conditions: - For the U8C1 table type, the optimization works when `vlen >= 256` - For U16C1, it works when `vlen >= 512` - For U32C1, it works when `vlen >= 1024` Since I don’t have real hardware with `vlen > 256`, the corresponding accuracy tests were conducted on QEMU built from the `riscv-collab/riscv-gnu-toolchain`. This patch does not implement optimizations for multi-channel tables. Previous attempts: 1. For the U8C1 table type, when `vlen = 128`, it is possible to use four `u8m4` vectors to load the entire table, perform gathering, and merge the results. However, the performance is almost the same as the scalar version. 2. Loading part of the table and repeatedly loading the source data is faster for small sizes. But as the table size grows, the performance quickly degrades compared to the scalar version. 3. Using `vluxei8` as a general solution does not show any performance improvement. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake |
||
![]() |
6a6a5a765d
|
Merge pull request #26943 from GenshinImpactStarts:flip_hal_rvv
Impl RISC-V HAL for cv::flip | Add perf test for flip #26943 Implement through the existing `cv_hal_flip` interfaces. Add perf test for `cv::flip`. The reason why select these args for testing: - **size**: copied from perf_lut - **type**: - U8C1: basic situation - U8C3: unaligned element size - U8C4: large element size Tested on - MUSE-PI (vlen=256) - Compiler: gcc 14.2 (riscv-collab/riscv-gnu-toolchain Nightly: December 16, 2024) ```sh $ opencv_test_core --gtest_filter="Core_Flip/ElemWiseTest.*" $ opencv_perf_core --gtest_filter="Size_MatType_FlipCode*" --perf_min_samples=300 --perf_force_samples=300 ``` ``` Geometric mean (ms) Name of Test scalar ui rvv ui rvv vs vs scalar scalar (x-factor) (x-factor) flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_X) 0.026 0.033 0.031 0.81 0.84 flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_XY) 0.206 0.212 0.091 0.97 2.26 flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_Y) 0.185 0.189 0.082 0.98 2.25 flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_X) 0.070 0.084 0.084 0.83 0.83 flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_XY) 0.616 0.612 0.235 1.01 2.62 flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_Y) 0.587 0.603 0.204 0.97 2.88 flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_X) 0.263 0.110 0.109 2.40 2.41 flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_XY) 0.930 0.831 0.316 1.12 2.95 flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_Y) 1.175 1.129 0.313 1.04 3.75 flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_X) 0.303 0.118 0.111 2.57 2.73 flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_XY) 0.949 0.836 0.405 1.14 2.34 flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_Y) 0.784 0.783 0.409 1.00 1.92 flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_X) 1.084 0.360 0.355 3.01 3.06 flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_XY) 3.768 3.348 1.364 1.13 2.76 flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_Y) 4.361 4.473 1.296 0.97 3.37 flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_X) 1.252 0.469 0.451 2.67 2.78 flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_XY) 5.732 5.220 1.303 1.10 4.40 flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_Y) 5.041 5.105 1.203 0.99 4.19 flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_X) 2.382 0.903 0.903 2.64 2.64 flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_XY) 8.606 7.508 2.581 1.15 3.33 flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_Y) 8.421 8.535 2.219 0.99 3.80 flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_X) 6.312 2.416 2.429 2.61 2.60 flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_XY) 29.174 26.055 12.761 1.12 2.29 flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_Y) 25.373 25.500 13.382 1.00 1.90 flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_X) 7.620 3.204 3.115 2.38 2.45 flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_XY) 32.876 29.310 12.976 1.12 2.53 flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_Y) 28.831 29.094 14.919 0.99 1.93 ``` The optimization for vlen <= 256 and > 256 are different, but I have no real hardware with vlen > 256. So accuracy tests for that like 512 and 1024 are conducted on QEMU built from the `riscv-collab/riscv-gnu-toolchain`. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake |
||
![]() |
b5f5540e8a
|
Merge pull request #26886 from sk1er52:feature/exp64f
Enable SIMD_SCALABLE for exp and sqrt #26886 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake ``` CPU - Banana Pi k1, compiler - clang 18.1.4 ``` ``` Geometric mean (ms) Name of Test baseline hal ui hal ui vs vs baseline baseline (x-factor) (x-factor) Exp::ExpFixture::(127x61, 32FC1) 0.358 -- 0.033 -- 10.70 Exp::ExpFixture::(640x480, 32FC1) 14.304 -- 1.167 -- 12.26 Exp::ExpFixture::(1280x720, 32FC1) 42.785 -- 3.538 -- 12.09 Exp::ExpFixture::(1920x1080, 32FC1) 96.206 -- 7.927 -- 12.14 Exp::ExpFixture::(127x61, 64FC1) 0.433 0.050 0.098 8.59 4.40 Exp::ExpFixture::(640x480, 64FC1) 17.315 1.935 3.813 8.95 4.54 Exp::ExpFixture::(1280x720, 64FC1) 52.181 5.877 11.519 8.88 4.53 Exp::ExpFixture::(1920x1080, 64FC1) 117.082 13.157 25.854 8.90 4.53 ``` Additionally, this PR brings Sqrt optimization with UI: ``` Geometric mean (ms) Name of Test baseline ui ui vs baseline (x-factor) Sqrt::SqrtFixture::(127x61, 5, false) 0.111 0.027 4.11 Sqrt::SqrtFixture::(127x61, 6, false) 0.149 0.053 2.82 Sqrt::SqrtFixture::(640x480, 5, false) 4.374 0.967 4.52 Sqrt::SqrtFixture::(640x480, 6, false) 5.885 2.046 2.88 Sqrt::SqrtFixture::(1280x720, 5, false) 12.960 2.915 4.45 Sqrt::SqrtFixture::(1280x720, 6, false) 17.648 6.107 2.89 Sqrt::SqrtFixture::(1920x1080, 5, false) 29.178 6.524 4.47 Sqrt::SqrtFixture::(1920x1080, 6, false) 39.709 13.670 2.90 ``` Reference Muller, J.-M. Elementary Functions: Algorithms and Implementation. 2nd ed. Boston: Birkhäuser, 2006. https://www.springer.com/gp/book/9780817643720 |
||
![]() |
d32d4da9a3
|
Merge pull request #26887 from kyler1cartesis:4.x
invSqrt SIMD_SCALABLE implementation & HAL tests refactoring #26887 Enable CV_SIMD_SCALABLE for invSqrt. * Banana Pi BF3 (SpacemiT K1) RISC-V * Compiler: Syntacore Clang 18.1.4 (build 2024.12) ``` Geometric mean (ms) Name of Test baseline simd simd scalable scalable vs baseline (x-factor) InvSqrtf::InvSqrtfFixture::(127x61, 32FC1) 0.163 0.051 3.23 InvSqrtf::InvSqrtfFixture::(127x61, 64FC1) 0.241 0.103 2.35 InvSqrtf::InvSqrtfFixture::(640x480, 32FC1) 6.460 1.893 3.41 InvSqrtf::InvSqrtfFixture::(640x480, 64FC1) 9.687 3.999 2.42 InvSqrtf::InvSqrtfFixture::(1280x720, 32FC1) 19.292 5.701 3.38 InvSqrtf::InvSqrtfFixture::(1280x720, 64FC1) 29.452 11.963 2.46 InvSqrtf::InvSqrtfFixture::(1920x1080, 32FC1) 43.326 12.805 3.38 InvSqrtf::InvSqrtfFixture::(1920x1080, 64FC1) 65.566 26.881 2.44 ``` |
||
![]() |
bfb54aa691 | Remove useless C headers | ||
![]() |
bf914a7681 | 8uc2 added | ||
![]() |
7590813b69
|
Merge pull request #26115 from savuor:rv/flip_ocl_dtypes
Added more data types to OCL flip() and rotate() perf tests #26115 Connected PR with updated sanity data: https://github.com/opencv/opencv_extra/pull/1206 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake |
||
![]() |
f4c2e4f872
|
Merge pull request #26061 from penghuiho:fix-pow-bug
Fixed the simd bugs of iPow8u and iPow16u #26061 Add the following cases in opencv_perf_core: * OCL_PowFixture_iPow.iPow/0, where GetParam() = (640x480, 8UC1) * OCL_PowFixture_iPow.iPow/2, where GetParam() = (640x480, 16UC1) iPow8u and iPow16u failed to call to simd accelerating while executing. Fix the bug by changing the input type of iPow_SIMD function. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake |
||
![]() |
a7e53aa184
|
Merge pull request #25671 from savuor:rv/arithm_extend_tests
Tests added for mixed type arithmetic operations #25671 ### Changes * added accuracy tests for mixed type arithmetic operations _Note: div-by-zero values are removed from checking since the result is implementation-defined in common case_ * added perf tests for the same cases * fixed a typo in `getMulExtTab()` function that lead to dead code ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake |
||
![]() |
b267f1791c
|
Merge pull request #25633 from savuor:rv/rotate_tests
Tests for cv::rotate() added #25633 fixes #25449 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake |
||
![]() |
357b9abaef
|
Merge pull request #25450 from savuor:rv/svd_perf
Perf tests for SVD and solve() created #25450 fixes #25336 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake |
||
![]() |
40533dbf69
|
Merge pull request #24918 from opencv-pushbot:gitee/alalek/core_convertfp16_replacement
core(OpenCL): optimize convertTo() with CV_16F (convertFp16() replacement) #24918 relates #24909 relates #24917 relates #24892 Performance changes: - [x] 12700K (1 thread) + Intel iGPU |Name of Test|noOCL|convertFp16|convertTo BASE|convertTo PATCH| |---|:-:|:-:|:-:|:-:| |ConvertFP16FP32MatMat::OCL_Core|3.130|3.152|3.127|3.136| |ConvertFP16FP32MatUMat::OCL_Core|3.030|3.996|3.007|2.671| |ConvertFP16FP32UMatMat::OCL_Core|3.010|3.101|3.056|2.854| |ConvertFP16FP32UMatUMat::OCL_Core|3.016|3.298|2.072|2.061| |ConvertFP32FP16MatMat::OCL_Core|2.697|2.652|2.723|2.721| |ConvertFP32FP16MatUMat::OCL_Core|2.752|4.268|2.662|2.947| |ConvertFP32FP16UMatMat::OCL_Core|2.706|2.601|2.603|2.528| |ConvertFP32FP16UMatUMat::OCL_Core|2.704|3.215|1.999|1.988| Patched version is not worse than convertFp16 and convertTo baseline (except MatUMat 32->16, baseline uses CPU code+dst buffer map). There are still gaps against noOpenCL(CPU only) mode due to T-API implementation issues (unnecessary synchronization). - [x] 12700K + AMD dGPU |Name of Test|noOCL|convertFp16 dGPU|convertTo BASE dGPU|convertTo PATCH dGPU| |---|:-:|:-:|:-:|:-:| |ConvertFP16FP32MatMat::OCL_Core|3.130|3.133|3.172|3.087| |ConvertFP16FP32MatUMat::OCL_Core|3.030|1.713|9.559|1.729| |ConvertFP16FP32UMatMat::OCL_Core|3.010|6.515|6.309|4.452| |ConvertFP16FP32UMatUMat::OCL_Core|3.016|0.242|23.597|0.170| |ConvertFP32FP16MatMat::OCL_Core|2.697|2.641|2.713|2.689| |ConvertFP32FP16MatUMat::OCL_Core|2.752|4.076|6.483|4.191| |ConvertFP32FP16UMatMat::OCL_Core|2.706|9.042|16.481|1.834| |ConvertFP32FP16UMatUMat::OCL_Core|2.704|0.229|15.730|0.176| convertTo-baseline can't compile OpenCL kernel for FP16 properly - FIXED. dGPU has much more power, so results are x16-17 better than single cpu core. Patched version is not worse than convertFp16 and convertTo baseline. There are still gaps against noOpenCL(CPU only) mode due to T-API implementation issues (unnecessary synchronization) and required memory transfers. Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com> |
||
![]() |
ea47cb3ffe
|
Merge pull request #24480 from savuor:backport_patch_nans
Backport to 4.x: patchNaNs() SIMD acceleration #24480 backport from #23098 connected PR in extra: [#1118@extra](https://github.com/opencv/opencv_extra/pull/1118) ### This PR contains: * new SIMD code for `patchNaNs()` * CPU perf test <details> <summary>Performance comparison</summary> Geometric mean (ms) |Name of Test|noopt|sse2|avx2|sse2 vs noopt (x-factor)|avx2 vs noopt (x-factor)| |---|:-:|:-:|:-:|:-:|:-:| |PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC1)|0.019|0.017|0.018|1.11|1.07| |PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC4)|0.037|0.037|0.033|1.00|1.10| |PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC1)|0.032|0.032|0.033|0.99|0.98| |PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC4)|0.072|0.072|0.070|1.00|1.03| |PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC1)|0.051|0.051|0.050|1.00|1.01| |PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC4)|0.137|0.138|0.128|0.99|1.06| |PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC1)|0.137|0.128|0.129|1.07|1.06| |PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC4)|0.450|0.450|0.448|1.00|1.01| |PatchNaNs::PatchNaNsFixture::(640x480, 32FC1)|0.149|0.029|0.020|5.13|7.44| |PatchNaNs::PatchNaNsFixture::(640x480, 32FC2)|0.304|0.058|0.040|5.25|7.65| |PatchNaNs::PatchNaNsFixture::(640x480, 32FC3)|0.448|0.086|0.059|5.22|7.55| |PatchNaNs::PatchNaNsFixture::(640x480, 32FC4)|0.601|0.133|0.083|4.51|7.23| |PatchNaNs::PatchNaNsFixture::(1280x720, 32FC1)|0.451|0.093|0.060|4.83|7.52| |PatchNaNs::PatchNaNsFixture::(1280x720, 32FC2)|0.892|0.184|0.126|4.85|7.06| |PatchNaNs::PatchNaNsFixture::(1280x720, 32FC3)|1.345|0.311|0.230|4.32|5.84| |PatchNaNs::PatchNaNsFixture::(1280x720, 32FC4)|1.831|0.546|0.436|3.35|4.20| |PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC1)|1.017|0.250|0.160|4.06|6.35| |PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC2)|2.077|0.646|0.605|3.21|3.43| |PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC3)|3.134|1.053|0.961|2.97|3.26| |PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC4)|4.222|1.436|1.288|2.94|3.28| |PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC1)|4.225|1.401|1.277|3.01|3.31| |PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC2)|8.310|2.953|2.635|2.81|3.15| |PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC3)|12.396|4.455|4.252|2.78|2.92| |PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC4)|17.174|5.831|5.824|2.95|2.95| </details> ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake |
||
![]() |
5fb3869775
|
Merge pull request #23109 from seanm:misc-warnings
* Fixed clang -Wnewline-eof warnings * Fixed all trivial clang -Wextra-semi and -Wc++98-compat-extra-semi warnings * Removed trailing semi from various macros * Fixed various -Wunused-macros warnings * Fixed some trivial -Wdocumentation warnings * Fixed some -Wdocumentation-deprecated-sync warnings * Fixed incorrect indentation * Suppressed some clang warnings in 3rd party code * Fixed QRCodeEncoder::Params documentation. --------- Co-authored-by: Alexander Smorkalov <alexander.smorkalov@xperience.ai> |
||
![]() |
a308dfca98
|
core: add broadcast (#23965)
* add broadcast_to with tests * change name * fix test * fix implicit type conversion * replace type of shape with InputArray * add perf test * add perf tests which takes care of axis * v2 from ficus expand * rename to broadcast * use randu in place of declare * doc improvement; smaller scale in perf * capture get_index by reference |
||
![]() |
60b806f9b8
|
Merge pull request #22947 from chacha21:hasNonZero
Added cv::hasNonZero() #22947 `cv::hasNonZero()` is semantically equivalent to (`cv::countNonZero()>0`) but stops parsing the image when a non-zero value is found, for a performance gain - [X] I agree to contribute to the project under Apache 2 License. - [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [X] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake This pull request might be refused, but I submit it to know if further work is needed or if I just stop working on it. The idea is only a performance gain vs `countNonZero()>0` at the cost of more code. Reasons why it might be refused : - this is just more code - the execution time is "unfair"/"unpredictable" since it depends on the position of the first non-zero value - the user must be aware that default search is from first row/col to last row/col and has no way to customize that, even if his use case lets him know where a non zero could be found - the PR in its current state is using, for the ocl implementation, a mere `countNonZero()>0` ; there is not much sense in trying to break early the ocl kernel call when non-zero is encountered. So the ocl implementation does not bring any improvement. - there is no IPP function that can help (`countNonZero()` is based in `ippCountInRange`) - the PR in its current state might be slower than a call to `countNonZero()>0` in some cases (see "challenges" below) Reasons why it might be accepted : - the performance gain is huge on average, if we consider that "on average" means "non zero in the middle of the image" - the "missing" IPP implementation is replaced by an "Open-CV universal intrinsics" implementation - the PR in its current state is almost always faster than a call to `countNonZero()>0`, is only slightly slower in the worst cases, and not even for all matrices **Challenges** The worst case is either an all-zero matrix, or a non-zero at the very last position. In such a case, the `hasNonZero()` implementation will parse the whole matrix like `countNonZero()` would do. But we expect the performance to be the same in this case. And `ippCountInRange` is hard to beat ! There is also the case of very small matrices (<=32x32...) in 8b, where the SIMD can be hard to feed. For all cases but the worse, my custom `hasNonZero()` performs better than `ippCountInRange()` For the worst case, my custom `hasNonZero()` performs better than `ippCountInRange()` *except for large matrices of type CV_32S or CV_64F* (but surprisingly, not CV_32F). The difference is small, but it exists (and I don't understand why). For very small CV_8U matrices `ippCountInRange()` seems unbeatable. Here is the code that I use to check timings ``` //test cv::hasNonZero() vs (cv::countNonZero()>0) for different matrices sizes, types, strides... { cv::setRNGSeed(1234); const std::vector<cv::Size> sizes = {{32, 32}, {64, 64}, {128, 128}, {320, 240}, {512, 512}, {640, 480}, {1024, 768}, {2048, 2048}, {1031, 1000}}; const std::vector<int> types = {CV_8U, CV_16U, CV_32S, CV_32F, CV_64F}; const size_t iterations = 1000; for(const cv::Size& size : sizes) { for(const int type : types) { for(int c = 0 ; c<2 ; ++c) { const bool continuous = !c; for(int i = 0 ; i<4 ; ++i) { cv::Mat m = continuous ? cv::Mat::zeros(size, type) : cv::Mat(cv::Mat::zeros(cv::Size(2*size.width, size.height), type), cv::Rect(cv::Point(0, 0), size)); const bool nz = (i <= 2); const unsigned int nzOffsetRange = 10; const unsigned int nzOffset = cv::randu<unsigned int>()%nzOffsetRange; const cv::Point pos = (i == 0) ? cv::Point(nzOffset, 0) : (i == 1) ? cv::Point(size.width/2-nzOffsetRange/2+nzOffset, size.height/2) : (i == 2) ? cv::Point(size.width-1-nzOffset, size.height-1) : cv::Point(0, 0); std::cout << "============================================================" << std::endl; std::cout << "size:" << size << " type:" << type << " continuous = " << (continuous ? "true" : "false") << " iterations:" << iterations << " nz=" << (nz ? "true" : "false"); std::cout << " pos=" << ((i == 0) ? "begin" : (i == 1) ? "middle" : (i == 2) ? "end" : "none"); std::cout << std::endl; cv::Mat mask = cv::Mat::zeros(size, CV_8UC1); mask.at<unsigned char>(pos) = 0xFF; m.setTo(cv::Scalar::all(0)); m.setTo(cv::Scalar::all(nz ? 1 : 0), mask); std::vector<bool> results; std::vector<double> timings; { bool res = false; auto ref = cv::getTickCount(); for(size_t k = 0 ; k<iterations ; ++k) res = cv::hasNonZero(m); auto now = cv::getTickCount(); const bool error = (res != nz); if (error) printf("!!ERROR!!\r\n"); results.push_back(res); timings.push_back(1000.*(now-ref)/cv::getTickFrequency()); } { bool res = false; auto ref = cv::getTickCount(); for(size_t k = 0 ; k<iterations ; ++k) res = (cv::countNonZero(m)>0); auto now = cv::getTickCount(); const bool error = (res != nz); if (error) printf("!!ERROR!!\r\n"); results.push_back(res); timings.push_back(1000.*(now-ref)/cv::getTickFrequency()); } const size_t bestTimingIndex = (std::min_element(timings.begin(), timings.end())-timings.begin()); if ((bestTimingIndex != 0) || (std::find_if_not(results.begin(), results.end(), [&](bool r) {return (r == nz);}) != results.end())) { std::cout << "cv::hasNonZero\t\t=>" << results[0] << ((results[0] != nz) ? " ERROR" : "") << " perf:" << timings[0] << "ms => " << (iterations/timings[0]*1000) << " im/s" << ((bestTimingIndex == 0) ? " * " : "") << std::endl; std::cout << "cv::countNonZero\t=>" << results[1] << ((results[1] != nz) ? " ERROR" : "") << " perf:" << timings[1] << "ms => " << (iterations/timings[1]*1000) << " im/s" << ((bestTimingIndex == 1) ? " * " : "") << std::endl; } } } } } } ``` Here is a report of this benchmark (it only reports timings when `cv::countNonZero()` is faster) My CPU is an Intel Core I7 4790 @ 3.60Ghz ``` ============================================================ size:[32 x 32] type:0 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[32 x 32] type:0 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[32 x 32] type:0 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[32 x 32] type:0 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[32 x 32] type:0 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[32 x 32] type:0 continuous = false iterations:1000 nz=true pos=middle cv::hasNonZero =>1 perf:0.353764ms => 2.82674e+06 im/s cv::countNonZero =>1 perf:0.282044ms => 3.54555e+06 im/s * ============================================================ size:[32 x 32] type:0 continuous = false iterations:1000 nz=true pos=end cv::hasNonZero =>1 perf:0.610478ms => 1.63806e+06 im/s cv::countNonZero =>1 perf:0.283182ms => 3.5313e+06 im/s * ============================================================ size:[32 x 32] type:0 continuous = false iterations:1000 nz=false pos=none cv::hasNonZero =>0 perf:0.630115ms => 1.58701e+06 im/s cv::countNonZero =>0 perf:0.282044ms => 3.54555e+06 im/s * ============================================================ size:[32 x 32] type:2 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[32 x 32] type:2 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[32 x 32] type:2 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[32 x 32] type:2 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[32 x 32] type:2 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[32 x 32] type:2 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[32 x 32] type:2 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[32 x 32] type:2 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[32 x 32] type:4 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[32 x 32] type:4 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[32 x 32] type:4 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[32 x 32] type:4 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[32 x 32] type:4 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[32 x 32] type:4 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[32 x 32] type:4 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[32 x 32] type:4 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[32 x 32] type:5 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[32 x 32] type:5 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[32 x 32] type:5 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[32 x 32] type:5 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[32 x 32] type:5 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[32 x 32] type:5 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[32 x 32] type:5 continuous = false iterations:1000 nz=true pos=end cv::hasNonZero =>1 perf:0.607347ms => 1.64651e+06 im/s cv::countNonZero =>1 perf:0.467037ms => 2.14116e+06 im/s * ============================================================ size:[32 x 32] type:5 continuous = false iterations:1000 nz=false pos=none cv::hasNonZero =>0 perf:0.618162ms => 1.6177e+06 im/s cv::countNonZero =>0 perf:0.468175ms => 2.13595e+06 im/s * ============================================================ size:[32 x 32] type:6 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[32 x 32] type:6 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[32 x 32] type:6 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[32 x 32] type:6 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[32 x 32] type:6 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[32 x 32] type:6 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[32 x 32] type:6 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[32 x 32] type:6 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[64 x 64] type:0 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[64 x 64] type:0 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[64 x 64] type:0 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[64 x 64] type:0 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[64 x 64] type:0 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[64 x 64] type:0 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[64 x 64] type:0 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[64 x 64] type:0 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[64 x 64] type:2 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[64 x 64] type:2 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[64 x 64] type:2 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[64 x 64] type:2 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[64 x 64] type:2 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[64 x 64] type:2 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[64 x 64] type:2 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[64 x 64] type:2 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[64 x 64] type:4 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[64 x 64] type:4 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[64 x 64] type:4 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[64 x 64] type:4 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[64 x 64] type:4 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[64 x 64] type:4 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[64 x 64] type:4 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[64 x 64] type:4 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[64 x 64] type:5 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[64 x 64] type:5 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[64 x 64] type:5 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[64 x 64] type:5 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[64 x 64] type:5 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[64 x 64] type:5 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[64 x 64] type:5 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[64 x 64] type:5 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[64 x 64] type:6 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[64 x 64] type:6 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[64 x 64] type:6 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[64 x 64] type:6 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[64 x 64] type:6 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[64 x 64] type:6 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[64 x 64] type:6 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[64 x 64] type:6 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[128 x 128] type:0 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[128 x 128] type:0 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[128 x 128] type:0 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[128 x 128] type:0 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[128 x 128] type:0 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[128 x 128] type:0 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[128 x 128] type:0 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[128 x 128] type:0 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[128 x 128] type:2 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[128 x 128] type:2 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[128 x 128] type:2 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[128 x 128] type:2 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[128 x 128] type:2 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[128 x 128] type:2 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[128 x 128] type:2 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[128 x 128] type:2 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[128 x 128] type:4 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[128 x 128] type:4 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[128 x 128] type:4 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[128 x 128] type:4 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[128 x 128] type:4 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[128 x 128] type:4 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[128 x 128] type:4 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[128 x 128] type:4 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[128 x 128] type:5 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[128 x 128] type:5 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[128 x 128] type:5 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[128 x 128] type:5 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[128 x 128] type:5 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[128 x 128] type:5 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[128 x 128] type:5 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[128 x 128] type:5 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[128 x 128] type:6 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[128 x 128] type:6 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[128 x 128] type:6 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[128 x 128] type:6 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[128 x 128] type:6 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[128 x 128] type:6 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[128 x 128] type:6 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[128 x 128] type:6 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[320 x 240] type:0 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[320 x 240] type:0 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[320 x 240] type:0 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[320 x 240] type:0 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[320 x 240] type:0 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[320 x 240] type:0 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[320 x 240] type:0 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[320 x 240] type:0 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[320 x 240] type:2 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[320 x 240] type:2 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[320 x 240] type:2 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[320 x 240] type:2 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[320 x 240] type:2 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[320 x 240] type:2 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[320 x 240] type:2 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[320 x 240] type:2 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[320 x 240] type:4 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[320 x 240] type:4 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[320 x 240] type:4 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[320 x 240] type:4 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[320 x 240] type:4 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[320 x 240] type:4 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[320 x 240] type:4 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[320 x 240] type:4 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[320 x 240] type:5 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[320 x 240] type:5 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[320 x 240] type:5 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[320 x 240] type:5 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[320 x 240] type:5 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[320 x 240] type:5 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[320 x 240] type:5 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[320 x 240] type:5 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[320 x 240] type:6 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[320 x 240] type:6 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[320 x 240] type:6 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[320 x 240] type:6 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[320 x 240] type:6 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[320 x 240] type:6 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[320 x 240] type:6 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[320 x 240] type:6 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[512 x 512] type:0 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[512 x 512] type:0 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[512 x 512] type:0 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[512 x 512] type:0 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[512 x 512] type:0 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[512 x 512] type:0 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[512 x 512] type:0 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[512 x 512] type:0 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[512 x 512] type:2 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[512 x 512] type:2 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[512 x 512] type:2 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[512 x 512] type:2 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[512 x 512] type:2 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[512 x 512] type:2 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[512 x 512] type:2 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[512 x 512] type:2 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[512 x 512] type:4 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[512 x 512] type:4 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[512 x 512] type:4 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[512 x 512] type:4 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[512 x 512] type:4 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[512 x 512] type:4 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[512 x 512] type:4 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[512 x 512] type:4 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[512 x 512] type:5 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[512 x 512] type:5 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[512 x 512] type:5 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[512 x 512] type:5 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[512 x 512] type:5 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[512 x 512] type:5 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[512 x 512] type:5 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[512 x 512] type:5 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[512 x 512] type:6 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[512 x 512] type:6 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[512 x 512] type:6 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[512 x 512] type:6 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[512 x 512] type:6 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[512 x 512] type:6 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[512 x 512] type:6 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[512 x 512] type:6 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[640 x 480] type:0 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[640 x 480] type:0 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[640 x 480] type:0 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[640 x 480] type:0 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[640 x 480] type:0 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[640 x 480] type:0 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[640 x 480] type:0 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[640 x 480] type:0 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[640 x 480] type:2 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[640 x 480] type:2 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[640 x 480] type:2 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[640 x 480] type:2 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[640 x 480] type:2 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[640 x 480] type:2 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[640 x 480] type:2 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[640 x 480] type:2 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[640 x 480] type:4 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[640 x 480] type:4 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[640 x 480] type:4 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[640 x 480] type:4 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[640 x 480] type:4 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[640 x 480] type:4 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[640 x 480] type:4 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[640 x 480] type:4 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[640 x 480] type:5 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[640 x 480] type:5 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[640 x 480] type:5 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[640 x 480] type:5 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[640 x 480] type:5 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[640 x 480] type:5 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[640 x 480] type:5 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[640 x 480] type:5 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[640 x 480] type:6 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[640 x 480] type:6 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[640 x 480] type:6 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[640 x 480] type:6 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[640 x 480] type:6 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[640 x 480] type:6 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[640 x 480] type:6 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[640 x 480] type:6 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[1024 x 768] type:0 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[1024 x 768] type:0 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[1024 x 768] type:0 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[1024 x 768] type:0 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[1024 x 768] type:0 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[1024 x 768] type:0 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[1024 x 768] type:0 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[1024 x 768] type:0 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[1024 x 768] type:2 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[1024 x 768] type:2 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[1024 x 768] type:2 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[1024 x 768] type:2 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[1024 x 768] type:2 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[1024 x 768] type:2 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[1024 x 768] type:2 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[1024 x 768] type:2 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[1024 x 768] type:4 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[1024 x 768] type:4 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[1024 x 768] type:4 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[1024 x 768] type:4 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[1024 x 768] type:4 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[1024 x 768] type:4 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[1024 x 768] type:4 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[1024 x 768] type:4 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[1024 x 768] type:5 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[1024 x 768] type:5 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[1024 x 768] type:5 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[1024 x 768] type:5 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[1024 x 768] type:5 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[1024 x 768] type:5 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[1024 x 768] type:5 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[1024 x 768] type:5 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[1024 x 768] type:6 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[1024 x 768] type:6 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[1024 x 768] type:6 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[1024 x 768] type:6 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[1024 x 768] type:6 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[1024 x 768] type:6 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[1024 x 768] type:6 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[1024 x 768] type:6 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[2048 x 2048] type:0 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[2048 x 2048] type:0 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[2048 x 2048] type:0 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[2048 x 2048] type:0 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[2048 x 2048] type:0 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[2048 x 2048] type:0 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[2048 x 2048] type:0 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[2048 x 2048] type:0 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[2048 x 2048] type:2 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[2048 x 2048] type:2 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[2048 x 2048] type:2 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[2048 x 2048] type:2 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[2048 x 2048] type:2 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[2048 x 2048] type:2 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[2048 x 2048] type:2 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[2048 x 2048] type:2 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[2048 x 2048] type:4 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[2048 x 2048] type:4 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[2048 x 2048] type:4 continuous = true iterations:1000 nz=true pos=end cv::hasNonZero =>1 perf:895.381ms => 1116.84 im/s cv::countNonZero =>1 perf:882.569ms => 1133.06 im/s * ============================================================ size:[2048 x 2048] type:4 continuous = true iterations:1000 nz=false pos=none cv::hasNonZero =>0 perf:899.53ms => 1111.69 im/s cv::countNonZero =>0 perf:870.894ms => 1148.24 im/s * ============================================================ size:[2048 x 2048] type:4 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[2048 x 2048] type:4 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[2048 x 2048] type:4 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[2048 x 2048] type:4 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[2048 x 2048] type:5 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[2048 x 2048] type:5 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[2048 x 2048] type:5 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[2048 x 2048] type:5 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[2048 x 2048] type:5 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[2048 x 2048] type:5 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[2048 x 2048] type:5 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[2048 x 2048] type:5 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[2048 x 2048] type:6 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[2048 x 2048] type:6 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[2048 x 2048] type:6 continuous = true iterations:1000 nz=true pos=end cv::hasNonZero =>1 perf:2018.92ms => 495.313 im/s cv::countNonZero =>1 perf:1966.37ms => 508.552 im/s * ============================================================ size:[2048 x 2048] type:6 continuous = true iterations:1000 nz=false pos=none cv::hasNonZero =>0 perf:2005.87ms => 498.537 im/s cv::countNonZero =>0 perf:1992.78ms => 501.812 im/s * ============================================================ size:[2048 x 2048] type:6 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[2048 x 2048] type:6 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[2048 x 2048] type:6 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[2048 x 2048] type:6 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[1031 x 1000] type:0 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[1031 x 1000] type:0 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[1031 x 1000] type:0 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[1031 x 1000] type:0 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[1031 x 1000] type:0 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[1031 x 1000] type:0 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[1031 x 1000] type:0 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[1031 x 1000] type:0 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[1031 x 1000] type:2 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[1031 x 1000] type:2 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[1031 x 1000] type:2 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[1031 x 1000] type:2 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[1031 x 1000] type:2 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[1031 x 1000] type:2 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[1031 x 1000] type:2 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[1031 x 1000] type:2 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[1031 x 1000] type:4 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[1031 x 1000] type:4 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[1031 x 1000] type:4 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[1031 x 1000] type:4 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[1031 x 1000] type:4 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[1031 x 1000] type:4 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[1031 x 1000] type:4 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[1031 x 1000] type:4 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[1031 x 1000] type:5 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[1031 x 1000] type:5 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[1031 x 1000] type:5 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[1031 x 1000] type:5 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[1031 x 1000] type:5 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[1031 x 1000] type:5 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[1031 x 1000] type:5 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[1031 x 1000] type:5 continuous = false iterations:1000 nz=false pos=none ============================================================ size:[1031 x 1000] type:6 continuous = true iterations:1000 nz=true pos=begin ============================================================ size:[1031 x 1000] type:6 continuous = true iterations:1000 nz=true pos=middle ============================================================ size:[1031 x 1000] type:6 continuous = true iterations:1000 nz=true pos=end ============================================================ size:[1031 x 1000] type:6 continuous = true iterations:1000 nz=false pos=none ============================================================ size:[1031 x 1000] type:6 continuous = false iterations:1000 nz=true pos=begin ============================================================ size:[1031 x 1000] type:6 continuous = false iterations:1000 nz=true pos=middle ============================================================ size:[1031 x 1000] type:6 continuous = false iterations:1000 nz=true pos=end ============================================================ size:[1031 x 1000] type:6 continuous = false iterations:1000 nz=false pos=none done ``` |
||
![]() |
6dd8a9b6ad
|
Merge pull request #13879 from chacha21:REDUCE_SUM2
add REDUCE_SUM2 #13879 proposal to add REDUCE_SUM2 to cv::reduce, an operation that sums up the square of elements |
||
![]() |
602caa9cd6
|
Merge pull request #21937 from Kumataro:4.x-fix-21911
* Fix warnings for clang15 * Fix warnings: Remove unnecessary code * Fix warnings: Remove unnecessary code |
||
![]() |
e16cb8b4a2
|
Merge pull request #21703 from rogday:transpose
Add n-dimensional transpose to core * add n-dimensional transpose to core * add performance test, write sequentially and address review comments |
||
![]() |
0e6a2c0491 | fix legacy constants | ||
![]() |
f044037ec5
|
Merge pull request #20733 from rogday:argmaxnd
Implement ArgMax and ArgMin * add reduceArgMax and reduceArgMin * fix review comments * address review concerns |
||
![]() |
c2ce3d927a
|
UMat usageFlags fixes opencv/opencv#19807
- corrects code to support non- USAGE_DEFAULT settings - accuracy, regression, perf test cases - not tested on the 3.x branch |
||
![]() |
68d15fc62e | Merge remote-tracking branch 'upstream/3.4' into merge-3.4 | ||
![]() |
f3f46096d6 |
Relax accuracy requirements in the OpenCL sqrt perf arithmetic test.
Also bring perf_imgproc CornerMinEigenVal accuracy requirements in line with the test_imgproc accuracy requirements on that test and fix indentation on the latter. Partially addresses issue #9821 |
||
![]() |
ad94d8cc4f
|
Merge pull request #19029 from diablodale:fix19004-memthreadstart
add thread-safe startup of fastMalloc and fastFree * add perf test core memory allocation * fix threading in isAlignedAllocationEnabled() * tweaks requested by maintainer |
||
![]() |
cf2a3c8e74 | Merge remote-tracking branch 'upstream/3.4' into merge-3.4 | ||
![]() |
54063c40de |
core(ocl): options to control buffer access flags
- control using of clEnqueueMapBuffer or clEnqueueReadBuffer[Rect] - added benchmarks with OpenCL buffer access use cases |
||
![]() |
65573784c4 | Merge remote-tracking branch 'upstream/3.4' into merge-3.4 | ||
![]() |
f2fe6f40c2 |
Merge pull request #15510 from seiko2plus:issue15506
* core: rework and optimize SIMD implementation of dotProd - add new universal intrinsics v_dotprod[int32], v_dotprod_expand[u&int8, u&int16, int32], v_cvt_f64(int64) - add a boolean param for all v_dotprod&_expand intrinsics that change the behavior of addition order between pairs in some platforms in order to reach the maximum optimization when the sum among all lanes is what only matters - fix clang build on ppc64le - support wide universal intrinsics for dotProd_32s - remove raw SIMD and activate universal intrinsics for dotProd_8 - implement SIMD optimization for dotProd_s16&u16 - extend performance test data types of dotprod - fix GCC VSX workaround of vec_mule and vec_mulo (in little-endian it must be swapped) - optimize v_mul_expand(int32) on VSX * core: remove boolean param from v_dotprod&_expand and implement v_dotprod_fast&v_dotprod_expand_fast this changes made depend on "terfendail" review |
||
![]() |
19a4b51371 | Merge remote-tracking branch 'upstream/3.4' into merge-3.4 | ||
![]() |
b2135be594 |
fast_math: add extra perf/unit tests
Add a basic sanity test to verify the rounding functions work as expected. Likewise, extend the rounding performance test to cover the additional float -> int fast math functions. |
||
![]() |
332c37f332 | Merge remote-tracking branch 'upstream/3.4' into merge-3.4 | ||
![]() |
f1f0f630c7 |
core: disable I/O perf test
- can be enable separately if needed - not stable (due storage I/O processing) |
||
![]() |
2e0150e601 | Merge remote-tracking branch 'upstream/3.4' into merge-3.4 | ||
![]() |
00c9ab8c23 |
Merge pull request #13317 from terfendail:norm_wintr
* Added performance tests for hal::norm functions * Added sum of absolute differences intrinsic * norm implementation updated to use wide universal intrinsics * improve and fix v_reduce_sad on VSX |
||
![]() |
dca657a2fd | Merge remote-tracking branch 'upstream/3.4' into merge-3.4 | ||
![]() |
a39e0daacf | Utilize CV_UNUSED macro | ||
![]() |
73bfe68821 | Merge remote-tracking branch 'upstream/3.4' into merge-3.4 | ||
![]() |
80b62a41c6 |
Merge pull request #12411 from vpisarev:wide_convert
* rewrote Mat::convertTo() and convertScaleAbs() to wide universal intrinsics; added always-available and SIMD-optimized FP16<=>FP32 conversion * fixed compile warnings * fix some more compile errors * slightly relaxed accuracy threshold for int->float conversion (since we now do it using single-precision arithmetics, not double-precision) * fixed compile errors on iOS, Android and in the baseline C++ version (intrin_cpp.hpp) * trying to fix ARM-neon builds * trying to fix ARM-neon builds * trying to fix ARM-neon builds * trying to fix ARM-neon builds |
||
![]() |
54279523a3 |
Merge pull request #12437 from vpisarev:avx2_fixes
* trying to fix the custom AVX2 builder test failures (false alarms) * fixed compile error with CPU_BASELINE=AVX2 on x86; raised tolerance thresholds in a couple of tests * fixed compile error with CPU_BASELINE=AVX2 on x86; raised tolerance thresholds in a couple of tests * fixed compile error with CPU_BASELINE=AVX2 on x86; raised tolerance thresholds in a couple of tests * seemingly disabled false alarm warning in surf.cpp; increased tolerance thresholds in the tests for SolvePnP and in DNN/ENet |