opencv

mirror of https://github.com/opencv/opencv.git synced 2025-07-30 09:16:50 +08:00

Author	SHA1	Message	Date
Yuantao Feng	8207549638	Merge pull request #26991 from fengyuentau:4x/core/norm2hal_rvv core: improve norm of hal rvv #26991 Merge with https://github.com/opencv/opencv_extra/pull/1241 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-18 09:42:55 +03:00
GenshinImpactStarts	2090407002	Merge pull request #26999 from GenshinImpactStarts:polar_to_cart [HAL RVV] unify and impl polar_to_cart \| add perf test #26999 ### Summary 1. Implement through the existing `cv_hal_polarToCart32f` and `cv_hal_polarToCart64f` interfaces. 2. Add `polarToCart` performance tests 3. Make `cv::polarToCart` use CALL_HAL in the same way as `cv::cartToPolar` 4. To achieve the 3rd point, the original implementation was moved, and some modifications were made. Tested through: ```sh opencv_test_core --gtest_filter="PolarToCart:Core_CartPolar_reverse" opencv_perf_core --gtest_filter="PolarToCart" --perf_min_samples=300 --perf_force_samples=300 ``` ### HAL performance test *UPDATE: Current implementation is no more depending on vlen. NOTE: Due to the 4th point in the summary above, the `scalar` and `ui` test is based on the modified code of this PR. The impact of this patch on `scalar` and `ui` is evaluated in the next section, `Effect of Point 4`. Vlen 256 (Muse Pi): ``` Name of Test scalar ui rvv ui rvv vs vs scalar scalar (x-factor) (x-factor) PolarToCart::PolarToCartFixture::(127x61, 32FC1) 0.315 0.110 0.034 2.85 9.34 PolarToCart::PolarToCartFixture::(127x61, 64FC1) 0.423 0.163 0.045 2.59 9.34 PolarToCart::PolarToCartFixture::(640x480, 32FC1) 13.695 4.325 1.278 3.17 10.71 PolarToCart::PolarToCartFixture::(640x480, 64FC1) 17.719 7.118 2.105 2.49 8.42 PolarToCart::PolarToCartFixture::(1280x720, 32FC1) 40.678 13.114 3.977 3.10 10.23 PolarToCart::PolarToCartFixture::(1280x720, 64FC1) 53.124 21.298 6.519 2.49 8.15 PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 95.158 29.465 8.894 3.23 10.70 PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 119.262 47.743 14.129 2.50 8.44 ``` ### Effect of Point 4 To make `cv::polarToCart` behave the same as `cv::cartToPolar`, the implementation detail of the former has been moved to the latter's location (from `mathfuncs.cpp` to `mathfuncs_core.simd.hpp`). #### Reason for Changes: This function works as follows: $y = \text{mag} \times \sin(\text{angle})$ and $x = \text{mag} \times \cos(\text{angle})$. The original implementation first calculates the values of $\sin$ and $\cos$, storing the results in the output buffers $x$ and $y$, and then multiplies the result by $\text{mag}$. However, when the function is used as an in-place operation (one of the output buffers is also an input buffer), the original implementation allocates an extra buffer to store the $\sin$ and $\cos$ values in case the $\text{mag}$ value gets overwritten. This extra buffer allocation prevents `cv::polarToCart` from functioning in the same way as `cv::cartToPolar`. Therefore, the multiplication is now performed immediately without storing intermediate values. Since the original implementation also had AVX2 optimizations, I have applied the same optimizations to the AVX2 version of this implementation. UPDATE*: UI use v_sincos from #25892 now. The original implementation has AVX2 optimizations but is slower much than current UI so it's removed, and AVX2 perf test is below. Scalar implementation isn't changed because it's faster than using UI's method. #### Test Result `scalar` and `ui` test is done on Muse PI, and AVX2 test is done on Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz. `scalar` test: ``` Name of Test orig pr pr vs orig (x-factor) PolarToCart::PolarToCartFixture::(127x61, 32FC1) 0.333 0.294 1.13 PolarToCart::PolarToCartFixture::(127x61, 64FC1) 0.385 0.403 0.96 PolarToCart::PolarToCartFixture::(640x480, 32FC1) 14.749 12.343 1.19 PolarToCart::PolarToCartFixture::(640x480, 64FC1) 19.419 16.743 1.16 PolarToCart::PolarToCartFixture::(1280x720, 32FC1) 44.155 37.822 1.17 PolarToCart::PolarToCartFixture::(1280x720, 64FC1) 62.108 50.358 1.23 PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 99.011 85.769 1.15 PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 127.740 112.874 1.13 ``` `ui` test: ``` Name of Test orig pr pr vs orig (x-factor) PolarToCart::PolarToCartFixture::(127x61, 32FC1) 0.306 0.110 2.77 PolarToCart::PolarToCartFixture::(127x61, 64FC1) 0.455 0.163 2.79 PolarToCart::PolarToCartFixture::(640x480, 32FC1) 13.381 4.325 3.09 PolarToCart::PolarToCartFixture::(640x480, 64FC1) 21.851 7.118 3.07 PolarToCart::PolarToCartFixture::(1280x720, 32FC1) 39.975 13.114 3.05 PolarToCart::PolarToCartFixture::(1280x720, 64FC1) 67.006 21.298 3.15 PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 90.362 29.465 3.07 PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 129.637 47.743 2.72 ``` AVX2 test: ``` Name of Test orig pr pr vs orig (x-factor) PolarToCart::PolarToCartFixture::(127x61, 32FC1) 0.019 0.009 2.11 PolarToCart::PolarToCartFixture::(127x61, 64FC1) 0.022 0.013 1.74 PolarToCart::PolarToCartFixture::(640x480, 32FC1) 0.788 0.355 2.22 PolarToCart::PolarToCartFixture::(640x480, 64FC1) 1.102 0.618 1.78 PolarToCart::PolarToCartFixture::(1280x720, 32FC1) 2.383 1.042 2.29 PolarToCart::PolarToCartFixture::(1280x720, 64FC1) 3.758 2.316 1.62 PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 5.577 2.559 2.18 PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 9.710 6.424 1.51 ``` A slight performance loss occurs because the check for whether $mag$ is nullptr is performed with every calculation, instead of being done once per batch. This is to reuse current `SinCos_32f` function. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-17 14:16:09 +03:00
GenshinImpactStarts	2a8d4b8e43	Merge pull request #27000 from GenshinImpactStarts:cart_to_polar [HAL RVV] reuse atan \| impl cart_to_polar \| add perf test #27000 Implement through the existing `cv_hal_cartToPolar32f` and `cv_hal_cartToPolar64f` interfaces. Add `cartToPolar` performance tests. cv_hal_rvv::fast_atan is modified to make it more reusable because it's needed in cartToPolar. UPDATE: UI enabled. Since the vec type of RVV can't be stored in struct. UI implementation of `v_atan_f32` is modified. Both `fastAtan` and `cartToPolar` are affected so the test result for `atan` is also appended. I have tested the modified UI on RVV and AVX2 and no regressions appears. Perf test done on MUSE-PI. AVX2 test done on Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz. ```sh $ opencv_test_core --gtest_filter="CartToPolar:Core_CartPolar_reverse:Phase" $ opencv_perf_core --gtest_filter="CartToPolar:phase" --perf_min_samples=300 --perf_force_samples=300 ``` Test result between enabled UI and HAL: ``` Name of Test ui rvv rvv vs ui (x-factor) CartToPolar::CartToPolarFixture::(127x61, 32FC1) 0.106 0.059 1.80 CartToPolar::CartToPolarFixture::(127x61, 64FC1) 0.155 0.070 2.20 CartToPolar::CartToPolarFixture::(640x480, 32FC1) 4.188 2.317 1.81 CartToPolar::CartToPolarFixture::(640x480, 64FC1) 6.593 2.889 2.28 CartToPolar::CartToPolarFixture::(1280x720, 32FC1) 12.600 7.057 1.79 CartToPolar::CartToPolarFixture::(1280x720, 64FC1) 19.860 8.797 2.26 CartToPolar::CartToPolarFixture::(1920x1080, 32FC1) 28.295 15.809 1.79 CartToPolar::CartToPolarFixture::(1920x1080, 64FC1) 44.573 19.398 2.30 phase32f::VectorLength::128 0.002 0.002 1.20 phase32f::VectorLength::1000 0.008 0.006 1.32 phase32f::VectorLength::131072 1.061 0.731 1.45 phase32f::VectorLength::524288 3.997 2.976 1.34 phase32f::VectorLength::1048576 8.001 5.959 1.34 phase64f::VectorLength::128 0.002 0.002 1.33 phase64f::VectorLength::1000 0.012 0.008 1.58 phase64f::VectorLength::131072 1.648 0.931 1.77 phase64f::VectorLength::524288 6.836 3.837 1.78 phase64f::VectorLength::1048576 14.060 7.540 1.86 ``` Test result before and after enabling UI on RVV: ``` Name of Test perf perf perf ui ui ui orig pr pr vs perf ui orig (x-factor) CartToPolar::CartToPolarFixture::(127x61, 32FC1) 0.141 0.106 1.33 CartToPolar::CartToPolarFixture::(127x61, 64FC1) 0.187 0.155 1.20 CartToPolar::CartToPolarFixture::(640x480, 32FC1) 5.990 4.188 1.43 CartToPolar::CartToPolarFixture::(640x480, 64FC1) 8.370 6.593 1.27 CartToPolar::CartToPolarFixture::(1280x720, 32FC1) 18.214 12.600 1.45 CartToPolar::CartToPolarFixture::(1280x720, 64FC1) 25.365 19.860 1.28 CartToPolar::CartToPolarFixture::(1920x1080, 32FC1) 40.437 28.295 1.43 CartToPolar::CartToPolarFixture::(1920x1080, 64FC1) 56.699 44.573 1.27 phase32f::VectorLength::128 0.003 0.002 1.54 phase32f::VectorLength::1000 0.016 0.008 1.90 phase32f::VectorLength::131072 2.048 1.061 1.93 phase32f::VectorLength::524288 8.219 3.997 2.06 phase32f::VectorLength::1048576 16.426 8.001 2.05 phase64f::VectorLength::128 0.003 0.002 1.44 phase64f::VectorLength::1000 0.020 0.012 1.60 phase64f::VectorLength::131072 2.621 1.648 1.59 phase64f::VectorLength::524288 10.780 6.836 1.58 phase64f::VectorLength::1048576 22.723 14.060 1.62 ``` Test result before and after modifying UI on AVX2: ``` Name of Test perf perf perf avx2 avx2 avx2 orig pr pr vs perf avx2 orig (x-factor) CartToPolar::CartToPolarFixture::(127x61, 32FC1) 0.006 0.005 1.14 CartToPolar::CartToPolarFixture::(127x61, 64FC1) 0.010 0.009 1.08 CartToPolar::CartToPolarFixture::(640x480, 32FC1) 0.273 0.264 1.03 CartToPolar::CartToPolarFixture::(640x480, 64FC1) 0.511 0.487 1.05 CartToPolar::CartToPolarFixture::(1280x720, 32FC1) 0.760 0.723 1.05 CartToPolar::CartToPolarFixture::(1280x720, 64FC1) 2.009 1.937 1.04 CartToPolar::CartToPolarFixture::(1920x1080, 32FC1) 1.996 1.923 1.04 CartToPolar::CartToPolarFixture::(1920x1080, 64FC1) 5.721 5.509 1.04 phase32f::VectorLength::128 0.000 0.000 0.98 phase32f::VectorLength::1000 0.001 0.001 0.97 phase32f::VectorLength::131072 0.105 0.111 0.95 phase32f::VectorLength::524288 0.402 0.402 1.00 phase32f::VectorLength::1048576 0.775 0.767 1.01 phase64f::VectorLength::128 0.000 0.000 1.00 phase64f::VectorLength::1000 0.001 0.001 1.01 phase64f::VectorLength::131072 0.163 0.162 1.01 phase64f::VectorLength::524288 0.669 0.653 1.02 phase64f::VectorLength::1048576 1.660 1.634 1.02 ``` ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-13 15:56:56 +03:00
Alexander Smorkalov	b129abfdaa	Merge pull request #27055 from hanliutong:UI-loop-condition Fix some vectorized loop conditions.	2025-03-13 14:12:30 +03:00
Liutong HAN	fd62bd0991	Relax the loop condition to process the final batch.	2025-03-13 07:54:41 +00:00
GenshinImpactStarts	e30697fd42	Merge pull request #27002 from GenshinImpactStarts:magnitude [HAL RVV] impl magnitude \| add perf test #27002 Implement through the existing `cv_hal_magnitude32f` and `cv_hal_magnitude64f` interfaces. UPDATE: UI is enabled. The only difference between UI and HAL now is HAL use a approximate `sqrt`. Perf test done on MUSE-PI. ```sh $ opencv_test_core --gtest_filter="Magnitude" $ opencv_perf_core --gtest_filter="Magnitude" --perf_min_samples=300 --perf_force_samples=300 ``` Test result between enabled UI and HAL: ``` Name of Test ui rvv rvv vs ui (x-factor) Magnitude::MagnitudeFixture::(127x61, 32FC1) 0.029 0.016 1.75 Magnitude::MagnitudeFixture::(127x61, 64FC1) 0.057 0.036 1.57 Magnitude::MagnitudeFixture::(640x480, 32FC1) 1.063 0.648 1.64 Magnitude::MagnitudeFixture::(640x480, 64FC1) 2.261 1.530 1.48 Magnitude::MagnitudeFixture::(1280x720, 32FC1) 3.261 2.118 1.54 Magnitude::MagnitudeFixture::(1280x720, 64FC1) 6.802 4.682 1.45 Magnitude::MagnitudeFixture::(1920x1080, 32FC1) 7.287 4.738 1.54 Magnitude::MagnitudeFixture::(1920x1080, 64FC1) 15.226 10.334 1.47 ``` Test result before and after enabling UI: ``` Name of Test orig pr pr vs orig (x-factor) Magnitude::MagnitudeFixture::(127x61, 32FC1) 0.032 0.029 1.11 Magnitude::MagnitudeFixture::(127x61, 64FC1) 0.067 0.057 1.17 Magnitude::MagnitudeFixture::(640x480, 32FC1) 1.228 1.063 1.16 Magnitude::MagnitudeFixture::(640x480, 64FC1) 2.786 2.261 1.23 Magnitude::MagnitudeFixture::(1280x720, 32FC1) 3.762 3.261 1.15 Magnitude::MagnitudeFixture::(1280x720, 64FC1) 8.549 6.802 1.26 Magnitude::MagnitudeFixture::(1920x1080, 32FC1) 8.408 7.287 1.15 Magnitude::MagnitudeFixture::(1920x1080, 64FC1) 18.884 15.226 1.24 ``` ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-13 08:34:11 +03:00
Yuantao Feng	eefa327f30	Merge pull request #27042 from fengyuentau:4x/core/normDiff_simd core: vectorize normDiff with universal intrinsics #27042 Merge with https://github.com/opencv/opencv_extra/pull/1242. Performance results on Desktop Intel i7-12700K, Apple M2, Jetson Orin and SpaceMIT K1: [perf-normDiff.zip](https://github.com/user-attachments/files/19178689/perf-normDiff.zip) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-12 16:43:10 +03:00
GenshinImpactStarts	0fed1fa184	fix exp, log \| enable ui for log \| strengthen test Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>	2025-03-07 17:11:26 +00:00
天音あめ	bb525fe91d	Merge pull request #26865 from amane-ame:dxt_hal_rvv Add RISC-V HAL implementation for cv::dft and cv::dct #26865 This patch implements `static cv::DFT` function in RVV_HAL using native intrinsic, optimizing the performance for `cv::dft` and `cv::dct` with data types `32FC1/64FC1/32FC2/64FC2`. The reason I chose to create a new `cv_hal_dftOcv` interface is that if I were to use the existing interfaces (`cv_hal_dftInit1D` and `cv_hal_dft1D`), it would require handling and parsing the dft flags within HAL, as well as performing preprocessing operations such as handling unit roots. Since these operations are not performance hotspots and do not require optimization, reusing the existing interfaces would result in copying approximately 300 lines of code from `core/src/dxt.cpp` into HAL, which I believe is unnecessary. Moreover, if I insert the new interface into `static cv::DFT`, both `static cv::RealDFT` and `static cv::DCT` can be optimized as well. The processing performed before and after calling `static cv::DFT` in these functions is also not a performance hotspot. Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0. ``` $ opencv_test_core --gtest_filter="DFT" $ opencv_perf_core --gtest_filter="dft:dct" --perf_min_samples=30 --perf_force_samples=30 ``` The head of the perf table is shown below since the table is too long. View the full perf table here: [hal_rvv_dxt.pdf](https://github.com/user-attachments/files/18622645/hal_rvv_dxt.pdf) <img width="1017" alt="Untitled" src="https://github.com/user-attachments/assets/609856e7-9c7d-4a95-9923-45c1b77eb3a2" /> ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-07 11:08:41 +03:00
Liutong HAN	97abffbdac	Merge pull request #27006 from hanliutong:rvv-fix-ui-1024 Fix issues in RISC-V Vector (RVV) Universal Intrinsic #27006 This PR aims to make `opencv_test_core` pass on RVV, via following two parts: 1. Fix bug in Universal Intrinsic when VLEN >= 512: - `max_nlanes` should be multiplied by 2, because we use LMUL=2 in RVV Universal Intrinsic since #26318. - Related tests are also expanded to match longer registers - Relax the precision threshold of `v_erf` to make the tests pass 2. Temporary fix #26936 - Disable 3 Universal Intrinsic code blocks on GCC - This is just a temporary fix until we figure out if it's our issue or GCC/something else's This patch is tested under the following conditions: - Compier: GCC 14.2, Clang 19.1.7 - Device: Muse-Pi (VLEN=256), QEMU (VLEN=512, 1024) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-04 16:49:59 +03:00
Alexander Smorkalov	1aa69292b0	Backported some CALL_HAL improvements from 5.x #26946	2025-03-03 16:22:48 +03:00
GenshinImpactStarts	33d632f85e	impl hal_rvv norm_hamming Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>	2025-02-25 02:31:02 +00:00
Daniil Anufriev	b5f5540e8a	Merge pull request #26886 from sk1er52:feature/exp64f Enable SIMD_SCALABLE for exp and sqrt #26886 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake ``` CPU - Banana Pi k1, compiler - clang 18.1.4 ``` ``` Geometric mean (ms) Name of Test baseline hal ui hal ui vs vs baseline baseline (x-factor) (x-factor) Exp::ExpFixture::(127x61, 32FC1) 0.358 -- 0.033 -- 10.70 Exp::ExpFixture::(640x480, 32FC1) 14.304 -- 1.167 -- 12.26 Exp::ExpFixture::(1280x720, 32FC1) 42.785 -- 3.538 -- 12.09 Exp::ExpFixture::(1920x1080, 32FC1) 96.206 -- 7.927 -- 12.14 Exp::ExpFixture::(127x61, 64FC1) 0.433 0.050 0.098 8.59 4.40 Exp::ExpFixture::(640x480, 64FC1) 17.315 1.935 3.813 8.95 4.54 Exp::ExpFixture::(1280x720, 64FC1) 52.181 5.877 11.519 8.88 4.53 Exp::ExpFixture::(1920x1080, 64FC1) 117.082 13.157 25.854 8.90 4.53 ``` Additionally, this PR brings Sqrt optimization with UI: ``` Geometric mean (ms) Name of Test baseline ui ui vs baseline (x-factor) Sqrt::SqrtFixture::(127x61, 5, false) 0.111 0.027 4.11 Sqrt::SqrtFixture::(127x61, 6, false) 0.149 0.053 2.82 Sqrt::SqrtFixture::(640x480, 5, false) 4.374 0.967 4.52 Sqrt::SqrtFixture::(640x480, 6, false) 5.885 2.046 2.88 Sqrt::SqrtFixture::(1280x720, 5, false) 12.960 2.915 4.45 Sqrt::SqrtFixture::(1280x720, 6, false) 17.648 6.107 2.89 Sqrt::SqrtFixture::(1920x1080, 5, false) 29.178 6.524 4.47 Sqrt::SqrtFixture::(1920x1080, 6, false) 39.709 13.670 2.90 ``` Reference Muller, J.-M. Elementary Functions: Algorithms and Implementation. 2nd ed. Boston: Birkhäuser, 2006. https://www.springer.com/gp/book/9780817643720	2025-02-21 17:36:54 +03:00
Yuantao Feng	e2803bee5c	Merge pull request #26885 from fengyuentau:4x/core/normalize_simd core: vectorize cv::normalize / cv::norm #26885 Checklist: \| \| normInf \| normL1 \| normL2 \| \| ---- \| ------- \| ------ \| ------ \| \| bool \| - \| - \| - \| \| 8u \| √ \| √ \| √ \| \| 8s \| √ \| √ \| √ \| \| 16u \| √ \| √ \| √ \| \| 16s \| √ \| √ \| √ \| \| 16f \| - \| - \| - \| \| 16bf \| - \| - \| - \| \| 32u \| - \| - \| - \| \| 32s \| √ \| √ \| √ \| \| 32f \| √ \| √ \| √ \| \| 64u \| - \| - \| - \| \| 64s \| - \| - \| - \| \| 64f \| √ \| √ \| √ \| *: Vectorization of data type bool, 16f, 16bf, 32u, 64u and 64s needs to be done on 5.x. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-02-21 13:49:11 +03:00
kyler1cartesis	d32d4da9a3	Merge pull request #26887 from kyler1cartesis:4.x invSqrt SIMD_SCALABLE implementation & HAL tests refactoring #26887 Enable CV_SIMD_SCALABLE for invSqrt. * Banana Pi BF3 (SpacemiT K1) RISC-V * Compiler: Syntacore Clang 18.1.4 (build 2024.12) ``` Geometric mean (ms) Name of Test baseline simd simd scalable scalable vs baseline (x-factor) InvSqrtf::InvSqrtfFixture::(127x61, 32FC1) 0.163 0.051 3.23 InvSqrtf::InvSqrtfFixture::(127x61, 64FC1) 0.241 0.103 2.35 InvSqrtf::InvSqrtfFixture::(640x480, 32FC1) 6.460 1.893 3.41 InvSqrtf::InvSqrtfFixture::(640x480, 64FC1) 9.687 3.999 2.42 InvSqrtf::InvSqrtfFixture::(1280x720, 32FC1) 19.292 5.701 3.38 InvSqrtf::InvSqrtfFixture::(1280x720, 64FC1) 29.452 11.963 2.46 InvSqrtf::InvSqrtfFixture::(1920x1080, 32FC1) 43.326 12.805 3.38 InvSqrtf::InvSqrtfFixture::(1920x1080, 64FC1) 65.566 26.881 2.44 ```	2025-02-19 12:13:48 +03:00
Alexander Smorkalov	58e557d059	Merge pull request #26903 from asmorkalov:as/openvx_hal Migrate remaning OpenVX integrations to OpenVX HAL (core) #26903 Tested with OpenVX 1.2 & 1.3 sample implementation. Steps to build and test: ``` git clone git@github.com:KhronosGroup/OpenVX-sample-impl.git cd OpenVX-sample-impl python3 Build.py --os=Linux --conf=Release cd .. mkdir build cmake -DWITH_OPENVX=ON -DOPENVX_ROOT=/mnt/Projects/Projects/OpenVX-sample-impl/install/Linux/x64/Release/ ../opencv make -j8 ``` ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-02-14 11:55:20 +03:00
Alexander Smorkalov	8e65075c1e	Merge pull request #26895 from asmorkalov:as/mean_hal Use HAL for cv::mean function too	2025-02-12 12:02:14 +03:00
Alexander Smorkalov	7aaada4175	Use HAL for cv::mean function too.	2025-02-12 08:51:20 +03:00
Alexander Smorkalov	7eaddb8aa4	Merge pull request #26897 from shyama7004:typos-fix fix hal_replacement typos	2025-02-11 09:37:08 +03:00
shyama7004	a490623bf0	fix hal_replacement typos	2025-02-10 20:33:33 +05:30
Alexander Smorkalov	aeac913203	Merge pull request #26882 from shyama7004:nullptr-setidentity replace null literals with nullptr and optimize setidentity with std::fill	2025-02-07 17:18:49 +03:00
Alexander Smorkalov	740388b3ce	Merge pull request #26867 from shyama7004:fix-meanStdDev fix meanStdDev overflow for large images	2025-02-07 13:13:35 +03:00
shyama7004	987ba6504b	fix meanStdDev overflow for large images	2025-02-07 10:17:48 +03:00
shyama7004	32d3d54ca1	replace null literals with nullptr; optimize setidentity with std::fill for cv_64fc1	2025-02-06 23:48:23 +05:30
天音あめ	2e909c38dc	Merge pull request #26804 from amane-ame:norm_hal_rvv Add RISC-V HAL implementation for cv::norm and cv::normalize #26804 This patch implements `cv::norm` with norm types `NORM_INF/NORM_L1/NORM_L2/NORM_L2SQR` and `Mat::convertTo` function in RVV_HAL using native intrinsic, optimizing the performance for `cv::norm(src)`, `cv::norm(src1, src2)`, and `cv::normalize(src)` with data types `8UC1/8UC4/32FC1`. `cv::normalize` also calls `minMaxIdx`, #26789 implements RVV_HAL for this. Tested on MUSE-PI for both gcc 14.2 and clang 20.0. ``` $ opencv_test_core --gtest_filter="Norm" $ opencv_perf_core --gtest_filter="norm" --perf_min_samples=300 --perf_force_samples=300 ``` The head of the perf table is shown below since the table is too long. View the full perf table here: [hal_rvv_norm.pdf](https://github.com/user-attachments/files/18468255/hal_rvv_norm.pdf) <img width="1304" alt="Untitled" src="https://github.com/user-attachments/assets/3550b671-6d96-4db3-8b5b-d4cb241da650" /> ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-02-06 19:34:54 +03:00
天音あめ	13b2caffe0	Merge pull request #26789 from amane-ame:minmax_hal_rvv Add RISC-V HAL implementation for minMaxIdx #26789 On the RISC-V platform, `minMaxIdx` cannot benefit from Universal Intrinsics because the UI-optimized `minMaxIdx` only supports `CV_SIMD128` (and does not accept `CV_SIMD_SCALABLE` for RVV). `1d701d1690/modules/core/src/minmax.cpp (L209-L214)` This patch implements `minMaxIdx` function in RVV_HAL using native intrinsic, optimizing the performance for all data types with one channel. Tested on MUSE-PI for both gcc 14.2 and clang 20.0. ``` $ opencv_test_core --gtest_filter="MinMaxLoc" $ opencv_perf_core --gtest_filter="minMaxLoc" ``` <img width="1122" alt="Untitled" src="https://github.com/user-attachments/assets/6a246852-87af-42c5-a50b-c349c2765f3f" /> ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-01-31 14:26:49 +03:00
Maxim Smolskiy	08a24ba2cf	Merge pull request #26846 from MaximSmolskiy:fix_bug_with_int64_support_for_FileStorage Fix bug with int64 support for FileStorage #26846 ### Pull Request Readiness Checklist Fix #26829, https://github.com/opencv/opencv-python/issues/1078 In current implementation of `int64` support raw size of recorded integer is variable (`4` or `8` bytes depending on value). But then we iterate over nodes we need to know it exact value `dfad11aae7/modules/core/src/persistence.cpp (L2596-L2609)` Bug is that `rawSize` method still return `4` for any integer. I haven't figured out a way how to get variable raw size for integer in this method. I made raw size for integer is constant and equal to `8`. Yes, after this patch memory consumption for integers will increase, but I don't know a better way to do it yet. At least this fixes bug and implementation becomes more correct See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2025-01-28 11:08:26 +03:00
eplankin	ae57c54d83	Merge pull request #26463 from eplankin:icv_update_2022.0.0 Update IPP integration #26463 Please merge together with https://github.com/opencv/opencv_3rdparty/pull/88 Supported IPP version was updated to IPP 2022.0.0 for Linux and Windows. 32-bit binaries are dropped since this release. Previous update: https://github.com/opencv/opencv/pull/25935	2025-01-27 17:02:36 +03:00
Alexander Smorkalov	6f24d755f2	Merge pull request #26798 from brad0:opencv_powerpc_elf_aux_info Add CMake checks for getauxval and elf_aux_info for POWER	2025-01-20 13:33:46 +03:00
Brad Smith	918196ec1b	Add CMake checks for getauxval and elf_aux_info for POWER - Change __unix__ check for feature detection. NetBSD does not have either API. - Adds support for OpenBSD/powerpc64.	2025-01-19 13:27:20 -05:00
Brad Smith	93023e1a68	OpenCL: OpenBSD build fix	2025-01-19 02:46:25 -05:00
Vincent Rabaud	bfb54aa691	Remove useless C headers	2025-01-13 16:34:28 +01:00
Skreg	f00814e38d	Merge pull request #26602 from shyama7004:minor-fix Improved dumpVector, cv::Rect operator<< and exceptions #26602 - Applied format for vector element formatting to ensure consistent and clear output representation. - Moved `operator<<` to the `cv` namespace to align with OpenCV's coding standards and improve maintainability. - Enhanced error handling by including detailed exception messages using `e.what()` for better debugging. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-01-10 15:02:18 +03:00
Maksim Shabunin	0756dbfe3d	RISC-V: enabled intrinsics in dotProd, relaxed test thresholds	2024-12-24 00:58:54 +03:00
Pierre Chatelier	d77abeddd0	Merge pull request #26472 from chacha21:gpumatnd_step More convenient GpuMatND constructor #26472 Closes #26471 For convenience, GpuMatND can now accept a step.size() equal to size.size(), as long as the last step is equal to elemSize() - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [X] The PR is proposed to the proper branch - [X] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2024-12-17 17:26:14 +03:00
Kumataro	260f511dfb	Merge pull request #26590 from Kumataro:fix26589 Support C++20 standard #26590 Close https://github.com/opencv/opencv/issues/26589 Related https://github.com/opencv/opencv_contrib/pull/3842 Related: https://github.com/opencv/opencv/issues/20269 - do not arithmetic enums and ( different enums or floating numeric) - remove unused variable ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-12-17 07:40:27 +03:00
Maksim Shabunin	c58b6bf11f	Fixed several cases of unaligned pointer cast	2024-12-02 16:15:23 +03:00
Rostislav Vasilikhin	64d3111377	Merge pull request #26459 from savuor:rv/hal_absdiff_scalar HAL added for absdiff(array, scalar) + related fixes #26459 ### This PR changes * HAL for `absdiff` when one of arguments is a scalar, including multichannel arrays and scalars * several channels support for HAL `addScalar` * proper data type check for `addScalar` when one of arguments is a scalar ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-11-19 10:35:49 +03:00
Alexander Smorkalov	1ff16cb551	Backport some of C API removal in core module implementation.	2024-11-14 11:24:00 +03:00
Rostislav Vasilikhin	67f07b16cb	Merge pull request #25624 from savuor:rv/hal_addscalar HAL added for add(array, scalar) #25624 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-11-13 08:33:19 +03:00
Alexander Smorkalov	ff639d11d4	Merge pull request #26451 from savuor:rv/fix_get_handle Build fix for opencl_core.cpp	2024-11-12 20:50:48 +03:00
Rostislav Vasilikhin	641f43dd48	build fix	2024-11-12 17:04:42 +01:00
Dmitry Kurtaev	c230841105	Merge pull request #26446 from dkurt:file_storage_empty_and_1d_mat * Change style of empty and 1d Mat in FileStorage * Remove misleading 1d Mat test	2024-11-12 17:51:10 +03:00
Dmitry Kurtaev	37c2af63f0	Merge pull request #26434 from dkurt:dk/int64_file_storage_4.x int64 data type support for FileStorage. 1d and empty Mat with exact dimensions #26434 ### Pull Request Readiness Checklist Port of https://github.com/opencv/opencv/pull/26399 to 4.x branch See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-11-11 14:13:33 +03:00
Maksim Shabunin	04818d6dd5	build: made environment access a separate feature	2024-10-30 18:37:22 +03:00
Maksim Shabunin	52100328d8	WinRT/UWP build: fix some specific warnings	2024-10-25 22:32:44 +03:00
Liutong HAN	35571be570	Merge pull request #26318 from hanliutong:rvv-intrin-m2 Use LMUL=2 in the RISC-V Vector (RVV) backend of Universal Intrinsic. #26318 The modification of this patch involves the RVV backend of Universal Intrinsic, replacing `LMUL=1` with `LMUL=2`. Now each Universal Intrinsic type actually corresponds to two RVV vector registers, and each Intrinsic function also operates two vector registers. Considering that algorithms written using Universal Intrinsic usually do not use the maximum number of registers, this can help the RVV backend utilize more register resources without modifying the algorithm implementation This patch is generally beneficial in performance. We compiled OpenCV with `Clang-19.1.1` and `GCC-14.2.0` , ran it on `CanMV-k230` and `Banana-Pi F3`. Then we have four scenarios on combinations of compilers and devices. In `opencv_perf_core`, there are 3363 cases, of which: - 901 (26.8%) cases achieved more than `5%` performance improvement in all four scenarios, and the average speedup of these test cases (compared to scalar) increased from `3.35x` to `4.35x` - 75 (2.2%) cases had more than `5%` performance loss in all four scenarios, indicating that these cases are better with `LMUL=1` instead of `LMUL=2`. This involves `Mat_Transform`, `hasNonZero`, `KMeans`, `meanStdDev`, `merge` and `norm2`. Among them, `Mat_Transform` only has performance degradation in a few cases (`8UC3`), and the actual execution time of `hasNonZero` is so short that it can be ignored. For `KMeans`, `meanStdDev`, `merge` and `norm2`, we should be able to use the HAL to optimize/restore their performance. (In fact, we have already done this for `merge` #26216 ) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2024-10-24 10:08:43 +03:00
Alexander Smorkalov	e026a5ad8a	Merge pull request #26281 from kallaballa:clgl_device_discovery Rewrote OpenCL-OpenGL-interop device discovery routine without extensions and with Apple support	2024-10-18 15:52:17 +03:00
kallaballa	3edcf410b6	more guarding	2024-10-11 02:18:14 +02:00
kallaballa	4cbb96b396	use new instead of malloc and guard it	2024-10-10 15:14:58 +02:00

1 2 3 4 5 ...

3121 Commits