opencv

mirror of https://github.com/opencv/opencv.git synced 2025-07-24 14:06:27 +08:00

Author	SHA1	Message	Date
Alexander Smorkalov	0a5352ee27	Merge pull request #27346 from asmorkalov:as/ipp_hal_sum New HAL entry for cv::sum and IPP adoption	2025-05-24 16:53:42 +03:00
Maxim Smolskiy	023d14ecc4	Merge pull request #27347 from MaximSmolskiy:improve-solveCubic-accuracy Improve solveCubic accuracy #27347
### Pull Request Readiness Checklist
Fix #27323
```
2e-13 * x^3 + x^2 - 2 * x + 1 = 0 -> x^3 + 5e12 * x^2 - 1e13 * x + 5e12 = 0
```
The problem is that the coefficients have large magnitudes, so the current calculations are subject to round-off error:
```
Q = (a1 * a1 - 3 * a2) * (1./9)
R = (2 * a1 * a1 * a1 - 9 * a1 * a2 + 27 * a3) * (1./54)
Qcubed = Q * Q * Q = a1^6/729 - (a1^4 a2)/81 + (a1^2 a2^2)/27 - a2^3/27
R * R = R^2 = a1^6/729 - (a1^4 a2)/81 + (a1^2 a2^2)/36 + (a1^3 a3)/27 - (a1 a2 a3)/6 + a3^2/4
d = Qcubed - R * R
```
Suppose `a1`, `a2`, `a3` all have similarly large magnitudes. Then `Qcubed` and `R * R` share the terms `a1^6/729` and `-(a1^4 a2)/81`, which cancel in `d` but swamp the remaining terms (they are of `6`th and `5`th degree, while the other terms are of `4`th degree or lower). If these terms take part in the calculation, they cause a huge round-off error. Expanding the expression should reduce the round-off error:
```
d = Qcubed - R * R = 1/108 (a1^2 a2^2 - 4 a2^3 - 4 a1^3 a3 + 18 a1 a2 a3 - 27 a3^2)
```
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake	2025-05-24 09:56:48 +03:00
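The cancellation described in the entry above can be reproduced outside OpenCV. The following is a minimal sketch, not the OpenCV implementation: the function names `disc_naive`, `disc_expanded` and `disc_exact` are hypothetical, and the exact reference value is computed with Python's `fractions.Fraction` purely for comparison. It evaluates `d` both ways for the monic coefficients quoted in the entry.

```python
from fractions import Fraction

def disc_naive(a1, a2, a3):
    # d = Q^3 - R^2, computed from Q and R as in the formulas quoted above
    q = (a1 * a1 - 3 * a2) * (1.0 / 9)
    r = (2 * a1 * a1 * a1 - 9 * a1 * a2 + 27 * a3) * (1.0 / 54)
    return q * q * q - r * r

def disc_expanded(a1, a2, a3):
    # Same quantity after expanding, so the a1^6 and a1^4*a2 terms never appear
    return (a1 * a1 * a2 * a2 - 4 * a2 * a2 * a2 - 4 * a1 * a1 * a1 * a3
            + 18 * a1 * a2 * a3 - 27 * a3 * a3) * (1.0 / 108)

def disc_exact(a1, a2, a3):
    # Exact rational arithmetic, used only as a reference value
    a1, a2, a3 = Fraction(a1), Fraction(a2), Fraction(a3)
    return (a1**2 * a2**2 - 4 * a2**3 - 4 * a1**3 * a3
            + 18 * a1 * a2 * a3 - 27 * a3**2) / 108

# Coefficients of x^3 + 5e12 * x^2 - 1e13 * x + 5e12 = 0 from the entry above
a1, a2, a3 = 5e12, -1e13, 5e12
exact = disc_exact(a1, a2, a3)
err_naive = abs(Fraction(disc_naive(a1, a2, a3)) - exact) / abs(exact)
err_expanded = abs(Fraction(disc_expanded(a1, a2, a3)) - exact) / abs(exact)
print("relative error, naive:   ", float(err_naive))
print("relative error, expanded:", float(err_expanded))
```

For these inputs the expanded form should land much closer to the exact value, although it is still not exact, because its two largest terms also nearly cancel.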
Alexander Smorkalov	388b6dd81f	New HAL entry for cv::sum and IPP adoption.	2025-05-23 12:35:11 +03:00
Maksim Shabunin	e6fb6c290c	core: legacy intrin operators - fixed version warning condition	2025-05-22 21:00:49 +03:00
Yuantao Feng	c37f54aeed	Merge pull request #27343 from fengyuentau:4x/build/fix_more_warnings build: fix more warnings from recent gcc versions after #27337 #27343 More fixes after https://github.com/opencv/opencv/pull/27337 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-05-21 16:12:09 +03:00
Alexander Smorkalov	dc610867e1	Merge pull request #27342 from MaximSmolskiy:fix-bug-in-solvePoly-test Fix bug in solvePoly test	2025-05-21 11:19:01 +03:00
MaximSmolskiy	d4c4493413	Fix bug in solvePoly test	2025-05-21 09:31:43 +03:00
Yuantao Feng	166f76d224	Merge pull request #27337 from fengyuentau:4x/build/riscv/fix_warnings build: fix warnings from recent gcc versions #27337 This PR addresses the following found warnings: - [x] -Wmaybe-uninitialized - [x] -Wunused-variable - [x] -Wsign-compare Tested building with GCC 14.2 (RISC-V 64). ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-05-21 09:28:29 +03:00
Maxim Smolskiy	d00738d97c	Merge pull request #27331 from MaximSmolskiy:add-test-for-solveCubic Add tests for solveCubic #27331 ### Pull Request Readiness Checklist Related to #27323 I found only randomized tests with number of roots always equal to `1` or `3`, `x^3 = 0` and some simple test for Java and Swift. Obviously, they don't cover all cases (implementation has strong branching and number of roots can be equal to `-1`, `0` and `2` additionally). So, I think it will be useful to try explicitly cover more cases (and implementation branches correspondingly) See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2025-05-21 08:36:35 +03:00
Maksim Shabunin	a9b298eb47	Restored legacy intrinsics operators in a separate header	2025-05-17 16:44:07 +03:00
Vincent Rabaud	9201ca1af1	Merge pull request #27321 from vrabaud:norm Make sure to not access outside normDiffTab #27321 If the norm is outside the array (e.g. Hamming), memory is read outside of the array. This does not matter in practice because the invalid pointer is not used outside of the function (e.g. the Hamming path is taken), but it triggers the sanitizer. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-05-17 09:59:08 +03:00
Alexander Smorkalov	f3cffcd85d	Merge pull request #27322 from vrabaud:zeros Add missing Mat_<_Tp>::zeros(int _ndims, const int* _sizes)	2025-05-17 09:55:05 +03:00
Vincent Rabaud	1a624efc0f	Add missing Mat_<_Tp>::zeros(int _ndims, const int* _sizes)	2025-05-16 10:50:15 +02:00
ruisv	9ab3a249c2	remove private.cuda.hpp:158 space	2025-05-07 11:46:43 +08:00
ruisv	8a2903c190	CUDA 12.9 support: build NppStreamContext manually	2025-05-06 23:47:12 +08:00
Alexander Smorkalov	c248d47110	Merge pull request #27268 from Kumataro:fix27267 doc: hal: replace C++ operators with wrapper functions	2025-05-05 09:20:46 +03:00
Alexander Alekhin	7a9ce585f0	core(ocl): fix POWN OpenCL implementation	2025-05-01 20:57:23 +00:00
Kumataro	37be2a2a68	doc: hal: replace C++ operators with wrapper functions	2025-04-30 05:40:16 +09:00
Yuantao Feng	2fb786532a	Merge pull request #27257 from fengyuentau:4x/hal_rvv/flip_opt hal_rvv: further optimized flip #27257 Checklist: - [x] flipX - [x] flipY - [x] flipXY ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-04-26 11:08:29 +03:00
adsha-quic	edccfa7961	Merge pull request #27184 from CodeLinaro:gemm_fastcv_hal FastCV gemm hal #27184 FastCV hal for gemm 32f ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-04-25 11:07:26 +03:00
Yuantao Feng	325e59bd4c	Merge pull request #27229 from fengyuentau:4x/hal_rvv/transpose HAL: implemented cv_hal_transpose in hal_rvv #27229 Checklists: - [x] transpose2d_8u - [x] transpose2d_16u - [ ] ~transpose2d_8uC3~ - [x] transpose2d_32s - [ ] ~transpose2d_16uC3~ - [x] transpose2d_32sC2 - [ ] ~transpose_32sC3~ - [ ] ~transpose_32sC4~ - [ ] ~transpose_32sC6~ - [ ] ~transpose_32sC8~ - [ ] ~inplace transpose~ ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-04-22 11:03:26 +03:00
Alexander Smorkalov	fa7a0c1e12	Migrated IPP impl for flip and transpose to HAL.	2025-04-10 08:51:12 +03:00
Alexander Smorkalov	78662ac085	Transfer IPP polarToCart to HAL.	2025-04-09 19:13:18 +03:00
Alexander Smorkalov	e826a41eeb	Merge pull request #27202 from asmorkalov:as/drop_ipp_lut Dropped inefficient (disabled) IPP integration for LUT.	2025-04-08 09:07:52 +03:00
Alexander Smorkalov	8f74086d3f	Drop commented out convertTo impl with IPP.	2025-04-07 14:18:37 +03:00
Alexander Smorkalov	91e078be93	Dropped inefficient (disabled) IPP integration for LUT.	2025-04-07 14:11:13 +03:00
Yuantao Feng	1b3db545a3	Merge pull request #27145 from fengyuentau:4x/core/copyMask-simd core: further vectorize copyTo with mask #27145 Merge with https://github.com/opencv/opencv_extra/pull/1247. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-04-07 10:56:02 +03:00
Alexander Smorkalov	0b3155980a	Merge pull request #25394 from Gao-HaoYuan:in_place_convertTo Added reinterpret() method to Mat to convert meta-data without actual data conversion	2025-04-04 10:40:33 +03:00
Yuantao Feng	ec1cbe294a	Merge pull request #27162 from fengyuentau:4x/hal_rvv/copyMask HAL: added copyToMask and implemented in hal_rvv #27162 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-31 10:49:37 +03:00
Ryan Wong	afc7c0a89c	Merge pull request #27154 from kinchungwong:logging_callback_simple_c User-defined logger callback, C-style. #27154 This is a competing PR, an alternative to #27140 Both functions accept a C-style pointer to a static function. Both functions allow restoring the OpenCV built-in implementation by passing in a nullptr. - replaceWriteLogMessage - replaceWriteLogMessageEx This implementation is not compatible with C++ log handler objects. This implementation has minimal thread safety, in the sense that the function pointers are stored and read atomically. Otherwise, the user-defined static functions must accept calls at all times, even after having been deregistered, because some log calls may have started before deregistering. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-30 16:17:07 +03:00
Yuantao Feng	a2a2f37ebb	Merge pull request #27115 from fengyuentau:4x/hal_rvv/normDiff core: refactored normDiff in hal_rvv and extended with support of more data types #27115 Merge with https://github.com/opencv/opencv_extra/pull/1246. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-25 07:59:59 +03:00
Alexander Smorkalov	a77623a32b	Move IPP minMaxIdx to HAL.	2025-03-24 09:21:22 +03:00
Alexander Smorkalov	0944f7ad26	Merge pull request #27128 from asmorkalov:as/ipp_norm Move IPP norm and normDiff to HAL #27128 Continues https://github.com/opencv/opencv/pull/26880 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-24 09:17:22 +03:00
Alexander Smorkalov	01ef38dcad	Merge pull request #26880 from asmorkalov:as/ipp_hal Initial version of IPP-based HAL for x86 and x86_64 platforms #26880 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-22 09:31:42 +03:00
Yuantao Feng	8207549638	Merge pull request #26991 from fengyuentau:4x/core/norm2hal_rvv core: improve norm of hal rvv #26991 Merge with https://github.com/opencv/opencv_extra/pull/1241 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-18 09:42:55 +03:00
GenshinImpactStarts	2090407002	Merge pull request #26999 from GenshinImpactStarts:polar_to_cart [HAL RVV] unify and impl polar_to_cart | add perf test #26999
### Summary
1. Implement through the existing `cv_hal_polarToCart32f` and `cv_hal_polarToCart64f` interfaces.
2. Add `polarToCart` performance tests.
3. Make `cv::polarToCart` use CALL_HAL in the same way as `cv::cartToPolar`.
4. To achieve the 3rd point, the original implementation was moved, and some modifications were made.

Tested through:
```sh
opencv_test_core --gtest_filter="PolarToCart:Core_CartPolar_reverse"
opencv_perf_core --gtest_filter="PolarToCart" --perf_min_samples=300 --perf_force_samples=300
```
### HAL performance test
UPDATE: The current implementation no longer depends on vlen.
NOTE: Due to the 4th point in the summary above, the `scalar` and `ui` tests are based on the modified code of this PR. The impact of this patch on `scalar` and `ui` is evaluated in the next section, `Effect of Point 4`.
Vlen 256 (Muse Pi):
```
Name of Test                                         scalar   ui       rvv      ui vs scalar (x-factor)  rvv vs scalar (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.315    0.110    0.034    2.85                     9.34
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.423    0.163    0.045    2.59                     9.34
PolarToCart::PolarToCartFixture::(640x480, 32FC1)    13.695   4.325    1.278    3.17                     10.71
PolarToCart::PolarToCartFixture::(640x480, 64FC1)    17.719   7.118    2.105    2.49                     8.42
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)   40.678   13.114   3.977    3.10                     10.23
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)   53.124   21.298   6.519    2.49                     8.15
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1)  95.158   29.465   8.894    3.23                     10.70
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1)  119.262  47.743   14.129   2.50                     8.44
```
### Effect of Point 4
To make `cv::polarToCart` behave the same as `cv::cartToPolar`, the implementation detail of the former has been moved to the latter's location (from `mathfuncs.cpp` to `mathfuncs_core.simd.hpp`).
#### Reason for Changes:
This function works as follows: $y = \text{mag} \times \sin(\text{angle})$ and $x = \text{mag} \times \cos(\text{angle})$. The original implementation first calculates the values of $\sin$ and $\cos$, storing the results in the output buffers $x$ and $y$, and then multiplies the result by $\text{mag}$. However, when the function is used as an in-place operation (one of the output buffers is also an input buffer), the original implementation allocates an extra buffer to store the $\sin$ and $\cos$ values in case the $\text{mag}$ value gets overwritten. This extra buffer allocation prevents `cv::polarToCart` from functioning in the same way as `cv::cartToPolar`. Therefore, the multiplication is now performed immediately without storing intermediate values. Since the original implementation also had AVX2 optimizations, I have applied the same optimizations to the AVX2 version of this implementation.
UPDATE: UI now uses v_sincos from #25892. The original implementation has AVX2 optimizations but is much slower than the current UI, so it has been removed; the AVX2 perf test is below. The scalar implementation isn't changed because it's faster than using UI's method.
#### Test Result
The `scalar` and `ui` tests are done on Muse PI, and the AVX2 test is done on Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz.
`scalar` test:
```
Name of Test                                         orig     pr       pr vs orig (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.333    0.294    1.13
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.385    0.403    0.96
PolarToCart::PolarToCartFixture::(640x480, 32FC1)    14.749   12.343   1.19
PolarToCart::PolarToCartFixture::(640x480, 64FC1)    19.419   16.743   1.16
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)   44.155   37.822   1.17
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)   62.108   50.358   1.23
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1)  99.011   85.769   1.15
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1)  127.740  112.874  1.13
```
`ui` test:
```
Name of Test                                         orig     pr       pr vs orig (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.306    0.110    2.77
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.455    0.163    2.79
PolarToCart::PolarToCartFixture::(640x480, 32FC1)    13.381   4.325    3.09
PolarToCart::PolarToCartFixture::(640x480, 64FC1)    21.851   7.118    3.07
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)   39.975   13.114   3.05
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)   67.006   21.298   3.15
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1)  90.362   29.465   3.07
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1)  129.637  47.743   2.72
```
AVX2 test:
```
Name of Test                                         orig     pr       pr vs orig (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.019    0.009    2.11
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.022    0.013    1.74
PolarToCart::PolarToCartFixture::(640x480, 32FC1)    0.788    0.355    2.22
PolarToCart::PolarToCartFixture::(640x480, 64FC1)    1.102    0.618    1.78
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)   2.383    1.042    2.29
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)   3.758    2.316    1.62
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1)  5.577    2.559    2.18
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1)  9.710    6.424    1.51
```
A slight performance loss occurs because the check for whether $mag$ is nullptr is performed with every calculation, instead of being done once per batch. This is done to reuse the current `SinCos_32f` function.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-17 14:16:09 +03:00
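The in-place hazard described in the entry above (an output buffer aliasing the `mag` input) can be shown without OpenCV. This is a minimal sketch under stated assumptions, not the OpenCV code: the function names `polar_to_cart_two_pass` and `polar_to_cart_fused` are hypothetical, plain Python lists stand in for the image buffers, and the two-pass variant deliberately omits the extra buffer that the original implementation allocates.

```python
import math

def polar_to_cart_two_pass(mag, angle, x, y):
    # Pass 1 stores cos/sin in the outputs; pass 2 scales by mag.
    # If x (or y) is the same buffer as mag, pass 1 destroys mag before it is used.
    n = len(angle)
    for i in range(n):
        x[i] = math.cos(angle[i])
        y[i] = math.sin(angle[i])
    for i in range(n):
        x[i] *= mag[i]
        y[i] *= mag[i]

def polar_to_cart_fused(mag, angle, x, y):
    # Read both inputs first, then multiply immediately: safe when x aliases mag.
    for i in range(len(angle)):
        m, a = mag[i], angle[i]
        x[i] = m * math.cos(a)
        y[i] = m * math.sin(a)

angle = [0.0, math.pi / 2]

# In-place call: the x output reuses the magnitude buffer.
buf_two_pass = [2.0, 3.0]
y_two_pass = [0.0, 0.0]
polar_to_cart_two_pass(buf_two_pass, angle, buf_two_pass, y_two_pass)

buf_fused = [2.0, 3.0]
y_fused = [0.0, 0.0]
polar_to_cart_fused(buf_fused, angle, buf_fused, y_fused)

print("two-pass, aliased:", buf_two_pass, y_two_pass)
print("fused, aliased:   ", buf_fused, y_fused)
```

In the aliased two-pass call the magnitude is lost, so the first `x` value comes out as `cos(0) * cos(0)` rather than `2 * cos(0)`; the fused loop needs no temporary buffer to avoid this.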
GenshinImpactStarts	2a8d4b8e43	Merge pull request #27000 from GenshinImpactStarts:cart_to_polar [HAL RVV] reuse atan | impl cart_to_polar | add perf test #27000
Implement through the existing `cv_hal_cartToPolar32f` and `cv_hal_cartToPolar64f` interfaces. Add `cartToPolar` performance tests. cv_hal_rvv::fast_atan is modified to make it more reusable because it's needed in cartToPolar.
UPDATE: UI enabled. Since the vec type of RVV can't be stored in a struct, the UI implementation of `v_atan_f32` is modified. Both `fastAtan` and `cartToPolar` are affected, so the test result for `atan` is also appended. I have tested the modified UI on RVV and AVX2 and no regressions appear.
Perf test done on MUSE-PI. AVX2 test done on Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz.
```sh
$ opencv_test_core --gtest_filter="CartToPolar:Core_CartPolar_reverse:Phase"
$ opencv_perf_core --gtest_filter="CartToPolar:phase" --perf_min_samples=300 --perf_force_samples=300
```
Test result between enabled UI and HAL:
```
Name of Test                                         ui       rvv      rvv vs ui (x-factor)
CartToPolar::CartToPolarFixture::(127x61, 32FC1)     0.106    0.059    1.80
CartToPolar::CartToPolarFixture::(127x61, 64FC1)     0.155    0.070    2.20
CartToPolar::CartToPolarFixture::(640x480, 32FC1)    4.188    2.317    1.81
CartToPolar::CartToPolarFixture::(640x480, 64FC1)    6.593    2.889    2.28
CartToPolar::CartToPolarFixture::(1280x720, 32FC1)   12.600   7.057    1.79
CartToPolar::CartToPolarFixture::(1280x720, 64FC1)   19.860   8.797    2.26
CartToPolar::CartToPolarFixture::(1920x1080, 32FC1)  28.295   15.809   1.79
CartToPolar::CartToPolarFixture::(1920x1080, 64FC1)  44.573   19.398   2.30
phase32f::VectorLength::128                          0.002    0.002    1.20
phase32f::VectorLength::1000                         0.008    0.006    1.32
phase32f::VectorLength::131072                       1.061    0.731    1.45
phase32f::VectorLength::524288                       3.997    2.976    1.34
phase32f::VectorLength::1048576                      8.001    5.959    1.34
phase64f::VectorLength::128                          0.002    0.002    1.33
phase64f::VectorLength::1000                         0.012    0.008    1.58
phase64f::VectorLength::131072                       1.648    0.931    1.77
phase64f::VectorLength::524288                       6.836    3.837    1.78
phase64f::VectorLength::1048576                      14.060   7.540    1.86
```
Test result before and after enabling UI on RVV:
```
Name of Test                                         perf ui orig  perf ui pr  perf ui pr vs perf ui orig (x-factor)
CartToPolar::CartToPolarFixture::(127x61, 32FC1)     0.141         0.106       1.33
CartToPolar::CartToPolarFixture::(127x61, 64FC1)     0.187         0.155       1.20
CartToPolar::CartToPolarFixture::(640x480, 32FC1)    5.990         4.188       1.43
CartToPolar::CartToPolarFixture::(640x480, 64FC1)    8.370         6.593       1.27
CartToPolar::CartToPolarFixture::(1280x720, 32FC1)   18.214        12.600      1.45
CartToPolar::CartToPolarFixture::(1280x720, 64FC1)   25.365        19.860      1.28
CartToPolar::CartToPolarFixture::(1920x1080, 32FC1)  40.437        28.295      1.43
CartToPolar::CartToPolarFixture::(1920x1080, 64FC1)  56.699        44.573      1.27
phase32f::VectorLength::128                          0.003         0.002       1.54
phase32f::VectorLength::1000                         0.016         0.008       1.90
phase32f::VectorLength::131072                       2.048         1.061       1.93
phase32f::VectorLength::524288                       8.219         3.997       2.06
phase32f::VectorLength::1048576                      16.426        8.001       2.05
phase64f::VectorLength::128                          0.003         0.002       1.44
phase64f::VectorLength::1000                         0.020         0.012       1.60
phase64f::VectorLength::131072                       2.621         1.648       1.59
phase64f::VectorLength::524288                       10.780        6.836       1.58
phase64f::VectorLength::1048576                      22.723        14.060      1.62
```
Test result before and after modifying UI on AVX2:
```
Name of Test                                         perf avx2 orig  perf avx2 pr  perf avx2 pr vs perf avx2 orig (x-factor)
CartToPolar::CartToPolarFixture::(127x61, 32FC1)     0.006           0.005         1.14
CartToPolar::CartToPolarFixture::(127x61, 64FC1)     0.010           0.009         1.08
CartToPolar::CartToPolarFixture::(640x480, 32FC1)    0.273           0.264         1.03
CartToPolar::CartToPolarFixture::(640x480, 64FC1)    0.511           0.487         1.05
CartToPolar::CartToPolarFixture::(1280x720, 32FC1)   0.760           0.723         1.05
CartToPolar::CartToPolarFixture::(1280x720, 64FC1)   2.009           1.937         1.04
CartToPolar::CartToPolarFixture::(1920x1080, 32FC1)  1.996           1.923         1.04
CartToPolar::CartToPolarFixture::(1920x1080, 64FC1)  5.721           5.509         1.04
phase32f::VectorLength::128                          0.000           0.000         0.98
phase32f::VectorLength::1000                         0.001           0.001         0.97
phase32f::VectorLength::131072                       0.105           0.111         0.95
phase32f::VectorLength::524288                       0.402           0.402         1.00
phase32f::VectorLength::1048576                      0.775           0.767         1.01
phase64f::VectorLength::128                          0.000           0.000         1.00
phase64f::VectorLength::1000                         0.001           0.001         1.01
phase64f::VectorLength::131072                       0.163           0.162         1.01
phase64f::VectorLength::524288                       0.669           0.653         1.02
phase64f::VectorLength::1048576                      1.660           1.634         1.02
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-13 15:56:56 +03:00
Alexander Smorkalov	b129abfdaa	Merge pull request #27055 from hanliutong:UI-loop-condition Fix some vectorized loop conditions.	2025-03-13 14:12:30 +03:00
Liutong HAN	fd62bd0991	Relax the loop condition to process the final batch.	2025-03-13 07:54:41 +00:00
GenshinImpactStarts	e30697fd42	Merge pull request #27002 from GenshinImpactStarts:magnitude [HAL RVV] impl magnitude | add perf test #27002
Implement through the existing `cv_hal_magnitude32f` and `cv_hal_magnitude64f` interfaces.
UPDATE: UI is enabled. The only difference between UI and HAL now is that HAL uses an approximate `sqrt`.
Perf test done on MUSE-PI.
```sh
$ opencv_test_core --gtest_filter="Magnitude"
$ opencv_perf_core --gtest_filter="Magnitude" --perf_min_samples=300 --perf_force_samples=300
```
Test result between enabled UI and HAL:
```
Name of Test                                     ui       rvv      rvv vs ui (x-factor)
Magnitude::MagnitudeFixture::(127x61, 32FC1)     0.029    0.016    1.75
Magnitude::MagnitudeFixture::(127x61, 64FC1)     0.057    0.036    1.57
Magnitude::MagnitudeFixture::(640x480, 32FC1)    1.063    0.648    1.64
Magnitude::MagnitudeFixture::(640x480, 64FC1)    2.261    1.530    1.48
Magnitude::MagnitudeFixture::(1280x720, 32FC1)   3.261    2.118    1.54
Magnitude::MagnitudeFixture::(1280x720, 64FC1)   6.802    4.682    1.45
Magnitude::MagnitudeFixture::(1920x1080, 32FC1)  7.287    4.738    1.54
Magnitude::MagnitudeFixture::(1920x1080, 64FC1)  15.226   10.334   1.47
```
Test result before and after enabling UI:
```
Name of Test                                     orig     pr       pr vs orig (x-factor)
Magnitude::MagnitudeFixture::(127x61, 32FC1)     0.032    0.029    1.11
Magnitude::MagnitudeFixture::(127x61, 64FC1)     0.067    0.057    1.17
Magnitude::MagnitudeFixture::(640x480, 32FC1)    1.228    1.063    1.16
Magnitude::MagnitudeFixture::(640x480, 64FC1)    2.786    2.261    1.23
Magnitude::MagnitudeFixture::(1280x720, 32FC1)   3.762    3.261    1.15
Magnitude::MagnitudeFixture::(1280x720, 64FC1)   8.549    6.802    1.26
Magnitude::MagnitudeFixture::(1920x1080, 32FC1)  8.408    7.287    1.15
Magnitude::MagnitudeFixture::(1920x1080, 64FC1)  18.884   15.226   1.24
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-13 08:34:11 +03:00
Alexander Smorkalov	49ab8121b7	Merge pull request #27050 from hanliutong:rvv-fix-27003 RISC-V: Fix #27003.	2025-03-12 17:32:19 +03:00
Yuantao Feng	eefa327f30	Merge pull request #27042 from fengyuentau:4x/core/normDiff_simd core: vectorize normDiff with universal intrinsics #27042 Merge with https://github.com/opencv/opencv_extra/pull/1242. Performance results on Desktop Intel i7-12700K, Apple M2, Jetson Orin and SpaceMIT K1: [perf-normDiff.zip](https://github.com/user-attachments/files/19178689/perf-normDiff.zip) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-12 16:43:10 +03:00
Liutong HAN	2969b67bd7	Fix 27003.	2025-03-12 12:15:05 +00:00
Suleyman TURKMEN	656038346b	Merge pull request #26441 from sturkmen72:upd_tutorials Update tutorials #26441 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-11 16:17:21 +03:00
GenshinImpactStarts	0fed1fa184	fix exp, log \| enable ui for log \| strengthen test Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>	2025-03-07 17:11:26 +00:00
GenshinImpactStarts	524d8ae01c	impl exp and log \| add log perf test Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>	2025-03-07 17:11:26 +00:00
天音あめ	bb525fe91d	Merge pull request #26865 from amane-ame:dxt_hal_rvv Add RISC-V HAL implementation for cv::dft and cv::dct #26865 This patch implements `static cv::DFT` function in RVV_HAL using native intrinsic, optimizing the performance for `cv::dft` and `cv::dct` with data types `32FC1/64FC1/32FC2/64FC2`. The reason I chose to create a new `cv_hal_dftOcv` interface is that if I were to use the existing interfaces (`cv_hal_dftInit1D` and `cv_hal_dft1D`), it would require handling and parsing the dft flags within HAL, as well as performing preprocessing operations such as handling unit roots. Since these operations are not performance hotspots and do not require optimization, reusing the existing interfaces would result in copying approximately 300 lines of code from `core/src/dxt.cpp` into HAL, which I believe is unnecessary. Moreover, if I insert the new interface into `static cv::DFT`, both `static cv::RealDFT` and `static cv::DCT` can be optimized as well. The processing performed before and after calling `static cv::DFT` in these functions is also not a performance hotspot. Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0. ``` $ opencv_test_core --gtest_filter="DFT" $ opencv_perf_core --gtest_filter="dft:dct" --perf_min_samples=30 --perf_force_samples=30 ``` The head of the perf table is shown below since the table is too long. View the full perf table here: [hal_rvv_dxt.pdf](https://github.com/user-attachments/files/18622645/hal_rvv_dxt.pdf) <img width="1017" alt="Untitled" src="https://github.com/user-attachments/assets/609856e7-9c7d-4a95-9923-45c1b77eb3a2" /> ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. 
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-07 11:08:41 +03:00
GenshinImpactStarts	57a78cb9df	Merge pull request #26941 from GenshinImpactStarts:lut_hal_rvv Impl hal_rvv LUT \| Add more LUT test #26941 Implement through the existing `cv_hal_lut` interfaces. Add more LUT accuracy and performance tests: - Accuracy test: Multi-channel table tests are added, and the boundary of `randu` used for generating test data is broadened to make the test more robust. - Performance test: Multi-channel input and multi-channel table tests are added. Perf test done on - MUSE-PI (vlen=256) - Compiler: gcc 14.2 (riscv-collab/riscv-gnu-toolchain Nightly: December 16, 2024) ```sh $ opencv_test_core --gtest_filter="Core_LUT" $ opencv_perf_core --gtest_filter="SizePrm_LUT" --perf_min_samples=300 --perf_force_samples=300 ``` ```sh Geometric mean (ms) Name of Test scalar ui rvv ui rvv vs vs scalar scalar (x-factor) (x-factor) LUT::SizePrm::320x240 0.248 0.249 0.052 1.00 4.74 LUT::SizePrm::640x480 0.277 0.275 0.085 1.01 3.28 LUT::SizePrm::1920x1080 0.950 0.947 0.634 1.00 1.50 LUT_multi2::SizePrm::320x240 2.051 2.045 2.049 1.00 1.00 LUT_multi2::SizePrm::640x480 2.128 2.134 2.125 1.00 1.00 LUT_multi2::SizePrm::1920x1080 7.397 7.380 7.390 1.00 1.00 LUT_multi::SizePrm::320x240 0.715 0.747 0.154 0.96 4.64 LUT_multi::SizePrm::640x480 0.741 0.766 0.257 0.97 2.88 LUT_multi::SizePrm::1920x1080 2.766 2.765 1.925 1.00 1.44 ``` This optimization is achieved by loading the entire lookup table into vector registers. Due to register size limitations, the optimization is only effective under the following conditions: - For the U8C1 table type, the optimization works when `vlen >= 256` - For U16C1, it works when `vlen >= 512` - For U32C1, it works when `vlen >= 1024` Since I don’t have real hardware with `vlen > 256`, the corresponding accuracy tests were conducted on QEMU built from the `riscv-collab/riscv-gnu-toolchain`. This patch does not implement optimizations for multi-channel tables. Previous attempts: 1. 
For the U8C1 table type, when `vlen = 128`, it is possible to use four `u8m4` vectors to load the entire table, perform gathering, and merge the results. However, the performance is almost the same as the scalar version. 2. Loading part of the table and repeatedly loading the source data is faster for small sizes. But as the table size grows, the performance quickly degrades compared to the scalar version. 3. Using `vluxei8` as a general solution does not show any performance improvement. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-06 11:17:00 +03:00
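The `vlen` thresholds quoted in the LUT commit above (256, 512, 1024) are consistent with a simple capacity check, sketched below. It assumes the table has 256 entries and must fit into one LMUL=8 register group, the largest grouping RVV offers. The commit only says "register size limitations", so the LMUL=8 reasoning and the helper names `min_vlen_for_full_table` and `table_fits` are illustrative assumptions, not code from the patch.

```cpp
#include <cstddef>

// Assumption: a LUT table has 256 entries (it is indexed by an 8-bit value).
constexpr std::size_t kTableEntries = 256;
// Assumption: the table must fit in one LMUL=8 register group (the RVV maximum).
constexpr std::size_t kMaxLmul = 8;

// Smallest VLEN (in bits) at which a single-channel table whose elements are
// `elem_bytes` bytes wide fits entirely into one LMUL=8 register group.
// Hypothetical helper, not part of OpenCV.
constexpr std::size_t min_vlen_for_full_table(std::size_t elem_bytes) {
    return kTableEntries * elem_bytes * 8 / kMaxLmul;
}

// True when the whole table can stay resident in vector registers.
constexpr bool table_fits(std::size_t vlen_bits, std::size_t elem_bytes) {
    return vlen_bits >= min_vlen_for_full_table(elem_bytes);
}

// These match the thresholds listed in the commit message.
static_assert(min_vlen_for_full_table(1) == 256, "U8C1 needs vlen >= 256");
static_assert(min_vlen_for_full_table(2) == 512, "U16C1 needs vlen >= 512");
static_assert(min_vlen_for_full_table(4) == 1024, "U32C1 needs vlen >= 1024");
// At vlen = 128 the U8C1 table does not fit in one group.
static_assert(!table_fits(128, 1), "vlen=128 is too small for U8C1");
```

Under the same assumption, the `vlen = 128` attempt in the commit adds up: four `u8m4` groups hold 4 × 4 × 128 bits = 256 bytes, exactly one U8C1 table.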
Liutong HAN	97abffbdac	Merge pull request #27006 from hanliutong:rvv-fix-ui-1024 Fix issues in RISC-V Vector (RVV) Universal Intrinsic #27006 This PR aims to make `opencv_test_core` pass on RVV, via the following two parts: 1. Fix bug in Universal Intrinsic when VLEN >= 512: - `max_nlanes` should be multiplied by 2, because we use LMUL=2 in RVV Universal Intrinsic since #26318. - Related tests are also expanded to match longer registers - Relax the precision threshold of `v_erf` to make the tests pass 2. Temporary fix #26936 - Disable 3 Universal Intrinsic code blocks on GCC - This is just a temporary fix until we figure out if it's our issue or GCC/something else's This patch is tested under the following conditions: - Compiler: GCC 14.2, Clang 19.1.7 - Device: Muse-Pi (VLEN=256), QEMU (VLEN=512, 1024) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-04 16:49:59 +03:00
Alexander Smorkalov	1aa69292b0	Backported some CALL_HAL improvements from 5.x #26946	2025-03-03 16:22:48 +03:00


5614 Commits