Extract all HALs from 3rdparty to dedicated folder. #27252
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
FastCV gemm hal #27184
FastCV hal for gemm 32f
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
HAL: implemented cv_hal_transpose in hal_rvv #27229
Checklists:
- [x] transpose2d_8u
- [x] transpose2d_16u
- [ ] ~transpose2d_8uC3~
- [x] transpose2d_32s
- [ ] ~transpose2d_16uC3~
- [x] transpose2d_32sC2
- [ ] ~transpose_32sC3~
- [ ] ~transpose_32sC4~
- [ ] ~transpose_32sC6~
- [ ] ~transpose_32sC8~
- [ ] ~inplace transpose~
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
HAL: implemented cv_hal_dotProduct in hal_rvv #27201
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Optimize gaussian blur performance in FastCV HAL #27217
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Parallel_for in box Filter and support for 32f box filter in Fastcv hal #27182
Added parallel_for in box filter hal and support for 32f box filter
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Adding latest FastCV static libs
updated libs PR: [opencv/opencv_3rdparty/pull/94](https://github.com/opencv/opencv_3rdparty/pull/94)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
HAL: implemented cv_hal_div* and cv_hal_recip* in hal_rvv #27175
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
HAL: added copyToMask and implemented in hal_rvv #27162
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Add RISC-V HAL implementation for cv::resize #27160
This patch implements `cv_hal_resize` using native intrinsics, optimizing the performance of `cv::resize` for `CV_INTER_NEAREST/CV_INTER_NEAREST_EXACT/CV_INTER_LINEAR/CV_INTER_LINEAR_EXACT/CV_INTER_AREA` modes.
Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.1.
```
$ ./opencv_test_imgproc --gtest_filter="*Resize*:*resize*"
$ ./opencv_perf_imgproc --gtest_filter="*Resize*:*resize*" --perf_min_samples=300 --perf_force_samples=300
```
View the full perf table here: [hal_rvv_resize.pdf](https://github.com/user-attachments/files/19480756/hal_rvv_resize.pdf)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Add RISC-V HAL implementation for cv::warp series #27119
This patch implements `cv_hal_remap`, `cv_hal_warpAffine` and `cv_hal_warpPerspective` using native intrinsics, optimizing the performance of `cv::remap/cv::warpAffine/cv::warpPerspective` for `CV_HAL_INTER_NEAREST/CV_HAL_INTER_LINEAR/CV_HAL_INTER_CUBIC/CV_HAL_INTER_LANCZOS4` modes.
Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.
```
$ ./opencv_test_imgproc --gtest_filter="*Remap*:*Warp*"
$ ./opencv_perf_imgproc --gtest_filter="*Remap*:*remap*:*Warp*" --perf_min_samples=200 --perf_force_samples=200
```
View the full perf table here: [hal_rvv_warp.pdf](https://github.com/user-attachments/files/19403718/hal_rvv_warp.pdf)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
core: refactored normDiff in hal_rvv and extended with support of more data types #27115
Merge with https://github.com/opencv/opencv_extra/pull/1246.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Move IPP norm and normDiff to HAL #27128
Continues https://github.com/opencv/opencv/pull/26880
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Initial version of IPP-based HAL for x86 and x86_64 platforms #26880
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Fix RISC-V HAL solve/SVD and BGRtoLab #27046
Closes #27044.
Also suppressed some warnings in other HAL.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Add RISC-V HAL implementation for cv::blur series #27097
This patch implements `cv_hal_gaussianBlurBinomial`, `cv_hal_medianBlur`, `cv_hal_boxFilter` and `cv_hal_bilateralFilter` using native intrinsics, optimizing the performance of `cv::GaussianBlur/cv::medianBlur/cv::boxFilter/cv::bilateralFilter` for `3x3/5x5` kernels.
Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.
```
$ ./opencv_test_imgproc --gtest_filter="*Filter*:*Blur*"
$ ./opencv_perf_imgproc --gtest_filter="*gauss*:*box*:*Bilateral*:*median*" --perf_min_samples=2000 --perf_force_samples=2000
```
View the full perf table here: [hal_rvv_blur.pdf](https://github.com/user-attachments/files/19335582/hal_rvv_blur.pdf)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Add RISC-V HAL implementation for cv::moments #27096
This patch implements `cv_hal_imageMoments` using native intrinsics, optimizing the performance of `cv::moments` for data types `CV_16U/CV_16S/CV_32F/CV_64F`.
Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.
```
$ ./opencv_test_imgproc --gtest_filter="*Moments*"
$ ./opencv_perf_imgproc --gtest_filter="*Moments*" --perf_min_samples=1000 --perf_force_samples=1000
```

### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
core: improve norm of hal rvv #26991
Merge with https://github.com/opencv/opencv_extra/pull/1241
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Add RISC-V HAL implementation for cv::threshold and cv::adaptiveThreshold #27072
This patch implements `cv_hal_threshold_otsu` and `cv_hal_adaptiveThreshold` using native intrinsics, optimizing the performance of `cv::threshold(THRESH_OTSU)` and `cv::adaptiveThreshold`.
Because the universal intrinsics (UI) path is already as fast as the HAL `cv_hal_rvv::threshold::threshold`, `cv_hal_threshold` is not redirected to it. This part of the HAL is kept because `cv_hal_threshold_otsu` depends on it.
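For readers unfamiliar with Otsu's method, here is a minimal scalar sketch of the threshold selection that `cv::threshold(THRESH_OTSU)` performs conceptually: build a 256-bin histogram, then pick the threshold that maximizes the between-class variance. The function name `otsuThresholdSketch` is hypothetical and this is not the RVV code from the patch. Tie-breaking between equally good thresholds may also differ from OpenCV.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not the HAL implementation): returns the 8-bit
// threshold t that maximizes the between-class variance of the two classes
// {pixels <= t} and {pixels > t}. Returns 0 if all pixels share one value.
int otsuThresholdSketch(const std::uint8_t* data, std::size_t n)
{
    assert(data != nullptr || n == 0);

    // 256-bin histogram of the input.
    std::array<std::size_t, 256> hist{};
    for (std::size_t i = 0; i < n; ++i)
        ++hist[data[i]];

    // Sum of all pixel values, used to derive the upper-class mean.
    double total = 0.0;
    for (int v = 0; v < 256; ++v)
        total += static_cast<double>(v) * hist[v];

    double sumLow = 0.0, best = -1.0;
    std::size_t countLow = 0;
    int bestT = 0;
    for (int t = 0; t < 256; ++t)
    {
        countLow += hist[t];
        sumLow += static_cast<double>(t) * hist[t];
        const std::size_t countHigh = n - countLow;
        if (countLow == 0 || countHigh == 0)
            continue; // one class is empty, variance is undefined here

        const double meanLow = sumLow / countLow;
        const double meanHigh = (total - sumLow) / countHigh;
        const double diff = meanLow - meanHigh;
        // Proportional to the between-class variance (constant factor dropped).
        const double var = static_cast<double>(countLow) * countHigh * diff * diff;
        if (var > best) { best = var; bestT = t; }
    }
    return bestT;
}
```

For a bimodal input such as three pixels of value 10 and three of value 200, any threshold between the two modes separates the classes equally well, and the sketch returns one of them.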
Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.
```
$ ./opencv_test_imgproc --gtest_filter="*thresh*:*Thresh*"
$ ./opencv_perf_imgproc --gtest_filter="*otsu*:*adaptiveThreshold*" --perf_min_samples=1000 --perf_force_samples=1000
```

### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
[HAL RVV] unify and impl polar_to_cart | add perf test #26999
### Summary
1. Implement through the existing `cv_hal_polarToCart32f` and `cv_hal_polarToCart64f` interfaces.
2. Add `polarToCart` performance tests
3. Make `cv::polarToCart` use CALL_HAL in the same way as `cv::cartToPolar`
4. To achieve the 3rd point, the original implementation was moved, and some modifications were made.
Tested through:
```sh
opencv_test_core --gtest_filter="*PolarToCart*:*Core_CartPolar_reverse*"
opencv_perf_core --gtest_filter="*PolarToCart*" --perf_min_samples=300 --perf_force_samples=300
```
### HAL performance test
***UPDATE***: The current implementation no longer depends on vlen.
**NOTE**: Due to the 4th point in the summary above, the `scalar` and `ui` tests are based on the modified code of this PR. The impact of this patch on `scalar` and `ui` is evaluated in the next section, `Effect of Point 4`.
Vlen 256 (Muse Pi):
```
Name of Test    scalar    ui    rvv    ui vs scalar (x-factor)    rvv vs scalar (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1) 0.315 0.110 0.034 2.85 9.34
PolarToCart::PolarToCartFixture::(127x61, 64FC1) 0.423 0.163 0.045 2.59 9.34
PolarToCart::PolarToCartFixture::(640x480, 32FC1) 13.695 4.325 1.278 3.17 10.71
PolarToCart::PolarToCartFixture::(640x480, 64FC1) 17.719 7.118 2.105 2.49 8.42
PolarToCart::PolarToCartFixture::(1280x720, 32FC1) 40.678 13.114 3.977 3.10 10.23
PolarToCart::PolarToCartFixture::(1280x720, 64FC1) 53.124 21.298 6.519 2.49 8.15
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 95.158 29.465 8.894 3.23 10.70
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 119.262 47.743 14.129 2.50 8.44
```
### Effect of Point 4
To make `cv::polarToCart` behave the same as `cv::cartToPolar`, the implementation detail of the former has been moved to the latter's location (from `mathfuncs.cpp` to `mathfuncs_core.simd.hpp`).
#### Reason for Changes:
This function works as follows:
$y = \text{mag} \times \sin(\text{angle})$ and $x = \text{mag} \times \cos(\text{angle})$. The original implementation first calculates the values of $\sin$ and $\cos$, storing the results in the output buffers $x$ and $y$, and then multiplies the result by $\text{mag}$.
However, when the function is used as an in-place operation (one of the output buffers is also an input buffer), the original implementation allocates an extra buffer to store the $\sin$ and $\cos$ values in case the $\text{mag}$ value gets overwritten. This extra buffer allocation prevents `cv::polarToCart` from functioning in the same way as `cv::cartToPolar`.
Therefore, the multiplication is now performed immediately without storing intermediate values. Since the original implementation also had AVX2 optimizations, I have applied the same optimizations to the AVX2 version of this implementation.
***UPDATE***: The UI path now uses `v_sincos` from #25892. The original implementation had AVX2 optimizations but was much slower than the current UI path, so it was removed; the AVX2 perf test is below. The scalar implementation is unchanged because it is faster than using the UI method.
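The aliasing issue described above can be illustrated with a minimal scalar sketch. It assumes angles in radians and uses `std::sin`/`std::cos` rather than any vectorized routine. The name `polarToCartSketch` is hypothetical and this is not the code from the PR. The point is only that reading the inputs into locals and multiplying immediately removes the need for a temporary buffer when an output aliases `mag`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not OpenCV's implementation): converts polar to
// Cartesian coordinates. `mag` may be nullptr (treated as 1) and may alias
// `x` or `y`, i.e. the call may be in-place.
void polarToCartSketch(const float* mag, const float* angle,
                       float* x, float* y, std::size_t n)
{
    assert(angle != nullptr && x != nullptr && y != nullptr);
    for (std::size_t i = 0; i < n; ++i)
    {
        // Read both inputs into locals before writing any output, so an
        // output buffer that aliases an input cannot clobber a value that
        // is still needed for the second product.
        const float m = mag ? mag[i] : 1.0f;
        const float a = angle[i];
        x[i] = m * std::cos(a);
        y[i] = m * std::sin(a);
    }
}
```

Calling it with the same buffer as `mag` and `x` overwrites the magnitudes with the x coordinates without an intermediate allocation.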
#### Test Result
The `scalar` and `ui` tests were run on Muse PI, and the AVX2 test was run on Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz.
`scalar` test:
```
Name of Test    orig    pr    pr vs orig (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1) 0.333 0.294 1.13
PolarToCart::PolarToCartFixture::(127x61, 64FC1) 0.385 0.403 0.96
PolarToCart::PolarToCartFixture::(640x480, 32FC1) 14.749 12.343 1.19
PolarToCart::PolarToCartFixture::(640x480, 64FC1) 19.419 16.743 1.16
PolarToCart::PolarToCartFixture::(1280x720, 32FC1) 44.155 37.822 1.17
PolarToCart::PolarToCartFixture::(1280x720, 64FC1) 62.108 50.358 1.23
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 99.011 85.769 1.15
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 127.740 112.874 1.13
```
`ui` test:
```
Name of Test    orig    pr    pr vs orig (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1) 0.306 0.110 2.77
PolarToCart::PolarToCartFixture::(127x61, 64FC1) 0.455 0.163 2.79
PolarToCart::PolarToCartFixture::(640x480, 32FC1) 13.381 4.325 3.09
PolarToCart::PolarToCartFixture::(640x480, 64FC1) 21.851 7.118 3.07
PolarToCart::PolarToCartFixture::(1280x720, 32FC1) 39.975 13.114 3.05
PolarToCart::PolarToCartFixture::(1280x720, 64FC1) 67.006 21.298 3.15
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 90.362 29.465 3.07
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 129.637 47.743 2.72
```
AVX2 test:
```
Name of Test    orig    pr    pr vs orig (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1) 0.019 0.009 2.11
PolarToCart::PolarToCartFixture::(127x61, 64FC1) 0.022 0.013 1.74
PolarToCart::PolarToCartFixture::(640x480, 32FC1) 0.788 0.355 2.22
PolarToCart::PolarToCartFixture::(640x480, 64FC1) 1.102 0.618 1.78
PolarToCart::PolarToCartFixture::(1280x720, 32FC1) 2.383 1.042 2.29
PolarToCart::PolarToCartFixture::(1280x720, 64FC1) 3.758 2.316 1.62
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 5.577 2.559 2.18
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 9.710 6.424 1.51
```
A slight performance loss occurs because the check for whether $mag$ is nullptr is performed for every calculation instead of once per batch. This allows the current `SinCos_32f` function to be reused.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
[RVV HAL] Add copyright and replace '#pragma once'. #27056
Add copyright notices to the RVV HAL, since other companies or teams may join the development and add their own copyright.
The `#pragma once` directives are also replaced.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
[HAL RVV] reuse atan | impl cart_to_polar | add perf test #27000
Implement through the existing `cv_hal_cartToPolar32f` and `cv_hal_cartToPolar64f` interfaces.
Add `cartToPolar` performance tests.
`cv_hal_rvv::fast_atan` is modified to make it more reusable because it is needed in `cartToPolar`.
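To show why an arctangent routine is needed here, the following is a minimal scalar sketch of the conversion, assuming angles in radians mapped into $[0, 2\pi)$. It uses `std::atan2` in place of any fast approximation. The name `cartToPolarSketch` is hypothetical and this is not the HAL code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not the HAL implementation): computes the magnitude
// and the angle in radians for each (x, y) pair. An optimized version would
// replace std::atan2 with a faster approximation.
void cartToPolarSketch(const float* x, const float* y,
                       float* mag, float* angle, std::size_t n)
{
    assert(x != nullptr && y != nullptr && mag != nullptr && angle != nullptr);
    const float twoPi = 6.28318530718f;
    for (std::size_t i = 0; i < n; ++i)
    {
        const float xi = x[i], yi = y[i];
        mag[i] = std::sqrt(xi * xi + yi * yi);
        float a = std::atan2(yi, xi); // result lies in [-pi, pi]
        if (a < 0.0f)
            a += twoPi;               // shift negative angles to the upper half
        angle[i] = a;
    }
}
```

Both outputs depend on the same inputs, which is why a reusable arctangent helper can serve both the phase computation and `cartToPolar`.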
**UPDATE**: UI is enabled. Because the RVV vector type can't be stored in a struct, the UI implementation of `v_atan_f32` is modified. Both `fastAtan` and `cartToPolar` are affected, so the test results for `atan` are also appended. I have tested the modified UI on RVV and AVX2, and no regressions appear.
Perf test done on MUSE-PI. AVX2 test done on Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz.
```sh
$ opencv_test_core --gtest_filter="*CartToPolar*:*Core_CartPolar_reverse*:*Phase*"
$ opencv_perf_core --gtest_filter="*CartToPolar*:*phase*" --perf_min_samples=300 --perf_force_samples=300
```
Test result between enabled UI and HAL:
```
Name of Test    ui    rvv    rvv vs ui (x-factor)
CartToPolar::CartToPolarFixture::(127x61, 32FC1) 0.106 0.059 1.80
CartToPolar::CartToPolarFixture::(127x61, 64FC1) 0.155 0.070 2.20
CartToPolar::CartToPolarFixture::(640x480, 32FC1) 4.188 2.317 1.81
CartToPolar::CartToPolarFixture::(640x480, 64FC1) 6.593 2.889 2.28
CartToPolar::CartToPolarFixture::(1280x720, 32FC1) 12.600 7.057 1.79
CartToPolar::CartToPolarFixture::(1280x720, 64FC1) 19.860 8.797 2.26
CartToPolar::CartToPolarFixture::(1920x1080, 32FC1) 28.295 15.809 1.79
CartToPolar::CartToPolarFixture::(1920x1080, 64FC1) 44.573 19.398 2.30
phase32f::VectorLength::128 0.002 0.002 1.20
phase32f::VectorLength::1000 0.008 0.006 1.32
phase32f::VectorLength::131072 1.061 0.731 1.45
phase32f::VectorLength::524288 3.997 2.976 1.34
phase32f::VectorLength::1048576 8.001 5.959 1.34
phase64f::VectorLength::128 0.002 0.002 1.33
phase64f::VectorLength::1000 0.012 0.008 1.58
phase64f::VectorLength::131072 1.648 0.931 1.77
phase64f::VectorLength::524288 6.836 3.837 1.78
phase64f::VectorLength::1048576 14.060 7.540 1.86
```
Test result before and after enabling UI on RVV:
```
Name of Test    perf ui orig    perf ui pr    perf ui pr vs perf ui orig (x-factor)
CartToPolar::CartToPolarFixture::(127x61, 32FC1) 0.141 0.106 1.33
CartToPolar::CartToPolarFixture::(127x61, 64FC1) 0.187 0.155 1.20
CartToPolar::CartToPolarFixture::(640x480, 32FC1) 5.990 4.188 1.43
CartToPolar::CartToPolarFixture::(640x480, 64FC1) 8.370 6.593 1.27
CartToPolar::CartToPolarFixture::(1280x720, 32FC1) 18.214 12.600 1.45
CartToPolar::CartToPolarFixture::(1280x720, 64FC1) 25.365 19.860 1.28
CartToPolar::CartToPolarFixture::(1920x1080, 32FC1) 40.437 28.295 1.43
CartToPolar::CartToPolarFixture::(1920x1080, 64FC1) 56.699 44.573 1.27
phase32f::VectorLength::128 0.003 0.002 1.54
phase32f::VectorLength::1000 0.016 0.008 1.90
phase32f::VectorLength::131072 2.048 1.061 1.93
phase32f::VectorLength::524288 8.219 3.997 2.06
phase32f::VectorLength::1048576 16.426 8.001 2.05
phase64f::VectorLength::128 0.003 0.002 1.44
phase64f::VectorLength::1000 0.020 0.012 1.60
phase64f::VectorLength::131072 2.621 1.648 1.59
phase64f::VectorLength::524288 10.780 6.836 1.58
phase64f::VectorLength::1048576 22.723 14.060 1.62
```
Test result before and after modifying UI on AVX2:
```
Name of Test    perf avx2 orig    perf avx2 pr    perf avx2 pr vs perf avx2 orig (x-factor)
CartToPolar::CartToPolarFixture::(127x61, 32FC1) 0.006 0.005 1.14
CartToPolar::CartToPolarFixture::(127x61, 64FC1) 0.010 0.009 1.08
CartToPolar::CartToPolarFixture::(640x480, 32FC1) 0.273 0.264 1.03
CartToPolar::CartToPolarFixture::(640x480, 64FC1) 0.511 0.487 1.05
CartToPolar::CartToPolarFixture::(1280x720, 32FC1) 0.760 0.723 1.05
CartToPolar::CartToPolarFixture::(1280x720, 64FC1) 2.009 1.937 1.04
CartToPolar::CartToPolarFixture::(1920x1080, 32FC1) 1.996 1.923 1.04
CartToPolar::CartToPolarFixture::(1920x1080, 64FC1) 5.721 5.509 1.04
phase32f::VectorLength::128 0.000 0.000 0.98
phase32f::VectorLength::1000 0.001 0.001 0.97
phase32f::VectorLength::131072 0.105 0.111 0.95
phase32f::VectorLength::524288 0.402 0.402 1.00
phase32f::VectorLength::1048576 0.775 0.767 1.01
phase64f::VectorLength::128 0.000 0.000 1.00
phase64f::VectorLength::1000 0.001 0.001 1.01
phase64f::VectorLength::131072 0.163 0.162 1.01
phase64f::VectorLength::524288 0.669 0.653 1.02
phase64f::VectorLength::1048576 1.660 1.634 1.02
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
[HAL RVV] impl magnitude | add perf test #27002
Implement through the existing `cv_hal_magnitude32f` and `cv_hal_magnitude64f` interfaces.
**UPDATE**: UI is enabled. The only difference between UI and HAL now is that HAL uses an approximate `sqrt`.
Perf test done on MUSE-PI.
```sh
$ opencv_test_core --gtest_filter="*Magnitude*"
$ opencv_perf_core --gtest_filter="*Magnitude*" --perf_min_samples=300 --perf_force_samples=300
```
Test result between enabled UI and HAL:
```
Name of Test    ui    rvv    rvv vs ui (x-factor)
Magnitude::MagnitudeFixture::(127x61, 32FC1) 0.029 0.016 1.75
Magnitude::MagnitudeFixture::(127x61, 64FC1) 0.057 0.036 1.57
Magnitude::MagnitudeFixture::(640x480, 32FC1) 1.063 0.648 1.64
Magnitude::MagnitudeFixture::(640x480, 64FC1) 2.261 1.530 1.48
Magnitude::MagnitudeFixture::(1280x720, 32FC1) 3.261 2.118 1.54
Magnitude::MagnitudeFixture::(1280x720, 64FC1) 6.802 4.682 1.45
Magnitude::MagnitudeFixture::(1920x1080, 32FC1) 7.287 4.738 1.54
Magnitude::MagnitudeFixture::(1920x1080, 64FC1) 15.226 10.334 1.47
```
Test result before and after enabling UI:
```
Name of Test    orig    pr    pr vs orig (x-factor)
Magnitude::MagnitudeFixture::(127x61, 32FC1) 0.032 0.029 1.11
Magnitude::MagnitudeFixture::(127x61, 64FC1) 0.067 0.057 1.17
Magnitude::MagnitudeFixture::(640x480, 32FC1) 1.228 1.063 1.16
Magnitude::MagnitudeFixture::(640x480, 64FC1) 2.786 2.261 1.23
Magnitude::MagnitudeFixture::(1280x720, 32FC1) 3.762 3.261 1.15
Magnitude::MagnitudeFixture::(1280x720, 64FC1) 8.549 6.802 1.26
Magnitude::MagnitudeFixture::(1920x1080, 32FC1) 8.408 7.287 1.15
Magnitude::MagnitudeFixture::(1920x1080, 64FC1) 18.884 15.226 1.24
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
[HAL RVV] impl sqrt and invSqrt #27015
Implement through the existing interfaces `cv_hal_sqrt32f`, `cv_hal_sqrt64f`, `cv_hal_invSqrt32f`, `cv_hal_invSqrt64f`.
The perf tests were run on MUSE-PI and CanMV K230. Because the scalar implementation is much slower than the universal intrinsics (UI) one, only UI and HAL RVV are compared.
In RVV's UI, `invSqrt` is computed using `1 / sqrt()`. This patch first uses `frsqrt` and then applies the Newton-Raphson method to achieve higher precision. For the initial value, I tried using the famous [fast inverse square root algorithm](https://en.wikipedia.org/wiki/Fast_inverse_square_root), which involves one bit shift and one subtraction. However, on both MUSE-PI and CanMV K230, the performance was slightly lower (about 3%), so I chose to use `frsqrt` for the initial value instead.
BTW, I think this patch can directly replace RVV's UI.
**UPDATE**: Due to clang's strange vector register allocation strategy, for `invSqrt` clang uses LMUL m4 while gcc uses LMUL m8, which leads to some performance loss with clang. The clang results are therefore included in the MUSE-PI table as well.
```sh
$ opencv_test_core --gtest_filter="Core_HAL/mathfuncs.*"
$ opencv_perf_core --gtest_filter="SqrtFixture.*" --perf_min_samples=300 --perf_force_samples=300
```
CanMV K230:
```
Name of Test ui rvv rvv vs ui (x-factor)
Sqrt::SqrtFixture::(127x61, 5, false) 0.052 0.027 1.96
Sqrt::SqrtFixture::(127x61, 5, true) 0.101 0.026 3.80
Sqrt::SqrtFixture::(127x61, 6, false) 0.106 0.059 1.79
Sqrt::SqrtFixture::(127x61, 6, true) 0.207 0.058 3.55
Sqrt::SqrtFixture::(640x480, 5, false) 1.988 0.956 2.08
Sqrt::SqrtFixture::(640x480, 5, true) 3.920 0.948 4.13
Sqrt::SqrtFixture::(640x480, 6, false) 4.179 2.342 1.78
Sqrt::SqrtFixture::(640x480, 6, true) 8.220 2.290 3.59
Sqrt::SqrtFixture::(1280x720, 5, false) 5.969 2.881 2.07
Sqrt::SqrtFixture::(1280x720, 5, true) 11.731 2.857 4.11
Sqrt::SqrtFixture::(1280x720, 6, false) 12.533 7.031 1.78
Sqrt::SqrtFixture::(1280x720, 6, true) 24.643 6.917 3.56
Sqrt::SqrtFixture::(1920x1080, 5, false) 13.423 6.483 2.07
Sqrt::SqrtFixture::(1920x1080, 5, true) 26.379 6.436 4.10
Sqrt::SqrtFixture::(1920x1080, 6, false) 28.200 15.833 1.78
Sqrt::SqrtFixture::(1920x1080, 6, true) 55.434 15.565 3.56
```
MUSE-PI:
```
GCC | clang
Name of Test ui rvv rvv vs ui (x-factor) | ui rvv rvv vs ui (x-factor)
Sqrt::SqrtFixture::(127x61, 5, false) 0.027 0.018 1.46 | 0.027 0.016 1.65
Sqrt::SqrtFixture::(127x61, 5, true) 0.050 0.017 2.98 | 0.050 0.017 2.99
Sqrt::SqrtFixture::(127x61, 6, false) 0.053 0.031 1.72 | 0.052 0.032 1.64
Sqrt::SqrtFixture::(127x61, 6, true) 0.100 0.030 3.31 | 0.101 0.035 2.86
Sqrt::SqrtFixture::(640x480, 5, false) 0.955 0.483 1.98 | 0.959 0.499 1.92
Sqrt::SqrtFixture::(640x480, 5, true) 1.873 0.489 3.83 | 1.873 0.520 3.60
Sqrt::SqrtFixture::(640x480, 6, false) 2.027 1.163 1.74 | 2.037 1.218 1.67
Sqrt::SqrtFixture::(640x480, 6, true) 3.961 1.153 3.44 | 3.961 1.341 2.95
Sqrt::SqrtFixture::(1280x720, 5, false) 2.916 1.538 1.90 | 2.912 1.598 1.82
Sqrt::SqrtFixture::(1280x720, 5, true) 5.735 1.534 3.74 | 5.726 1.661 3.45
Sqrt::SqrtFixture::(1280x720, 6, false) 6.121 3.585 1.71 | 6.109 3.725 1.64
Sqrt::SqrtFixture::(1280x720, 6, true) 12.059 3.501 3.44 | 12.053 4.080 2.95
Sqrt::SqrtFixture::(1920x1080, 5, false) 6.540 3.535 1.85 | 6.540 3.643 1.80
Sqrt::SqrtFixture::(1920x1080, 5, true) 12.943 3.445 3.76 | 12.908 3.706 3.48
Sqrt::SqrtFixture::(1920x1080, 6, false) 13.714 8.062 1.70 | 13.711 8.376 1.64
Sqrt::SqrtFixture::(1920x1080, 6, true) 27.011 7.989 3.38 | 27.115 9.245 2.93
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
[Refactor](HAL RVV): Consolidate Helpers for Code Reusability #26977
This PR introduces a new helper file with utility types and templates to standardize function interfaces. This refactor allows us to avoid duplicate code when types differ but logic remains the same.
The `flip` and `minmax` implementations have been updated to use the new generic helpers, replacing the previously defined, redundant classes.
Due to the large number of functions, not all interfaces are unified yet. Future development can extend the types as needed. While the usage of function templates is currently limited, this will ease future development.
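As a rough illustration of the pattern (not the helper file from this PR), the sketch below shows how wrapper types with a uniform static interface let one templated kernel serve several element types. All names here (`HelperU8`, `HelperF32`, `flipRow`) are hypothetical, and the bodies are plain scalar code so the example builds anywhere; in an RVV helper they would presumably wrap the type-specific intrinsics instead.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper types. Each exposes the same static interface,
// so a kernel can be written once against "Helper" instead of once
// per element type.
struct HelperU8
{
    using ElemType = std::uint8_t;
    static ElemType load(const ElemType* p) { return *p; }
    static void store(ElemType* p, ElemType v) { *p = v; }
};

struct HelperF32
{
    using ElemType = float;
    static ElemType load(const ElemType* p) { return *p; }
    static void store(ElemType* p, ElemType v) { *p = v; }
};

// Generic kernel: writes the row in reverse order (a horizontal flip).
// src and dst must not overlap; this sketch does not handle in-place use.
template <typename Helper>
void flipRow(const typename Helper::ElemType* src,
             typename Helper::ElemType* dst,
             std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        Helper::store(dst + i, Helper::load(src + (n - 1 - i)));
}
```

Supporting another element type then means adding one more helper struct, while `flipRow` itself stays unchanged.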
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake