[HAL RVV] impl magnitude | add perf test #27002
Implement through the existing `cv_hal_magnitude32f` and `cv_hal_magnitude64f` interfaces.
**UPDATE**: UI is enabled. The only difference between UI and HAL now is HAL use a approximate `sqrt`.
Perf test done on MUSE-PI.
```sh
$ opencv_test_core --gtest_filter="*Magnitude*"
$ opencv_perf_core --gtest_filter="*Magnitude*" --perf_min_samples=300 --perf_force_samples=300
```
Test result between enabled UI and HAL:
```
Name of Test ui rvv rvv
vs
ui
(x-factor)
Magnitude::MagnitudeFixture::(127x61, 32FC1) 0.029 0.016 1.75
Magnitude::MagnitudeFixture::(127x61, 64FC1) 0.057 0.036 1.57
Magnitude::MagnitudeFixture::(640x480, 32FC1) 1.063 0.648 1.64
Magnitude::MagnitudeFixture::(640x480, 64FC1) 2.261 1.530 1.48
Magnitude::MagnitudeFixture::(1280x720, 32FC1) 3.261 2.118 1.54
Magnitude::MagnitudeFixture::(1280x720, 64FC1) 6.802 4.682 1.45
Magnitude::MagnitudeFixture::(1920x1080, 32FC1) 7.287 4.738 1.54
Magnitude::MagnitudeFixture::(1920x1080, 64FC1) 15.226 10.334 1.47
```
Test result before and after enabling UI:
```
Name of Test orig pr pr
vs
orig
(x-factor)
Magnitude::MagnitudeFixture::(127x61, 32FC1) 0.032 0.029 1.11
Magnitude::MagnitudeFixture::(127x61, 64FC1) 0.067 0.057 1.17
Magnitude::MagnitudeFixture::(640x480, 32FC1) 1.228 1.063 1.16
Magnitude::MagnitudeFixture::(640x480, 64FC1) 2.786 2.261 1.23
Magnitude::MagnitudeFixture::(1280x720, 32FC1) 3.762 3.261 1.15
Magnitude::MagnitudeFixture::(1280x720, 64FC1) 8.549 6.802 1.26
Magnitude::MagnitudeFixture::(1920x1080, 32FC1) 8.408 7.287 1.15
Magnitude::MagnitudeFixture::(1920x1080, 64FC1) 18.884 15.226 1.24
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Make sure there are enough channels to check for opacity #27040
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Test for in-memory animation encoding and decoding #27013
Tests for https://github.com/opencv/opencv/pull/26964
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Find contours speedup #26834
It is an attempt, as suggested by #26775, to restore lost speed when migrating `findContours()` implementation from C to C++
The patch adds an "Arena" (a pool) of pre-allocated memory so that contours points (and TreeNodes) can be picked from the Arena.
The code of `findContours()` is mostly unchanged, the arena usage being implicit through a utility class Arena::Item that provides C++ overloaded operators and construct/destruct logic.
As mentioned in #26775, the contour points are allocated and released in order, and can be represented by ranges of indices in their arena. No range subset will be released and drill a hole, that's why the internal representation as a range of indices makes sense.
The TreeNodes use another Arena class that does not comply to that range logic.
Currently, there is a significant improvement of the run-time on the test mentioned in #26775, but it is still far from the `findContours_legacy()` performance.
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [X] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Added optional mask to cv::threshold #26842
Proposal for #26777
To avoid code duplication, and keep performance when no mask is used, inner implementation always propagate the const cv::Mat& mask, but they use a template<bool useMask> parameter that let the compiler optimize out unnecessary tests when the mask is not to be used.
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [X] There is a reference to the original bug report and related work
- [X] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
core: vectorize normDiff with universal intrinsics #27042
Merge with https://github.com/opencv/opencv_extra/pull/1242.
Performance results on Desktop Intel i7-12700K, Apple M2, Jetson Orin and SpaceMIT K1:
[perf-normDiff.zip](https://github.com/user-attachments/files/19178689/perf-normDiff.zip)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Fix Aruco marker incorrect detection near image edge #26968
### Pull Request Readiness Checklist
Fix#26922
As I understood the algorithm, at the first stage we search for the contours of the marker several times (adaptive threshold with different windows sizes). Therefore, for the same marker, we get several contours (inner and outer with different sizes due to the different windows sizes). In the second stage, we group the contours for the same marker into one group, from which we take the largest contour as the best candidate (which should best match the border of the marker).
The problem is that using the `minDistanceToBorder` parameter, we discard contours at the first stage. Thus, we discard the best candidates most appropriate to the marker border, and inner contours may remain, representing a significantly smaller marker border (which we observe in the issue).
But if we use the `minDistanceToBorder` parameter to discard the best candidate of the group at the second stage, then there will be no such problems and we will completely discard markers located too close to the border of the image.
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
[HAL RVV] impl sqrt and invSqrt #27015
Implement through the existing interfaces `cv_hal_sqrt32f`, `cv_hal_sqrt64f`, `cv_hal_invSqrt32f`, `cv_hal_invSqrt64f`.
Perf test done on MUSE-PI and CanMV K230. Because the performance of scalar is much worse than universal intrinsic, only ui and hal rvv is compared.
In RVV's UI, `invSqrt` is computed using `1 / sqrt()`. This patch first uses `frsqrt` and then applies the Newton-Raphson method to achieve higher precision. For the initial value, I tried using the famous [fast inverse square root algorithm](https://en.wikipedia.org/wiki/Fast_inverse_square_root), which involves one bit shift and one subtraction. However, on both MUSE-PI and CanMV K230, the performance was slightly lower (about 3%), so I chose to use `frsqrt` for the initial value instead.
BTW, I think this patch can directly replace RVV's UI.
**UPDATE**: Due to strange vector registers allocation strategy in clang, for `invSqrt`, clang use LMUL m4 while gcc use LMUL m8, which leads to some performance loss in clang. So the test for clang is appended.
```sh
$ opencv_test_core --gtest_filter="Core_HAL/mathfuncs.*"
$ opencv_perf_core --gtest_filter="SqrtFixture.*" --perf_min_samples=300 --perf_force_samples=300
```
CanMV K230:
```
Name of Test ui rvv rvv
vs
ui
(x-factor)
Sqrt::SqrtFixture::(127x61, 5, false) 0.052 0.027 1.96
Sqrt::SqrtFixture::(127x61, 5, true) 0.101 0.026 3.80
Sqrt::SqrtFixture::(127x61, 6, false) 0.106 0.059 1.79
Sqrt::SqrtFixture::(127x61, 6, true) 0.207 0.058 3.55
Sqrt::SqrtFixture::(640x480, 5, false) 1.988 0.956 2.08
Sqrt::SqrtFixture::(640x480, 5, true) 3.920 0.948 4.13
Sqrt::SqrtFixture::(640x480, 6, false) 4.179 2.342 1.78
Sqrt::SqrtFixture::(640x480, 6, true) 8.220 2.290 3.59
Sqrt::SqrtFixture::(1280x720, 5, false) 5.969 2.881 2.07
Sqrt::SqrtFixture::(1280x720, 5, true) 11.731 2.857 4.11
Sqrt::SqrtFixture::(1280x720, 6, false) 12.533 7.031 1.78
Sqrt::SqrtFixture::(1280x720, 6, true) 24.643 6.917 3.56
Sqrt::SqrtFixture::(1920x1080, 5, false) 13.423 6.483 2.07
Sqrt::SqrtFixture::(1920x1080, 5, true) 26.379 6.436 4.10
Sqrt::SqrtFixture::(1920x1080, 6, false) 28.200 15.833 1.78
Sqrt::SqrtFixture::(1920x1080, 6, true) 55.434 15.565 3.56
```
MUSE-PI:
```
GCC | clang
Name of Test ui rvv rvv | ui rvv rvv
vs | vs
ui | ui
(x-factor) | (x-factor)
Sqrt::SqrtFixture::(127x61, 5, false) 0.027 0.018 1.46 | 0.027 0.016 1.65
Sqrt::SqrtFixture::(127x61, 5, true) 0.050 0.017 2.98 | 0.050 0.017 2.99
Sqrt::SqrtFixture::(127x61, 6, false) 0.053 0.031 1.72 | 0.052 0.032 1.64
Sqrt::SqrtFixture::(127x61, 6, true) 0.100 0.030 3.31 | 0.101 0.035 2.86
Sqrt::SqrtFixture::(640x480, 5, false) 0.955 0.483 1.98 | 0.959 0.499 1.92
Sqrt::SqrtFixture::(640x480, 5, true) 1.873 0.489 3.83 | 1.873 0.520 3.60
Sqrt::SqrtFixture::(640x480, 6, false) 2.027 1.163 1.74 | 2.037 1.218 1.67
Sqrt::SqrtFixture::(640x480, 6, true) 3.961 1.153 3.44 | 3.961 1.341 2.95
Sqrt::SqrtFixture::(1280x720, 5, false) 2.916 1.538 1.90 | 2.912 1.598 1.82
Sqrt::SqrtFixture::(1280x720, 5, true) 5.735 1.534 3.74 | 5.726 1.661 3.45
Sqrt::SqrtFixture::(1280x720, 6, false) 6.121 3.585 1.71 | 6.109 3.725 1.64
Sqrt::SqrtFixture::(1280x720, 6, true) 12.059 3.501 3.44 | 12.053 4.080 2.95
Sqrt::SqrtFixture::(1920x1080, 5, false) 6.540 3.535 1.85 | 6.540 3.643 1.80
Sqrt::SqrtFixture::(1920x1080, 5, true) 12.943 3.445 3.76 | 12.908 3.706 3.48
Sqrt::SqrtFixture::(1920x1080, 6, false) 13.714 8.062 1.70 | 13.711 8.376 1.64
Sqrt::SqrtFixture::(1920x1080, 6, true) 27.011 7.989 3.38 | 27.115 9.245 2.93
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Update tutorials #26441
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Threshold otsu doc update #27039
PR for #27038
(I had already done that, but encounters git madness after branch renaming)
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [X] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
[Refactor](HAL RVV): Consolidate Helpers for Code Reusability #26977
This PR introduces a new helper file with utility types and templates to standardize function interfaces. This refactor allows us to avoid duplicate code when types differ but logic remains the same.
The `flip` and `minmax` implementations have been updated to use the new generic helpers, replacing the previously defined, redundant classes.
Due to the large number of functions, not all interfaces are unified yet. Future development can extend the types as needed. While the usage of function templates is currently limited, this will ease future development.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Fix assert failure in Sobel test when enable FastCV #27033
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Add RISC-V HAL implementation for cv::cvtColor #27007
This patch implements the following functions in RVV_HAL using native intrinsics, optimizing the performance of `cv::cvtColor` for all possible data types and modes (except for `COLOR_Bayer`, `COLOR_YUV2GRAY_420` and `COLOR_mRGBA`, as these modes have no HAL interface):
```
cv_hal_cvtBGRtoBGR
cv_hal_cvtBGRtoBGR5x5
cv_hal_cvtBGR5x5toBGR
cv_hal_cvtBGRtoGray
cv_hal_cvtGraytoBGR
cv_hal_cvtBGR5x5toGray
cv_hal_cvtGraytoBGR5x5
cv_hal_cvtBGRtoYUV
cv_hal_cvtYUVtoBGR
cv_hal_cvtBGRtoXYZ
cv_hal_cvtXYZtoBGR
cv_hal_cvtBGRtoHSV
cv_hal_cvtHSVtoBGR
cv_hal_cvtBGRtoLab
cv_hal_cvtLabtoBGR
cv_hal_cvtTwoPlaneYUVtoBGR
cv_hal_cvtBGRtoTwoPlaneYUV
cv_hal_cvtThreePlaneYUVtoBGR
cv_hal_cvtBGRtoThreePlaneYUV
cv_hal_cvtOnePlaneYUVtoBGR
cv_hal_cvtOnePlaneBGRtoYUV
```
Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.
```
$ ./opencv_test_imgproc --gtest_filter="*Color*-*Bayer*"
$ ./opencv_perf_imgproc --gtest_filter="*Color*-*Bayer*" --gtest_also_run_disabled_tests --perf_min_samples=100 --perf_force_samples=100
```
View the full perf table here: [hal_rvv_color.pdf](https://github.com/user-attachments/files/19055417/hal_rvv_color.pdf)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Add RISC-V HAL implementation for cv::solve #26892
This patch implements `cv_hal_LU/cv_hal_Cholesky/cv_hal_SVD/cv_hal_QR` function in RVV_HAL using native intrinsics, optimizing the performance for `cv::solve` with method `DECOMP_LU/DECOMP_SVD/DECOMP_CHOLESKY/DECOMP_QR` and data types `32FC1/64FC1`.
Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.
```
$ ./opencv_test_core --gtest_filter="*Solve*:*SVD*:*Cholesky*"
$ ./opencv_perf_core --gtest_filter="*SolveTest*" --perf_min_samples=100 --perf_force_samples=100
```
The tail of the perf table is shown below since the table is too long.
View the full perf table here: [hal_rvv_solve.pdf](https://github.com/user-attachments/files/18725067/hal_rvv_solve.pdf)
<img width="1078" alt="Untitled" src="https://github.com/user-attachments/assets/c01d849c-f000-4bcc-bfe0-a302d6605d9e" />
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Add RISC-V HAL implementation for cv::dft and cv::dct #26865
This patch implements `static cv::DFT` function in RVV_HAL using native intrinsic, optimizing the performance for `cv::dft` and `cv::dct` with data types `32FC1/64FC1/32FC2/64FC2`.
The reason I chose to create a new `cv_hal_dftOcv` interface is that if I were to use the existing interfaces (`cv_hal_dftInit1D` and `cv_hal_dft1D`), it would require handling and parsing the dft flags within HAL, as well as performing preprocessing operations such as handling unit roots. Since these operations are not performance hotspots and do not require optimization, reusing the existing interfaces would result in copying approximately 300 lines of code from `core/src/dxt.cpp` into HAL, which I believe is unnecessary.
Moreover, if I insert the new interface into `static cv::DFT`, both `static cv::RealDFT` and `static cv::DCT` can be optimized as well. The processing performed before and after calling `static cv::DFT` in these functions is also not a performance hotspot.
Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.
```
$ opencv_test_core --gtest_filter="*DFT*"
$ opencv_perf_core --gtest_filter="*dft*:*dct*" --perf_min_samples=30 --perf_force_samples=30
```
The head of the perf table is shown below since the table is too long.
View the full perf table here: [hal_rvv_dxt.pdf](https://github.com/user-attachments/files/18622645/hal_rvv_dxt.pdf)
<img width="1017" alt="Untitled" src="https://github.com/user-attachments/assets/609856e7-9c7d-4a95-9923-45c1b77eb3a2" />
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake