Add RISC-V HAL implementation for cv::dft and cv::dct #26865
This patch implements the `static cv::DFT` function in RVV_HAL using native intrinsics, improving the performance of `cv::dft` and `cv::dct` for the data types `32FC1/64FC1/32FC2/64FC2`.
I chose to create a new `cv_hal_dftOcv` interface because reusing the existing interfaces (`cv_hal_dftInit1D` and `cv_hal_dft1D`) would require parsing the DFT flags within HAL and performing preprocessing such as handling the roots of unity. These operations are not performance hotspots and do not need optimization, so reusing the existing interfaces would mean copying approximately 300 lines of code from `core/src/dxt.cpp` into HAL, which I believe is unnecessary.
Moreover, if I insert the new interface into `static cv::DFT`, both `static cv::RealDFT` and `static cv::DCT` can be optimized as well. The processing performed before and after calling `static cv::DFT` in these functions is also not a performance hotspot.
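The hook-then-fallback pattern described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the patch: `hal_dft_hook`, `generic_dft` and `dft_dispatch` are hypothetical names, the hook's "odd lengths are declined" rule exists only to exercise the fallback, and the naive O(n^2) transform stands in for both the vectorized kernel and the existing generic path. Only the two return-code values mirror OpenCV's `CV_HAL_ERROR_OK` and `CV_HAL_ERROR_NOT_IMPLEMENTED`.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Values mirror OpenCV's CV_HAL_ERROR_OK and CV_HAL_ERROR_NOT_IMPLEMENTED.
constexpr int kHalOk = 0;
constexpr int kHalNotImplemented = 1;

// Stand-in for the existing generic code path: a naive O(n^2) DFT.
cvec generic_dft(const cvec& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    cvec y(n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t j = 0; j < n; ++j)
            y[k] += x[j] * std::polar(1.0, -2.0 * pi * double(k * j) / double(n));
    return y;
}

// Hypothetical HAL hook. It declines odd lengths purely to demonstrate
// the fallback; a real hook would run a vectorized kernel here.
int hal_dft_hook(const cvec& x, cvec& y) {
    if (x.size() % 2 != 0) return kHalNotImplemented;
    y = generic_dft(x);  // placeholder for the optimized implementation
    return kHalOk;
}

// Try the hook first and fall back to the generic path if it declines.
// Returns true when the hook handled the call.
bool dft_dispatch(const cvec& x, cvec& y) {
    if (hal_dft_hook(x, y) == kHalOk) return true;
    y = generic_dft(x);
    return false;
}
```

Because the hook sits at the innermost transform, callers that only do pre- and post-processing around it (as described for `static cv::RealDFT` and `static cv::DCT`) would benefit without changes of their own.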
Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.
```
$ opencv_test_core --gtest_filter="*DFT*"
$ opencv_perf_core --gtest_filter="*dft*:*dct*" --perf_min_samples=30 --perf_force_samples=30
```
The head of the perf table is shown below since the table is too long.
View the full perf table here: [hal_rvv_dxt.pdf](https://github.com/user-attachments/files/18622645/hal_rvv_dxt.pdf)
<img width="1017" alt="Untitled" src="https://github.com/user-attachments/assets/609856e7-9c7d-4a95-9923-45c1b77eb3a2" />
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Impl hal_rvv LUT | Add more LUT test #26941
Implement through the existing `cv_hal_lut` interfaces.
Add more LUT accuracy and performance tests:
- **Accuracy test**: Multi-channel table tests are added, and the boundary of `randu` used for generating test data is broadened to make the test more robust.
- **Performance test**: Multi-channel input and multi-channel table tests are added.
Perf test done on
- MUSE-PI (vlen=256)
- Compiler: gcc 14.2 (riscv-collab/riscv-gnu-toolchain Nightly: December 16, 2024)
```sh
$ opencv_test_core --gtest_filter="Core_LUT*"
$ opencv_perf_core --gtest_filter="SizePrm_LUT*" --perf_min_samples=300 --perf_force_samples=300
```
```sh
Geometric mean (ms)
Name of Test                     scalar     ui    rvv          ui         rvv
                                                               vs          vs
                                                           scalar      scalar
                                                       (x-factor)  (x-factor)
LUT::SizePrm::320x240             0.248  0.249  0.052        1.00        4.74
LUT::SizePrm::640x480             0.277  0.275  0.085        1.01        3.28
LUT::SizePrm::1920x1080           0.950  0.947  0.634        1.00        1.50
LUT_multi2::SizePrm::320x240      2.051  2.045  2.049        1.00        1.00
LUT_multi2::SizePrm::640x480      2.128  2.134  2.125        1.00        1.00
LUT_multi2::SizePrm::1920x1080    7.397  7.380  7.390        1.00        1.00
LUT_multi::SizePrm::320x240       0.715  0.747  0.154        0.96        4.64
LUT_multi::SizePrm::640x480       0.741  0.766  0.257        0.97        2.88
LUT_multi::SizePrm::1920x1080     2.766  2.765  1.925        1.00        1.44
```
This optimization is achieved by loading the entire lookup table into vector registers. Due to register size limitations, the optimization is only effective under the following conditions:
- For the U8C1 table type, the optimization works when `vlen >= 256`
- For U16C1, it works when `vlen >= 512`
- For U32C1, it works when `vlen >= 1024`
Since I don’t have real hardware with `vlen > 256`, the corresponding accuracy tests were conducted on QEMU built from the `riscv-collab/riscv-gnu-toolchain`.
This patch does not implement optimizations for multi-channel tables.
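The `vlen` thresholds listed above are consistent with simple capacity arithmetic. The sketch below assumes a 256-entry table held in a single register group at `LMUL=8`; that grouping is an assumption made for illustration and is not stated in the PR, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: the lookup table has 256 entries and must fit into one
// vector register group with LMUL=8.
constexpr std::size_t kTableEntries = 256;
constexpr std::size_t kAssumedLmul = 8;

// Size of the whole table in bytes for a given element size.
constexpr std::size_t table_bytes(std::size_t elem_bytes) {
    return kTableEntries * elem_bytes;
}

// Capacity in bytes of one register group for a given VLEN (in bits).
constexpr std::size_t group_bytes(std::size_t vlen_bits) {
    return vlen_bits / 8 * kAssumedLmul;
}

// True when the entire table fits into a single register group.
constexpr bool table_fits(std::size_t vlen_bits, std::size_t elem_bytes) {
    return group_bytes(vlen_bits) >= table_bytes(elem_bytes);
}

static_assert(table_fits(256, 1), "U8C1 table fits at vlen=256");
static_assert(!table_fits(128, 1), "U8C1 table does not fit at vlen=128");
static_assert(table_fits(512, 2), "U16C1 table fits at vlen=512");
static_assert(table_fits(1024, 4), "U32C1 table fits at vlen=1024");
```

Under the same assumption, each doubling of the table element size doubles the required `vlen`, which matches the three conditions in the list.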
Previous attempts:
1. For the U8C1 table type, when `vlen = 128`, it is possible to use four `u8m4` vectors to load the entire table, perform gathering, and merge the results. However, the performance is almost the same as the scalar version.
2. Loading part of the table and repeatedly loading the source data is faster for small sizes. But as the table size grows, the performance quickly degrades compared to the scalar version.
3. Using `vluxei8` as a general solution does not show any performance improvement.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
APNG encoding optimization #26849
related #26840
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Fix issues in RISC-V Vector (RVV) Universal Intrinsic #27006
This PR aims to make `opencv_test_core` pass on RVV, in the following two parts:
1. Fix bug in Universal Intrinsic when VLEN >= 512:
- `max_nlanes` should be multiplied by 2, because we use LMUL=2 in RVV Universal Intrinsic since #26318.
- Related tests are also expanded to match longer registers
- Relax the precision threshold of `v_erf` to make the tests pass
2. Temporary fix #26936
- Disable 3 Universal Intrinsic code blocks on GCC
- This is just a temporary fix until we figure out if it's our issue or GCC/something else's
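The `max_nlanes` adjustment in part 1 can be illustrated with a small capacity calculation. This is only a sketch: the function names and the idea of passing the maximum VLEN as a parameter are assumptions for illustration, not the definitions used in the Universal Intrinsic headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// LMUL=2 groups two vector registers into one logical vector.
constexpr std::size_t kLmul = 2;

// Lanes of type T in a single register of vlen_bits bits.
template <typename T>
constexpr std::size_t lanes_single(std::size_t vlen_bits) {
    return vlen_bits / 8 / sizeof(T);
}

// Lanes of type T in an LMUL=2 register group: twice the single-register count.
template <typename T>
constexpr std::size_t lanes_grouped(std::size_t vlen_bits) {
    return lanes_single<T>(vlen_bits) * kLmul;
}

// An upper bound on lanes must cover the grouped count at the largest
// supported VLEN (hypothetical parameter), hence the factor of 2.
template <typename T>
constexpr std::size_t max_nlanes_sketch(std::size_t max_vlen_bits) {
    return lanes_grouped<T>(max_vlen_bits);
}

static_assert(lanes_grouped<float>(256) == 16, "8 floats per register, 2 registers");
static_assert(max_nlanes_sketch<std::uint8_t>(1024) == 256, "128 bytes per register, 2 registers");
```

A buffer sized with `lanes_single` at the maximum VLEN would be half as large as what a grouped load or store can touch, which is the kind of mismatch the factor of 2 avoids.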
This patch is tested under the following conditions:
- Compiler: GCC 14.2, Clang 19.1.7
- Device: Muse-Pi (VLEN=256), QEMU (VLEN=512, 1024)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Add RISC-V HAL implementation for cv::pyrDown and cv::pyrUp #26958
This patch implements the `cv_hal_pyrdown` and `cv_hal_pyrup` functions in RVV_HAL using native intrinsics, improving the performance of `cv::pyrDown`, `cv::pyrUp` and `cv::buildPyramids` for the data types `{8U,16S,32F} x {C1,C2,C3,C4,Cn}`.
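For readers unfamiliar with the operation being accelerated, here is a one-dimensional scalar sketch of the downsampling step. It assumes the `[1 4 6 4 1] / 16` kernel and reflect-101 border handling that `cv::pyrDown` documents as its defaults; the real function works in two dimensions and on several data types, and the names below are hypothetical rather than taken from the patch.

```cpp
#include <cassert>
#include <vector>

// Map an out-of-range index into [0, n) by mirroring without repeating
// the edge sample (reflect-101 style).
inline int reflect101(int i, int n) {
    if (n == 1) return 0;
    while (i < 0 || i >= n)
        i = (i < 0) ? -i : 2 * (n - 1) - i;
    return i;
}

// 1-D pyramid downsampling: smooth with [1 4 6 4 1] / 16, keep every
// second sample. Output length is (n + 1) / 2.
std::vector<float> pyr_down_1d(const std::vector<float>& src) {
    static const float w[5] = {1.f, 4.f, 6.f, 4.f, 1.f};
    const int n = static_cast<int>(src.size());
    std::vector<float> dst((n + 1) / 2);
    for (int i = 0; i < static_cast<int>(dst.size()); ++i) {
        float acc = 0.f;
        for (int k = 0; k < 5; ++k)
            acc += w[k] * src[reflect101(2 * i + k - 2, n)];
        dst[i] = acc / 16.f;
    }
    return dst;
}
```

A constant signal stays constant and the interior of a linear ramp is preserved, which makes this a convenient reference when checking a vectorized version.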
Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.
```
$ ./opencv_test_imgproc --gtest_filter="*pyr*:*Pyr*"
$ ./opencv_perf_imgproc --gtest_filter="*pyr*:*Pyr*" --perf_min_samples=300 --perf_force_samples=300
```
<img width="1112" alt="Untitled" src="https://github.com/user-attachments/assets/235a9fba-0d29-434e-8a10-498212bac657" />
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Optimize undistort points #26988
Skips unnecessary rotation with identity matrix if no R or P mats are given.
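The idea can be sketched as follows: when no matrix is supplied, the point is returned unchanged instead of being multiplied by an identity matrix. This is an illustrative sketch only; `Mat3`, `Point2` and `apply_optional_transform` are hypothetical names and do not reflect the actual function signatures in the patch.

```cpp
#include <array>
#include <cassert>

using Mat3 = std::array<double, 9>;  // row-major 3x3 matrix
struct Point2 { double x, y; };

// Apply an optional 3x3 transform to a normalized point. A null pointer
// means "no matrix given", so the multiplication is skipped entirely.
Point2 apply_optional_transform(const Point2& p, const Mat3* m) {
    if (m == nullptr) return p;  // identity implied: nothing to compute
    const Mat3& a = *m;
    const double X = a[0] * p.x + a[1] * p.y + a[2];
    const double Y = a[3] * p.x + a[4] * p.y + a[5];
    const double W = a[6] * p.x + a[7] * p.y + a[8];
    return {X / W, Y / W};
}
```

The skipped branch saves nine multiplications and a division per point, which adds up when undistorting large point sets.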
---------
Co-authored-by: Daniel <daniel@mail.de>
Fix Logical defect in FilterSpecklesImpl #26996
Fixes: #24963
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Fix getPerspectiveTransform for singular case #26926
### Pull Request Readiness Checklist
Fix #26916
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Some minor fixes #26992
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Imgcodecs: gif: support Disposal Method #26930
Close https://github.com/opencv/opencv/issues/26924
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
videoio: print test params instead of indexes #26948
_videoio_ test names have changed: parameters are now printed as strings instead of indexes.
E.g. `videoio_read.threads/0` is now `videoio_read.threads/h264_0_RAW`.
This allows filtering tests independently of the platform.
**Notes:**
- Not all tests have been updated, only the simpler ones and those whose parameters vary by platform.
Add a test related to the IMWRITE_PNG_COMPRESSION parameter #26973
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Impl RISC-V HAL for cv::flip | Add perf test for flip #26943
Implement through the existing `cv_hal_flip` interfaces.
Add perf test for `cv::flip`.
Reasons for selecting these test arguments:
- **size**: copied from perf_lut
- **type**:
- U8C1: basic situation
- U8C3: unaligned element size
- U8C4: large element size
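As a scalar point of reference for the element sizes listed above, the sketch below mirrors each row of an image whose elements are `esz` bytes wide. It is an assumption-laden illustration, not the RVV implementation: the function name is hypothetical, and the interpretation that 3-byte elements are awkward because they do not match a power-of-two lane width is mine rather than the PR's.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirror every row of a tightly packed image left-to-right. Whole
// elements of esz bytes are moved, so channel order inside an element
// is preserved.
std::vector<std::uint8_t> flip_horizontal(const std::vector<std::uint8_t>& src,
                                          std::size_t rows, std::size_t cols,
                                          std::size_t esz) {
    assert(src.size() == rows * cols * esz);
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            const std::size_t from = (r * cols + c) * esz;
            const std::size_t to = (r * cols + (cols - 1 - c)) * esz;
            std::copy_n(src.begin() + from, esz, dst.begin() + to);
        }
    }
    return dst;
}
```

With `esz = 1` a vector implementation can reverse bytes directly, whereas `esz = 3` requires moving three-byte groups intact, which is why the three types exercise different code paths.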
Tested on
- MUSE-PI (vlen=256)
- Compiler: gcc 14.2 (riscv-collab/riscv-gnu-toolchain Nightly: December 16, 2024)
```sh
$ opencv_test_core --gtest_filter="Core_Flip/ElemWiseTest.*"
$ opencv_perf_core --gtest_filter="Size_MatType_FlipCode*" --perf_min_samples=300 --perf_force_samples=300
```
```
Geometric mean (ms)
Name of Test                                               scalar      ui     rvv          ui         rvv
                                                                                           vs          vs
                                                                                       scalar      scalar
                                                                                   (x-factor)  (x-factor)
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_X)        0.026   0.033   0.031        0.81        0.84
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_XY)       0.206   0.212   0.091        0.97        2.26
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_Y)        0.185   0.189   0.082        0.98        2.25
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_X)        0.070   0.084   0.084        0.83        0.83
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_XY)       0.616   0.612   0.235        1.01        2.62
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_Y)        0.587   0.603   0.204        0.97        2.88
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_X)        0.263   0.110   0.109        2.40        2.41
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_XY)       0.930   0.831   0.316        1.12        2.95
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_Y)        1.175   1.129   0.313        1.04        3.75
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_X)        0.303   0.118   0.111        2.57        2.73
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_XY)       0.949   0.836   0.405        1.14        2.34
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_Y)        0.784   0.783   0.409        1.00        1.92
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_X)        1.084   0.360   0.355        3.01        3.06
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_XY)       3.768   3.348   1.364        1.13        2.76
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_Y)        4.361   4.473   1.296        0.97        3.37
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_X)        1.252   0.469   0.451        2.67        2.78
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_XY)       5.732   5.220   1.303        1.10        4.40
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_Y)        5.041   5.105   1.203        0.99        4.19
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_X)      2.382   0.903   0.903        2.64        2.64
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_XY)     8.606   7.508   2.581        1.15        3.33
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_Y)      8.421   8.535   2.219        0.99        3.80
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_X)      6.312   2.416   2.429        2.61        2.60
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_XY)    29.174  26.055  12.761        1.12        2.29
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_Y)     25.373  25.500  13.382        1.00        1.90
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_X)      7.620   3.204   3.115        2.38        2.45
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_XY)    32.876  29.310  12.976        1.12        2.53
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_Y)     28.831  29.094  14.919        0.99        1.93
```
The optimizations for vlen <= 256 and vlen > 256 are different, but I have no real hardware with vlen > 256, so the accuracy tests for larger values such as 512 and 1024 were conducted on QEMU built from the `riscv-collab/riscv-gnu-toolchain`.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Enable SIMD_SCALABLE for exp and sqrt #26886
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
```
CPU - Banana Pi k1, compiler - clang 18.1.4
```
```
Geometric mean (ms)
Name of Test                          baseline      hal       ui         hal          ui
                                                                          vs          vs
                                                                    baseline    baseline
                                                                  (x-factor)  (x-factor)
Exp::ExpFixture::(127x61, 32FC1)         0.358       --    0.033          --       10.70
Exp::ExpFixture::(640x480, 32FC1)       14.304       --    1.167          --       12.26
Exp::ExpFixture::(1280x720, 32FC1)      42.785       --    3.538          --       12.09
Exp::ExpFixture::(1920x1080, 32FC1)     96.206       --    7.927          --       12.14
Exp::ExpFixture::(127x61, 64FC1)         0.433    0.050    0.098        8.59        4.40
Exp::ExpFixture::(640x480, 64FC1)       17.315    1.935    3.813        8.95        4.54
Exp::ExpFixture::(1280x720, 64FC1)      52.181    5.877   11.519        8.88        4.53
Exp::ExpFixture::(1920x1080, 64FC1)    117.082   13.157   25.854        8.90        4.53
```
Additionally, this PR brings a Sqrt optimization with UI (in the test names below, depth 5 is `CV_32F` and 6 is `CV_64F`):
```
Geometric mean (ms)
Name of Test                               baseline       ui          ui
                                                                      vs
                                                                baseline
                                                              (x-factor)
Sqrt::SqrtFixture::(127x61, 5, false)         0.111    0.027        4.11
Sqrt::SqrtFixture::(127x61, 6, false)         0.149    0.053        2.82
Sqrt::SqrtFixture::(640x480, 5, false)        4.374    0.967        4.52
Sqrt::SqrtFixture::(640x480, 6, false)        5.885    2.046        2.88
Sqrt::SqrtFixture::(1280x720, 5, false)      12.960    2.915        4.45
Sqrt::SqrtFixture::(1280x720, 6, false)      17.648    6.107        2.89
Sqrt::SqrtFixture::(1920x1080, 5, false)     29.178    6.524        4.47
Sqrt::SqrtFixture::(1920x1080, 6, false)     39.709   13.670        2.90
```
Reference
Muller, J.-M. Elementary Functions: Algorithms and Implementation. 2nd ed. Boston: Birkhäuser, 2006.
https://www.springer.com/gp/book/9780817643720
core: vectorize cv::normalize / cv::norm #26885
Checklist:
| | normInf | normL1 | normL2 |
| ---- | ------- | ------ | ------ |
| bool | - | - | - |
| 8u | √ | √ | √ |
| 8s | √ | √ | √ |
| 16u | √ | √ | √ |
| 16s | √ | √ | √ |
| 16f | - | - | - |
| 16bf | - | - | - |
| 32u | - | - | - |
| 32s | √ | √ | √ |
| 32f | √ | √ | √ |
| 64u | - | - | - |
| 64s | - | - | - |
| 64f | √ | √ | √ |
*: Vectorization of the data types bool, 16f, 16bf, 32u, 64u and 64s needs to be done on 5.x.
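For reference, the three norms in the table can be written as scalar templates like the ones below. This is a plain sketch for checking results, not the vectorized code from the PR; the function names are hypothetical, and accumulating in `double` is a simplification chosen here to sidestep overflow, not necessarily what the library does for every type.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Infinity norm: largest absolute value.
template <typename T>
double norm_inf(const std::vector<T>& v) {
    double m = 0.0;
    for (T x : v) m = std::max(m, std::abs(static_cast<double>(x)));
    return m;
}

// L1 norm: sum of absolute values.
template <typename T>
double norm_l1(const std::vector<T>& v) {
    double s = 0.0;
    for (T x : v) s += std::abs(static_cast<double>(x));
    return s;
}

// L2 norm: square root of the sum of squares.
template <typename T>
double norm_l2(const std::vector<T>& v) {
    double s = 0.0;
    for (T x : v) s += static_cast<double>(x) * static_cast<double>(x);
    return std::sqrt(s);
}
```

Converting to `double` before taking the absolute value keeps inputs such as the most negative 8-bit signed value well defined.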
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Added trackers factory with pre-loaded dnn models #26875
Replaces https://github.com/opencv/opencv/pull/26295
Allows substituting custom models or initializing a tracker from an in-memory model.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Migrate remaining OpenVX integrations to OpenVX HAL (core) #26903
Tested with OpenVX 1.2 & 1.3 sample implementation.
Steps to build and test:
```
git clone git@github.com:KhronosGroup/OpenVX-sample-impl.git
cd OpenVX-sample-impl
python3 Build.py --os=Linux --conf=Release
cd ..
mkdir build
cd build
cmake -DWITH_OPENVX=ON -DOPENVX_ROOT=/mnt/Projects/Projects/OpenVX-sample-impl/install/Linux/x64/Release/ ../opencv
make -j8
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake