Impl RISC-V HAL for cv::flip | Add perf test for flip #26943
Implement through the existing `cv_hal_flip` interfaces.
Add perf test for `cv::flip`.
The reason why select these args for testing:
- **size**: copied from perf_lut
- **type**:
- U8C1: basic situation
- U8C3: unaligned element size
- U8C4: large element size
Tested on
- MUSE-PI (vlen=256)
- Compiler: gcc 14.2 (riscv-collab/riscv-gnu-toolchain Nightly: December 16, 2024)
```sh
$ opencv_test_core --gtest_filter="Core_Flip/ElemWiseTest.*"
$ opencv_perf_core --gtest_filter="Size_MatType_FlipCode*" --perf_min_samples=300 --perf_force_samples=300
```
```
Geometric mean (ms)
Name of Test scalar ui rvv ui rvv
vs vs
scalar scalar
(x-factor) (x-factor)
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_X) 0.026 0.033 0.031 0.81 0.84
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_XY) 0.206 0.212 0.091 0.97 2.26
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_Y) 0.185 0.189 0.082 0.98 2.25
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_X) 0.070 0.084 0.084 0.83 0.83
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_XY) 0.616 0.612 0.235 1.01 2.62
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_Y) 0.587 0.603 0.204 0.97 2.88
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_X) 0.263 0.110 0.109 2.40 2.41
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_XY) 0.930 0.831 0.316 1.12 2.95
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_Y) 1.175 1.129 0.313 1.04 3.75
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_X) 0.303 0.118 0.111 2.57 2.73
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_XY) 0.949 0.836 0.405 1.14 2.34
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_Y) 0.784 0.783 0.409 1.00 1.92
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_X) 1.084 0.360 0.355 3.01 3.06
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_XY) 3.768 3.348 1.364 1.13 2.76
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_Y) 4.361 4.473 1.296 0.97 3.37
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_X) 1.252 0.469 0.451 2.67 2.78
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_XY) 5.732 5.220 1.303 1.10 4.40
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_Y) 5.041 5.105 1.203 0.99 4.19
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_X) 2.382 0.903 0.903 2.64 2.64
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_XY) 8.606 7.508 2.581 1.15 3.33
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_Y) 8.421 8.535 2.219 0.99 3.80
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_X) 6.312 2.416 2.429 2.61 2.60
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_XY) 29.174 26.055 12.761 1.12 2.29
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_Y) 25.373 25.500 13.382 1.00 1.90
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_X) 7.620 3.204 3.115 2.38 2.45
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_XY) 32.876 29.310 12.976 1.12 2.53
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_Y) 28.831 29.094 14.919 0.99 1.93
```
The optimization for vlen <= 256 and > 256 are different, but I have no real hardware with vlen > 256. So accuracy tests for that like 512 and 1024 are conducted on QEMU built from the `riscv-collab/riscv-gnu-toolchain`.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
[HAL] split8u RVV 1.0 #26884
### Pull Request Readiness Checklist
* Banana Pi BF3 (SpacemiT K1)
* Compiler: Syntacore Clang 18.1.4 (build 2024.12)
```
Geometric mean (ms)
Name of Test baseline hal hal
ui vs
baseline
ui
(x-factor)
split::Size_Depth_Channels::(127x61, 8UC1, 2) 0.012 0.004 3.12
split::Size_Depth_Channels::(127x61, 8UC1, 3) 0.019 0.006 2.91
split::Size_Depth_Channels::(127x61, 8UC1, 4) 0.028 0.011 2.64
split::Size_Depth_Channels::(127x61, 8UC1, 5) 0.067 0.033 2.02
split::Size_Depth_Channels::(127x61, 8UC1, 6) 0.084 0.040 2.11
split::Size_Depth_Channels::(127x61, 8UC1, 7) 0.103 0.055 1.88
split::Size_Depth_Channels::(127x61, 8UC1, 8) 0.113 0.032 3.50
split::Size_Depth_Channels::(640x480, 8UC1, 2) 0.454 0.179 2.54
split::Size_Depth_Channels::(640x480, 8UC1, 3) 0.677 0.298 2.27
split::Size_Depth_Channels::(640x480, 8UC1, 4) 0.901 0.410 2.20
split::Size_Depth_Channels::(640x480, 8UC1, 5) 3.781 3.010 1.26
split::Size_Depth_Channels::(640x480, 8UC1, 6) 4.886 4.009 1.22
split::Size_Depth_Channels::(640x480, 8UC1, 7) 5.777 4.770 1.21
split::Size_Depth_Channels::(640x480, 8UC1, 8) 4.596 1.330 3.46
split::Size_Depth_Channels::(1280x720, 8UC1, 2) 1.377 0.709 1.94
split::Size_Depth_Channels::(1280x720, 8UC1, 3) 2.091 1.034 2.02
split::Size_Depth_Channels::(1280x720, 8UC1, 4) 2.744 1.573 1.74
split::Size_Depth_Channels::(1280x720, 8UC1, 5) 9.542 6.284 1.52
split::Size_Depth_Channels::(1280x720, 8UC1, 6) 11.114 7.850 1.42
split::Size_Depth_Channels::(1280x720, 8UC1, 7) 14.083 11.879 1.19
split::Size_Depth_Channels::(1280x720, 8UC1, 8) 13.524 3.865 3.50
split::Size_Depth_Channels::(1920x1080, 8UC1, 2) 3.108 1.395 2.23
split::Size_Depth_Channels::(1920x1080, 8UC1, 3) 4.659 2.128 2.19
split::Size_Depth_Channels::(1920x1080, 8UC1, 4) 6.127 2.818 2.17
split::Size_Depth_Channels::(1920x1080, 8UC1, 5) 26.733 16.625 1.61
split::Size_Depth_Channels::(1920x1080, 8UC1, 6) 31.242 22.414 1.39
split::Size_Depth_Channels::(1920x1080, 8UC1, 7) 35.968 27.658 1.30
split::Size_Depth_Channels::(1920x1080, 8UC1, 8) 29.997 8.655 3.47
```
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Add RISC-V HAL implementation for cv::norm and cv::normalize #26804
This patch implements `cv::norm` with norm types `NORM_INF/NORM_L1/NORM_L2/NORM_L2SQR` and `Mat::convertTo` function in RVV_HAL using native intrinsic, optimizing the performance for `cv::norm(src)`, `cv::norm(src1, src2)`, and `cv::normalize(src)` with data types `8UC1/8UC4/32FC1`.
`cv::normalize` also calls `minMaxIdx`, #26789 implements RVV_HAL for this.
Tested on MUSE-PI for both gcc 14.2 and clang 20.0.
```
$ opencv_test_core --gtest_filter="*Norm*"
$ opencv_perf_core --gtest_filter="*norm*" --perf_min_samples=300 --perf_force_samples=300
```
The head of the perf table is shown below since the table is too long.
View the full perf table here: [hal_rvv_norm.pdf](https://github.com/user-attachments/files/18468255/hal_rvv_norm.pdf)
<img width="1304" alt="Untitled" src="https://github.com/user-attachments/assets/3550b671-6d96-4db3-8b5b-d4cb241da650" />
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Add RISC-V HAL implementation for minMaxIdx #26789
On the RISC-V platform, `minMaxIdx` cannot benefit from Universal Intrinsics because the UI-optimized `minMaxIdx` only supports `CV_SIMD128` (and does not accept `CV_SIMD_SCALABLE` for RVV).
1d701d1690/modules/core/src/minmax.cpp (L209-L214)
This patch implements `minMaxIdx` function in RVV_HAL using native intrinsic, optimizing the performance for all data types with one channel.
Tested on MUSE-PI for both gcc 14.2 and clang 20.0.
```
$ opencv_test_core --gtest_filter="*MinMaxLoc*"
$ opencv_perf_core --gtest_filter="*minMaxLoc*"
```
<img width="1122" alt="Untitled" src="https://github.com/user-attachments/assets/6a246852-87af-42c5-a50b-c349c2765f3f" />
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Add RISC-V HAL implementation for meanStdDev #26624
`meanStdDev` benefits from the Universal Intrinsic backend of RVV, but we also found that the performance on the `8UC4` type is worse than the scalar version when there is a mask, and there is no optimization implementation on `32FC1`.
This patch implements `meanStdDev` function in RVV_HAL using native intrinsic, significantly optimizing the performance for `8UC1`, `8UC4` and `32FC1`.
This patch is tested on BPI-F3 for both gcc 14.2 and clang 19.1.
```
$ opencv_test_core --gtest_filter="*MeanStdDev*"
$ opencv_perf_core --gtest_filter="Size_MatType_meanStdDev*
```

### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake