Commit Graph

19 Commits

Author SHA1 Message Date
GenshinImpactStarts
60de3ff24f
Merge pull request #27015 from GenshinImpactStarts:sqrt
[HAL RVV] impl sqrt and invSqrt #27015

Implement through the existing interfaces `cv_hal_sqrt32f`, `cv_hal_sqrt64f`, `cv_hal_invSqrt32f`, `cv_hal_invSqrt64f`.

Perf test done on MUSE-PI and CanMV K230. Because the performance of scalar is much worse than universal intrinsic, only ui and hal rvv is compared.

In RVV's UI, `invSqrt` is computed using `1 / sqrt()`. This patch first uses `frsqrt` and then applies the Newton-Raphson method to achieve higher precision. For the initial value, I tried using the famous [fast inverse square root algorithm](https://en.wikipedia.org/wiki/Fast_inverse_square_root), which involves one bit shift and one subtraction. However, on both MUSE-PI and CanMV K230, the performance was slightly lower (about 3%), so I chose to use `frsqrt` for the initial value instead. 

BTW, I think this patch can directly replace RVV's UI.

**UPDATE**: Due to strange vector registers allocation strategy in clang, for `invSqrt`, clang use LMUL m4 while gcc use LMUL m8, which leads to some performance loss in clang. So the test for clang is appended.

```sh
$ opencv_test_core --gtest_filter="Core_HAL/mathfuncs.*"
$ opencv_perf_core --gtest_filter="SqrtFixture.*" --perf_min_samples=300 --perf_force_samples=300
```

CanMV K230:
```
              Name of Test                 ui    rvv      rvv    
                                                           vs    
                                                           ui    
                                                       (x-factor)
Sqrt::SqrtFixture::(127x61, 5, false)    0.052  0.027     1.96   
Sqrt::SqrtFixture::(127x61, 5, true)     0.101  0.026     3.80   
Sqrt::SqrtFixture::(127x61, 6, false)    0.106  0.059     1.79   
Sqrt::SqrtFixture::(127x61, 6, true)     0.207  0.058     3.55   
Sqrt::SqrtFixture::(640x480, 5, false)   1.988  0.956     2.08   
Sqrt::SqrtFixture::(640x480, 5, true)    3.920  0.948     4.13   
Sqrt::SqrtFixture::(640x480, 6, false)   4.179  2.342     1.78   
Sqrt::SqrtFixture::(640x480, 6, true)    8.220  2.290     3.59   
Sqrt::SqrtFixture::(1280x720, 5, false)  5.969  2.881     2.07   
Sqrt::SqrtFixture::(1280x720, 5, true)   11.731 2.857     4.11   
Sqrt::SqrtFixture::(1280x720, 6, false)  12.533 7.031     1.78   
Sqrt::SqrtFixture::(1280x720, 6, true)   24.643 6.917     3.56   
Sqrt::SqrtFixture::(1920x1080, 5, false) 13.423 6.483     2.07   
Sqrt::SqrtFixture::(1920x1080, 5, true)  26.379 6.436     4.10   
Sqrt::SqrtFixture::(1920x1080, 6, false) 28.200 15.833    1.78   
Sqrt::SqrtFixture::(1920x1080, 6, true)  55.434 15.565    3.56   
```

MUSE-PI:
```
                                                 GCC              |        clang            
              Name of Test                 ui    rvv      rvv     |   ui    rvv      rvv    
                                                           vs     |                   vs    
                                                           ui     |                   ui    
                                                       (x-factor) |               (x-factor)
Sqrt::SqrtFixture::(127x61, 5, false)    0.027  0.018     1.46    | 0.027  0.016     1.65   
Sqrt::SqrtFixture::(127x61, 5, true)     0.050  0.017     2.98    | 0.050  0.017     2.99   
Sqrt::SqrtFixture::(127x61, 6, false)    0.053  0.031     1.72    | 0.052  0.032     1.64   
Sqrt::SqrtFixture::(127x61, 6, true)     0.100  0.030     3.31    | 0.101  0.035     2.86   
Sqrt::SqrtFixture::(640x480, 5, false)   0.955  0.483     1.98    | 0.959  0.499     1.92   
Sqrt::SqrtFixture::(640x480, 5, true)    1.873  0.489     3.83    | 1.873  0.520     3.60   
Sqrt::SqrtFixture::(640x480, 6, false)   2.027  1.163     1.74    | 2.037  1.218     1.67   
Sqrt::SqrtFixture::(640x480, 6, true)    3.961  1.153     3.44    | 3.961  1.341     2.95   
Sqrt::SqrtFixture::(1280x720, 5, false)  2.916  1.538     1.90    | 2.912  1.598     1.82   
Sqrt::SqrtFixture::(1280x720, 5, true)   5.735  1.534     3.74    | 5.726  1.661     3.45   
Sqrt::SqrtFixture::(1280x720, 6, false)  6.121  3.585     1.71    | 6.109  3.725     1.64   
Sqrt::SqrtFixture::(1280x720, 6, true)   12.059 3.501     3.44    | 12.053 4.080     2.95   
Sqrt::SqrtFixture::(1920x1080, 5, false) 6.540  3.535     1.85    | 6.540  3.643     1.80   
Sqrt::SqrtFixture::(1920x1080, 5, true)  12.943 3.445     3.76    | 12.908 3.706     3.48   
Sqrt::SqrtFixture::(1920x1080, 6, false) 13.714 8.062     1.70    | 13.711 8.376     1.64   
Sqrt::SqrtFixture::(1920x1080, 6, true)  27.011 7.989     3.38    | 27.115 9.245     2.93   
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-12 08:34:27 +03:00
Alexander Smorkalov
a48e78cdfc
Merge pull request #27026 from amane-ame/filter_hal_rvv
Add RISC-V HAL implementation for cv::filter series
2025-03-11 16:09:45 +03:00
GenshinImpactStarts
524d8ae01c impl exp and log | add log perf test
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-07 17:11:26 +00:00
天音あめ
e89e2fd7ea
Merge pull request #27007 from amane-ame:color_hal_rvv
Add RISC-V HAL implementation for cv::cvtColor #27007

This patch implements the following functions in RVV_HAL using native intrinsics, optimizing the performance of `cv::cvtColor` for all possible data types and modes (except for `COLOR_Bayer`, `COLOR_YUV2GRAY_420` and `COLOR_mRGBA`, as these modes have no HAL interface):

```
cv_hal_cvtBGRtoBGR
cv_hal_cvtBGRtoBGR5x5
cv_hal_cvtBGR5x5toBGR
cv_hal_cvtBGRtoGray
cv_hal_cvtGraytoBGR
cv_hal_cvtBGR5x5toGray
cv_hal_cvtGraytoBGR5x5
cv_hal_cvtBGRtoYUV
cv_hal_cvtYUVtoBGR
cv_hal_cvtBGRtoXYZ
cv_hal_cvtXYZtoBGR
cv_hal_cvtBGRtoHSV
cv_hal_cvtHSVtoBGR
cv_hal_cvtBGRtoLab
cv_hal_cvtLabtoBGR
cv_hal_cvtTwoPlaneYUVtoBGR
cv_hal_cvtBGRtoTwoPlaneYUV
cv_hal_cvtThreePlaneYUVtoBGR
cv_hal_cvtBGRtoThreePlaneYUV
cv_hal_cvtOnePlaneYUVtoBGR
cv_hal_cvtOnePlaneBGRtoYUV
```

Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.

```
$ ./opencv_test_imgproc --gtest_filter="*Color*-*Bayer*"
$ ./opencv_perf_imgproc --gtest_filter="*Color*-*Bayer*" --gtest_also_run_disabled_tests --perf_min_samples=100 --perf_force_samples=100
```

View the full perf table here: [hal_rvv_color.pdf](https://github.com/user-attachments/files/19055417/hal_rvv_color.pdf)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
2025-03-07 11:24:48 +03:00
天音あめ
00956d5c15
Merge pull request #26892 from amane-ame:solve_hal_rvv
Add RISC-V HAL implementation for cv::solve #26892

This patch implements `cv_hal_LU/cv_hal_Cholesky/cv_hal_SVD/cv_hal_QR` function in RVV_HAL using native intrinsics, optimizing the performance for `cv::solve` with method `DECOMP_LU/DECOMP_SVD/DECOMP_CHOLESKY/DECOMP_QR` and data types `32FC1/64FC1`.

Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.

```
$ ./opencv_test_core --gtest_filter="*Solve*:*SVD*:*Cholesky*"
$ ./opencv_perf_core --gtest_filter="*SolveTest*" --perf_min_samples=100 --perf_force_samples=100
```

The tail of the perf table is shown below since the table is too long.

View the full perf table here: [hal_rvv_solve.pdf](https://github.com/user-attachments/files/18725067/hal_rvv_solve.pdf)

<img width="1078" alt="Untitled" src="https://github.com/user-attachments/assets/c01d849c-f000-4bcc-bfe0-a302d6605d9e" />

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-07 11:14:09 +03:00
天音あめ
bb525fe91d
Merge pull request #26865 from amane-ame:dxt_hal_rvv
Add RISC-V HAL implementation for cv::dft and cv::dct #26865

This patch implements `static cv::DFT` function in RVV_HAL using native intrinsic, optimizing the performance for `cv::dft` and `cv::dct` with data types `32FC1/64FC1/32FC2/64FC2`.

The reason I chose to create a new `cv_hal_dftOcv` interface is that if I were to use the existing interfaces (`cv_hal_dftInit1D` and `cv_hal_dft1D`), it would require handling and parsing the dft flags within HAL, as well as performing preprocessing operations such as handling unit roots. Since these operations are not performance hotspots and do not require optimization, reusing the existing interfaces would result in copying approximately 300 lines of code from `core/src/dxt.cpp` into HAL, which I believe is unnecessary.

Moreover, if I insert the new interface into `static cv::DFT`, both `static cv::RealDFT` and `static cv::DCT` can be optimized as well. The processing performed before and after calling `static cv::DFT` in these functions is also not a performance hotspot.

Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.

```
$ opencv_test_core --gtest_filter="*DFT*"
$ opencv_perf_core --gtest_filter="*dft*:*dct*" --perf_min_samples=30 --perf_force_samples=30
```

The head of the perf table is shown below since the table is too long.

View the full perf table here: [hal_rvv_dxt.pdf](https://github.com/user-attachments/files/18622645/hal_rvv_dxt.pdf)

<img width="1017" alt="Untitled" src="https://github.com/user-attachments/assets/609856e7-9c7d-4a95-9923-45c1b77eb3a2" />

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-07 11:08:41 +03:00
GenshinImpactStarts
57a78cb9df
Merge pull request #26941 from GenshinImpactStarts:lut_hal_rvv
Impl hal_rvv LUT | Add more LUT test #26941 

Implement through the existing `cv_hal_lut` interfaces.

Add more LUT accuracy and performance tests:
- **Accuracy test**: Multi-channel table tests are added, and the boundary of `randu` used for generating test data is broadened to make the test more robust.
- **Performance test**: Multi-channel input and multi-channel table tests are added.

Perf test done on
- MUSE-PI (vlen=256)
- Compiler: gcc 14.2 (riscv-collab/riscv-gnu-toolchain Nightly: December 16, 2024)


```sh

$ opencv_test_core --gtest_filter="Core_LUT*"
$ opencv_perf_core --gtest_filter="SizePrm_LUT*" --perf_min_samples=300 --perf_force_samples=300
```
```sh
Geometric mean (ms)

         Name of Test          scalar   ui    rvv       ui        rvv    
                                                        vs         vs    
                                                      scalar     scalar  
                                                    (x-factor) (x-factor)
LUT::SizePrm::320x240          0.248  0.249  0.052     1.00       4.74   
LUT::SizePrm::640x480          0.277  0.275  0.085     1.01       3.28   
LUT::SizePrm::1920x1080        0.950  0.947  0.634     1.00       1.50   
LUT_multi2::SizePrm::320x240   2.051  2.045  2.049     1.00       1.00   
LUT_multi2::SizePrm::640x480   2.128  2.134  2.125     1.00       1.00   
LUT_multi2::SizePrm::1920x1080 7.397  7.380  7.390     1.00       1.00   
LUT_multi::SizePrm::320x240    0.715  0.747  0.154     0.96       4.64   
LUT_multi::SizePrm::640x480    0.741  0.766  0.257     0.97       2.88   
LUT_multi::SizePrm::1920x1080  2.766  2.765  1.925     1.00       1.44  
```

This optimization is achieved by loading the entire lookup table into vector registers. Due to register size limitations, the optimization is only effective under the following conditions:  
- For the U8C1 table type, the optimization works when `vlen >= 256`
- For U16C1, it works when `vlen >= 512`
- For U32C1, it works when `vlen >= 1024`

Since I don’t have real hardware with `vlen > 256`, the corresponding accuracy tests were conducted on QEMU built from the `riscv-collab/riscv-gnu-toolchain`.

This patch does not implement optimizations for multi-channel tables.

Previous attempts:
1. For the U8C1 table type, when `vlen = 128`, it is possible to use four `u8m4` vectors to load the entire table, perform gathering, and merge the results. However, the performance is almost the same as the scalar version.
2. Loading part of the table and repeatedly loading the source data is faster for small sizes. But as the table size grows, the performance quickly degrades compared to the scalar version.
3. Using `vluxei8` as a general solution does not show any performance improvement.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-06 11:17:00 +03:00
amane-ame
83104bed32 Add Filter2D.
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-06 14:10:06 +08:00
天音あめ
cbcfd772ce
Merge pull request #26958 from amane-ame:pyramids_hal_rvv
Add RISC-V HAL implementation for cv::pyrDown and cv::pyrUp #26958

This patch implements `cv_hal_pyrdown/cv_hal_pyrup` function in RVV_HAL using native intrinsics, optimizing the performance for `cv::pyrDown`, `cv::pyrUp` and `cv::buildPyramids` with data types `{8U,16S,32F} x {C1,C2,C3,C4,Cn}`.

Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.

```
$ ./opencv_test_imgproc --gtest_filter="*pyr*:*Pyr*"
$ ./opencv_perf_imgproc --gtest_filter="*pyr*:*Pyr*" --perf_min_samples=300 --perf_force_samples=300
```

<img width="1112" alt="Untitled" src="https://github.com/user-attachments/assets/235a9fba-0d29-434e-8a10-498212bac657" />


### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-04 15:41:15 +03:00
GenshinImpactStarts
33d632f85e impl hal_rvv norm_hamming
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-02-25 02:31:02 +00:00
GenshinImpactStarts
6a6a5a765d
Merge pull request #26943 from GenshinImpactStarts:flip_hal_rvv
Impl RISC-V HAL for cv::flip | Add perf test for flip #26943 

Implement through the existing `cv_hal_flip` interfaces.

Add perf test for `cv::flip`.

The reason why select these args for testing:
- **size**: copied from perf_lut
- **type**:
    - U8C1: basic situation
    - U8C3: unaligned element size
    - U8C4: large element size

Tested on
- MUSE-PI (vlen=256)
- Compiler: gcc 14.2 (riscv-collab/riscv-gnu-toolchain Nightly: December 16, 2024)

```sh
$ opencv_test_core --gtest_filter="Core_Flip/ElemWiseTest.*"
$ opencv_perf_core --gtest_filter="Size_MatType_FlipCode*" --perf_min_samples=300 --perf_force_samples=300
```

```
Geometric mean (ms)

                     Name of Test                       scalar   ui    rvv       ui        rvv    
                                                                                 vs         vs    
                                                                               scalar     scalar  
                                                                             (x-factor) (x-factor)
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_X)    0.026  0.033  0.031     0.81       0.84   
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_XY)   0.206  0.212  0.091     0.97       2.26   
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_Y)    0.185  0.189  0.082     0.98       2.25   
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_X)    0.070  0.084  0.084     0.83       0.83   
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_XY)   0.616  0.612  0.235     1.01       2.62   
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_Y)    0.587  0.603  0.204     0.97       2.88   
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_X)    0.263  0.110  0.109     2.40       2.41   
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_XY)   0.930  0.831  0.316     1.12       2.95   
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_Y)    1.175  1.129  0.313     1.04       3.75   
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_X)    0.303  0.118  0.111     2.57       2.73   
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_XY)   0.949  0.836  0.405     1.14       2.34   
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_Y)    0.784  0.783  0.409     1.00       1.92   
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_X)    1.084  0.360  0.355     3.01       3.06   
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_XY)   3.768  3.348  1.364     1.13       2.76   
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_Y)    4.361  4.473  1.296     0.97       3.37   
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_X)    1.252  0.469  0.451     2.67       2.78   
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_XY)   5.732  5.220  1.303     1.10       4.40   
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_Y)    5.041  5.105  1.203     0.99       4.19   
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_X)  2.382  0.903  0.903     2.64       2.64   
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_XY) 8.606  7.508  2.581     1.15       3.33   
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_Y)  8.421  8.535  2.219     0.99       3.80   
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_X)  6.312  2.416  2.429     2.61       2.60   
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_XY) 29.174 26.055 12.761    1.12       2.29   
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_Y)  25.373 25.500 13.382    1.00       1.90   
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_X)  7.620  3.204  3.115     2.38       2.45   
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_XY) 32.876 29.310 12.976    1.12       2.53   
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_Y)  28.831 29.094 14.919    0.99       1.93   
```

The optimization for vlen <= 256 and > 256 are different, but I have no real hardware with vlen > 256. So accuracy tests for that like 512 and 1024 are conducted on QEMU built from the `riscv-collab/riscv-gnu-toolchain`.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-02-24 08:56:23 +03:00
lve-gh
d8c2f0bcdf
Merge pull request #26884 from lve-gh:split8u_rvv_hal
[HAL] split8u RVV 1.0 #26884

### Pull Request Readiness Checklist
* Banana Pi BF3 (SpacemiT K1)
* Compiler: Syntacore Clang 18.1.4 (build 2024.12)
```
Geometric mean (ms)

                  Name of Test                   baseline  hal      hal
                                                    ui               vs
                                                                  baseline 
                                                                     ui
                                                                 (x-factor)
split::Size_Depth_Channels::(127x61, 8UC1, 2)     0.012   0.004     3.12   
split::Size_Depth_Channels::(127x61, 8UC1, 3)     0.019   0.006     2.91   
split::Size_Depth_Channels::(127x61, 8UC1, 4)     0.028   0.011     2.64   
split::Size_Depth_Channels::(127x61, 8UC1, 5)     0.067   0.033     2.02   
split::Size_Depth_Channels::(127x61, 8UC1, 6)     0.084   0.040     2.11   
split::Size_Depth_Channels::(127x61, 8UC1, 7)     0.103   0.055     1.88   
split::Size_Depth_Channels::(127x61, 8UC1, 8)     0.113   0.032     3.50   
split::Size_Depth_Channels::(640x480, 8UC1, 2)    0.454   0.179     2.54   
split::Size_Depth_Channels::(640x480, 8UC1, 3)    0.677   0.298     2.27   
split::Size_Depth_Channels::(640x480, 8UC1, 4)    0.901   0.410     2.20   
split::Size_Depth_Channels::(640x480, 8UC1, 5)    3.781   3.010     1.26   
split::Size_Depth_Channels::(640x480, 8UC1, 6)    4.886   4.009     1.22   
split::Size_Depth_Channels::(640x480, 8UC1, 7)    5.777   4.770     1.21   
split::Size_Depth_Channels::(640x480, 8UC1, 8)    4.596   1.330     3.46   
split::Size_Depth_Channels::(1280x720, 8UC1, 2)   1.377   0.709     1.94   
split::Size_Depth_Channels::(1280x720, 8UC1, 3)   2.091   1.034     2.02   
split::Size_Depth_Channels::(1280x720, 8UC1, 4)   2.744   1.573     1.74   
split::Size_Depth_Channels::(1280x720, 8UC1, 5)   9.542   6.284     1.52   
split::Size_Depth_Channels::(1280x720, 8UC1, 6)   11.114  7.850     1.42   
split::Size_Depth_Channels::(1280x720, 8UC1, 7)   14.083  11.879    1.19   
split::Size_Depth_Channels::(1280x720, 8UC1, 8)   13.524  3.865     3.50   
split::Size_Depth_Channels::(1920x1080, 8UC1, 2)  3.108   1.395     2.23   
split::Size_Depth_Channels::(1920x1080, 8UC1, 3)  4.659   2.128     2.19   
split::Size_Depth_Channels::(1920x1080, 8UC1, 4)  6.127   2.818     2.17   
split::Size_Depth_Channels::(1920x1080, 8UC1, 5)  26.733  16.625    1.61   
split::Size_Depth_Channels::(1920x1080, 8UC1, 6)  31.242  22.414    1.39   
split::Size_Depth_Channels::(1920x1080, 8UC1, 7)  35.968  27.658    1.30   
split::Size_Depth_Channels::(1920x1080, 8UC1, 8)  29.997  8.655     3.47
```
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-02-11 17:57:05 +03:00
天音あめ
2e909c38dc
Merge pull request #26804 from amane-ame:norm_hal_rvv
Add RISC-V HAL implementation for cv::norm and cv::normalize #26804

This patch implements `cv::norm` with norm types `NORM_INF/NORM_L1/NORM_L2/NORM_L2SQR` and `Mat::convertTo` function in RVV_HAL using native intrinsic, optimizing the performance for `cv::norm(src)`, `cv::norm(src1, src2)`, and `cv::normalize(src)` with data types `8UC1/8UC4/32FC1`.

`cv::normalize` also calls `minMaxIdx`, #26789 implements RVV_HAL for this.

Tested on MUSE-PI for both gcc 14.2 and clang 20.0.

```
$ opencv_test_core --gtest_filter="*Norm*"
$ opencv_perf_core --gtest_filter="*norm*" --perf_min_samples=300 --perf_force_samples=300
```

The head of the perf table is shown below since the table is too long.

View the full perf table here: [hal_rvv_norm.pdf](https://github.com/user-attachments/files/18468255/hal_rvv_norm.pdf)

<img width="1304" alt="Untitled" src="https://github.com/user-attachments/assets/3550b671-6d96-4db3-8b5b-d4cb241da650" />

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-02-06 19:34:54 +03:00
天音あめ
13b2caffe0
Merge pull request #26789 from amane-ame:minmax_hal_rvv
Add RISC-V HAL implementation for minMaxIdx #26789

On the RISC-V platform, `minMaxIdx` cannot benefit from Universal Intrinsics because the UI-optimized `minMaxIdx` only supports `CV_SIMD128` (and does not accept `CV_SIMD_SCALABLE` for RVV).

1d701d1690/modules/core/src/minmax.cpp (L209-L214)

This patch implements `minMaxIdx` function in RVV_HAL using native intrinsic, optimizing the performance for all data types with one channel.

Tested on MUSE-PI for both gcc 14.2 and clang 20.0.

```
$ opencv_test_core --gtest_filter="*MinMaxLoc*"
$ opencv_perf_core --gtest_filter="*minMaxLoc*"
```
<img width="1122" alt="Untitled" src="https://github.com/user-attachments/assets/6a246852-87af-42c5-a50b-c349c2765f3f" />

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-31 14:26:49 +03:00
Horror Proton
86241653a7 Add RISC-V HAL implementation for cv::phase 2025-01-29 12:07:59 +08:00
Liutong HAN
3fbaad36d7
Merge pull request #26624 from hanliutong:rvv-mean
Add RISC-V HAL implementation for meanStdDev #26624

`meanStdDev` benefits from the Universal Intrinsic backend of RVV, but we also found that the performance on the `8UC4` type is worse than the scalar version when there is a mask, and there is no optimization implementation on `32FC1`.

This patch implements `meanStdDev` function in RVV_HAL using native intrinsic, significantly optimizing the performance for `8UC1`, `8UC4` and `32FC1`.

This patch is tested on BPI-F3 for both gcc 14.2 and clang 19.1.
```
$ opencv_test_core --gtest_filter="*MeanStdDev*"
$ opencv_perf_core --gtest_filter="Size_MatType_meanStdDev*
```

![1734077611879](https://github.com/user-attachments/assets/71c85c9d-1db1-470d-81d1-bf546e27ad86)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-18 22:19:02 +03:00
Liutong HAN
8a36f119ce Add the HAL implementation for the merge function on RISC-V Vector 2024-09-29 13:39:53 +00:00
Maxim Milashchenko
786726719f
Merge pull request #25793 from MaximMilashchenko:hal_rvv
Fixed build error hal_rvv_071 #25793

Fixed bug with enabling vector header when vector extension is disabled (RVV=OFF) in hal_rvv_071
2024-06-28 09:00:16 +03:00
Maxim Milashchenko
adcb070396
Merge pull request #25307 from MaximMilashchenko:halrvv071
* added hal for cv_hal_cvtBGRtoBGR rvv 0.7.1
2024-06-06 15:31:59 +03:00