Commit Graph

903 Commits

Author SHA1 Message Date
Alexander Smorkalov
50072f8d4f
Merge pull request #27089 from amane-ame:hist_hal_rvv
Add RISC-V HAL implementation for cv::equalizeHist
2025-03-20 12:21:00 +03:00
天音あめ
46fbe1895a
Merge pull request #27096 from amane-ame:moments_hal_rvv
Add RISC-V HAL implementation for cv::moments #27096

This patch implements `cv_hal_imageMoments` using native intrinsics, optimizing the performance of `cv::moments` for data types `CV_16U/CV_16S/CV_32F/CV_64F`.
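
For readers unfamiliar with what `cv::moments` accumulates, the raw spatial moment is $m_{pq} = \sum_{y}\sum_{x} x^p y^q I(x, y)$. The following is a minimal scalar sketch of that definition, not the RVV code from this patch; `rawMoment` and `ipow` are hypothetical names used only for illustration, and the image is assumed to be a dense row-major `float` buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: base raised to a small non-negative integer power.
inline double ipow(double base, int exp)
{
    double r = 1.0;
    for (int i = 0; i < exp; ++i)
        r *= base;
    return r;
}

// Hypothetical scalar reference for one raw moment:
// m_pq = sum over all pixels of x^p * y^q * I(x, y).
inline double rawMoment(const std::vector<float>& img,
                        std::size_t width, std::size_t height,
                        int p, int q)
{
    assert(img.size() == width * height);
    double sum = 0.0;
    for (std::size_t y = 0; y < height; ++y)
    {
        const double yq = ipow(static_cast<double>(y), q);
        for (std::size_t x = 0; x < width; ++x)
        {
            const double xp = ipow(static_cast<double>(x), p);
            sum += xp * yq * static_cast<double>(img[y * width + x]);
        }
    }
    return sum;
}
```

For a 2x2 image holding 1, 2, 3, 4 in row-major order, this gives m00 = 10, m10 = 6 and m01 = 7, so the centroid is (0.6, 0.7).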

Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.

```
$ ./opencv_test_imgproc --gtest_filter="*Moments*"
$ ./opencv_perf_imgproc --gtest_filter="*Moments*" --perf_min_samples=1000 --perf_force_samples=1000
```

![image](https://github.com/user-attachments/assets/0efbae10-c022-4f15-a81c-682514cdb372)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-20 10:50:06 +03:00
amane-ame
b902a8e792 Add equalize_hist.
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-18 15:53:05 +08:00
Yuantao Feng
8207549638
Merge pull request #26991 from fengyuentau:4x/core/norm2hal_rvv
core: improve norm of hal rvv #26991

Merge with https://github.com/opencv/opencv_extra/pull/1241

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-18 09:42:55 +03:00
天音あめ
0142231e4d
Merge pull request #27072 from amane-ame:thresh_hal_rvv
Add RISC-V HAL implementation for cv::threshold and cv::adaptiveThreshold #27072

This patch implements `cv_hal_threshold_otsu` and `cv_hal_adaptiveThreshold` using native intrinsics, optimizing the performance of `cv::threshold(THRESH_OTSU)` and `cv::adaptiveThreshold`.

Since the UI path is as fast as the HAL `cv_hal_rvv::threshold::threshold`, `cv_hal_threshold` is not redirected to it. This part of the HAL is kept because `cv_hal_threshold_otsu` depends on it.
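
As background for `THRESH_OTSU`, the threshold is chosen to maximize the between-class variance of the image histogram. The sketch below is the textbook scalar form of that search over a 256-bin histogram, not the RVV implementation in this patch; `otsuThreshold` is a hypothetical name, and the convention that class 0 holds values less than or equal to the threshold is an assumption of this example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical textbook Otsu search over an 8-bit histogram.
// Returns the threshold t that maximizes w0 * w1 * (mu0 - mu1)^2,
// where class 0 holds values <= t and class 1 holds values > t.
inline int otsuThreshold(const std::array<std::uint32_t, 256>& hist)
{
    std::uint64_t total = 0;
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i)
    {
        total += hist[i];
        sumAll += static_cast<double>(i) * hist[i];
    }

    std::uint64_t w0 = 0;   // pixel count of class 0
    double sum0 = 0.0;      // intensity sum of class 0
    double bestVar = -1.0;
    int best = 0;
    for (int t = 0; t < 256; ++t)
    {
        w0 += hist[t];
        sum0 += static_cast<double>(t) * hist[t];
        if (w0 == 0)
            continue;       // class 0 still empty
        const std::uint64_t w1 = total - w0;
        if (w1 == 0)
            break;          // class 1 empty from here on
        const double mu0 = sum0 / static_cast<double>(w0);
        const double mu1 = (sumAll - sum0) / static_cast<double>(w1);
        const double diff = mu0 - mu1;
        const double var = static_cast<double>(w0) * static_cast<double>(w1) * diff * diff;
        if (var > bestVar)  // strict comparison keeps the first maximum
        {
            bestVar = var;
            best = t;
        }
    }
    return best;
}
```

With a histogram that has equal peaks at 10 and 200, every threshold from 10 to 199 separates the two peaks equally well, and the strict comparison returns the first of them, 10.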

Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.

```
$ ./opencv_test_imgproc --gtest_filter="*thresh*:*Thresh*"
$ ./opencv_perf_imgproc --gtest_filter="*otsu*:*adaptiveThreshold*" --perf_min_samples=1000 --perf_force_samples=1000
```

![image](https://github.com/user-attachments/assets/4bb953f8-8589-4af1-8f1c-99e2c506be3c)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-18 09:24:00 +03:00
GenshinImpactStarts
2090407002
Merge pull request #26999 from GenshinImpactStarts:polar_to_cart
[HAL RVV] unify and impl polar_to_cart | add perf test #26999

### Summary

1. Implement through the existing `cv_hal_polarToCart32f` and `cv_hal_polarToCart64f` interfaces.
2. Add `polarToCart` performance tests
3. Make `cv::polarToCart` use CALL_HAL in the same way as `cv::cartToPolar`
4. To achieve the 3rd point, the original implementation was moved, and some modifications were made.

Tested through:
```sh
opencv_test_core --gtest_filter="*PolarToCart*:*Core_CartPolar_reverse*" 
opencv_perf_core --gtest_filter="*PolarToCart*" --perf_min_samples=300 --perf_force_samples=300
```

### HAL performance test

***UPDATE***: The current implementation no longer depends on vlen.

**NOTE**: Due to the 4th point in the summary above, the `scalar` and `ui` tests are based on the modified code of this PR. The impact of this patch on `scalar` and `ui` is evaluated in the next section, `Effect of Point 4`.

Vlen 256 (Muse Pi):
```
                   Name of Test                     scalar    ui     rvv       ui        rvv    
                                                                               vs         vs    
                                                                             scalar     scalar  
                                                                           (x-factor) (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.315  0.110  0.034     2.85       9.34   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.423  0.163  0.045     2.59       9.34   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   13.695  4.325  1.278     3.17      10.71   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   17.719  7.118  2.105     2.49       8.42   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  40.678  13.114 3.977     3.10      10.23   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  53.124  21.298 6.519     2.49       8.15   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 95.158  29.465 8.894     3.23      10.70   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 119.262 47.743 14.129    2.50       8.44   
```

### Effect of Point 4

To make `cv::polarToCart` behave the same as `cv::cartToPolar`, the implementation detail of the former has been moved to the latter's location (from `mathfuncs.cpp` to `mathfuncs_core.simd.hpp`).

#### Reason for Changes:

This function works as follows:  
$y = \text{mag} \times \sin(\text{angle})$ and $x = \text{mag} \times \cos(\text{angle})$. The original implementation first calculates the values of $\sin$ and $\cos$, storing the results in the output buffers $x$ and $y$, and then multiplies the result by $\text{mag}$. 

However, when the function is used as an in-place operation (one of the output buffers is also an input buffer), the original implementation allocates an extra buffer to store the $\sin$ and $\cos$ values in case the $\text{mag}$ value gets overwritten. This extra buffer allocation prevents `cv::polarToCart` from functioning in the same way as `cv::cartToPolar`.

Therefore, the multiplication is now performed immediately without storing intermediate values. Since the original implementation also had AVX2 optimizations, I have applied the same optimizations to the AVX2 version of this implementation.
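
The idea of multiplying immediately can be sketched in scalar C++ as follows. This is an illustration under assumptions, not the code from this PR: `polarToCartFused` is a hypothetical name, angles are assumed to be in radians, and a null `mag` is treated as magnitude 1. Because both inputs are read into locals before either output is written, the loop stays correct when an output buffer aliases an input buffer, with no temporary allocation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical scalar sketch:
// x[i] = mag[i] * cos(angle[i]), y[i] = mag[i] * sin(angle[i]).
// mag may be nullptr, in which case the magnitude is taken as 1.
// x or y may alias mag or angle (in-place use).
inline void polarToCartFused(const float* mag, const float* angle,
                             float* x, float* y, std::size_t len)
{
    assert(angle != nullptr && x != nullptr && y != nullptr);
    for (std::size_t i = 0; i < len; ++i)
    {
        // Read both inputs before writing any output, so aliasing is safe.
        const float m = mag ? mag[i] : 1.0f;
        const float a = angle[i];
        x[i] = m * std::cos(a);
        y[i] = m * std::sin(a);
    }
}
```

Note that the null check on `mag` runs once per element in this sketch, which is simple but not free; hoisting it out of the loop would require two loop bodies.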

***UPDATE***: UI now uses v_sincos from #25892. The original implementation had AVX2 optimizations, but it was much slower than the current UI, so it has been removed; the AVX2 perf test is below. The scalar implementation is unchanged because it is faster than using UI's method.

#### Test Result

The `scalar` and `ui` tests were run on Muse Pi, and the AVX2 test was run on Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz.

`scalar` test:
```
                   Name of Test                      orig     pr        pr    
                                                                        vs    
                                                                       orig   
                                                                    (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.333   0.294     1.13   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.385   0.403     0.96   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   14.749  12.343     1.19   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   19.419  16.743     1.16   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  44.155  37.822     1.17   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  62.108  50.358     1.23   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 99.011  85.769     1.15   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 127.740 112.874    1.13   
```

`ui` test:
```
                   Name of Test                      orig     pr        pr    
                                                                        vs    
                                                                       orig   
                                                                    (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.306  0.110     2.77   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.455  0.163     2.79   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   13.381  4.325     3.09   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   21.851  7.118     3.07   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  39.975  13.114    3.05   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  67.006  21.298    3.15   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 90.362  29.465    3.07   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 129.637 47.743    2.72   
```

AVX2 test:
```
                   Name of Test                     orig   pr       pr    
                                                                    vs    
                                                                   orig   
                                                                (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)    0.019 0.009    2.11   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)    0.022 0.013    1.74   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   0.788 0.355    2.22   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   1.102 0.618    1.78   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  2.383 1.042    2.29   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  3.758 2.316    1.62   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 5.577 2.559    2.18   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 9.710 6.424    1.51   
```

A slight performance loss occurs because the check for whether $\text{mag}$ is nullptr is performed on every calculation instead of once per batch. This makes it possible to reuse the current `SinCos_32f` function.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-17 14:16:09 +03:00
Alexander Smorkalov
0a39f98bee
Merge pull request #27067 from amane-ame:sepfilter_optimize
Optimize RISC-V HAL cv::sepFilter
2025-03-17 09:21:33 +03:00
Liutong HAN
6eaaaa410e
Merge pull request #27056 from hanliutong:rvv-hal-copyright
[RVV HAL] Add copyright and replace '#pragma once'. #27056

Add copyright notices in RVV HAL, since other companies or teams may join the development and add their own copyright.

The '#pragma once' directives are also replaced.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-15 17:25:31 +03:00
amane-ame
2c16f3b7d2 Optimize cv::sepFilter.
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-14 18:33:57 +08:00
GenshinImpactStarts
2a8d4b8e43
Merge pull request #27000 from GenshinImpactStarts:cart_to_polar
[HAL RVV] reuse atan | impl cart_to_polar | add perf test #27000

Implement through the existing `cv_hal_cartToPolar32f` and `cv_hal_cartToPolar64f` interfaces.

Add `cartToPolar` performance tests.

cv_hal_rvv::fast_atan is modified to make it more reusable because it's needed in cartToPolar.

**UPDATE**: UI is enabled. Since RVV vector types can't be stored in a struct, the UI implementation of `v_atan_f32` is modified. Both `fastAtan` and `cartToPolar` are affected, so the test results for `atan` are also appended. I have tested the modified UI on RVV and AVX2, and no regressions appear.

Perf test done on MUSE-PI. AVX2 test done on Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz.

```sh
$ opencv_test_core --gtest_filter="*CartToPolar*:*Core_CartPolar_reverse*:*Phase*" 
$ opencv_perf_core --gtest_filter="*CartToPolar*:*phase*" --perf_min_samples=300 --perf_force_samples=300
```

Test result between enabled UI and HAL:
```
                   Name of Test                       ui    rvv      rvv    
                                                                      vs    
                                                                      ui    
                                                                  (x-factor)
CartToPolar::CartToPolarFixture::(127x61, 32FC1)    0.106  0.059     1.80   
CartToPolar::CartToPolarFixture::(127x61, 64FC1)    0.155  0.070     2.20   
CartToPolar::CartToPolarFixture::(640x480, 32FC1)   4.188  2.317     1.81   
CartToPolar::CartToPolarFixture::(640x480, 64FC1)   6.593  2.889     2.28   
CartToPolar::CartToPolarFixture::(1280x720, 32FC1)  12.600 7.057     1.79   
CartToPolar::CartToPolarFixture::(1280x720, 64FC1)  19.860 8.797     2.26   
CartToPolar::CartToPolarFixture::(1920x1080, 32FC1) 28.295 15.809    1.79   
CartToPolar::CartToPolarFixture::(1920x1080, 64FC1) 44.573 19.398    2.30   
phase32f::VectorLength::128                         0.002  0.002     1.20   
phase32f::VectorLength::1000                        0.008  0.006     1.32   
phase32f::VectorLength::131072                      1.061  0.731     1.45   
phase32f::VectorLength::524288                      3.997  2.976     1.34   
phase32f::VectorLength::1048576                     8.001  5.959     1.34   
phase64f::VectorLength::128                         0.002  0.002     1.33   
phase64f::VectorLength::1000                        0.012  0.008     1.58   
phase64f::VectorLength::131072                      1.648  0.931     1.77   
phase64f::VectorLength::524288                      6.836  3.837     1.78   
phase64f::VectorLength::1048576                     14.060 7.540     1.86   
```

Test result before and after enabling UI on RVV:
```
                   Name of Test                      perf   perf     perf   
                                                      ui     ui       ui    
                                                     orig    pr       pr    
                                                                      vs    
                                                                     perf   
                                                                      ui    
                                                                     orig   
                                                                  (x-factor)
CartToPolar::CartToPolarFixture::(127x61, 32FC1)    0.141  0.106     1.33   
CartToPolar::CartToPolarFixture::(127x61, 64FC1)    0.187  0.155     1.20   
CartToPolar::CartToPolarFixture::(640x480, 32FC1)   5.990  4.188     1.43   
CartToPolar::CartToPolarFixture::(640x480, 64FC1)   8.370  6.593     1.27   
CartToPolar::CartToPolarFixture::(1280x720, 32FC1)  18.214 12.600    1.45   
CartToPolar::CartToPolarFixture::(1280x720, 64FC1)  25.365 19.860    1.28   
CartToPolar::CartToPolarFixture::(1920x1080, 32FC1) 40.437 28.295    1.43   
CartToPolar::CartToPolarFixture::(1920x1080, 64FC1) 56.699 44.573    1.27   
phase32f::VectorLength::128                         0.003  0.002     1.54   
phase32f::VectorLength::1000                        0.016  0.008     1.90   
phase32f::VectorLength::131072                      2.048  1.061     1.93   
phase32f::VectorLength::524288                      8.219  3.997     2.06   
phase32f::VectorLength::1048576                     16.426 8.001     2.05   
phase64f::VectorLength::128                         0.003  0.002     1.44   
phase64f::VectorLength::1000                        0.020  0.012     1.60   
phase64f::VectorLength::131072                      2.621  1.648     1.59   
phase64f::VectorLength::524288                      10.780 6.836     1.58   
phase64f::VectorLength::1048576                     22.723 14.060    1.62   
```

Test result before and after modifying UI on AVX2:
```
                   Name of Test                     perf  perf     perf   
                                                    avx2  avx2     avx2   
                                                    orig   pr       pr    
                                                                    vs    
                                                                   perf   
                                                                   avx2   
                                                                   orig   
                                                                (x-factor)
CartToPolar::CartToPolarFixture::(127x61, 32FC1)    0.006 0.005    1.14   
CartToPolar::CartToPolarFixture::(127x61, 64FC1)    0.010 0.009    1.08   
CartToPolar::CartToPolarFixture::(640x480, 32FC1)   0.273 0.264    1.03   
CartToPolar::CartToPolarFixture::(640x480, 64FC1)   0.511 0.487    1.05   
CartToPolar::CartToPolarFixture::(1280x720, 32FC1)  0.760 0.723    1.05   
CartToPolar::CartToPolarFixture::(1280x720, 64FC1)  2.009 1.937    1.04   
CartToPolar::CartToPolarFixture::(1920x1080, 32FC1) 1.996 1.923    1.04   
CartToPolar::CartToPolarFixture::(1920x1080, 64FC1) 5.721 5.509    1.04   
phase32f::VectorLength::128                         0.000 0.000    0.98   
phase32f::VectorLength::1000                        0.001 0.001    0.97   
phase32f::VectorLength::131072                      0.105 0.111    0.95   
phase32f::VectorLength::524288                      0.402 0.402    1.00   
phase32f::VectorLength::1048576                     0.775 0.767    1.01   
phase64f::VectorLength::128                         0.000 0.000    1.00   
phase64f::VectorLength::1000                        0.001 0.001    1.01   
phase64f::VectorLength::131072                      0.163 0.162    1.01   
phase64f::VectorLength::524288                      0.669 0.653    1.02   
phase64f::VectorLength::1048576                     1.660 1.634    1.02   
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-13 15:56:56 +03:00
GenshinImpactStarts
e30697fd42
Merge pull request #27002 from GenshinImpactStarts:magnitude
[HAL RVV] impl magnitude | add perf test #27002

Implement through the existing `cv_hal_magnitude32f` and `cv_hal_magnitude64f` interfaces.

**UPDATE**: UI is enabled. The only difference between UI and HAL now is that HAL uses an approximate `sqrt`.

Perf test done on MUSE-PI.

```sh
$ opencv_test_core --gtest_filter="*Magnitude*"
$ opencv_perf_core --gtest_filter="*Magnitude*" --perf_min_samples=300 --perf_force_samples=300
```

Test result between enabled UI and HAL:
```
                 Name of Test                     ui    rvv      rvv    
                                                                  vs    
                                                                  ui    
                                                              (x-factor)
Magnitude::MagnitudeFixture::(127x61, 32FC1)    0.029  0.016     1.75   
Magnitude::MagnitudeFixture::(127x61, 64FC1)    0.057  0.036     1.57   
Magnitude::MagnitudeFixture::(640x480, 32FC1)   1.063  0.648     1.64   
Magnitude::MagnitudeFixture::(640x480, 64FC1)   2.261  1.530     1.48   
Magnitude::MagnitudeFixture::(1280x720, 32FC1)  3.261  2.118     1.54   
Magnitude::MagnitudeFixture::(1280x720, 64FC1)  6.802  4.682     1.45   
Magnitude::MagnitudeFixture::(1920x1080, 32FC1) 7.287  4.738     1.54   
Magnitude::MagnitudeFixture::(1920x1080, 64FC1) 15.226 10.334    1.47   
```

Test result before and after enabling UI:
```
                 Name of Test                    orig    pr       pr    
                                                                  vs    
                                                                 orig   
                                                              (x-factor)
Magnitude::MagnitudeFixture::(127x61, 32FC1)    0.032  0.029     1.11   
Magnitude::MagnitudeFixture::(127x61, 64FC1)    0.067  0.057     1.17   
Magnitude::MagnitudeFixture::(640x480, 32FC1)   1.228  1.063     1.16   
Magnitude::MagnitudeFixture::(640x480, 64FC1)   2.786  2.261     1.23   
Magnitude::MagnitudeFixture::(1280x720, 32FC1)  3.762  3.261     1.15   
Magnitude::MagnitudeFixture::(1280x720, 64FC1)  8.549  6.802     1.26   
Magnitude::MagnitudeFixture::(1920x1080, 32FC1) 8.408  7.287     1.15   
Magnitude::MagnitudeFixture::(1920x1080, 64FC1) 18.884 15.226    1.24   
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-13 08:34:11 +03:00
GenshinImpactStarts
60de3ff24f
Merge pull request #27015 from GenshinImpactStarts:sqrt
[HAL RVV] impl sqrt and invSqrt #27015

Implement through the existing interfaces `cv_hal_sqrt32f`, `cv_hal_sqrt64f`, `cv_hal_invSqrt32f`, `cv_hal_invSqrt64f`.

Perf tests were done on MUSE-PI and CanMV K230. Because scalar performance is much worse than universal intrinsics, only UI and HAL RVV are compared.

In RVV's UI, `invSqrt` is computed using `1 / sqrt()`. This patch first uses `frsqrt` and then applies the Newton-Raphson method to achieve higher precision. For the initial value, I tried using the famous [fast inverse square root algorithm](https://en.wikipedia.org/wiki/Fast_inverse_square_root), which involves one bit shift and one subtraction. However, on both MUSE-PI and CanMV K230, the performance was slightly lower (about 3%), so I chose to use `frsqrt` for the initial value instead. 
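
The refinement step can be sketched in portable scalar C++. This is an illustration under assumptions, not the HAL code: `invSqrtInitialGuess` and `invSqrtNewton` are hypothetical names, the starting value uses the bit-shift estimate mentioned above as a stand-in for the hardware `frsqrt` estimate, and the default iteration count is chosen for the example rather than taken from the patch. Each Newton-Raphson step for $f(y) = 1/y^2 - x$ is $y_{n+1} = y_n \left(1.5 - 0.5\, x\, y_n^2\right)$.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical initial estimate of 1/sqrt(x) using one shift and one
// subtraction on the float's bit pattern (stand-in for a hardware estimate).
inline float invSqrtInitialGuess(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));  // well-defined type punning
    bits = 0x5f3759dfu - (bits >> 1);
    float y;
    std::memcpy(&y, &bits, sizeof(y));
    return y;
}

// Hypothetical refinement: y <- y * (1.5 - 0.5 * x * y * y).
// Intended for positive, normal float inputs.
inline float invSqrtNewton(float x, int iterations = 3)
{
    assert(x > 0.0f);
    float y = invSqrtInitialGuess(x);
    const float halfX = 0.5f * x;
    for (int i = 0; i < iterations; ++i)
        y = y * (1.5f - halfX * y * y);
    return y;
}
```

Each step roughly squares the relative error, so a rough starting estimate needs only a few iterations to approach single-precision accuracy; a better hardware estimate would need fewer.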

BTW, I think this patch can directly replace RVV's UI.

**UPDATE**: Due to clang's unusual vector register allocation strategy, for `invSqrt` clang uses LMUL m4 while gcc uses LMUL m8, which leads to some performance loss with clang. The test results for clang are therefore appended.

```sh
$ opencv_test_core --gtest_filter="Core_HAL/mathfuncs.*"
$ opencv_perf_core --gtest_filter="SqrtFixture.*" --perf_min_samples=300 --perf_force_samples=300
```

CanMV K230:
```
              Name of Test                 ui    rvv      rvv    
                                                           vs    
                                                           ui    
                                                       (x-factor)
Sqrt::SqrtFixture::(127x61, 5, false)    0.052  0.027     1.96   
Sqrt::SqrtFixture::(127x61, 5, true)     0.101  0.026     3.80   
Sqrt::SqrtFixture::(127x61, 6, false)    0.106  0.059     1.79   
Sqrt::SqrtFixture::(127x61, 6, true)     0.207  0.058     3.55   
Sqrt::SqrtFixture::(640x480, 5, false)   1.988  0.956     2.08   
Sqrt::SqrtFixture::(640x480, 5, true)    3.920  0.948     4.13   
Sqrt::SqrtFixture::(640x480, 6, false)   4.179  2.342     1.78   
Sqrt::SqrtFixture::(640x480, 6, true)    8.220  2.290     3.59   
Sqrt::SqrtFixture::(1280x720, 5, false)  5.969  2.881     2.07   
Sqrt::SqrtFixture::(1280x720, 5, true)   11.731 2.857     4.11   
Sqrt::SqrtFixture::(1280x720, 6, false)  12.533 7.031     1.78   
Sqrt::SqrtFixture::(1280x720, 6, true)   24.643 6.917     3.56   
Sqrt::SqrtFixture::(1920x1080, 5, false) 13.423 6.483     2.07   
Sqrt::SqrtFixture::(1920x1080, 5, true)  26.379 6.436     4.10   
Sqrt::SqrtFixture::(1920x1080, 6, false) 28.200 15.833    1.78   
Sqrt::SqrtFixture::(1920x1080, 6, true)  55.434 15.565    3.56   
```

MUSE-PI:
```
                                                 GCC              |        clang            
              Name of Test                 ui    rvv      rvv     |   ui    rvv      rvv    
                                                           vs     |                   vs    
                                                           ui     |                   ui    
                                                       (x-factor) |               (x-factor)
Sqrt::SqrtFixture::(127x61, 5, false)    0.027  0.018     1.46    | 0.027  0.016     1.65   
Sqrt::SqrtFixture::(127x61, 5, true)     0.050  0.017     2.98    | 0.050  0.017     2.99   
Sqrt::SqrtFixture::(127x61, 6, false)    0.053  0.031     1.72    | 0.052  0.032     1.64   
Sqrt::SqrtFixture::(127x61, 6, true)     0.100  0.030     3.31    | 0.101  0.035     2.86   
Sqrt::SqrtFixture::(640x480, 5, false)   0.955  0.483     1.98    | 0.959  0.499     1.92   
Sqrt::SqrtFixture::(640x480, 5, true)    1.873  0.489     3.83    | 1.873  0.520     3.60   
Sqrt::SqrtFixture::(640x480, 6, false)   2.027  1.163     1.74    | 2.037  1.218     1.67   
Sqrt::SqrtFixture::(640x480, 6, true)    3.961  1.153     3.44    | 3.961  1.341     2.95   
Sqrt::SqrtFixture::(1280x720, 5, false)  2.916  1.538     1.90    | 2.912  1.598     1.82   
Sqrt::SqrtFixture::(1280x720, 5, true)   5.735  1.534     3.74    | 5.726  1.661     3.45   
Sqrt::SqrtFixture::(1280x720, 6, false)  6.121  3.585     1.71    | 6.109  3.725     1.64   
Sqrt::SqrtFixture::(1280x720, 6, true)   12.059 3.501     3.44    | 12.053 4.080     2.95   
Sqrt::SqrtFixture::(1920x1080, 5, false) 6.540  3.535     1.85    | 6.540  3.643     1.80   
Sqrt::SqrtFixture::(1920x1080, 5, true)  12.943 3.445     3.76    | 12.908 3.706     3.48   
Sqrt::SqrtFixture::(1920x1080, 6, false) 13.714 8.062     1.70    | 13.711 8.376     1.64   
Sqrt::SqrtFixture::(1920x1080, 6, true)  27.011 7.989     3.38    | 27.115 9.245     2.93   
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-12 08:34:27 +03:00
Alexander Smorkalov
a48e78cdfc
Merge pull request #27026 from amane-ame/filter_hal_rvv
Add RISC-V HAL implementation for cv::filter series
2025-03-11 16:09:45 +03:00
amane-ame
2dd72201af Remove CV_ASSERT.
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-11 18:37:58 +08:00
amane-ame
d9ec808b15 Use the macro from interface.h.
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-11 17:44:55 +08:00
Alexander Smorkalov
4be88e934f
Merge pull request #27010 from GenshinImpactStarts/exp_log
[HAL RVV] impl exp and log | add log perf test
2025-03-11 10:51:03 +03:00
Alexander Smorkalov
3236436892
Merge pull request #27036 from CodeLinaro:xuezha_3rdPost
Fix gaussianBlur5x5 performance regression
2025-03-10 18:21:20 +03:00
Xue Zhang
accebdecf7 Fix gaussianBlur5x5 performance regression 2025-03-10 16:16:56 +05:30
Alexander Smorkalov
316b5d7b08
Merge pull request #27031 from sturkmen72:libjpeg-turbo_ver_3.1.0
Libjpeg-turbo update to version 3.1.0
2025-03-10 13:44:00 +03:00
amane-ame
54da5c3e77 Add some algorithm comments.
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-10 16:42:58 +08:00
GenshinImpactStarts
830d031213
Merge pull request #26977 from GenshinImpactStarts:helper_hal_rvv
[Refactor](HAL RVV): Consolidate Helpers for Code Reusability #26977

This PR introduces a new helper file with utility types and templates to standardize function interfaces. This refactor allows us to avoid duplicate code when types differ but logic remains the same.

The `flip` and `minmax` implementations have been updated to use the new generic helpers, replacing the previously defined, redundant classes.

Due to the large number of functions, not all interfaces are unified yet. Future development can extend the types as needed. While the usage of function templates is currently limited, this will ease future development.
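
The general pattern of sharing one implementation across element types can be sketched as below. This is only an illustration of the idea, not the helper file added by this PR: `ScalarOps` and `flipRow` are hypothetical names, and the per-type operations are plain scalar loads and stores where real RVV helpers would presumably wrap type-specific intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-type traits. A vectorized version would map each
// element type to its own load/store intrinsics behind the same names.
template <typename T>
struct ScalarOps
{
    using Elem = T;
    static Elem load(const Elem* p) { return *p; }
    static void store(Elem* p, Elem v) { *p = v; }
};

// One generic routine written against the traits interface:
// reverses a row of n elements from src into dst (not in place).
template <typename Ops>
void flipRow(const typename Ops::Elem* src, typename Ops::Elem* dst, std::size_t n)
{
    assert(src != dst);
    for (std::size_t i = 0; i < n; ++i)
        Ops::store(dst + (n - 1 - i), Ops::load(src + i));
}
```

A caller selects the element type through the traits argument, for example `flipRow<ScalarOps<float>>(src, dst, n)`, so supporting a new type means adding a traits specialization rather than another copy of the loop.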

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-10 10:36:48 +03:00
amane-ame
02253dd76b Copy cv::borderInterpolate from core.
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-10 15:26:41 +08:00
quic-xuezha
797068853f
Merge pull request #27033 from CodeLinaro:xuezha_3rdPost
Fix assert failure in Sobel test when enable FastCV #27033

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-10 10:24:28 +03:00
Suleyman TURKMEN
6d161c25ef Update libjpeg-turbo version:3.1.0 2025-03-09 00:02:20 +03:00
GenshinImpactStarts
0fed1fa184 fix exp, log | enable ui for log | strengthen test
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-07 17:11:26 +00:00
GenshinImpactStarts
524d8ae01c impl exp and log | add log perf test
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-07 17:11:26 +00:00
amane-ame
e06502a254 Add Morph for MORPH_ERODE and MORPH_DILATE.
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-08 00:35:50 +08:00
amane-ame
a2d784b6f5 Add sepFilter.
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-07 20:56:04 +08:00
天音あめ
e89e2fd7ea
Merge pull request #27007 from amane-ame:color_hal_rvv
Add RISC-V HAL implementation for cv::cvtColor #27007

This patch implements the following functions in RVV_HAL using native intrinsics, optimizing the performance of `cv::cvtColor` for all possible data types and modes (except for `COLOR_Bayer`, `COLOR_YUV2GRAY_420` and `COLOR_mRGBA`, as these modes have no HAL interface):

```
cv_hal_cvtBGRtoBGR
cv_hal_cvtBGRtoBGR5x5
cv_hal_cvtBGR5x5toBGR
cv_hal_cvtBGRtoGray
cv_hal_cvtGraytoBGR
cv_hal_cvtBGR5x5toGray
cv_hal_cvtGraytoBGR5x5
cv_hal_cvtBGRtoYUV
cv_hal_cvtYUVtoBGR
cv_hal_cvtBGRtoXYZ
cv_hal_cvtXYZtoBGR
cv_hal_cvtBGRtoHSV
cv_hal_cvtHSVtoBGR
cv_hal_cvtBGRtoLab
cv_hal_cvtLabtoBGR
cv_hal_cvtTwoPlaneYUVtoBGR
cv_hal_cvtBGRtoTwoPlaneYUV
cv_hal_cvtThreePlaneYUVtoBGR
cv_hal_cvtBGRtoThreePlaneYUV
cv_hal_cvtOnePlaneYUVtoBGR
cv_hal_cvtOnePlaneBGRtoYUV
```
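
As a point of reference for what one of these conversions computes, the following is a scalar sketch of 8-bit BGR to gray using the luma weights given in the `cv::cvtColor` documentation (Y = 0.299 R + 0.587 G + 0.114 B). The function name `bgrToGrayRef` is hypothetical, and the floating-point rounding used here is an assumption: the library and the HAL may use fixed-point arithmetic, so results can differ by one level.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Scalar reference: interleaved 8-bit BGR -> 8-bit gray.
// Assumption: float arithmetic with round-to-nearest, not the library's exact scheme.
void bgrToGrayRef(const std::uint8_t* bgr, std::uint8_t* gray, std::size_t pixels)
{
    for (std::size_t i = 0; i < pixels; ++i) {
        const float b = bgr[3 * i + 0];
        const float g = bgr[3 * i + 1];
        const float r = bgr[3 * i + 2];
        const float y = 0.299f * r + 0.587f * g + 0.114f * b;
        gray[i] = static_cast<std::uint8_t>(std::lround(y));
    }
}
```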

Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.

```
$ ./opencv_test_imgproc --gtest_filter="*Color*-*Bayer*"
$ ./opencv_perf_imgproc --gtest_filter="*Color*-*Bayer*" --gtest_also_run_disabled_tests --perf_min_samples=100 --perf_force_samples=100
```

View the full perf table here: [hal_rvv_color.pdf](https://github.com/user-attachments/files/19055417/hal_rvv_color.pdf)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
2025-03-07 11:24:48 +03:00
天音あめ
00956d5c15
Merge pull request #26892 from amane-ame:solve_hal_rvv
Add RISC-V HAL implementation for cv::solve #26892

This patch implements the `cv_hal_LU/cv_hal_Cholesky/cv_hal_SVD/cv_hal_QR` functions in RVV_HAL using native intrinsics, optimizing the performance of `cv::solve` with methods `DECOMP_LU/DECOMP_SVD/DECOMP_CHOLESKY/DECOMP_QR` and data types `32FC1/64FC1`.

Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.

```
$ ./opencv_test_core --gtest_filter="*Solve*:*SVD*:*Cholesky*"
$ ./opencv_perf_core --gtest_filter="*SolveTest*" --perf_min_samples=100 --perf_force_samples=100
```

The tail of the perf table is shown below since the table is too long.

View the full perf table here: [hal_rvv_solve.pdf](https://github.com/user-attachments/files/18725067/hal_rvv_solve.pdf)

<img width="1078" alt="Untitled" src="https://github.com/user-attachments/assets/c01d849c-f000-4bcc-bfe0-a302d6605d9e" />

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-07 11:14:09 +03:00
天音あめ
bb525fe91d
Merge pull request #26865 from amane-ame:dxt_hal_rvv
Add RISC-V HAL implementation for cv::dft and cv::dct #26865

This patch implements the `static cv::DFT` function in RVV_HAL using native intrinsics, optimizing the performance of `cv::dft` and `cv::dct` with data types `32FC1/64FC1/32FC2/64FC2`.

The reason I chose to create a new `cv_hal_dftOcv` interface is that if I were to use the existing interfaces (`cv_hal_dftInit1D` and `cv_hal_dft1D`), it would require handling and parsing the DFT flags within HAL, as well as performing preprocessing operations such as computing the roots of unity. Since these operations are not performance hotspots and do not require optimization, reusing the existing interfaces would result in copying approximately 300 lines of code from `core/src/dxt.cpp` into HAL, which I believe is unnecessary.

Moreover, if I insert the new interface into `static cv::DFT`, both `static cv::RealDFT` and `static cv::DCT` can be optimized as well. The processing performed before and after calling `static cv::DFT` in these functions is also not a performance hotspot.
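
For readers unfamiliar with the preprocessing mentioned above, here is a minimal sketch of what a roots-of-unity (twiddle factor) table is and how a transform consumes it. It is a naive O(n^2) DFT written for illustration only; the function names are hypothetical, and the actual preprocessing in `core/src/dxt.cpp` is considerably more involved than this.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Twiddle factors w_k = exp(-2*pi*i*k/n) for a forward DFT of length n.
std::vector<std::complex<double>> rootsOfUnity(std::size_t n)
{
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> w(n);
    for (std::size_t k = 0; k < n; ++k) {
        const double angle = -2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
        w[k] = std::complex<double>(std::cos(angle), std::sin(angle));
    }
    return w;
}

// Naive DFT: X[k] = sum_j x[j] * w[(j*k) mod n]. Not the optimized algorithm.
std::vector<std::complex<double>> naiveDft(const std::vector<std::complex<double>>& x)
{
    const std::size_t n = x.size();
    const std::vector<std::complex<double>> w = rootsOfUnity(n);
    std::vector<std::complex<double>> X(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j)
            acc += x[j] * w[(j * k) % n];
        X[k] = acc;
    }
    return X;
}
```

The table depends only on the transform length, which is why it counts as setup work rather than a per-call hotspot.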

Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.

```
$ opencv_test_core --gtest_filter="*DFT*"
$ opencv_perf_core --gtest_filter="*dft*:*dct*" --perf_min_samples=30 --perf_force_samples=30
```

The head of the perf table is shown below since the table is too long.

View the full perf table here: [hal_rvv_dxt.pdf](https://github.com/user-attachments/files/18622645/hal_rvv_dxt.pdf)

<img width="1017" alt="Untitled" src="https://github.com/user-attachments/assets/609856e7-9c7d-4a95-9923-45c1b77eb3a2" />

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-07 11:08:41 +03:00
GenshinImpactStarts
57a78cb9df
Merge pull request #26941 from GenshinImpactStarts:lut_hal_rvv
Impl hal_rvv LUT | Add more LUT test #26941 

Implement through the existing `cv_hal_lut` interfaces.

Add more LUT accuracy and performance tests:
- **Accuracy test**: Multi-channel table tests are added, and the boundary of `randu` used for generating test data is broadened to make the test more robust.
- **Performance test**: Multi-channel input and multi-channel table tests are added.

Perf test done on
- MUSE-PI (vlen=256)
- Compiler: gcc 14.2 (riscv-collab/riscv-gnu-toolchain Nightly: December 16, 2024)


```sh
$ opencv_test_core --gtest_filter="Core_LUT*"
$ opencv_perf_core --gtest_filter="SizePrm_LUT*" --perf_min_samples=300 --perf_force_samples=300
```
```
Geometric mean (ms)

         Name of Test          scalar   ui    rvv       ui        rvv    
                                                        vs         vs    
                                                      scalar     scalar  
                                                    (x-factor) (x-factor)
LUT::SizePrm::320x240          0.248  0.249  0.052     1.00       4.74   
LUT::SizePrm::640x480          0.277  0.275  0.085     1.01       3.28   
LUT::SizePrm::1920x1080        0.950  0.947  0.634     1.00       1.50   
LUT_multi2::SizePrm::320x240   2.051  2.045  2.049     1.00       1.00   
LUT_multi2::SizePrm::640x480   2.128  2.134  2.125     1.00       1.00   
LUT_multi2::SizePrm::1920x1080 7.397  7.380  7.390     1.00       1.00   
LUT_multi::SizePrm::320x240    0.715  0.747  0.154     0.96       4.64   
LUT_multi::SizePrm::640x480    0.741  0.766  0.257     0.97       2.88   
LUT_multi::SizePrm::1920x1080  2.766  2.765  1.925     1.00       1.44  
```

This optimization is achieved by loading the entire lookup table into vector registers. Due to register size limitations, the optimization is only effective under the following conditions:  
- For the U8C1 table type, the optimization works when `vlen >= 256`
- For U16C1, it works when `vlen >= 512`
- For U32C1, it works when `vlen >= 1024`
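
The thresholds above follow from simple arithmetic, sketched below. The sketch assumes the table has 256 entries (one per 8-bit index) and that it is held in a single register group with LMUL = 8, the largest grouping RVV 1.0 allows; the PR text does not state the grouping, so treat that as an assumption. The helper name `lutFitsInRegisters` is hypothetical.

```cpp
#include <cstddef>

// Assumption: 256-entry table, held in one register group of LMUL = 8.
constexpr std::size_t kLutEntries = 256;
constexpr std::size_t kMaxLmul = 8;

// True when the whole table (in bits) fits in one LMUL = 8 group of VLEN-bit registers.
constexpr bool lutFitsInRegisters(std::size_t vlenBits, std::size_t elemBytes)
{
    return kLutEntries * elemBytes * 8 <= kMaxLmul * vlenBits;
}

static_assert(lutFitsInRegisters(256, 1),  "U8C1 table fits at vlen = 256");
static_assert(!lutFitsInRegisters(128, 1), "U8C1 table does not fit at vlen = 128");
static_assert(lutFitsInRegisters(512, 2),  "U16C1 table fits at vlen = 512");
static_assert(lutFitsInRegisters(1024, 4), "U32C1 table fits at vlen = 1024");
```

Under these assumptions the required VLEN in bits equals the table size in bytes, which matches the three conditions listed.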

Since I don’t have real hardware with `vlen > 256`, the corresponding accuracy tests were conducted on QEMU built from the `riscv-collab/riscv-gnu-toolchain`.

This patch does not implement optimizations for multi-channel tables.

Previous attempts:
1. For the U8C1 table type, when `vlen = 128`, it is possible to use four `u8m4` vectors to load the entire table, perform gathering, and merge the results. However, the performance is almost the same as the scalar version.
2. Loading part of the table and repeatedly loading the source data is faster for small sizes. But as the table size grows, the performance quickly degrades compared to the scalar version.
3. Using `vluxei8` as a general solution does not show any performance improvement.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-06 11:17:00 +03:00
amane-ame
83104bed32 Add Filter2D.
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-06 14:10:06 +08:00
天音あめ
cbcfd772ce
Merge pull request #26958 from amane-ame:pyramids_hal_rvv
Add RISC-V HAL implementation for cv::pyrDown and cv::pyrUp #26958

This patch implements the `cv_hal_pyrdown/cv_hal_pyrup` functions in RVV_HAL using native intrinsics, optimizing the performance of `cv::pyrDown`, `cv::pyrUp` and `cv::buildPyramid` with data types `{8U,16S,32F} x {C1,C2,C3,C4,Cn}`.
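
As background on what is being vectorized, `cv::pyrDown` is documented as a 5x5 Gaussian blur with the separable kernel `[1 4 6 4 1]/16` followed by dropping every other row and column. The sketch below applies that to a single 8-bit row with `BORDER_REFLECT_101` handling. The function names are hypothetical, and the per-row rounding is an assumption made to keep the example one-dimensional; the library rounds once after both passes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// BORDER_REFLECT_101 index mapping: ... 2 1 | 0 1 2 ... n-1 | n-2 n-3 ...
static int reflect101(int i, int n)
{
    if (n == 1) return 0;
    while (i < 0 || i >= n)
        i = (i < 0) ? -i : 2 * (n - 1) - i;
    return i;
}

// 1-D illustration: blur with [1 4 6 4 1]/16, keep even positions.
std::vector<std::uint8_t> pyrDownRow(const std::vector<std::uint8_t>& src)
{
    static const int kernel[5] = {1, 4, 6, 4, 1};   // weights sum to 16
    const int n = static_cast<int>(src.size());
    std::vector<std::uint8_t> dst((src.size() + 1) / 2);
    for (int x = 0; x < static_cast<int>(dst.size()); ++x) {
        int acc = 0;
        for (int k = -2; k <= 2; ++k)
            acc += kernel[k + 2] * src[reflect101(2 * x + k, n)];
        dst[x] = static_cast<std::uint8_t>((acc + 8) >> 4);   // divide by 16, round to nearest
    }
    return dst;
}
```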

Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.

```
$ ./opencv_test_imgproc --gtest_filter="*pyr*:*Pyr*"
$ ./opencv_perf_imgproc --gtest_filter="*pyr*:*Pyr*" --perf_min_samples=300 --perf_force_samples=300
```

<img width="1112" alt="Untitled" src="https://github.com/user-attachments/assets/235a9fba-0d29-434e-8a10-498212bac657" />


### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-04 15:41:15 +03:00
GenshinImpactStarts
33d632f85e impl hal_rvv norm_hamming
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-02-25 02:31:02 +00:00
GenshinImpactStarts
6a6a5a765d
Merge pull request #26943 from GenshinImpactStarts:flip_hal_rvv
Impl RISC-V HAL for cv::flip | Add perf test for flip #26943 

Implement through the existing `cv_hal_flip` interfaces.

Add perf test for `cv::flip`.

The test arguments were chosen for the following reasons:
- **size**: copied from perf_lut
- **type**:
    - U8C1: basic situation
    - U8C3: unaligned element size
    - U8C4: large element size
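
To make the role of the element size concrete, here is a scalar sketch that reverses the pixels of one row for an arbitrary pixel size in bytes. The function name `flipRowRef` is hypothetical. A 3-byte pixel (8UC3) does not match any 8/16/32/64-bit lane width, which is presumably why it is singled out above as the "unaligned" case.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scalar reference: reverse the order of pixels within one row.
// esz is the pixel size in bytes (1 for 8UC1, 3 for 8UC3, 4 for 8UC4).
// src and dst must not overlap.
void flipRowRef(const std::uint8_t* src, std::uint8_t* dst,
                std::size_t width, std::size_t esz)
{
    for (std::size_t x = 0; x < width; ++x)
        std::memcpy(dst + (width - 1 - x) * esz, src + x * esz, esz);
}
```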

Tested on
- MUSE-PI (vlen=256)
- Compiler: gcc 14.2 (riscv-collab/riscv-gnu-toolchain Nightly: December 16, 2024)

```sh
$ opencv_test_core --gtest_filter="Core_Flip/ElemWiseTest.*"
$ opencv_perf_core --gtest_filter="Size_MatType_FlipCode*" --perf_min_samples=300 --perf_force_samples=300
```

```
Geometric mean (ms)

                     Name of Test                       scalar   ui    rvv       ui        rvv    
                                                                                 vs         vs    
                                                                               scalar     scalar  
                                                                             (x-factor) (x-factor)
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_X)    0.026  0.033  0.031     0.81       0.84   
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_XY)   0.206  0.212  0.091     0.97       2.26   
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_Y)    0.185  0.189  0.082     0.98       2.25   
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_X)    0.070  0.084  0.084     0.83       0.83   
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_XY)   0.616  0.612  0.235     1.01       2.62   
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_Y)    0.587  0.603  0.204     0.97       2.88   
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_X)    0.263  0.110  0.109     2.40       2.41   
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_XY)   0.930  0.831  0.316     1.12       2.95   
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_Y)    1.175  1.129  0.313     1.04       3.75   
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_X)    0.303  0.118  0.111     2.57       2.73   
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_XY)   0.949  0.836  0.405     1.14       2.34   
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_Y)    0.784  0.783  0.409     1.00       1.92   
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_X)    1.084  0.360  0.355     3.01       3.06   
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_XY)   3.768  3.348  1.364     1.13       2.76   
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_Y)    4.361  4.473  1.296     0.97       3.37   
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_X)    1.252  0.469  0.451     2.67       2.78   
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_XY)   5.732  5.220  1.303     1.10       4.40   
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_Y)    5.041  5.105  1.203     0.99       4.19   
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_X)  2.382  0.903  0.903     2.64       2.64   
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_XY) 8.606  7.508  2.581     1.15       3.33   
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_Y)  8.421  8.535  2.219     0.99       3.80   
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_X)  6.312  2.416  2.429     2.61       2.60   
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_XY) 29.174 26.055 12.761    1.12       2.29   
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_Y)  25.373 25.500 13.382    1.00       1.90   
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_X)  7.620  3.204  3.115     2.38       2.45   
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_XY) 32.876 29.310 12.976    1.12       2.53   
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_Y)  28.831 29.094 14.919    0.99       1.93   
```

The optimizations for `vlen <= 256` and `vlen > 256` are different, but I have no real hardware with `vlen > 256`, so accuracy tests for larger values such as 512 and 1024 were conducted on QEMU built from the `riscv-collab/riscv-gnu-toolchain`.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-02-24 08:56:23 +03:00
Dmitry Kurtaev
7a2b048c92
Merge pull request #26923 from dkurt:merge_rvv_opt
Further optimization of cv::merge RVV HAL for 8U and 16S #26923

### Pull Request Readiness Checklist


* Banana Pi BF3 (SpacemiT K1) RISC-V
* Compiler: Syntacore Clang 18.1.4 (build 2024.12)

```
Geometric mean (ms)

                     Name of Test                       baseline   pr       pr
                                                         merge              vs    
                                                                         baseline
                                                                          merge
                                                                        (x-factor)
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 2)      0.013   0.003     3.76   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 3)      0.020   0.006     3.46   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 4)      0.026   0.010     2.61   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 5)      0.043   0.028     1.56   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 6)      0.054   0.035     1.53   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 7)      0.065   0.050     1.30   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 8)      0.070   0.036     1.95   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 2)     0.015   0.008     1.82   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 3)     0.022   0.015     1.48   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 4)     0.029   0.018     1.63   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 5)     0.067   0.044     1.54   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 6)     0.088   0.056     1.58   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 7)     0.104   0.076     1.38   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 8)     0.116   0.065     1.79   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 2)     0.421   0.176     2.39   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 3)     0.792   0.284     2.79   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 4)     1.090   0.370     2.95   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 5)     1.835   1.399     1.31   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 6)     2.389   1.776     1.35   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 7)     3.000   2.471     1.21   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 8)     3.178   2.104     1.51   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 2)    0.490   0.377     1.30   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 3)    1.348   0.602     2.24   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 4)    1.827   0.813     2.25   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 5)    3.283   2.692     1.22   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 6)    4.922   3.334     1.48   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 7)    5.725   4.399     1.30   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 8)    6.278   4.748     1.32   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 2)    1.267   0.603     2.10   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 3)    2.394   0.934     2.56   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 4)    3.236   1.434     2.26   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 5)    5.398   4.345     1.24   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 6)    7.127   5.459     1.31   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 7)    8.590   7.298     1.18   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 8)    9.360   6.152     1.52   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 2)   1.482   1.242     1.19   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 3)   4.008   1.817     2.21   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 4)   6.079   2.468     2.46   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 5)   11.300  8.644     1.31   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 6)   15.125  12.126    1.25   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 7)   17.555  14.804    1.19   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 8)   18.890  14.163    1.33   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 2)   2.910   1.326     2.19   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 3)   5.351   1.997     2.68   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 4)   7.290   2.629     2.77   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 5)   12.426  9.611     1.29   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 6)   16.453  12.162    1.35   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 7)   19.420  16.190    1.20   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 8)   20.588  13.699    1.50   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 2)  3.400   2.640     1.29   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 3)  8.986   3.952     2.27   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 4)  11.972  5.273     2.27   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 5)  20.544  17.996    1.14   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 6)  28.677  22.086    1.30   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 7)  32.958  27.713    1.19   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 8)  36.499  27.439    1.33
```

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-02-20 17:28:28 +03:00
Vincent Rabaud
a6bfd87943 Bump openjp2 to v2.5.3
This should quiet some fuzzer bugs
2025-02-20 10:46:33 +03:00
Alexander Smorkalov
acc9084044 Move OpenVX integrations in imgproc to OpenVX HAL
Covered functions:
- medianBlur
- Sobel
- Canny
- pyrDown
- BoxFilter
- equalizeHist
- GaussianBlur
- remap
- threshold
2025-02-15 09:55:37 +03:00
Alexander Smorkalov
1de6e20463 Move OpenVX implementation for FAST to HAL. 2025-02-14 17:47:48 +03:00
Alexander Smorkalov
58e557d059
Merge pull request #26903 from asmorkalov:as/openvx_hal
Migrate remaining OpenVX integrations to OpenVX HAL (core) #26903

Tested with OpenVX 1.2 & 1.3 sample implementation.

Steps to build and test:
```
git clone git@github.com:KhronosGroup/OpenVX-sample-impl.git
cd OpenVX-sample-impl
python3 Build.py --os=Linux --conf=Release
cd ..
mkdir build
cd build
cmake -DWITH_OPENVX=ON -DOPENVX_ROOT=/mnt/Projects/Projects/OpenVX-sample-impl/install/Linux/x64/Release/ ../opencv
make -j8
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-02-14 11:55:20 +03:00
Alexander Smorkalov
5921aae2b3 Switch to static instance of FastCV on Linux. 2025-02-13 15:58:25 +03:00
lve-gh
d8c2f0bcdf
Merge pull request #26884 from lve-gh:split8u_rvv_hal
[HAL] split8u RVV 1.0 #26884

### Pull Request Readiness Checklist
* Banana Pi BF3 (SpacemiT K1)
* Compiler: Syntacore Clang 18.1.4 (build 2024.12)
```
Geometric mean (ms)

                  Name of Test                   baseline  hal      hal
                                                    ui               vs
                                                                  baseline 
                                                                     ui
                                                                 (x-factor)
split::Size_Depth_Channels::(127x61, 8UC1, 2)     0.012   0.004     3.12   
split::Size_Depth_Channels::(127x61, 8UC1, 3)     0.019   0.006     2.91   
split::Size_Depth_Channels::(127x61, 8UC1, 4)     0.028   0.011     2.64   
split::Size_Depth_Channels::(127x61, 8UC1, 5)     0.067   0.033     2.02   
split::Size_Depth_Channels::(127x61, 8UC1, 6)     0.084   0.040     2.11   
split::Size_Depth_Channels::(127x61, 8UC1, 7)     0.103   0.055     1.88   
split::Size_Depth_Channels::(127x61, 8UC1, 8)     0.113   0.032     3.50   
split::Size_Depth_Channels::(640x480, 8UC1, 2)    0.454   0.179     2.54   
split::Size_Depth_Channels::(640x480, 8UC1, 3)    0.677   0.298     2.27   
split::Size_Depth_Channels::(640x480, 8UC1, 4)    0.901   0.410     2.20   
split::Size_Depth_Channels::(640x480, 8UC1, 5)    3.781   3.010     1.26   
split::Size_Depth_Channels::(640x480, 8UC1, 6)    4.886   4.009     1.22   
split::Size_Depth_Channels::(640x480, 8UC1, 7)    5.777   4.770     1.21   
split::Size_Depth_Channels::(640x480, 8UC1, 8)    4.596   1.330     3.46   
split::Size_Depth_Channels::(1280x720, 8UC1, 2)   1.377   0.709     1.94   
split::Size_Depth_Channels::(1280x720, 8UC1, 3)   2.091   1.034     2.02   
split::Size_Depth_Channels::(1280x720, 8UC1, 4)   2.744   1.573     1.74   
split::Size_Depth_Channels::(1280x720, 8UC1, 5)   9.542   6.284     1.52   
split::Size_Depth_Channels::(1280x720, 8UC1, 6)   11.114  7.850     1.42   
split::Size_Depth_Channels::(1280x720, 8UC1, 7)   14.083  11.879    1.19   
split::Size_Depth_Channels::(1280x720, 8UC1, 8)   13.524  3.865     3.50   
split::Size_Depth_Channels::(1920x1080, 8UC1, 2)  3.108   1.395     2.23   
split::Size_Depth_Channels::(1920x1080, 8UC1, 3)  4.659   2.128     2.19   
split::Size_Depth_Channels::(1920x1080, 8UC1, 4)  6.127   2.818     2.17   
split::Size_Depth_Channels::(1920x1080, 8UC1, 5)  26.733  16.625    1.61   
split::Size_Depth_Channels::(1920x1080, 8UC1, 6)  31.242  22.414    1.39   
split::Size_Depth_Channels::(1920x1080, 8UC1, 7)  35.968  27.658    1.30   
split::Size_Depth_Channels::(1920x1080, 8UC1, 8)  29.997  8.655     3.47
```
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-02-11 17:57:05 +03:00
天音あめ
2e909c38dc
Merge pull request #26804 from amane-ame:norm_hal_rvv
Add RISC-V HAL implementation for cv::norm and cv::normalize #26804

This patch implements `cv::norm` (norm types `NORM_INF/NORM_L1/NORM_L2/NORM_L2SQR`) and the `Mat::convertTo` function in RVV_HAL using native intrinsics, optimizing the performance of `cv::norm(src)`, `cv::norm(src1, src2)`, and `cv::normalize(src)` with data types `8UC1/8UC4/32FC1`.
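
For reference, the four norm types reduce to the following scalar computations on 8-bit unsigned data. This is a sketch of the mathematical definitions only; `NormKind` and `normRef` are hypothetical names, not the HAL interface.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>

enum class NormKind { Inf, L1, L2, L2Sqr };   // hypothetical stand-in for the NORM_* flags

// Scalar reference for unsigned 8-bit input, where |v| == v.
double normRef(const std::uint8_t* src, std::size_t n, NormKind kind)
{
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double v = src[i];
        switch (kind) {
        case NormKind::Inf:   acc = std::max(acc, v); break;   // max |v|
        case NormKind::L1:    acc += v;               break;   // sum |v|
        case NormKind::L2:
        case NormKind::L2Sqr: acc += v * v;           break;   // sum v^2
        }
    }
    return kind == NormKind::L2 ? std::sqrt(acc) : acc;
}
```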

`cv::normalize` also calls `minMaxIdx`; #26789 implements the RVV_HAL version of it.

Tested on MUSE-PI for both gcc 14.2 and clang 20.0.

```
$ opencv_test_core --gtest_filter="*Norm*"
$ opencv_perf_core --gtest_filter="*norm*" --perf_min_samples=300 --perf_force_samples=300
```

The head of the perf table is shown below since the table is too long.

View the full perf table here: [hal_rvv_norm.pdf](https://github.com/user-attachments/files/18468255/hal_rvv_norm.pdf)

<img width="1304" alt="Untitled" src="https://github.com/user-attachments/assets/3550b671-6d96-4db3-8b5b-d4cb241da650" />

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-02-06 19:34:54 +03:00
Alexander Smorkalov
b7663086fb Do not rely on cv namespace in HAL. 2025-02-06 10:00:28 +03:00
天音あめ
13b2caffe0
Merge pull request #26789 from amane-ame:minmax_hal_rvv
Add RISC-V HAL implementation for minMaxIdx #26789

On the RISC-V platform, `minMaxIdx` cannot benefit from Universal Intrinsics because the UI-optimized `minMaxIdx` only supports `CV_SIMD128` (and does not accept `CV_SIMD_SCALABLE` for RVV).

1d701d1690/modules/core/src/minmax.cpp (L209-L214)

This patch implements the `minMaxIdx` function in RVV_HAL using native intrinsics, optimizing the performance for all single-channel data types.
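
The behaviour being accelerated can be summarized by the scalar sketch below for a flat single-channel buffer. The names `MinMaxResult` and `minMaxIdxRef` are hypothetical, and resolving ties to the first occurrence is an assumption of this sketch rather than something stated in the PR.

```cpp
#include <cstddef>

struct MinMaxResult {
    double minVal, maxVal;
    std::size_t minIdx, maxIdx;
};

// Scalar reference over n > 0 elements.
// Assumption: on ties, the first occurrence is reported.
template <typename T>
MinMaxResult minMaxIdxRef(const T* src, std::size_t n)
{
    MinMaxResult r{static_cast<double>(src[0]), static_cast<double>(src[0]), 0, 0};
    for (std::size_t i = 1; i < n; ++i) {
        const double v = static_cast<double>(src[i]);
        if (v < r.minVal) { r.minVal = v; r.minIdx = i; }
        if (v > r.maxVal) { r.maxVal = v; r.maxIdx = i; }
    }
    return r;
}
```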

Tested on MUSE-PI for both gcc 14.2 and clang 20.0.

```
$ opencv_test_core --gtest_filter="*MinMaxLoc*"
$ opencv_perf_core --gtest_filter="*minMaxLoc*"
```
<img width="1122" alt="Untitled" src="https://github.com/user-attachments/assets/6a246852-87af-42c5-a50b-c349c2765f3f" />

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-31 14:26:49 +03:00
Horror Proton
86241653a7 Add RISC-V HAL implementation for cv::phase 2025-01-29 12:07:59 +08:00
eplankin
ae57c54d83
Merge pull request #26463 from eplankin:icv_update_2022.0.0
Update IPP integration #26463

Please merge together with https://github.com/opencv/opencv_3rdparty/pull/88
The supported IPP version was updated to IPP 2022.0.0 for Linux and Windows. 32-bit binaries are dropped starting with this release.

Previous update: https://github.com/opencv/opencv/pull/25935
2025-01-27 17:02:36 +03:00
Kumataro
3e1fafefbe
Merge pull request #26802 from Kumataro:fix26801
3rdparty:ittnotify: update to v3.25.4 #26802

Close https://github.com/opencv/opencv/issues/26801
See https://github.com/opencv/opencv/pull/26797

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-20 10:54:13 +03:00
Alexander Smorkalov
9c33baebbd
Merge pull request #26675 from hanliutong:rvv-hal-fix
Add test cases and fix bugs in the RISC-V Vector HAL.
2024-12-29 18:09:21 +03:00