opencv

mirror/opencv

Fork 0

mirror of https://github.com/opencv/opencv.git synced 2025-06-17 23:51:16 +08:00

Commit Graph

Author	SHA1	Message	Date
Yuantao Feng	2fb786532a	Merge pull request #27257 from fengyuentau:4x/hal_rvv/flip_opt hal_rvv: further optimized flip #27257 Checklist: - [x] flipX - [x] flipY - [x] flipXY ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-04-26 11:08:29 +03:00
GenshinImpactStarts	6a6a5a765d	Merge pull request #26943 from GenshinImpactStarts:flip_hal_rvv Impl RISC-V HAL for cv::flip \| Add perf test for flip #26943 Implement through the existing `cv_hal_flip` interfaces. Add perf test for `cv::flip`. The reason why select these args for testing: - size: copied from perf_lut - type: - U8C1: basic situation - U8C3: unaligned element size - U8C4: large element size Tested on - MUSE-PI (vlen=256) - Compiler: gcc 14.2 (riscv-collab/riscv-gnu-toolchain Nightly: December 16, 2024) ```sh $ opencv_test_core --gtest_filter="Core_Flip/ElemWiseTest." $ opencv_perf_core --gtest_filter="Size_MatType_FlipCode" --perf_min_samples=300 --perf_force_samples=300 ``` ``` Geometric mean (ms) Name of Test scalar ui rvv ui rvv vs vs scalar scalar (x-factor) (x-factor) flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_X) 0.026 0.033 0.031 0.81 0.84 flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_XY) 0.206 0.212 0.091 0.97 2.26 flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_Y) 0.185 0.189 0.082 0.98 2.25 flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_X) 0.070 0.084 0.084 0.83 0.83 flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_XY) 0.616 0.612 0.235 1.01 2.62 flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_Y) 0.587 0.603 0.204 0.97 2.88 flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_X) 0.263 0.110 0.109 2.40 2.41 flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_XY) 0.930 0.831 0.316 1.12 2.95 flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_Y) 1.175 1.129 0.313 1.04 3.75 flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_X) 0.303 0.118 0.111 2.57 2.73 flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_XY) 0.949 0.836 0.405 1.14 2.34 flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_Y) 0.784 0.783 0.409 1.00 1.92 flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_X) 1.084 0.360 0.355 3.01 3.06 flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_XY) 3.768 3.348 1.364 1.13 2.76 flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_Y) 4.361 4.473 1.296 0.97 3.37 flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_X) 1.252 0.469 0.451 2.67 2.78 flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_XY) 5.732 5.220 1.303 1.10 4.40 flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_Y) 5.041 5.105 1.203 0.99 4.19 flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_X) 2.382 0.903 0.903 2.64 2.64 flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_XY) 8.606 7.508 2.581 1.15 3.33 flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_Y) 8.421 8.535 2.219 0.99 3.80 flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_X) 6.312 2.416 2.429 2.61 2.60 flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_XY) 29.174 26.055 12.761 1.12 2.29 flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_Y) 25.373 25.500 13.382 1.00 1.90 flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_X) 7.620 3.204 3.115 2.38 2.45 flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_XY) 32.876 29.310 12.976 1.12 2.53 flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_Y) 28.831 29.094 14.919 0.99 1.93 ``` The optimization for vlen <= 256 and > 256 are different, but I have no real hardware with vlen > 256. So accuracy tests for that like 512 and 1024 are conducted on QEMU built from the `riscv-collab/riscv-gnu-toolchain`. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-02-24 08:56:23 +03:00

Author

SHA1

Message

Date

Yuantao Feng

2fb786532a

Merge pull request #27257 from fengyuentau:4x/hal_rvv/flip_opt

hal_rvv: further optimized flip #27257

Checklist:
- [x] flipX
- [x] flipY
- [x] flipXY

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

2025-04-26 11:08:29 +03:00

GenshinImpactStarts

6a6a5a765d

Merge pull request #26943 from GenshinImpactStarts:flip_hal_rvv

Impl RISC-V HAL for cv::flip | Add perf test for flip #26943 

Implement through the existing `cv_hal_flip` interfaces.

Add perf test for `cv::flip`.

The reason why select these args for testing:
- **size**: copied from perf_lut
- **type**:
    - U8C1: basic situation
    - U8C3: unaligned element size
    - U8C4: large element size

Tested on
- MUSE-PI (vlen=256)
- Compiler: gcc 14.2 (riscv-collab/riscv-gnu-toolchain Nightly: December 16, 2024)

```sh
$ opencv_test_core --gtest_filter="Core_Flip/ElemWiseTest.*"
$ opencv_perf_core --gtest_filter="Size_MatType_FlipCode*" --perf_min_samples=300 --perf_force_samples=300
```

```
Geometric mean (ms)

                     Name of Test                       scalar   ui    rvv       ui        rvv    
                                                                                 vs         vs    
                                                                               scalar     scalar  
                                                                             (x-factor) (x-factor)
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_X)    0.026  0.033  0.031     0.81       0.84   
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_XY)   0.206  0.212  0.091     0.97       2.26   
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_Y)    0.185  0.189  0.082     0.98       2.25   
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_X)    0.070  0.084  0.084     0.83       0.83   
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_XY)   0.616  0.612  0.235     1.01       2.62   
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_Y)    0.587  0.603  0.204     0.97       2.88   
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_X)    0.263  0.110  0.109     2.40       2.41   
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_XY)   0.930  0.831  0.316     1.12       2.95   
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_Y)    1.175  1.129  0.313     1.04       3.75   
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_X)    0.303  0.118  0.111     2.57       2.73   
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_XY)   0.949  0.836  0.405     1.14       2.34   
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_Y)    0.784  0.783  0.409     1.00       1.92   
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_X)    1.084  0.360  0.355     3.01       3.06   
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_XY)   3.768  3.348  1.364     1.13       2.76   
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_Y)    4.361  4.473  1.296     0.97       3.37   
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_X)    1.252  0.469  0.451     2.67       2.78   
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_XY)   5.732  5.220  1.303     1.10       4.40   
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_Y)    5.041  5.105  1.203     0.99       4.19   
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_X)  2.382  0.903  0.903     2.64       2.64   
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_XY) 8.606  7.508  2.581     1.15       3.33   
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_Y)  8.421  8.535  2.219     0.99       3.80   
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_X)  6.312  2.416  2.429     2.61       2.60   
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_XY) 29.174 26.055 12.761    1.12       2.29   
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_Y)  25.373 25.500 13.382    1.00       1.90   
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_X)  7.620  3.204  3.115     2.38       2.45   
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_XY) 32.876 29.310 12.976    1.12       2.53   
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_Y)  28.831 29.094 14.919    0.99       1.93   
```

The optimization for vlen <= 256 and > 256 are different, but I have no real hardware with vlen > 256. So accuracy tests for that like 512 and 1024 are conducted on QEMU built from the `riscv-collab/riscv-gnu-toolchain`.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

2025-02-24 08:56:23 +03:00

2 Commits