opencv

mirror of https://github.com/opencv/opencv.git synced 2025-06-25 05:32:14 +08:00

Author	SHA1	Message	Date
GenshinImpactStarts	60de3ff24f	Merge pull request #27015 from GenshinImpactStarts:sqrt [HAL RVV] impl sqrt and invSqrt #27015 Implemented through the existing interfaces `cv_hal_sqrt32f`, `cv_hal_sqrt64f`, `cv_hal_invSqrt32f`, `cv_hal_invSqrt64f`. Perf tests were run on MUSE-PI and CanMV K230. Because scalar performance is much worse than that of universal intrinsics, only UI and HAL RVV are compared. In RVV's UI, `invSqrt` is computed as `1 / sqrt()`. This patch instead takes an initial estimate from `frsqrt` and then applies the Newton-Raphson method to reach higher precision. For the initial value, I also tried the famous [fast inverse square root algorithm](https://en.wikipedia.org/wiki/Fast_inverse_square_root), which needs only one bit shift and one subtraction. However, on both MUSE-PI and CanMV K230 it was slightly slower (by about 3%), so I use `frsqrt` for the initial value instead. BTW, I think this patch could directly replace RVV's UI implementation. UPDATE: Because of clang's unusual vector register allocation strategy, clang uses LMUL m4 for `invSqrt` while gcc uses LMUL m8, which costs some performance with clang. The clang results are therefore appended. ```sh $ opencv_test_core --gtest_filter="Core_HAL/mathfuncs." $ opencv_perf_core --gtest_filter="SqrtFixture." 
--perf_min_samples=300 --perf_force_samples=300 ``` CanMV K230: ``` Name of Test ui rvv rvv vs ui (x-factor) Sqrt::SqrtFixture::(127x61, 5, false) 0.052 0.027 1.96 Sqrt::SqrtFixture::(127x61, 5, true) 0.101 0.026 3.80 Sqrt::SqrtFixture::(127x61, 6, false) 0.106 0.059 1.79 Sqrt::SqrtFixture::(127x61, 6, true) 0.207 0.058 3.55 Sqrt::SqrtFixture::(640x480, 5, false) 1.988 0.956 2.08 Sqrt::SqrtFixture::(640x480, 5, true) 3.920 0.948 4.13 Sqrt::SqrtFixture::(640x480, 6, false) 4.179 2.342 1.78 Sqrt::SqrtFixture::(640x480, 6, true) 8.220 2.290 3.59 Sqrt::SqrtFixture::(1280x720, 5, false) 5.969 2.881 2.07 Sqrt::SqrtFixture::(1280x720, 5, true) 11.731 2.857 4.11 Sqrt::SqrtFixture::(1280x720, 6, false) 12.533 7.031 1.78 Sqrt::SqrtFixture::(1280x720, 6, true) 24.643 6.917 3.56 Sqrt::SqrtFixture::(1920x1080, 5, false) 13.423 6.483 2.07 Sqrt::SqrtFixture::(1920x1080, 5, true) 26.379 6.436 4.10 Sqrt::SqrtFixture::(1920x1080, 6, false) 28.200 15.833 1.78 Sqrt::SqrtFixture::(1920x1080, 6, true) 55.434 15.565 3.56 ``` MUSE-PI: ``` GCC \| clang Name of Test ui rvv rvv \| ui rvv rvv vs \| vs ui \| ui (x-factor) \| (x-factor) Sqrt::SqrtFixture::(127x61, 5, false) 0.027 0.018 1.46 \| 0.027 0.016 1.65 Sqrt::SqrtFixture::(127x61, 5, true) 0.050 0.017 2.98 \| 0.050 0.017 2.99 Sqrt::SqrtFixture::(127x61, 6, false) 0.053 0.031 1.72 \| 0.052 0.032 1.64 Sqrt::SqrtFixture::(127x61, 6, true) 0.100 0.030 3.31 \| 0.101 0.035 2.86 Sqrt::SqrtFixture::(640x480, 5, false) 0.955 0.483 1.98 \| 0.959 0.499 1.92 Sqrt::SqrtFixture::(640x480, 5, true) 1.873 0.489 3.83 \| 1.873 0.520 3.60 Sqrt::SqrtFixture::(640x480, 6, false) 2.027 1.163 1.74 \| 2.037 1.218 1.67 Sqrt::SqrtFixture::(640x480, 6, true) 3.961 1.153 3.44 \| 3.961 1.341 2.95 Sqrt::SqrtFixture::(1280x720, 5, false) 2.916 1.538 1.90 \| 2.912 1.598 1.82 Sqrt::SqrtFixture::(1280x720, 5, true) 5.735 1.534 3.74 \| 5.726 1.661 3.45 Sqrt::SqrtFixture::(1280x720, 6, false) 6.121 3.585 1.71 \| 6.109 3.725 1.64 Sqrt::SqrtFixture::(1280x720, 6, true) 
12.059 3.501 3.44 \| 12.053 4.080 2.95 Sqrt::SqrtFixture::(1920x1080, 5, false) 6.540 3.535 1.85 \| 6.540 3.643 1.80 Sqrt::SqrtFixture::(1920x1080, 5, true) 12.943 3.445 3.76 \| 12.908 3.706 3.48 Sqrt::SqrtFixture::(1920x1080, 6, false) 13.714 8.062 1.70 \| 13.711 8.376 1.64 Sqrt::SqrtFixture::(1920x1080, 6, true) 27.011 7.989 3.38 \| 27.115 9.245 2.93 ``` ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-12 08:34:27 +03:00
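The estimate-then-refine scheme described in #27015 can be illustrated with a scalar C++ sketch. It is not the patch's RVV code: the hardware `frsqrt` estimate is replaced by the fast inverse square root bit trick that the author also tried, and the helper names (`rsqrt_estimate`, `newton_rsqrt_step`, `inv_sqrt`) and the default of three iterations are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cmath>    // std::sqrt, for checking results against a reference
#include <cstdint>
#include <cstring>

// Coarse initial estimate of 1/sqrt(x) via the "fast inverse square root"
// bit trick (one shift, one subtraction); a stand-in for a hardware
// reciprocal-square-root estimate. Relative error is a few percent.
inline float rsqrt_estimate(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // reinterpret the float's bits
    bits = 0x5f3759dfu - (bits >> 1);
    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y;
}

// One Newton-Raphson step for f(y) = 1/y^2 - x, whose positive root is
// 1/sqrt(x): y' = y * (1.5 - 0.5 * x * y^2).
inline float newton_rsqrt_step(float x, float y) {
    return y * (1.5f - 0.5f * x * y * y);
}

// Hypothetical helper: estimate, then refine. Three steps are assumed here
// because the bit-trick estimate is coarse.
inline float inv_sqrt(float x, int iterations = 3) {
    assert(x > 0.0f && "sketch handles positive inputs only");
    float y = rsqrt_estimate(x);
    for (int k = 0; k < iterations; ++k)
        y = newton_rsqrt_step(x, y);
    return y;
}
```

Each Newton-Raphson step roughly squares the relative error (e' ~ 1.5 * e^2), so a more accurate starting point, such as the roughly 7-bit estimate of RVV's `vfrsqrt7.v` instruction, needs fewer refinement steps. How many steps the HAL actually performs is not stated in the commit message.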
Alexander Smorkalov	a48e78cdfc	Merge pull request #27026 from amane-ame/filter_hal_rvv Add RISC-V HAL implementation for cv::filter series	2025-03-11 16:09:45 +03:00
GenshinImpactStarts	524d8ae01c	impl exp and log | add log perf test Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>	2025-03-07 17:11:26 +00:00
天音あめ	e89e2fd7ea	Merge pull request #27007 from amane-ame:color_hal_rvv Add RISC-V HAL implementation for cv::cvtColor #27007 This patch implements the following functions in RVV_HAL using native intrinsics, optimizing the performance of `cv::cvtColor` for all possible data types and modes (except for `COLOR_Bayer`, `COLOR_YUV2GRAY_420` and `COLOR_mRGBA`, as these modes have no HAL interface): ``` cv_hal_cvtBGRtoBGR cv_hal_cvtBGRtoBGR5x5 cv_hal_cvtBGR5x5toBGR cv_hal_cvtBGRtoGray cv_hal_cvtGraytoBGR cv_hal_cvtBGR5x5toGray cv_hal_cvtGraytoBGR5x5 cv_hal_cvtBGRtoYUV cv_hal_cvtYUVtoBGR cv_hal_cvtBGRtoXYZ cv_hal_cvtXYZtoBGR cv_hal_cvtBGRtoHSV cv_hal_cvtHSVtoBGR cv_hal_cvtBGRtoLab cv_hal_cvtLabtoBGR cv_hal_cvtTwoPlaneYUVtoBGR cv_hal_cvtBGRtoTwoPlaneYUV cv_hal_cvtThreePlaneYUVtoBGR cv_hal_cvtBGRtoThreePlaneYUV cv_hal_cvtOnePlaneYUVtoBGR cv_hal_cvtOnePlaneBGRtoYUV ``` Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0. ``` $ ./opencv_test_imgproc --gtest_filter="Color-Bayer" $ ./opencv_perf_imgproc --gtest_filter="Color-Bayer" --gtest_also_run_disabled_tests --perf_min_samples=100 --perf_force_samples=100 ``` View the full perf table here: [hal_rvv_color.pdf](https://github.com/user-attachments/files/19055417/hal_rvv_color.pdf) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable	2025-03-07 11:24:48 +03:00
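For reference, OpenCV documents the RGB-to-gray weights as Y = 0.299 R + 0.587 G + 0.114 B. A minimal scalar sketch of an 8-bit BGR-to-gray conversion (the `cv_hal_cvtBGRtoGray` case in the list above) in fixed point follows. The 14-bit scale, the rounded coefficients and the function names are assumptions for illustration, not necessarily the constants used by the RVV HAL.

```cpp
#include <cassert>
#include <cstdint>

// Weights Y = 0.299 R + 0.587 G + 0.114 B scaled by 2^14 (assumed precision).
constexpr int kShift = 14;
constexpr int kR2Y = 4899;  // round(0.299 * 16384)
constexpr int kG2Y = 9617;  // round(0.587 * 16384)
constexpr int kB2Y = 1868;  // round(0.114 * 16384)
static_assert(kR2Y + kG2Y + kB2Y == (1 << kShift), "weights must sum to 1.0");

// One 8-bit pixel in B, G, R order (OpenCV's default channel order).
inline std::uint8_t bgr_to_gray(std::uint8_t b, std::uint8_t g, std::uint8_t r) {
    // Adding half of 2^kShift before the shift rounds to nearest.
    const int y = b * kB2Y + g * kG2Y + r * kR2Y + (1 << (kShift - 1));
    return static_cast<std::uint8_t>(y >> kShift);  // at most 255 since the weights sum to 2^14
}

// Hypothetical row helper: `width` packed BGR pixels -> `width` gray bytes.
inline void bgr_row_to_gray(const std::uint8_t* bgr, std::uint8_t* gray, int width) {
    assert(width >= 0);
    for (int x = 0; x < width; ++x)
        gray[x] = bgr_to_gray(bgr[3 * x], bgr[3 * x + 1], bgr[3 * x + 2]);
}
```

Because the three integer weights sum exactly to 2^14, a white pixel maps to 255 without overflowing the 8-bit result.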
天音あめ	00956d5c15	Merge pull request #26892 from amane-ame:solve_hal_rvv Add RISC-V HAL implementation for cv::solve #26892 This patch implements the `cv_hal_LU/cv_hal_Cholesky/cv_hal_SVD/cv_hal_QR` functions in RVV_HAL using native intrinsics, optimizing the performance of `cv::solve` with methods `DECOMP_LU/DECOMP_SVD/DECOMP_CHOLESKY/DECOMP_QR` and data types `32FC1/64FC1`. Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0. ``` $ ./opencv_test_core --gtest_filter="Solve:SVD:Cholesky" $ ./opencv_perf_core --gtest_filter="SolveTest" --perf_min_samples=100 --perf_force_samples=100 ``` Because the table is too long, only its tail is shown below. View the full perf table here: [hal_rvv_solve.pdf](https://github.com/user-attachments/files/18725067/hal_rvv_solve.pdf) <img width="1078" alt="Untitled" src="https://github.com/user-attachments/assets/c01d849c-f000-4bcc-bfe0-a302d6605d9e" /> ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-07 11:14:09 +03:00
天音あめ	bb525fe91d	Merge pull request #26865 from amane-ame:dxt_hal_rvv Add RISC-V HAL implementation for cv::dft and cv::dct #26865 This patch implements the `static cv::DFT` function in RVV_HAL using native intrinsics, optimizing the performance of `cv::dft` and `cv::dct` with data types `32FC1/64FC1/32FC2/64FC2`. I chose to create a new `cv_hal_dftOcv` interface because using the existing interfaces (`cv_hal_dftInit1D` and `cv_hal_dft1D`) would require parsing and handling the dft flags within HAL, as well as performing preprocessing operations such as preparing the roots of unity. Since these operations are not performance hotspots and do not need optimization, reusing the existing interfaces would mean copying approximately 300 lines of code from `core/src/dxt.cpp` into HAL, which I believe is unnecessary. Moreover, placing the new interface inside `static cv::DFT` lets `static cv::RealDFT` and `static cv::DCT` benefit as well, since the processing they perform before and after calling `static cv::DFT` is also not a performance hotspot. Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0. ``` $ opencv_test_core --gtest_filter="DFT" $ opencv_perf_core --gtest_filter="dft:dct" --perf_min_samples=30 --perf_force_samples=30 ``` Because the table is too long, only its head is shown below. View the full perf table here: [hal_rvv_dxt.pdf](https://github.com/user-attachments/files/18622645/hal_rvv_dxt.pdf) <img width="1017" alt="Untitled" src="https://github.com/user-attachments/assets/609856e7-9c7d-4a95-9923-45c1b77eb3a2" /> ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. 
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-07 11:08:41 +03:00
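What the DFT kernel computes can be pinned down with a naive O(N^2) reference in C++. This sketch (the name `naive_dft` is hypothetical) is only a correctness baseline, not the RVV implementation. It precomputes the N-th roots of unity (the twiddle factors) once, which is the kind of preprocessing the commit message keeps outside the HAL.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive O(N^2) forward DFT: X[k] = sum_j x[j] * exp(-2*pi*i*k*j / N).
// The factors exp(-2*pi*i*m / N) are the N-th roots of unity; they are
// precomputed once, outside the inner loops.
std::vector<cd> naive_dft(const std::vector<cd>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> roots(n);
    for (std::size_t m = 0; m < n; ++m)
        roots[m] = std::polar(1.0, -2.0 * pi * static_cast<double>(m) / static_cast<double>(n));

    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cd acc = 0.0;
        for (std::size_t j = 0; j < n; ++j)
            acc += x[j] * roots[(k * j) % n];  // exp(-2*pi*i*k*j/N) is periodic in k*j mod N
        out[k] = acc;
    }
    return out;
}
```

A unit impulse transforms to all ones and a constant signal concentrates all its energy in bin 0, which makes these easy sanity checks for any faster implementation.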
GenshinImpactStarts	57a78cb9df	Merge pull request #26941 from GenshinImpactStarts:lut_hal_rvv Impl hal_rvv LUT | Add more LUT test #26941 Implemented through the existing `cv_hal_lut` interfaces. Add more LUT accuracy and performance tests: - Accuracy test: Multi-channel table tests are added, and the range of `randu` used for generating test data is broadened to make the test more robust. - Performance test: Multi-channel input and multi-channel table tests are added. Perf tests done on - MUSE-PI (vlen=256) - Compiler: gcc 14.2 (riscv-collab/riscv-gnu-toolchain Nightly: December 16, 2024) ```sh $ opencv_test_core --gtest_filter="Core_LUT" $ opencv_perf_core --gtest_filter="SizePrm_LUT" --perf_min_samples=300 --perf_force_samples=300 ``` ```sh Geometric mean (ms) Name of Test scalar ui rvv ui rvv vs vs scalar scalar (x-factor) (x-factor) LUT::SizePrm::320x240 0.248 0.249 0.052 1.00 4.74 LUT::SizePrm::640x480 0.277 0.275 0.085 1.01 3.28 LUT::SizePrm::1920x1080 0.950 0.947 0.634 1.00 1.50 LUT_multi2::SizePrm::320x240 2.051 2.045 2.049 1.00 1.00 LUT_multi2::SizePrm::640x480 2.128 2.134 2.125 1.00 1.00 LUT_multi2::SizePrm::1920x1080 7.397 7.380 7.390 1.00 1.00 LUT_multi::SizePrm::320x240 0.715 0.747 0.154 0.96 4.64 LUT_multi::SizePrm::640x480 0.741 0.766 0.257 0.97 2.88 LUT_multi::SizePrm::1920x1080 2.766 2.765 1.925 1.00 1.44 ``` This optimization is achieved by loading the entire lookup table into vector registers. Due to register size limitations, the optimization is only effective under the following conditions: - For the U8C1 table type, the optimization works when `vlen >= 256` - For U16C1, it works when `vlen >= 512` - For U32C1, it works when `vlen >= 1024` Since I don’t have real hardware with `vlen > 256`, the corresponding accuracy tests were conducted on QEMU built from the `riscv-collab/riscv-gnu-toolchain`. This patch does not implement optimizations for multi-channel tables. Previous attempts: 1. 
For the U8C1 table type, when `vlen = 128`, it is possible to use four `u8m4` vectors to load the entire table, perform gathering, and merge the results. However, the performance is almost the same as the scalar version. 2. Loading part of the table and repeatedly loading the source data is faster for small sizes. But as the table size grows, the performance quickly degrades compared to the scalar version. 3. Using `vluxei8` as a general solution does not show any performance improvement. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-06 11:17:00 +03:00
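The `vlen` thresholds listed in #26941 are consistent with a simple capacity calculation, assuming (the commit does not say so) that the 256-entry table is held in a single register group with the maximum LMUL of 8, i.e. 8 x vlen bits. The constants and function names below are hypothetical.

```cpp
#include <cassert>

// Assumption: the whole 256-entry table must fit in one vector register
// group, and the largest group (LMUL = 8) spans 8 * VLEN bits.
constexpr unsigned kLutEntries = 256;
constexpr unsigned kMaxLmul = 8;

// Bits needed to hold the full table for a given element width.
constexpr unsigned table_bits(unsigned elemBits) { return kLutEntries * elemBits; }

// Smallest VLEN (in bits) for which the table fits in an LMUL = 8 group.
constexpr unsigned min_vlen_for_table(unsigned elemBits) {
    return table_bits(elemBits) / kMaxLmul;
}

// These reproduce the thresholds stated in the commit message.
static_assert(min_vlen_for_table(8) == 256, "U8C1 table");
static_assert(min_vlen_for_table(16) == 512, "U16C1 table");
static_assert(min_vlen_for_table(32) == 1024, "U32C1 table");

// Hypothetical dispatch check: use the in-register path only if the table fits.
constexpr bool table_fits_in_registers(unsigned elemBits, unsigned vlen) {
    return vlen >= min_vlen_for_table(elemBits);
}
```

For example, a U8C1 table needs 256 x 8 = 2048 bits, which is exactly one LMUL=8 group when vlen = 256; at vlen = 128 it would need 16 registers' worth, which is why the first "previous attempt" had to split it across four `u8m4` vectors.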
amane-ame	83104bed32	Add Filter2D. Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>	2025-03-06 14:10:06 +08:00
天音あめ	cbcfd772ce	Merge pull request #26958 from amane-ame:pyramids_hal_rvv Add RISC-V HAL implementation for cv::pyrDown and cv::pyrUp #26958 This patch implements the `cv_hal_pyrdown/cv_hal_pyrup` functions in RVV_HAL using native intrinsics, optimizing the performance of `cv::pyrDown`, `cv::pyrUp` and `cv::buildPyramid` with data types `{8U,16S,32F} x {C1,C2,C3,C4,Cn}`. Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0. ``` $ ./opencv_test_imgproc --gtest_filter="pyr:Pyr" $ ./opencv_perf_imgproc --gtest_filter="pyr:Pyr" --perf_min_samples=300 --perf_force_samples=300 ``` <img width="1112" alt="Untitled" src="https://github.com/user-attachments/assets/235a9fba-0d29-434e-8a10-498212bac657" /> ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-03-04 15:41:15 +03:00
GenshinImpactStarts	33d632f85e	impl hal_rvv norm_hamming Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>	2025-02-25 02:31:02 +00:00
GenshinImpactStarts	6a6a5a765d	Merge pull request #26943 from GenshinImpactStarts:flip_hal_rvv Impl RISC-V HAL for cv::flip | Add perf test for flip #26943 Implemented through the existing `cv_hal_flip` interfaces. Added a perf test for `cv::flip`. The test arguments were chosen as follows: - size: copied from perf_lut - type: - U8C1: basic case - U8C3: unaligned element size - U8C4: large element size Tested on - MUSE-PI (vlen=256) - Compiler: gcc 14.2 (riscv-collab/riscv-gnu-toolchain Nightly: December 16, 2024) ```sh $ opencv_test_core --gtest_filter="Core_Flip/ElemWiseTest." $ opencv_perf_core --gtest_filter="Size_MatType_FlipCode" --perf_min_samples=300 --perf_force_samples=300 ``` ``` Geometric mean (ms) Name of Test scalar ui rvv ui rvv vs vs scalar scalar (x-factor) (x-factor) flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_X) 0.026 0.033 0.031 0.81 0.84 flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_XY) 0.206 0.212 0.091 0.97 2.26 flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_Y) 0.185 0.189 0.082 0.98 2.25 flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_X) 0.070 0.084 0.084 0.83 0.83 flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_XY) 0.616 0.612 0.235 1.01 2.62 flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_Y) 0.587 0.603 0.204 0.97 2.88 flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_X) 0.263 0.110 0.109 2.40 2.41 flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_XY) 0.930 0.831 0.316 1.12 2.95 flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_Y) 1.175 1.129 0.313 1.04 3.75 flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_X) 0.303 0.118 0.111 2.57 2.73 flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_XY) 0.949 0.836 0.405 1.14 2.34 flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_Y) 0.784 0.783 0.409 1.00 1.92 flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_X) 1.084 0.360 0.355 3.01 3.06 flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_XY) 3.768 3.348 1.364 1.13 2.76 flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_Y) 4.361 4.473 1.296 0.97 3.37 flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_X) 1.252 0.469 0.451 2.67 2.78 flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_XY) 5.732 5.220 1.303 1.10 4.40 flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_Y) 5.041 5.105 1.203 0.99 4.19 flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_X) 2.382 0.903 0.903 2.64 2.64 flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_XY) 8.606 7.508 2.581 1.15 3.33 flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_Y) 8.421 8.535 2.219 0.99 3.80 flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_X) 6.312 2.416 2.429 2.61 2.60 flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_XY) 29.174 26.055 12.761 1.12 2.29 flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_Y) 25.373 25.500 13.382 1.00 1.90 flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_X) 7.620 3.204 3.115 2.38 2.45 flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_XY) 32.876 29.310 12.976 1.12 2.53 flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_Y) 28.831 29.094 14.919 0.99 1.93 ``` The optimizations for vlen <= 256 and vlen > 256 differ, but I have no real hardware with vlen > 256, so the accuracy tests for those configurations (e.g. vlen = 512 and 1024) were conducted on QEMU built from the `riscv-collab/riscv-gnu-toolchain`. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-02-24 08:56:23 +03:00
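To make the three flip modes concrete, here is a scalar reference sketch (hypothetical name `flip_reference`). It assumes the perf test's `FLIP_X`, `FLIP_Y` and `FLIP_XY` correspond to `cv::flip` flip codes 0, 1 and -1. Flipping around the x-axis only reorders whole rows, which is a plain copy per row, whereas the other two modes must reverse pixel order inside each row, where the element size (1, 3 or 4 bytes) matters. This may help explain why UI and RVV are close for `FLIP_X` in the table above, but that interpretation is not stated in the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scalar reference for flipping a packed 2D image whose pixels are
// `elemSize` bytes wide (1 for 8UC1, 3 for 8UC3, 4 for 8UC4).
// flipCode follows cv::flip: 0 = around the x-axis (reverse row order),
// > 0 = around the y-axis (reverse pixels within each row), < 0 = both.
void flip_reference(const std::uint8_t* src, std::size_t srcStep,
                    std::uint8_t* dst, std::size_t dstStep,
                    int width, int height, int elemSize, int flipCode) {
    assert(src != dst && "in-place flipping is not handled by this sketch");
    const std::size_t rowBytes = std::size_t(width) * std::size_t(elemSize);
    for (int y = 0; y < height; ++y) {
        const int srcY = (flipCode <= 0) ? height - 1 - y : y;  // vertical flip
        const std::uint8_t* s = src + std::size_t(srcY) * srcStep;
        std::uint8_t* d = dst + std::size_t(y) * dstStep;
        if (flipCode == 0) {
            // Row order only: each row is a plain contiguous copy.
            std::memcpy(d, s, rowBytes);
        } else {
            // Reverse pixel order inside the row, keeping each pixel's bytes intact.
            for (int x = 0; x < width; ++x)
                std::memcpy(d + std::size_t(x) * elemSize,
                            s + std::size_t(width - 1 - x) * elemSize,
                            std::size_t(elemSize));
        }
    }
}
```

The inner per-pixel `memcpy` keeps 3-byte BGR pixels intact, which is the "unaligned element size" situation the 8UC3 test case is meant to exercise.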
lve-gh	d8c2f0bcdf	Merge pull request #26884 from lve-gh:split8u_rvv_hal [HAL] split8u RVV 1.0 #26884 ### Pull Request Readiness Checklist * Banana Pi BF3 (SpacemiT K1) * Compiler: Syntacore Clang 18.1.4 (build 2024.12) ``` Geometric mean (ms) Name of Test baseline hal hal ui vs baseline ui (x-factor) split::Size_Depth_Channels::(127x61, 8UC1, 2) 0.012 0.004 3.12 split::Size_Depth_Channels::(127x61, 8UC1, 3) 0.019 0.006 2.91 split::Size_Depth_Channels::(127x61, 8UC1, 4) 0.028 0.011 2.64 split::Size_Depth_Channels::(127x61, 8UC1, 5) 0.067 0.033 2.02 split::Size_Depth_Channels::(127x61, 8UC1, 6) 0.084 0.040 2.11 split::Size_Depth_Channels::(127x61, 8UC1, 7) 0.103 0.055 1.88 split::Size_Depth_Channels::(127x61, 8UC1, 8) 0.113 0.032 3.50 split::Size_Depth_Channels::(640x480, 8UC1, 2) 0.454 0.179 2.54 split::Size_Depth_Channels::(640x480, 8UC1, 3) 0.677 0.298 2.27 split::Size_Depth_Channels::(640x480, 8UC1, 4) 0.901 0.410 2.20 split::Size_Depth_Channels::(640x480, 8UC1, 5) 3.781 3.010 1.26 split::Size_Depth_Channels::(640x480, 8UC1, 6) 4.886 4.009 1.22 split::Size_Depth_Channels::(640x480, 8UC1, 7) 5.777 4.770 1.21 split::Size_Depth_Channels::(640x480, 8UC1, 8) 4.596 1.330 3.46 split::Size_Depth_Channels::(1280x720, 8UC1, 2) 1.377 0.709 1.94 split::Size_Depth_Channels::(1280x720, 8UC1, 3) 2.091 1.034 2.02 split::Size_Depth_Channels::(1280x720, 8UC1, 4) 2.744 1.573 1.74 split::Size_Depth_Channels::(1280x720, 8UC1, 5) 9.542 6.284 1.52 split::Size_Depth_Channels::(1280x720, 8UC1, 6) 11.114 7.850 1.42 split::Size_Depth_Channels::(1280x720, 8UC1, 7) 14.083 11.879 1.19 split::Size_Depth_Channels::(1280x720, 8UC1, 8) 13.524 3.865 3.50 split::Size_Depth_Channels::(1920x1080, 8UC1, 2) 3.108 1.395 2.23 split::Size_Depth_Channels::(1920x1080, 8UC1, 3) 4.659 2.128 2.19 split::Size_Depth_Channels::(1920x1080, 8UC1, 4) 6.127 2.818 2.17 split::Size_Depth_Channels::(1920x1080, 8UC1, 5) 26.733 16.625 1.61 split::Size_Depth_Channels::(1920x1080, 8UC1, 6) 31.242 22.414 1.39 
split::Size_Depth_Channels::(1920x1080, 8UC1, 7) 35.968 27.658 1.30 split::Size_Depth_Channels::(1920x1080, 8UC1, 8) 29.997 8.655 3.47 ``` See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2025-02-11 17:57:05 +03:00
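The operation benchmarked in #26884 can be stated as a scalar reference: each output plane `c` receives bytes `c`, `c + cn`, `c + 2*cn`, ... of the interleaved input. The sketch below borrows the `(src, dst, len, cn)` argument shape of OpenCV's `cv::hal::split8u`, but the function itself is a hypothetical plain loop. An RVV version could use segment loads (`vlseg2e8.v` through `vlseg8e8.v`), which deinterleave up to 8 fields per instruction, though the commit does not describe its approach.

```cpp
#include <cassert>
#include <cstdint>

// Scalar reference: deinterleave `len` pixels of `cn` 8-bit channels into
// `cn` planes, so that dst[c][i] = src[i * cn + c].
void split8u_reference(const std::uint8_t* src, std::uint8_t** dst, int len, int cn) {
    assert(cn >= 1 && len >= 0);
    for (int i = 0; i < len; ++i)
        for (int c = 0; c < cn; ++c)
            dst[c][i] = src[i * cn + c];
}
```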
天音あめ	2e909c38dc	Merge pull request #26804 from amane-ame:norm_hal_rvv Add RISC-V HAL implementation for cv::norm and cv::normalize #26804 This patch implements `cv::norm` with norm types `NORM_INF/NORM_L1/NORM_L2/NORM_L2SQR` and the `Mat::convertTo` function in RVV_HAL using native intrinsics, optimizing the performance of `cv::norm(src)`, `cv::norm(src1, src2)`, and `cv::normalize(src)` with data types `8UC1/8UC4/32FC1`. `cv::normalize` also calls `minMaxIdx`; #26789 implements the RVV HAL for that. Tested on MUSE-PI for both gcc 14.2 and clang 20.0. ``` $ opencv_test_core --gtest_filter="Norm" $ opencv_perf_core --gtest_filter="norm" --perf_min_samples=300 --perf_force_samples=300 ``` Because the table is too long, only its head is shown below. View the full perf table here: [hal_rvv_norm.pdf](https://github.com/user-attachments/files/18468255/hal_rvv_norm.pdf) <img width="1304" alt="Untitled" src="https://github.com/user-attachments/assets/3550b671-6d96-4db3-8b5b-d4cb241da650" /> ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-02-06 19:34:54 +03:00
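The four norm types named in #26804 have simple definitions for a single-channel array; for `cv::norm(src1, src2)` they apply to the difference `src1 - src2`. A scalar sketch with a hypothetical local `NormKind` enum (not OpenCV's `cv::NormTypes`) follows.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// NORM_INF = max |x|, NORM_L1 = sum |x|, NORM_L2SQR = sum x^2,
// NORM_L2 = sqrt(sum x^2). Local stand-in enum, not cv::NormTypes.
enum class NormKind { Inf, L1, L2, L2Sqr };

double norm_reference(const float* x, std::size_t n, NormKind kind) {
    double inf = 0.0, l1 = 0.0, l2sqr = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double a = std::fabs(static_cast<double>(x[i]));  // accumulate in double
        inf = std::max(inf, a);
        l1 += a;
        l2sqr += a * a;
    }
    switch (kind) {
        case NormKind::Inf:   return inf;
        case NormKind::L1:    return l1;
        case NormKind::L2Sqr: return l2sqr;
        case NormKind::L2:    return std::sqrt(l2sqr);
    }
    return 0.0;  // unreachable for valid enum values
}
```

All four norms can be produced in a single pass, since they only differ in how the per-element magnitudes are combined.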
天音あめ	13b2caffe0	Merge pull request #26789 from amane-ame:minmax_hal_rvv Add RISC-V HAL implementation for minMaxIdx #26789 On the RISC-V platform, `minMaxIdx` cannot benefit from Universal Intrinsics because the UI-optimized `minMaxIdx` only supports `CV_SIMD128` (and does not accept `CV_SIMD_SCALABLE` for RVV). `1d701d1690/modules/core/src/minmax.cpp (L209-L214)` This patch implements the `minMaxIdx` function in RVV_HAL using native intrinsics, optimizing the performance for all single-channel data types. Tested on MUSE-PI for both gcc 14.2 and clang 20.0. ``` $ opencv_test_core --gtest_filter="MinMaxLoc" $ opencv_perf_core --gtest_filter="minMaxLoc" ``` <img width="1122" alt="Untitled" src="https://github.com/user-attachments/assets/6a246852-87af-42c5-a50b-c349c2765f3f" /> ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2025-01-31 14:26:49 +03:00
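A scalar reference for single-channel `minMaxIdx` semantics is sketched below (hypothetical name `min_max_idx_reference`, linear indices over a contiguous array, ties resolved to the first occurrence by this sketch's choice). A vector version commonly keeps per-lane running minima, maxima and indices and reduces across lanes at the end; whether #26789 does exactly that is not described in the commit.

```cpp
#include <cassert>
#include <cstddef>

// Scalar reference for a single-channel minMaxIdx over a contiguous array:
// returns the min/max values and the linear index of their first occurrence.
template <typename T>
void min_max_idx_reference(const T* src, std::size_t n,
                           T* minVal, T* maxVal,
                           std::size_t* minIdx, std::size_t* maxIdx) {
    assert(n > 0);
    *minVal = *maxVal = src[0];
    *minIdx = *maxIdx = 0;
    for (std::size_t i = 1; i < n; ++i) {
        if (src[i] < *minVal) { *minVal = src[i]; *minIdx = i; }  // strict: keep first
        if (src[i] > *maxVal) { *maxVal = src[i]; *maxIdx = i; }
    }
}
```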
Horror Proton	86241653a7	Add RISC-V HAL implementation for cv::phase	2025-01-29 12:07:59 +08:00
Liutong HAN	3fbaad36d7	Merge pull request #26624 from hanliutong:rvv-mean Add RISC-V HAL implementation for meanStdDev #26624 `meanStdDev` already benefits from RVV's Universal Intrinsic backend, but we found that its performance on `8UC4` with a mask is worse than the scalar version, and that `32FC1` has no optimized implementation. This patch implements the `meanStdDev` function in RVV_HAL using native intrinsics, significantly improving performance for `8UC1`, `8UC4` and `32FC1`. This patch was tested on BPI-F3 for both gcc 14.2 and clang 19.1. ``` $ opencv_test_core --gtest_filter="MeanStdDev" $ opencv_perf_core --gtest_filter="Size_MatType_meanStdDev*" ``` ![1734077611879](https://github.com/user-attachments/assets/71c85c9d-1db1-470d-81d1-bf546e27ad86) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2024-12-18 22:19:02 +03:00
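The masked statistics optimized in #26624 can be written as a scalar reference: only pixels whose mask byte is nonzero contribute, and the standard deviation is taken in the population form sqrt(E[x^2] - mean^2), which is assumed here to match what `cv::meanStdDev` reports. The struct and function names are hypothetical, and the single-channel `float` input is a simplification.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

struct MeanStd { double mean; double stddev; };

// Single-channel mean and population standard deviation with an optional
// 8-bit mask (nullptr = use every pixel; nonzero byte = pixel included).
MeanStd mean_std_reference(const float* src, const std::uint8_t* mask, std::size_t n) {
    double sum = 0.0, sqsum = 0.0;
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (mask && !mask[i]) continue;  // skip masked-out pixels
        const double v = src[i];
        sum += v;
        sqsum += v * v;
        ++count;
    }
    if (count == 0) return {0.0, 0.0};  // this sketch returns zeros for an empty mask
    const double mean = sum / double(count);
    const double var = std::fmax(sqsum / double(count) - mean * mean, 0.0);  // clamp rounding
    return {mean, std::sqrt(var)};
}
```

Accumulating the sum and the sum of squares in one pass is what makes the masked case a single sweep over the data; the clamp guards against tiny negative variances caused by rounding.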
Liutong HAN	8a36f119ce	Add the HAL implementation for the merge function on RISC-V Vector	2024-09-29 13:39:53 +00:00
Maxim Milashchenko	786726719f	Merge pull request #25793 from MaximMilashchenko:hal_rvv Fixed build error hal_rvv_071 #25793 Fixed a bug where hal_rvv_071 enabled the vector header even when the vector extension is disabled (RVV=OFF)	2024-06-28 09:00:16 +03:00
Maxim Milashchenko	adcb070396	Merge pull request #25307 from MaximMilashchenko:halrvv071 * added hal for cv_hal_cvtBGRtoBGR rvv 0.7.1	2024-06-06 15:31:59 +03:00

19 Commits