opencv

mirror of https://github.com/opencv/opencv.git synced 2025-07-22 12:17:04 +08:00

History

GenshinImpactStarts 2090407002 Merge pull request #26999 from GenshinImpactStarts:polar_to_cart

[HAL RVV] unify and impl polar_to_cart \| add perf test #26999

### Summary

1. Implement through the existing `cv_hal_polarToCart32f` and `cv_hal_polarToCart64f` interfaces.
2. Add `polarToCart` performance tests.
3. Make `cv::polarToCart` use CALL_HAL in the same way as `cv::cartToPolar`.
4. To achieve the 3rd point, the original implementation was moved and partly modified.

Tested through:

```sh
opencv_test_core --gtest_filter="PolarToCart:Core_CartPolar_reverse"
opencv_perf_core --gtest_filter="PolarToCart" --perf_min_samples=300 --perf_force_samples=300
```

### HAL performance test

UPDATE: The current implementation no longer depends on vlen.

NOTE: Because of the 4th point in the summary above, the `scalar` and `ui` tests are based on the modified code of this PR. The impact of this patch on `scalar` and `ui` is evaluated in the next section, `Effect of Point 4`.

Vlen 256 (Muse Pi):

```
Name of Test                                          scalar      ui     rvv          ui         rvv
                                                                                      vs          vs
                                                                                  scalar      scalar
                                                                              (x-factor)  (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)       0.315   0.110   0.034        2.85        9.34
PolarToCart::PolarToCartFixture::(127x61, 64FC1)       0.423   0.163   0.045        2.59        9.34
PolarToCart::PolarToCartFixture::(640x480, 32FC1)     13.695   4.325   1.278        3.17       10.71
PolarToCart::PolarToCartFixture::(640x480, 64FC1)     17.719   7.118   2.105        2.49        8.42
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)    40.678  13.114   3.977        3.10       10.23
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)    53.124  21.298   6.519        2.49        8.15
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1)   95.158  29.465   8.894        3.23       10.70
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1)  119.262  47.743  14.129        2.50        8.44
```

### Effect of Point 4

To make `cv::polarToCart` behave the same as `cv::cartToPolar`, the implementation of the former has been moved to the latter's location (from `mathfuncs.cpp` to `mathfuncs_core.simd.hpp`).
#### Reason for Changes

This function computes $y = \text{mag} \times \sin(\text{angle})$ and $x = \text{mag} \times \cos(\text{angle})$. The original implementation first calculates the values of $\sin$ and $\cos$, stores them in the output buffers $x$ and $y$, and then multiplies them by $\text{mag}$. However, when the function is used in place (one of the output buffers is also an input buffer), the original implementation allocates an extra buffer for the $\sin$ and $\cos$ values so that $\text{mag}$ is not overwritten before it is read. This extra buffer allocation prevents `cv::polarToCart` from working in the same way as `cv::cartToPolar`. Therefore, the multiplication is now performed immediately, without storing intermediate values. Since the original implementation also had AVX2 optimizations, I have applied the same optimizations to the AVX2 version of this implementation.

UPDATE: The UI path now uses `v_sincos` from #25892. The original implementation had AVX2 optimizations but was much slower than the current UI path, so it has been removed; the AVX2 perf test is below. The scalar implementation is unchanged because it is faster than the UI method.

#### Test Result

The `scalar` and `ui` tests were run on Muse Pi, and the AVX2 test was run on an Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz.
`scalar` test:

```
Name of Test                                            orig       pr          pr
                                                                               vs
                                                                             orig
                                                                       (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)       0.333    0.294        1.13
PolarToCart::PolarToCartFixture::(127x61, 64FC1)       0.385    0.403        0.96
PolarToCart::PolarToCartFixture::(640x480, 32FC1)     14.749   12.343        1.19
PolarToCart::PolarToCartFixture::(640x480, 64FC1)     19.419   16.743        1.16
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)    44.155   37.822        1.17
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)    62.108   50.358        1.23
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1)   99.011   85.769        1.15
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1)  127.740  112.874        1.13
```

`ui` test:

```
Name of Test                                            orig       pr          pr
                                                                               vs
                                                                             orig
                                                                       (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)       0.306    0.110        2.77
PolarToCart::PolarToCartFixture::(127x61, 64FC1)       0.455    0.163        2.79
PolarToCart::PolarToCartFixture::(640x480, 32FC1)     13.381    4.325        3.09
PolarToCart::PolarToCartFixture::(640x480, 64FC1)     21.851    7.118        3.07
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)    39.975   13.114        3.05
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)    67.006   21.298        3.15
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1)   90.362   29.465        3.07
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1)  129.637   47.743        2.72
```

AVX2 test:

```
Name of Test                                            orig       pr          pr
                                                                               vs
                                                                             orig
                                                                       (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)       0.019    0.009        2.11
PolarToCart::PolarToCartFixture::(127x61, 64FC1)       0.022    0.013        1.74
PolarToCart::PolarToCartFixture::(640x480, 32FC1)      0.788    0.355        2.22
PolarToCart::PolarToCartFixture::(640x480, 64FC1)      1.102    0.618        1.78
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)     2.383    1.042        2.29
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)     3.758    2.316        1.62
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1)    5.577    2.559        2.18
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1)    9.710    6.424        1.51
```

A slight performance loss occurs because the check for whether $\text{mag}$ is nullptr is performed on every calculation instead of once per batch. This is done to reuse the existing `SinCos_32f` function.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

2025-03-17 14:16:09 +03:00
..
cuda	ts: refactor OpenCV tests	2018-02-03 19:39:47 +00:00
opencl	Merge pull request #26115 from savuor:rv/flip_ocl_dtypes	2024-09-06 08:26:00 +03:00
perf_abs.cpp	ts: refactor OpenCV tests	2018-02-03 19:39:47 +00:00
perf_addWeighted.cpp	Merge pull request #12411 from vpisarev:wide_convert	2018-09-06 19:36:59 +03:00
perf_allocation.cpp	Merge pull request #23109 from seanm:misc-warnings	2023-10-06 13:33:21 +03:00
perf_arithm.cpp	impl exp and log \| add log perf test	2025-03-07 17:11:26 +00:00
perf_bitwise.cpp	ts: refactor OpenCV tests	2018-02-03 19:39:47 +00:00
perf_compare.cpp	ts: refactor OpenCV tests	2018-02-03 19:39:47 +00:00
perf_convertTo.cpp	Merge pull request #12411 from vpisarev:wide_convert	2018-09-06 19:36:59 +03:00
perf_cvround.cpp	fast_math: add extra perf/unit tests	2019-08-07 14:59:46 -05:00
perf_dft.cpp	ts: refactor OpenCV tests	2018-02-03 19:39:47 +00:00
perf_dot.cpp	Merge pull request #15510 from seiko2plus:issue15506	2019-10-07 22:01:35 +03:00
perf_flip.cpp	Merge pull request #26943 from GenshinImpactStarts:flip_hal_rvv	2025-02-24 08:56:23 +03:00
perf_inRange.cpp	ts: refactor OpenCV tests	2018-02-03 19:39:47 +00:00
perf_io_base64.cpp	core: disable I/O perf test	2019-02-27 18:07:45 +03:00
perf_lut.cpp	Merge pull request #26941 from GenshinImpactStarts:lut_hal_rvv	2025-03-06 11:17:00 +03:00
perf_main.cpp	Merge pull request #11897 from Jakub-Golinowski:hpx_backend	2018-08-31 16:23:26 +03:00
perf_mat.cpp	Utilize CV_UNUSED macro	2018-09-07 20:33:52 +09:00
perf_math.cpp	Merge pull request #26999 from GenshinImpactStarts:polar_to_cart	2025-03-17 14:16:09 +03:00
perf_merge.cpp	ts: refactor OpenCV tests	2018-02-03 19:39:47 +00:00
perf_minmaxloc.cpp	ts: refactor OpenCV tests	2018-02-03 19:39:47 +00:00
perf_norm.cpp	Merge pull request #27042 from fengyuentau:4x/core/normDiff_simd	2025-03-12 16:43:10 +03:00
perf_precomp.hpp	ts: refactor OpenCV tests	2018-02-03 19:39:47 +00:00
perf_reduce.cpp	Remove useless C headers	2025-01-13 16:34:28 +01:00
perf_sort.cpp	ts: refactor OpenCV tests	2018-02-03 19:39:47 +00:00
perf_split.cpp	Merge pull request #12437 from vpisarev:avx2_fixes	2018-09-06 18:56:55 +03:00
perf_stat.cpp	Merge pull request #22947 from chacha21:hasNonZero	2023-06-09 13:37:20 +03:00
perf_umat.cpp	ts: refactor OpenCV tests	2018-02-03 19:39:47 +00:00