Rewrite Universal Intrinsic code: features2d and calib3d module. #24301
The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro: rewrite them by using the new Universal Intrinsic API.
This is the modification to the features2d module and calib3d module.
Test with clang 16 and QEMU v7.0.0. `AP3P.ctheta1p_nan_23607` failed beacuse of a small calculation error. But this patch does not touch the relevant code, and this error always reproduce on QEMU, regardless of whether the patch is applied or not. I think we can ignore it
```
[ RUN ] AP3P.ctheta1p_nan_23607
/home/hanliutong/project/opencv/modules/calib3d/test/test_solvepnp_ransac.cpp:2319: Failure
Expected: (cvtest::norm(res.colRange(0, 2), expected, NORM_INF)) <= (3e-16), actual: 3.33067e-16 vs 3e-16
[ FAILED ] AP3P.ctheta1p_nan_23607 (26 ms)
...
[==========] 148 tests from 64 test cases ran. (1147114 ms total)
[ PASSED ] 147 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] AP3P.ctheta1p_nan_23607
```
Note: There are 2 test cases failed with GCC 13.2.1 without this patch, seems like there are someting wrong with RVV part on GCC.
```
[----------] Global test environment tear-down
[==========] 148 tests from 64 test cases ran. (1511399 ms total)
[ PASSED ] 146 tests.
[ FAILED ] 2 tests, listed below:
[ FAILED ] Calib3d_StereoSGBM.regression
[ FAILED ] Calib3d_StereoSGBM_HH4.regression
```
The patch is partially auto-generated by using the [rewriter](https://github.com/hanliutong/rewriter).
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
[GSoC] New universal intrinsic backend for RVV
* Add new rvv backend (partially implemented).
* Modify the framework of Universal Intrinsic.
* Add CV_SIMD macro guards to current UI code.
* Use vlanes() instead of nlanes.
* Modify the UI test.
* Enable the new RVV (scalable) backend.
* Remove whitespace.
* Rename and some others modify.
* Update intrin.hpp but still not work on AVX/SSE
* Update conditional compilation macros.
* Use static variable for vlanes.
* Use max_nlanes for array defining.
* use hasSIMD128 rather than calling checkHardwareSupport
* add SIMD check in spartialgradient.cpp
* add SIMD check in stereosgbm.cpp
* add SIMD check in canny.cpp
With a test image set of 2800x1400 bytes on a Intel Core i7 5960X this improves runtime of MODE_HH with about 10%. (this particular replaced code segment is approx 3 times faster than the non-SSE2 variant). I was able to reduce runtime by 130 ms by this simple fix.
The second part of the SSE2 optimized part could probably be optimized further by using shift SSE2 operations, but I imagine this would improve performance 10-20 ms at best.
IPP_VERSION_MAJOR * 100 + IPP_VERSION_MINOR*10 + IPP_VERSION_UPDATE
to manage changes between updates more easily.
IPP_DISABLE_BLOCK was added to ease tracking of disabled IPP functions;
New mode is approximately 2-3 times faster than MODE_SGBM
with minimal degradation in quality and uses universal
HAL intrinsics. A performance test was added. The accuracy
test was updated to support the new mode.
IPP can be switched on and off on runtime;
Optional implementation collector was added (switched off by default in CMake). Gathers data of implementation used in functions and report this info through performance TS;
TS modifications for implementations control;