opencv

mirror of https://github.com/opencv/opencv.git synced 2024-11-24 11:10:21 +08:00


HAN Liutong 0dd7769bb1 Merge pull request #23980 from hanliutong:rewrite-core	2023-08-11 08:33:33 +03:00

Rewrite Universal Intrinsic code by using new API: Core module. #23980

The goal of this PR is to match all SIMD code blocks guarded by the `CV_SIMD` macro in the `opencv/modules/core` folder and rewrite them using the new Universal Intrinsic API. The patch is almost entirely auto-generated by the [rewriter](https://github.com/hanliutong/rewriter); see the related PR #23885.

Most of the files have been rewritten. This PR is still marked as a draft because the `CV_SIMD` macro also appears in the files below, which were not rewritten for the following reasons:

1. ~~Code designed for fixed-size SIMD (v_int16x8, v_float32x4, etc.) needs to be rewritten manually.~~ Rewritten
   - ./modules/core/src/stat.simd.hpp
   - ./modules/core/src/matrix_transform.cpp
   - ./modules/core/src/matmul.simd.hpp
2. Vector types are wrapped in another class/struct. For variable-length backends the compiler does not support this, so these files cannot be rewritten directly.
   - ./modules/core/src/mathfuncs_core.simd.hpp

```cpp
struct v_atan_f32
{
    explicit v_atan_f32(const float& scale)
    {
...
    }
    v_float32 compute(const v_float32& y, const v_float32& x)
    {
...
    }
...
    v_float32 val90; // sizeless type cannot be used in a class
    v_float32 val180;
    v_float32 val360;
    v_float32 s;
};
```

3. The API does not support (or does not match) the usage
   - ./modules/core/src/norm.cpp: uses `v_popcount`, ~~waiting for #23966~~ Fixed
   - ./modules/core/src/has_non_zero.simd.hpp: uses an illegal Universal Intrinsic API. There is no bitwise OR operation `|` for the float type. Further discussion needed.

```cpp
/** @brief Bitwise OR

Only for integer types. */
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n> operator|(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n>& operator|=(v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
```

```cpp
#if CV_SIMD
    typedef v_float32 v_type;
    const v_type v_zero = vx_setzero_f32();
    constexpr const int unrollCount = 8;
    int step = v_type::nlanes * unrollCount;
    int len0 = len & -step;
    const float* srcSimdEnd = src+len0;

    int countSIMD = static_cast<int>((srcSimdEnd-src)/step);
    while(!res && countSIMD--)
    {
        v_type v0 = vx_load(src);
        src += v_type::nlanes;
        v_type v1 = vx_load(src);
        src += v_type::nlanes;
        ....
        src += v_type::nlanes;
        v0 |= v1; //Illegal ?
        ....
        //res = v_check_any(((v0 | v4) != v_zero));//beware : (NaN != 0) returns "false" since != is mapped to _CMP_NEQ_OQ and not _CMP_NEQ_UQ
        res = !v_check_all(((v0 | v4) == v_zero));
    }

    v_cleanup();
#endif
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There are accuracy tests, performance tests and test data in the opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
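Item 2 above notes that `v_atan_f32` keeps `v_float32` values as data members, and a sizeless vector type cannot be a class member. One possible workaround, sketched here as an assumption rather than as what OpenCV adopted, is to store the constants as plain `float` members and expand them into vectors inside `compute()`. The sketch is portable C++. `vec4f`, `broadcast` and `scale_offset_f32` are hypothetical names, and `std::array` is only a mock that stands in for a real SIMD register.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Mock SIMD register. On a scalable backend the real vector type is
// "sizeless" and could not be a data member; std::array merely stands in.
using vec4f = std::array<float, 4>;

// Hypothetical helper: fill every lane with the same scalar.
static vec4f broadcast(float x) {
    vec4f v;
    v.fill(x);
    return v;
}

// Hypothetical functor in the style of v_atan_f32. Constants are kept as
// scalars, so the struct never holds a vector-typed member.
struct scale_offset_f32 {
    float scale;
    float offset;
    scale_offset_f32(float s, float o) : scale(s), offset(o) {}

    // Vectors are materialized locally on each call, where sizeless
    // types would be allowed (automatic variables).
    vec4f compute(const vec4f& x) const {
        const vec4f vs = broadcast(scale);
        const vec4f vo = broadcast(offset);
        vec4f r{};
        for (std::size_t k = 0; k < r.size(); ++k)
            r[k] = x[k] * vs[k] + vo[k];  // lane-wise x * scale + offset
        return r;
    }
};
```

The trade-off is an extra broadcast per call. A compiler can often hoist that broadcast out of a loop, but whether it does depends on the target.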
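The `has_non_zero` snippet ORs float vectors, but Universal Intrinsic documents bitwise OR only for integer types. A common alternative is to reinterpret the float lanes as 32-bit integers and OR those. That raises a subtlety. The original code compares the OR result as a float against `v_zero`, and the float comparison treats `-0.0f == 0.0f`. Once the bits are tested as integers, however, `-0.0f` (`0x80000000`) looks non-zero unless the sign bit is masked off. The scalar sketch below emulates that approach. `has_non_zero_f32`, `float_bits` and the fixed width `kLanes = 4` are illustrative assumptions, not OpenCV code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits as uint32 (memcpy avoids strict-aliasing UB).
static inline std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Returns true if any element of src[0..len) is non-zero.
// +0.0f and -0.0f count as zero; NaN and denormals count as non-zero.
bool has_non_zero_f32(const float* src, std::size_t len) {
    assert(src != nullptr || len == 0);
    constexpr std::size_t kLanes = 4;       // stand-in for the vector width
    constexpr std::uint32_t kNoSign = 0x7FFFFFFFu;
    std::size_t i = 0;
    for (; i + kLanes <= len; i += kLanes) {
        std::uint32_t acc = 0;
        for (std::size_t k = 0; k < kLanes; ++k)
            acc |= float_bits(src[i + k]);  // integer OR is well defined
        if ((acc & kNoSign) != 0u)          // ignore the sign bit of -0.0f
            return true;
    }
    for (; i < len; ++i)                    // scalar tail
        if ((float_bits(src[i]) & kNoSign) != 0u)
            return true;
    return false;
}
```

ORing the bits can only set bits. The masked accumulator is therefore zero exactly when every input is `+0.0f` or `-0.0f`, so a single test per block is enough.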
calib3d	Merge pull request #24035 from vrabaud:calibration	2023-07-27 19:36:33 +03:00
core	Merge pull request #23980 from hanliutong:rewrite-core	2023-08-11 08:33:33 +03:00
dnn	Merge pull request #24122 from fengyuentau:remove_tengine	2023-08-09 09:26:02 +03:00
features2d	Merge remote-tracking branch 'origin/3.4' into merge-3.4	2023-05-24 14:37:48 +03:00
flann	Merge pull request #24028 from VadimLevin:dev/vlevin/fix-flann-python-bindings	2023-07-21 12:44:56 +03:00
gapi	Merge pull request #24059 from TolyaTalamanov:at/add-onnx-cuda-execution-provider	2023-08-02 14:13:07 +03:00
highgui	Use OpenCV logging instead of std::cerr.	2023-07-19 10:49:54 +03:00
imgcodecs	Use OpenCV logging instead of std::cerr.	2023-07-19 10:49:54 +03:00
imgproc	Merge pull request #24042 from vrabaud:circle	2023-07-26 20:00:22 +03:00
java	build: w/a compiler warnings for GCC 11-12 and Clang 13, reduce build output	2023-07-10 11:27:59 +03:00
js	if browser supports wasm but only asm.js path provided use asm.js as fallback	2023-06-17 09:38:57 +03:00
ml	Merge remote-tracking branch 'origin/3.4' into merge-3.4	2023-04-21 10:55:04 +03:00
objc	Backport 5.x: Support for module names that start from digit in ObjC bindings generator.	2023-05-25 11:45:59 +03:00
objdetect	update aruco bytesList docs	2023-07-13 13:50:07 +03:00
photo	Deprecated convertTypeStr and made new variant that also takes the buffer size	2023-04-26 09:48:15 -04:00
python	cuda: Fix GpuMat::copyTo and GpuMat::convertTo python bindings	2023-08-01 15:09:37 +03:00
stitching	Merge pull request #23740 from Peekabooc:4.x	2023-06-09 13:40:02 +03:00
ts	cuda: add SkipTestException handling	2023-07-17 18:03:40 +03:00
video	Warning suppression fix for Xcode 13.1 and newer. Backport #23203	2023-02-06 11:12:05 +03:00
videoio	Merge pull request #24133 from alexlyulkov:al/fixed-msmf-webcam	2023-08-10 11:48:38 +03:00
world	cmake: VERSION_GREATER_EQUAL is not supported in CMake 3.5.1	2022-12-26 17:41:53 +00:00