Commit Graph

119 Commits

Author SHA1 Message Date
HAN Liutong
0dd7769bb1
Merge pull request #23980 from hanliutong:rewrite-core
Rewrite Universal Intrinsic code by using new API: Core module. #23980

The goal of this PR is to match and modify all SIMD code blocks guarded by `CV_SIMD` macro in the `opencv/modules/core` folder and rewrite them by using the new Universal Intrinsic API.

The patch is almost auto-generated by using the [rewriter](https://github.com/hanliutong/rewriter), related PR #23885.

Most of the files have been rewritten, but I marked this PR as draft because, the `CV_SIMD` macro also exists in the following files, and the reasons why they are not rewrited are:

1. ~~code design for fixed-size SIMD (v_int16x8, v_float32x4, etc.), need to manually rewrite.~~ Rewrited
- ./modules/core/src/stat.simd.hpp
- ./modules/core/src/matrix_transform.cpp
- ./modules/core/src/matmul.simd.hpp

2. Vector types are wrapped in other class/struct, that are not supported by the compiler in variable-length backends. Can not be rewrited directly.
- ./modules/core/src/mathfuncs_core.simd.hpp 
```cpp
struct v_atan_f32
{
    explicit v_atan_f32(const float& scale)
    {
...
    }

    v_float32 compute(const v_float32& y, const v_float32& x)
    {
...
    }

...
    v_float32 val90; // sizeless type can not used in a class
    v_float32 val180;
    v_float32 val360;
    v_float32 s;
};
```

3. The API interface does not support/does not match

- ./modules/core/src/norm.cpp 
Use `v_popcount`, ~~waiting for #23966~~ Fixed
- ./modules/core/src/has_non_zero.simd.hpp
Use illegal Universal Intrinsic API: For float type, there is no logical operation `|`. Further discussion needed

```cpp
/** @brief Bitwise OR

Only for integer types. */
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n> operator|(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n>& operator|=(v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
```

```cpp
#if CV_SIMD
    typedef v_float32 v_type;
    const v_type v_zero = vx_setzero_f32();
    constexpr const int unrollCount = 8;
    int step = v_type::nlanes * unrollCount;
    int len0 = len & -step;
    const float* srcSimdEnd = src+len0;

    int countSIMD = static_cast<int>((srcSimdEnd-src)/step);
    while(!res && countSIMD--)
    {
        v_type v0 = vx_load(src);
        src += v_type::nlanes;
        v_type v1 = vx_load(src);
        src += v_type::nlanes;
....
        src += v_type::nlanes;
        v0 |= v1; //Illegal ?
....
        //res = v_check_any(((v0 | v4) != v_zero));//beware : (NaN != 0) returns "false" since != is mapped to _CMP_NEQ_OQ and not _CMP_NEQ_UQ
        res = !v_check_all(((v0 | v4) == v_zero));
    }

    v_cleanup();
#endif
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2023-08-11 08:33:33 +03:00
Alexander Alekhin
1339ebaa84 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2022-03-26 16:00:28 +00:00
Maksim Shabunin
593996216f cartToPolar/polarToCart: disable inplace mode 2022-03-21 16:06:12 +03:00
Alexander Alekhin
cbfd38bd41 core: rework code locality
- to reduce binaries size of FFmpeg Windows wrapper
- MinGW linker doesn't support -ffunction-sections (used for FFmpeg Windows wrapper)
- move code to improve locality with its used dependencies
- move UMat::dot() to matmul.dispatch.cpp (Mat::dot() is already there)
- move UMat::inv() to lapack.cpp
- move UMat::mul() to arithm.cpp
- move UMat:eye() to matrix_operations.cpp (near setIdentity() implementation)
- move normalize(): convert_scale.cpp => norm.cpp
- move convertAndUnrollScalar(): arithm.cpp => copy.cpp
- move scalarToRawData(): array.cpp => copy.cpp
- move transpose(): matrix_operations.cpp => matrix_transform.cpp
- move flip(), rotate(): copy.cpp => matrix_transform.cpp (rotate90 uses flip and transpose)
- add 'OPENCV_CORE_EXCLUDE_C_API' CMake variable to exclude compilation of C-API functions from the core module
- matrix_wrap.cpp: add compile-time checks for CUDA/OpenGL calls
- the steps above allow to reduce FFmpeg wrapper size for ~1.5Mb (initial size of OpenCV part is about 3Mb)

backport is done to improve merge experience (less conflicts)
backport of commit: 65eb946756
2021-03-02 23:24:28 +00:00
Alexander Alekhin
65eb946756 core: rework code locality
- to reduce binaries size of FFmpeg Windows wrapper
- MinGW linker doesn't support -ffunction-sections (used for FFmpeg Windows wrapper)
- move code to improve locality with its used dependencies
- move UMat::dot() to matmul.dispatch.cpp (Mat::dot() is already there)
- move UMat::inv() to lapack.cpp
- move UMat::mul() to arithm.cpp
- move UMat:eye() to matrix_operations.cpp (near setIdentity() implementation)
- move normalize(): convert_scale.cpp => norm.cpp
- move convertAndUnrollScalar(): arithm.cpp => copy.cpp
- move scalarToRawData(): array.cpp => copy.cpp
- move transpose(): matrix_operations.cpp => matrix_transform.cpp
- move flip(), rotate(): copy.cpp => matrix_transform.cpp (rotate90 uses flip and transpose)
- add 'OPENCV_CORE_EXCLUDE_C_API' CMake variable to exclude compilation of C-API functions from the core module
- matrix_wrap.cpp: add compile-time checks for CUDA/OpenGL calls
- the steps above allow to reduce FFmpeg wrapper size for ~1.5Mb (initial size of OpenCV part is about 3Mb)
2021-03-02 11:27:58 +00:00
Nikita Shulga
ec37364762 Use std::atomic in getExpTab32f and getLogTab32f
Reads and writes to volatile bool are not guaranteed to be atomic.
2019-10-07 16:35:07 -07:00
Alexander Alekhin
858a7da5c0 core: rework getContinuousSize() for vector-col/row support 2018-11-10 11:08:28 +00:00
Alexander Alekhin
f185640eda
Merge pull request #12799 from alalek:update_build_js
* js: update build script

- support emscipten 1.38.12 (wasm is ON by default)
- verbose build messages

* js: use builtin Math functions

* js: disable tracing code completelly
2018-10-15 17:35:21 +03:00
Alexander Alekhin
72eccb7694 Merge pull request #12825 from alalek:issue_8413_3.4 2018-10-15 14:23:21 +00:00
Vitaly Tuzov
43d9256096 Replaced core module calls to universal intrinsics with wide universal intrinsics 2018-10-15 11:46:45 +03:00
Alexander Alekhin
c813ad5533 core(ocl): replace ambiguous 'depth' to 'DEPTH_dst'
- always pass DEPTH_dst value to core/arithm kernel
2018-10-14 02:18:04 +00:00
Hamdi Sahloul
5d54def264 Add semicolons after CV_INSTRUMENT macros 2018-09-14 06:45:31 +09:00
Alexander Alekhin
acce95f446 backport fixes for static analyzer warnings
Commits:
- 09837928d9
- 10fb88d027

Excluded changes with std::atomic (C++98 requirement)
2018-09-04 16:49:42 +03:00
Alexander Alekhin
5b3ac112fe core: move const tables outside of dispatched code
To avoid duplicates in binaries
2018-08-08 17:54:54 +03:00
Alexander Alekhin
b09a4a98d4 opencv: Use cv::AutoBuffer<>::data() 2018-07-04 19:11:29 +03:00
Tomoaki Teshima
2a781bb616 remove raw SSE2/NEON implementation from mathfuncs.cpp
* replace the implementation by universal intrinsic
  * make sure no degradation happens on ARM platform
2017-10-15 00:24:31 +09:00
Pavel Vlasov
11c2ffaf1c Update for IPP for OpenCV 2017u2 integration;
Updated integrations for:
cv::split
cv::merge
cv::insertChannel
cv::extractChannel
cv::Mat::convertTo - now with scaled conversions support
cv::LUT - disabled due to performance issues
Mat::copyTo
Mat::setTo
cv::flip
cv::copyMakeBorder - currently disabled
cv::polarToCart
cv::pow - ipp pow function was removed due to performance issues
cv::hal::magnitude32f/64f - disabled for <= SSE42, poor performance
cv::countNonZero
cv::minMaxIdx
cv::norm
cv::canny - new integration. Disabled for threaded;
cv::cornerHarris
cv::boxFilter
cv::bilateralFilter
cv::integral
2017-04-25 15:53:12 +03:00
Pavel Vlasov
30a6cee2fe Instrumentation for OpenCV API regions and IPP functions; 2016-08-19 18:10:03 +03:00
Pavel Vlasov
3860b8db02 IPP was enabled in mathfuncs_core;
Exp and Log IPP implementations are changed to hal interface;
2016-08-12 18:16:04 +03:00
Maksim Shabunin
1e667de1f3 HAL math interfaces: fastAtan2, magnitude, sqrt, invSqrt, log, exp 2016-05-31 11:54:52 +03:00
Maksim Shabunin
84f37d352f HAL moved back to core 2015-12-17 12:33:23 +03:00
Alexander Alekhin
b26580cc7b checkRange fixes
1) fix multichannel support
2) remove useless bad_value, read value from original Mat directly
3) add more tests
4) fix docs for cvCeil and checkRange
2015-12-09 18:31:27 +03:00
Vadim Pisarevsky
d19897b734 Merge pull request #5651 from hoangviet1985:fix_solvePoly_3.0.0 2015-12-07 10:12:54 +00:00
Maksim Shabunin
ddf293a081 Merge pull request #5649 from hoangviet1985:solve_pow(x,3)=0_opencv300 2015-11-22 18:02:40 +00:00
hoangviet1985
3e96b724c2 squash 2015-11-20 15:03:32 -05:00
hoangviet1985
b96def885f squash 2015-11-20 14:48:29 -05:00
Vadim Pisarevsky
3942b1f362 Merge pull request #5340 from alalek:ocl_off 2015-11-10 16:53:36 +00:00
Maksim Shabunin
6e9d0d9a0c Visual Studio 2015 warning and test fixes 2015-10-20 12:48:37 +03:00
Alexander Alekhin
7213e5f68a ocl: correct disabling of OpenCL code 2015-09-13 20:28:23 +03:00
Vadim Pisarevsky
85149f8686 hack solvePoly to finds roots of polynoms with zero higher-order coefficients. The roots are populated in this case, which is not valid, strictly speaking, but good enough for function like correctMatches. This solves http://code.opencv.org/issues/4330 2015-05-25 23:43:39 +03:00
Vadim Pisarevsky
73f760fdf0 some more compile warnings fixed 2015-05-05 18:03:40 +03:00
Vadim Pisarevsky
931a519969 fixed warning in mathfuncs 2015-05-05 17:49:36 +03:00
Vadim Pisarevsky
9fbd1d68ad refactored div & pow funcs; added tests for special cases in pow() function.
fixed http://code.opencv.org/issues/3935
possibly fixed http://code.opencv.org/issues/3594
2015-05-01 21:49:11 +03:00
Vadim Pisarevsky
ee11a2d266 fully implemented SSE and NEON cases of intrin.hpp; extended the HAL with some basic math functions 2015-04-16 23:00:26 +03:00
Vladislav Vinogradov
cda6fed41f move tegra namespace out of cv to prevent conflicts 2015-02-27 12:52:11 +03:00
Vladislav Vinogradov
44e41baffe use new functions before all tegra:: calls 2015-02-26 19:34:58 +03:00
Ilya Lavrenov
6bce6ee34a checks 2015-01-12 10:59:31 +03:00
Ilya Lavrenov
1d3c860411 SinCos_32f 2015-01-12 10:59:31 +03:00
Ilya Lavrenov
fc0869735d used popcnt 2015-01-12 10:59:30 +03:00
Ilya Lavrenov
3a78a22733 convertScaleAbs for s8, f64 2015-01-12 10:59:29 +03:00
Ilya Lavrenov
972ff1d0c4 polarToCart 2015-01-12 10:59:28 +03:00
Ilya Lavrenov
0a5c9cf145 magnitude 64f 2015-01-12 10:59:28 +03:00
Ilya Lavrenov
6ab928fb39 phase 64f 2015-01-12 10:59:28 +03:00
Alexander Karsakov
462c3c25a9 Removed incorrect using of rootn() and powr() in ocl_pow 2014-11-06 16:23:02 +03:00
Ilya Lavrenov
5ca25ab8f0 cv::pow (integer power) 2014-11-01 13:19:51 +03:00
Ilya Lavrenov
ccdc71286c cv::polarToCart 2014-11-01 13:19:51 +03:00
Ilya Lavrenov
d5f006eee5 cv::magnitude; cv::corner** 2014-11-01 13:19:51 +03:00
Ilya Lavrenov
fb97273b3c cv::phase; cv::cartToPolar 2014-11-01 13:19:51 +03:00
Pavel Vlasov
45958eaabc Implementation detector and selector for IPP and OpenCL;
IPP can be switched on and off on runtime;

Optional implementation collector was added (switched off by default in CMake). Gathers data of implementation used in functions and report this info through performance TS;

TS modifications for implementations control;
2014-10-15 14:24:41 +04:00
Ilya Lavrenov
00f16e9178 neon 2014-10-03 08:43:02 +00:00