Commit Graph

74 Commits

Author SHA1 Message Date
Alexander Smorkalov
daa8f7dfc6 Partially back-port #25075 to 4.x 2024-03-05 12:15:39 +03:00
HAN Liutong
07bf9cb013
Merge pull request #24325 from hanliutong:rewrite
Rewrite Universal Intrinsic code: float related part #24325

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro: rewrite them by using the new Universal Intrinsic API.

The series of PRs is listed below:
#23885 First patch, an example
#23980 Core module
#24058 ImgProc module, part 1
#24132 ImgProc module, part 2
#24166 ImgProc module, part 3
#24301 Features2d and calib3d module
#24324 Gapi module

This patch (hopefully) is the last one in the series. 

This patch mainly involves 3 parts
1. Add some modifications related to float (CV_SIMD_64F)
2. Use `#if (CV_SIMD || CV_SIMD_SCALABLE)` instead of `#if CV_SIMD || CV_SIMD_SCALABLE`,
    so that the `CV_SIMD` blocks that are not yet enabled for `CV_SIMD_SCALABLE` can be found by searching for `if CV_SIMD`
3. Summary of `CV_SIMD` blocks that remain unmodified: Updated comments
    - Some blocks cause test failures when enabled for RVV, marked as `TODO: enable for CV_SIMD_SCALABLE, ....`
    - Some blocks cannot be rewritten directly. (Not commented in the source code, just listed here)
      - ./modules/core/src/mathfuncs_core.simd.hpp (Vector type wrapped in class/struct)
      - ./modules/imgproc/src/color_lab.cpp (Array of vector type)
      - ./modules/imgproc/src/color_rgb.simd.hpp (Array of vector type)
      - ./modules/imgproc/src/sumpixels.simd.hpp (fixed-length algorithm, strongly related to `CV_SIMD_WIDTH`)
      These algorithms will need to be redesigned to accommodate scalable backends.
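
The effect of the parenthesized guard in part 2 on a plain-text search can be illustrated with a small sketch. The helper name `matchesFixedOnlyGuard` is hypothetical and only mimics a substring search for `if CV_SIMD`; it is not part of OpenCV or of the rewriter tool.

```cpp
#include <string>

// Hypothetical helper: mimics a plain-text search for "if CV_SIMD".
// A guard written as "#if (CV_SIMD || CV_SIMD_SCALABLE)" has "(" right after
// "if ", so it no longer contains the search string, while guards that
// start with "#if CV_SIMD" still do.
inline bool matchesFixedOnlyGuard(const std::string& line)
{
    return line.find("if CV_SIMD") != std::string::npos;
}
```

With this convention, the search hits that remain are the guards that still start with `#if CV_SIMD`, which is what makes the unconverted blocks easy to list.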

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2023-10-05 17:57:25 +03:00
HAN Liutong
0dd7769bb1
Merge pull request #23980 from hanliutong:rewrite-core
Rewrite Universal Intrinsic code by using new API: Core module. #23980

The goal of this PR is to match and modify all SIMD code blocks guarded by `CV_SIMD` macro in the `opencv/modules/core` folder and rewrite them by using the new Universal Intrinsic API.

The patch is almost auto-generated by using the [rewriter](https://github.com/hanliutong/rewriter), related PR #23885.

Most of the files have been rewritten, but I marked this PR as a draft because the `CV_SIMD` macro still exists in the following files. The reasons they have not been rewritten are:

1. ~~Code designed for fixed-size SIMD (v_int16x8, v_float32x4, etc.), needs to be rewritten manually.~~ Rewritten
- ./modules/core/src/stat.simd.hpp
- ./modules/core/src/matrix_transform.cpp
- ./modules/core/src/matmul.simd.hpp

2. Vector types are wrapped in another class/struct, which the compiler does not support in variable-length backends. Cannot be rewritten directly.
- ./modules/core/src/mathfuncs_core.simd.hpp 
```cpp
struct v_atan_f32
{
    explicit v_atan_f32(const float& scale)
    {
...
    }

    v_float32 compute(const v_float32& y, const v_float32& x)
    {
...
    }

...
    v_float32 val90; // sizeless type cannot be used as a class member
    v_float32 val180;
    v_float32 val360;
    v_float32 s;
};
```
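
One possible way around the member restriction, given here only as a sketch and not as what this PR does, is to store scalars in the struct and broadcast them to vectors inside the member function, so that vector values exist only as locals. `Vec4`, `splat`, `mul` and `ScaleByMember` are hypothetical stand-ins (roughly for a vector type, a broadcast and a lane-wise multiply); they are not OpenCV names.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a SIMD vector type (fixed at 4 lanes here).
struct Vec4
{
    std::array<float, 4> lane;
};

// Broadcast one scalar to every lane.
inline Vec4 splat(float x)
{
    Vec4 v;
    v.lane.fill(x);
    return v;
}

// Lane-wise multiplication.
inline Vec4 mul(const Vec4& a, const Vec4& b)
{
    Vec4 r;
    for (std::size_t i = 0; i < r.lane.size(); ++i)
        r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

// The struct keeps only a scalar; the vector is created as a local
// variable inside compute(), so no vector type is a class member.
struct ScaleByMember
{
    explicit ScaleByMember(float scale) : s(scale) {}

    Vec4 compute(const Vec4& x) const
    {
        const Vec4 vs = splat(s); // broadcast on each call
        return mul(x, vs);
    }

    float s; // scalar member instead of a vector member
};
```

The trade-off is one broadcast per call instead of one per object, which may or may not matter depending on how often `compute` is invoked.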

3. The API interface does not support/does not match

- ./modules/core/src/norm.cpp 
Uses `v_popcount`, ~~waiting for #23966~~ Fixed
- ./modules/core/src/has_non_zero.simd.hpp
Uses an illegal Universal Intrinsic API: for float types there is no bitwise OR operator `|`. Further discussion needed

```cpp
/** @brief Bitwise OR

Only for integer types. */
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n> operator|(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n>& operator|=(v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
```

```cpp
#if CV_SIMD
    typedef v_float32 v_type;
    const v_type v_zero = vx_setzero_f32();
    constexpr const int unrollCount = 8;
    int step = v_type::nlanes * unrollCount;
    int len0 = len & -step;
    const float* srcSimdEnd = src+len0;

    int countSIMD = static_cast<int>((srcSimdEnd-src)/step);
    while(!res && countSIMD--)
    {
        v_type v0 = vx_load(src);
        src += v_type::nlanes;
        v_type v1 = vx_load(src);
        src += v_type::nlanes;
....
        src += v_type::nlanes;
        v0 |= v1; //Illegal ?
....
        //res = v_check_any(((v0 | v4) != v_zero));//beware : (NaN != 0) returns "false" since != is mapped to _CMP_NEQ_OQ and not _CMP_NEQ_UQ
        res = !v_check_all(((v0 | v4) == v_zero));
    }

    v_cleanup();
#endif
```
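
For discussion, a scalar sketch of one way to avoid OR-ing float values directly: treat each float as its 32-bit integer pattern and accumulate with an integer OR. This assumes IEEE-754 32-bit floats; the function name `hasNonZeroBits` is hypothetical and not part of OpenCV. A vectorized version would presumably reinterpret the float vector as an integer vector first (for example with `v_reinterpret_as_u32`), but that is an assumption about a possible fix, not a description of the merged code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes 32-bit float");

// Hypothetical helper: returns true if any element of src[0..len) is
// not zero. Bit patterns are OR-ed as integers, so no bitwise operator
// is ever applied to a float value.
inline bool hasNonZeroBits(const float* src, std::size_t len)
{
    std::uint32_t acc = 0;
    for (std::size_t i = 0; i < len; ++i)
    {
        std::uint32_t bits;
        std::memcpy(&bits, &src[i], sizeof(bits)); // well-defined type pun
        acc |= bits & 0x7FFFFFFFu; // drop the sign bit so -0.0f counts as zero
    }
    return acc != 0;
}
```

Under the IEEE-754 assumption a NaN has a non-zero exponent field, so it is reported as non-zero, which matches the intent of the `!v_check_all(... == v_zero)` form above.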

2023-08-11 08:33:33 +03:00
yuki takehara
a6277370ca
Merge pull request #21107 from take1014:remove_assert_21038
resolves #21038

* remove C assert

* revert C header

* fix several points in review

* fix test_ds.cpp
2021-11-27 18:34:52 +00:00
Alexander Alekhin
cbfd38bd41 core: rework code locality
- to reduce binaries size of FFmpeg Windows wrapper
- MinGW linker doesn't support -ffunction-sections (used for FFmpeg Windows wrapper)
- move code to improve locality with its used dependencies
- move UMat::dot() to matmul.dispatch.cpp (Mat::dot() is already there)
- move UMat::inv() to lapack.cpp
- move UMat::mul() to arithm.cpp
- move UMat::eye() to matrix_operations.cpp (near setIdentity() implementation)
- move normalize(): convert_scale.cpp => norm.cpp
- move convertAndUnrollScalar(): arithm.cpp => copy.cpp
- move scalarToRawData(): array.cpp => copy.cpp
- move transpose(): matrix_operations.cpp => matrix_transform.cpp
- move flip(), rotate(): copy.cpp => matrix_transform.cpp (rotate90 uses flip and transpose)
- add 'OPENCV_CORE_EXCLUDE_C_API' CMake variable to exclude compilation of C-API functions from the core module
- matrix_wrap.cpp: add compile-time checks for CUDA/OpenGL calls
- the steps above reduce the FFmpeg wrapper size by ~1.5 MB (the initial size of the OpenCV part is about 3 MB)

backport is done to improve merge experience (fewer conflicts)
backport of commit: 65eb946756
2021-03-02 23:24:28 +00:00
Alexander Alekhin
e180cc050b
Merge pull request #16236 from alalek:fix_core_simd_emulator
* core: fix intrin_cpp, allow to build modules with SIMD emulator

* core(arithm): fix v_zero initialization

* core(simd): 'strict' types for binary/bitwise operations

* features2d: avoid aligned load issue in GCC 5.4 with emulated SIMD

* core(simd): alignment checks in SIMD emulator
2020-01-10 21:31:02 +03:00
Vitaly Tuzov
43d9256096 Replaced core module calls to universal intrinsics with wide universal intrinsics 2018-10-15 11:46:45 +03:00
Vitaly Tuzov
283348afc3 SSE2 code in invert() replaced with universal intrinsics 2018-10-02 12:47:07 +03:00
Hamdi Sahloul
5d54def264 Add semicolons after CV_INSTRUMENT macros 2018-09-14 06:45:31 +09:00
Hamdi Sahloul
03b3be0f51 MSVC: Silence external/meaningless warnings 2018-09-12 20:02:13 +09:00
Alexander Alekhin
5385086fef core: solve(): add check for passed 'method' values 2018-07-13 15:15:48 +03:00
Alexander Alekhin
b09a4a98d4 opencv: Use cv::AutoBuffer<>::data() 2018-07-04 19:11:29 +03:00
woody.chow
611cf8d86f Use Eigen::SelfAdjointEigenSolver in cv::eigen 2017-12-05 02:40:55 +03:00
Tomoaki Teshima
fd711219a2 use universal intrinsic in VBLAS
- brush up v_reduce_sum of SSE version
2017-01-31 05:36:27 +09:00
Vladislav Sovrasov
dfe4519c07 Add QR decomposition to HAL 2016-09-05 18:20:04 +03:00
Pavel Vlasov
30a6cee2fe Instrumentation for OpenCV API regions and IPP functions; 2016-08-19 18:10:03 +03:00
Vladislav Sovrasov
a2d0cc878c Implement internal HAL for GEMM and matrix decompositions 2016-06-03 10:38:30 +03:00
Maksim Shabunin
84f37d352f HAL moved back to core 2015-12-17 12:33:23 +03:00
Vadim Pisarevsky
d2aaa70e93 removed HAL calls from public OpenCV headers; put IPP calls back to hal::sqrt() and such (but they are disabled for now) 2015-05-22 16:04:10 +03:00
Vadim Pisarevsky
9fbd1d68ad refactored div & pow funcs; added tests for special cases in pow() function.
fixed http://code.opencv.org/issues/3935
possibly fixed http://code.opencv.org/issues/3594
2015-05-01 21:49:11 +03:00
Vadim Pisarevsky
7918267d02 fixed U non-orthogonality in SVD (http://code.opencv.org/issues/3801) 2015-04-29 16:09:58 +03:00
Vadim Pisarevsky
ee11a2d266 fully implemented SSE and NEON cases of intrin.hpp; extended the HAL with some basic math functions 2015-04-16 23:00:26 +03:00
Dmitry-Me
8ed4bae4dd Reduce variable scope, make formatting consistent with surrounding code 2015-03-14 12:50:42 +03:00
Adil Ibragimov
8a4a1bb018 Several types of formal refactoring:
1. someMatrix.data -> someMatrix.ptr()
2. someMatrix.data + someMatrix.step * lineIndex -> someMatrix.ptr( lineIndex )
3. (SomeType*) someMatrix.data -> someMatrix.ptr<SomeType>()
4. someMatrix.data -> !someMatrix.empty() ( or !someMatrix.data -> someMatrix.empty() ) in logical expressions
2014-08-13 15:21:35 +04:00
Adil Ibragimov
98d5731ad8 some formal changes (generally adding constness) 2014-08-07 15:49:14 +04:00
Vadim Pisarevsky
ba3783d205 initial commit; ml has been refactored; it compiles and the tests run well; some other modules, apps and samples do not compile; to be fixed 2014-07-29 23:54:23 +04:00
Roman Donchenko
2c4bbb313c Merge commit '43aec5ad' into merge-2.4
Conflicts:
	cmake/OpenCVConfig.cmake
	cmake/OpenCVLegacyOptions.cmake
	modules/contrib/src/retina.cpp
	modules/gpu/doc/camera_calibration_and_3d_reconstruction.rst
	modules/gpu/doc/video.rst
	modules/gpu/src/speckle_filtering.cpp
	modules/python/src2/cv2.cv.hpp
	modules/python/test/test2.py
	samples/python/watershed.py
2013-08-27 13:26:44 +04:00
Roman Donchenko
e9a28f66ee Normalized file endings. 2013-08-21 18:59:25 +04:00
Andrey Kamaev
e27f4da9c6 Merge pull request #795 from taka-no-me:move_imgproc_utils_to_core 2013-04-11 11:35:15 +04:00
Andrey Kamaev
c98c246fc2 Move border type constants and Moments class to core module 2013-04-10 19:14:24 +04:00
Andrey Kamaev
b0e6606b98 Cleanup core module API
* Drop some low level API
* Remove outdated overloads
* Utilize Input/OutputArray
2013-04-09 13:36:32 +04:00
Andrey Kamaev
67073daf19 Merge branch '2.4' 2013-04-05 21:11:59 +04:00
Andrey Kamaev
235a678458 SVD: always update W vector for better algorithm convergence 2013-04-04 13:55:36 +04:00
Andrey Kamaev
715fa3303e Move cv::Mat out of core.hpp 2013-04-01 15:24:34 +04:00
Andrey Kamaev
1ca8f33b4e Merge branch '2.4' 2013-03-21 23:11:54 +04:00
Vadim Pisarevsky
9a86245242 added test for bug #1448 and hopefully fixes the bug #2898 2013-03-20 11:58:19 +04:00
Andrey Kamaev
55698548dd Avoid assert in lapack.cpp if findHomography fails in BestOf2NearestMatcher::match 2013-03-12 22:49:40 +04:00
Andrey Kamaev
ab221e94c0 Fix invert under MSVC 2013-02-26 11:16:57 +04:00
Vadim Pisarevsky
416432a8e5 replaced tabs with spaces 2013-02-25 23:10:38 +04:00
Vadim Pisarevsky
087537463d attempt to make the ultimate fix for the failure in Core_Invert.small 2013-02-25 22:46:30 +04:00
Vadim Pisarevsky
b57e801c04 now invert 3x3 on "bad" matrices works well on Windows 2012-11-28 23:05:51 +04:00
Vadim Pisarevsky
9163471987 improved accuracy of 3x3 invert on poorly-conditioned matrices (bug #2525) 2012-11-08 14:09:43 +04:00
OpenCV Buildbot
04384a71e4 Normalize line endings and whitespace 2012-10-17 15:32:23 +04:00
Vadim Pisarevsky
4b5f948307 added SSE2-optimized 3x3 invert by Grigoriy Frolov 2012-08-07 17:59:52 +04:00
Vadim Pisarevsky
fac3d9994c integrated another portion of SSE optimizations from Grigory Frolov 2012-07-31 19:07:55 +04:00
Vadim Pisarevsky
b782d8bb53 integrated patch with some SSE2/SSE4.2 optimizations from Grigory Frolov 2012-07-24 17:24:31 +04:00
Vadim Pisarevsky
82cb2ab556 fixed bug in SVD, ticket #2027; fixed building highgui with ffmpeg support on MacOSX 2012-06-28 19:45:13 +00:00
Andrey Kamaev
3108423a37 Fixed assert placement in cv::invert 2012-05-23 09:28:26 +00:00
Victoria Zhislina
fbdb93ec79 CV_ENABLE_UNROLLED 2012-02-10 06:05:04 +00:00
Vadim Pisarevsky
dbfa8408d2 fixed potential bug in cv::eigen() 2012-01-26 19:41:59 +00:00