Commit Graph

134 Commits

Author SHA1 Message Date
Yuantao Feng
3afe8ddaf8
core: Rename cv::float16_t to cv::hfloat (#25217)
* rename cv::float16_t to cv::fp16_t

* add typedef fp16_t float16_t

* remove zero(), bits() from fp16_t class

* fp16_t -> hfloat

* remove cv::float16_t::fromBits; add hfloatFromBits

* undo changes in conv_winograd_f63.simd.hpp and conv_block.simd.hpp

* undo some changes in dnn
2024-03-21 23:44:19 +03:00
Alexander Smorkalov
daa8f7dfc6 Partially back-port #25075 to 4.x 2024-03-05 12:15:39 +03:00
Vincent Rabaud
3880d059b3
Merge pull request #24260 from vrabaud:ubsan
Fix undefined behavior arithmetic in copyMakeBorder and adjustROI. #24260

This is due to the undefined: negative int multiplied by size_t pointer increment.

To test, compile with:
```
mkdir build
cd build
cmake ../ -DCMAKE_C_FLAGS_INIT="-fsanitize=undefined" -DCMAKE_CXX_FLAGS_INIT="-fsanitize=undefined" -DCMAKE_C_COMPILER="/usr/bin/clang" -DCMAKE_CXX_COMPILER="/usr/bin/clang++" -DCMAKE_SHARED_LINKER_FLAGS="-fsanitize=undefined -lubsan"
```
And run:
```
make -j opencv_test_core && ./bin/opencv_test_core --gtest_filter=*UndefinedBehavior*
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-09-14 15:16:28 +03:00
HAN Liutong
0dd7769bb1
Merge pull request #23980 from hanliutong:rewrite-core
Rewrite Universal Intrinsic code by using new API: Core module. #23980

The goal of this PR is to match and modify all SIMD code blocks guarded by `CV_SIMD` macro in the `opencv/modules/core` folder and rewrite them by using the new Universal Intrinsic API.

The patch is almost auto-generated by using the [rewriter](https://github.com/hanliutong/rewriter), related PR #23885.

Most of the files have been rewritten, but I marked this PR as draft because, the `CV_SIMD` macro also exists in the following files, and the reasons why they are not rewrited are:

1. ~~code design for fixed-size SIMD (v_int16x8, v_float32x4, etc.), need to manually rewrite.~~ Rewrited
- ./modules/core/src/stat.simd.hpp
- ./modules/core/src/matrix_transform.cpp
- ./modules/core/src/matmul.simd.hpp

2. Vector types are wrapped in other class/struct, that are not supported by the compiler in variable-length backends. Can not be rewrited directly.
- ./modules/core/src/mathfuncs_core.simd.hpp 
```cpp
struct v_atan_f32
{
    explicit v_atan_f32(const float& scale)
    {
...
    }

    v_float32 compute(const v_float32& y, const v_float32& x)
    {
...
    }

...
    v_float32 val90; // sizeless type can not used in a class
    v_float32 val180;
    v_float32 val360;
    v_float32 s;
};
```

3. The API interface does not support/does not match

- ./modules/core/src/norm.cpp 
Use `v_popcount`, ~~waiting for #23966~~ Fixed
- ./modules/core/src/has_non_zero.simd.hpp
Use illegal Universal Intrinsic API: For float type, there is no logical operation `|`. Further discussion needed

```cpp
/** @brief Bitwise OR

Only for integer types. */
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n> operator|(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n>& operator|=(v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
```

```cpp
#if CV_SIMD
    typedef v_float32 v_type;
    const v_type v_zero = vx_setzero_f32();
    constexpr const int unrollCount = 8;
    int step = v_type::nlanes * unrollCount;
    int len0 = len & -step;
    const float* srcSimdEnd = src+len0;

    int countSIMD = static_cast<int>((srcSimdEnd-src)/step);
    while(!res && countSIMD--)
    {
        v_type v0 = vx_load(src);
        src += v_type::nlanes;
        v_type v1 = vx_load(src);
        src += v_type::nlanes;
....
        src += v_type::nlanes;
        v0 |= v1; //Illegal ?
....
        //res = v_check_any(((v0 | v4) != v_zero));//beware : (NaN != 0) returns "false" since != is mapped to _CMP_NEQ_OQ and not _CMP_NEQ_UQ
        res = !v_check_all(((v0 | v4) == v_zero));
    }

    v_cleanup();
#endif
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2023-08-11 08:33:33 +03:00
Alexander Alekhin
65eb946756 core: rework code locality
- to reduce binaries size of FFmpeg Windows wrapper
- MinGW linker doesn't support -ffunction-sections (used for FFmpeg Windows wrapper)
- move code to improve locality with its used dependencies
- move UMat::dot() to matmul.dispatch.cpp (Mat::dot() is already there)
- move UMat::inv() to lapack.cpp
- move UMat::mul() to arithm.cpp
- move UMat:eye() to matrix_operations.cpp (near setIdentity() implementation)
- move normalize(): convert_scale.cpp => norm.cpp
- move convertAndUnrollScalar(): arithm.cpp => copy.cpp
- move scalarToRawData(): array.cpp => copy.cpp
- move transpose(): matrix_operations.cpp => matrix_transform.cpp
- move flip(), rotate(): copy.cpp => matrix_transform.cpp (rotate90 uses flip and transpose)
- add 'OPENCV_CORE_EXCLUDE_C_API' CMake variable to exclude compilation of C-API functions from the core module
- matrix_wrap.cpp: add compile-time checks for CUDA/OpenGL calls
- the steps above allow to reduce FFmpeg wrapper size for ~1.5Mb (initial size of OpenCV part is about 3Mb)
2021-03-02 11:27:58 +00:00
Federico Martinez
773262bc09 Fix UB in CopyMakeConstBoder_8u
Caused by overflow of arithmetic operators conversion rank
2021-02-26 19:15:50 +00:00
Or Avital
5a3a915a9b Remove unnecessary condition (will never reach) 2020-11-22 14:19:20 +02:00
nhlsm
68f527267b
Merge pull request #18080 from nhlsm:improve-mat-operator-assign-scalar
* improve Mat::operator=(Scalar)

* touch

* remove trailing whitespace

* TEST: check if old code pass test or not

* remove CV_Error

* remove warning

* fix: is -> Scalar

* 1) Mat *mat -> Mat &mat 2) return bool, add output param

* add comment
2020-08-14 17:21:23 +00:00
Vadim Pisarevsky
07b475062f
Merge pull request #16608 from vpisarev:fix_mac_ocl_tests
* fixed several problems when running tests on Mac:
* OCL_pyrUp
* OCL_flip
* some basic UMat tests
* histogram badarg test (out of range access)

* retained the storepix fix in ocl_flip only for 16U/16S datatype, where the OpenCL compiler on Mac generates incorrect code

* moved deletion of ACCESS_FAST flag to non-SVM branch (where SVM is shared virtual memory (in OpenCL 2.x), not support vector machine)

* force OpenCL to use read/write for GPU<=>CPU memory transfers on machines with discrete video only on Macs. On Windows/Linux the drivers are seemingly smart enough to implement map/unmap properly (and maybe more efficiently than explicit read/write)
2020-02-21 16:13:41 +03:00
Alexander Alekhin
a4bd7506a5 core: CV_STRONG_ALIGNMENT macro
Should be used to guard unsafe type casts of pointers
2020-01-29 18:44:17 +03:00
Alexander Alekhin
8d22ac200f core: workaround flipHoriz() alignment issues 2019-12-19 00:05:23 +00:00
Chip Kerchner
ed7e4273cd Merge pull request #15555 from ChipKerchner:flipVectorize
* Vectorize flipHoriz and flipVert functions.

* Change v_load_mirror_1 to use vec_revb for VSX

* Only use vec_revb in ISA3.0

* Removing vec_revb code since some of the older compilers don't fully support it.

* Use new v_reverse intrinsic and cleanup code.

* Ensure there are no alignment issues with copies
2019-11-01 22:30:48 +03:00
Alexander Alekhin
6a7d1c15d3 core(ipp): skip huge input in flip()
- IPP/SSE4.2 works well
2019-10-14 18:26:19 +03:00
Suleyman TURKMEN
c0489963bb
Update copy.cpp 2019-10-07 11:59:52 +03:00
Alexander Alekhin
d6b82dcd65
Merge pull request #14162 from alalek:eliminate_coverity_scan_issues
core: eliminate coverity scan issues (#14162)

* core(hal): avoid using of r,g,b,a parameters in interleave/deinterleave

- static analysis tools blame on possible parameters reordering
- align AVX parameters with corresponding SSE/NEO/VSX/cpp code

* core: avoid "i,j" parameters in Matx methods

- static analysis tools blame on possible parameters reordering

* core: resolve coverity scan issues
2019-03-27 15:48:00 +03:00
berak
96c99c716a Merge pull request #13193 from berak:core_copyMakeBorder 2018-11-17 13:19:42 +03:00
Alexander Alekhin
858a7da5c0 core: rework getContinuousSize() for vector-col/row support 2018-11-10 11:08:28 +00:00
Alexander Alekhin
5059523937 core: fix processing of vector-rows 2018-11-08 20:04:22 +03:00
maver1
e397434cb6 Merge pull request #12877 from maver1:3.4
* Updated ICV packages and IPP integration

* core(test): minMaxIdx IPP regression test

* core(ipp): workaround minMaxIdx problem

* core(ipp): workaround meanStdDev() CV_32FC3 buffer overrun

* Returned semicolon after CV_INSTRUMENT_REGION_IPP()
2018-10-24 15:02:53 +03:00
Michał Janiszewski
c8e6ce304f Catch exceptions by const-reference
Exceptions caught by value incur needless cost in C++, most of them can
be caught by const-reference, especially as nearly none are actually
used. This could allow compiler generate a slightly more efficient code.
2018-10-16 22:43:54 +02:00
Vitaly Tuzov
43d9256096 Replaced core module calls to universal intrinsics with wide universal intrinsics 2018-10-15 11:46:45 +03:00
Hamdi Sahloul
ecc9bd0925 Support GpuMat in copyTo() functions 2018-09-17 23:43:14 +09:00
Hamdi Sahloul
5d54def264 Add semicolons after CV_INSTRUMENT macros 2018-09-14 06:45:31 +09:00
cyy
8b48c2a10c Merge pull request #12443 from DEEPIR:master
* simplify condition

* dims must > 0 or latter sz[dims-1] will underflow
2018-09-06 23:09:39 +03:00
Alexander Alekhin
acce95f446 backport fixes for static analyzer warnings
Commits:
- 09837928d9
- 10fb88d027

Excluded changes with std::atomic (C++98 requirement)
2018-09-04 16:49:42 +03:00
Maksim Shabunin
1165fdd0f5 Added more strict checks for empty inputs to compare, meanStdDev and RNG::fill 2018-07-26 18:06:38 +03:00
Maksim Shabunin
c473718bc2 Check for empty Mat in compare, operator= and RNG::fill, fixed related tests 2018-07-17 17:50:50 +03:00
Alexander Alekhin
b09a4a98d4 opencv: Use cv::AutoBuffer<>::data() 2018-07-04 19:11:29 +03:00
yuki takehara
4fe648b15c Merge pull request #11706 from take1014:setTo_Nan_10507
* setTo_#10507

* setTo_Nan_10507

* setTo: update check / test for NaNs
2018-06-12 18:05:44 +00:00
Vadim Pisarevsky
7d19bd6c19 Merge pull request #11634 from vpisarev:empty_mat_with_types_2
fixes handling of empty matrices in some functions (#11634)

* a part of PR #11416 by Yuki Takehara

* moved the empty mat check in Mat::copyTo()

* fixed some test failures
2018-05-31 16:36:39 +00:00
Alexander Alekhin
65726e4244 core(hal): improve v_select() SSE4.1+
v_select 'mask' is restricted to these values only: 0 or ~0 (0xff/0xffff/etc)
mask in accuracy test is updated.
2018-04-23 13:17:53 +03:00
Vitaly Tuzov
ccd16f107d Fixed IPP based implementation of setTo() for infinity value 2018-04-04 16:05:22 +03:00
Sayed Adel
fd0ac962fb core: replace raw intrinsics with universal intrinsics in copy.cpp
- use universal intrinsic instead of raw intrinsic
- add performance check for Mat::copyTo/setTo with mask
2017-12-26 05:30:32 +02:00
Alexander Alekhin
62ed6cdc74 core: fix copyTo(with mask) dst initialization 2017-12-12 18:40:13 +03:00
Vadim Pisarevsky
f4136679ea Merge pull request #9551 from ChristofKaufmann:MultiChannelMask 2017-09-18 09:28:34 +00:00
Pavel Vlasov
37ab318657 Compatibility improvement with old IPP versions (tested on 9.0.1);
Manual IPP dispatcher simplification;
2017-09-08 11:08:24 +03:00
Christof Kaufmann
46a668c565 Add multi-channel mask support to mean, meanStdDev and setTo
This adds the possibility to use multi-channel masks for the functions
cv::mean, cv::meanStdDev and the method Mat::setTo. The tests have now a
probability to use multi-channel masks for operations that support them.
This also includes Mat::copyTo, which supported multi-channel masks
before, but there was no test confirming this.
2017-09-04 19:40:27 +02:00
Pavel Vlasov
a57718e1ac ICV2017u3 package update;
- Optimizations set change. Now IPP integrations will provide code for SSE42, AVX2 and AVX512 (SKX) CPUs only. For HW below SSE42 IPP code is disabled.
- Performance regressions fixes for IPP code paths;
- cv::boxFilter integration improvement;
- cv::filter2D integration improvement;
2017-08-23 14:24:43 +03:00
Maksim Shabunin
a769d69a9d Fixed several issues found by static analysis 2017-06-28 18:06:18 +03:00
Maksim Shabunin
32d4af36e2 Fixing some static analysis issues 2017-06-27 14:30:26 +03:00
Vadim Pisarevsky
ef2e5a9f82 Merge pull request #8988 from sovrasov:repeat_src_eq_dst_fix 2017-06-26 21:58:26 +00:00
Alexander Alekhin
006966e629 trace: initial support for code trace 2017-06-26 17:07:13 +03:00
Vladislav Sovrasov
4f9871817a core: forbid handling of the case when src=dst in cv::repeat 2017-06-26 14:02:52 +03:00
Maksim Shabunin
b04ed5956e Fixed several issues found by static analysis in core module 2017-05-23 12:35:31 +03:00
Pavel Vlasov
11c2ffaf1c Update for IPP for OpenCV 2017u2 integration;
Updated integrations for:
cv::split
cv::merge
cv::insertChannel
cv::extractChannel
cv::Mat::convertTo - now with scaled conversions support
cv::LUT - disabled due to performance issues
Mat::copyTo
Mat::setTo
cv::flip
cv::copyMakeBorder - currently disabled
cv::polarToCart
cv::pow - ipp pow function was removed due to performance issues
cv::hal::magnitude32f/64f - disabled for <= SSE42, poor performance
cv::countNonZero
cv::minMaxIdx
cv::norm
cv::canny - new integration. Disabled for threaded;
cv::cornerHarris
cv::boxFilter
cv::bilateralFilter
cv::integral
2017-04-25 15:53:12 +03:00
Pavel Vlasov
35c7216846 IPP for OpenCV 2017u2 initial enabling patch; 2017-04-20 20:26:30 +03:00
Tetragramm
24379fcb5f Use transpose() as suggested, because it works on pre-existing destination Mats. 2016-11-10 21:35:00 -06:00
Tetragramm
ad5c50a923 Improve the efficiency as suggested by vpisarev.
Alter the Rotation enum to be unambiguous as to direction.
2016-11-02 17:44:13 -05:00
Tetragramm
6f7bf653f7 Add 90 degree rotation methods. This provides a quick simple way of doing 90 degree rotations.
Also fix warnings that show up on other compilers in test builds.
2016-10-22 12:48:52 -05:00
Vadim Pisarevsky
83f2eb79f1 make sure that the empty mat is copied to UMat properly - i.e. UMat becomes empty. Before the patch such copy operation crashed 2016-10-05 14:07:50 +03:00