Improve performance on Arm64
* Improve performance on Apple silicon
This patch will
- Enable dot product intrinsics for macOS arm64 builds
- Enable for macOS arm64 builds
- Improve HAL primitives
- reduction (sum, min, max, sad)
- signmask
- mul_expand
- check_any / check_all
Results on a M1 Macbook Pro
* Updates to #20011 based on feedback
- Removes Apple Silicon specific workarounds
- Makes #ifdef sections smaller for v_mul_expand cases
- Moves dot product optimization to compiler optimization check
- Adds 4x4 matrix transpose optimization
* Remove dotprod and fix v_transpose
Based on the latest, we've removed dotprod entirely and will revisit in a future PR.
Added explicit cats with v_transpose4x4()
This should resolve all opens with this PR
* Remove commented out lines
Remove two extraneous comments
* Adding functions rbegin() and rend() functions to matrix class.
This is important to be more standard compliant with C++ and an ever increasing number of people using standard algorithms for better code readability- and maintainability.
The functions are copy pated from their counterparts (even though they should probably call the counterparts but this gave me some troube).
They return iterators using std::reverse_iterators
Follow up of an open feature request:
https://github.com/opencv/opencv/issues/4641
* Fix rbegin() and rend() and provide tests for them
* Removing unnecessary whitespaces
* Adding rbegin and rend to Mat_ class with the right parameters so we don't need to repeat the template argument.
An instantiating cv::Mat_<int> for example can call it's rbegin() function and doesn't need rbegin<int>() with this convience addition.
Follows what is done for forward iterators
* static cast the vector size (return size_t) to an int (that is required for opencv mat constructor)
Co-authored-by: Stefan <stefan.gerl@tum.de>
* implements https://github.com/opencv/opencv/issues/19147
* CAUTION: this PR will only functions safely in the
4+ branches that already include PR 19029
* CAUTION: this PR requires thread-safe startup of the alloc.cpp
translation unit as implemented in PR 19029
Ordinary quaternion
* version 1.0
* add assumeUnit;
add UnitTest;
check boundary value;
fix the func using method: func(obj);
fix 4x4;
add rodrigues vector transformation;
fix mat to quat;
* fix blank and tab
* fix blank and tab
modify test;cpp to hpp
* mainly improve comment;
add rvec2Quat;fix toRodrigues;
fix throw to CV_Error
* fix bug of quatd * int;
combine hpp and cpp;
fix << overload error in win system;
modify include in test file;
* move implementation to quaternion.ini.hpp;
change some constructor to createFrom* function;
change Rodrigues vector to rotation vector;
change the matexpr to mat of 3x3 return type;
improve comments;
* try fix log function error in win
* add enums for assumeUnit;
improve docs;
add using std::cos funcs
* remove using std::* from header;
add std::* in affine.hpp,warpers_inl.hpp;
* quat: coding style
* quat: AssumeType => QuatAssumeType
- OpenCL kernel cleanup processing is asynchronous and can be called even after forced clFinish()
- buffers are released later in asynchronous mode
- silence these false positive cases for asynchronous cleanup