Improve performance on Arm64
* Improve performance on Apple silicon
This patch will
- Enable dot product intrinsics for macOS arm64 builds
- Enable for macOS arm64 builds
- Improve HAL primitives
- reduction (sum, min, max, sad)
- signmask
- mul_expand
- check_any / check_all
Results on a M1 Macbook Pro
* Updates to #20011 based on feedback
- Removes Apple Silicon specific workarounds
- Makes #ifdef sections smaller for v_mul_expand cases
- Moves dot product optimization to compiler optimization check
- Adds 4x4 matrix transpose optimization
* Remove dotprod and fix v_transpose
Based on the latest, we've removed dotprod entirely and will revisit in a future PR.
Added explicit cats with v_transpose4x4()
This should resolve all opens with this PR
* Remove commented out lines
Remove two extraneous comments
* Add Neon optimised RGB2Lab conversion
* Fix compile errors, change lambda to macro
* Change NEON optimised RGB2Lab to just use HAL
* Change [] to v_extract_n in RGB2Lab
* RGB2LAB Code quality, change to nlane agnostic
* Change RGB2Lab to use function rather than macro
* Remove whitespace
Co-authored-by: Francesco Petrogalli <25690309+fpetrogalli@users.noreply.github.com>
- Added missing documentation for the CALIB_FIX_FOCAL_LENGTH flag
- Removed erroneous information about the number of distortion coefficients
returned
- Added some missing @ref tags
Fix unsigned int bug in computeECC
* address issue with unsigned ints in computeEcc
* remove additional logic checking firstOctave
* use swap instead of same src/dst
* simplify the unsigned check logic
fix a build warning:
```
C:\Slave\workspace\precommit\windows10\opencv\modules\photo\src\contrast_preserve.hpp(289): warning C4244: '=': conversion from 'double' to '_Tp', possible loss of data
with
[
_Tp=float
]
C:\Slave\workspace\precommit\windows10\opencv\modules\photo\src\contrast_preserve.hpp(361): warning C4244: '=': conversion from 'double' to '_Tp', possible loss of data
with
[
_Tp=float
]
```
(from https://build.opencv.org.cn/job/precommit/job/windows10/1633/console)
Currently, the LOADER_DIR is set as os.path.dirname(os.path.abspath(__file__)). This does not point to the true library path if the cv2 folder is symlinked into the Python package directory such that importing cv2 under Python fails. The proposed change only resolves symbolic links correctly by calling os.path.realpath(__file__) first and does not change anything if __file__ contains no symbolic link.
Fix bug with predictions in RTrees/Boost
* address bug where predict functions with invalid feature count in rtrees/boost models
* compact matrix rep in tests
* check 1..n-1 and n+1 in feature size validation test
Fix Single ThresholdBug in Simple Blob Detector
* address bug with using min dist between blobs in blob detector
cast type in comparison and remove docs
address bug with using min dist between blobs in blob detector
use scalar instead of int
address bug with using min dist between blobs in blob detector
* fix namespace and formatting
Also bring perf_imgproc CornerMinEigenVal accuracy requirements in line with
the test_imgproc accuracy requirements on that test and fix indentation on
the latter.
Partially addresses issue #9821
This commit passes the parameter maxIters that represent
the maximum number of iterations, that can be passed to findFundamentalMat
to the method LMeDS.
This parameter were added to the function findFundamentalMat and
were passed just for the RANSAC method, but should be passed to
both methods to be consistent.
The MinEigenVal path through the corner.cl kernel makes use of native_sqrt,
a math builtin function which has implementation defined accuracy.
Partially addresses issue #9821
* Aligned OpenCV DNN and TF sum op behaviour
Support Mat (shape: [1, m, k, n] ) + Vec (shape: [1, 1, 1, n]) operation
by vec to mat expansion
* Added code corrections: backend, minor refactoring
Added OpenVINO ARM target
* Added IE ARM target
* Added OpenVINO ARM target
* Delete ARM target
* Detect ARM platform
* Changed device name in ArmPlugin
* Change ARM detection
Init params (StereoBMParams) in StereoBMImpl constructor initialization list
* Init StereoBMImpl in initialization list
To improve preformence it is better to init the params (StereoBMImpl) in the
initialization list.
* coding style
* drop useless copy/move ctor
Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
* Updated cpp reference implementations for a few intrinsics to address wide universal intrinsics as well
* Updated cpp reference implementations for a few more universal intrinsics
* Update polynom_solver.cpp
This pull request is in the response to Issue #19526. I have fixed the problem with the cube root calculation of 2*R. The Issue was in the usage of pow function with negative values of R, but if it is calculated for only positive values of R then changing x0 according to the parity of R, the Issue is resolved. Kindly consider it, Thanks!
* add cv::cubeRoot(double)
Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
- to reduce binaries size of FFmpeg Windows wrapper
- MinGW linker doesn't support -ffunction-sections (used for FFmpeg Windows wrapper)
- move code to improve locality with its used dependencies
- move UMat::dot() to matmul.dispatch.cpp (Mat::dot() is already there)
- move UMat::inv() to lapack.cpp
- move UMat::mul() to arithm.cpp
- move UMat:eye() to matrix_operations.cpp (near setIdentity() implementation)
- move normalize(): convert_scale.cpp => norm.cpp
- move convertAndUnrollScalar(): arithm.cpp => copy.cpp
- move scalarToRawData(): array.cpp => copy.cpp
- move transpose(): matrix_operations.cpp => matrix_transform.cpp
- move flip(), rotate(): copy.cpp => matrix_transform.cpp (rotate90 uses flip and transpose)
- add 'OPENCV_CORE_EXCLUDE_C_API' CMake variable to exclude compilation of C-API functions from the core module
- matrix_wrap.cpp: add compile-time checks for CUDA/OpenGL calls
- the steps above allow to reduce FFmpeg wrapper size for ~1.5Mb (initial size of OpenCV part is about 3Mb)
backport is done to improve merge experience (less conflicts)
backport of commit: 65eb946756
* Fixed OCL implementation of pyrlk
If prevPts size is (N, 1) (which is a default layout for converting `vector<Point2f>` to `UMat`) the `prevPts.cols == 1` and optical flow will be calculated for the first point only.
Getting `prevPts.total()` as in line 1048 is the correct way to get points count.
* fixed compilation warning (size_t to int)
Signed-off-by: Sergey Krivohatskiy <s.krivohatskiy@gmail.com>