* Fix integer overflow in cv::Luv2RGBinteger::process.
For LL=49, uu=205, vv=23, we end up with x=7373056 and y=458
which overflows y*x.
* imgproc(test): adjust test parameters to cover SIMD code
* dnn: LSTM optimisation
This uses the AVX-optimised fastGEMM1T for matrix multiplications where available, instead of the standard cv::gemm.
fastGEMM1T is already used by the fully-connected layer. This commit involves two minor modifications:
- Use unaligned access. I don't believe this involves any performance hit in on modern CPUs (Nehalem and Bulldozer onwards) in the case where the address is actually aligned.
- Allow for weight matrices where the number of columns is not a multiple of 8.
I have not enabled AVX-512 as I don't have an AVX-512 CPU to test on.
* Fix warning about initialisation order
* Remove C++11 syntax
* Fix build when AVX(2) is not available
In this case the CV_TRY_X macros are defined to 0, rather than being undefined.
* Minor changes as requested:
- Don't check hardware support for AVX(2) when dispatch is disabled for these
- Add braces
* Fix out-of-bounds access in fully connected layer
The old tail handling in fastGEMM1T implicitly rounded vecsize up to the next multiple of 8, and the fully connected layer implements padding up to the next multiple of 8 to cope with this. The new tail handling does not round the vecsize upwards like this but it does require that the vecsize is at least 8. To adapt to the new tail handling, the fully connected layer now rounds vecsize itself at the same time as adding the padding(which makes more sense anyway).
This also means that the fully connected layer always passes a vecsize of at least 8 to fastGEMM1T, which fixes the out-of-bounds access problems.
* Improve tail mask handling
- Use static array for generating tail masks (as requested)
- Apply tail mask to the weights as well as the input vectors to prevent spurious propagation of NaNs/Infs
* Revert whitespace change
* Improve readability of conditions for using AVX
* dnn(lstm): minor coding style changes, replaced left aligned load
In case of very small negative h (e.g. -1e-40), with the current implementation,
you will go through the first condition and end up with h = 6.f, and will miss
the second condition.
issue #20617 addresses lack of warnings on
seamlessClone() function when src is None.
This commit adds source check using CV_Assert
therefore debugging would be easier.
Signed-off-by: nickjackolson <metedurlu@gmail.com>
Add a warning message using CV_LOG__WARNING().
This way api behaviour is preserved. Outputs are
the same but user gets an extra warning in case
fopen() fails to access image file for some reason.
This would help new users and also debugging
complex apps which use imread()
Signed-off-by: nickjackolson <metedurlu@gmail.com>
QR code (encoding process)
* add qrcode encoder
* qr encoder fixes
* qr encoder: fix api and realization
* fixed qr encoder, added eci and kanji modes
* trigger CI
* qr encoder constructor fixes
Co-authored-by: APrigarina <ann73617@gmail.com>
* dnn(ocl4dnn): fix LRN layer accuracy problems
- FP16 intermediate computation is not accurate and may provide NaN values
* dnn(test): update tolerance for FP16
1. Code uses PPC_FEATURE_HAS_VSX, but it's not checked similarly to
PPC_FEATURE2_ARCH_3_00 and PPC_FEATURE2_ARCH_3_00 for availability. FreeBSD has
those macros in machine/cpu.h, but I went with the way chosen for
PPC_FEATURE2_ARCH_3_00 and PPC_FEATURE2_ARCH_3_00. Other than that, FreeBSD also
has sys/auxv.h and that's where elf_aux_info() is defined.
2. getauxval() is actually Linux-only, but code checked for __unix__. It won't
work on all UNIX, so change it back to __linux__. Add another code variant
strictly for FreeBSD.
3. Update comment. This commit adds code for FreeBSD, but recently there
appeared support for powerpc64 in OpenBSD.
fix bug: wrong output dimension when "keep_dims" is false in pooling layer.
* fix bug in max layer
* code align
* delete permute layer and add test case
* add name assert
* check other cases
* remove c++11 features
* style:add "const" remove assert
* style:sanitize file names
* Fix gst error handling
* Use the return value instead of the error, which gives no guarantee of being NULL in case of error
* Test err pointer before accessing it
* Remove unreachable code
* videoio(gstreamer): restore check in writer code
Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
Fix ORB integer overflow
* set size_t step to fix integer overflow in ptr0 offset
* added issue_537 test
* minor fix tags, points
* added size_t_step and offset to remove mixed unsigned and signed operations
* features2d: update ORB checks
Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
* dnn: fix unaligned memory access crash on armv7
The getTensorContent function would return a Mat pointing to some
member of a Protobuf-encoded message. Protobuf does not make any
alignment guarantees, which results in a crash on armv7 when loading
models while bit 2 is set in /proc/cpu/alignment (or the relevant
kernel feature for alignment compatibility is disabled). Any read
attempt from the previously unaligned data member would send SIGBUS.
As workaround, this commit makes an aligned copy via existing clone
functionality in getTensorContent. The unsafe copy=false option is
removed. Unfortunately, a rather crude hack in PReLUSubgraph in fact
writes(!) to the Protobuf message. We limit ourselves to fixing the
alignment issues in this commit, and add getTensorContentRefUnaligned
to cover the write case with a safe memcpy. A FIXME marks the issue.
* dnn: reduce amount of .clone() calls
* dnn: update FIXME comment
Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
* Prefix global javascript functions with sub-namespaces
* js: handle 'namespace_prefix_override', update filtering
- avoid functions override with same name but different namespace
Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
* Add RowVec_8u32f
* Fix build errors in Linux x64 Debug and armeabi-v7a
* Reformat code to make it more clean and conventional
* Optimise with vx_load_expand_q()
This submission is used to improve the performance of the inpaint algorithm for 3 channels images(RGB or BGR).
Reason:
The original algorithm implementation did not consider the cache hits.
The loop of channels is outside the core loop, so the perfmance is not very good.
Moving the channel loop inside the core loop can significantly improve cache hits, thereby improving performance.
Performance:
360P, about >= 30% improvement
iphone8P: 5.52ms -> 3.75ms
iphone6s: 14.04ms -> 9.15ms
different paddings in cvtColorTwoPlane() for biplane YUV420
* Different paddings support in cvtColorTwoPlane() for biplane YUV420
* Build fix for dispatch case.
* Resoted old behaviour for y.step==uv.step to exclude perf regressions.
Co-authored-by: amir.tulegenov <amir.tulegenov@xperience.ai>
Co-authored-by: Alexander Smorkalov <alexander.smorkalov@xperience.ai>
Add support for YOLOv4x-mish
* backport to 3.4 for supporting yolov4x-mish
* add YOLOv4x-mish test
* address review comments
Co-authored-by: Guo Xu <guoxu@1school.com.cn>
Add Normalize subgraph, fix Slice, Mul and Expand
* Add Normalize subgraph, support for starts<0 and axis<0 in Slice, Mul broadcasting in the middle and fix Expand's unsqueeze
* remove todos
* remove range-based for loop
* address review comments
* change >> to > > in template
* fix indexation
* fix expand that does nothing
* support PPSeg model for dnn module
* fixed README for CI
* add test case
* fixed bug
* deal with comments
* rm dnn_model_runner
* update test case
* fixed bug for testcase
* update testcase
`PyObject*` to `std::vector<T>` conversion logic:
- If user passed Numpy Array
- If array is planar and T is a primitive type (doesn't require
constructor call) that matches with the element type of array, then
copy element one by one with the respect of the step between array
elements. If compiler is lucky (or brave enough) copy loop can be
vectorized.
For classes that require constructor calls this path is not
possible, because we can't begin an object lifetime without hacks.
- Otherwise fall-back to general case
- Otherwise - execute the general case:
If PyObject* corresponds to Sequence protocol - iterate over the
sequence elements and invoke the appropriate `pyopencv_to` function.
`std::vector<T>` to `PyObject*` conversion logic:
- If `std::vector<T>` is empty - return empty tuple.
- If `T` has a corresponding `Mat` `DataType` than return
Numpy array instance of the matching `dtype` e.g.
`std::vector<cv::Rect>` is returned as `np.ndarray` of shape `Nx4` and
`dtype=int`.
This branch helps to optimize further evaluations in user code.
- Otherwise - execute the general case:
Construct a tuple of length N = `std::vector::size` and insert
elements one by one.
Unnecessary functions were removed and code was rearranged to allow
compiler select the appropriate conversion function specialization.
* VideoCapture timeout set/get
* Common formatting for enum values
* Fix enum values wrongly in videoio.hpp
* Define timeout enum values in public api and align with master
Add Python's test for LSTM layer
* Add Python's test for LSTM layer
* Set different test threshold for FP16 target
* rename test to test_input_3d
Co-authored-by: Julie Bareeva <julia.bareeva@xperience.ai>
Support non-zero hidden state for LSTM
* fully support non-zero hidden state for LSTM
* check dims of hidden state for LSTM
* fix failed test Test_Model.TextRecognition
* add new tests for LSTM w/ non-zero hidden params
Co-authored-by: Julie Bareeva <julia.bareeva@xperience.ai>
Improves support for Unix non-Linux systems, including QNX
* Fixes#20395. Improves support for Unix non-Linux systems. Focus on QNX Neutrino.
Signed-off-by: promero <promero@mathworks.com>
* Update system.cpp
There can be an int overflow.
cv::norm( InputArray _src, int normType, InputArray _mask ) is fine,
not cv::norm( InputArray _src1, InputArray _src2, int normType, InputArray _mask ).
Update rotatedRectangleIntersection function to calculate near to origin
* Change type used in points function from RotatedRect
In the function that sets the points of a RotatedRect, the types
should be double in order to keep the precision when dealing with
RotatedRects that are defined far from the origin.
This commit solves the problem in some assertions from
rotatedRectangleIntersection when dealing with rectangles far from
origin.
* added proper type casts
* Update rotatedRectangleIntersection function to calculate near to origin
This commit changes the rotatedRectangleIntersection function in order
to calculate the intersection of two rectangles considering that they
are shifted near the coordinates origin (0, 0).
This commit solves the problem in some assertions from
rotatedRectangleIntersection when dealing with rectangles far from
origin.
* Revert type changes in types.cpp and adequate code to c++98
* Revert unnecessary casts on types.cpp
Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@gmail.com>
This commit adds the feature of selecting the thickness
of the matches drawn by the drawMatches function.
In larger images, the default thickness of 1 pixel creates images
that are hard to visualize.
Improve performance on Arm64
* Improve performance on Apple silicon
This patch will
- Enable dot product intrinsics for macOS arm64 builds
- Enable for macOS arm64 builds
- Improve HAL primitives
- reduction (sum, min, max, sad)
- signmask
- mul_expand
- check_any / check_all
Results on a M1 Macbook Pro
* Updates to #20011 based on feedback
- Removes Apple Silicon specific workarounds
- Makes #ifdef sections smaller for v_mul_expand cases
- Moves dot product optimization to compiler optimization check
- Adds 4x4 matrix transpose optimization
* Remove dotprod and fix v_transpose
Based on the latest, we've removed dotprod entirely and will revisit in a future PR.
Added explicit cats with v_transpose4x4()
This should resolve all opens with this PR
* Remove commented out lines
Remove two extraneous comments
* Add Neon optimised RGB2Lab conversion
* Fix compile errors, change lambda to macro
* Change NEON optimised RGB2Lab to just use HAL
* Change [] to v_extract_n in RGB2Lab
* RGB2LAB Code quality, change to nlane agnostic
* Change RGB2Lab to use function rather than macro
* Remove whitespace
Co-authored-by: Francesco Petrogalli <25690309+fpetrogalli@users.noreply.github.com>
- Added missing documentation for the CALIB_FIX_FOCAL_LENGTH flag
- Removed erroneous information about the number of distortion coefficients
returned
- Added some missing @ref tags
Fix unsigned int bug in computeECC
* address issue with unsigned ints in computeEcc
* remove additional logic checking firstOctave
* use swap instead of same src/dst
* simplify the unsigned check logic
fix a build warning:
```
C:\Slave\workspace\precommit\windows10\opencv\modules\photo\src\contrast_preserve.hpp(289): warning C4244: '=': conversion from 'double' to '_Tp', possible loss of data
with
[
_Tp=float
]
C:\Slave\workspace\precommit\windows10\opencv\modules\photo\src\contrast_preserve.hpp(361): warning C4244: '=': conversion from 'double' to '_Tp', possible loss of data
with
[
_Tp=float
]
```
(from https://build.opencv.org.cn/job/precommit/job/windows10/1633/console)
Currently, the LOADER_DIR is set as os.path.dirname(os.path.abspath(__file__)). This does not point to the true library path if the cv2 folder is symlinked into the Python package directory such that importing cv2 under Python fails. The proposed change only resolves symbolic links correctly by calling os.path.realpath(__file__) first and does not change anything if __file__ contains no symbolic link.
Fix bug with predictions in RTrees/Boost
* address bug where predict functions with invalid feature count in rtrees/boost models
* compact matrix rep in tests
* check 1..n-1 and n+1 in feature size validation test
Fix Single ThresholdBug in Simple Blob Detector
* address bug with using min dist between blobs in blob detector
cast type in comparison and remove docs
address bug with using min dist between blobs in blob detector
use scalar instead of int
address bug with using min dist between blobs in blob detector
* fix namespace and formatting
Also bring perf_imgproc CornerMinEigenVal accuracy requirements in line with
the test_imgproc accuracy requirements on that test and fix indentation on
the latter.
Partially addresses issue #9821
This commit passes the parameter maxIters that represent
the maximum number of iterations, that can be passed to findFundamentalMat
to the method LMeDS.
This parameter were added to the function findFundamentalMat and
were passed just for the RANSAC method, but should be passed to
both methods to be consistent.
The MinEigenVal path through the corner.cl kernel makes use of native_sqrt,
a math builtin function which has implementation defined accuracy.
Partially addresses issue #9821
* Aligned OpenCV DNN and TF sum op behaviour
Support Mat (shape: [1, m, k, n] ) + Vec (shape: [1, 1, 1, n]) operation
by vec to mat expansion
* Added code corrections: backend, minor refactoring
Added OpenVINO ARM target
* Added IE ARM target
* Added OpenVINO ARM target
* Delete ARM target
* Detect ARM platform
* Changed device name in ArmPlugin
* Change ARM detection