* Add RowVec_8u32f
* Fix build errors in Linux x64 Debug and armeabi-v7a
* Reformat code to make it more clean and conventional
* Optimise with vx_load_expand_q()
This submission is used to improve the performance of the inpaint algorithm for 3 channels images(RGB or BGR).
Reason:
The original algorithm implementation did not consider the cache hits.
The loop of channels is outside the core loop, so the perfmance is not very good.
Moving the channel loop inside the core loop can significantly improve cache hits, thereby improving performance.
Performance:
360P, about >= 30% improvement
iphone8P: 5.52ms -> 3.75ms
iphone6s: 14.04ms -> 9.15ms
different paddings in cvtColorTwoPlane() for biplane YUV420
* Different paddings support in cvtColorTwoPlane() for biplane YUV420
* Build fix for dispatch case.
* Resoted old behaviour for y.step==uv.step to exclude perf regressions.
Co-authored-by: amir.tulegenov <amir.tulegenov@xperience.ai>
Co-authored-by: Alexander Smorkalov <alexander.smorkalov@xperience.ai>
Add support for YOLOv4x-mish
* backport to 3.4 for supporting yolov4x-mish
* add YOLOv4x-mish test
* address review comments
Co-authored-by: Guo Xu <guoxu@1school.com.cn>
Add Normalize subgraph, fix Slice, Mul and Expand
* Add Normalize subgraph, support for starts<0 and axis<0 in Slice, Mul broadcasting in the middle and fix Expand's unsqueeze
* remove todos
* remove range-based for loop
* address review comments
* change >> to > > in template
* fix indexation
* fix expand that does nothing
* support PPSeg model for dnn module
* fixed README for CI
* add test case
* fixed bug
* deal with comments
* rm dnn_model_runner
* update test case
* fixed bug for testcase
* update testcase
`PyObject*` to `std::vector<T>` conversion logic:
- If user passed Numpy Array
- If array is planar and T is a primitive type (doesn't require
constructor call) that matches with the element type of array, then
copy element one by one with the respect of the step between array
elements. If compiler is lucky (or brave enough) copy loop can be
vectorized.
For classes that require constructor calls this path is not
possible, because we can't begin an object lifetime without hacks.
- Otherwise fall-back to general case
- Otherwise - execute the general case:
If PyObject* corresponds to Sequence protocol - iterate over the
sequence elements and invoke the appropriate `pyopencv_to` function.
`std::vector<T>` to `PyObject*` conversion logic:
- If `std::vector<T>` is empty - return empty tuple.
- If `T` has a corresponding `Mat` `DataType` than return
Numpy array instance of the matching `dtype` e.g.
`std::vector<cv::Rect>` is returned as `np.ndarray` of shape `Nx4` and
`dtype=int`.
This branch helps to optimize further evaluations in user code.
- Otherwise - execute the general case:
Construct a tuple of length N = `std::vector::size` and insert
elements one by one.
Unnecessary functions were removed and code was rearranged to allow
compiler select the appropriate conversion function specialization.
* VideoCapture timeout set/get
* Common formatting for enum values
* Fix enum values wrongly in videoio.hpp
* Define timeout enum values in public api and align with master
Add Python's test for LSTM layer
* Add Python's test for LSTM layer
* Set different test threshold for FP16 target
* rename test to test_input_3d
Co-authored-by: Julie Bareeva <julia.bareeva@xperience.ai>
Support non-zero hidden state for LSTM
* fully support non-zero hidden state for LSTM
* check dims of hidden state for LSTM
* fix failed test Test_Model.TextRecognition
* add new tests for LSTM w/ non-zero hidden params
Co-authored-by: Julie Bareeva <julia.bareeva@xperience.ai>
Improves support for Unix non-Linux systems, including QNX
* Fixes#20395. Improves support for Unix non-Linux systems. Focus on QNX Neutrino.
Signed-off-by: promero <promero@mathworks.com>
* Update system.cpp
There can be an int overflow.
cv::norm( InputArray _src, int normType, InputArray _mask ) is fine,
not cv::norm( InputArray _src1, InputArray _src2, int normType, InputArray _mask ).
Update rotatedRectangleIntersection function to calculate near to origin
* Change type used in points function from RotatedRect
In the function that sets the points of a RotatedRect, the types
should be double in order to keep the precision when dealing with
RotatedRects that are defined far from the origin.
This commit solves the problem in some assertions from
rotatedRectangleIntersection when dealing with rectangles far from
origin.
* added proper type casts
* Update rotatedRectangleIntersection function to calculate near to origin
This commit changes the rotatedRectangleIntersection function in order
to calculate the intersection of two rectangles considering that they
are shifted near the coordinates origin (0, 0).
This commit solves the problem in some assertions from
rotatedRectangleIntersection when dealing with rectangles far from
origin.
* Revert type changes in types.cpp and adequate code to c++98
* Revert unnecessary casts on types.cpp
Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@gmail.com>
This commit adds the feature of selecting the thickness
of the matches drawn by the drawMatches function.
In larger images, the default thickness of 1 pixel creates images
that are hard to visualize.
Improve performance on Arm64
* Improve performance on Apple silicon
This patch will
- Enable dot product intrinsics for macOS arm64 builds
- Enable for macOS arm64 builds
- Improve HAL primitives
- reduction (sum, min, max, sad)
- signmask
- mul_expand
- check_any / check_all
Results on a M1 Macbook Pro
* Updates to #20011 based on feedback
- Removes Apple Silicon specific workarounds
- Makes #ifdef sections smaller for v_mul_expand cases
- Moves dot product optimization to compiler optimization check
- Adds 4x4 matrix transpose optimization
* Remove dotprod and fix v_transpose
Based on the latest, we've removed dotprod entirely and will revisit in a future PR.
Added explicit cats with v_transpose4x4()
This should resolve all opens with this PR
* Remove commented out lines
Remove two extraneous comments
* Add Neon optimised RGB2Lab conversion
* Fix compile errors, change lambda to macro
* Change NEON optimised RGB2Lab to just use HAL
* Change [] to v_extract_n in RGB2Lab
* RGB2LAB Code quality, change to nlane agnostic
* Change RGB2Lab to use function rather than macro
* Remove whitespace
Co-authored-by: Francesco Petrogalli <25690309+fpetrogalli@users.noreply.github.com>