Vectorize minMaxIdx functions
* Updated documentation and intrinsic tests for v_reduce
* Add other files back in from the forced push
* Prevent an constant overflow with v_reduce for int8 type
* Another alternative to fix constant overflow warning.
* Fix another compiler warning.
* Update comments and change comparison form to be consistent with other vectorized loops.
* Change return type of v_reduce_min & max for v_uint8 and v_uint16 to be same as lane type.
* Cast v_reduce functions to int to avoid overflow. Reduce number of parameters in MINMAXIDX_REDUCE macro.
* Restore cast type for v_reduce_min & max to LaneType
Fix implicit conversion from array to scalar in python bindings
* Fix wrong conversion behavior for primitive types
- Introduce ArgTypeInfo namedtuple instead of plain tuple.
If strict conversion parameter for type is set to true, it is
handled like object argument in PyArg_ParseTupleAndKeywords and
converted to concrete type with the appropriate pyopencv_to function
call.
- Remove deadcode and unused variables.
- Fix implicit conversion from numpy array with 1 element to scalar
- Fix narrowing conversion to size_t type.
* Fix wrong conversion behavior for primitive types
- Introduce ArgTypeInfo namedtuple instead of plain tuple.
If strict conversion parameter for type is set to true, it is
handled like object argument in PyArg_ParseTupleAndKeywords and
converted to concrete type with the appropriate pyopencv_to function
call.
- Remove deadcode and unused variables.
- Fix implicit conversion from numpy array with 1 element to scalar
- Fix narrowing conversion to size_t type.·
- Enable tests with wrong conversion behavior
- Restrict passing None as value
- Restrict bool to integer/floating types conversion
* Add PyIntType support for Python 2
* Remove possible narrowing conversion of size_t
* Bindings conversion update
- Remove unused macro
- Add better conversion for types to numpy types descriptors
- Add argument name to fail messages
- NoneType treated as a valid argument. Better handling will be added
as a standalone patch
* Add descriptor specialization for size_t
* Add check for signed to unsigned integer conversion safety
- If signed integer is positive it can be safely converted
to unsigned
- Add check for plain python 2 objects
- Add check for numpy scalars
- Add simple type_traits implementation for better code style
* Resolve type "overflow" false negative in safe casting check
- Move type_traits to separate header
* Add copyright message to type_traits.hpp
* Limit conversion scope for integral numpy types
- Made canBeSafelyCasted specialized only for size_t, so
type_traits header became unused and was removed.
- Added clarification about descriptor pointer
Add lightweight IE hardware targets checks
nGraph: Concat with paddings
Enable more nGraph tests
Restore FP32->FP16 for GPU plugin of IE
try to fix buildbot
Use lightweight IE targets check only starts from R4
- some of `icvCvt_BGR*` functions have R with B channels
swapped what leads to the wrong conversion
- renames misleading `rgb` variable name to `bgr`
- swap back the conversion coefficients, `cB` should be the first
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Actually, we can do this in constant time. xofs always
contains same or increasing offset values. We can instead
find the most extreme value used and never attempt to load it.
Similarly, we can note for all dx >= 0 and dx < (dwidth - cn)
where xofs[dx] + cn < xofs[dwidth-cn] implies dx < (dwidth - cn).
Thus, we can use this to control our loop termination optimally.
This fixes#16137 with little or no performance impact. I have
also added a debug check as a sanity check.
* Handle det == 0 in findCircle3pts.
Issue 16051 shows a case where findCircle3pts returns NaN for the
center coordinates and radius due to dividing by a determinant of 0. In
this case, the points are colinear, so the longest distance between any
2 points is the diameter of the minimum enclosing circle.
* imgproc(test): update test checks for minEnclosingCircle()
* imgproc: fix handling of special cases in minEnclosingCircle()
* Eltwise::DIV support in Halide backend
* fix typo
* remove div from generated test suite to pass CI, switching to manual test...
* ensure divisor not near to zero
* use randu
* dnn(test): update test data for Eltwise.Accuracy/DIV layer test
Add checks for empty operands in Matrix expressions that don't check properly
* Starting to add checks for empty operands in Matrix expressions that
don't check properly.
* Adding checks and delcarations for checker functions
* Fix signatures and add checks for each class of Matrix Expr operation
* Make it catch the right exception
* Don't expose helper functions to public API
* imgproc: Prevent 1B overrun of 8C3 SIMD optimization
The fourth value read via v_load_q is essentially ignored,
but can cause trouble if it happens to cross page boundaries.
The final few iterations may attempt to read the most extreme
elements of S, which will read 1B beyond the array in most
aligment cases. Dynamically compute the stop. This could be
hoised from the loop, but will require a more extensive change.
Likewise, cleanup the iteration increment statements to make
it more obvious they do channel count (3) elements per pass.
This should resolve#16137
* imgproc(resize): extra check
dnn(eltwise): fix handling of different number of channels
* dnn(test): reproducer for Eltwise layer issue from PR16063
* dnn(eltwise): rework support for inputs with different channels
* dnn(eltwise): get rid of finalize(), variableChannels
* dnn(eltwise): update input sorting by number of channels
- do not swap inputs if number of channels are same after truncation
* dnn(test): skip "shortcut" with batch size 2 on MYRIAD targets
* resize: HResizeLinear reduce duplicate work
There appears to be a 2x unroll of the HResizeLinear against k,
however the k value is only incremented by 1 during the unroll. This
results in k - 1 duplicate passes when k > 1.
Likewise, the final pass may not respect the work done by the vector
loop. Start it with the offset returned by the vector op if
implemented. Note, no vector ops are implemented today.
The performance is most noticable on a linear downscale. A set of
performance tests are added to characterize this. The performance
improvement is 10-50% depending on the scaling.
* imgproc: vectorize HResizeLinear
Performance is mostly gated by the gather operations
for x inputs.
Likewise, provide a 2x unroll against k, this reduces the
number of alpha gathers by 1/2 for larger k.
While not a 4x improvement, it still performs substantially
better under P9 for a 1.4x improvement. P8 baseline is
1.05-1.10x due to reduced VSX instruction set.
For float types, this results in a more modest
1.2x improvement.
* Update U8 processing for non-bitexact linear resize
* core: hal: vsx: improve v_load_expand_q
With a little help, we can do this quickly without gprs on
all VSX enabled targets.
* resize: Fix cn == 3 step per feedback
Per feedback, ensure we don't overrun. This was caught via the
failure observed in Test_TensorFlow.inception_accuracy.
Test create custom layer in python
* check is contiguos
* Add custom layer test
* Fix test
* Remove assert
* Move assert to pyopencv dnn
* remove assert
* Add unregister
* Fix python2
* proto to bytearray
* Fix data type
* calib3d: use normalized input in solvePnPGeneric()
* calib3d: java regression test for solvePnPGeneric
* calib3d: python regression test for solvePnPGeneric
* core: disable invalid constructors in C API by default
- C API objects will lose their default initializers through constructors
* samples: stop using of C API
(1/4) Revert "Correct image borders and principal point computation in cv::stereoRectify"
This reverts commit 93ff1fb2f2.
(2/4) Revert "fix calib3d changes in 6836 plus some others"
This reverts commit fa42a1cfc2.
(3/4) Revert "fix compiler warning"
This reverts commit b3d55489d3.
(4/4) Revert "add test for 6836"
This reverts commit d06b8c4ea9.
Tests for argument conversion of Python bindings generator
* Tests for parsing elemental types from Python bindings
- Add positive and negative tests for int, float, double, size_t,
const char*, bool.
- Tests with wrong conversion behavior are skipped.
* Move implicit conversion of bool to integer/floating types to wrong
conversion behavior.
* Fix infinite loop when trying to change state of the busy camera
- Add finite number of attempts in tryIoctl functions
10 by default.
* Introduced new flag for ioctl call to handle EBUSY
Improving VSX performance of integral function
* Adding support for vector get function on VSX datatypes so the
integral function gains a bit of performance.
* Removing get as a datatype member function and implementing a new HAL
instruction v_extract_n to get the n-th element of a vector register.
* Adding SSE/NEON/AVX intrinsics.
* Implement new HAL instruction v_broadcast_element on VSX/AVX/NEON/SSE.
* core(simd): add tests for v_extract_n/v_broadcast_element
- updated docs
- commented out code to repair compilation
- added WASM and MSA default implementations
* core(simd): fix compilation
- x86: avoid _mm256_extract_epi64/32/16/8 with MSVS 2015
- x86: _mm_extract_epi64 is 64-bit only
* cleanup
Supported ONNX Squeeze, ReduceL2 and Eltwise::DIV
* Support eltwise div
* Fix test
* OpenCL support added
* refactoring
* fix code style
* Only squeeze with axes supported
* Convert moments in tile algorithms to HAL (1.3x faster for VSX).
* Adding NEON code back in for non 64-bit platforms.
* Remove floats from post processing.
Clarify stereoRectify() doc
The function stereoRectify() takes as input a coordinate transform between two cameras. It is ambiguous how it goes. I clarified that it goes from the second camera to the first.
* Use FlsAlloc/FlsFree/FlsGetValue/FlsSetValue instead of TlsAlloc/TlsFree/TlsGetValue/TlsSetValue to implment TLS value cleanup when thread has been terminated on Windows Vista and above
* Fix 32-bit build
* Fixed calling convention of cleanup callback
* WINAPI changed to NTAPI
* Use proper guard macro
* Vectorize flipHoriz and flipVert functions.
* Change v_load_mirror_1 to use vec_revb for VSX
* Only use vec_revb in ISA3.0
* Removing vec_revb code since some of the older compilers don't fully support it.
* Use new v_reverse intrinsic and cleanup code.
* Ensure there are no alignment issues with copies
* Doc bugfix
The documentation page StereoBinaryBM and StereoBinarySGBM says that it returns a disparity that is scaled multiplied by 16. This scaling must be undone before calling reprojectImageTo3D, otherwise the results are wrong. The function reprojectImageTo3D() could do this scaling internally, maybe, but at least the documentation must explain that this has to be done.
* calib3d: update reprojectImageTo3D documentation
* calib3d: add StereoBM/StereoSGBM into notes list
- move TLS & instrumentation code out of core/utility.hpp
- (*) TLSData lost .gather() method (to dispose thread data on thread termination)
- use TLSDataAccumulator for reliable collecting of thread data
- prefer using of .detachData() + .cleanupDetachedData() instead of .gather() method
(*) API is broken: replace TLSData => TLSDataAccumulator if gather required
(objects disposal on threads termination is not available in accumulator mode)