Supported ONNX Squeeze, ReduceL2 and Eltwise::DIV
* Support eltwise div
* Fix test
* OpenCL support added
* refactoring
* fix code style
* Only squeeze with axes supported
* Convert moments in tile algorithms to HAL (1.3x faster for VSX).
* Adding NEON code back in for non 64-bit platforms.
* Remove floats from post processing.
Clarify stereoRectify() doc
The function stereoRectify() takes as input a coordinate transform between two cameras. It is ambiguous how it goes. I clarified that it goes from the second camera to the first.
* Use FlsAlloc/FlsFree/FlsGetValue/FlsSetValue instead of TlsAlloc/TlsFree/TlsGetValue/TlsSetValue to implment TLS value cleanup when thread has been terminated on Windows Vista and above
* Fix 32-bit build
* Fixed calling convention of cleanup callback
* WINAPI changed to NTAPI
* Use proper guard macro
* Vectorize flipHoriz and flipVert functions.
* Change v_load_mirror_1 to use vec_revb for VSX
* Only use vec_revb in ISA3.0
* Removing vec_revb code since some of the older compilers don't fully support it.
* Use new v_reverse intrinsic and cleanup code.
* Ensure there are no alignment issues with copies
* Doc bugfix
The documentation page StereoBinaryBM and StereoBinarySGBM says that it returns a disparity that is scaled multiplied by 16. This scaling must be undone before calling reprojectImageTo3D, otherwise the results are wrong. The function reprojectImageTo3D() could do this scaling internally, maybe, but at least the documentation must explain that this has to be done.
* calib3d: update reprojectImageTo3D documentation
* calib3d: add StereoBM/StereoSGBM into notes list
- move TLS & instrumentation code out of core/utility.hpp
- (*) TLSData lost .gather() method (to dispose thread data on thread termination)
- use TLSDataAccumulator for reliable collecting of thread data
- prefer using of .detachData() + .cleanupDetachedData() instead of .gather() method
(*) API is broken: replace TLSData => TLSDataAccumulator if gather required
(objects disposal on threads termination is not available in accumulator mode)
Fixing bug with comparison of v_int64x2 or v_uint64x2
* Casting v_uint64x2 to v_float64x2 and comparing does NOT work in all cases. Rewrite using epi64 instructions - faster too.
* Fix bad merge.
* Fix equal comparsion for non-SSE4.1. Add test cases for v_int64x2 comparisons.
* Try to fix merge conflict.
* Only test v_int64x2 comparisons if CV_SIMD_64F
* Fix compiler warning.
* G-API: Doxygen documentatation for Async API
* G-API: Doxygen documentatation for Async API
- renamed local variable (reading parameter async) async ->
asyncNumReq in object_detection DNN sample
to avoid Doxygen erroneous linking the sample to cv::gapi::wip::async
documentation
* G-API-NG/Streaming: Introduced a Streaming API
Now a GComputation can be compiled in a special "streaming" way
and then "played" on a video stream.
Currently only VideoCapture is supported as an input source.
* G-API-NG/Streaming: added threading & real streaming
* G-API-NG/Streaming: Added tests & docs on Copy kernel
- Added very simple pipeline tests, not all data types are covered yet
(in fact, only GMat is tested now);
- Started testing non-OCV backends in the streaming mode;
- Added required fixes to Fluid backend, likely it works OK now;
- Added required fixes to OCL backend, and now it is likely broken
- Also added a UMat-based (OCL) version of Copy kernel
* G-API-NG/Streaming: Added own concurrent queue class
- Used only if TBB is not available
* G-API-NG/Streaming: Fixing various issues
- Added missing header to CMakeLists.txt
- Fixed various CI issues and warnings
* G-API-NG/Streaming: Fixed a compile-time GScalar queue deadlock
- GStreamingExecutor blindly created island's input queues for
compile-time (value-initialized) GScalars which didn't have any
producers, making island actor threads wait there forever
* G-API-NG/Streaming: Dropped own version of Copy kernel
One was added into master already
* G-API-NG/Streaming: Addressed GArray<T> review comments
- Added tests on mov()
- Removed unnecessary changes in garray.hpp
* G-API-NG/Streaming: Added Doxygen comments to new public APIs
Also fixed some other comments in the code
* G-API-NG/Streaming: Removed debug info, added some comments & renamed vars
* G-API-NG/Streaming: Fixed own-vs-cv abstraction leak
- Now every island is triggered with own:: (instead of cv::)
data objects as inputs;
- Changes in Fluid backend required to support cv::Mat/Scalar were
reverted;
* G-API-NG/Streaming: use holds_alternative<> instead of index/index_of test
- Also fixed regression test comments
- Also added metadata check comments for GStreamingCompiled
* G-API-NG/Streaming: Made start()/stop() more robust
- Fixed various possible deadlocks
- Unified the shutdown code
- Added more tests covering different corner cases on start/stop
* G-API-NG/Streaming: Finally fixed Windows crashes
In fact the problem hasn't been Windows-only.
Island thread popped data from queues without preserving the Cmd
objects and without taking the ownership over data acquired so when
islands started to process the data, this data may be already freed.
Linux version worked only by occasion.
* G-API-NG/Streaming: Fixed (I hope so) Windows warnings
* G-API-NG/Streaming: fixed typos in internal comments
- Also added some more explanation on Streaming/OpenCL status
* G-API-NG/Streaming: Added more unit tests on streaming
- Various start()/stop()/setSource() call flow combinations
* G-API-NG/Streaming: Added tests on own concurrent bounded queue
* G-API-NG/Streaming: Added more tests on various data types, + more
- Vector/Scalar passed as input;
- Vector/Scalar passed in-between islands;
- Some more assertions;
- Also fixed a deadlock problem when inputs are mixed (1 constant, 1 stream)
* G-API-NG/Streaming: Added tests on output data types handling
- Vector
- Scalar
* G-API-NG/Streaming: Fixed test issues with IE + Windows warnings
* G-API-NG/Streaming: Decoupled G-API from videoio
- Now the core G-API doesn't use a cv::VideoCapture directly,
it comes in via an abstract interface;
- Polished a little bit the setSource()/start()/stop() semantics,
now setSource() is mandatory before ANY call to start().
* G-API-NG/Streaming: Fix STANDALONE build (errors brought by render)
* New v_reverse HAL intrinsic for reversing the ordering of a vector
* Fix conflict.
* Try to resolve conflict again.
* Try one more time.
* Add _MM_SHUFFLE. Remove non-vectorize code in SSE2. Fix copy and paste issue with NEON.
* Change v_uint16x8 SSE2 version to use shuffles
* Adding support for vectorized masking for uchar/ushort.
* Fixing bug where mask was zeroing the dst. Improved the way to calculate
the mask and tweaked for further performance improvements.
* Fixing mask comparison test.
* Restricting to one channel.
* Adding support for 3 channels, switch old approach to start using HAL's
v_select.
* Cuda + OpenGL on ARM
There might be multiple ways of getting OpenCV compile on Tegra (NVIDIA Jetson) platform, but mainly they modify CUDA(8,9,10...) source code, this one fixes it for all installations.
( https://devtalk.nvidia.com/default/topic/1007290/jetson-tx2/building-opencv-with-opengl-support-/post/5141945/#5141945 et al.).
This way is exactly the same as the one proposed but the code change happens in OpenCV.
* Updated,
The link provided mentions: cuda8 + 9, I have cuda 10 + 10.1 (and can confirm it is still defined this way).
NVIDIA is probably using some other "secret" backend with Jetson.
* core: rework and optimize SIMD implementation of dotProd
- add new universal intrinsics v_dotprod[int32], v_dotprod_expand[u&int8, u&int16, int32], v_cvt_f64(int64)
- add a boolean param for all v_dotprod&_expand intrinsics that change the behavior of addition order between
pairs in some platforms in order to reach the maximum optimization when the sum among all lanes is what only matters
- fix clang build on ppc64le
- support wide universal intrinsics for dotProd_32s
- remove raw SIMD and activate universal intrinsics for dotProd_8
- implement SIMD optimization for dotProd_s16&u16
- extend performance test data types of dotprod
- fix GCC VSX workaround of vec_mule and vec_mulo (in little-endian it must be swapped)
- optimize v_mul_expand(int32) on VSX
* core: remove boolean param from v_dotprod&_expand and implement v_dotprod_fast&v_dotprod_expand_fast
this changes made depend on "terfendail" review
- renamed Cascade Lake AVX512_CEL => AVX512_CLX (align with Intel SDE tool)
- fixed CLX instruction sets (no IFMA/VBMI)
- added flag to bypass CPU baseline check: OPENCV_SKIP_CPU_BASELINE_CHECK
> Size parameter is changed from int to cv::Size type to allow rectangle kernels
> Kernel creation code is adopted for different kernel sizes to not create only white images on the output