* add cv::compare test when Mat type == CV_16F
* add assertion in cv::compare when src.depth() == CV_16F
* cv::compare assertion minor fix
* core: add more checks
* enable tests for DNN_TARGET_CUDA_FP16
* disable deconvolution tests
* disable shortcut tests
* fix typos and some minor changes
* dnn(test): skip CUDA FP16 test too (run_pool_max)
* Handle det == 0 in findCircle3pts.
Issue 16051 shows a case where findCircle3pts returns NaN for the
center coordinates and radius due to dividing by a determinant of 0. In
this case, the points are colinear, so the longest distance between any
2 points is the diameter of the minimum enclosing circle.
* imgproc(test): update test checks for minEnclosingCircle()
* imgproc: fix handling of special cases in minEnclosingCircle()
G-API: Tutorial: Face beautification algorithm implementation
* Introduce a tutorial on face beautification algorithm
- small typo issue in render_ocv.cpp
* Addressing comments rgarnov smirnov-alexey
* Eltwise::DIV support in Halide backend
* fix typo
* remove div from generated test suite to pass CI, switching to manual test...
* ensure divisor not near to zero
* use randu
* dnn(test): update test data for Eltwise.Accuracy/DIV layer test
Add checks for empty operands in Matrix expressions that don't check properly
* Starting to add checks for empty operands in Matrix expressions that
don't check properly.
* Adding checks and delcarations for checker functions
* Fix signatures and add checks for each class of Matrix Expr operation
* Make it catch the right exception
* Don't expose helper functions to public API
* G-API: Added G-API Overview slides & its source code
- Sample code snippets are moved to separate files;
- Introduced a separate benchmark to measure Fluid/OpenCV
performance;
- Added notes on API changes (it is still a 4.0, not a 4.2 talk!)
- Added a "Metropolis" beamer download-n-build script.
* G-API: Addressed review issues on G-API overview slides
* imgproc: Prevent 1B overrun of 8C3 SIMD optimization
The fourth value read via v_load_q is essentially ignored,
but can cause trouble if it happens to cross page boundaries.
The final few iterations may attempt to read the most extreme
elements of S, which will read 1B beyond the array in most
aligment cases. Dynamically compute the stop. This could be
hoised from the loop, but will require a more extensive change.
Likewise, cleanup the iteration increment statements to make
it more obvious they do channel count (3) elements per pass.
This should resolve#16137
* imgproc(resize): extra check
dnn(eltwise): fix handling of different number of channels
* dnn(test): reproducer for Eltwise layer issue from PR16063
* dnn(eltwise): rework support for inputs with different channels
* dnn(eltwise): get rid of finalize(), variableChannels
* dnn(eltwise): update input sorting by number of channels
- do not swap inputs if number of channels are same after truncation
* dnn(test): skip "shortcut" with batch size 2 on MYRIAD targets
G-API: Fix various issues for 4.2 release
* G-API: Fix issues reported by Coverity
- Fixed: passing values by value instead of passing by reference
* G-API: Fix redundant std::move()'s in return statements
Fixes#15903
* G-API: Added a smarter handling of Stop messages in the pipeline
- This should fix the "expected 100, got 99 frames" problem
- Fixes#15882
* G-API: Pass enum instead of GKernelPackage in Streaming test parameters
- Likely fixes#15836
* G-API: Address review issues in new bugfix comments
* G-API-NG/Docs: Added a tutorial page on interactive face detection sample
- Introduced a "--ser" option to run the pipeline serially for
benchmarking purposes
- Reorganized sample code to better fit the documentation;
- Fixed a couple of issues (mainly typos) in the public headers
* G-API-NG/Docs: Reflected meta-less compilation in new G-API tutorial
* G-API-NG/Docs: Addressed review comments on Face Analytics Pipeline example
cuda4dnn(resize): process multiple channels each iteration
* resize bilinear: process multiple chans. per iter.
* remove unused headers
* correct dispatch logic
* resize_nn: process multiple chans. per iter.
* resize: HResizeLinear reduce duplicate work
There appears to be a 2x unroll of the HResizeLinear against k,
however the k value is only incremented by 1 during the unroll. This
results in k - 1 duplicate passes when k > 1.
Likewise, the final pass may not respect the work done by the vector
loop. Start it with the offset returned by the vector op if
implemented. Note, no vector ops are implemented today.
The performance is most noticable on a linear downscale. A set of
performance tests are added to characterize this. The performance
improvement is 10-50% depending on the scaling.
* imgproc: vectorize HResizeLinear
Performance is mostly gated by the gather operations
for x inputs.
Likewise, provide a 2x unroll against k, this reduces the
number of alpha gathers by 1/2 for larger k.
While not a 4x improvement, it still performs substantially
better under P9 for a 1.4x improvement. P8 baseline is
1.05-1.10x due to reduced VSX instruction set.
For float types, this results in a more modest
1.2x improvement.
* Update U8 processing for non-bitexact linear resize
* core: hal: vsx: improve v_load_expand_q
With a little help, we can do this quickly without gprs on
all VSX enabled targets.
* resize: Fix cn == 3 step per feedback
Per feedback, ensure we don't overrun. This was caught via the
failure observed in Test_TensorFlow.inception_accuracy.
Test create custom layer in python
* check is contiguos
* Add custom layer test
* Fix test
* Remove assert
* Move assert to pyopencv dnn
* remove assert
* Add unregister
* Fix python2
* proto to bytearray
* Fix data type
* G-API: Addressed various documentation issues
- Fixed various typos and missing references;
- Added brief documentaion on G_TYPED_KERNEL and G_COMPOUND_KERNEL macros;
- Briefly described GComputationT<>;
- Briefly described G-API data objects (in a group section).
* G-API: Some clean-ups in doxygen, also a chapter on Render API
* G-API: Expose more graph compilation arguments in the documentation
* G-API: Address documentation review comments
* calib3d: use normalized input in solvePnPGeneric()
* calib3d: java regression test for solvePnPGeneric
* calib3d: python regression test for solvePnPGeneric
* core: disable invalid constructors in C API by default
- C API objects will lose their default initializers through constructors
* samples: stop using of C API
Fix cudacodec python
* Add python bindings to cudacodec.
* Allow args with CV_OUT GpuMat& or CV_OUT cuda::GpuMat& to generate python bindings that allow the argument to be an optional output in the same way as OutputArray.
* Add wrapper flag to indicate that an OutputArray is a GpuMat.
* python: drop CV_GPU, extra checks in test
* Remove "cuda::GpuMat" check rom python parser
G-API-NG/Streaming: don't require explicit metadata in compileStreaming()
* First probably working version
Hardcode gose to setSource() :)
* Pre final version of move metadata declaration from compileStreaming() to setSource().
* G-API-NG/Streaming: recovered the existing Streaming functionality
- The auto-meta test is disabling since it crashes.
- Restored .gitignore
* G-API-NG/Streaming: Made the meta-less compileStreaming() work
- Works fine even with OpenCV backend;
- Fluid doesn't support such kind of compilation so far - to be fixed
* G-API-NG/Streaming: Fix Fluid to support meta-less compilation
- Introduced a notion of metadata-sensitive passes and slightly
refactored GCompiler and GFluidBackend to support that
- Fixed a TwoVideoSourcesFail test on streaming
* Add three smoke streaming tests to gapi_streaming_tests.
All three teste run pipeline with two different input sets
1) SmokeTest_Two_Const_Mats test run pipeline with two const Mats
2) SmokeTest_One_Video_One_Const_Scalar test run pipleline with Mat(video source) and const Scalar
3) SmokeTest_One_Video_One_Const_Vector test run pipeline with Mat(video source) and const Vector
# Please enter the commit message for your changes. Lines starting
* style fix
* Some review stuff
* Some review stuff
* Added Swish and Mish activations
* Fixed whitespace errors
* Kernel implementation done
* Added function for launching kernel
* Changed type of 1.0
* Attempt to add test for Swish and Mish
* Resolving type mismatch for log
* exp from device
* Use log1pexp instead of adding 1
* Added openCL kernels
(1/4) Revert "Correct image borders and principal point computation in cv::stereoRectify"
This reverts commit 93ff1fb2f2.
(2/4) Revert "fix calib3d changes in 6836 plus some others"
This reverts commit fa42a1cfc2.
(3/4) Revert "fix compiler warning"
This reverts commit b3d55489d3.
(4/4) Revert "add test for 6836"
This reverts commit d06b8c4ea9.
Tests for argument conversion of Python bindings generator
* Tests for parsing elemental types from Python bindings
- Add positive and negative tests for int, float, double, size_t,
const char*, bool.
- Tests with wrong conversion behavior are skipped.
* Move implicit conversion of bool to integer/floating types to wrong
conversion behavior.
Fix incorrect use of std::move() in g-api perf tests
* First version
* Fix perfomace tests
Replace
c.apply(...);
with
cc = c.compile(...);
cc(...);
* Remove output meta arguments from .compile()
* Style fix
* Remove useless commented string
* Stick to common pattern : i.e. use gin() and gout() explicitly.
* Use cc(gin(...), gout(...)) in all cases.
* Fix infinite loop when trying to change state of the busy camera
- Add finite number of attempts in tryIoctl functions
10 by default.
* Introduced new flag for ioctl call to handle EBUSY
Improving VSX performance of integral function
* Adding support for vector get function on VSX datatypes so the
integral function gains a bit of performance.
* Removing get as a datatype member function and implementing a new HAL
instruction v_extract_n to get the n-th element of a vector register.
* Adding SSE/NEON/AVX intrinsics.
* Implement new HAL instruction v_broadcast_element on VSX/AVX/NEON/SSE.
* core(simd): add tests for v_extract_n/v_broadcast_element
- updated docs
- commented out code to repair compilation
- added WASM and MSA default implementations
* core(simd): fix compilation
- x86: avoid _mm256_extract_epi64/32/16/8 with MSVS 2015
- x86: _mm_extract_epi64 is 64-bit only
* cleanup
Add retrieve encoded frame to VideoCapture
* Add capacity to retrieve the encoded frame from a VideoCapture object.
* Correct raw codec and pixle format output from ffmpeg capture.
* Remove warnings from build.
* Added VideoCaptureRaw subclass.
* Include abstract base class VideoCaptureBase and rename new subclass VideoContainer as suggested by mshabunin.
* Remove using.
* Change base class name for compatibility with jave bindings generator.
* Move grab and retrieve and add override specifier
* Add setRaw and readRaw to IVideoCapture interface
-setRaw to disable video decoding and enable bitstream filters from mp4 to h254 and h265.
-readRaw to return the raw undecoded/filtered bitstream.
Add createRawCapture to initiate a backend with setRaw enabled.
Remove inheritance and use an independant VideoContainer subclass with IVideoCapture member.
* Address unused parameter warings.
Remove VideoContainer from python bindings as it no longer returns a Mat.
Use opencv type uchar instead of unsigned char.
Add missing destructor to VideoContainer class.
* Address build warnings and include all params in documentation.
* Include deprecated bitstream filtering API.
* Update codec_id query to work with older ffmpeg api's.
Change api version defines to be consistent - most recent api version first.
* Fix typo.
* Update test to work with naming of new files in the extra repo
* Investigate test failure
* Check bytes read by ffmpeg
* Removed mp4 video container test
* Applied suggested changes.
* videoio: rework API for extraction of RAW video streams
- FFmpeg only
* address review comments
Introducing the sample of Face Beautification algorithm implemented via Graph-API
* Introducing the sample of Face Beautification algorithm implemented via Graph-API
- 'gapi/samples/face_beautification.cpp' added
- FIXME added in 'gcpukernel.hpp'
* INF_ENGINE fix
- preprocessing clauses added not to run the sample without Inference Engine
* INF_ENGINE fix 2
- warnings removed
* Fixes
- checking IE version cut as there is no dependency
- some alignments fixed
- the comment about preprocessing commands fixed
* ie::backend() issue fix (according to dmatveev)
- as the sample needs the cv::gapi::ie::backend() to be defined regardless of having IE or not, there is its throw-error definition in `giebackend.cpp` now (by dmatveev)
- for the same reason, #includes in `giebackend.hpp` are fixed
- HAVE_INF_ENGINE check is removed from the sample
Implement Camera Multiplexing API
* IdideoCapture + two wrong function
function waitAny
Add errors catcher
Stub for Python added.
Sifting warnings
One test added
Two tests for camera and Perf tests added
* Perf sync and async tests for waitAny() added, waitAnyInterior() deleted, getDeviceHandle() deleted
* Variable OPENCV_TEST_CAMERA_LIST added
* Without fps set
* ASSERT_FAILED for environment variable
* Perf tests is DISABLED_
* --Trailing whitespace
* Return false from cap.cpp deleted
* Two functions deleted from interface, +range for, +environment variable in test_camera
* Space deleted
* printf deleted, perror added
* CV_WRAP deleted, cv2 cleared from stubs
* -- space
* default timeout added
* @param changed
* place of waitAny changed
* --whitespace
* ++function description
* function description changed
* revert unused changes
* videoio: rework API for VideoCapture::waitAny()
Supported ONNX Squeeze, ReduceL2 and Eltwise::DIV
* Support eltwise div
* Fix test
* OpenCL support added
* refactoring
* fix code style
* Only squeeze with axes supported
* Convert moments in tile algorithms to HAL (1.3x faster for VSX).
* Adding NEON code back in for non 64-bit platforms.
* Remove floats from post processing.
Clarify stereoRectify() doc
The function stereoRectify() takes as input a coordinate transform between two cameras. It is ambiguous how it goes. I clarified that it goes from the second camera to the first.
* Use FlsAlloc/FlsFree/FlsGetValue/FlsSetValue instead of TlsAlloc/TlsFree/TlsGetValue/TlsSetValue to implment TLS value cleanup when thread has been terminated on Windows Vista and above
* Fix 32-bit build
* Fixed calling convention of cleanup callback
* WINAPI changed to NTAPI
* Use proper guard macro
* Vectorize flipHoriz and flipVert functions.
* Change v_load_mirror_1 to use vec_revb for VSX
* Only use vec_revb in ISA3.0
* Removing vec_revb code since some of the older compilers don't fully support it.
* Use new v_reverse intrinsic and cleanup code.
* Ensure there are no alignment issues with copies
Build DoG Pyramid if useProvideKeypoints is false
The buildDoGPyramid operation need not be performed unconditionally. In cases where it is not needed, both memory and speed performance can be improved
original commit: e45887e1c0
* Doc bugfix
The documentation page StereoBinaryBM and StereoBinarySGBM says that it returns a disparity that is scaled multiplied by 16. This scaling must be undone before calling reprojectImageTo3D, otherwise the results are wrong. The function reprojectImageTo3D() could do this scaling internally, maybe, but at least the documentation must explain that this has to be done.
* calib3d: update reprojectImageTo3D documentation
* calib3d: add StereoBM/StereoSGBM into notes list
- move TLS & instrumentation code out of core/utility.hpp
- (*) TLSData lost .gather() method (to dispose thread data on thread termination)
- use TLSDataAccumulator for reliable collecting of thread data
- prefer using of .detachData() + .cleanupDetachedData() instead of .gather() method
(*) API is broken: replace TLSData => TLSDataAccumulator if gather required
(objects disposal on threads termination is not available in accumulator mode)
Fixing bug with comparison of v_int64x2 or v_uint64x2
* Casting v_uint64x2 to v_float64x2 and comparing does NOT work in all cases. Rewrite using epi64 instructions - faster too.
* Fix bad merge.
* Fix equal comparsion for non-SSE4.1. Add test cases for v_int64x2 comparisons.
* Try to fix merge conflict.
* Only test v_int64x2 comparisons if CV_SIMD_64F
* Fix compiler warning.
* G-API: Doxygen documentatation for Async API
* G-API: Doxygen documentatation for Async API
- renamed local variable (reading parameter async) async ->
asyncNumReq in object_detection DNN sample
to avoid Doxygen erroneous linking the sample to cv::gapi::wip::async
documentation
* G-API-NG/Streaming: Introduced a Streaming API
Now a GComputation can be compiled in a special "streaming" way
and then "played" on a video stream.
Currently only VideoCapture is supported as an input source.
* G-API-NG/Streaming: added threading & real streaming
* G-API-NG/Streaming: Added tests & docs on Copy kernel
- Added very simple pipeline tests, not all data types are covered yet
(in fact, only GMat is tested now);
- Started testing non-OCV backends in the streaming mode;
- Added required fixes to Fluid backend, likely it works OK now;
- Added required fixes to OCL backend, and now it is likely broken
- Also added a UMat-based (OCL) version of Copy kernel
* G-API-NG/Streaming: Added own concurrent queue class
- Used only if TBB is not available
* G-API-NG/Streaming: Fixing various issues
- Added missing header to CMakeLists.txt
- Fixed various CI issues and warnings
* G-API-NG/Streaming: Fixed a compile-time GScalar queue deadlock
- GStreamingExecutor blindly created island's input queues for
compile-time (value-initialized) GScalars which didn't have any
producers, making island actor threads wait there forever
* G-API-NG/Streaming: Dropped own version of Copy kernel
One was added into master already
* G-API-NG/Streaming: Addressed GArray<T> review comments
- Added tests on mov()
- Removed unnecessary changes in garray.hpp
* G-API-NG/Streaming: Added Doxygen comments to new public APIs
Also fixed some other comments in the code
* G-API-NG/Streaming: Removed debug info, added some comments & renamed vars
* G-API-NG/Streaming: Fixed own-vs-cv abstraction leak
- Now every island is triggered with own:: (instead of cv::)
data objects as inputs;
- Changes in Fluid backend required to support cv::Mat/Scalar were
reverted;
* G-API-NG/Streaming: use holds_alternative<> instead of index/index_of test
- Also fixed regression test comments
- Also added metadata check comments for GStreamingCompiled
* G-API-NG/Streaming: Made start()/stop() more robust
- Fixed various possible deadlocks
- Unified the shutdown code
- Added more tests covering different corner cases on start/stop
* G-API-NG/Streaming: Finally fixed Windows crashes
In fact the problem hasn't been Windows-only.
Island thread popped data from queues without preserving the Cmd
objects and without taking the ownership over data acquired so when
islands started to process the data, this data may be already freed.
Linux version worked only by occasion.
* G-API-NG/Streaming: Fixed (I hope so) Windows warnings
* G-API-NG/Streaming: fixed typos in internal comments
- Also added some more explanation on Streaming/OpenCL status
* G-API-NG/Streaming: Added more unit tests on streaming
- Various start()/stop()/setSource() call flow combinations
* G-API-NG/Streaming: Added tests on own concurrent bounded queue
* G-API-NG/Streaming: Added more tests on various data types, + more
- Vector/Scalar passed as input;
- Vector/Scalar passed in-between islands;
- Some more assertions;
- Also fixed a deadlock problem when inputs are mixed (1 constant, 1 stream)
* G-API-NG/Streaming: Added tests on output data types handling
- Vector
- Scalar
* G-API-NG/Streaming: Fixed test issues with IE + Windows warnings
* G-API-NG/Streaming: Decoupled G-API from videoio
- Now the core G-API doesn't use a cv::VideoCapture directly,
it comes in via an abstract interface;
- Polished a little bit the setSource()/start()/stop() semantics,
now setSource() is mandatory before ANY call to start().
* G-API-NG/Streaming: Fix STANDALONE build (errors brought by render)
If an aravis camera is software triggered, a trigger needs to be explicitly sent using `arv_camera_software_trigger`, otherwise the camera will not grab any frames.
* New v_reverse HAL intrinsic for reversing the ordering of a vector
* Fix conflict.
* Try to resolve conflict again.
* Try one more time.
* Add _MM_SHUFFLE. Remove non-vectorize code in SSE2. Fix copy and paste issue with NEON.
* Change v_uint16x8 SSE2 version to use shuffles
* Adding support for vectorized masking for uchar/ushort.
* Fixing bug where mask was zeroing the dst. Improved the way to calculate
the mask and tweaked for further performance improvements.
* Fixing mask comparison test.
* Restricting to one channel.
* Adding support for 3 channels, switch old approach to start using HAL's
v_select.
* Cuda + OpenGL on ARM
There might be multiple ways of getting OpenCV compile on Tegra (NVIDIA Jetson) platform, but mainly they modify CUDA(8,9,10...) source code, this one fixes it for all installations.
( https://devtalk.nvidia.com/default/topic/1007290/jetson-tx2/building-opencv-with-opengl-support-/post/5141945/#5141945 et al.).
This way is exactly the same as the one proposed but the code change happens in OpenCV.
* Updated,
The link provided mentions: cuda8 + 9, I have cuda 10 + 10.1 (and can confirm it is still defined this way).
NVIDIA is probably using some other "secret" backend with Jetson.
* core: rework and optimize SIMD implementation of dotProd
- add new universal intrinsics v_dotprod[int32], v_dotprod_expand[u&int8, u&int16, int32], v_cvt_f64(int64)
- add a boolean param for all v_dotprod&_expand intrinsics that change the behavior of addition order between
pairs in some platforms in order to reach the maximum optimization when the sum among all lanes is what only matters
- fix clang build on ppc64le
- support wide universal intrinsics for dotProd_32s
- remove raw SIMD and activate universal intrinsics for dotProd_8
- implement SIMD optimization for dotProd_s16&u16
- extend performance test data types of dotprod
- fix GCC VSX workaround of vec_mule and vec_mulo (in little-endian it must be swapped)
- optimize v_mul_expand(int32) on VSX
* core: remove boolean param from v_dotprod&_expand and implement v_dotprod_fast&v_dotprod_expand_fast
this changes made depend on "terfendail" review
- renamed Cascade Lake AVX512_CEL => AVX512_CLX (align with Intel SDE tool)
- fixed CLX instruction sets (no IFMA/VBMI)
- added flag to bypass CPU baseline check: OPENCV_SKIP_CPU_BASELINE_CHECK
> Size parameter is changed from int to cv::Size type to allow rectangle kernels
> Kernel creation code is adopted for different kernel sizes to not create only white images on the output
G-API: add transformation logic to GCompiler
* Introduce transformation logic to GCOmpiler
* Remove partialOk() method
* Fix minor issues
* Refactor code according to code review
1. Re-design matchPatternToSubstitute logic
2. Update transformations order
3. Replace check_transformations pass with a
one time check in GCompiler ctor
* Revert unused nodes handling in pattern matching
* Address minor code review issues
* Address code review comments:
1) Fix some mistakes
2) Add new tests for endless loops
3) Update GCompiler's transformations logic
* Simplify GCompiler check for endless loops
1. Simplify transformations endless loops check:
- Original idea wasn't a full solution
- Need to develop a good method (heuristic?) to find loops
in general case (TODO)
2. Remove irrelevant Endless Loops tests
3. Add new "bad arg" tests and unit tests
* Update comments
[GSoC 2019] Improve the performance of JavaScript version of OpenCV (OpenCV.js)
* [GSoC 2019]
Improve the performance of JavaScript version of OpenCV (OpenCV.js):
1. Create the base of OpenCV.js performance test:
This perf test is based on benchmark.js(https://benchmarkjs.com). And first add `cvtColor`, `Resize`, `Threshold` into it.
2. Optimize the OpenCV.js performance by WASM threads:
This optimization is based on Web Worker API and SharedArrayBuffer, so it can be only used in browser.
3. Optimize the OpenCV.js performance by WASM SIMD:
Add WASM SIMD backend for OpenCV Universal Intrinsics. It's experimental as WASM SIMD is still in development.
* [GSoC2019]
1. use short license header
2. fix documentation node issue
3. remove the unused `hasSIMD128()` api
* [GSoC2019]
1. fix emscripten define
2. use fallback function for f16
* [GSoC2019]
Fix rebase issue
* Added MSA implementations for mips platforms. Intrinsics for MSA and build scripts for MIPS platforms are added.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* Removed some unused code in mips.toolchain.cmake.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* Added comments for mips toolchain configuration and disabled compiling warnings for libpng.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* Fixed the build error of unsupported opcode 'pause' when mips isa_rev is less than 2.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* 1. Removed FP16 related item in MSA option defines in OpenCVCompilerOptimizations.cmake.
2. Use CV_CPU_COMPILE_MSA instead of __mips_msa for MSA feature check in cv_cpu_dispatch.h.
3. Removed hasSIMD128() in intrin_msa.hpp.
4. Define CPU_MSA as 150.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* 1. Removed unnecessary CV_SIMD128_64F guarding in intrin_msa.hpp.
2. Removed unnecessary CV_MSA related code block in dotProd_8u().
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* 1. Defined CPU_MSA_FLAGS_ON as "-mmsa".
2. Removed CV_SIMD128_64F guardings in intrin_msa.hpp.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* Removed unused msa_mlal_u16() and msa_mlal_s16 from msa_macros.h.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* issue 5769 fixed: cv::stereoRectify fails if given inliers mask of type vector<uchar>
* issue5769 fix using reshape and add regression test
* regression test with outlier detection, testing vector and mat data
* Size comparision of wrong vector within CV_Assert in regression test corrected
* cleanup test code
ISA 2.07 (aka POWER8) effectively extended the expanding multiply
operation to word types. The altivec intrinsics prior to gcc 8 did
not get the update.
Workaround this deficiency similar to other fixes.
This was exposed by commit 33fb253a66
which leverages the int -> dword expanding multiply.
This fixes Issue #15506
* Adding all possible data type interactions to the perf tests since some
use SIMD acceleration and others do not.
* Disabling full tests by default.
* Giving proper names, removing magic numbers and sanity checks of new
performance tests for the integral function.
* Giving proper names, making array static.
* - headers in "infer/" and "infer/ie/" folders are included into gapi_ext_hdrs;
+ because of that a few #includes are required in the headers
- HAVE_INF_ENGINE flag check in headers "infer/ie.hpp" and "infer/ie/util.hpp" is deleted
* - the "ie/util.hpp" header is a private header now as it's used for tests; it's been moved to the scr directory to the place next to the implementation file "ie/giebackend.cpp"
- the path to this header in files "ie/giebackend.cpp" and "test/infer/gapi_infer_ie_test.cpp" is updated
- As it's private header now and explicitly depends on IE, the "HAVE_INF_ENGINE" flag check is returned
* Support GArray as input in fluid kernels
* Create tests on GArray input in fluid
* Some fixes to fully support GArray
* Refactor code and change the kernel according to review
* Add histogram calculation as a G-API kernel
Add assert that input GArgs in fluid contain at least one GMat
* Convert ImgWarp from SSE SIMD to HAL - 2.8x faster on Power (VSX) and 15% speedup on x86
* Change compile flag from CV_SIMD128 to CV_SIMD128_64F for use of v_float64x2 type
* Changing WarpPerspectiveLine from class functions and dispatching to static functions.
* Re-add dynamic runtime and dispatch execution.
* RRestore SSE4_1 optimizations inside opt_SSE4_1 namespace
* Convert lkpyramid from SSE SIMD to HAL - 90% faster on Power (VSX).
* Replace stores with reduce_sum. Rework to handle endianess correctly.
* Fix compiler warnings by casting values explicitly to shorts
* Switch to CV_SIMD128 compiler definition. Unroll loop to 8 elements since we've already loaded the data.
Detected by clang trunk:
```
opencv/modules/core/src/ocl.cpp:4337:37: warning: object backing the pointer will be destroyed at the end of the full-expression [-Wdangling]
CV_OCL_CHECK_RESULT(retval, cv::format("clCreateBuffer(capacity=%lld) => %p", (long long int)entry.capacity_, (void*)entry.clBuffer_).c_str());
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
opencv/modules/core/src/ocl.cpp:193:42: note: expanded from macro 'CV_OCL_CHECK_RESULT'
if (0) { const char* msg_ = (msg); CV_UNUSED(msg_); /* ensure const char* type (cv::String without c_str()) */ } \
```
because `cv::format` yields a temporary std::string, and thus `msg_` points to a destroyed buffer.
* Fix the detection of XIMEA on Windows (when it has been installed by another user with administrative privileges, for example).
* Change the flow: we first try HKEY_CURRENT_USER key and, if empty, then try HKEY_LOCAL_MACHINE
Use 4x FMA chains to sum on SIMD 128 FP64 targets. On
x86 this showed about 1.4x improvement.
For PPC, do a full multiply (32x32->64b), convert to DP
then accumulate. This may be slightly less precise for
some inputs. But is 1.5x faster than the above which
is about 1.5x than the FMA above for ~2.5x speedup.
* in embindgen.py added inpaint function
* added test for inpaint function and fixed function in build_js
* fixed test for inpaint function
* rotate deleted, build_js.py fixed
G-API: Fix Journal usage in Fluid backend (#15238)
* Fix Journal usage in Fluid backend
* Delete dumpDotRequired(): invalid check
* Update mem consumption test
* Test that new test works
* Debug memory consumption function
* Increase iterations in test
* Re-write memory consumption measurement part
* Restore correct fix for Fluid journals
* G-API: rename ArgKind OPAQUE to GOPAQUE
Rename ArgKind value to GOPAQUE to fix conflict in the
user code when wingdi.h is included: it defines OPAQUE
macro that (for some reason) is chosen instead of ArgKind
value
* Add compatibility with existing API
* Renamed GOPAQUE to OPAQUE_VAL
Convert HOG from SSE SIMD to HAL - 35-45% faster on Power (VSX) (#15199)
* Convert SSE SIMD to HAL. 35-45% improvement for Power (VSX)
* Remove CV_NEON code. Use v_floor instead of 3 lines of code.
* Invert comparison logic to simplify code.
* Change initialization from v_load to constructor type.
* Remove unavoidable print of CV error
The return value covers whether the device exists.
This might be better hidden behind a debug flag, but I couldn't work out how to do that nicely.
* Use `CV_LOG_WARNING` macro to log rather than removing it entirely
* add -Wno-psabi when using GCC 6
* add -Wundef for CUDA 10
* add -Wdeprecated-declarations when using GCC 7
* add -Wstrict-aliasing and -Wtautological-compare for GCC 7
* replace cudaThreadSynchronize with cudaDeviceSynchronize
Implement cvRound using inline asm. No compiler support
exists today to properly optimize this. This results in
about a 4x speedup over the default rounding. Likewise,
simplify the growing number of rounding function overloads.
For P9 enabled targets, utilize the classification
testing instruction to test for Inf/Nan values. Operation
speedup is about 1.2x for FP32, and 1.5x for FP64 operands.
For P8 targets, fallback to the GCC nan inline. It provides
a 1.1/1.4x improvement for FP32/FP64 arguments.
Add a new macro definition OPENCV_USE_FASTMATH_GCC_BUILTINS to enable
usage of GCC inline math functions, if available and requested by the
user.
Likewise, enable it for POWER. This is nearly always a substantial
improvement over using integer manipulation as most operations can
be done in several instructions with no branching. The result is a
1.5-1.8x speedup in the ceil/floor operations.
1. As tested with AT 12.0-1 (GCC 8.3.1) compiler on P9 LE.
Add a basic sanity test to verify the rounding functions
work as expected.
Likewise, extend the rounding performance test to cover the
additional float -> int fast math functions.
Support new IE API (#15184)
* Add support OpenVINO R2 for layers
* Add Core API
* Fix tests
* Fix expectNoFallbacksFromIE for ONNX nets
* Remove deprecated API
* Remove td
* Remove TargetDevice
* Fix Async
* Add test
* Fix detectMyriadX
* Fix test
* Fix warning
* G-API-NG/API: Introduced inference API and IE-based backend
- Very quick-n-dirty implementation
- OpenCV's own DNN module is not used
- No tests so far
* G-API-NG/IE: Refined IE backend, added more tests
* G-API-NG/IE: Fixed various CI warnings & build issues + tests
- Added tests on multi-dimensional own::Mat
- Added tests on GMatDesc with dimensions
- Documentation on infer.hpp
- Fixed more warnings + added a ROI list test
- Fix descr_of clash for vector<Mat> & standalone mode
- Fix build issue with gcc-4.8x
- Addressed review comments
* G-API-NG/IE: Addressed review comments
- Pass `false` to findDataFile()
- Add deprecation warning suppression macros for IE
* Support for several min and max sizes in PriorBox layer
* Fix minSize
* Check size
* Modify initInfEngine
* Fix tests
* Fix IE support
* Add priorbox test
* Remove inputs
* G-API: fix GOCLExecutable issue with UMat lifetime
Add tests on initialized/uninitialized outputs for all
backends
* Use proper clean-up procedure for magazine
* Rename InitOut test and reduce tested sizes
* Enable output allocation test
G-API: GAPI_TRANSFORM internal functionality rework (#14952)
* Change internal pattern and substitute signatures and refactor tests
* Enhance GArrayU with type-checker function
Add a couple of new tests on GAPI_TRANSFORM
* Added support for the ONNX "ReduceMean" Layer. (as this is the same as the GlobalAveragePool)
* Add ReduceMean test
* Fix ONNX importer
* Fix ReduceMean
* Add assert
* Split test
* Fix split test
G-API: clean up accuracy tests (#14945)
* Delete createOutputMatrices flag
Update the way compile args function is created
Fix instantiation suffix print function
* Update comment (NB)
* Make printable comparison functions
* Use defines instead of objects for compile args
* Remove custom printers, use operator<< overload
* Remove SAME_TYPE and use -1 instead
* Delete createOutputMatrices flag in new tests
* Fix GetParam() printed values
* Update Resize tests: use CompareF object
* Address code review feedback
* Add default cases for operator<< overloads
* change throw to GAPI_Assert
G-API: Add output allocation tests for backends (#15012)
* Add output tests for backends
* Fix large size test: output is in fact reallocated
* Use cv::Mat copies for reallocation tracking
* Separate LargeSizeWithCorrectSubmatrix test
* Rename backed output allocation tests
* Address code review feedback
Update test names
Add illustrative "expect (non-)empty" checks
Rename mat "copy" to mat reference
Add more pointer checks
* Add illustrative checks
- Added new graph compile time argument to specify multiple independent
ROIs (Tiles)
- Added new "executable" with serial loop other user specified
ROIs(Tiles)
- refactored graph traversal code into separate function to be called
once
- added saturate cast to Fluid AddCsimple test kernel
G-API planar kernels (#14917)
* Added resizeP with tests
* NV12 planar filters
* fix warnings in ResizeP test
* fix out mat ocv warning
* sz_on - > sz rename
* cpu tests new signature
* try to fix resizeP test
* trailing spaces remove
* doxygen doc fixed
* doxygen minor fix
* more doxygen fixes
* Doxygen corrected and extended after review.
Crosscorr cleanup (#14936)
* Simplify code for convolution destination type/size
For the 2d filter code, destination size equals source size, and the
crossCorr function even (re-)creates the output matrix with the given size.
The number of channels also have to match. The destination type() is the
one used to create the output matrix, so we can use its type() here.
This is a preparatory patch.
Signed-off-by: Stefan Brüns <stefan.bruens@rwth-aachen.de>
* Remove redundant destination size and type parameters from crossCorr
All calling sites of crossCorr already use (...,
mat, mat.size(), mat.type(), ...), so the parameters are redundant.
Signed-off-by: Stefan Brüns <stefan.bruens@rwth-aachen.de>
Due to the explicitly declared copy constructor Vec<T, n>::Vec(Vec <T,n>&)
GCC 9 warns if there is no assignment operator, as having one typically
requires the other (rule-of-three, constructor/desctructor/assginment).
As the values are just a plain array the default assignment operator does
the right thing. Tell the compiler explicitly to default it.
Signed-off-by: Stefan Brüns <stefan.bruens@rwth-aachen.de>
G-API: Parameterized render tests (#14892)
* Init commit
* Add mat size as test parameter
* Add test for text render
* Add test for rect render
* Add tests for line and circle
* Remove old render tests
* Init output mats
* Remove methods input arguments
* Add comment about data loss in BGR2NV12 conversion
* Add edge test cases
* Replace default color for out mats black -> white
G-API: Introduce new approach to write accuracy tests (#14757)
* G-API: Introduce new common accuracy test fixture
* Enable Range<> to Seq<> implicit conversion
* Fix shadowing parameters
* Update license headers
* Rename ALIGNED_TYPE to SAME_TYPE
* Move MkRange to tests
* Fix TODO(agolubev) in test instantiations
* Squash simple fixture declarations in one line
* Remove unused line
* Fix Windows issues with macro expansion
* Choose between 1 or 2 matrix initialization
* Redesign common class behavior
Use "views" for GetParam() provided by GTest
base class instead of doing segregation
(with copy!) of common and specific parameters:
request common or specific parameter directly
by index from GetParam()-returned parameters
* Refine user-level API and usage of new test model
* Fix -fpermissive errors
* Remove unnecessary init calls
* Replace GCompileArgs member variable with func ptr
* Rename initMatsRandN to make its behavior explicit
Rename initMatsRandN to initMatrixRandN to eliminate confusion:
initMatsRandN only initialized first matrix (similarly to
initMatrixRandU)
* Fix common of initNothing
* Update copyright dates in missed files
* Add check for specific parameters
* Fix coment stlye