the float variant was always shadowed by the int version as
Rect2d is implicitly convertible to Rect.
This swaps things which is fine, as the vector of boxes was always
copied and the computation was done in double.
* feature: Add video capture bitrate read-only property for FFMPEG backend
* test: For WIN32 property should be either expected or 0.
Added `IsOneOf` helper function, enabled only for _WIN32.
dnn(darknet-importer): add grouped convolutions, sigmoid, swish, scale_channels
* update darknet importer to support enetb0-yolo
* remove dropout (pr16438) and fix formatting
* add test for scale_channels
* disable batch testing for scale channels
* do not set LayerParams::name
* merge all activations into setActivation
* Add Tengine support .
* Modify printf to CV_LOG_WARNING
* a few minor fixes in the code
* Renew Tengine version
* Add header file for CV_LOG_WARNING
* Add #ifdef HAVE_TENGINE in tengine_graph_convolution.cpp
* remove trailing whitespace
* Remove trailing whitespace
* Modify for compile problem
* Modify some code style error
* remove whitespace
* Move some code style problem
* test
* add ios limit and build problem
* Modified as alalek suggested
* Add cmake 2.8 support
* modify cmake 3.5.1 problem
* test and set BUILD_ANDROID_PROJECTS OFF
* remove some compile error
* remove some extra code in tengine
* close test.
* Test again
* disable android.
* delete ndk version judgement
* Remove setenv() call . and add License information
* Set tengine default OFF. Close test .
Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@gmail.com>
Lets the user choose the maximum number of iterations the robust
estimator runs for, similary to findHomography. This can significantly
improve performance (at a computational cost).
The hard-coded string value "Mat" was used in the two format strings for vector_mat and vector_mat_template, preventing UMat arguments to functions that have these types from working correctly. as noted in #12231.
* Vectorize calculating integral for line for single and multiple channels
* Single vector processing for 4-channels - 25-30% faster
* Single vector processing for 4-channels - 25-30% faster
* Fixed AVX512 code for 4 channels
* Disable 3 channel 8UC1 to 32S for SSE2 and SSE3 (slower). Use new version of 8UC1 to 64F for AVX512.
fixed the ordering of contour convex hull points
* partially fixed the issue #4539
* fixed warnings and test failures
* fixed integer overflow (issue #14521)
* added comment to force buildbot to re-run
* extended the test for the issue 4539. Check the expected behaviour on the original contour as well
* added comment; fixed typo, renamed another variable for a little better clarity
* added yet another part to the test for issue #4539, where we run convexHull and convexityDetects on the original contour, without any manipulations. the rest of the test stays the same
* fixed several problems when running tests on Mac:
* OCL_pyrUp
* OCL_flip
* some basic UMat tests
* histogram badarg test (out of range access)
* retained the storepix fix in ocl_flip only for 16U/16S datatype, where the OpenCL compiler on Mac generates incorrect code
* moved deletion of ACCESS_FAST flag to non-SVM branch (where SVM is shared virtual memory (in OpenCL 2.x), not support vector machine)
* force OpenCL to use read/write for GPU<=>CPU memory transfers on machines with discrete video only on Macs. On Windows/Linux the drivers are seemingly smart enough to implement map/unmap properly (and maybe more efficiently than explicit read/write)
fixed cv::moveWindow() on mac
* fixed cv::moveWindow() on mac (issue #16343). Thanks to cwreynolds and saskatchewancatch for the help!
* fixed warnings about _x0 and _y0
* fixed warnings about _x0 and _y0
* Fix NN resize with dimentions > 4
* add test check for nn resize with channels > 4
* Change types from float to double
* Del unnecessary test file. Move nn test to test_imgwarp. Add 5 channels test only.
Fix compilation errors on GLES platforms
* Do not include glx.h when using GLES
GL/glx.h is included on all LINUX plattforms, which is wrong
for a number of reasons:
- GL_PERSPECTIVE_CORRECTION_HINT is defined in GL/gl.h, so we
want gl.h not glx.h, the latter just includes the former
- GL/gl.h is a Desktop GL header, and should not be included
on GLES plattforms
- GL/gl.h is already included via QtOpenGL ->
QtGui/qopengl.h on desktop plattforms
This fixes a problem when Qt is compiled with GLES, which
is often done on ARM platforms where desktop GL is not or
only poorly supported (e.g. slow due to emulation).
Fixes part of #9171.
* Only set GL_PERSPECTIVE_CORRECTION_HINT when GL version defines it
GL_PERSPECTIVE_CORRECTION_HINT does not exist in GLES 2.0/3.x,
and has been deprecated in OpenGL 3.0 core profiles.
Fixes part of #9171.
This is a correction of the previously missleading documentation and a warning related to a common calibration failure described in issue 15992
* corrected incorrect description of failed calibration state.
see issue 15992
* calib3d: apply suggestions from code review by catree
QR-Code detector : multiple detection
* change in qr-codes detection
* change in qr-codes detection
* change in test
* change in test
* add multiple detection
* multiple detection
* multiple detect
* add parallel implementation
* add functional for performance tests
* change in test
* add perftest
* returned implementation for 1 qr-code, added support for vector<Mat> and vector<vector<Point2f>> in MultipleDetectAndDecode
* deleted all lambda expressions
* changing in triangle sort
* fixed warnings
* fixed errors
* add java and python tests
* change in java tests
* change in java and python tests
* change in perf test
* change in qrcode.cpp
* add spaces
* change in qrcode.cpp
* change in qrcode.cpp
* change in qrcode.cpp
* change in java tests
* change in java tests
* solved problems
* solved problems
* change in java and python tests
* change in python tests
* change in python tests
* change in python tests
* change in methods name
* deleted sample qrcode_multi, change in qrcode.cpp
* change in perf tests
* change in objdetect.hpp
* deleted code duplication in sample qrcode.cpp
* returned spaces
* added spaces
* deleted draw function
* change in qrcode.cpp
* change in qrcode.cpp
* deleted all draw functions
* objdetect(QR): extractVerticalLines
* objdetect(QR): whitespaces
* objdetect(QR): simplify operations, avoid duplicated code
* change in interface, additional checks in java and python tests, added new key in sample for saving original image from camera
* fix warnings and errors in python test
* fix
* write in file with space key
* solved error with empty mat check in python test
* correct path to test image
* deleted spaces
* solved error with check empty mat in python tests
* added check of empty vector of points
* samples: rework qrcode.cpp
* objdetect(QR): fix API, input parameters must be first
* objdetect(QR): test/fix points layout
* Reduce LLC loads, stores and multiplies on MulTransposed - 8% faster on VSX
* Add is_same method so c++11 is not required
* Remove trailing whitespaces.
* Change is_same to DataType depth check
Added type check for solvePnPGeneric | Issue: #16049
* Added type check
* Added checks before type fix
* Tests for 16049
* calib3d: update solvePnP regression check (16049)
Vectorize minMaxIdx functions
* Updated documentation and intrinsic tests for v_reduce
* Add other files back in from the forced push
* Prevent an constant overflow with v_reduce for int8 type
* Another alternative to fix constant overflow warning.
* Fix another compiler warning.
* Update comments and change comparison form to be consistent with other vectorized loops.
* Change return type of v_reduce_min & max for v_uint8 and v_uint16 to be same as lane type.
* Cast v_reduce functions to int to avoid overflow. Reduce number of parameters in MINMAXIDX_REDUCE macro.
* Restore cast type for v_reduce_min & max to LaneType
Fix implicit conversion from array to scalar in python bindings
* Fix wrong conversion behavior for primitive types
- Introduce ArgTypeInfo namedtuple instead of plain tuple.
If strict conversion parameter for type is set to true, it is
handled like object argument in PyArg_ParseTupleAndKeywords and
converted to concrete type with the appropriate pyopencv_to function
call.
- Remove deadcode and unused variables.
- Fix implicit conversion from numpy array with 1 element to scalar
- Fix narrowing conversion to size_t type.
* Fix wrong conversion behavior for primitive types
- Introduce ArgTypeInfo namedtuple instead of plain tuple.
If strict conversion parameter for type is set to true, it is
handled like object argument in PyArg_ParseTupleAndKeywords and
converted to concrete type with the appropriate pyopencv_to function
call.
- Remove deadcode and unused variables.
- Fix implicit conversion from numpy array with 1 element to scalar
- Fix narrowing conversion to size_t type.·
- Enable tests with wrong conversion behavior
- Restrict passing None as value
- Restrict bool to integer/floating types conversion
* Add PyIntType support for Python 2
* Remove possible narrowing conversion of size_t
* Bindings conversion update
- Remove unused macro
- Add better conversion for types to numpy types descriptors
- Add argument name to fail messages
- NoneType treated as a valid argument. Better handling will be added
as a standalone patch
* Add descriptor specialization for size_t
* Add check for signed to unsigned integer conversion safety
- If signed integer is positive it can be safely converted
to unsigned
- Add check for plain python 2 objects
- Add check for numpy scalars
- Add simple type_traits implementation for better code style
* Resolve type "overflow" false negative in safe casting check
- Move type_traits to separate header
* Add copyright message to type_traits.hpp
* Limit conversion scope for integral numpy types
- Made canBeSafelyCasted specialized only for size_t, so
type_traits header became unused and was removed.
- Added clarification about descriptor pointer
Add lightweight IE hardware targets checks
nGraph: Concat with paddings
Enable more nGraph tests
Restore FP32->FP16 for GPU plugin of IE
try to fix buildbot
Use lightweight IE targets check only starts from R4
- some of `icvCvt_BGR*` functions have R with B channels
swapped what leads to the wrong conversion
- renames misleading `rgb` variable name to `bgr`
- swap back the conversion coefficients, `cB` should be the first
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Actually, we can do this in constant time. xofs always
contains same or increasing offset values. We can instead
find the most extreme value used and never attempt to load it.
Similarly, we can note for all dx >= 0 and dx < (dwidth - cn)
where xofs[dx] + cn < xofs[dwidth-cn] implies dx < (dwidth - cn).
Thus, we can use this to control our loop termination optimally.
This fixes#16137 with little or no performance impact. I have
also added a debug check as a sanity check.
* Handle det == 0 in findCircle3pts.
Issue 16051 shows a case where findCircle3pts returns NaN for the
center coordinates and radius due to dividing by a determinant of 0. In
this case, the points are colinear, so the longest distance between any
2 points is the diameter of the minimum enclosing circle.
* imgproc(test): update test checks for minEnclosingCircle()
* imgproc: fix handling of special cases in minEnclosingCircle()
* Eltwise::DIV support in Halide backend
* fix typo
* remove div from generated test suite to pass CI, switching to manual test...
* ensure divisor not near to zero
* use randu
* dnn(test): update test data for Eltwise.Accuracy/DIV layer test
Add checks for empty operands in Matrix expressions that don't check properly
* Starting to add checks for empty operands in Matrix expressions that
don't check properly.
* Adding checks and delcarations for checker functions
* Fix signatures and add checks for each class of Matrix Expr operation
* Make it catch the right exception
* Don't expose helper functions to public API
* imgproc: Prevent 1B overrun of 8C3 SIMD optimization
The fourth value read via v_load_q is essentially ignored,
but can cause trouble if it happens to cross page boundaries.
The final few iterations may attempt to read the most extreme
elements of S, which will read 1B beyond the array in most
aligment cases. Dynamically compute the stop. This could be
hoised from the loop, but will require a more extensive change.
Likewise, cleanup the iteration increment statements to make
it more obvious they do channel count (3) elements per pass.
This should resolve#16137
* imgproc(resize): extra check
dnn(eltwise): fix handling of different number of channels
* dnn(test): reproducer for Eltwise layer issue from PR16063
* dnn(eltwise): rework support for inputs with different channels
* dnn(eltwise): get rid of finalize(), variableChannels
* dnn(eltwise): update input sorting by number of channels
- do not swap inputs if number of channels are same after truncation
* dnn(test): skip "shortcut" with batch size 2 on MYRIAD targets
* resize: HResizeLinear reduce duplicate work
There appears to be a 2x unroll of the HResizeLinear against k,
however the k value is only incremented by 1 during the unroll. This
results in k - 1 duplicate passes when k > 1.
Likewise, the final pass may not respect the work done by the vector
loop. Start it with the offset returned by the vector op if
implemented. Note, no vector ops are implemented today.
The performance is most noticable on a linear downscale. A set of
performance tests are added to characterize this. The performance
improvement is 10-50% depending on the scaling.
* imgproc: vectorize HResizeLinear
Performance is mostly gated by the gather operations
for x inputs.
Likewise, provide a 2x unroll against k, this reduces the
number of alpha gathers by 1/2 for larger k.
While not a 4x improvement, it still performs substantially
better under P9 for a 1.4x improvement. P8 baseline is
1.05-1.10x due to reduced VSX instruction set.
For float types, this results in a more modest
1.2x improvement.
* Update U8 processing for non-bitexact linear resize
* core: hal: vsx: improve v_load_expand_q
With a little help, we can do this quickly without gprs on
all VSX enabled targets.
* resize: Fix cn == 3 step per feedback
Per feedback, ensure we don't overrun. This was caught via the
failure observed in Test_TensorFlow.inception_accuracy.
Test create custom layer in python
* check is contiguos
* Add custom layer test
* Fix test
* Remove assert
* Move assert to pyopencv dnn
* remove assert
* Add unregister
* Fix python2
* proto to bytearray
* Fix data type
* calib3d: use normalized input in solvePnPGeneric()
* calib3d: java regression test for solvePnPGeneric
* calib3d: python regression test for solvePnPGeneric
* core: disable invalid constructors in C API by default
- C API objects will lose their default initializers through constructors
* samples: stop using of C API
(1/4) Revert "Correct image borders and principal point computation in cv::stereoRectify"
This reverts commit 93ff1fb2f2.
(2/4) Revert "fix calib3d changes in 6836 plus some others"
This reverts commit fa42a1cfc2.
(3/4) Revert "fix compiler warning"
This reverts commit b3d55489d3.
(4/4) Revert "add test for 6836"
This reverts commit d06b8c4ea9.
Tests for argument conversion of Python bindings generator
* Tests for parsing elemental types from Python bindings
- Add positive and negative tests for int, float, double, size_t,
const char*, bool.
- Tests with wrong conversion behavior are skipped.
* Move implicit conversion of bool to integer/floating types to wrong
conversion behavior.
* Fix infinite loop when trying to change state of the busy camera
- Add finite number of attempts in tryIoctl functions
10 by default.
* Introduced new flag for ioctl call to handle EBUSY
Improving VSX performance of integral function
* Adding support for vector get function on VSX datatypes so the
integral function gains a bit of performance.
* Removing get as a datatype member function and implementing a new HAL
instruction v_extract_n to get the n-th element of a vector register.
* Adding SSE/NEON/AVX intrinsics.
* Implement new HAL instruction v_broadcast_element on VSX/AVX/NEON/SSE.
* core(simd): add tests for v_extract_n/v_broadcast_element
- updated docs
- commented out code to repair compilation
- added WASM and MSA default implementations
* core(simd): fix compilation
- x86: avoid _mm256_extract_epi64/32/16/8 with MSVS 2015
- x86: _mm_extract_epi64 is 64-bit only
* cleanup
Supported ONNX Squeeze, ReduceL2 and Eltwise::DIV
* Support eltwise div
* Fix test
* OpenCL support added
* refactoring
* fix code style
* Only squeeze with axes supported
* Convert moments in tile algorithms to HAL (1.3x faster for VSX).
* Adding NEON code back in for non 64-bit platforms.
* Remove floats from post processing.
Clarify stereoRectify() doc
The function stereoRectify() takes as input a coordinate transform between two cameras. It is ambiguous how it goes. I clarified that it goes from the second camera to the first.
* Use FlsAlloc/FlsFree/FlsGetValue/FlsSetValue instead of TlsAlloc/TlsFree/TlsGetValue/TlsSetValue to implment TLS value cleanup when thread has been terminated on Windows Vista and above
* Fix 32-bit build
* Fixed calling convention of cleanup callback
* WINAPI changed to NTAPI
* Use proper guard macro
* Vectorize flipHoriz and flipVert functions.
* Change v_load_mirror_1 to use vec_revb for VSX
* Only use vec_revb in ISA3.0
* Removing vec_revb code since some of the older compilers don't fully support it.
* Use new v_reverse intrinsic and cleanup code.
* Ensure there are no alignment issues with copies
Build DoG Pyramid if useProvideKeypoints is false
The buildDoGPyramid operation need not be performed unconditionally. In cases where it is not needed, both memory and speed performance can be improved
original commit: e45887e1c0
* Doc bugfix
The documentation page StereoBinaryBM and StereoBinarySGBM says that it returns a disparity that is scaled multiplied by 16. This scaling must be undone before calling reprojectImageTo3D, otherwise the results are wrong. The function reprojectImageTo3D() could do this scaling internally, maybe, but at least the documentation must explain that this has to be done.
* calib3d: update reprojectImageTo3D documentation
* calib3d: add StereoBM/StereoSGBM into notes list
- move TLS & instrumentation code out of core/utility.hpp
- (*) TLSData lost .gather() method (to dispose thread data on thread termination)
- use TLSDataAccumulator for reliable collecting of thread data
- prefer using of .detachData() + .cleanupDetachedData() instead of .gather() method
(*) API is broken: replace TLSData => TLSDataAccumulator if gather required
(objects disposal on threads termination is not available in accumulator mode)
Fixing bug with comparison of v_int64x2 or v_uint64x2
* Casting v_uint64x2 to v_float64x2 and comparing does NOT work in all cases. Rewrite using epi64 instructions - faster too.
* Fix bad merge.
* Fix equal comparsion for non-SSE4.1. Add test cases for v_int64x2 comparisons.
* Try to fix merge conflict.
* Only test v_int64x2 comparisons if CV_SIMD_64F
* Fix compiler warning.
* New v_reverse HAL intrinsic for reversing the ordering of a vector
* Fix conflict.
* Try to resolve conflict again.
* Try one more time.
* Add _MM_SHUFFLE. Remove non-vectorize code in SSE2. Fix copy and paste issue with NEON.
* Change v_uint16x8 SSE2 version to use shuffles
* Adding support for vectorized masking for uchar/ushort.
* Fixing bug where mask was zeroing the dst. Improved the way to calculate
the mask and tweaked for further performance improvements.
* Fixing mask comparison test.
* Restricting to one channel.
* Adding support for 3 channels, switch old approach to start using HAL's
v_select.
* Cuda + OpenGL on ARM
There might be multiple ways of getting OpenCV compile on Tegra (NVIDIA Jetson) platform, but mainly they modify CUDA(8,9,10...) source code, this one fixes it for all installations.
( https://devtalk.nvidia.com/default/topic/1007290/jetson-tx2/building-opencv-with-opengl-support-/post/5141945/#5141945 et al.).
This way is exactly the same as the one proposed but the code change happens in OpenCV.
* Updated,
The link provided mentions: cuda8 + 9, I have cuda 10 + 10.1 (and can confirm it is still defined this way).
NVIDIA is probably using some other "secret" backend with Jetson.
* core: rework and optimize SIMD implementation of dotProd
- add new universal intrinsics v_dotprod[int32], v_dotprod_expand[u&int8, u&int16, int32], v_cvt_f64(int64)
- add a boolean param for all v_dotprod&_expand intrinsics that change the behavior of addition order between
pairs in some platforms in order to reach the maximum optimization when the sum among all lanes is what only matters
- fix clang build on ppc64le
- support wide universal intrinsics for dotProd_32s
- remove raw SIMD and activate universal intrinsics for dotProd_8
- implement SIMD optimization for dotProd_s16&u16
- extend performance test data types of dotprod
- fix GCC VSX workaround of vec_mule and vec_mulo (in little-endian it must be swapped)
- optimize v_mul_expand(int32) on VSX
* core: remove boolean param from v_dotprod&_expand and implement v_dotprod_fast&v_dotprod_expand_fast
this changes made depend on "terfendail" review
- renamed Cascade Lake AVX512_CEL => AVX512_CLX (align with Intel SDE tool)
- fixed CLX instruction sets (no IFMA/VBMI)
- added flag to bypass CPU baseline check: OPENCV_SKIP_CPU_BASELINE_CHECK
[GSoC 2019] Improve the performance of JavaScript version of OpenCV (OpenCV.js)
* [GSoC 2019]
Improve the performance of JavaScript version of OpenCV (OpenCV.js):
1. Create the base of OpenCV.js performance test:
This perf test is based on benchmark.js(https://benchmarkjs.com). And first add `cvtColor`, `Resize`, `Threshold` into it.
2. Optimize the OpenCV.js performance by WASM threads:
This optimization is based on Web Worker API and SharedArrayBuffer, so it can be only used in browser.
3. Optimize the OpenCV.js performance by WASM SIMD:
Add WASM SIMD backend for OpenCV Universal Intrinsics. It's experimental as WASM SIMD is still in development.
* [GSoC2019]
1. use short license header
2. fix documentation node issue
3. remove the unused `hasSIMD128()` api
* [GSoC2019]
1. fix emscripten define
2. use fallback function for f16
* [GSoC2019]
Fix rebase issue
* Added MSA implementations for mips platforms. Intrinsics for MSA and build scripts for MIPS platforms are added.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* Removed some unused code in mips.toolchain.cmake.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* Added comments for mips toolchain configuration and disabled compiling warnings for libpng.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* Fixed the build error of unsupported opcode 'pause' when mips isa_rev is less than 2.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* 1. Removed FP16 related item in MSA option defines in OpenCVCompilerOptimizations.cmake.
2. Use CV_CPU_COMPILE_MSA instead of __mips_msa for MSA feature check in cv_cpu_dispatch.h.
3. Removed hasSIMD128() in intrin_msa.hpp.
4. Define CPU_MSA as 150.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* 1. Removed unnecessary CV_SIMD128_64F guarding in intrin_msa.hpp.
2. Removed unnecessary CV_MSA related code block in dotProd_8u().
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* 1. Defined CPU_MSA_FLAGS_ON as "-mmsa".
2. Removed CV_SIMD128_64F guardings in intrin_msa.hpp.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* Removed unused msa_mlal_u16() and msa_mlal_s16 from msa_macros.h.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
* issue 5769 fixed: cv::stereoRectify fails if given inliers mask of type vector<uchar>
* issue5769 fix using reshape and add regression test
* regression test with outlier detection, testing vector and mat data
* Size comparision of wrong vector within CV_Assert in regression test corrected
* cleanup test code
ISA 2.07 (aka POWER8) effectively extended the expanding multiply
operation to word types. The altivec intrinsics prior to gcc 8 did
not get the update.
Workaround this deficiency similar to other fixes.
This was exposed by commit 33fb253a66
which leverages the int -> dword expanding multiply.
This fixes Issue #15506
* Adding all possible data type interactions to the perf tests since some
use SIMD acceleration and others do not.
* Disabling full tests by default.
* Giving proper names, removing magic numbers and sanity checks of new
performance tests for the integral function.
* Giving proper names, making array static.
* Convert ImgWarp from SSE SIMD to HAL - 2.8x faster on Power (VSX) and 15% speedup on x86
* Change compile flag from CV_SIMD128 to CV_SIMD128_64F for use of v_float64x2 type
* Changing WarpPerspectiveLine from class functions and dispatching to static functions.
* Re-add dynamic runtime and dispatch execution.
* RRestore SSE4_1 optimizations inside opt_SSE4_1 namespace
* Convert lkpyramid from SSE SIMD to HAL - 90% faster on Power (VSX).
* Replace stores with reduce_sum. Rework to handle endianess correctly.
* Fix compiler warnings by casting values explicitly to shorts
* Switch to CV_SIMD128 compiler definition. Unroll loop to 8 elements since we've already loaded the data.
Use 4x FMA chains to sum on SIMD 128 FP64 targets. On
x86 this showed about 1.4x improvement.
For PPC, do a full multiply (32x32->64b), convert to DP
then accumulate. This may be slightly less precise for
some inputs. But is 1.5x faster than the above which
is about 1.5x than the FMA above for ~2.5x speedup.
Convert HOG from SSE SIMD to HAL - 35-45% faster on Power (VSX) (#15199)
* Convert SSE SIMD to HAL. 35-45% improvement for Power (VSX)
* Remove CV_NEON code. Use v_floor instead of 3 lines of code.
* Invert comparison logic to simplify code.
* Change initialization from v_load to constructor type.
* Remove unavoidable print of CV error
The return value covers whether the device exists.
This might be better hidden behind a debug flag, but I couldn't work out how to do that nicely.
* Use `CV_LOG_WARNING` macro to log rather than removing it entirely
* add -Wno-psabi when using GCC 6
* add -Wundef for CUDA 10
* add -Wdeprecated-declarations when using GCC 7
* add -Wstrict-aliasing and -Wtautological-compare for GCC 7
* replace cudaThreadSynchronize with cudaDeviceSynchronize
Implement cvRound using inline asm. No compiler support
exists today to properly optimize this. This results in
about a 4x speedup over the default rounding. Likewise,
simplify the growing number of rounding function overloads.
For P9 enabled targets, utilize the classification
testing instruction to test for Inf/Nan values. Operation
speedup is about 1.2x for FP32, and 1.5x for FP64 operands.
For P8 targets, fallback to the GCC nan inline. It provides
a 1.1/1.4x improvement for FP32/FP64 arguments.
Add a new macro definition OPENCV_USE_FASTMATH_GCC_BUILTINS to enable
usage of GCC inline math functions, if available and requested by the
user.
Likewise, enable it for POWER. This is nearly always a substantial
improvement over using integer manipulation as most operations can
be done in several instructions with no branching. The result is a
1.5-1.8x speedup in the ceil/floor operations.
1. As tested with AT 12.0-1 (GCC 8.3.1) compiler on P9 LE.
Add a basic sanity test to verify the rounding functions
work as expected.
Likewise, extend the rounding performance test to cover the
additional float -> int fast math functions.
Support new IE API (#15184)
* Add support OpenVINO R2 for layers
* Add Core API
* Fix tests
* Fix expectNoFallbacksFromIE for ONNX nets
* Remove deprecated API
* Remove td
* Remove TargetDevice
* Fix Async
* Add test
* Fix detectMyriadX
* Fix test
* Fix warning
* Support for several min and max sizes in PriorBox layer
* Fix minSize
* Check size
* Modify initInfEngine
* Fix tests
* Fix IE support
* Add priorbox test
* Remove inputs