* Convert ImgWarp from SSE SIMD to HAL - 2.8x faster on Power (VSX) and 15% speedup on x86
* Change compile flag from CV_SIMD128 to CV_SIMD128_64F for use of v_float64x2 type
* Changing WarpPerspectiveLine from class functions and dispatching to static functions.
* Re-add dynamic runtime and dispatch execution.
* RRestore SSE4_1 optimizations inside opt_SSE4_1 namespace
* Rewrite polar transformations
- A new wrapPolar function encapsulate both linear and semi-log remap
- Destination size is a parameter or calculated automatically to keep objects size between remapping
- linearPolar and logPolar has been deprecated
* Fix build warning and error in accuracy test
* Fix function name to warpPolar
* Explicitly specify the mapping mode, so we retain all the parameters as non-optional.
Introduces WarpPolarMode enum to specify the mapping mode in flags
* resolves performance warning on windows build
* removed duplicated logPolar and linearPolar implementations
* use universal intrinsic instead of raw intrinsic
* add 2 channels de-interleave on x86 platform
* add v_int32x4 version of v_muladd
* add accumulate version of v_dotprod based on the commit from seiko2plus on bf1852d
* remove some verify check in performance test
* avoid the out of boundary access and keep the performance
- Optimizations set change. Now IPP integrations will provide code for SSE42, AVX2 and AVX512 (SKX) CPUs only. For HW below SSE42 IPP code is disabled.
- Performance regressions fixes for IPP code paths;
- cv::boxFilter integration improvement;
- cv::filter2D integration improvement;
Added assertios to remap and warpAffine functions
As @mshabunin said, remap and warpAffine functions do not support more than 4 channels in
Bicubic and Lanczos4 interpolation modes. Assertions were added. Appropriate test was chenged.
resolves#8272
Warping a matrix with more than 4 channels using BORDER_CONSTANT and
INTER_NEAREST, INTER_CUBIC or INTER_LANCZOS4 interpolation led to
undefined behaviour. This commit changes the behavior of these methods
to be similar to that of INTER_LINEAR. Changed the scope of some of the
variables to more local. Modified some tests to be able to detect the
error described.
- don't use undefined flag=0. It should be CONSTANT instead.
- don't allow 'UMat* m=NULL' argument (except LOCAL/CONSTANT flags).
This case is not handled well to provide NULL __global pointers.
It is better to use '-D' macro defines instead (at least for performance)
Add new OpenCL kernels for bicubic interploation, it is 20% faster
than current warp image kernel with bicubic interploation.
Signed-off-by: Li Peng <peng.li@intel.com>
Add new ocl kernels for warpAffine and warpPerspective,
The average performance improvemnt is about 30%. The new
ocl kernels require CV_8UC1 format and support nearest
neighbor and bilinear interpolation.
Signed-off-by: Li Peng <peng.li@intel.com>
Maximum depth limit var was added to the instrumentation structure;
Trace names output console output fix: improper tree formatting could happen;
Output in case of error was added;
Custom regions improvements;
Improved timing and weight calculation for parallel regions; New TC (threads counter) value to indicate how many different threads accessed particular node;
parallel_for, warnings fixes and ReturnAddress code from Alexander Alekhin;