Add new 5x5 gaussian blur kernel for CV_8UC1 format,
it is 50% ~ 70% faster than current ocl kernel in the perf test.
Signed-off-by: Li Peng <peng.li@intel.com>
Maximum depth limit var was added to the instrumentation structure;
Trace names output console output fix: improper tree formatting could happen;
Output in case of error was added;
Custom regions improvements;
Improved timing and weight calculation for parallel regions; New TC (threads counter) value to indicate how many different threads accessed particular node;
parallel_for, warnings fixes and ReturnAddress code from Alexander Alekhin;
This ocl kernel is for 3x3 kernel size and CV_8UC1 format
It is 115% ~ 300% faster than current ocl path in perf test
python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_GaussianBlurFixture*
Signed-off-by: Li Peng <peng.li@intel.com>
The optimization is for CV_8UC1 format and 3x3 box filter,
it is 15%~87% faster than current ocl kernel with below perf test
./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_BlurFixture*
Also add test cases for this ocl kernel.
Signed-off-by: Li Peng <peng.li@intel.com>
* seriously improved performance of blur function, especially 3x3 and 5x5 cases
* trying to fix warnings and test failures
* replaced #if 0 with #if IPP_DISABLE_BLOCK
dotProd_16s - disabled for IPP 9.0.0;
filter2D - fixed kernel preparation;
morphology - conditions fix and disabled FilterMin and FilterMax for IPP 9.0.0;
GaussianBlur - disabled for CV_8UC1 due to buffer overflow;
integral - disabled for IPP 9.0.0;
IppAutoBuffer class was added;
IPP_VERSION_MAJOR * 100 + IPP_VERSION_MINOR*10 + IPP_VERSION_UPDATE
to manage changes between updates more easily.
IPP_DISABLE_BLOCK was added to ease tracking of disabled IPP functions;
Removed IPP port for tiny arithm.cpp functions
Additional warnings fix on various platforms.
Build without OPENCL and GCC warnings fixed
Fixed warnings, trailing spaces and removed unused secure_cpy.
IPP code refactored.
IPP code path implemented as separate static functions to simplify future work with IPP code and make it more readable.
Conflicts:
modules/gpu/perf/perf_imgproc.cpp
Cast a long integer to double explicitly.
Conflicts:
modules/python/src2/cv2.cpp
Cast some matrix sizes to type int.
Change some vector mask types to unsigned.
Conflicts:
modules/core/src/arithm.cpp
IPP can be switched on and off on runtime;
Optional implementation collector was added (switched off by default in CMake). Gathers data of implementation used in functions and report this info through performance TS;
TS modifications for implementations control;