* use v_float16x4 (universal intrinsic) instead of the raw SSE/NEON implementation
* define v_load_f16/v_store_f16, since a plain v_load call cannot be distinguished (FP16 vs. int16) when a short pointer is passed
* clean up the implementation for old compilers (guard correctly)
* add tests for v_load_f16 and round-trip conversion of v_float16x4
* fix conversion error
* raise an error when a wrong bit depth is passed
* raise a build error when a wrong depth is specified for cvtScaleHalf_
* remove an unnecessary safety check in cvtScaleHalf_
* use intrinsics instead of direct pointer access
* update the explanation
* check the compiler more strictly
* use the GCC version of the fp16 conversion when possible (GCC 4.7 and later)
* use the current SW implementation in other cases
* check compiler support
* check HW support before executing
* add a test doing round-trip conversion from/to FP32
* handle arrays correctly when the size is not a multiple of 4
* add a declaration to prevent a warning
* make it possible to enable fp16 on 32-bit ARM
* make the conversion possible on unsupported HW, too
* add a test using both the HW and SW implementations
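The round-trip and size-not-a-multiple-of-4 items above can be illustrated with a portable scalar sketch. It is not the OpenCV code: floatToHalf, halfToFloat, convertFp32ToFp16 and convertFp16ToFp32 are hypothetical names, the conversion is a plain software version of IEEE 754 binary16 with round-to-nearest-even, and the inner four-element loop only stands in for one v_float16x4 operation.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical SW conversion FP32 -> FP16 (round to nearest even). Not OpenCV's code.
inline std::uint16_t floatToHalf(float f) {
    std::uint32_t x;
    std::memcpy(&x, &f, sizeof x);
    const std::uint32_t sign = (x >> 16) & 0x8000u;
    const std::uint32_t absx = x & 0x7FFFFFFFu;

    if (absx >= 0x7F800000u)  // Inf or NaN (NaN keeps a quiet bit set)
        return static_cast<std::uint16_t>(sign | 0x7C00u | (absx > 0x7F800000u ? 0x0200u : 0u));
    if (absx >= 0x477FF000u)  // >= 65520 rounds past 65504: overflow to Inf
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    if (absx >= 0x38800000u) {  // normal FP16 range (>= 2^-14)
        std::uint32_t v = absx - 0x38000000u;  // rebias exponent 127 -> 15
        v += 0x0FFFu + ((v >> 13) & 1u);       // round to nearest even on the 13 dropped bits
        return static_cast<std::uint16_t>(sign | (v >> 13));
    }
    // Subnormal FP16: result is q * 2^-24.
    const std::uint32_t e = absx >> 23;        // biased FP32 exponent
    if (e < 102u)                              // below 2^-25: rounds to signed zero
        return static_cast<std::uint16_t>(sign);
    const std::uint32_t m = (absx & 0x007FFFFFu) | 0x00800000u;  // add implicit leading 1
    const std::uint32_t shift = 126u - e;      // 14..24
    std::uint32_t q = m >> shift;
    const std::uint32_t rem = m & ((1u << shift) - 1u);
    const std::uint32_t halfway = 1u << (shift - 1u);
    if (rem > halfway || (rem == halfway && (q & 1u)))
        ++q;
    return static_cast<std::uint16_t>(sign | q);
}

// Hypothetical SW conversion FP16 -> FP32 (always exact).
inline float halfToFloat(std::uint16_t h) {
    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    const std::uint32_t exp = (h >> 10) & 0x1Fu;
    const std::uint32_t mant = h & 0x03FFu;
    float out;
    if (exp == 0u) {  // zero or subnormal: value is mant * 2^-24
        out = std::ldexp(static_cast<float>(mant), -24);
        return sign != 0u ? -out : out;
    }
    const std::uint32_t bits = (exp == 0x1Fu)
        ? (sign | 0x7F800000u | (mant << 13))            // Inf / NaN
        : (sign | ((exp + 112u) << 23) | (mant << 13));  // rebias 15 -> 127
    std::memcpy(&out, &bits, sizeof out);
    return out;
}

// Converts in groups of 4 (the lane count of v_float16x4), then handles the
// n % 4 leftover elements one by one so nothing is read or written past the end.
inline void convertFp32ToFp16(const float* src, std::uint16_t* dst, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (std::size_t k = 0; k < 4; ++k)  // stands in for one 4-lane vector op
            dst[i + k] = floatToHalf(src[i + k]);
    for (; i < n; ++i)                       // scalar tail
        dst[i] = floatToHalf(src[i]);
}

inline void convertFp16ToFp32(const std::uint16_t* src, float* dst, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (std::size_t k = 0; k < 4; ++k)
            dst[i + k] = halfToFloat(src[i + k]);
    for (; i < n; ++i)
        dst[i] = halfToFloat(src[i]);
}
```

Values exactly representable in FP16 (0.5, -2, 65504, 2^-24) survive an FP32 -> FP16 -> FP32 round trip unchanged, while a value such as 1e6 overflows to +Inf.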
- added new functions from the core module: split, merge, add, sub, mul, div, ...
- added a function replacement mechanism
- added an example of a HAL replacement library
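The replacement mechanism can be sketched, under assumptions, as a macro that defaults to a "not implemented" stub plus a caller that falls back to its own loop when the stub declines. All names here (hal_add8u, hal_ni_add8u, HalStatus, add8u) are hypothetical and only model the idea; they are not the HAL's actual interface.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical status codes for a replaceable function.
enum HalStatus { HAL_OK = 0, HAL_NOT_IMPLEMENTED = 1 };

// Default stub: tells the caller that no replacement is available.
inline int hal_ni_add8u(const std::uint8_t*, const std::uint8_t*, std::uint8_t*, std::size_t) {
    return HAL_NOT_IMPLEMENTED;
}

// A replacement library would redefine this macro in its own header, e.g.
//   #undef  hal_add8u
//   #define hal_add8u my_fast_add8u
#define hal_add8u hal_ni_add8u

// Saturating 8-bit add: try the (possibly replaced) function first, otherwise
// run the built-in loop.
inline void add8u(const std::uint8_t* a, const std::uint8_t* b, std::uint8_t* dst, std::size_t n) {
    if (hal_add8u(a, b, dst, n) == HAL_OK)
        return;  // the replacement handled the call
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned s = unsigned(a[i]) + unsigned(b[i]);
        dst[i] = static_cast<std::uint8_t>(s > 255u ? 255u : s);
    }
}
```

Returning a status instead of requiring every function to be implemented lets a replacement library cover only the cases it accelerates.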
IPP_VERSION_MAJOR * 100 + IPP_VERSION_MINOR * 10 + IPP_VERSION_UPDATE
to manage changes between updates more easily.
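As a worked example of the combined version number, here is a sketch using a hypothetical constexpr helper (ippVersionX100) in place of the preprocessor expression; the version values are assumptions. The encoding stays ordered only while the minor and update numbers are below 10.

```cpp
// Hypothetical helper mirroring MAJOR * 100 + MINOR * 10 + UPDATE.
constexpr int ippVersionX100(int major, int minor, int update) {
    return major * 100 + minor * 10 + update;
}

// Assumed example versions: 9.0.1 encodes as 901, one above 9.0.0, so a
// check can now tell two updates of the same release apart.
static_assert(ippVersionX100(9, 0, 1) == 901, "9.0.1 -> 901");
static_assert(ippVersionX100(9, 0, 1) > ippVersionX100(9, 0, 0), "updates are ordered");
```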
IPP_DISABLE_BLOCK was added to ease tracking of disabled IPP functions;
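A minimal sketch of how such a marker can be used, assuming it is defined to 0: guarded code is compiled out, yet a single search for the macro name lists every disabled IPP call site. The function sumOfSquares is hypothetical, and the guarded section only stands in for a disabled IPP call.

```cpp
// Assumed definition: 0 keeps every block guarded by this macro out of the build.
#define IPP_DISABLE_BLOCK 0

// Hypothetical function with a disabled accelerated path.
inline int sumOfSquares(const int* v, int n) {
#if IPP_DISABLE_BLOCK
    // An IPP primitive call would sit here, kept for reference but not compiled.
#endif
    int acc = 0;
    for (int i = 0; i < n; ++i)
        acc += v[i] * v[i];  // plain C++ path, always taken while the block is disabled
    return acc;
}
```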
Removed IPP port for tiny arithm.cpp functions
Additional warning fixes on various platforms.
Fixed the build without OPENCL and fixed GCC warnings
Fixed warnings, trailing spaces and removed unused secure_cpy.
IPP code refactored.
IPP code paths were implemented as separate static functions to simplify future work with the IPP code and make it more readable.
IPP can be switched on and off at runtime;
An optional implementation collector was added (switched off by default in CMake). It gathers data on the implementations used in functions and reports this info through the performance TS;
TS modifications for implementation control;
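The structure described above (accelerated path in its own static function, a runtime on/off switch, and a fallback) can be sketched as follows. In OpenCV 3.x the real switch is exposed as cv::ipp::setUseIPP() / cv::ipp::useIPP(); this sketch does not call them. setUseFastPath, useFastPath, sum_fast, sumFloats and the g_fastPathCalls counter (a stand-in for the implementation collector) are all hypothetical.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical runtime switch for the accelerated path.
static std::atomic<bool> g_useFastPath{true};
inline void setUseFastPath(bool on) { g_useFastPath.store(on); }
inline bool useFastPath() { return g_useFastPath.load(); }

// Hypothetical counter standing in for the optional implementation collector.
static int g_fastPathCalls = 0;

// Accelerated path isolated in a static function. It returns false when it
// cannot handle the input so the caller falls back; here it only mimics an
// IPP call and declines empty input.
static bool sum_fast(const float* src, std::size_t n, float& out) {
    if (n == 0)
        return false;
    float acc = 0.f;
    for (std::size_t i = 0; i < n; ++i)
        acc += src[i];
    out = acc;
    ++g_fastPathCalls;  // record which implementation ran
    return true;
}

inline float sumFloats(const float* src, std::size_t n) {
    float result = 0.f;
    if (useFastPath() && sum_fast(src, n, result))
        return result;
    for (std::size_t i = 0; i < n; ++i)  // reference path
        result += src[i];
    return result;
}
```

Keeping the accelerated code in a function that can decline keeps the public function readable and lets the switch or the input decide the path without duplicating the fallback.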