- initialize arithmetic dispatcher
- add new universal intrinsic v_absdiffs
- add new universal intrinsic v_pack_b
- add accumulate version of universal intrinsic v_round
- fix sse/avx2:uint8 multiplication overflow
- reimplement arithmetic, logic and comparison operations into wide universal intrinsics
with full support for all types
- reimplement IPP arithmetic, logic and comparison operations in a sperate file arithm_ipp.hpp
- avoid scalar multiplication if scaling factor eq 1 and use integer multiplication
- move C arithmetic operations to precomp.hpp and delete [arithm_simd|arithm_core].hpp
- add compatibility with new opencv4 divide policy
Exceptions caught by value incur needless cost in C++, most of them can
be caught by const-reference, especially as nearly none are actually
used. This could allow compiler generate a slightly more efficient code.
* Remove isIntel check from deep learning layers
* Remove fp16->fp32 fallbacks where it's not necessary
* Fix Kernel::run to prevent localsize > globalsize
* rewrote Mat::convertTo() and convertScaleAbs() to wide universal intrinsics; added always-available and SIMD-optimized FP16<=>FP32 conversion
* fixed compile warnings
* fix some more compile errors
* slightly relaxed accuracy threshold for int->float conversion (since we now do it using single-precision arithmetics, not double-precision)
* fixed compile errors on iOS, Android and in the baseline C++ version (intrin_cpp.hpp)
* trying to fix ARM-neon builds
* trying to fix ARM-neon builds
* trying to fix ARM-neon builds
* trying to fix ARM-neon builds
Intrinsics must be effective, so don't declare FP16 type/operations if there is no native support.
- CV_FP16: supports load/store into/from float32
- CV_SIMD_FP16: declares FP16 types and native FP16 operations
* core:OE-27 prepare universal intrinsics to expand (#11022)
* core:OE-27 prepare universal intrinsics to expand (#11022)
* core: Add universal intrinsics for AVX2
* updated implementation of wide univ. intrinsics; converted several OpenCV HAL functions: sqrt, invsqrt, magnitude, phase, exp to the wide universal intrinsics.
* converted log to universal intrinsics; cleaned up the code a bit; added v_lut_deinterleave intrinsics.
* core: Add universal intrinsics for AVX2
* fixed multiple compile errors
* fixed many more compile errors and hopefully some test failures
* fixed some more compile errors
* temporarily disabled IPP to debug exp & log; hopefully fixed Doxygen complains
* fixed some more compile errors
* fixed v_store(short*, v_float16&) signatures
* trying to fix the test failures on Linux
* fixed some issues found by alalek
* restored IPP optimization after the patch with AVX wide intrinsics has been properly tested
* restored IPP optimization after the patch with AVX wide intrinsics has been properly tested
- 'if' logic is moved into templates.
- removed unnecessary cv::Mat objects creation.
- fixed inv() test (invA * A == eye)
- added more Matx tests to cover all defined template specializations