opencv

mirror of https://github.com/opencv/opencv.git synced 2024-11-30 06:10:02 +08:00

Author	SHA1	Message	Date
Vitaly Tuzov	3b015dfc7d	Merge pull request #14210 from terfendail:wui_512 AVX512 wide universal intrinsics (#14210) * Added implementation of 512-bit wide universal intrinsics(WIP) * Added implementation of 512-bit wide universal intrinsics: implemented WUI vector types(WIP) * Added implementation of 512-bit wide universal intrinsics(WIP): implemented load/store * Added implementation of 512-bit wide universal intrinsics(WIP): implemented fp16 load/store * Added implementation of 512-bit wide universal intrinsics(WIP): implemented recombine and zip, implemented non-saturating and saturating arithmetics * Added implementation of 512-bit wide universal intrinsics(WIP): implemented bit operations * Added implementation of 512-bit wide universal intrinsics(WIP): implemented comparisons * Added implementation of 512-bit wide universal intrinsics(WIP): implemented lane shifts and reduction * Added implementation of 512-bit wide universal intrinsics(WIP): implemented absolute values * Added implementation of 512-bit wide universal intrinsics(WIP): implemented rounding and cast to float * Added implementation of 512-bit wide universal intrinsics(WIP): implemented LUT * Added implementation of 512-bit wide universal intrinsics(WIP): implemented type extension/narrowing and matrix operations * Added implementation of 512-bit wide universal intrinsics(WIP): implemented load_deinterleave for 2 and 3 channels images * Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented load_deinterleave for 2- and implemented for 4-channel images * Added implementation of 512-bit wide universal intrinsics(WIP): implemented store_interleave * Added implementation of 512-bit wide universal intrinsics(WIP): implemented signmask and checks * Added implementation of 512-bit wide universal intrinsics(WIP): build fixes * Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented popcount in case AVX512_BITALG is unavailable * Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented zip * Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented rotate for s8 and s16 * Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented interleave/deinterleave for s8 and s16 * Added implementation of 512-bit wide universal intrinsics(WIP): updated v512_set macros * Added implementation of 512-bit wide universal intrinsics(WIP): fix for GCC wrong _mm512_abs_pd definition * Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_zip to avoid AVX512_VBMI intrinsics * Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_invsqrt to avoid AVX512_ER intrinsics * Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_rotate, v_popcount and interleave/deinterleave for U8 to avoid AVX512_VBMI intrinsics * Added implementation of 512-bit wide universal intrinsics(WIP): fixed integral image SIMD part * Added implementation of 512-bit wide universal intrinsics(WIP): fixed warnings * Added implementation of 512-bit wide universal intrinsics(WIP): fixed load_deinterleave for u8 and u16 * Added implementation of 512-bit wide universal intrinsics(WIP): fixed v_invsqrt accuracy for f64 * Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave/deinterleave for u32 and u64 * Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave_pairs, interleave_quads and pack_triplets * Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left * Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left/right, part 2 * Added implementation of 512-bit wide universal intrinsics(WIP): fixed 512-wide universal intrinsics based resize * Added implementation of 512-bit wide universal intrinsics(WIP): fixed findContours by avoiding use of uint64 dependent 512-wide v_signmask() * Added implementation of 512-bit wide universal intrinsics(WIP): fixed trailing whitespaces * Added implementation of 512-bit wide universal intrinsics(WIP): reworked specific intrinsic sets dependent parts to check availability of intrinsics based on CPU feature group defines * Added implementation of 512-bit wide universal intrinsics(WIP):Updated AVX512 implementation of v_popcount to avoid AVX512VPOPCNTDQ intrinsics if unavailable. * Added implementation of 512-bit wide universal intrinsics(WIP): Fixed universal intrinsics data initialisation, v_mul_wrap, v_floor, v_ceil and v_signmask. * Added implementation of 512-bit wide universal intrinsics(WIP): Removed hasSIMD512() * Added implementation of 512-bit wide universal intrinsics(WIP): Fixes for gcc build * Added implementation of 512-bit wide universal intrinsics(WIP): Reworked v_signmask, v_check_any() and v_check_all() implementation.	2019-06-03 18:05:35 +03:00
Alexander Alekhin	48e8e76a34	fix build warnings	2018-09-27 16:31:31 +03:00
Alexander Alekhin	3f302cabb8	core(test): intrinsic tests for all dispatched CPU optimizations - tests for both SIMD128 / SIMD256 - different dispatched + baseline(SIMD128) intrinsics	2018-08-01 13:50:42 +03:00
Sayed Adel	6499263b41	core:test Expand hal_intrin tests to support SIMD256	2018-07-30 08:50:50 +02:00
Vadim Pisarevsky	f058b5fb1e	Wide univ intrinsics (#11953 ) * core:OE-27 prepare universal intrinsics to expand (#11022) * core:OE-27 prepare universal intrinsics to expand (#11022) * core: Add universal intrinsics for AVX2 * updated implementation of wide univ. intrinsics; converted several OpenCV HAL functions: sqrt, invsqrt, magnitude, phase, exp to the wide universal intrinsics. * converted log to universal intrinsics; cleaned up the code a bit; added v_lut_deinterleave intrinsics. * core: Add universal intrinsics for AVX2 * fixed multiple compile errors * fixed many more compile errors and hopefully some test failures * fixed some more compile errors * temporarily disabled IPP to debug exp & log; hopefully fixed Doxygen complains * fixed some more compile errors * fixed v_store(short, v_float16&) signatures trying to fix the test failures on Linux * fixed some issues found by alalek * restored IPP optimization after the patch with AVX wide intrinsics has been properly tested * restored IPP optimization after the patch with AVX wide intrinsics has been properly tested	2018-07-16 18:57:24 +03:00
Alexander Alekhin	352510cc19	core: fix ARM intrinsics '0' is a specific case (makes no sense as a standalone operation), but it can be useful in template-based programming. reverts commit: `a58c9d4d63`	2018-05-09 23:31:02 +03:00
Tomoaki Teshima	a58c9d4d63	arm: fix build error of v_rotate_left * remove meaningless tests	2018-05-08 00:35:18 +09:00
Ryan Wong	6f675ae75b	Merge pull request #11304 from kinchungwong:issue_11242_intrin_cv34x_nocpp11 * Issue 11242 intrinsics v_extract, v_rotate improvement, branch 3.4, without C++11 (remove type restrictions for SSE2, use PALIGNR on SSSE3, compile to no-op when imm is 0 or nlanes). * fix whitespace * Fix #11242 (NEON intrinsics v_rotate...) branch 3.4 Separate macro expansion OPENCV_HAL_IMPL_NEON_SHIFT_OP for bitwise shifts for integers, from macro expansion OPENCV_HAL_IMPL_NEON_ROTATE for lane rotations. Bitwise shifts do not apply to floats, but lane-rotations can apply to both. * fix whitespace * Fix #11242 compile error (VSX intrinsics v_rotate(a)) branch 3.4 no-c++11	2018-04-20 18:43:47 +03:00
Alexander Alekhin	5a791e6e06	cmake: update reporting of excluded dispatching files (#10711 ) * cmake: add ocv_get_smart_file_name() macro * cmake: avoid adding files for unavailable dispatch modes	2018-02-12 14:48:20 +03:00
Alexander Alekhin	4a297a2443	ts: refactor OpenCV tests - removed tr1 usage (dropped in C++17) - moved includes of vector/map/iostream/limits into ts.hpp - require opencv_test + anonymous namespace (added compile check) - fixed norm() usage (must be from cvtest::norm for checks) and other conflict functions - added missing license headers	2018-02-03 19:39:47 +00:00
Tomoaki Teshima	3cbe60cca2	Merge pull request #9753 from tomoaki0705:universalMatmul * add accuracy test and performance check for matmul * add performance tests for transform and dotProduct * add test Core_TransformLargeTest for 8u version of transform * remove raw SSE2/NEON implementation from matmul.cpp * use universal intrinsic instead of raw intrinsic * remove unused templated function * add v_matmuladd which multiply 3x3 matrix and add 3x1 vector * add v_rotate_left/right in universal intrinsic * suppress intrinsic on some function and platform * add pure SW implementation of new universal intrinsics * add test for new universal intrinsics * core: prevent memory access after the end of buffer * fix perf tests	2017-11-20 15:56:53 +03:00
Alexander Alekhin	b18983a005	test(hal): properly dispatch FP16 test	2017-08-24 20:54:17 +00:00
Boris Fomitchev	76f7fb5231	Extending CPU dispatch to the tests; fixing a typo	2017-08-21 20:58:12 -07:00
Alexander Alekhin	e23b59da5c	build: fix v_reduce_sum4 (requires SSE3)	2017-06-14 09:37:06 +00:00
Alexander Alekhin	e5d9b608c4	cmake: fix fp16 support	2017-04-04 20:34:58 +03:00
Tomoaki Teshima	8b22099da2	use universal intrinsic and SSE4 popcount instruction in normHamming - add v_popcount in universal intrinsic - add test for v_popcount - add wrapper of popcount for both MSVC and GCC	2017-01-12 09:09:22 +09:00
Tomoaki Teshima	b823c8e95c	add universal intrinsic in StereoSGBM * add 8 elements version of reduce operation * add tests for new universal intrinsic	2016-10-28 21:47:13 +09:00
Tomoaki Teshima	841ccccada	use universal intrinsic in canny * add v_abs for universal intrinsic * add test of v_abs in test_intrin * fix compile error on gcc * fix bool OR operation	2016-10-03 13:23:43 +09:00
Tomoaki Teshima	c7cb116dc0	check FP16 build condition correctly * use __GNUC_MINOR__ in correct place to check the version of GCC * check processor support of FP16 at run time * check compiler support of FP16 and pass correct compiler option * rely on ENABLE_AVX on gcc since AVX is generated when mf16c is passed * guard correctly using ifdef in case of various configuration * use v_float16x4 correctly by including the right header file	2016-09-23 11:04:22 +09:00
Tomoaki Teshima	903789f7af	use universal intrinsic for FP16 * use v_float16x4 (universal intrinsic) instead of raw SSE/NEON implementation * define v_load_f16/v_store_f16 since v_load can't be distinguished when short pointer passed * brush up implementation on old compiler (guard correctly) * add test for v_load_f16 and round trip conversion of v_float16x4 * fix conversion error	2016-09-05 08:13:52 +09:00
Maksim Shabunin	28db4a2207	Merge pull request #7175 from tomoaki0705:featureIntrinsic64	2016-09-02 10:16:44 +00:00
Tomoaki Teshima	7fef96be1e	add 64F intrinsic in HAL NEON * use universal intrinsic for accumulate series using float/double * accumulate, accumulateSquare, accumulateProduct and accumulateWeighted * add v_cvt_f64_high in both SSE/NEON * add test for conversion v_cvt_f64_high in test_intrin.cpp * improve some existing universal intrinsic by using new instructions in Aarch64 * add workaround for Android build in intrin_neon.hpp	2016-08-30 17:21:02 +09:00
Matthew Self	9678d48e1a	2-channel interleaved load/store for universal intrinsics (float only) * Added 2-channel ops to match existing 3-channel and 4-channel ops * v_load_deinterleave() and v_store_interleave() * Implements float32x4 only on SSE (but all types on NEON and CPP) * Includes tests * Will be used to vectorize 2D functions, such as estimateAffine2D()	2016-08-26 18:17:08 -07:00
Maksim Shabunin	1e667de1f3	HAL math interfaces: fastAtan2, magnitude, sqrt, invSqrt, log, exp	2016-05-31 11:54:52 +03:00
Tomoaki Teshima	7077d1de63	fix hal_intrin test on 64bit ARM * fix issue 6521 * use correct comparison	2016-05-12 18:30:09 +09:00
Maksim Shabunin	84f37d352f	HAL moved back to core	2015-12-17 12:33:23 +03:00

26 Commits