Commit Graph

15 Commits

Author SHA1 Message Date
Alexander Alekhin
b18983a005 test(hal): properly dispatch FP16 test 2017-08-24 20:54:17 +00:00
Boris Fomitchev
76f7fb5231 Extending CPU dispatch to the tests; fixing a typo 2017-08-21 20:58:12 -07:00
Alexander Alekhin
e23b59da5c build: fix v_reduce_sum4 (requires SSE3) 2017-06-14 09:37:06 +00:00
Alexander Alekhin
e5d9b608c4 cmake: fix fp16 support 2017-04-04 20:34:58 +03:00
Tomoaki Teshima
8b22099da2 use universal intrinsic and SSE4 popcount instruction in normHamming
- add v_popcount in universal intrinsic
 - add test for v_popcount
 - add wrapper of popcount for both MSVC and GCC
2017-01-12 09:09:22 +09:00
Tomoaki Teshima
b823c8e95c add universal intrinsic in StereoSGBM
* add 8 elements version of reduce operation
  * add tests for new universal intrinsic
2016-10-28 21:47:13 +09:00
Tomoaki Teshima
841ccccada use universal intrinsic in canny
* add v_abs for universal intrinsic
  * add test of v_abs in test_intrin
  * fix compile error on gcc
  * fix bool OR operation
2016-10-03 13:23:43 +09:00
Tomoaki Teshima
c7cb116dc0 check FP16 build condition correctly
* use __GNUC_MINOR__ in correct place to check the version of GCC
  * check processor support of FP16 at run time
  * check compiler support of FP16 and pass correct compiler option
  * rely on ENABLE_AVX on gcc since AVX is generated when mf16c is passed
  * guard correctly using ifdef in case of various configuration
  * use v_float16x4 correctly by including the right header file
2016-09-23 11:04:22 +09:00
Tomoaki Teshima
903789f7af use universal intrinsic for FP16
* use v_float16x4 (universal intrinsic) instead of raw SSE/NEON implementation
  * define v_load_f16/v_store_f16 since v_load can't be distinguished when short pointer passed
  * brush up implementation on old compiler (guard correctly)
  * add test for v_load_f16 and round trip conversion of v_float16x4
  * fix conversion error
2016-09-05 08:13:52 +09:00
Maksim Shabunin
28db4a2207 Merge pull request #7175 from tomoaki0705:featureIntrinsic64 2016-09-02 10:16:44 +00:00
Tomoaki Teshima
7fef96be1e add 64F intrinsic in HAL NEON
* use universal intrinsic for accumulate series using float/double
  * accumulate, accumulateSquare, accumulateProduct and accumulateWeighted
  * add v_cvt_f64_high in both SSE/NEON
  * add test for conversion v_cvt_f64_high in test_intrin.cpp
  * improve some existing universal intrinsic by using new instructions in Aarch64
  * add workaround for Android build in intrin_neon.hpp
2016-08-30 17:21:02 +09:00
Matthew Self
9678d48e1a 2-channel interleaved load/store for universal intrinsics (float only)
* Added 2-channel ops to match existing 3-channel and 4-channel ops

* v_load_deinterleave() and v_store_interleave()

* Implements float32x4 only on SSE (but all types on NEON and CPP)

* Includes tests

* Will be used to vectorize 2D functions, such as estimateAffine2D()
2016-08-26 18:17:08 -07:00
Maksim Shabunin
1e667de1f3 HAL math interfaces: fastAtan2, magnitude, sqrt, invSqrt, log, exp 2016-05-31 11:54:52 +03:00
Tomoaki Teshima
7077d1de63 fix hal_intrin test on 64bit ARM
* fix issue 6521
  * use correct comparison
2016-05-12 18:30:09 +09:00
Maksim Shabunin
84f37d352f HAL moved back to core 2015-12-17 12:33:23 +03:00