mirror of
https://github.com/opencv/opencv.git
synced 2024-11-24 03:00:14 +08:00
4.x
1 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Vitaly Tuzov
|
3b015dfc7d |
Merge pull request #14210 from terfendail:wui_512
AVX512 wide universal intrinsics (#14210) * Added implementation of 512-bit wide universal intrinsics(WIP) * Added implementation of 512-bit wide universal intrinsics: implemented WUI vector types(WIP) * Added implementation of 512-bit wide universal intrinsics(WIP): implemented load/store * Added implementation of 512-bit wide universal intrinsics(WIP): implemented fp16 load/store * Added implementation of 512-bit wide universal intrinsics(WIP): implemented recombine and zip, implemented non-saturating and saturating arithmetics * Added implementation of 512-bit wide universal intrinsics(WIP): implemented bit operations * Added implementation of 512-bit wide universal intrinsics(WIP): implemented comparisons * Added implementation of 512-bit wide universal intrinsics(WIP): implemented lane shifts and reduction * Added implementation of 512-bit wide universal intrinsics(WIP): implemented absolute values * Added implementation of 512-bit wide universal intrinsics(WIP): implemented rounding and cast to float * Added implementation of 512-bit wide universal intrinsics(WIP): implemented LUT * Added implementation of 512-bit wide universal intrinsics(WIP): implemented type extension/narrowing and matrix operations * Added implementation of 512-bit wide universal intrinsics(WIP): implemented load_deinterleave for 2 and 3 channels images * Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented load_deinterleave for 2- and implemented for 4-channel images * Added implementation of 512-bit wide universal intrinsics(WIP): implemented store_interleave * Added implementation of 512-bit wide universal intrinsics(WIP): implemented signmask and checks * Added implementation of 512-bit wide universal intrinsics(WIP): build fixes * Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented popcount in case AVX512_BITALG is unavailable * Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented zip * Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented rotate for s8 and s16 * Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented interleave/deinterleave for s8 and s16 * Added implementation of 512-bit wide universal intrinsics(WIP): updated v512_set macros * Added implementation of 512-bit wide universal intrinsics(WIP): fix for GCC wrong _mm512_abs_pd definition * Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_zip to avoid AVX512_VBMI intrinsics * Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_invsqrt to avoid AVX512_ER intrinsics * Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_rotate, v_popcount and interleave/deinterleave for U8 to avoid AVX512_VBMI intrinsics * Added implementation of 512-bit wide universal intrinsics(WIP): fixed integral image SIMD part * Added implementation of 512-bit wide universal intrinsics(WIP): fixed warnings * Added implementation of 512-bit wide universal intrinsics(WIP): fixed load_deinterleave for u8 and u16 * Added implementation of 512-bit wide universal intrinsics(WIP): fixed v_invsqrt accuracy for f64 * Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave/deinterleave for u32 and u64 * Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave_pairs, interleave_quads and pack_triplets * Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left * Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left/right, part 2 * Added implementation of 512-bit wide universal intrinsics(WIP): fixed 512-wide universal intrinsics based resize * Added implementation of 512-bit wide universal intrinsics(WIP): fixed findContours by avoiding use of uint64 dependent 512-wide v_signmask() * Added implementation of 512-bit wide universal intrinsics(WIP): fixed trailing whitespaces * Added implementation of 512-bit wide universal intrinsics(WIP): reworked specific intrinsic sets dependent parts to check availability of intrinsics based on CPU feature group defines * Added implementation of 512-bit wide universal intrinsics(WIP):Updated AVX512 implementation of v_popcount to avoid AVX512VPOPCNTDQ intrinsics if unavailable. * Added implementation of 512-bit wide universal intrinsics(WIP): Fixed universal intrinsics data initialisation, v_mul_wrap, v_floor, v_ceil and v_signmask. * Added implementation of 512-bit wide universal intrinsics(WIP): Removed hasSIMD512() * Added implementation of 512-bit wide universal intrinsics(WIP): Fixes for gcc build * Added implementation of 512-bit wide universal intrinsics(WIP): Reworked v_signmask, v_check_any() and v_check_all() implementation. |