Alexander Alekhin
1e9ad5476d
core(intrin): drop hasSIMD128 checks
...
- use compile-time checks instead (`#if CV_SIMD128`)
- runtime checks are useless
2019-06-08 19:20:20 +00:00
bommo1
a38157a1f4
Fix https://github.com/opencv/opencv/issues/14265
2019-06-03 23:05:03 +02:00
Vitaly Tuzov
3b015dfc7d
Merge pull request #14210 from terfendail:wui_512
...
AVX512 wide universal intrinsics (#14210 )
* Added implementation of 512-bit wide universal intrinsics(WIP)
* Added implementation of 512-bit wide universal intrinsics: implemented WUI vector types(WIP)
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load/store
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented fp16 load/store
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented recombine and zip, implemented non-saturating and saturating arithmetics
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented bit operations
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented comparisons
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented lane shifts and reduction
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented absolute values
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented rounding and cast to float
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented LUT
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented type extension/narrowing and matrix operations
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented load_deinterleave for 2 and 3 channels images
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented load_deinterleave for 2- and implemented for 4-channel images
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented store_interleave
* Added implementation of 512-bit wide universal intrinsics(WIP): implemented signmask and checks
* Added implementation of 512-bit wide universal intrinsics(WIP): build fixes
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented popcount in case AVX512_BITALG is unavailable
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented zip
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented rotate for s8 and s16
* Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented interleave/deinterleave for s8 and s16
* Added implementation of 512-bit wide universal intrinsics(WIP): updated v512_set macros
* Added implementation of 512-bit wide universal intrinsics(WIP): fix for GCC wrong _mm512_abs_pd definition
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_zip to avoid AVX512_VBMI intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_invsqrt to avoid AVX512_ER intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_rotate, v_popcount and interleave/deinterleave for U8 to avoid AVX512_VBMI intrinsics
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed integral image SIMD part
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed warnings
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed load_deinterleave for u8 and u16
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed v_invsqrt accuracy for f64
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave/deinterleave for u32 and u64
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave_pairs, interleave_quads and pack_triplets
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left/right, part 2
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed 512-wide universal intrinsics based resize
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed findContours by avoiding use of uint64 dependent 512-wide v_signmask()
* Added implementation of 512-bit wide universal intrinsics(WIP): fixed trailing whitespaces
* Added implementation of 512-bit wide universal intrinsics(WIP): reworked specific intrinsic sets dependent parts to check availability of intrinsics based on CPU feature group defines
* Added implementation of 512-bit wide universal intrinsics(WIP):Updated AVX512 implementation of v_popcount to avoid AVX512VPOPCNTDQ intrinsics if unavailable.
* Added implementation of 512-bit wide universal intrinsics(WIP): Fixed universal intrinsics data initialisation, v_mul_wrap, v_floor, v_ceil and v_signmask.
* Added implementation of 512-bit wide universal intrinsics(WIP): Removed hasSIMD512()
* Added implementation of 512-bit wide universal intrinsics(WIP): Fixes for gcc build
* Added implementation of 512-bit wide universal intrinsics(WIP): Reworked v_signmask, v_check_any() and v_check_all() implementation.
2019-06-03 18:05:35 +03:00
Alexander Alekhin
aaf56c2839
Merge pull request #14649 from savuor:fix/luv_hls_read_oob
2019-05-27 16:24:55 +00:00
Alexander Alekhin
a81c0e6db9
Merge pull request #14447 from catree:fix_issue_14423
2019-05-27 15:00:21 +00:00
Rostislav Vasilikhin
8c698262ea
rgb2hls_b: out of bounds read fixed
2019-05-27 16:19:52 +03:00
Rostislav Vasilikhin
791ebd05fc
out of bounds read fixed in rgb2luv_b
2019-05-27 16:19:01 +03:00
Rostislav Vasilikhin
e07ffe902e
Merge pull request #14616 from savuor:hsv_wide
...
HSV and HLS color conversions rewritten to wide intrinsics (#14616 )
* RGB2HSV_b vectorized
* RGB2HSV_f: widen
* RGB2HSV_f: shorten, more intuitive
* HSV2RGB_f and HSV2RGB_b widen
* hls2rgb_f widen
* instrumentation instead vx_cleanup
* RGB2HLS_f widen
* RGB2HLS_b rewritten to wide universal intrinsics
* define guard against no SIMD code
* hls2rgb_b rewritten
* extra define removed
* warning fixed
* hls2rgb_b: performance fixed
2019-05-24 23:01:08 +03:00
Ahmed Ashour
f3319f6140
java: remove redundant declaration of java.lang package
2019-05-23 14:06:34 +02:00
catree
7ed858e38e
Fix issue with solvePnPRansac and Nx3 1-channel input when the number of points is 5. Try to uniform the input shape of projectPoints and undistortPoints.
2019-05-22 14:19:16 +02:00
Rostislav Vasilikhin
e90e0ef9aa
Merge pull request #14106 from savuor:lab_wide
...
Lab, Luv and XYZ conversions rewritten to wide intrinsics (#14106 )
* rgb2xyz<float> re-vectorized
* rgb2xyz_i vectorized for ushort and uchar
* xyz2rgb<float> vectorized
* xyz2rgb_i vectorized for both uchar and ushort
* intermediate conversions (int->float) rewritten
* packed rgb2luv rewritten
* (some) float conversions rewritten
* burnt volatile int _3 and similar
* RGB2Lab_b rewritten
* tests: logging made better
* RGB2Lab_f (LRGB path) rewritten
* Lab2RGBfloat rewritten
* Lab2RGBinteger and Lab2RGB_b rewritten to wide universal intrinsics
* Luv2RGBinteger wide vectorized
* RGB2Lab_b fixed: v_sub_wrap instead of saturated sub
* warnings fixed
* trying to fix compilation on older compilers
* using 16x8 registers for 8-element dot product
* cleanup added
* splineInterpolate: loop unrolled, perf fix for f32x4
* Lab2RGBfloat: grab 2x more data to process on f32x4
* nrepeats for Luv2RGBfloat, +20% perf
* minor
* nrepeats to RGB2Lab_f
* Lab2RGBinteger: no tab for linear BGR
* nrepeats for RGB2Luvfloat
* Luv2RGBinteger: no tab for linear RGB
* +10% more to perf of Luv2RGBfloat
* nrepeats for 256-simd for Lab2RGBfloat
* less warnings
* BOM removed
* CV_SIMD_WIDTH used for lanes number checking
* trilinearPackedInterpolate: 128-bit specialization added
* fix build; no vx_cleanup(), instrumentation instead
2019-05-20 21:10:20 +03:00
Alexander Alekhin
30a595789c
Merge pull request #14463 from thangktran:thangktran/fix-imgproc-intersectConvexConvex
2019-05-16 14:50:20 +00:00
Thang Tran
1aff378ae8
imgproc: fixed bug from intersectConvexConvex
...
Added checks for all of vertices from each contour instead of checking
only for the first vertex.
2019-05-01 11:06:30 +02:00
Alexander Alekhin
1c180f4c7f
imgproc: fix RemoveOverlaps() with empty input vector
2019-04-29 21:15:23 +00:00
Suleyman TURKMEN
3f9343e238
Update imgproc.hpp
2019-04-22 00:48:11 +03:00
Alexander Alekhin
9dccfe2a96
Merge pull request #13917 from sturkmen72:removed_c_api
2019-04-17 19:04:33 +00:00
Brad Kelly
0fe17eeb68
Implementing AVX512 Support for 1 channel mats for CV_64F format
2019-03-22 09:44:23 -07:00
Alexander Alekhin
8c8715c4dd
fix static analysis issues
2019-03-13 17:19:39 +03:00
Alexander Alekhin
2c07c6718f
imgproc: dispatch morph
2019-03-11 13:54:12 +00:00
Alexander Alekhin
5a01227aa1
imgproc: dispatch box_filter
2019-03-11 13:54:12 +00:00
Alexander Alekhin
ce3c92eb1f
imgproc: dispatch bilateral_filter
2019-03-11 13:54:12 +00:00
Alexander Alekhin
b99c9145bf
imgproc: dispatch smooth
2019-03-11 13:54:12 +00:00
Alexander Alekhin
6ec08f268f
imgproc: dispatch medianBlur
2019-03-11 13:54:12 +00:00
Alexander Alekhin
8546ac3ce6
imgproc: get rid of filter.avx2.cpp
2019-03-11 13:54:12 +00:00
Alexander Alekhin
9a8dbfd57f
imgproc: dispatch filter.cpp
2019-03-11 13:54:12 +00:00
Alexander Alekhin
756a98a395
imgproc: keep history of filters files
2019-03-11 13:54:07 +00:00
Alexander Alekhin
9dc7554089
imgproc: copy .dispatch.cpp
2019-03-11 13:53:59 +00:00
Alexander Alekhin
6eac8f78b9
imgproc: copy .simd.hpp
2019-03-11 13:53:59 +00:00
Alexander Alekhin
7e8cc580c9
Merge pull request #13997 from alalek:imgproc_dispatch_cvtcolor
2019-03-08 16:18:44 +00:00
Alexander Alekhin
8b541e450b
imgproc: dispatch color*
...
Lab/XYZ modes have been postponed (color_lab.cpp):
- need to split code for tables initialization and for pixels processing first
- no significant performance improvements for switching between SSE42 / AVX2 code generation
2019-03-07 15:45:05 +03:00
Alexander Alekhin
39783a6584
core: keep history of color*.cpp
2019-03-07 15:38:13 +03:00
Alexander Alekhin
f26912960f
imgproc: clone color*.dispatch.cpp
2019-03-07 15:35:49 +03:00
Alexander Alekhin
db588bb831
imgproc: clone color*.simd.hpp
2019-03-07 15:35:13 +03:00
Alexander Alekhin
d5a2fe5180
perf: ignore _ovx tests
2019-03-06 15:52:23 +03:00
Vitaly Tuzov
99b39aa5bd
Fixed out of bound reading in LINEAR_EXACT resize for 8UC3
2019-03-05 17:21:21 +03:00
Suleyman TURKMEN
3d1dbd2ccd
clean up C API
2019-03-03 21:43:27 +03:00
Alexander Alekhin
3ba49ccecc
imgproc: removed LSD code due original code license conflict
2019-03-01 16:25:39 +03:00
Vitaly Tuzov
9548093b46
Horizontal line processing for pyrDown() reworked using wide universal intrinsics.
2019-02-28 00:12:57 +03:00
Alexander Alekhin
1db5d82b7f
Merge pull request #13844 from brad-kelly:integral_avx512_cn234
2019-02-20 12:27:16 +00:00
Vitaly Tuzov
334c4d62b5
Merge pull request #13781 from terfendail:warp_wintr
...
Resize reworked using wide universal intrinsics (#13781 )
* Added wide universal intrinsics optimized implementation for 3 channel bit-exact linear resize
* Reworked linear resize using new wide LUT intrinsics
* Fix for VSX intrinsics
2019-02-20 14:30:28 +03:00
Brad Kelly
507f8add1c
Implementing AVX512 Support for 2 and 4 channel mats for CV_64F format
2019-02-19 11:31:20 -08:00
Alexander Alekhin
757d8ac8f7
Merge pull request #13769 from savuor:cvtColor_tests_16u_32f
2019-02-08 15:29:35 +00:00
Alexander Alekhin
8f7e92e466
Merge pull request #13764 from nglee:dev_CudaCLAHE16bitSupport
2019-02-08 10:13:11 +00:00
Rostislav Vasilikhin
4e679e1cc5
disabled 16u and 32f perf tests
2019-02-07 19:26:36 +03:00
Rostislav Vasilikhin
87f651c119
disabled sanity check for 32f
2019-02-07 18:20:29 +03:00
Vitaly Tuzov
07c10d6fc3
Fixed out of bound reading issue in erode() and dilate()
2019-02-07 17:28:58 +03:00
Namgoo Lee
fb8e652c3f
Add CV_16UC1 support for cuda::CLAHE
...
Due to size limit of shared memory, histogram is built on
the global memory for CV_16UC1 case.
The amount of memory needed for building histogram is:
65536 * 4byte = 256KB
and shared memory limit is 48KB typically.
Added test cases for CV_16UC1 and various clip limits.
Added perf tests for CV_16UC1 on both CPU and CUDA code.
There was also a bug in CV_8UC1 case when redistributing
"residual" clipped pixels. Adding the test case where clip
limit is 5.0 exposes this bug.
2019-02-06 17:21:55 +00:00
Rostislav Vasilikhin
bbedebb57c
perf tests for cvtColor for 16U and 32f added
2019-02-06 17:56:44 +03:00
Rostislav Vasilikhin
554eae56d1
Merge pull request #13708 from savuor:yuv42x_wide
...
YUV42x color conversions rewritten to wide intrinsics (#13708 )
* a*b+c -> fma
* YUV420sp2RGB initially vectorized
* shorter var names
* loops by 4
* yuv420p2rgb vectorized
* yuv422toRGB vectorized
* reg arrays
* rgb2yuv420 vectorized
* warnings fixed
* try to fix align error
2019-02-01 19:09:31 +03:00
Vitaly Tuzov
2f5af1bd33
Merge pull request #13693 from terfendail:spatialgrad_wintr
...
* spatialGradient() reworked to use wide universal intrinsics
* Moved row pointers inside loops
2019-01-30 22:37:27 +03:00