Commit Graph

2958 Commits

Author SHA1 Message Date
Alexander Alekhin
61c4cfd896 imgproc(resize): drop unused 'pix_size4' 2020-03-29 02:41:50 +00:00
Alexander Alekhin
be17f532e1 imgproc(resize): fix resizeNNInvoker handling of generic pixel size 2020-03-29 02:41:41 +00:00
Alexander Alekhin
0fb4f2cc9c imgproc: add src.empty() checks in filter operations 2020-03-20 21:04:29 +00:00
Alexander Alekhin
fd09413566
Merge pull request #16731 from alalek:issue_16708
* imgproc(integral): avoid OOB access

* imgproc(test): fix integral perf check

- FP32 computation is not accurate

* imgproc(integral): tune loop limits
2020-03-04 19:28:04 +00:00
Chip Kerchner
8c24af66bd
Merge pull request #16556 from ChipKerchner:vectorizeIntegralSumPixels
* Vectorize calculating integral for line for single and multiple channels

* Single vector processing for 4-channels - 25-30% faster

* Single vector processing for 4-channels - 25-30% faster

* Fixed AVX512 code for 4 channels

* Disable 3 channel 8UC1 to 32S for SSE2 and SSE3 (slower).  Use new version of 8UC1 to 64F for AVX512.
2020-02-28 19:34:06 +03:00
Vadim Pisarevsky
8f3867756c
Merge pull request #16594 from vpisarev:hull_ordering_fix
fixed the ordering of contour convex hull points

* partially fixed the issue #4539

* fixed warnings and test failures

* fixed integer overflow (issue #14521)

* added comment to force buildbot to re-run

* extended the test for the issue 4539. Check the expected behaviour on the original contour as well

* added comment; fixed typo, renamed another variable for a little better clarity

* added yet another part to the test for issue #4539, where we run convexHull and convexityDetects on the original contour, without any manipulations. the rest of the test stays the same
2020-02-21 18:18:24 +03:00
Vadim Pisarevsky
07b475062f
Merge pull request #16608 from vpisarev:fix_mac_ocl_tests
* fixed several problems when running tests on Mac:
* OCL_pyrUp
* OCL_flip
* some basic UMat tests
* histogram badarg test (out of range access)

* retained the storepix fix in ocl_flip only for 16U/16S datatype, where the OpenCL compiler on Mac generates incorrect code

* moved deletion of ACCESS_FAST flag to non-SVM branch (where SVM is shared virtual memory (in OpenCL 2.x), not support vector machine)

* force OpenCL to use read/write for GPU<=>CPU memory transfers on machines with discrete video only on Macs. On Windows/Linux the drivers are seemingly smart enough to implement map/unmap properly (and maybe more efficiently than explicit read/write)
2020-02-21 16:13:41 +03:00
Alexander Alekhin
3a546aa380 imgproc: revert resize changes from PR 16497 2020-02-17 15:23:59 +03:00
keeper121
d84360e7f3
Merge pull request #16497 from keeper121:master
* Fix NN resize with dimentions > 4

* add test check for nn resize with channels > 4

* Change types from float to double

* Del unnecessary test file. Move nn test to test_imgwarp. Add 5 channels test only.
2020-02-16 19:33:25 +03:00
Alexander Alekhin
d917f889b1 Merge pull request #16504 from alalek:issue_16501 2020-02-04 16:39:17 +00:00
Vadim Pisarevsky
e50acb923e
Merge pull request #16495 from vpisarev:drawing_aa_border_fix
* fixed antialiased line rendering to process image border correctly

* fixed warning on Windows

* imgproc(test): circle drawing regression
2020-02-04 19:37:33 +03:00
Alexander Alekhin
f67c8e37d6 imgproc(resize): drop optimization for channels>4 2020-02-04 17:14:52 +03:00
Vadim Pisarevsky
5c6d319ebc
Merge pull request #16493 from vpisarev:bordertype_sgbm_doc_fixes
* added note about BORDER_TYPE in separable filters; fixed SGBMStereo description

* added # to BORDER_ constants to generate hyperlinks
2020-02-04 14:30:16 +03:00
Arnaud Brejeon
ecbba852cf
Merge pull request #16415 from arnaudbrejeon:bug_fix_16410
* Fix bug 16410 and add test

* imgproc(connectedcomponents): avoid manual uninitialized allocations

* imgproc(connectedcomponents): force 'odd' chunk range size

* imgproc(connectedcomponents): reuse stripeFirstLabel{4/8}Connectivity

* imgproc(connectedcomponents): extend fix from PR14964
2020-01-29 23:55:43 +03:00
Alexander Alekhin
76c21b73aa Merge pull request #16374 from alalek:imgproc_dispatch_sumpixels 2020-01-24 21:21:48 +00:00
Alexander Alekhin
881cee4d8f Merge pull request #16146 from pmur:reg_16137x2 2020-01-22 13:53:51 +00:00
Alexander Alekhin
09b3383a7e imgproc: dispatch sumpixels (integral) 2020-01-17 16:54:29 +03:00
Alexander Alekhin
b4316af834 imgproc: rename sumpixels.avx512_skx.{cpp,hpp} 2020-01-17 16:50:08 +03:00
Alexander Alekhin
358139b6b9 imgproc(dispatch): keep history of sumpixels.cpp 2020-01-16 15:09:10 +03:00
Alexander Alekhin
c6a622542d imgproc: copy sumpixels.dispatch.cpp 2020-01-16 15:07:48 +03:00
Alexander Alekhin
4ecbcf0885 imgproc: copy sumpixels.simd.hpp 2020-01-16 15:06:34 +03:00
Alexander Alekhin
e180cc050b
Merge pull request #16236 from alalek:fix_core_simd_emulator
* core: fix intrin_cpp, allow to build modules with SIMD emulator

* core(arithm): fix v_zero initialization

* core(simd): 'strict' types for binary/bitwise operations

* features2d: avoid aligned load issue in GCC 5.4 with emulated SIMD

* core(simd): alignment checks in SIMD emulator
2020-01-10 21:31:02 +03:00
Paul E. Murphy
c1cdb2416a imgproc(resize): improve 8u3 HResize vector exit calc
Actually, we can do this in constant time. xofs always
contains same or increasing offset values. We can instead
find the most extreme value used and never attempt to load it.

Similarly, we can note for all dx >= 0 and dx < (dwidth - cn)
where xofs[dx] + cn < xofs[dwidth-cn] implies dx < (dwidth - cn).

Thus, we can use this to control our loop termination optimally.

This fixes #16137 with little or no performance impact. I have
also added a debug check as a sanity check.
2020-01-03 14:46:59 -06:00
Alexander Alekhin
40ac72a8f1 Merge pull request #16238 from alalek:imgproc_resize_fix_types 2020-01-03 16:30:28 +00:00
Brian Wignall
f9c514b391 Fix spelling typos
backport commit 659ffaddb4
2019-12-27 12:46:53 +00:00
Alexander Alekhin
07729e396d imgproc(resize): avoid unnecessary type conversions 2019-12-26 00:02:52 +00:00
Alexander Alekhin
4733a19bab
Merge pull request #16194 from alalek:fix_16192
* imgproc(test): resize(LANCZOS4) reproducer 16192

* imgproc: fix resize LANCZOS4 coefficients generation
2019-12-19 13:20:42 +03:00
Alexander Alekhin
a8345133ac Merge pull request #16191 from terfendail:lres2c_fix 2019-12-18 22:31:52 +00:00
Vitaly Tuzov
f5a84f75c4 Fix for CV_8UC2 linear resize vectorization 2019-12-18 21:41:36 +00:00
mcellis33
5d15c65e48 Merge pull request #16136 from mcellis33:mec-nan
* Handle det == 0 in findCircle3pts.

Issue 16051 shows a case where findCircle3pts returns NaN for the
center coordinates and radius due to dividing by a determinant of 0. In
this case, the points are colinear, so the longest distance between any
2 points is the diameter of the minimum enclosing circle.

* imgproc(test): update test checks for minEnclosingCircle()

* imgproc: fix handling of special cases in minEnclosingCircle()
2019-12-18 17:25:59 +03:00
Paul Murphy
1c4a64f0a1 Merge pull request #16138 from pmur:reg_16137
* imgproc: Prevent 1B overrun of 8C3 SIMD optimization

The fourth value read via v_load_q is essentially ignored,
but can cause trouble if it happens to cross page boundaries.

The final few iterations may attempt to read the most extreme
elements of S, which will read 1B beyond the array in most
aligment cases. Dynamically compute the stop. This could be
hoised from the loop, but will require a more extensive change.

Likewise, cleanup the iteration increment statements to make
it more obvious they do channel count (3) elements per pass.

This should resolve #16137

* imgproc(resize): extra check
2019-12-12 13:00:44 +03:00
shimat
b89581960c s/Voroni/Voronoi/g 2019-12-11 09:13:58 +09:00
Maksim Shabunin
435c97c7a2 imgproc: add parameter checks in calcHist and calcBackProj 2019-12-10 16:10:19 +03:00
RAJKIRAN NATARAJAN
b9435b9e38 Merge pull request #16094 from saskatchewancatch:issue-16053
* Add eps error checking for approxPolyDP to allow sensible values only
for epsilon value of Douglas-Peucker algorithm.

* Review changes for PR
2019-12-09 22:24:35 +03:00
Paul Murphy
a011035ed6 Merge pull request #15257 from pmur:resize
* resize: HResizeLinear reduce duplicate work

There appears to be a 2x unroll of the HResizeLinear against k,
however the k value is only incremented by 1 during the unroll. This
results in k - 1 duplicate passes when k > 1.

Likewise, the final pass may not respect the work done by the vector
loop. Start it with the offset returned by the vector op if
implemented. Note, no vector ops are implemented today.

The performance is most noticable on a linear downscale. A set of
performance tests are added to characterize this.  The performance
improvement is 10-50% depending on the scaling.

* imgproc: vectorize HResizeLinear

Performance is mostly gated by the gather operations
for x inputs.

Likewise, provide a 2x unroll against k, this reduces the
number of alpha gathers by 1/2 for larger k.

While not a 4x improvement, it still performs substantially
better under P9 for a 1.4x improvement. P8 baseline is
1.05-1.10x due to reduced VSX instruction set.

For float types, this results in a more modest
1.2x improvement.

* Update U8 processing for non-bitexact linear resize

* core: hal: vsx: improve v_load_expand_q

With a little help, we can do this quickly without gprs on
all VSX enabled targets.

* resize: Fix cn == 3 step per feedback

Per feedback, ensure we don't overrun. This was caught via the
failure observed in Test_TensorFlow.inception_accuracy.
2019-12-09 14:54:06 +03:00
Alexander Alekhin
734de34b7a
Merge pull request #16085 from alalek:imgproc_threshold_to_zero_ipp_bug
* imgproc(IPP): wrong result from threshold(THRESH_TOZERO)

* imgproc(IPP): disable IPP code to pass THRESH_TOZERO test
2019-12-09 14:51:02 +03:00
Alexander Alekhin
b369c456f2 imgproc(color): clarify error message 2019-12-06 13:25:51 +03:00
Brian Wignall
af997529a1 Fix some typos 2019-11-26 18:41:19 +03:00
Everton Constantino
75315fb297 Merge pull request #15494 from everton1984:hal_vector_get_n
Improving VSX performance of integral function

* Adding support for vector get function on VSX datatypes so the
integral function gains a bit of performance.

* Removing get as a datatype member function and implementing a new HAL
instruction v_extract_n to get the n-th element of a vector register.

* Adding SSE/NEON/AVX intrinsics.

* Implement new HAL instruction v_broadcast_element on VSX/AVX/NEON/SSE.

* core(simd): add tests for v_extract_n/v_broadcast_element

- updated docs
- commented out code to repair compilation
- added WASM and MSA default implementations

* core(simd): fix compilation

- x86: avoid _mm256_extract_epi64/32/16/8 with MSVS 2015
- x86: _mm_extract_epi64 is 64-bit only

* cleanup
2019-11-20 13:41:07 +03:00
clunietp
2185bce4b7 Fix 13577 2019-11-18 07:41:34 -05:00
Alexander Alekhin
f4d55d512f
imgproc: fix bit-exact GaussianBlur() / sepFilter2D() (#15855)
* imgproc: fix bit-exact GaussianBlur() / sepFilter2D()

- avoid kernels with bad approximation
- GaussiabBlur - apply error-diffusion approximation for kernel (8-bit fraction)

* java(test): update features2d ref data

* test: update test_facedetect
2019-11-18 01:39:27 +03:00
Alexander Alekhin
686ea5c1a6 Merge pull request #15917 from ChipKerchner:demosaicingToHal2 2019-11-16 19:45:37 +00:00
ChipKerchner
1d33335e33 Convert demosiacing with variable number of gradients to HAL - 5.5x faster 2019-11-15 07:42:03 -06:00
Alexander Alekhin
6773b938b3 Merge pull request #15896 from alalek:build_gcc_9 2019-11-14 14:22:02 +00:00
Alexander Alekhin
763b80d5fa imgproc(IPP): disable ippiDistanceTransform_3x3_8u32f_C1R 2019-11-13 14:14:19 +03:00
Alexander Alekhin
7ecdcf6ca6 build: GCC9 compilation 2019-11-12 18:49:34 +03:00
Chip Kerchner
2112aa31e6 Merge pull request #15828 from ChipKerchner:momentsToHal
* Convert moments in tile algorithms to HAL (1.3x faster for VSX).

* Adding NEON code back in for non 64-bit platforms.

* Remove floats from post processing.
2019-11-05 18:52:35 +03:00
Ciprian Alexandru Pitis
d2e02779c4 Merge pull request #15799 from Cpitis:feature/parallelization
Parallelize pyrDown & calcSharrDeriv

* ::pyrDown has been parallelized

* CalcSharrDeriv parallelized

* Fixed whitespace

* Set granularity based on amount of threads enabled

* Granularity changed to cv::getNumThreads, now each thread should receive 1/n sized stripes

* imgproc: move PyrDownInvoker<CastOp>::operator() implementation

* imgproc(pyramid): remove syloopboundary()

* video: SharrDerivInvoker replace 'Mat*' => 'Mat&' fields
2019-10-31 23:38:49 +03:00
Alexander Alekhin
bad4e5c3eb Merge pull request #15692 from alalek:core_tls_handle_thread_termination 2019-10-29 20:40:35 +00:00
Alexander Alekhin
7cf1054d36 Merge pull request #15764 from ChipKerchner:demosaicingToHal 2019-10-25 13:49:46 +00:00