Commit Graph

2944 Commits

Author SHA1 Message Date
Alexander Alekhin
76c21b73aa Merge pull request #16374 from alalek:imgproc_dispatch_sumpixels 2020-01-24 21:21:48 +00:00
Alexander Alekhin
881cee4d8f Merge pull request #16146 from pmur:reg_16137x2 2020-01-22 13:53:51 +00:00
Alexander Alekhin
09b3383a7e imgproc: dispatch sumpixels (integral) 2020-01-17 16:54:29 +03:00
Alexander Alekhin
b4316af834 imgproc: rename sumpixels.avx512_skx.{cpp,hpp} 2020-01-17 16:50:08 +03:00
Alexander Alekhin
358139b6b9 imgproc(dispatch): keep history of sumpixels.cpp 2020-01-16 15:09:10 +03:00
Alexander Alekhin
c6a622542d imgproc: copy sumpixels.dispatch.cpp 2020-01-16 15:07:48 +03:00
Alexander Alekhin
4ecbcf0885 imgproc: copy sumpixels.simd.hpp 2020-01-16 15:06:34 +03:00
Alexander Alekhin
e180cc050b
Merge pull request #16236 from alalek:fix_core_simd_emulator
* core: fix intrin_cpp, allow to build modules with SIMD emulator

* core(arithm): fix v_zero initialization

* core(simd): 'strict' types for binary/bitwise operations

* features2d: avoid aligned load issue in GCC 5.4 with emulated SIMD

* core(simd): alignment checks in SIMD emulator
2020-01-10 21:31:02 +03:00
Paul E. Murphy
c1cdb2416a imgproc(resize): improve 8u3 HResize vector exit calc
Actually, we can do this in constant time. xofs always
contains same or increasing offset values. We can instead
find the most extreme value used and never attempt to load it.

Similarly, we can note for all dx >= 0 and dx < (dwidth - cn)
where xofs[dx] + cn < xofs[dwidth-cn] implies dx < (dwidth - cn).

Thus, we can use this to control our loop termination optimally.

This fixes #16137 with little or no performance impact. I have
also added a debug check as a sanity check.
2020-01-03 14:46:59 -06:00
Alexander Alekhin
40ac72a8f1 Merge pull request #16238 from alalek:imgproc_resize_fix_types 2020-01-03 16:30:28 +00:00
Brian Wignall
f9c514b391 Fix spelling typos
backport commit 659ffaddb4
2019-12-27 12:46:53 +00:00
Alexander Alekhin
07729e396d imgproc(resize): avoid unnecessary type conversions 2019-12-26 00:02:52 +00:00
Alexander Alekhin
4733a19bab
Merge pull request #16194 from alalek:fix_16192
* imgproc(test): resize(LANCZOS4) reproducer 16192

* imgproc: fix resize LANCZOS4 coefficients generation
2019-12-19 13:20:42 +03:00
Alexander Alekhin
a8345133ac Merge pull request #16191 from terfendail:lres2c_fix 2019-12-18 22:31:52 +00:00
Vitaly Tuzov
f5a84f75c4 Fix for CV_8UC2 linear resize vectorization 2019-12-18 21:41:36 +00:00
mcellis33
5d15c65e48 Merge pull request #16136 from mcellis33:mec-nan
* Handle det == 0 in findCircle3pts.

Issue 16051 shows a case where findCircle3pts returns NaN for the
center coordinates and radius due to dividing by a determinant of 0. In
this case, the points are colinear, so the longest distance between any
2 points is the diameter of the minimum enclosing circle.

* imgproc(test): update test checks for minEnclosingCircle()

* imgproc: fix handling of special cases in minEnclosingCircle()
2019-12-18 17:25:59 +03:00
Paul Murphy
1c4a64f0a1 Merge pull request #16138 from pmur:reg_16137
* imgproc: Prevent 1B overrun of 8C3 SIMD optimization

The fourth value read via v_load_q is essentially ignored,
but can cause trouble if it happens to cross page boundaries.

The final few iterations may attempt to read the most extreme
elements of S, which will read 1B beyond the array in most
aligment cases. Dynamically compute the stop. This could be
hoised from the loop, but will require a more extensive change.

Likewise, cleanup the iteration increment statements to make
it more obvious they do channel count (3) elements per pass.

This should resolve #16137

* imgproc(resize): extra check
2019-12-12 13:00:44 +03:00
shimat
b89581960c s/Voroni/Voronoi/g 2019-12-11 09:13:58 +09:00
Maksim Shabunin
435c97c7a2 imgproc: add parameter checks in calcHist and calcBackProj 2019-12-10 16:10:19 +03:00
RAJKIRAN NATARAJAN
b9435b9e38 Merge pull request #16094 from saskatchewancatch:issue-16053
* Add eps error checking for approxPolyDP to allow sensible values only
for epsilon value of Douglas-Peucker algorithm.

* Review changes for PR
2019-12-09 22:24:35 +03:00
Paul Murphy
a011035ed6 Merge pull request #15257 from pmur:resize
* resize: HResizeLinear reduce duplicate work

There appears to be a 2x unroll of the HResizeLinear against k,
however the k value is only incremented by 1 during the unroll. This
results in k - 1 duplicate passes when k > 1.

Likewise, the final pass may not respect the work done by the vector
loop. Start it with the offset returned by the vector op if
implemented. Note, no vector ops are implemented today.

The performance is most noticable on a linear downscale. A set of
performance tests are added to characterize this.  The performance
improvement is 10-50% depending on the scaling.

* imgproc: vectorize HResizeLinear

Performance is mostly gated by the gather operations
for x inputs.

Likewise, provide a 2x unroll against k, this reduces the
number of alpha gathers by 1/2 for larger k.

While not a 4x improvement, it still performs substantially
better under P9 for a 1.4x improvement. P8 baseline is
1.05-1.10x due to reduced VSX instruction set.

For float types, this results in a more modest
1.2x improvement.

* Update U8 processing for non-bitexact linear resize

* core: hal: vsx: improve v_load_expand_q

With a little help, we can do this quickly without gprs on
all VSX enabled targets.

* resize: Fix cn == 3 step per feedback

Per feedback, ensure we don't overrun. This was caught via the
failure observed in Test_TensorFlow.inception_accuracy.
2019-12-09 14:54:06 +03:00
Alexander Alekhin
734de34b7a
Merge pull request #16085 from alalek:imgproc_threshold_to_zero_ipp_bug
* imgproc(IPP): wrong result from threshold(THRESH_TOZERO)

* imgproc(IPP): disable IPP code to pass THRESH_TOZERO test
2019-12-09 14:51:02 +03:00
Alexander Alekhin
b369c456f2 imgproc(color): clarify error message 2019-12-06 13:25:51 +03:00
Brian Wignall
af997529a1 Fix some typos 2019-11-26 18:41:19 +03:00
Everton Constantino
75315fb297 Merge pull request #15494 from everton1984:hal_vector_get_n
Improving VSX performance of integral function

* Adding support for vector get function on VSX datatypes so the
integral function gains a bit of performance.

* Removing get as a datatype member function and implementing a new HAL
instruction v_extract_n to get the n-th element of a vector register.

* Adding SSE/NEON/AVX intrinsics.

* Implement new HAL instruction v_broadcast_element on VSX/AVX/NEON/SSE.

* core(simd): add tests for v_extract_n/v_broadcast_element

- updated docs
- commented out code to repair compilation
- added WASM and MSA default implementations

* core(simd): fix compilation

- x86: avoid _mm256_extract_epi64/32/16/8 with MSVS 2015
- x86: _mm_extract_epi64 is 64-bit only

* cleanup
2019-11-20 13:41:07 +03:00
clunietp
2185bce4b7 Fix 13577 2019-11-18 07:41:34 -05:00
Alexander Alekhin
f4d55d512f
imgproc: fix bit-exact GaussianBlur() / sepFilter2D() (#15855)
* imgproc: fix bit-exact GaussianBlur() / sepFilter2D()

- avoid kernels with bad approximation
- GaussiabBlur - apply error-diffusion approximation for kernel (8-bit fraction)

* java(test): update features2d ref data

* test: update test_facedetect
2019-11-18 01:39:27 +03:00
Alexander Alekhin
686ea5c1a6 Merge pull request #15917 from ChipKerchner:demosaicingToHal2 2019-11-16 19:45:37 +00:00
ChipKerchner
1d33335e33 Convert demosiacing with variable number of gradients to HAL - 5.5x faster 2019-11-15 07:42:03 -06:00
Alexander Alekhin
6773b938b3 Merge pull request #15896 from alalek:build_gcc_9 2019-11-14 14:22:02 +00:00
Alexander Alekhin
763b80d5fa imgproc(IPP): disable ippiDistanceTransform_3x3_8u32f_C1R 2019-11-13 14:14:19 +03:00
Alexander Alekhin
7ecdcf6ca6 build: GCC9 compilation 2019-11-12 18:49:34 +03:00
Chip Kerchner
2112aa31e6 Merge pull request #15828 from ChipKerchner:momentsToHal
* Convert moments in tile algorithms to HAL (1.3x faster for VSX).

* Adding NEON code back in for non 64-bit platforms.

* Remove floats from post processing.
2019-11-05 18:52:35 +03:00
Ciprian Alexandru Pitis
d2e02779c4 Merge pull request #15799 from Cpitis:feature/parallelization
Parallelize pyrDown & calcSharrDeriv

* ::pyrDown has been parallelized

* CalcSharrDeriv parallelized

* Fixed whitespace

* Set granularity based on amount of threads enabled

* Granularity changed to cv::getNumThreads, now each thread should receive 1/n sized stripes

* imgproc: move PyrDownInvoker<CastOp>::operator() implementation

* imgproc(pyramid): remove syloopboundary()

* video: SharrDerivInvoker replace 'Mat*' => 'Mat&' fields
2019-10-31 23:38:49 +03:00
Alexander Alekhin
bad4e5c3eb Merge pull request #15692 from alalek:core_tls_handle_thread_termination 2019-10-29 20:40:35 +00:00
Alexander Alekhin
7cf1054d36 Merge pull request #15764 from ChipKerchner:demosaicingToHal 2019-10-25 13:49:46 +00:00
Alexander Alekhin
17e2bf5717 core(tls): implement releasing of TLS on thread termination
- move TLS & instrumentation code out of core/utility.hpp
- (*) TLSData lost .gather() method (to dispose thread data on thread termination)
- use TLSDataAccumulator for reliable collecting of thread data
- prefer using of .detachData() + .cleanupDetachedData() instead of .gather() method

(*) API is broken: replace TLSData => TLSDataAccumulator if gather required
(objects disposal on threads termination is not available in accumulator mode)
2019-10-24 06:36:18 +00:00
ChipKerchner
c46f119e0e Convert demosaic functions to HAL 2019-10-23 10:47:07 -05:00
Steve Nicholson
acb3b3bd4d Add documentation and example program for intersectConvexConvex 2019-10-19 22:08:07 -07:00
jasjuang
4c7db02925 document CC_STAT_MAX in ConnectedComponentsTypes 2019-10-16 17:22:25 -07:00
Everton Constantino
9ca9249992 Merge pull request #15527 from everton1984:faster_acc
* Adding support for vectorized masking for uchar/ushort.

* Fixing bug where mask was zeroing the dst. Improved the way to calculate
the mask and tweaked for further performance improvements.

* Fixing mask comparison test.

* Restricting to one channel.

* Adding support for 3 channels, switch old approach to start using HAL's
v_select.
2019-10-11 18:32:59 +03:00
Alexander Alekhin
4748aca61f
Merge pull request #15642 from alalek:issue_15597 2019-10-08 00:33:20 +03:00
Alexander Alekhin
a007220c52 imgproc: update histogram test 2019-10-07 15:06:43 +03:00
Alexander Alekhin
f301f17b61 imgproc: accurate histogram value thresholding 2019-10-04 19:56:25 +03:00
Alexander Alekhin
c69245da1f imgproc: fix fitLine() implementation
- update optimal solutions on each iteration
2019-10-03 21:23:52 +00:00
Alexander Alekhin
f81e401cd0 imgproc: fix indexing issue in pyramids
UBSAN violation expression: 'tab = tabR - x;'
2019-09-26 18:09:47 +03:00
Vitaly Tuzov
1c17b3281a Fixed OOB reading in pyrDown 2019-09-25 13:24:17 +03:00
Vitaly Tuzov
7b3a752012 Fixed universal intrinsic undistort() implementation 2019-09-16 17:16:38 +03:00
Alexander Alekhin
e7b6753a10 imgproc: avoid manual memory allocation in connectedcomponents.cpp 2019-09-05 16:20:08 +03:00
Everton Constantino
76e403cf25 Merge pull request #15440 from everton1984:new_integral_tests
* Adding all possible data type interactions to the perf tests since some
use SIMD acceleration and others do not.

* Disabling full tests by default.

* Giving proper names, removing magic numbers and sanity checks of new
performance tests for the integral function.

* Giving proper names, making array static.
2019-09-04 19:14:00 +03:00