Commit Graph

284 Commits

Author SHA1 Message Date
Alexander Alekhin
2fed41dfa5 imgproc(test): test bitExact cases in OCL/sepFilter2D 2020-07-08 10:14:56 +00:00
Alexander Alekhin
a1b09a3734 imgproc(perf): add GaussianBlur cases for SIFT 2020-05-17 10:15:31 +00:00
Alexander Alekhin
fd09413566
Merge pull request #16731 from alalek:issue_16708
* imgproc(integral): avoid OOB access

* imgproc(test): fix integral perf check

- FP32 computation is not accurate

* imgproc(integral): tune loop limits
2020-03-04 19:28:04 +00:00
Paul Murphy
a011035ed6 Merge pull request #15257 from pmur:resize
* resize: HResizeLinear reduce duplicate work

There appears to be a 2x unroll of the HResizeLinear against k,
however the k value is only incremented by 1 during the unroll. This
results in k - 1 duplicate passes when k > 1.

Likewise, the final pass may not respect the work done by the vector
loop. Start it with the offset returned by the vector op if
implemented. Note, no vector ops are implemented today.

The performance is most noticable on a linear downscale. A set of
performance tests are added to characterize this.  The performance
improvement is 10-50% depending on the scaling.

* imgproc: vectorize HResizeLinear

Performance is mostly gated by the gather operations
for x inputs.

Likewise, provide a 2x unroll against k, this reduces the
number of alpha gathers by 1/2 for larger k.

While not a 4x improvement, it still performs substantially
better under P9 for a 1.4x improvement. P8 baseline is
1.05-1.10x due to reduced VSX instruction set.

For float types, this results in a more modest
1.2x improvement.

* Update U8 processing for non-bitexact linear resize

* core: hal: vsx: improve v_load_expand_q

With a little help, we can do this quickly without gprs on
all VSX enabled targets.

* resize: Fix cn == 3 step per feedback

Per feedback, ensure we don't overrun. This was caught via the
failure observed in Test_TensorFlow.inception_accuracy.
2019-12-09 14:54:06 +03:00
Everton Constantino
76e403cf25 Merge pull request #15440 from everton1984:new_integral_tests
* Adding all possible data type interactions to the perf tests since some
use SIMD acceleration and others do not.

* Disabling full tests by default.

* Giving proper names, removing magic numbers and sanity checks of new
performance tests for the integral function.

* Giving proper names, making array static.
2019-09-04 19:14:00 +03:00
luz.paz
fcc7d8dd4e Fix modules/ typos
Found using `codespell -q 3 -S ./3rdparty -L activ,amin,ang,atleast,childs,dof,endwhile,halfs,hist,iff,nd,od,uint`

backporting of commit: ec43292e1e
2019-08-16 17:34:29 +03:00
Alexander Alekhin
32772a5436 3.4: backported changes from 'master' branch 2019-08-14 16:36:08 +03:00
dcouwenh
d3cf0d2c06 Bayer VNG Demosaicing Fix #2 (Merge pull request #15086)
* Update demosaicing.cpp

Fixed calculation of Bs for non-green pixels.

* Fixed cvtColor perf test for bayer VNG
2019-07-30 23:49:46 +03:00
Vitaly Tuzov
e0f8bb83a6 Merge pull request #14994 from terfendail:wintr_undistort
WUI based implementation to initUndistortRectifyMap (#14994)

* Add initUndistortRectifyMap performance test

* Move cv namespace boundaries

* Add wide universal intrinsics based implementation to initUndistortRectifyMap

* Dispatch undistort
2019-07-18 19:32:51 +03:00
Alexander Alekhin
d5a2fe5180 perf: ignore _ovx tests 2019-03-06 15:52:23 +03:00
Brad Kelly
507f8add1c Implementing AVX512 Support for 2 and 4 channel mats for CV_64F format 2019-02-19 11:31:20 -08:00
Alexander Alekhin
757d8ac8f7 Merge pull request #13769 from savuor:cvtColor_tests_16u_32f 2019-02-08 15:29:35 +00:00
Rostislav Vasilikhin
4e679e1cc5 disabled 16u and 32f perf tests 2019-02-07 19:26:36 +03:00
Rostislav Vasilikhin
87f651c119 disabled sanity check for 32f 2019-02-07 18:20:29 +03:00
Namgoo Lee
fb8e652c3f Add CV_16UC1 support for cuda::CLAHE
Due to size limit of shared memory, histogram is built on
the global memory for CV_16UC1 case.

The amount of memory needed for building histogram is:

    65536 * 4byte = 256KB

and shared memory limit is 48KB typically.

Added test cases for CV_16UC1 and various clip limits.
Added perf tests for CV_16UC1 on both CPU and CUDA code.

There was also a bug in CV_8UC1 case when redistributing
"residual" clipped pixels. Adding the test case where clip
limit is 5.0 exposes this bug.
2019-02-06 17:21:55 +00:00
Rostislav Vasilikhin
bbedebb57c perf tests for cvtColor for 16U and 32f added 2019-02-06 17:56:44 +03:00
Vitaly Tuzov
ed2e1af3e8 Added performance test for blendLinear 2019-01-25 14:16:19 +03:00
Vitaly Tuzov
78f80c35d2 Performance test for bounding rect estimation 2019-01-18 15:50:21 +03:00
Brad Kelly
0165ffa90d Implementing AVX512 support for 3 channel cv::integral for CV_64F 2019-01-14 16:11:01 -08:00
Vitaly Tuzov
3903174f7c Merge pull request #13334 from terfendail:histogram_wintr
* added performance test for compareHist

* compareHist reworked to use wide universal intrinsics

* Disabled vectorization for CV_COMP_CORREL and CV_COMP_BHATTACHARYYA if f64 is unsupported
2018-12-13 14:20:22 +03:00
Vitaly Tuzov
e991e05b9b Added anonymous namespace to perf_contours 2018-11-27 11:35:40 +03:00
Vitaly Tuzov
e9e8bf4b81 Added performance tests for findContours 2018-11-21 19:57:02 +03:00
Sayed Adel
8965f3ae06 imgproc:simd Enable VSX and wide universal intrinsics for accumulate operations
- improve cpu dispatching calls to allow more SIMD extentions
    (SSE4.1, AVX2, VSX)
  - wide universal intrinsics
  - replace dummy v_expand with v_expand_low
  - replace v_expand + v_mul_wrap with v_mul_expand for product accumulate operations
  - use FMA for accumulate operations
  - add mask and more types to accumulate's performance tests
2018-10-11 04:37:12 +02:00
Alexander Alekhin
2170811e48 imgproc(perf): update getPerspectiveTransform perf test
Function is very fast, so 0.000 ms results are useless.
1000 runs requires 25ms on i7-6700K.
2018-07-13 15:31:33 +03:00
yuki takehara
ed207d79e7 Merge pull request #11108 from take1014:hough_4303
* Added accumulator value to the output of HoughLines and HoughCircles

* imgproc: refactor Hough patch

- eliminate code duplication
- fix type handling, fix OpenCL code
- fix test data generation
- re-generated test data in debug mode via plain CPU code path
2018-05-23 20:42:12 +00:00
Vadim Pisarevsky
b8b7ca7302
Rewite polar transforms (#11323)
* Rewrite polar transformations

- A new wrapPolar function encapsulate both linear and semi-log remap
- Destination size is a parameter or calculated automatically to keep objects size between remapping
- linearPolar and logPolar has been deprecated

* Fix build warning and error in accuracy test

* Fix function name to warpPolar

* Explicitly specify the mapping mode, so we retain all the parameters as non-optional.

Introduces WarpPolarMode enum to specify the mapping mode in flags

* resolves performance warning on windows build

* removed duplicated logPolar and linearPolar implementations
2018-04-17 15:50:52 +03:00
Tomoaki Teshima
a82e70cd40 remove raw SSE2/NEON implementation from imgwarp.cpp
* use universal intrinsic instead of raw intrinsic
  * add 2 channels de-interleave on x86 platform
  * add v_int32x4 version of v_muladd
  * add accumulate version of v_dotprod based on the commit from seiko2plus on bf1852d
  * remove some verify check in performance test
  * avoid the out of boundary access and keep the performance
2018-04-13 21:19:16 +09:00
Alexander Alekhin
4a297a2443 ts: refactor OpenCV tests
- removed tr1 usage (dropped in C++17)
- moved includes of vector/map/iostream/limits into ts.hpp
- require opencv_test + anonymous namespace (added compile check)
- fixed norm() usage (must be from cvtest::norm for checks) and other conflict functions
- added missing license headers
2018-02-03 19:39:47 +00:00
Tom Becker
592f8d8c1b Merge pull request #10232 from TomBecker-BD:hough-many-circles
Hough many circles (#10232)

* Add Hui's optimization. Merge with latest changes in OpenCV.

* Use conditional compilation instead of a runtime flag.

* Whitespace.

* Create the sequence for the nonzero edge pixels only if using that approach.

* Improve performance for finding very large numbers of circles

* Return the circles with the larger accumulator values first, as per API documentation.
Use a separate step to check distance between circles. Allows circles to be sorted by strength first. Avoids locking in EstimateRadius which was slowing it down.
Return centers only if maxRadius == 0 as per API documentation.

* Sort the circles so results are deterministic. Otherwise the order of circles with the same strength depends on parallel processing completion order.

* Add test for HoughCircles.

* Add beads test.

* Wrap the non-zero points structure in a common interface so the code can use either a vector or a matrix.

* Remove the special case for skipping the radius search if maxRadius==0.

* Add performance tests.

* Use NULL instead of nullptr.
OpenCV should compile with C++98 compiler.

* Put test suite name first.
Use different test suite names for each test to avoid an error from the test runner.

* Address build bot errors and warnings.

* Skip radius search if maxRadius < 0.

* Dynamically switch to NZPointList when it will be faster than NZPointSet.

* Fix compile error: missing 'typename' prior to dependent type name.

* Fix compile error: missing 'typename' prior to dependent type name.
This time fix it the non C++ 11 way.

* Fix compile error: no type named 'const_reference' in 'class cv::NZPointList'

* Disable ManySmallCircles tests. Failing on Mac.

* Change beads image to JPEG for smaller file size.
Try enabling the ManySmallCircles tests again.

* Remove ManySmallCircles tests. They are failing on the Mac build.

* Fix expectations to check all circles.

* Changing case on a case-insensitive file system
Step 1: remove the old file names

* Changing case on a case-insensitive file system
Step 2: add them back with the new names

* Fix cmpAccum function to be strictly weak ordered.

* Add tests for many small circles.

* imgproc(perf): fix HoughCircles tests

* imgproc(houghCircles): refactor code

- simplify NZPointList
- drop broken (de-synchronization of 'current'/'mi' fields) NZPointSet iterator
- NZPointSet iterator is replaced to direct area scan
- use SIMD intrinsics
- avoid std exceptions (build for embedded systems)
2017-12-28 17:23:11 +03:00
Vadim Pisarevsky
69a6765bf7 Merge pull request #10387 from terfendail:resize23_perftest 2017-12-22 13:26:05 +00:00
Vitaly Tuzov
b6fe4cc807 Added performance tests for linear resize of 2 and 3-channel images 2017-12-20 18:11:21 +03:00
Alexander Alekhin
d5f152494b fix file names 2017-12-15 14:59:35 +03:00
Vitaly Tuzov
51cb56ef2c Implementation of bit-exact resize. Internal calls to linear resize updated to use bit-exact version. (#9468) 2017-12-13 15:00:38 +03:00
elenagvo
a25c443d1f add perf test for boxFilter CV8U to CV16U 2017-12-01 14:38:00 +03:00
elenagvo
11ddb9332c add HAL for adaptiveThreshold 2017-11-27 15:35:02 +03:00
Alexander Alekhin
a08044d6f2 Merge pull request #8833 from terfendail:resizenn_perftest 2017-09-25 16:42:07 +00:00
vipinanand4
39e742765a Merge pull request #9618 from vipinanand4:goodFeaturesToTrack_added_gradiantSize
Added gradiantSize param into goodFeaturesToTrack API (#9618)

* Added gradiantSize param into goodFeaturesToTrack API

Removed hardcode value 3 in goodFeaturesToTrack API, and
added new param 'gradinatSize' in this API so that user can
pass any gradiant size as 3, 5 or 7.

Signed-off-by: Vipin Anand <anand.vipin@gmail.com>
Signed-off-by: Nilaykumar Patel<nilay.nilpat@gmail.com>
Signed-off-by: Prashanth Voora <prashanthx85@gmail.com>

* fixed compilation error for java test

Signed-off-by: Vipin Anand <anand.vipin@gmail.com>

* Modifying code for previous binary compatibility and fixing other warnings

fixed ABI break issue

resolved merged conflict

compilation error fix

Signed-off-by: Vipin Anand <anand.vipin@gmail.com>
Signed-off-by: Patel, Nilaykumar K <nilay.nilpat@gmail.com>
2017-09-22 14:04:43 +00:00
Vitaly Tuzov
22fcbaed64 Added performance test for nearest neighbor resize 2017-09-22 12:34:58 +03:00
Alexander Alekhin
a4a47b538c build: detect Android via '__ANDROID__' macro
https://sourceforge.net/p/predef/wiki/OperatingSystems
2017-07-10 12:43:59 +03:00
Alexander Alekhin
006966e629 trace: initial support for code trace 2017-06-26 17:07:13 +03:00
Vitaly Tuzov
3a5e036feb Updated fix for accumulate performance test in case of multiple iterations 2017-06-13 19:23:34 +03:00
Vitaly Tuzov
01f773b803 Fix for accumulate performance test in case of multiple iterations 2017-05-16 18:28:13 +03:00
Vitaly Tuzov
2492c299f3 Extended set of existing performance test to OpenVX HAL suitable execution modes 2017-04-27 12:32:29 +03:00
Tomoaki Teshima
aec59aba34 suppress warnings
- brush up the implementation
2017-02-23 09:11:12 +09:00
Tomoaki Teshima
64cf206fb5 optimize blend using universal intrinsic
- add more channels/depth performance test for blend
2017-02-20 19:09:26 +09:00
Tomoaki Teshima
37be9ddeec add enum Bayer**2BGRA
- let it possible to reach Bayer2BGRA conversion
2017-02-11 00:20:57 +09:00
Alexander Alekhin
7a9ed39655 test: update HoughLines perf test 2016-10-20 16:54:35 +03:00
Tomoaki Teshima
ea6410d1e7 use universal intrinsic in threshold
* add performance test for 32F and 64F threshold
  * requires update of opencv_extra
2016-10-11 18:00:41 +09:00
Vadim Pisarevsky
a9ab869800 seriously improved performance of blur function, especially 3x3 and 5x5 cases (#7262)
* seriously improved performance of blur function, especially 3x3 and 5x5 cases

* trying to fix warnings and test failures

* replaced #if 0 with #if IPP_DISABLE_BLOCK
2016-09-09 23:31:02 +04:00
Pavel Vlasov
680ca88ce0 Outdated ICV restrictions were removed; 2016-08-19 15:08:39 +03:00