* resize: HResizeLinear reduce duplicate work
There appears to be a 2x unroll of the HResizeLinear against k,
however the k value is only incremented by 1 during the unroll. This
results in k - 1 duplicate passes when k > 1.
Likewise, the final pass may not respect the work done by the vector
loop. Start it with the offset returned by the vector op if
implemented. Note, no vector ops are implemented today.
The performance is most noticable on a linear downscale. A set of
performance tests are added to characterize this. The performance
improvement is 10-50% depending on the scaling.
* imgproc: vectorize HResizeLinear
Performance is mostly gated by the gather operations
for x inputs.
Likewise, provide a 2x unroll against k, this reduces the
number of alpha gathers by 1/2 for larger k.
While not a 4x improvement, it still performs substantially
better under P9 for a 1.4x improvement. P8 baseline is
1.05-1.10x due to reduced VSX instruction set.
For float types, this results in a more modest
1.2x improvement.
* Update U8 processing for non-bitexact linear resize
* core: hal: vsx: improve v_load_expand_q
With a little help, we can do this quickly without gprs on
all VSX enabled targets.
* resize: Fix cn == 3 step per feedback
Per feedback, ensure we don't overrun. This was caught via the
failure observed in Test_TensorFlow.inception_accuracy.
* Adding all possible data type interactions to the perf tests since some
use SIMD acceleration and others do not.
* Disabling full tests by default.
* Giving proper names, removing magic numbers and sanity checks of new
performance tests for the integral function.
* Giving proper names, making array static.
Due to size limit of shared memory, histogram is built on
the global memory for CV_16UC1 case.
The amount of memory needed for building histogram is:
65536 * 4byte = 256KB
and shared memory limit is 48KB typically.
Added test cases for CV_16UC1 and various clip limits.
Added perf tests for CV_16UC1 on both CPU and CUDA code.
There was also a bug in CV_8UC1 case when redistributing
"residual" clipped pixels. Adding the test case where clip
limit is 5.0 exposes this bug.
* added performance test for compareHist
* compareHist reworked to use wide universal intrinsics
* Disabled vectorization for CV_COMP_CORREL and CV_COMP_BHATTACHARYYA if f64 is unsupported
- improve cpu dispatching calls to allow more SIMD extentions
(SSE4.1, AVX2, VSX)
- wide universal intrinsics
- replace dummy v_expand with v_expand_low
- replace v_expand + v_mul_wrap with v_mul_expand for product accumulate operations
- use FMA for accumulate operations
- add mask and more types to accumulate's performance tests
* Add HPX backend for OpenCV implementation
Adds hpx backend for cv::parallel_for_() calls respecting the nstripes chunking parameter. C++ code for the backend is added to modules/core/parallel.cpp. Also, the necessary changes to cmake files are introduced.
Backend can operate in 2 versions (selectable by cmake build option WITH_HPX_STARTSTOP): hpx (runtime always on) and hpx_startstop (start and stop the backend for each cv::parallel_for_() call)
* WIP: Conditionally include hpx_main.hpp to tests in core module
Header hpx_main.hpp is included to both core/perf/perf_main.cpp and core/test/test_main.cpp.
The changes to cmake files for linking hpx library to above mentioned test executalbles are proposed but have issues.
* Add coditional iclusion of hpx_main.hpp to cpp cpu modules
* Remove start/stop version of hpx backend
* Added accumulator value to the output of HoughLines and HoughCircles
* imgproc: refactor Hough patch
- eliminate code duplication
- fix type handling, fix OpenCL code
- fix test data generation
- re-generated test data in debug mode via plain CPU code path
* Rewrite polar transformations
- A new wrapPolar function encapsulate both linear and semi-log remap
- Destination size is a parameter or calculated automatically to keep objects size between remapping
- linearPolar and logPolar has been deprecated
* Fix build warning and error in accuracy test
* Fix function name to warpPolar
* Explicitly specify the mapping mode, so we retain all the parameters as non-optional.
Introduces WarpPolarMode enum to specify the mapping mode in flags
* resolves performance warning on windows build
* removed duplicated logPolar and linearPolar implementations
* use universal intrinsic instead of raw intrinsic
* add 2 channels de-interleave on x86 platform
* add v_int32x4 version of v_muladd
* add accumulate version of v_dotprod based on the commit from seiko2plus on bf1852d
* remove some verify check in performance test
* avoid the out of boundary access and keep the performance
- removed tr1 usage (dropped in C++17)
- moved includes of vector/map/iostream/limits into ts.hpp
- require opencv_test + anonymous namespace (added compile check)
- fixed norm() usage (must be from cvtest::norm for checks) and other conflict functions
- added missing license headers
Hough many circles (#10232)
* Add Hui's optimization. Merge with latest changes in OpenCV.
* Use conditional compilation instead of a runtime flag.
* Whitespace.
* Create the sequence for the nonzero edge pixels only if using that approach.
* Improve performance for finding very large numbers of circles
* Return the circles with the larger accumulator values first, as per API documentation.
Use a separate step to check distance between circles. Allows circles to be sorted by strength first. Avoids locking in EstimateRadius which was slowing it down.
Return centers only if maxRadius == 0 as per API documentation.
* Sort the circles so results are deterministic. Otherwise the order of circles with the same strength depends on parallel processing completion order.
* Add test for HoughCircles.
* Add beads test.
* Wrap the non-zero points structure in a common interface so the code can use either a vector or a matrix.
* Remove the special case for skipping the radius search if maxRadius==0.
* Add performance tests.
* Use NULL instead of nullptr.
OpenCV should compile with C++98 compiler.
* Put test suite name first.
Use different test suite names for each test to avoid an error from the test runner.
* Address build bot errors and warnings.
* Skip radius search if maxRadius < 0.
* Dynamically switch to NZPointList when it will be faster than NZPointSet.
* Fix compile error: missing 'typename' prior to dependent type name.
* Fix compile error: missing 'typename' prior to dependent type name.
This time fix it the non C++ 11 way.
* Fix compile error: no type named 'const_reference' in 'class cv::NZPointList'
* Disable ManySmallCircles tests. Failing on Mac.
* Change beads image to JPEG for smaller file size.
Try enabling the ManySmallCircles tests again.
* Remove ManySmallCircles tests. They are failing on the Mac build.
* Fix expectations to check all circles.
* Changing case on a case-insensitive file system
Step 1: remove the old file names
* Changing case on a case-insensitive file system
Step 2: add them back with the new names
* Fix cmpAccum function to be strictly weak ordered.
* Add tests for many small circles.
* imgproc(perf): fix HoughCircles tests
* imgproc(houghCircles): refactor code
- simplify NZPointList
- drop broken (de-synchronization of 'current'/'mi' fields) NZPointSet iterator
- NZPointSet iterator is replaced to direct area scan
- use SIMD intrinsics
- avoid std exceptions (build for embedded systems)