Lab, Luv and XYZ conversions rewritten to wide intrinsics (#14106)
* rgb2xyz<float> re-vectorized
* rgb2xyz_i vectorized for ushort and uchar
* xyz2rgb<float> vectorized
* xyz2rgb_i vectorized for both uchar and ushort
* intermediate conversions (int->float) rewritten
* packed rgb2luv rewritten
* (some) float conversions rewritten
* burnt volatile int _3 and similar
* RGB2Lab_b rewritten
* tests: logging made better
* RGB2Lab_f (LRGB path) rewritten
* Lab2RGBfloat rewritten
* Lab2RGBinteger and Lab2RGB_b rewritten to wide universal intrinsics
* Luv2RGBinteger wide vectorized
* RGB2Lab_b fixed: v_sub_wrap instead of saturated sub
* warnings fixed
* trying to fix compilation on older compilers
* using 16x8 registers for 8-element dot product
* cleanup added
* splineInterpolate: loop unrolled, perf fix for f32x4
* Lab2RGBfloat: grab 2x more data to process on f32x4
* nrepeats for Luv2RGBfloat, +20% perf
* minor
* nrepeats to RGB2Lab_f
* Lab2RGBinteger: no tab for linear BGR
* nrepeats for RGB2Luvfloat
* Luv2RGBinteger: no tab for linear RGB
* +10% more to perf of Luv2RGBfloat
* nrepeats for 256-simd for Lab2RGBfloat
* fewer warnings
* BOM removed
* CV_SIMD_WIDTH used for lanes number checking
* trilinearPackedInterpolate: 128-bit specialization added
* fix build; no vx_cleanup(), instrumentation instead
Lab/XYZ modes have been postponed (color_lab.cpp):
- need to split code for tables initialization and for pixels processing first
- no significant performance improvements for switching between SSE42 / AVX2 code generation
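The `v_sub_wrap` fix for RGB2Lab_b listed above comes down to the difference between modular and saturating subtraction on 8-bit lanes. A minimal scalar sketch of that difference for a single lane follows; the helper names `sub_wrap_u8` and `sub_saturate_u8` are hypothetical and are not OpenCV functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-ins for one 8-bit lane; not OpenCV API.

// Wrapping (modular) subtraction: the result is reduced modulo 256.
inline std::uint8_t sub_wrap_u8(std::uint8_t a, std::uint8_t b)
{
    return static_cast<std::uint8_t>(a - b);
}

// Saturating subtraction: results below zero are clamped to 0.
inline std::uint8_t sub_saturate_u8(std::uint8_t a, std::uint8_t b)
{
    return a > b ? static_cast<std::uint8_t>(a - b) : std::uint8_t(0);
}
```

When an intermediate value is meant to be reinterpreted as signed later on, clamping to 0 loses the information that wrapping keeps.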
Resize reworked using wide universal intrinsics (#13781)
* Added wide universal intrinsics optimized implementation for 3 channel bit-exact linear resize
* Reworked linear resize using new wide LUT intrinsics
* Fix for VSX intrinsics
* LineIterator without a Mat
cv::LineIterator can be used without being attached to any cv::Mat, it only needs the size and type of data. An alternative constructor has been defined for that.
In that case, a LineIterator can no longer be dereferenced with the * operator, but pos() still returns valid pixel positions.
This is useful when LineIterator is only used to compute the positions of pixels on a line, without having to build a Mat just for that.
Use case: with a dataset that represents a huge image, pixel positions can be pre-computed before querying the dataset API.
* Update imgproc.hpp
removed trailing spaces
* Update drawing.cpp
fixed warning
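The position-only use of LineIterator described above can be illustrated without any image buffer. The following is a plain Bresenham sketch, not the OpenCV implementation: `Pt` and `line_positions` are hypothetical names, and clipping against the image size is left out, so both endpoints are assumed to be valid positions.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

struct Pt { int x, y; };  // hypothetical point type for this sketch

// Returns the 8-connected pixel positions from p0 to p1 (inclusive).
// No pixel data is touched; only coordinates are produced.
inline std::vector<Pt> line_positions(Pt p0, Pt p1)
{
    const int dx = std::abs(p1.x - p0.x), sx = p0.x < p1.x ? 1 : -1;
    const int dy = -std::abs(p1.y - p0.y), sy = p0.y < p1.y ? 1 : -1;
    int err = dx + dy;
    std::vector<Pt> out;
    for (;;)
    {
        out.push_back(p0);
        if (p0.x == p1.x && p0.y == p1.y) break;
        const int e2 = 2 * err;
        if (e2 >= dy) { err += dy; p0.x += sx; }  // step along x
        if (e2 <= dx) { err += dx; p0.y += sy; }  // step along y
    }
    return out;
}
```

The returned positions could then be used to query an external dataset tile by tile.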
PyrDown: Fix bug #12961 (#13672)
* Force unaligned pointer and create test
* More cross-platform solution
* MSVC expects a proper order
* Remove useless clang macro
* added performance test for compareHist
* compareHist reworked to use wide universal intrinsics
* Disabled vectorization for CV_COMP_CORREL and CV_COMP_BHATTACHARYYA if f64 is unsupported
* significantly reduced OpenCV binary size by disabling IPP calls in some OpenCV functions: Sobel, Scharr, medianBlur, GaussianBlur, filter2D, mean, meanStdDev, norm, sum, minMaxIdx, sort.
* re-enable IPP in norm, since it's much faster (without adding too much space overhead)
* removed C API in the following modules: photo, video, imgcodecs, videoio
* trying to fix various compile errors and warnings on Windows and Linux
* continue to fix compile errors and warnings
* continue to fix compile errors, warnings, as well as the test failures
* trying to resolve compile warnings on Android
* Update cap_dc1394_v2.cpp
fix warning from the new GCC
* Updated boxFilter implementations to use wide universal intrinsics
* boxFilter implementation moved to separate file
* Replaced ROUNDUP macro with roundUp() function
* integrated the new C++ persistence; removed old persistence; most of OpenCV compiles fine! the tests have not been run yet
* fixed multiple bugs in the new C++ persistence
* fixed raw size of the parsed empty sequences
* [temporarily] excluded obsolete applications traincascade and createsamples from build
* fixed several compiler warnings and multiple test failures
* undo changes in cocoa window rendering (that was fixed in another PR)
* fixed more compile warnings and the remaining test failures (hopefully)
* trying to fix the last little warning
* RGB2RGB initially rewritten
* NEON impl removed
* templated version added for ushort, float
* data copying allowed for RGB2RGB
* inplace processing fixed
* fields to local vars
* no zeroupper until it's fixed
* vx_cleanup() added back
* rewrote the line segment intersection function to make the static analyzer happy
* fixed bug with improper "no intersection" detection in some of corner cases
Catching exceptions by value incurs a needless copy in C++. Most of them can
be caught by const reference, especially as almost none are actually
used. This may allow the compiler to generate slightly more efficient code.
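A minimal sketch of the pattern the paragraph above recommends; the function names are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Catch by const reference: no copy is made and the dynamic type
// (here std::out_of_range) is preserved, so what() returns the
// message of the derived exception.
inline std::string describe_failure()
{
    try
    {
        throw std::out_of_range("index 7 is out of range");
    }
    catch (const std::exception& e)
    {
        return e.what();
    }
}

// When the exception object is not used, the name can be omitted.
inline bool operation_failed()
{
    try
    {
        throw std::runtime_error("unused message");
    }
    catch (const std::exception&)
    {
        return true;
    }
}
```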
- improve cpu dispatching calls to allow more SIMD extensions
(SSE4.1, AVX2, VSX)
- wide universal intrinsics
- replace dummy v_expand with v_expand_low
- replace v_expand + v_mul_wrap with v_mul_expand for product accumulate operations
- use FMA for accumulate operations
- add mask and more types to accumulate's performance tests
* bgr2gray 8u fixed to be in conformance with IPP code
* coefficients fixed so their sum is 32768
* java test for CascadeDetect fixed: equalizeHist added
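Why the coefficient sum matters in the bgr2gray fix above can be shown with a small fixed-point sketch. The weights below are an assumption for illustration (15-bit roundings of 0.114 / 0.587 / 0.299, with the blue weight rounded down so the total is exactly 32768); they are not quoted from the patch, and `bgr_to_gray` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Assumed 15-bit fixed-point weights; chosen so that they sum to 1 << 15.
constexpr int kShift = 15;
constexpr int kB = 3735;
constexpr int kG = 19235;
constexpr int kR = 9798;
static_assert(kB + kG + kR == (1 << kShift), "weights must sum to 32768");

// Weighted sum with rounding, then shift back to 8 bits.
inline std::uint8_t bgr_to_gray(std::uint8_t b, std::uint8_t g, std::uint8_t r)
{
    const int acc = b * kB + g * kG + r * kR + (1 << (kShift - 1));
    return static_cast<std::uint8_t>(acc >> kShift);
}
```

Because the weights sum to exactly 1.0 in fixed point, a neutral gray input maps to the same gray level, and white stays at 255.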
fix some errors found by static analyzer. (#12391)
* fix possible division by zero and by negative values
* only 4 elements are used in these arrays
* fix uninitialized member
* use boolean type for semantic boolean variables
* avoid invalid array index
* to avoid exception and because base64_beg is only used in this block
* use std::atomic<bool> to avoid thread control race condition
* fix 12218
* Update test_distancetransform.cpp
marked the test as "BIGDATA_TEST" in order to skip it on low-mem platforms
* modify test
* use a smaller image in the test
* fix test code
* Bit-exact resize reworked to use wide intrinsics
* Reworked bit-exact resize row data loading
* Added bit-exact resize row data loaders for SIMD256 and SIMD512
* Fixed type punned pointer dereferencing warning
* Reworked loading of source data for SIMD256 and SIMD512 bit-exact resize
* add universal intrinsics for HSV2RGB_b
* rewritten HSV2RGB_b without using extra universal intrinsics
* removed unused variable
* undo changes in v_load_deinterleave
fixes handling of empty matrices in some functions (#11634)
* a part of PR #11416 by Yuki Takehara
* moved the empty mat check in Mat::copyTo()
* fixed some test failures
* Added accumulator value to the output of HoughLines and HoughCircles
* imgproc: refactor Hough patch
- eliminate code duplication
- fix type handling, fix OpenCL code
- fix test data generation
- re-generated test data in debug mode via plain CPU code path
Arm: fix the test failure of OCL_Imgproc/CLAHETest.Accuracy on ODROID-XU4 (#11409)
* fix the test failure of OCL_Imgproc/CLAHETest.Accuracy on ODROID-XU4
* avoid the race condition in the reduce
* imgproc(ocl): simplify CLAHE code
* remove unused class
* model is not learned when grabcut is called with GC_EVAL
* fixed test, was writing to wrong file.
* modified patch by Iwan Paolucci; added GC_EVAL_FREEZE_MODEL in addition to GC_EVAL (whose semantics are retained)
* Rewrite polar transformations
- A new wrapPolar function encapsulates both linear and semi-log remapping
- Destination size is either passed as a parameter or calculated automatically to preserve object sizes across the remapping
- linearPolar and logPolar have been deprecated
* Fix build warning and error in accuracy test
* Fix function name to warpPolar
* Explicitly specify the mapping mode, so we retain all the parameters as non-optional.
Introduces WarpPolarMode enum to specify the mapping mode in flags
* resolves performance warning on windows build
* removed duplicated logPolar and linearPolar implementations
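The linear and semi-log polar mappings mentioned in the polar transformation rewrite above differ only in how the radius is scaled. The sketch below maps a source point to destination (rho, phi) coordinates under an assumed choice of scale factors (full circle over the destination height, `max_radius` over the destination width); `to_polar` and `PolarCoord` are hypothetical names and this is not the OpenCV code.

```cpp
#include <cassert>
#include <cmath>

struct PolarCoord { double rho, phi; };  // destination column / row

// Maps source point (x, y) around center (cx, cy) to polar destination
// coordinates. In semi-log mode the magnitude must be > 0.
inline PolarCoord to_polar(double x, double y, double cx, double cy,
                           double max_radius, int dst_w, int dst_h,
                           bool semi_log)
{
    const double kTwoPi = 6.283185307179586;
    const double dx = x - cx, dy = y - cy;
    const double mag = std::sqrt(dx * dx + dy * dy);
    double angle = std::atan2(dy, dx);
    if (angle < 0) angle += kTwoPi;  // keep the angle in [0, 2*pi)

    const double k_angle = dst_h / kTwoPi;
    const double rho = semi_log
        ? dst_w / std::log(max_radius) * std::log(mag)  // semi-log radius
        : dst_w / max_radius * mag;                     // linear radius
    return { rho, k_angle * angle };
}
```

In linear mode a point halfway to `max_radius` lands in the middle column; in semi-log mode the middle column corresponds to the square root of `max_radius`.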
* use universal intrinsic instead of raw intrinsic
* add 2 channels de-interleave on x86 platform
* add v_int32x4 version of v_muladd
* add accumulate version of v_dotprod based on the commit from seiko2plus on bf1852d
* remove some verify check in performance test
* avoid the out of boundary access and keep the performance
* Added custom implementation for NxN bit-exact GaussianBlur
* Reworked fixedpoint interface a bit
* Reworked horizontal line estimation for bit-exact GaussianBlur
* Reworked vertical line estimation for bit-exact GaussianBlur
* Updated range estimation for vectorized part of bit-exact GaussianBlur evaluation
* Fix #9363
* Renamed the structure and added a new function to the LineSegmentDetectorImpl class as a static member
* Added a new function to the LineSegmentDetectorImpl class as a static member
color.cpp split (#10869)
* initial split is done
* files renamed (these names are excluded during compilation)
* IPP code moved to corresponding files
* splineBuild, splineInterpolate -> color_lab.cpp
* Lab, Luv: little refactored
* it compiles (didn't check work); Lab OCL code moved to color_lab.cpp
* cvtcolor.cl: Lab/Luv part moved to color_lab.cl
* cvtcolor.cl: color_rgb.cl extracted
* cvtcolor.cl: color_yuv.cl separated
* cvtcolor.cl: color_hsv.cl extracted
* cvtcolor.cl: extracted to color_lab.cl and color_rgb.cl
* helper functions moved to hpp file
* Lab, Luv: moved to color_lab.cpp
* CPU XYZ: to color_lab.cpp
* OCL XYZ: to color_lab.cpp
* warning fixed
* CvtHelper added
* CPU YUV: to color_yuv.cpp, helpers to color.hpp
* CPU HLS/HSV: to color_hsv.cpp
* CPU BGR2BGR: to color_rgb.cpp
* CPU RGB: to color_rgb.cpp
* extra arg removed
* CPU YUV: to color_yuv.cpp
* color code decoded
* OclHelper added, some funcs rewritten
* color_lab.cpp: refactored to use OclHelper
* OCL RGB: to color_rgb.cpp
* OCL HLS/HSV: to color_hsv.cpp
* OCL YUV: to color_yuv.cpp
* OCL YUV planes: to color_yuv.cpp
* OCL: color code reduced
* licence to demosaicing.cpp
* IPP func tables to color_rgb.cpp
* code cleanup
* HAVE_OPENCL ifdefs added
* helpers made more common
* fixed two plane YUV with separate mats
* fixed warning in gcc7.2.0
* precomp header fixed
* color space classification functions fixed
* helpers fixed
* rename: isSRGB -> is_sRGB
* Add a new interface for hough transform
* Fixed warning code
* Fix HoughLinesUsingSetOfPoints based on HoughLinesStandard
* Delete memset
* Rename HoughLinesUsingSetOfPoints and add common function
* Fix test error
* Change static function name
* Change using CV_Assert instead of if-block and add integer test case
* Resolved the conflict; replaced 'std::tr1' with 'tuple'
* Removed std::tr1::get and changed it to use 'get'
* Fixed sample code
* revert test_main.cpp
* Delete sample code in comment and add snippets
* Change file name
* Delete static function
* Fixed build error
* Fixing a bug in Canny implementation when Sobel aperture size is 7.
* Fixing the bug in Canny across variants and in test_canny.cpp
* Replacing a tab with white space
* Bit-exact implementation of GaussianBlur smoothing
* Added universal intrinsics based implementation for bit-exact CV_8U GaussianBlur smoothing.
* Added parallel_for to evaluation of bit-exact GaussianBlur
* Added custom implementations for 3x3 and 5x5 bit-exact GaussianBlur
Hough many circles (#10232)
* Add Hui's optimization. Merge with latest changes in OpenCV.
* Use conditional compilation instead of a runtime flag.
* Whitespace.
* Create the sequence for the nonzero edge pixels only if using that approach.
* Improve performance for finding very large numbers of circles
* Return the circles with the larger accumulator values first, as per API documentation.
Use a separate step to check distance between circles. Allows circles to be sorted by strength first. Avoids locking in EstimateRadius which was slowing it down.
Return centers only if maxRadius == 0 as per API documentation.
* Sort the circles so results are deterministic. Otherwise the order of circles with the same strength depends on parallel processing completion order.
* Add test for HoughCircles.
* Add beads test.
* Wrap the non-zero points structure in a common interface so the code can use either a vector or a matrix.
* Remove the special case for skipping the radius search if maxRadius==0.
* Add performance tests.
* Use NULL instead of nullptr.
OpenCV should compile with C++98 compiler.
* Put test suite name first.
Use different test suite names for each test to avoid an error from the test runner.
* Address build bot errors and warnings.
* Skip radius search if maxRadius < 0.
* Dynamically switch to NZPointList when it will be faster than NZPointSet.
* Fix compile error: missing 'typename' prior to dependent type name.
* Fix compile error: missing 'typename' prior to dependent type name.
This time fix it the non C++ 11 way.
* Fix compile error: no type named 'const_reference' in 'class cv::NZPointList'
* Disable ManySmallCircles tests. Failing on Mac.
* Change beads image to JPEG for smaller file size.
Try enabling the ManySmallCircles tests again.
* Remove ManySmallCircles tests. They are failing on the Mac build.
* Fix expectations to check all circles.
* Changing case on a case-insensitive file system
Step 1: remove the old file names
* Changing case on a case-insensitive file system
Step 2: add them back with the new names
* Fix cmpAccum function to be strictly weak ordered.
* Add tests for many small circles.
* imgproc(perf): fix HoughCircles tests
* imgproc(houghCircles): refactor code
- simplify NZPointList
- drop broken (de-synchronization of 'current'/'mi' fields) NZPointSet iterator
- NZPointSet iterator is replaced with a direct area scan
- use SIMD intrinsics
- avoid std exceptions (build for embedded systems)
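Two items in the HoughCircles work above (a strictly weak ordered comparator, and sorting so that results are deterministic) can be sketched together. The `Candidate` layout and the function names are hypothetical; the actual cmpAccum may use different fields and tie-breakers.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Candidate { float x, y, r; int accum; };  // hypothetical layout

// Strict weak ordering: stronger accumulator first; ties are broken by
// coordinates so the order does not depend on which thread finished first.
inline bool stronger_first(const Candidate& a, const Candidate& b)
{
    if (a.accum != b.accum) return a.accum > b.accum;
    if (a.x != b.x) return a.x < b.x;
    if (a.y != b.y) return a.y < b.y;
    return a.r < b.r;
}

inline void sort_candidates(std::vector<Candidate>& v)
{
    std::sort(v.begin(), v.end(), stronger_first);
}
```

A comparator written with `>=` instead of `>` would report an element as ordered before itself, which `std::sort` does not allow.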
* Add test that fails
* Fix integer pointPolygonTest for large coordinate values
* Review fixes:
- change type from long long to int64
- move test code to test_contours.cpp, and make it C++98 compliant
* Hopefully fix compiler error by using push_back instead of emplace_back
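The large-coordinate fix for the integer pointPolygonTest above relies on doing the cross product in 64 bits. A minimal sketch of that idea, with hypothetical names (`PointI`, `orientation`) and the assumption that coordinates stay within roughly ±2^29 so the 64-bit result itself cannot overflow:

```cpp
#include <cassert>
#include <cstdint>

struct PointI { int x, y; };  // hypothetical integer point

// Sign of the cross product (b - a) x (p - a): +1, 0 or -1.
// Operands are widened to 64 bits before subtracting and multiplying;
// with 32-bit intermediates the products overflow for large coordinates.
inline int orientation(PointI a, PointI b, PointI p)
{
    const std::int64_t abx = static_cast<std::int64_t>(b.x) - a.x;
    const std::int64_t aby = static_cast<std::int64_t>(b.y) - a.y;
    const std::int64_t apx = static_cast<std::int64_t>(p.x) - a.x;
    const std::int64_t apy = static_cast<std::int64_t>(p.y) - a.y;
    const std::int64_t cross = abx * apy - aby * apx;
    return (cross > 0) - (cross < 0);
}
```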
* fixed OpenCL functions on Mac, so that the tests pass
* fixed compile warnings; temporarily disabled OCL branch of TV L1 optical flow on mac
* fixed other few warnings on macos
The OpenCL subsystem is not initialized if the application makes no OpenCL/UMat method calls.
The OpenCL subsystem is initialized when:
- haveOpenCL() is called from application
- useOpenCL() is called from application
- access to OpenCL allocator: UMat is created (empty UMat is ignored) or UMat <-> Mat conversions are called
Don't call OpenCL functions if OPENCV_OPENCL_RUNTIME=disabled
(independent from OpenCL linkage type)
* Update OpenCVCompilerOptimizations.cmake
Build fix: NEON is not supported on MSVC ARM, which was breaking the build
* Update OpenCVCompilerOptimizations.cmake
Whitespace
* Update intrin.hpp
Many problems in MSVC ARM builds (at least on VS2017) being fixed in this PR now.
C:\Users\Gregory\DOCUME~1\MYLIBR~1\OPENCV~3\opencv\sources\modules\core\include\opencv2/core/hal/intrin.hpp(444): error C3861: '_tzcnt_u32': identifier not found
* Update hal_replacement.hpp
Passing a variadic expansion from one macro to another does not work properly in MSVC, so a well-known workaround is applied. Discussion: https://stackoverflow.com/questions/5134523/msvc-doesnt-expand-va-args-correctly
Only needed the fix for ARM builds: TEGRA_ macros are used for cv_hal_ functions in the carotene library.
C:\Users\Gregory\Documents\My Libraries\opencv330\opencv\sources\modules\core\src\arithm.cpp(2378): warning C4003: not enough actual parameters for macro 'TEGRA_ADD'
C:\Users\Gregory\Documents\My Libraries\opencv330\opencv\sources\modules\core\src\arithm.cpp(2378): error C2143: syntax error: missing ')' before ','
C:\Users\Gregory\Documents\My Libraries\opencv330\opencv\sources\modules\core\src\arithm.cpp(2378): error C2059: syntax error: ')'
* Update hal_replacement.hpp
All hal_replacement headers that use the TEGRA_ macros from carotene\hal\tegra_hal.hpp through variadic macros need the same change that was made in core.
C:\Users\Gregory\Documents\My Libraries\opencv330\opencv\sources\modules\imgproc\src\color.cpp(9604): warning C4003: not enough actual parameters for macro 'TEGRA_CVTBGRTOBGR'
C:\Users\Gregory\Documents\My Libraries\opencv330\opencv\sources\modules\imgproc\src\color.cpp(9604): error C2059: syntax error: '=='
* Update OpenCVCompilerOptimizations.cmake
* Update hal_replacement.hpp
* Update hal_replacement.hpp
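The variadic macro workaround referred to in the hal_replacement.hpp changes above is commonly written as an extra expansion step. The sketch below uses hypothetical `DEMO_` macros rather than the real TEGRA_ ones, and shows the shape of the workaround only.

```cpp
#include <cassert>

// Extra expansion step: the argument is rescanned, so __VA_ARGS__ is
// split into separate arguments before the inner macro is invoked.
#define DEMO_EXPAND(x) x

#define DEMO_ADD3(a, b, c) ((a) + (b) + (c))

// Without DEMO_EXPAND, the traditional MSVC preprocessor forwards
// __VA_ARGS__ to the inner macro as a single argument, which produces
// "not enough actual parameters" diagnostics.
#define DEMO_CALL(macro, ...) DEMO_EXPAND(macro(__VA_ARGS__))

inline int demo_sum() { return DEMO_CALL(DEMO_ADD3, 1, 2, 3); }
```

Conforming preprocessors accept the same code unchanged, so the wrapper does not need to be conditional on the compiler.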
Adds fitEllipseDirect to imgproc: The Direct least square (Direct) method by Fitzgibbon1999.
New Tests are included for the methods.
fitEllipseAMS Tests
fitEllipseDirect Tests
Comparative examples are added to fitEllipse.cpp in Samples.
imgproc: use universal intrinsic as much as possible (#9714)
* use universal intrinsic as much as possible
* make SSE3 part as common as possible with universal intrinsic implementation
* put the reducing part out of the main loop
* follow the comment
* fix the typo
* use v_reduce_sum4
* follow the comment again
* remove all CV_SSE3 part from smooth.cpp
The non-maximum suppression in the Hough accumulator incorrectly ignores maxima that extend over more than one cell, i.e. two neighboring cells both have the same accumulator value. This maximum is dropped completely instead of picking at least one of the entries. This frequently results in obvious circles being missed.
The behavior is now changed to be the same as for hough_lines.
See also https://github.com/opencv/opencv/issues/4440
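The plateau problem described above can be reproduced in one dimension. In this sketch (hypothetical function names; which neighbour gets the strict comparison is an assumption), a symmetric strict test drops a two-cell maximum entirely, while an asymmetric test keeps exactly one cell of it.

```cpp
#include <cassert>
#include <vector>

// Symmetric strict test: a plateau of equal values yields no maximum.
inline std::vector<int> peaks_strict(const std::vector<int>& acc, int threshold)
{
    std::vector<int> out;
    for (int i = 1; i + 1 < static_cast<int>(acc.size()); ++i)
        if (acc[i] > threshold && acc[i] > acc[i - 1] && acc[i] > acc[i + 1])
            out.push_back(i);
    return out;
}

// Asymmetric test: strict towards one neighbour, non-strict towards the
// other, so exactly one cell of a plateau is reported.
inline std::vector<int> peaks_asymmetric(const std::vector<int>& acc, int threshold)
{
    std::vector<int> out;
    for (int i = 1; i + 1 < static_cast<int>(acc.size()); ++i)
        if (acc[i] > threshold && acc[i] > acc[i - 1] && acc[i] >= acc[i + 1])
            out.push_back(i);
    return out;
}
```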
Added gradientSize param into goodFeaturesToTrack API (#9618)
* Added gradientSize param into goodFeaturesToTrack API
Removed the hardcoded value 3 in the goodFeaturesToTrack API and
added a new param 'gradientSize' so that the user can
pass a gradient size of 3, 5 or 7.
Signed-off-by: Vipin Anand <anand.vipin@gmail.com>
Signed-off-by: Nilaykumar Patel<nilay.nilpat@gmail.com>
Signed-off-by: Prashanth Voora <prashanthx85@gmail.com>
* fixed compilation error for java test
Signed-off-by: Vipin Anand <anand.vipin@gmail.com>
* Modifying code for previous binary compatibility and fixing other warnings
fixed ABI break issue
resolved merged conflict
compilation error fix
Signed-off-by: Vipin Anand <anand.vipin@gmail.com>
Signed-off-by: Patel, Nilaykumar K <nilay.nilpat@gmail.com>
* lab_tetra squashed
* initial version is almost written
* unfinished work
* compilation fixed, to be debugged
* Lab test removed
* more fixes
* Luv2RGBinteger: channels order fixed
* Lab structs removed
* good trilinear interpolation added
* several fixes
* removed Luv2RGB interpolations, XYZ tables; 8-cell LUT added
* no_interpolate made 8-cell
* interpolations rewritten to 8-cell, minor fixes
* packed interpolation added for RGB2Luv
* tetra implemented
* removing unnecessary code
* LUT building merged
* changes ported to color.cpp
* minor fixes; try to suppress warnings
* fixed v range of Luv
* fixed incorrect src channel number
* minor fixes
* preliminary version of Luv2RGBinteger is done
* Luv2RGB_b is in progress
* XYZ color constants converted to softfloat
* Luv test: precision fixed
* Luv bit-exactness test added
* warnings fixed
* compilation fixed, error message fixed
* Luv check is limited to [0-2,0-2,0-2] by XYZ
* L->Y generation moved to LUT
* LUTs added for up and vp of Luv2RGB_b
* still works
* fixed-point is done, works at maxerr 2
* vectorized code is done, 2x slower than original
* perf improved by 10%
* extra comments removed
* code moved to color.cpp
* test_lab.cpp updated
* minor refactoring
* test added for Luv2RGB
* OCL Luv2RGB_b: XYZ are limited to [0, 2]; docs updated
* Luv2RGB_b rewritten to universal intrinsics
* test_lab.cpp moved to luv_tetra branch
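The trilinear interpolation step mentioned in the list above can be sketched for a single LUT cell. Reading "8-cell" as the eight corner values of one cell is an assumption, and `trilinear` is a hypothetical name; the actual code works on packed fixed-point tables.

```cpp
#include <array>
#include <cassert>

// c[i] is the value at cell corner (x, y, z) with i = x + 2*y + 4*z,
// where x, y, z are 0 or 1. tx, ty, tz are fractional positions in [0, 1].
inline float trilinear(const std::array<float, 8>& c, float tx, float ty, float tz)
{
    auto lerp = [](float a, float b, float t) { return a + (b - a) * t; };
    // Interpolate along x on the four edges, then along y, then along z.
    const float x00 = lerp(c[0], c[1], tx);
    const float x10 = lerp(c[2], c[3], tx);
    const float x01 = lerp(c[4], c[5], tx);
    const float x11 = lerp(c[6], c[7], tx);
    const float y0 = lerp(x00, x10, ty);
    const float y1 = lerp(x01, x11, ty);
    return lerp(y0, y1, tz);
}
```

Tetrahedral interpolation, also listed above, reads four corners instead of eight per lookup.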