imgproc: optimized nearest neighbour interpolation for warpAffine, warpPerspective and remap #26505
PR Description has a limit of 65536 characters. So performance stats are attached below.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
imgproc: add optimized warpAffine kernels for 8U/16U/32F + C1/C3/C4 inputs #25984
Merge wtih https://github.com/opencv/opencv_extra/pull/1198.
Merge with https://github.com/opencv/opencv_contrib/pull/3787.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
C-API cleanup: apps, imgproc_c and some constants #25075
Merge with https://github.com/opencv/opencv_contrib/pull/3642
* Removed obsolete apps - traincascade and createsamples (please use older OpenCV versions if you need them). These apps relied heavily on C-API
* removed all mentions of imgproc C-API headers (imgproc_c.h, types_c.h) - they were empty, included core C-API headers
* replaced usage of several C constants with C++ ones (error codes, norm modes, RNG modes, PCA modes, ...) - most part of this PR (split into two parts - all modules and calib+3d - for easier backporting)
* removed imgproc C-API headers (as separate commit, so that other changes could be backported to 4.x)
Most of these changes can be backported to 4.x.
First proposal of cv::remap with relative displacement field (#24603) #24621
Implements #24603
Currently, `remap()` is applied as `dst(x, y) <- src(mapX(x, y), mapY(x, y))` It means that the maps must be filled with absolute coordinates.
However, if one wants to remap something according to a displacement field ("warp"), the operation should be `dst(x, y) <- src(x+displacementX(x, y), y+displacementY(x, y))`
It is trivial to build a mapping from a displacement field, but it is an undesirable overhead for CPU and memory.
This PR implements the feature as an experimental option, through the optional flag WARP_RELATIVE_MAP than can be ORed to the interpolation mode.
Since the xy maps might be const, there is no attempt to add the coordinate offset to those maps, and everything is postponed on-the-fly to the very last coordinate computation before fetching `src`. Interestingly, this let `cv::convertMaps()` unchanged since the fractional part of interpolation does not care of the integer coordinate offset.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [X] I agree to contribute to the project under Apache 2 License.
- [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [X] There is a reference to the original bug report and related work
- [X] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Implement color conversion from RGB to YUV422 family #24333
Related PR for extra: https://github.com/opencv/opencv_extra/pull/1104
Hi,
This patch provides CPU and OpenCL implementations of color conversions from RGB/BGR to YUV422 family (such as UYVY and YUY2).
These features would come in useful for enabling standard RGB images to be supplied as input to algorithms or networks that make use of images in YUV422 format directly (for example, on resource constrained devices working with camera images captured in YUV422).
The code, tests and perf tests are all written following the existing pattern. There is also an example `bin/example_cpp_cvtColor_RGB2YUV422` that loads an image from disk, converts it from BGR to UYVY and then back to BGR, and displays the result as a visual check that the conversion works.
The OpenCL performance for the forward conversion implemented here is the same as the existing backward conversion on my hardware. The CPU implementation, unfortunately, isn't very optimized as I am not yet familiar with the SIMD code.
Please let me know if I need to fix something or can make other modifications.
Thanks!
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
- [x] The feature is well documented and sample code can be built with the project CMake
imgproc: add basic IntelligentScissorsMB performance test #23698
Adding basic performance test that can be used before and after the #21959 changes etc. as per @asmorkalov's https://github.com/opencv/opencv/pull/21959#issuecomment-1565240926 comment.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [X] I agree to contribute to the project under Apache 2 License.
- [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [X] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Also bring perf_imgproc CornerMinEigenVal accuracy requirements in line with
the test_imgproc accuracy requirements on that test and fix indentation on
the latter.
Partially addresses issue #9821
* goodFeaturesToTrack returns also corner value
(cherry picked from commit 4a8f06755c)
* Added response to GFTT Detector keypoints
(cherry picked from commit b88fb40c6e)
* Moved corner values to another optional variable to preserve backward compatibility
(cherry picked from commit 6137383d32)
* Removed corners valus from perf tests and better unit tests for corners values
(cherry picked from commit f3d0ef21a7)
* Fixed detector gftt call
(cherry picked from commit be2975553b)
* Restored test_cornerEigenValsVecs
(cherry picked from commit ea3e11811f)
* scaling fixed;
mineigen calculation rolled back;
gftt function overload added (with quality parameter);
perf tests were added for the new api function;
external bindings were added for the function (with different alias);
fixed issues with composition of the output array of the new function (e.g. as requested in comments) ;
added sanity checks in the perf tests;
removed C API changes.
* minor change to GFTTDetector::detect
* substitute ts->printf with EXPECT_LE
* avoid re-allocations
Co-authored-by: Anas <anas.el.amraoui@live.com>
Co-authored-by: amir.tulegenov <amir.tulegenov@xperience.ai>
Bit-exact Nearest Neighbor Resizing
* bit exact resizeNN
* change the value of method enum
* add bitexact-nn to ResizeExactTest
* test to compare with non-exact version
* add perf for bit-exact resizenn
* use cvFloor-equivalent
* 1/3 scaling is not stable for floating calculation
* stricter test
* bugfix: broken data in case of 6 or 12bytes elements
* bugfix: broken data in default pix_size
* stricter threshold
* use raw() for floor
* use double instead of int
* follow code reviews
* fewer cases in perf test
* center pixel convention
* resize: HResizeLinear reduce duplicate work
There appears to be a 2x unroll of the HResizeLinear against k,
however the k value is only incremented by 1 during the unroll. This
results in k - 1 duplicate passes when k > 1.
Likewise, the final pass may not respect the work done by the vector
loop. Start it with the offset returned by the vector op if
implemented. Note, no vector ops are implemented today.
The performance is most noticable on a linear downscale. A set of
performance tests are added to characterize this. The performance
improvement is 10-50% depending on the scaling.
* imgproc: vectorize HResizeLinear
Performance is mostly gated by the gather operations
for x inputs.
Likewise, provide a 2x unroll against k, this reduces the
number of alpha gathers by 1/2 for larger k.
While not a 4x improvement, it still performs substantially
better under P9 for a 1.4x improvement. P8 baseline is
1.05-1.10x due to reduced VSX instruction set.
For float types, this results in a more modest
1.2x improvement.
* Update U8 processing for non-bitexact linear resize
* core: hal: vsx: improve v_load_expand_q
With a little help, we can do this quickly without gprs on
all VSX enabled targets.
* resize: Fix cn == 3 step per feedback
Per feedback, ensure we don't overrun. This was caught via the
failure observed in Test_TensorFlow.inception_accuracy.
* Adding all possible data type interactions to the perf tests since some
use SIMD acceleration and others do not.
* Disabling full tests by default.
* Giving proper names, removing magic numbers and sanity checks of new
performance tests for the integral function.
* Giving proper names, making array static.
Due to size limit of shared memory, histogram is built on
the global memory for CV_16UC1 case.
The amount of memory needed for building histogram is:
65536 * 4byte = 256KB
and shared memory limit is 48KB typically.
Added test cases for CV_16UC1 and various clip limits.
Added perf tests for CV_16UC1 on both CPU and CUDA code.
There was also a bug in CV_8UC1 case when redistributing
"residual" clipped pixels. Adding the test case where clip
limit is 5.0 exposes this bug.