imgproc: add optimized warpAffine kernels for 8U/16U/32F + C1/C3/C4 inputs #25984
Merge wtih https://github.com/opencv/opencv_extra/pull/1198.
Merge with https://github.com/opencv/opencv_contrib/pull/3787.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Add a check for src == dst in ocl warpTransform #25898
As mentioned in #25853, when doing WarpAffine with Mat and UMat respectively, if you force the use of the in-place operation (so that src and dst are passed the same variables), Mat produces the correct results, but UMat produces unexpected results.
Obviously in-place operations are not possible with this transformation. When Mat performs the operation, if dst and src are the same variable, the function inherently makes a copy of src without telling the user.
74b50c7af0/modules/imgproc/src/imgwarp.cpp (L2831-L2834)
So I did the same check in UMat, but I'm not sure if it's appropriate, should we just do a copy operation without telling the user (even if the user thinks he's doing an in-place operation), or should we throw an exception to indicate that we shouldn't pass in two same variables here?
The possible reason for this problem is that there is a create function here, so it gives the developer the false impression that this create function has allocated new memory for dst, however it does not.
74b50c7af0/modules/imgproc/src/imgwarp.cpp (L2607-L2609)
Because by the time the check is done here, the function has returned back.
74b50c7af0/modules/core/src/umatrix.cpp (L668-L675)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
First proposal of cv::remap with relative displacement field (#24603) #24621
Implements #24603
Currently, `remap()` is applied as `dst(x, y) <- src(mapX(x, y), mapY(x, y))` It means that the maps must be filled with absolute coordinates.
However, if one wants to remap something according to a displacement field ("warp"), the operation should be `dst(x, y) <- src(x+displacementX(x, y), y+displacementY(x, y))`
It is trivial to build a mapping from a displacement field, but it is an undesirable overhead for CPU and memory.
This PR implements the feature as an experimental option, through the optional flag WARP_RELATIVE_MAP than can be ORed to the interpolation mode.
Since the xy maps might be const, there is no attempt to add the coordinate offset to those maps, and everything is postponed on-the-fly to the very last coordinate computation before fetching `src`. Interestingly, this let `cv::convertMaps()` unchanged since the fractional part of interpolation does not care of the integer coordinate offset.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [X] I agree to contribute to the project under Apache 2 License.
- [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [X] There is a reference to the original bug report and related work
- [X] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Implement color conversion from RGB to YUV422 family #24333
Related PR for extra: https://github.com/opencv/opencv_extra/pull/1104
Hi,
This patch provides CPU and OpenCL implementations of color conversions from RGB/BGR to YUV422 family (such as UYVY and YUY2).
These features would come in useful for enabling standard RGB images to be supplied as input to algorithms or networks that make use of images in YUV422 format directly (for example, on resource constrained devices working with camera images captured in YUV422).
The code, tests and perf tests are all written following the existing pattern. There is also an example `bin/example_cpp_cvtColor_RGB2YUV422` that loads an image from disk, converts it from BGR to UYVY and then back to BGR, and displays the result as a visual check that the conversion works.
The OpenCL performance for the forward conversion implemented here is the same as the existing backward conversion on my hardware. The CPU implementation, unfortunately, isn't very optimized as I am not yet familiar with the SIMD code.
Please let me know if I need to fix something or can make other modifications.
Thanks!
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
- [x] The feature is well documented and sample code can be built with the project CMake
* attempt to add 0d/1d mat support to OpenCV
* revised the patch; now 1D mat is treated as 1xN 2D mat rather than Nx1.
* a step towards 'green' tests
* another little step towards 'green' tests
* calib test failures seem to be fixed now
* more fixes _core & _dnn
* another step towards green ci; even 0D mat's (a.k.a. scalars) are now partly supported!
* * fixed strange bug in aruco/charuco detector, not sure why it did not work
* also fixed a few remaining failures (hopefully) in dnn & core
* disabled failing GAPI tests - too complex to dig into this compiler pipeline
* hopefully fixed java tests
* trying to fix some more tests
* quick followup fix
* continue to fix test failures and warnings
* quick followup fix
* trying to fix some more tests
* partly fixed support for 0D/scalar UMat's
* use updated parseReduce() from upstream
* trying to fix the remaining test failures
* fixed [ch]aruco tests in Python
* still trying to fix tests
* revert "fix" in dnn's CUDA tensor
* trying to fix dnn+CUDA test failures
* fixed 1D umat creation
* hopefully fixed remaining cuda test failures
* removed training whitespaces
Also bring perf_imgproc CornerMinEigenVal accuracy requirements in line with
the test_imgproc accuracy requirements on that test and fix indentation on
the latter.
Partially addresses issue #9821
The MinEigenVal path through the corner.cl kernel makes use of native_sqrt,
a math builtin function which has implementation defined accuracy.
Partially addresses issue #9821
* goodFeaturesToTrack returns also corner value
(cherry picked from commit 4a8f06755c)
* Added response to GFTT Detector keypoints
(cherry picked from commit b88fb40c6e)
* Moved corner values to another optional variable to preserve backward compatibility
(cherry picked from commit 6137383d32)
* Removed corners valus from perf tests and better unit tests for corners values
(cherry picked from commit f3d0ef21a7)
* Fixed detector gftt call
(cherry picked from commit be2975553b)
* Restored test_cornerEigenValsVecs
(cherry picked from commit ea3e11811f)
* scaling fixed;
mineigen calculation rolled back;
gftt function overload added (with quality parameter);
perf tests were added for the new api function;
external bindings were added for the function (with different alias);
fixed issues with composition of the output array of the new function (e.g. as requested in comments) ;
added sanity checks in the perf tests;
removed C API changes.
* minor change to GFTTDetector::detect
* substitute ts->printf with EXPECT_LE
* avoid re-allocations
Co-authored-by: Anas <anas.el.amraoui@live.com>
Co-authored-by: amir.tulegenov <amir.tulegenov@xperience.ai>
* loosen some test threshold mainly for integer types
* use relative error for floating points result
* avoid division by zero by following the comment
* fix the indentation
- removed tr1 usage (dropped in C++17)
- moved includes of vector/map/iostream/limits into ts.hpp
- require opencv_test + anonymous namespace (added compile check)
- fixed norm() usage (must be from cvtest::norm for checks) and other conflict functions
- added missing license headers
Add new 5x5 gaussian blur kernel for CV_8UC1 format,
it is 50% ~ 70% faster than current ocl kernel in the perf test.
Signed-off-by: Li Peng <peng.li@intel.com>
Add new OpenCL kernels for bicubic interploation, it is 20% faster
than current warp image kernel with bicubic interploation.
Signed-off-by: Li Peng <peng.li@intel.com>
Add new ocl kernels for warpAffine and warpPerspective,
The average performance improvemnt is about 30%. The new
ocl kernels require CV_8UC1 format and support nearest
neighbor and bilinear interpolation.
Signed-off-by: Li Peng <peng.li@intel.com>
This ocl kernel is 46%~171% faster than current laplacian 3x3
ocl kernel in the perf test, with image format "CV_8UC1".
Signed-off-by: Li Peng <peng.li@intel.com>
This ocl kernel is for 3x3 kernel size and CV_8UC1 format
It is 115% ~ 300% faster than current ocl path in perf test
python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_GaussianBlurFixture*
Signed-off-by: Li Peng <peng.li@intel.com>
This kernel is for CV_8UC1 format and 3x3 kernel size,
It is about 33% ~ 55% faster than current ocl kernel with below perf test
python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_ErodeFixture*
python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_DilateFixture*
Also add accuracy test cases for this kernel, the test command is
./bin/opencv_test_imgproc --gtest_filter=OCL_Filter/MorphFilter3x3*
Signed-off-by: Li Peng <peng.li@intel.com>
The optimization is for CV_8UC1 format and 3x3 box filter,
it is 15%~87% faster than current ocl kernel with below perf test
./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_BlurFixture*
Also add test cases for this ocl kernel.
Signed-off-by: Li Peng <peng.li@intel.com>