Impl hal_rvv LUT | Add more LUT test #26941
Implement through the existing `cv_hal_lut` interfaces.
Add more LUT accuracy and performance tests:
- **Accuracy test**: Multi-channel table tests are added, and the boundary of `randu` used for generating test data is broadened to make the test more robust.
- **Performance test**: Multi-channel input and multi-channel table tests are added.
Perf tests were run on:
- MUSE-PI (vlen=256)
- Compiler: gcc 14.2 (riscv-collab/riscv-gnu-toolchain Nightly: December 16, 2024)
```sh
$ opencv_test_core --gtest_filter="Core_LUT*"
$ opencv_perf_core --gtest_filter="SizePrm_LUT*" --perf_min_samples=300 --perf_force_samples=300
```
```sh
Geometric mean (ms)

Name of Test                    scalar     ui    rvv  ui vs scalar  rvv vs scalar
                                                        (x-factor)     (x-factor)
LUT::SizePrm::320x240            0.248  0.249  0.052          1.00           4.74
LUT::SizePrm::640x480            0.277  0.275  0.085          1.01           3.28
LUT::SizePrm::1920x1080          0.950  0.947  0.634          1.00           1.50
LUT_multi2::SizePrm::320x240     2.051  2.045  2.049          1.00           1.00
LUT_multi2::SizePrm::640x480     2.128  2.134  2.125          1.00           1.00
LUT_multi2::SizePrm::1920x1080   7.397  7.380  7.390          1.00           1.00
LUT_multi::SizePrm::320x240      0.715  0.747  0.154          0.96           4.64
LUT_multi::SizePrm::640x480      0.741  0.766  0.257          0.97           2.88
LUT_multi::SizePrm::1920x1080    2.766  2.765  1.925          1.00           1.44
```
This optimization is achieved by loading the entire lookup table into vector registers. Due to register size limitations, the optimization is only effective under the following conditions:
- For the U8C1 table type, the optimization works when `vlen >= 256`
- For U16C1, it works when `vlen >= 512`
- For U32C1, it works when `vlen >= 1024`
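The thresholds above are consistent with a 256-entry table having to fit into a single register group. A rough sketch of that arithmetic, assuming the table is held in one LMUL=8 group (the grouping is an assumption; `kMaxLmul`, `minVlenForTable` and `tableFitsInRegisters` are illustrative names, not part of the patch):

```cpp
#include <cassert>
#include <cstddef>

// Assumption: the whole table lives in one register group with LMUL = 8,
// the largest grouping RVV allows.
constexpr std::size_t kMaxLmul = 8;
constexpr std::size_t kTableEntries = 256;  // one entry per possible 8-bit index

// Smallest vlen (in bits) for which kMaxLmul registers can hold the table.
constexpr std::size_t minVlenForTable(std::size_t elemBytes)
{
    return kTableEntries * elemBytes * 8 / kMaxLmul;
}

// True if a table with the given element size fits at this vlen.
constexpr bool tableFitsInRegisters(std::size_t vlenBits, std::size_t elemBytes)
{
    return vlenBits >= minVlenForTable(elemBytes);
}

static_assert(minVlenForTable(1) == 256,  "U8C1");
static_assert(minVlenForTable(2) == 512,  "U16C1");
static_assert(minVlenForTable(4) == 1024, "U32C1");
```

Under this assumption the MUSE-PI board (vlen=256) only qualifies for the U8C1 case.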
Since I don’t have real hardware with `vlen > 256`, the corresponding accuracy tests were conducted on QEMU built from the `riscv-collab/riscv-gnu-toolchain`.
This patch does not implement optimizations for multi-channel tables.
Previous attempts:
1. For the U8C1 table type, when `vlen = 128`, it is possible to use four `u8m4` vectors to load the entire table, perform gathering, and merge the results. However, the performance is almost the same as the scalar version.
2. Loading part of the table and repeatedly loading the source data is faster for small sizes. But as the table size grows, the performance quickly degrades compared to the scalar version.
3. Using `vluxei8` as a general solution does not show any performance improvement.
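For reference, the logic of attempt 1 can be sketched in plain C++. This is a scalar emulation under stated assumptions, not RVV intrinsics and not the code that was benchmarked: at `vlen = 128` a `u8m4` group holds 4 * 128 / 8 = 64 bytes, so the 256-entry table is split into four chunks, and `lutByChunks` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// At vlen = 128 one u8m4 group holds 64 bytes; four groups cover the table.
constexpr std::size_t kChunk = 64;
constexpr std::size_t kNumChunks = 4;

std::vector<std::uint8_t> lutByChunks(const std::vector<std::uint8_t>& src,
                                      const std::array<std::uint8_t, 256>& table)
{
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t c = 0; c < kNumChunks; ++c)
    {
        // Stands in for loading one register group with a quarter of the table.
        const std::uint8_t* chunk = table.data() + c * kChunk;
        for (std::size_t i = 0; i < src.size(); ++i)
        {
            // Index relative to this chunk. Unsigned wrap-around makes indices
            // below the chunk start fail the range check as well.
            const std::size_t rel = static_cast<std::size_t>(src[i]) - c * kChunk;
            if (rel < kChunk)
                dst[i] = chunk[rel];  // gather and merge for the active lanes only
        }
    }
    return dst;
}
```

Every source element is visited once per chunk, which may be part of why this variant did not beat the scalar loop.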
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Tests added for mixed type arithmetic operations #25671
### Changes
* added accuracy tests for mixed type arithmetic operations
_Note: div-by-zero values are excluded from checking since the result is implementation-defined in the general case_
* added perf tests for the same cases
* fixed a typo in `getMulExtTab()` function that led to dead code
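The div-by-zero exclusion mentioned in the note can be illustrated with a small comparison helper. This is a minimal sketch in plain C++, not the code from the test suite; `countDivMismatches` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts mismatches between a reference quotient and the result under test,
// ignoring every position whose divisor is zero.
std::size_t countDivMismatches(const std::vector<int>& divisor,
                               const std::vector<int>& expected,
                               const std::vector<int>& actual)
{
    assert(divisor.size() == expected.size() && divisor.size() == actual.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < divisor.size(); ++i)
    {
        if (divisor[i] == 0)
            continue;  // result is implementation-defined here, so skip the check
        if (expected[i] != actual[i])
            ++mismatches;
    }
    return mismatches;
}
```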
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Tests for cv::rotate() added #25633
fixes #25449
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Transform offset to indices for MatND in minMaxIdx HAL #25563
Address comments in https://github.com/opencv/opencv/pull/25553
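The offset-to-indices conversion can be sketched as follows. This is a generic illustration that assumes a dense, row-major N-dimensional array with no padding between rows; `offsetToIndices` is an illustrative name and is not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Converts a flat element offset into per-dimension indices for a dense,
// row-major N-dimensional array.
std::vector<int> offsetToIndices(std::size_t offset, const std::vector<int>& sizes)
{
    std::vector<int> idx(sizes.size(), 0);
    // Walk from the innermost (fastest-varying) dimension outwards.
    for (std::size_t d = sizes.size(); d-- > 0; )
    {
        assert(sizes[d] > 0);
        const std::size_t dimSize = static_cast<std::size_t>(sizes[d]);
        idx[d] = static_cast<int>(offset % dimSize);
        offset /= dimSize;
    }
    assert(offset == 0);  // otherwise the offset was outside the array
    return idx;
}
```

For a 2x3x4 array, offset 17 maps to indices (1, 1, 1), since 1*12 + 1*4 + 1 = 17.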
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Added in-place support for cartToPolar and polarToCart #24893
- a fused hal::cartToPolar[32|64]f() is used instead of sequential hal::magnitude[32|64]f/hal::fastAtan[32|64]f
- ipp_polarToCart is skipped for in-place processing (it seems not to support it correctly)
relates to #24891
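Why a fused loop helps with in-place processing can be shown with a small sketch. Presumably, running magnitude first would overwrite `x` before the angle pass reads it; reading both inputs before writing avoids that. The code below uses `std::hypot` and `std::atan2` rather than OpenCV's hal functions, returns radians in [-pi, pi] rather than OpenCV's angle convention, and `cartToPolarFused` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Fused magnitude/angle computation that stays correct when the outputs
// alias the inputs (mag == x and/or angle == y).
void cartToPolarFused(const float* x, const float* y,
                      float* mag, float* angle, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        // Read both inputs before writing anything for this element.
        const float xi = x[i];
        const float yi = y[i];
        mag[i] = std::hypot(xi, yi);
        angle[i] = std::atan2(yi, xi);  // radians in [-pi, pi]
    }
}
```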
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [X] I agree to contribute to the project under Apache 2 License.
- [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [X] There is a reference to the original bug report and related work
- [X] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
* add broadcast_to with tests
* change name
* fix test
* fix implicit type conversion
* replace type of shape with InputArray
* add perf test
* add perf tests which take care of axis
* v2 from ficus expand
* rename to broadcast
* use randu in place of declare
* doc improvement; smaller scale in perf
* capture get_index by reference
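For readers unfamiliar with broadcasting, here is a minimal sketch in plain C++. It assumes NumPy-style rules (shapes aligned at the trailing dimension, source extents either 1 or equal to the target); the commit messages above do not spell out the exact semantics, and `canBroadcast` / `broadcastTo` are illustrative names, not the OpenCV API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if shape `src` can be broadcast to shape `dst`.
bool canBroadcast(const std::vector<int>& src, const std::vector<int>& dst)
{
    if (src.size() > dst.size())
        return false;
    const std::size_t shift = dst.size() - src.size();
    for (std::size_t d = 0; d < src.size(); ++d)
        if (src[d] != 1 && src[d] != dst[d + shift])
            return false;
    return true;
}

// Broadcasts a dense row-major array of shape `src` to shape `dst`.
std::vector<float> broadcastTo(const std::vector<float>& data,
                               const std::vector<int>& src,
                               const std::vector<int>& dst)
{
    assert(canBroadcast(src, dst));
    const std::size_t shift = dst.size() - src.size();
    std::size_t total = 1;
    for (int s : dst) total *= static_cast<std::size_t>(s);

    std::vector<float> out(total);
    for (std::size_t o = 0; o < total; ++o)
    {
        // Decompose the output offset and rebuild the matching source offset,
        // pinning broadcast dimensions (extent 1) to index 0.
        std::size_t rem = o, srcOff = 0, srcStride = 1;
        for (std::size_t d = dst.size(); d-- > 0; )
        {
            const std::size_t extent = static_cast<std::size_t>(dst[d]);
            const std::size_t i = rem % extent;
            rem /= extent;
            if (d >= shift)
            {
                const std::size_t sd = static_cast<std::size_t>(src[d - shift]);
                if (sd != 1) srcOff += i * srcStride;
                srcStride *= sd;
            }
        }
        out[o] = data[srcOff];
    }
    return out;
}
```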
There can be an int overflow.
cv::norm( InputArray _src, int normType, InputArray _mask ) is fine,
but cv::norm( InputArray _src1, InputArray _src2, int normType, InputArray _mask ) is not.
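The note does not say where exactly the overflow happens. As a generic illustration only (not the OpenCV code path), the sketch below shows how a sum of squared 8-bit differences can exceed the range of a 32-bit `int`, and how a 64-bit accumulator avoids it. `sumSqDiff` and `wouldOverflowInt32` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Sum of squared differences with a 64-bit accumulator.
std::int64_t sumSqDiff(const std::vector<std::uint8_t>& a,
                       const std::vector<std::uint8_t>& b)
{
    assert(a.size() == b.size());
    std::int64_t acc = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const int d = static_cast<int>(a[i]) - static_cast<int>(b[i]);
        acc += static_cast<std::int64_t>(d) * d;  // each term is at most 255 * 255
    }
    return acc;
}

// True if the exact result would not fit in a 32-bit signed int.
bool wouldOverflowInt32(std::int64_t value)
{
    return value > std::numeric_limits<std::int32_t>::max();
}
```

With a per-element maximum of 65025, an all-255 versus all-0 comparison passes the 32-bit limit after roughly 33 thousand elements, which is far smaller than a typical image.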
* add cv::compare test when Mat type == CV_16F
* add assertion in cv::compare when src.depth() == CV_16F
* cv::compare assertion minor fix
* core: add more checks
* added basic support for CV_16F (the new datatype etc.). CV_USRTYPE1 is now equal to CV_16F, which may break some [rarely used] functionality. We'll see
* fixed just introduced bug in norm; reverted erroneous changes in Torch importer (need to find a better solution)
* addressed some issues found during the PR review
* restored the patch to fix some perf test failures
* a part of https://github.com/opencv/opencv/pull/11364 by Tetragramm. Rewritten and extended findNonZero & PSNR to support more types, not just 8u.
* fixed compile & doxygen warnings
* fixed small bug in findNonZero test
- removed tr1 usage (dropped in C++17)
- moved includes of vector/map/iostream/limits into ts.hpp
- require opencv_test + anonymous namespace (added compile check)
- fixed norm() usage (must be from cvtest::norm for checks) and other conflict functions
- added missing license headers
* remove raw SSE2/NEON implementation from convert.cpp
* remove raw implementation from Cvt_SIMD
* remove raw implementation from cvtScale_SIMD
* remove raw implementation from cvtScaleAbs_SIMD
* remove duplicated implementation cvt_<float, short>
* remove duplicated implementation cvtScale_<short, short, float>
* add "from double" version of Cvt_SIMD
* modify the condition of test ConvertScaleAbs
* Update convert.cpp
fixed crash in cvtScaleAbs(8s=>8u)
* fixed compile error on Win32
* fixed several test failures because of accuracy loss in cvtScale(int=>int)
* fixed NEON implementation of v_cvt_f64(int=>double) intrinsic
* another attempt to fix test failures
* keep trying to fix the test failures and just introduced compile warnings
* fixed one remaining test (subtractScalar)
This adds the ability to use multi-channel masks with cv::mean,
cv::meanStdDev and the method Mat::setTo. The tests now randomly use
multi-channel masks for operations that support them.
This also covers Mat::copyTo, which already supported multi-channel
masks but had no test confirming it.
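One plausible reading of multi-channel mask semantics, sketched in plain C++ on an interleaved buffer: a single-channel mask gates whole pixels, while a mask with as many channels as the data gates each channel element separately. This is an assumption-labeled emulation, not the OpenCV implementation, and `setToMasked` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Emulates a masked "set to value" on an interleaved multi-channel buffer.
// maskChannels == 1: one mask byte per pixel gates all channels of that pixel.
// maskChannels == channels: one mask byte per channel element.
void setToMasked(std::vector<std::uint8_t>& data, std::size_t channels,
                 const std::vector<std::uint8_t>& mask, std::size_t maskChannels,
                 const std::vector<std::uint8_t>& value)
{
    assert(maskChannels == 1 || maskChannels == channels);
    assert(value.size() == channels && data.size() % channels == 0);
    const std::size_t pixels = data.size() / channels;
    assert(mask.size() == pixels * maskChannels);

    for (std::size_t p = 0; p < pixels; ++p)
        for (std::size_t c = 0; c < channels; ++c)
        {
            // Pick the mask byte that governs this channel element.
            const std::size_t m = (maskChannels == 1) ? p : p * channels + c;
            if (mask[m] != 0)
                data[p * channels + c] = value[c];
        }
}
```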