opencv/modules/core/perf/perf_math.cpp
GenshinImpactStarts 2090407002
Merge pull request #26999 from GenshinImpactStarts:polar_to_cart
[HAL RVV] unify and impl polar_to_cart | add perf test #26999

### Summary

1. Implement the RVV HAL through the existing `cv_hal_polarToCart32f` and `cv_hal_polarToCart64f` interfaces.
2. Add `polarToCart` performance tests.
3. Make `cv::polarToCart` use `CALL_HAL` in the same way as `cv::cartToPolar`.
4. Move and modify the original implementation, which is needed to achieve point 3.
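Point 3 refers to the `CALL_HAL` pattern: the public function first offers the work to a pluggable HAL hook (here, the RVV code behind `cv_hal_polarToCart32f`) and runs its own implementation only if the hook reports that it is not implemented. The sketch below shows that control flow in plain C++. `HalStatus`, `PolarToCartHook`, `halNotImplemented`, `polarToCartGeneric` and `polarToCartDispatch` are hypothetical stand-ins, not OpenCV's actual macros or signatures, and the real macro also reports error codes other than "not implemented" as failures, which this sketch omits.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical status codes standing in for the HAL's "ok" / "not implemented".
enum HalStatus { HAL_OK = 0, HAL_NOT_IMPLEMENTED = 1 };

// Hypothetical hook type: (mag, angle) -> (x, y) for len elements.
using PolarToCartHook = int (*)(const float* mag, const float* angle,
                                float* x, float* y, int len, bool angleInDegrees);

// Default hook used when no accelerated backend is plugged in.
int halNotImplemented(const float*, const float*, float*, float*, int, bool)
{
    return HAL_NOT_IMPLEMENTED;
}

// Portable fallback, run only when the hook declines the call.
void polarToCartGeneric(const float* mag, const float* angle,
                        float* x, float* y, int len, bool angleInDegrees)
{
    const float scale = angleInDegrees ? 3.14159265358979f / 180.f : 1.f;
    for (int i = 0; i < len; i++)
    {
        const float m = mag[i];
        const float a = angle[i] * scale;
        x[i] = m * std::cos(a);
        y[i] = m * std::sin(a);
    }
}

// CALL_HAL-style dispatch: offer the work to the hook, fall back otherwise.
// Returns true if the hook handled the call.
bool polarToCartDispatch(PolarToCartHook hook, const float* mag, const float* angle,
                         float* x, float* y, int len, bool angleInDegrees)
{
    assert(hook != nullptr);
    if (hook(mag, angle, x, y, len, angleInDegrees) == HAL_OK)
        return true;
    polarToCartGeneric(mag, angle, x, y, len, angleInDegrees);
    return false;
}
```

Keeping the fallback behind the same entry point is what lets a backend such as RVV replace the whole computation without the caller knowing which path ran.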

Tested through:
```sh
opencv_test_core --gtest_filter="*PolarToCart*:*Core_CartPolar_reverse*" 
opencv_perf_core --gtest_filter="*PolarToCart*" --perf_min_samples=300 --perf_force_samples=300
```

### HAL performance test

***UPDATE***: The current implementation no longer depends on vlen.

**NOTE**: Because of point 4 in the summary above, the `scalar` and `ui` results are measured on the modified code from this PR. The impact of this patch on `scalar` and `ui` is evaluated in the next section, `Effect of Point 4`.

Vlen 256 (Muse Pi):
```
                   Name of Test                     scalar    ui     rvv       ui        rvv    
                                                                               vs         vs    
                                                                             scalar     scalar  
                                                                           (x-factor) (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.315  0.110  0.034     2.85       9.34   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.423  0.163  0.045     2.59       9.34   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   13.695  4.325  1.278     3.17      10.71   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   17.719  7.118  2.105     2.49       8.42   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  40.678  13.114 3.977     3.10      10.23   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  53.124  21.298 6.519     2.49       8.15   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 95.158  29.465 8.894     3.23      10.70   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 119.262 47.743 14.129    2.50       8.44   
```
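The x-factor columns give the speedup relative to the baseline column: baseline time divided by the time of the compared variant, so values above 1 mean faster. Taking the 1920x1080 32FC1 row of the table above as a worked example:

$$
\text{x-factor} = \frac{t_{\text{baseline}}}{t_{\text{new}}}, \qquad
\frac{t_{\text{scalar}}}{t_{\text{rvv}}} = \frac{95.158}{8.894} \approx 10.70, \qquad
\frac{t_{\text{scalar}}}{t_{\text{ui}}} = \frac{95.158}{29.465} \approx 3.23
$$

Ratios recomputed from the rounded table entries can differ slightly from the reported x-factors, especially for the smallest size (e.g. $0.315 / 0.034 \approx 9.26$ versus the reported 9.34).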

### Effect of Point 4

To make `cv::polarToCart` behave the same as `cv::cartToPolar`, the implementation of the former has been moved to where the latter's lives (from `mathfuncs.cpp` to `mathfuncs_core.simd.hpp`).

#### Reason for Changes:

This function computes $x = \text{mag} \times \cos(\text{angle})$ and $y = \text{mag} \times \sin(\text{angle})$. The original implementation first computed $\cos$ and $\sin$, stored the results in the output buffers $x$ and $y$, and then multiplied them by $\text{mag}$.

However, when the function is used in place (one of the output buffers is also an input buffer), the original implementation allocates an extra buffer for the $\sin$ and $\cos$ values so that $\text{mag}$ is not overwritten before it is used. This extra allocation prevents `cv::polarToCart` from working the same way as `cv::cartToPolar`.
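The hazard can be reproduced with a few lines of scalar code. In the sketch below, `polarToCartTwoPass` and `polarToCartTwoPassBuffered` are hypothetical kernels, not OpenCV code: the first follows the compute-then-scale order described above, and the second adds the scratch buffers that the in-place case requires. Calling the first with `x` aliasing `mag` leaves $\cos^2(\text{angle})$ in `x` instead of $\text{mag} \times \cos(\text{angle})$, because pass 1 overwrites `mag` before pass 2 reads it.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical compute-then-scale kernel: pass 1 stores unit vectors in x/y,
// pass 2 scales them by mag. Correct only when x and y do not alias mag.
void polarToCartTwoPass(const float* mag, const float* angle,
                        float* x, float* y, int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; i++)      // pass 1: cos/sin into the outputs
    {
        const float a = angle[i];
        x[i] = std::cos(a);
        y[i] = std::sin(a);
    }
    for (int i = 0; i < len; i++)      // pass 2: scale by the magnitude
    {
        const float m = mag[i];        // already clobbered if x == mag
        x[i] *= m;
        y[i] *= m;
    }
}

// In-place-safe variant: cos/sin live in scratch buffers, so mag is still
// intact when pass 2 reads it.
void polarToCartTwoPassBuffered(const float* mag, const float* angle,
                                float* x, float* y, int len)
{
    assert(len >= 0);
    std::vector<float> c(len), s(len);   // the extra allocation
    for (int i = 0; i < len; i++)
    {
        c[i] = std::cos(angle[i]);
        s[i] = std::sin(angle[i]);
    }
    for (int i = 0; i < len; i++)
    {
        const float m = mag[i];          // read before x[i] may overwrite it
        x[i] = m * c[i];
        y[i] = m * s[i];
    }
}
```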

Therefore, the multiplication by $\text{mag}$ is now performed immediately, without storing intermediate values. Since the original implementation also had AVX2 optimizations, I applied the same optimizations to the AVX2 version of this implementation.
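A minimal scalar sketch of the fused approach, assuming one element at a time (`polarToCartFused` is a hypothetical name, not the function in `mathfuncs_core.simd.hpp`): both inputs of an element are loaded into locals before either output is stored, so `x` or `y` may be the same buffer as `mag` or `angle`, and no scratch allocation is needed.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical fused kernel: for each element, read mag and angle first,
// then write x and y. Exact aliasing of any output with any input is safe.
void polarToCartFused(const float* mag, const float* angle,
                      float* x, float* y, int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; i++)
    {
        const float m = mag[i];
        const float a = angle[i];
        x[i] = m * std::cos(a);
        y[i] = m * std::sin(a);
    }
}
```

A vectorized loop keeps the same property as long as each block of lanes is fully loaded before the corresponding outputs are stored.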

***UPDATE***: The UI path now uses `v_sincos` from #25892. The original implementation had AVX2 optimizations, but it is much slower than the current UI path, so it has been removed; the AVX2 performance results are below. The scalar implementation was not switched to the UI method, because its current approach is faster.

#### Test Result

The `scalar` and `ui` tests were run on a Muse Pi, and the AVX2 test on an Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz.

`scalar` test:
```
                   Name of Test                      orig     pr        pr    
                                                                        vs    
                                                                       orig   
                                                                    (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.333   0.294     1.13   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.385   0.403     0.96   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   14.749  12.343     1.19   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   19.419  16.743     1.16   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  44.155  37.822     1.17   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  62.108  50.358     1.23   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 99.011  85.769     1.15   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 127.740 112.874    1.13   
```

`ui` test:
```
                   Name of Test                      orig     pr        pr    
                                                                        vs    
                                                                       orig   
                                                                    (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.306  0.110     2.77   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.455  0.163     2.79   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   13.381  4.325     3.09   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   21.851  7.118     3.07   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  39.975  13.114    3.05   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  67.006  21.298    3.15   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 90.362  29.465    3.07   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 129.637 47.743    2.72   
```

AVX2 test:
```
                   Name of Test                     orig   pr       pr    
                                                                    vs    
                                                                   orig   
                                                                (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)    0.019 0.009    2.11   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)    0.022 0.013    1.74   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   0.788 0.355    2.22   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   1.102 0.618    1.78   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  2.383 1.042    2.29   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  3.758 2.316    1.62   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 5.577 2.559    2.18   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 9.710 6.424    1.51   
```

A slight performance loss occurs in some cases because the check for whether $\text{mag}$ is `nullptr` is performed on every iteration instead of once per batch. This is done so that the existing `SinCos_32f` function can be reused.
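The trade-off can be illustrated with two hypothetical scalar kernels (`sinCosCheckPerElement` and `sinCosCheckHoisted` are illustrative names; `SinCos_32f` itself is not reproduced here). In this sketch a null `mag` means every magnitude is 1, matching how `cv::polarToCart` treats an empty magnitude input. Both versions produce the same values; the first evaluates the null test inside the loop, the second once per call at the cost of duplicating the loop body. Compilers can sometimes unswitch such loops automatically, so the measured cost depends on the toolchain and on whether the loop is hand-vectorized.

```cpp
#include <cassert>
#include <cmath>

// Null test inside the loop: evaluated once per element.
void sinCosCheckPerElement(const float* mag, const float* angle,
                           float* x, float* y, int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; i++)
    {
        const float m = mag ? mag[i] : 1.f;   // branch on every iteration
        const float a = angle[i];
        x[i] = m * std::cos(a);
        y[i] = m * std::sin(a);
    }
}

// Null test hoisted out of the loop: one branch per call, two loop bodies.
void sinCosCheckHoisted(const float* mag, const float* angle,
                        float* x, float* y, int len)
{
    assert(len >= 0);
    if (mag)
    {
        for (int i = 0; i < len; i++)
        {
            const float m = mag[i];
            const float a = angle[i];
            x[i] = m * std::cos(a);
            y[i] = m * std::sin(a);
        }
    }
    else
    {
        for (int i = 0; i < len; i++)   // unit magnitudes
        {
            const float a = angle[i];
            x[i] = std::cos(a);
            y[i] = std::sin(a);
        }
    }
}
```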

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-17 14:16:09 +03:00

#include "perf_precomp.hpp"

namespace opencv_test
{
using namespace perf;

namespace {

typedef perf::TestBaseWithParam<size_t> VectorLength;

PERF_TEST_P(VectorLength, phase32f, testing::Values(128, 1000, 128*1024, 512*1024, 1024*1024))
{
    size_t length = GetParam();
    vector<float> X(length);
    vector<float> Y(length);
    vector<float> angle(length);

    declare.in(X, Y, WARMUP_RNG).out(angle);

    TEST_CYCLE_N(200) cv::phase(X, Y, angle, true);

    SANITY_CHECK(angle, 5e-5);
}

PERF_TEST_P(VectorLength, phase64f, testing::Values(128, 1000, 128*1024, 512*1024, 1024*1024))
{
    size_t length = GetParam();
    vector<double> X(length);
    vector<double> Y(length);
    vector<double> angle(length);

    declare.in(X, Y, WARMUP_RNG).out(angle);

    TEST_CYCLE_N(200) cv::phase(X, Y, angle, true);

    SANITY_CHECK(angle, 5e-5);
}

///////////// Magnitude /////////////

typedef Size_MatType MagnitudeFixture;

PERF_TEST_P(MagnitudeFixture, Magnitude,
            testing::Combine(testing::Values(TYPICAL_MAT_SIZES), testing::Values(CV_32F, CV_64F)))
{
    cv::Size size = std::get<0>(GetParam());
    int type = std::get<1>(GetParam());

    cv::Mat x(size, type);
    cv::Mat y(size, type);
    cv::Mat magnitude(size, type);

    declare.in(x, y, WARMUP_RNG).out(magnitude);

    TEST_CYCLE() cv::magnitude(x, y, magnitude);

    SANITY_CHECK_NOTHING();
}

///////////// Cart to Polar /////////////

typedef Size_MatType CartToPolarFixture;

PERF_TEST_P(CartToPolarFixture, CartToPolar,
            testing::Combine(testing::Values(TYPICAL_MAT_SIZES), testing::Values(CV_32F, CV_64F)))
{
    cv::Size size = std::get<0>(GetParam());
    int type = std::get<1>(GetParam());

    cv::Mat x(size, type);
    cv::Mat y(size, type);
    cv::Mat magnitude(size, type);
    cv::Mat angle(size, type);

    declare.in(x, y, WARMUP_RNG).out(magnitude, angle);

    TEST_CYCLE() cv::cartToPolar(x, y, magnitude, angle);

    SANITY_CHECK_NOTHING();
}

///////////// Polar to Cart /////////////

typedef Size_MatType PolarToCartFixture;

PERF_TEST_P(PolarToCartFixture, PolarToCart,
            testing::Combine(testing::Values(TYPICAL_MAT_SIZES), testing::Values(CV_32F, CV_64F)))
{
    cv::Size size = std::get<0>(GetParam());
    int type = std::get<1>(GetParam());

    cv::Mat magnitude(size, type);
    cv::Mat angle(size, type);
    cv::Mat x(size, type);
    cv::Mat y(size, type);

    declare.in(magnitude, angle, WARMUP_RNG).out(x, y);

    TEST_CYCLE() cv::polarToCart(magnitude, angle, x, y);

    SANITY_CHECK_NOTHING();
}

// generates random vectors, performs Gram-Schmidt orthogonalization on them
Mat randomOrtho(int rows, int ftype, RNG& rng)
{
    Mat result(rows, rows, ftype);
    rng.fill(result, RNG::UNIFORM, cv::Scalar(-1), cv::Scalar(1));

    for (int i = 0; i < rows; i++)
    {
        Mat v = result.row(i);

        for (int j = 0; j < i; j++)
        {
            Mat p = result.row(j);
            v -= p.dot(v) * p;
        }

        v = v * (1. / cv::norm(v));
    }

    return result;
}

template<typename FType>
Mat buildRandomMat(int rows, int cols, RNG& rng, int rank, bool symmetrical)
{
    int mtype = cv::traits::Depth<FType>::value;
    Mat u = randomOrtho(rows, mtype, rng);
    Mat v = randomOrtho(cols, mtype, rng);
    Mat s(rows, cols, mtype, Scalar(0));

    std::vector<FType> singVals(rank);
    rng.fill(singVals, RNG::UNIFORM, Scalar(0), Scalar(10));
    std::sort(singVals.begin(), singVals.end());
    auto singIter = singVals.rbegin();
    for (int i = 0; i < rank; i++)
    {
        s.at<FType>(i, i) = *singIter++;
    }

    if (symmetrical)
        return u * s * u.t();
    else
        return u * s * v.t();
}

Mat buildRandomMat(int rows, int cols, int mtype, RNG& rng, int rank, bool symmetrical)
{
    if (mtype == CV_32F)
    {
        return buildRandomMat<float>(rows, cols, rng, rank, symmetrical);
    }
    else if (mtype == CV_64F)
    {
        return buildRandomMat<double>(rows, cols, rng, rank, symmetrical);
    }
    else
    {
        CV_Error(cv::Error::StsBadArg, "This type is not supported");
    }
}

CV_ENUM(SolveDecompEnum, DECOMP_LU, DECOMP_SVD, DECOMP_EIG, DECOMP_CHOLESKY, DECOMP_QR)

enum RankMatrixOptions
{
    RANK_HALF, RANK_MINUS_1, RANK_FULL
};

CV_ENUM(RankEnum, RANK_HALF, RANK_MINUS_1, RANK_FULL)

enum SolutionsOptions
{
    NO_SOLUTIONS, ONE_SOLUTION, MANY_SOLUTIONS
};

CV_ENUM(SolutionsEnum, NO_SOLUTIONS, ONE_SOLUTION, MANY_SOLUTIONS)

typedef perf::TestBaseWithParam<std::tuple<int, RankEnum, MatDepth, SolveDecompEnum, bool, SolutionsEnum>> SolveTest;

PERF_TEST_P(SolveTest, randomMat, ::testing::Combine(
    ::testing::Values(31, 64, 100),
    ::testing::Values(RANK_HALF, RANK_MINUS_1, RANK_FULL),
    ::testing::Values(CV_32F, CV_64F),
    ::testing::Values(DECOMP_LU, DECOMP_SVD, DECOMP_EIG, DECOMP_CHOLESKY, DECOMP_QR),
    ::testing::Bool(), // normal
    ::testing::Values(NO_SOLUTIONS, ONE_SOLUTION, MANY_SOLUTIONS)
    ))
{
    auto t = GetParam();
    int size = std::get<0>(t);
    auto rankEnum = std::get<1>(t);
    int mtype = std::get<2>(t);
    int method = std::get<3>(t);
    bool normal = std::get<4>(t);
    auto solutions = std::get<5>(t);

    bool symmetrical = (method == DECOMP_CHOLESKY || method == DECOMP_LU);

    if (normal)
    {
        method |= DECOMP_NORMAL;
    }

    int rank = size;
    switch (rankEnum)
    {
    case RANK_HALF: rank /= 2; break;
    case RANK_MINUS_1: rank -= 1; break;
    default: break;
    }

    RNG& rng = theRNG();
    Mat A = buildRandomMat(size, size, mtype, rng, rank, symmetrical);
    Mat x(size, 1, mtype);
    Mat b(size, 1, mtype);

    switch (solutions)
    {
    // no solutions, let's make b random
    case NO_SOLUTIONS:
    {
        rng.fill(b, RNG::UNIFORM, Scalar(-1), Scalar(1));
    }
    break;
    // exactly 1 solution, let's combine b from A and x
    case ONE_SOLUTION:
    {
        rng.fill(x, RNG::UNIFORM, Scalar(-10), Scalar(10));
        b = A * x;
    }
    break;
    // infinitely many solutions, let's make b zero
    default:
    {
        b = 0;
    }
    break;
    }

    TEST_CYCLE() cv::solve(A, b, x, method);

    SANITY_CHECK_NOTHING();
}

typedef perf::TestBaseWithParam<std::tuple<std::tuple<int, int>, RankEnum, MatDepth, bool, bool>> SvdTest;

PERF_TEST_P(SvdTest, decompose, ::testing::Combine(
    ::testing::Values(std::make_tuple(5, 15), std::make_tuple(32, 32), std::make_tuple(100, 100)),
    ::testing::Values(RANK_HALF, RANK_MINUS_1, RANK_FULL),
    ::testing::Values(CV_32F, CV_64F),
    ::testing::Bool(), // symmetrical
    ::testing::Bool()  // needUV
    ))
{
    auto t = GetParam();
    auto rc = std::get<0>(t);
    auto rankEnum = std::get<1>(t);
    int mtype = std::get<2>(t);
    bool symmetrical = std::get<3>(t);
    bool needUV = std::get<4>(t);

    int rows = std::get<0>(rc);
    int cols = std::get<1>(rc);

    if (symmetrical)
    {
        rows = max(rows, cols);
        cols = rows;
    }

    int rank = std::min(rows, cols);
    switch (rankEnum)
    {
    case RANK_HALF: rank /= 2; break;
    case RANK_MINUS_1: rank -= 1; break;
    default: break;
    }

    int flags = needUV ? 0 : SVD::NO_UV;

    RNG& rng = theRNG();
    Mat A = buildRandomMat(rows, cols, mtype, rng, rank, symmetrical);

    TEST_CYCLE() cv::SVD svd(A, flags);

    SANITY_CHECK_NOTHING();
}

PERF_TEST_P(SvdTest, backSubst, ::testing::Combine(
    ::testing::Values(std::make_tuple(5, 15), std::make_tuple(32, 32), std::make_tuple(100, 100)),
    ::testing::Values(RANK_HALF, RANK_MINUS_1, RANK_FULL),
    ::testing::Values(CV_32F, CV_64F),
    // back substitution works the same regardless of source matrix properties
    ::testing::Values(true),
    // back substitution has no sense without u and v
    ::testing::Values(true) // needUV
    ))
{
    auto t = GetParam();
    auto rc = std::get<0>(t);
    auto rankEnum = std::get<1>(t);
    int mtype = std::get<2>(t);

    int rows = std::get<0>(rc);
    int cols = std::get<1>(rc);

    int rank = std::min(rows, cols);
    switch (rankEnum)
    {
    case RANK_HALF: rank /= 2; break;
    case RANK_MINUS_1: rank -= 1; break;
    default: break;
    }

    RNG& rng = theRNG();
    Mat A = buildRandomMat(rows, cols, mtype, rng, rank, /* symmetrical */ false);
    cv::SVD svd(A);

    // preallocate to not spend time on it during backSubst()
    Mat dst(cols, 1, mtype);
    Mat rhs(rows, 1, mtype);
    rng.fill(rhs, RNG::UNIFORM, Scalar(-10), Scalar(10));

    TEST_CYCLE() svd.backSubst(rhs, dst);

    SANITY_CHECK_NOTHING();
}

typedef perf::TestBaseWithParam< testing::tuple<int, int, int> > KMeans;

PERF_TEST_P_(KMeans, single_iter)
{
    RNG& rng = theRNG();
    const int K = testing::get<0>(GetParam());
    const int dims = testing::get<1>(GetParam());
    const int N = testing::get<2>(GetParam());
    const int attempts = 5;

    Mat data(N, dims, CV_32F);
    rng.fill(data, RNG::UNIFORM, -0.1, 0.1);

    const int N0 = K;
    Mat data0(N0, dims, CV_32F);
    rng.fill(data0, RNG::UNIFORM, -1, 1);

    for (int i = 0; i < N; i++)
    {
        int base = rng.uniform(0, N0);
        cv::add(data0.row(base), data.row(i), data.row(i));
    }

    declare.in(data);

    Mat labels, centers;

    TEST_CYCLE()
    {
        kmeans(data, K, labels, TermCriteria(TermCriteria::MAX_ITER+TermCriteria::EPS, 1, 0),
               attempts, KMEANS_PP_CENTERS, centers);
    }

    SANITY_CHECK_NOTHING();
}

PERF_TEST_P_(KMeans, good)
{
    RNG& rng = theRNG();
    const int K = testing::get<0>(GetParam());
    const int dims = testing::get<1>(GetParam());
    const int N = testing::get<2>(GetParam());
    const int attempts = 5;

    Mat data(N, dims, CV_32F);
    rng.fill(data, RNG::UNIFORM, -0.1, 0.1);

    const int N0 = K;
    Mat data0(N0, dims, CV_32F);
    rng.fill(data0, RNG::UNIFORM, -1, 1);

    for (int i = 0; i < N; i++)
    {
        int base = rng.uniform(0, N0);
        cv::add(data0.row(base), data.row(i), data.row(i));
    }

    declare.in(data);

    Mat labels, centers;

    TEST_CYCLE()
    {
        kmeans(data, K, labels, TermCriteria(TermCriteria::MAX_ITER+TermCriteria::EPS, 30, 0),
               attempts, KMEANS_PP_CENTERS, centers);
    }

    SANITY_CHECK_NOTHING();
}

PERF_TEST_P_(KMeans, with_duplicates)
{
    RNG& rng = theRNG();
    const int K = testing::get<0>(GetParam());
    const int dims = testing::get<1>(GetParam());
    const int N = testing::get<2>(GetParam());
    const int attempts = 5;

    Mat data(N, dims, CV_32F, Scalar::all(0));

    const int N0 = std::max(2, K * 2 / 3);
    Mat data0(N0, dims, CV_32F);
    rng.fill(data0, RNG::UNIFORM, -1, 1);

    for (int i = 0; i < N; i++)
    {
        int base = rng.uniform(0, N0);
        data0.row(base).copyTo(data.row(i));
    }

    declare.in(data);

    Mat labels, centers;

    TEST_CYCLE()
    {
        kmeans(data, K, labels, TermCriteria(TermCriteria::MAX_ITER+TermCriteria::EPS, 30, 0),
               attempts, KMEANS_PP_CENTERS, centers);
    }

    SANITY_CHECK_NOTHING();
}

INSTANTIATE_TEST_CASE_P(/*nothing*/ , KMeans,
    testing::Values(
        // K clusters, dims, N points
        testing::make_tuple(2, 3, 100000),
        testing::make_tuple(4, 3, 500),
        testing::make_tuple(4, 3, 1000),
        testing::make_tuple(4, 3, 10000),
        testing::make_tuple(8, 3, 1000),
        testing::make_tuple(8, 16, 1000),
        testing::make_tuple(8, 64, 1000),
        testing::make_tuple(16, 16, 1000),
        testing::make_tuple(16, 32, 1000),
        testing::make_tuple(32, 16, 1000),
        testing::make_tuple(32, 32, 1000),
        testing::make_tuple(100, 2, 1000)
    )
);

}

} // namespace
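`randomOrtho` produces the orthogonal factors that `buildRandomMat` combines as `u * s * v.t()` (or `u * s * u.t()` in the symmetrical case), with the rank set by how many diagonal entries of `s` are nonzero. Because each row `v` is updated in place inside the inner loop, later projections use the already-reduced row, which is the modified Gram-Schmidt order. A dependency-free sketch of the same row-by-row orthonormalization follows; `Matrix` and `randomOrthoSketch` are hypothetical names, and `std::mt19937` stands in for `cv::RNG`, so the generated values differ from the test's.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // row-major, n x n

// Fills an n x n matrix with uniform values in [-1, 1) and orthonormalizes
// its rows in place (modified Gram-Schmidt, as in randomOrtho).
Matrix randomOrthoSketch(std::size_t n, std::mt19937& gen)
{
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    Matrix m(n, std::vector<double>(n));
    for (auto& row : m)
        for (auto& e : row)
            e = dist(gen);

    for (std::size_t i = 0; i < n; i++)
    {
        std::vector<double>& v = m[i];
        for (std::size_t j = 0; j < i; j++)
        {
            const std::vector<double>& p = m[j];
            double d = 0.0;                      // p . v with the current v
            for (std::size_t k = 0; k < n; k++) d += p[k] * v[k];
            for (std::size_t k = 0; k < n; k++) v[k] -= d * p[k];
        }
        double norm = 0.0;
        for (double e : v) norm += e * e;
        norm = std::sqrt(norm);
        assert(norm > 0.0);  // zero would mean linearly dependent rows
        for (double& e : v) e /= norm;
    }
    return m;
}
```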