@endhtmlonly"
diff --git a/doc/opencv.bib b/doc/opencv.bib
index d44b0f5293..d0661e8d5f 100644
--- a/doc/opencv.bib
+++ b/doc/opencv.bib
@@ -850,12 +850,12 @@
journal = {IEEE Transactions on Robotics and Automation},
title = {Robot sensor calibration: solving AX=XB on the Euclidean group},
year = {1994},
+ month = oct,
volume = {10},
number = {5},
pages = {717-721},
doi = {10.1109/70.326576},
- ISSN = {1042-296X},
- month = {Oct}
+ issn = {1042-296X}
}
@inproceedings{PM03,
author = {P{\'e}rez, Patrick and Gangnet, Michel and Blake, Andrew},
@@ -1051,12 +1051,12 @@
journal = {IEEE Transactions on Robotics and Automation},
title = {A new technique for fully autonomous and efficient 3D robotics hand/eye calibration},
year = {1989},
+ month = jun,
volume = {5},
number = {3},
pages = {345-358},
doi = {10.1109/70.34770},
- ISSN = {1042-296X},
- month = {June}
+ issn = {1042-296X}
}
@inproceedings{UES01,
author = {Uyttendaele, Matthew and Eden, Ashley and Skeliski, R},
@@ -1324,3 +1324,13 @@
pages={5551--5560},
year={2017}
}
+@article{umeyama1991least,
+ title={Least-squares estimation of transformation parameters between two point patterns},
+ author={Umeyama, Shinji},
+ journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
+ volume={13},
+ number={04},
+ pages={376--380},
+ year={1991},
+ publisher={IEEE Computer Society}
+}
diff --git a/doc/py_tutorials/py_feature2d/py_features_harris/py_features_harris.markdown b/doc/py_tutorials/py_feature2d/py_features_harris/py_features_harris.markdown
index e24e692087..60e5686934 100644
--- a/doc/py_tutorials/py_feature2d/py_features_harris/py_features_harris.markdown
+++ b/doc/py_tutorials/py_feature2d/py_features_harris/py_features_harris.markdown
@@ -40,12 +40,12 @@ using **cv.Sobel()**).
Then comes the main part. After this, they created a score, basically an equation, which
determines if a window can contain a corner or not.
-\f[R = det(M) - k(trace(M))^2\f]
+\f[R = \det(M) - k(\operatorname{trace}(M))^2\f]
where
- - \f$det(M) = \lambda_1 \lambda_2\f$
- - \f$trace(M) = \lambda_1 + \lambda_2\f$
- - \f$\lambda_1\f$ and \f$\lambda_2\f$ are the eigenvalues of M
+ - \f$\det(M) = \lambda_1 \lambda_2\f$
+ - \f$\operatorname{trace}(M) = \lambda_1 + \lambda_2\f$
+ - \f$\lambda_1\f$ and \f$\lambda_2\f$ are the eigenvalues of \f$M\f$
So the magnitudes of these eigenvalues decide whether a region is a corner, an edge, or flat.
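The score formula in the hunk above can be sketched in plain Python. This is a minimal illustration, not OpenCV's implementation: the function names, the value `k=0.04`, and the threshold are assumptions chosen for the example, and `M` is taken to be the symmetric 2x2 matrix `[[sxx, sxy], [sxy, syy]]`.

```python
def harris_response(sxx, sxy, syy, k=0.04):
    """Return R = det(M) - k * trace(M)^2 for M = [[sxx, sxy], [sxy, syy]].

    k = 0.04 is an assumed, commonly quoted value, not taken from the tutorial.
    """
    det = sxx * syy - sxy * sxy   # equals lambda_1 * lambda_2
    trace = sxx + syy             # equals lambda_1 + lambda_2
    return det - k * trace * trace


def classify(r, threshold=1e-3):
    """Label a window from its score; the threshold is a hypothetical value."""
    if r > threshold:
        return "corner"   # both eigenvalues large
    if r < -threshold:
        return "edge"     # one eigenvalue much larger than the other
    return "flat"         # both eigenvalues close to zero


print(classify(harris_response(1.0, 0.0, 1.0)))
print(classify(harris_response(1.0, 0.0, 0.0)))
print(classify(harris_response(0.0, 0.0, 0.0)))
```

Working from the determinant and trace avoids computing the eigenvalues explicitly, which is the point of the Harris score.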
diff --git a/doc/py_tutorials/py_feature2d/py_shi_tomasi/py_shi_tomasi.markdown b/doc/py_tutorials/py_feature2d/py_shi_tomasi/py_shi_tomasi.markdown
index 1229581ce6..c5d29493e4 100644
--- a/doc/py_tutorials/py_feature2d/py_shi_tomasi/py_shi_tomasi.markdown
+++ b/doc/py_tutorials/py_feature2d/py_shi_tomasi/py_shi_tomasi.markdown
@@ -20,7 +20,7 @@ Harris Corner Detector. The scoring function in Harris Corner Detector was given
Instead of this, Shi-Tomasi proposed:
-\f[R = min(\lambda_1, \lambda_2)\f]
+\f[R = \min(\lambda_1, \lambda_2)\f]
If it is a greater than a threshold value, it is considered as a corner. If we plot it in
\f$\lambda_1 - \lambda_2\f$ space as we did in Harris Corner Detector, we get an image as below:
@@ -28,7 +28,7 @@ If it is a greater than a threshold value, it is considered as a corner. If we p

From the figure, you can see that only when \f$\lambda_1\f$ and \f$\lambda_2\f$ are above a minimum value,
-\f$\lambda_{min}\f$, it is considered as a corner(green region).
+\f$\lambda_{\min}\f$, it is considered as a corner(green region).
Code
----
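The Shi-Tomasi score from the hunk above can be sketched in plain Python using the closed-form smaller eigenvalue of a symmetric 2x2 matrix. The function names and the `lambda_min` parameter name are assumptions made for this illustration; this is not the code path OpenCV uses internally.

```python
import math


def min_eigenvalue(sxx, sxy, syy):
    """Smaller eigenvalue of the symmetric matrix [[sxx, sxy], [sxy, syy]]."""
    half_trace = 0.5 * (sxx + syy)
    # Half the distance between the two eigenvalues.
    root = math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy * sxy)
    return half_trace - root


def is_corner(sxx, sxy, syy, lambda_min):
    """Shi-Tomasi test: R = min(lambda_1, lambda_2) must exceed lambda_min."""
    return min_eigenvalue(sxx, sxy, syy) > lambda_min


print(min_eigenvalue(2.0, 0.0, 1.0))
print(is_corner(2.0, 0.0, 1.0, lambda_min=0.5))
print(is_corner(1.0, 1.0, 1.0, lambda_min=0.5))
```

The last call is rejected because the matrix has one zero eigenvalue, which corresponds to an edge rather than a corner.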
diff --git a/doc/py_tutorials/py_feature2d/py_sift_intro/py_sift_intro.markdown b/doc/py_tutorials/py_feature2d/py_sift_intro/py_sift_intro.markdown
index dee4df774a..bbbae6a3e6 100644
--- a/doc/py_tutorials/py_feature2d/py_sift_intro/py_sift_intro.markdown
+++ b/doc/py_tutorials/py_feature2d/py_sift_intro/py_sift_intro.markdown
@@ -156,7 +156,7 @@ sift = cv.SIFT_create()
kp, des = sift.detectAndCompute(gray,None)
@endcode
Here kp will be a list of keypoints and des is a numpy array of shape
-\f$Number\_of\_Keypoints \times 128\f$.
+\f$\text{(Number of Keypoints)} \times 128\f$.
So we got keypoints, descriptors etc. Now we want to see how to match keypoints in different images.
That we will learn in coming chapters.
diff --git a/doc/tools/html_functions.py b/doc/tools/html_functions.py
index b76639cea5..204f6d1c1b 100644
--- a/doc/tools/html_functions.py
+++ b/doc/tools/html_functions.py
@@ -107,17 +107,10 @@ def add_signature_to_table(soup, table, signature, language, type):
""" Add a signature to an html table"""
row = soup.new_tag('tr')
row.append(soup.new_tag('td', style='width: 20px;'))
-
- if 'ret' in signature:
- row.append(append(soup.new_tag('td'), signature['ret']))
- row.append(append(soup.new_tag('td'), '='))
- else:
- row.append(soup.new_tag('td')) # return values
- row.append(soup.new_tag('td')) # '='
-
row.append(append(soup.new_tag('td'), signature['name'] + '('))
row.append(append(soup.new_tag('td', **{'class': 'paramname'}), signature['arg']))
- row.append(append(soup.new_tag('td'), ')'))
+ row.append(append(soup.new_tag('td'), ') -> '))
+ row.append(append(soup.new_tag('td'), signature['ret']))
table.append(row)
diff --git a/doc/tutorials/calib3d/real_time_pose/real_time_pose.markdown b/doc/tutorials/calib3d/real_time_pose/real_time_pose.markdown
index 58419f8618..1bba591074 100644
--- a/doc/tutorials/calib3d/real_time_pose/real_time_pose.markdown
+++ b/doc/tutorials/calib3d/real_time_pose/real_time_pose.markdown
@@ -87,7 +87,7 @@ The tutorial consists of two main programs:
The application starts up extracting the ORB features and descriptors from the input image and
then uses the mesh along with the [Möller–Trumbore intersection
- algorithm](http://http://en.wikipedia.org/wiki/M%C3%B6ller%E2%80%93Trumbore_intersection_algorithm/)
+ algorithm](http://en.wikipedia.org/wiki/M%C3%B6ller%E2%80%93Trumbore_intersection_algorithm/)
to compute the 3D coordinates of the found features. Finally, the 3D points and the descriptors
are stored in different lists in a file with YAML format which each row is a different point. The
technical background on how to store the files can be found in the @ref tutorial_file_input_output_with_xml_yml
diff --git a/doc/tutorials/introduction/config_reference/config_reference.markdown b/doc/tutorials/introduction/config_reference/config_reference.markdown
index 1d4f426c8f..438cc70288 100644
--- a/doc/tutorials/introduction/config_reference/config_reference.markdown
+++ b/doc/tutorials/introduction/config_reference/config_reference.markdown
@@ -396,13 +396,14 @@ There are multiple less popular frameworks which can be used to read and write v
### videoio plugins
-Some _videoio_ backends can be built as plugins thus breaking strict dependency on third-party libraries and making them optional at runtime. Following options can be used to control this mechanism:
+Since version 4.1.0, some _videoio_ backends can be built as plugins, which removes the strict dependency on third-party libraries and makes them optional at runtime. The following options control this mechanism:
| Option | Default | Description |
| --------| ------ | ------- |
| `VIDEOIO_ENABLE_PLUGINS` | _ON_ | Enable or disable plugins completely. |
| `VIDEOIO_PLUGIN_LIST` | _empty_ | Comma- or semicolon-separated list of backend names to be compiled as plugins. Supported names are _ffmpeg_, _gstreamer_, _msmf_, _mfx_ and _all_. |
-| `VIDEOIO_ENABLE_STRICT_PLUGIN_CHECK` | _ON_ | Enable strict runtime version check to only allow plugins built with the same version of OpenCV. |
+
+Check @ref tutorial_general_install for standalone plugins build instructions.
## Parallel processing {#tutorial_config_reference_func_core}
@@ -421,6 +422,17 @@ Some of OpenCV algorithms can use multithreading to accelerate processing. OpenC
@note OpenCV can download and build TBB library from GitHub, this functionality can be enabled with the `BUILD_TBB` option.
+### Threading plugins
+
+Since version 4.5.2, OpenCV supports dynamically loaded threading backends. At the moment only a separate compilation process is supported: first build OpenCV with some _default_ parallel backend (e.g. pthreads), then build each plugin and copy the resulting binaries to the _lib_ or _bin_ folder.
+
+| Option | Default | Description |
+| ------ | ------- | ----------- |
+| `PARALLEL_ENABLE_PLUGINS` | _ON_ | Enable plugin support. If this option is disabled, OpenCV will not try to load anything. |
+
+Check @ref tutorial_general_install for standalone plugins build instructions.
+
+
## GUI backends (highgui module) {#tutorial_config_reference_highgui}
OpenCV relies on various GUI libraries for window drawing.
@@ -442,6 +454,18 @@ OpenCV relies on various GUI libraries for window drawing.
OpenGL integration can be used to draw HW-accelerated windows with following backends: GTK, WIN32 and Qt. And enables basic interoperability with OpenGL, see @ref core_opengl and @ref highgui_opengl for details.
+### highgui plugins
+
+Since OpenCV 4.5.3, the GTK backend can be built as a dynamically loaded plugin. The following options control this mechanism:
+
+| Option | Default | Description |
+| --------| ------ | ------- |
+| `HIGHGUI_ENABLE_PLUGINS` | _ON_ | Enable or disable plugins completely. |
+| `HIGHGUI_PLUGIN_LIST` | _empty_ | Comma- or semicolon-separated list of backend names to be compiled as plugins. Supported names are _gtk_, _gtk2_, _gtk3_, and _all_. |
+
+Check @ref tutorial_general_install for standalone plugins build instructions.
+
+
## Deep learning neural networks inference backends and options (dnn module) {#tutorial_config_reference_dnn}
OpenCV have own DNN inference module which have own build-in engine, but can also use other libraries for optimized processing. Multiple backends can be enabled in single build. Selection happens at runtime automatically or manually.
diff --git a/doc/tutorials/introduction/general_install/general_install.markdown b/doc/tutorials/introduction/general_install/general_install.markdown
index e8c93f430e..7b0c5d2b06 100644
--- a/doc/tutorials/introduction/general_install/general_install.markdown
+++ b/doc/tutorials/introduction/general_install/general_install.markdown
@@ -105,7 +105,7 @@ cmake --build
make
```
-## Step 3: Install {#tutorial_general_install_sources_4}
+## (optional) Step 3: Install {#tutorial_general_install_sources_4}
During installation procedure build results and other files from build directory will be copied to the install location. Default installation location is `/usr/local` on UNIX and `C:/Program Files` on Windows. This location can be changed at the configuration step by setting `CMAKE_INSTALL_PREFIX` option. To perform installation run the following command:
```
@@ -117,3 +117,32 @@ This step is optional, OpenCV can be used directly from the build directory.
@note
If the installation root location is a protected system directory, so the installation process must be run with superuser or administrator privileges (e.g. `sudo cmake ...`).
+
+
+## (optional) Step 4: Build plugins {#tutorial_general_install_plugins_4}
+
+It is possible to decouple some of OpenCV's dependencies and make them optional by extracting parts of the code into dynamically loaded plugins. This helps to produce adaptive binary distributions which can work on systems with fewer dependencies and extend functionality just by installing missing libraries. For now, the modules _core_, _videoio_ and _highgui_ support this mechanism for some of their dependencies. In some cases it is possible to build plugins together with OpenCV by setting options like `VIDEOIO_PLUGIN_LIST` or `HIGHGUI_PLUGIN_LIST`; more options related to this scenario can be found in the @ref tutorial_config_reference. In other cases plugins should be built separately in their own build procedure, and this section describes such a standalone build process.
+
+@note It is recommended to use a compiler, configuration and build options compatible with those used for the OpenCV build; otherwise the resulting library can refuse to load or cause other runtime problems. Note that some functionality can be limited or work slower when backends are loaded dynamically, due to the extra barrier between OpenCV and the corresponding third-party library.
+
+The build procedure is similar to the main OpenCV build, but you have to use special CMake projects located in the corresponding subdirectories; these folders can also contain reference scripts and Docker images. It is important to use the `opencv__` name prefix for plugins so that the loader is able to find them. Each supported prefix can be used to load only one library; however, multiple candidates can be probed for a single prefix. For example, you can have _libopencv_videoio_ffmpeg_3.so_ and _libopencv_videoio_ffmpeg_4.so_ plugins, and the first one which can be loaded successfully will occupy the internal slot and stop the probing process. Possible prefixes and project locations are presented in the table below:
+
+| module | backends | location |
+| ------ | -------- | -------- |
+| core | parallel_tbb, parallel_onetbb, parallel_openmp | _opencv/modules/core/misc/plugins_ |
+| highgui | gtk, gtk2, gtk3 | _opencv/modules/highgui/misc/plugins_ |
+| videoio | ffmpeg, gstreamer, intel_mfx, msmf | _opencv/modules/videoio/misc_ |
+
+Example:
+```.sh
+# set-up environment for TBB detection, for example:
+# export TBB_DIR=
+cmake -G \
+ -DOPENCV_PLUGIN_NAME=opencv_core_tbb_ \
+ -DOPENCV_PLUGIN_DESTINATION= \
+ -DCMAKE_BUILD_TYPE= \
+ /modules/core/misc/plugins/parallel_tbb
+cmake --build . --config
+```
+
+@note On Windows, plugins must be linked with an existing OpenCV build. Set the `OpenCV_DIR` environment or CMake variable to the directory containing the _OpenCVConfig.cmake_ file; it can be the OpenCV build directory or some path in the location where you performed the installation.
diff --git a/doc/tutorials/introduction/linux_install/linux_install.markdown b/doc/tutorials/introduction/linux_install/linux_install.markdown
index af06810cdd..46f67ac61c 100644
--- a/doc/tutorials/introduction/linux_install/linux_install.markdown
+++ b/doc/tutorials/introduction/linux_install/linux_install.markdown
@@ -108,7 +108,7 @@ CMake package files will be located in the build root:
## Install
@warning
-Installation process only copies files to predefined locations and do minor patching. Library installed using this method is not integrated into the system package registry and can not be uninstalled automatically. We do not recommend system-wide installation to regular users due to possible conflicts with system packages.
+The installation process only copies files to predefined locations and does minor patching. Installing using this method does not integrate OpenCV into the system package registry and thus, for example, OpenCV cannot be uninstalled automatically. We do not recommend system-wide installation to regular users due to possible conflicts with system packages.
By default OpenCV will be installed to the `/usr/local` directory, all files will be copied to following locations:
* `/usr/local/bin` - executable files
diff --git a/modules/3d/include/opencv2/3d.hpp b/modules/3d/include/opencv2/3d.hpp
index 6984b705a2..7fdd4cc4e4 100644
--- a/modules/3d/include/opencv2/3d.hpp
+++ b/modules/3d/include/opencv2/3d.hpp
@@ -1852,6 +1852,33 @@ CV_EXPORTS_W int estimateAffine3D(InputArray src, InputArray dst,
OutputArray out, OutputArray inliers,
double ransacThreshold = 3, double confidence = 0.99);
+/** @brief Computes an optimal affine transformation between two 3D point sets.
+
+It computes \f$R,c,t\f$ minimizing \f$\sum_{i} \| dst_i - (c \cdot R \cdot src_i + t) \|^2\f$
+where \f$R\f$ is a 3x3 rotation matrix, \f$t\f$ is a 3x1 translation vector and \f$c\f$ is a
+scalar scale value. This is an implementation of the algorithm by Umeyama \cite umeyama1991least .
+The estimated affine transform has a homogeneous scale which is a subclass of affine
+transformations with 7 degrees of freedom. The paired point sets need to comprise at least 3
+points each.
+
+@param src First input 3D point set.
+@param dst Second input 3D point set.
+@param scale If null is passed, the scale parameter c will be assumed to be 1.0.
+Else the pointed-to variable will be set to the optimal scale.
+@param force_rotation If true, the returned rotation will never be a reflection.
+This might be unwanted, e.g. when optimizing a transform between a right- and a
+left-handed coordinate system.
+@return 3D affine transformation matrix \f$3 \times 4\f$ of the form
+\f[T =
+\begin{bmatrix}
+R & t\\
+\end{bmatrix}
+\f]
+
+ */
+CV_EXPORTS_W cv::Mat estimateAffine3D(InputArray src, InputArray dst,
+ CV_OUT double* scale = nullptr, bool force_rotation = true);
+
/** @brief Computes an optimal translation between two 3D point sets.
*
* It computes
diff --git a/modules/3d/src/ptsetreg.cpp b/modules/3d/src/ptsetreg.cpp
index 04f665fdab..6482d872be 100644
--- a/modules/3d/src/ptsetreg.cpp
+++ b/modules/3d/src/ptsetreg.cpp
@@ -899,6 +899,86 @@ int estimateAffine3D(InputArray _from, InputArray _to,
return createRANSACPointSetRegistrator(makePtr(), 4, ransacThreshold, confidence)->run(dFrom, dTo, _out, _inliers);
}
+Mat estimateAffine3D(InputArray _from, InputArray _to,
+ CV_OUT double* _scale, bool force_rotation)
+{
+ CV_INSTRUMENT_REGION();
+ Mat from = _from.getMat(), to = _to.getMat();
+ int count = from.checkVector(3);
+
+ CV_CheckGE(count, 3, "Umeyama algorithm needs at least 3 points for affine transformation estimation.");
+ CV_CheckEQ(to.checkVector(3), count, "Point sets need to have the same size");
+ from = from.reshape(1, count);
+ to = to.reshape(1, count);
+ if(from.type() != CV_64F)
+ from.convertTo(from, CV_64F);
+ if(to.type() != CV_64F)
+ to.convertTo(to, CV_64F);
+
+ const double one_over_n = 1./count;
+
+ const auto colwise_mean = [one_over_n](const Mat& m)
+ {
+ Mat my;
+ reduce(m, my, 0, REDUCE_SUM, CV_64F);
+ return my * one_over_n;
+ };
+
+ const auto demean = [count](const Mat& A, const Mat& mean)
+ {
+ Mat A_centered = Mat::zeros(count, 3, CV_64F);
+ for(int i = 0; i < count; i++)
+ {
+ A_centered.row(i) = A.row(i) - mean;
+ }
+ return A_centered;
+ };
+
+ Mat from_mean = colwise_mean(from);
+ Mat to_mean = colwise_mean(to);
+
+ Mat from_centered = demean(from, from_mean);
+ Mat to_centered = demean(to, to_mean);
+
+ Mat cov = to_centered.t() * from_centered * one_over_n;
+
+ Mat u,d,vt;
+ SVD::compute(cov, d, u, vt, SVD::MODIFY_A | SVD::FULL_UV);
+
+ CV_CheckGE(countNonZero(d), 2, "Points cannot be colinear");
+
+ Mat S = Mat::eye(3, 3, CV_64F);
+ // det(d) can only ever be >=0, so we can always use this here (compared to the original formula by Umeyama)
+ if (force_rotation && (determinant(u) * determinant(vt) < 0))
+ {
+ S.at<double>(2, 2) = -1;
+ }
+ Mat rmat = u*S*vt;
+
+ double scale = 1.0;
+ if (_scale)
+ {
+ double var_from = 0.;
+ scale = 0.;
+ for(int i = 0; i < 3; i++)
+ {
+ var_from += norm(from_centered.col(i), NORM_L2SQR);
+ scale += d.at<double>(i, 0) * S.at<double>(i, i);
+ }
+ double inverse_var = count / var_from;
+ scale *= inverse_var;
+ *_scale = scale;
+ }
+ Mat new_to = scale * rmat * from_mean.t();
+
+ Mat transform;
+ transform.create(3, 4, CV_64F);
+ Mat r_part(transform(Rect(0, 0, 3, 3)));
+ rmat.copyTo(r_part);
+ transform.col(3) = to_mean.t() - new_to;
+ return transform;
+}
+
int estimateTranslation3D(InputArray _from, InputArray _to,
OutputArray _out, OutputArray _inliers,
double ransacThreshold, double confidence)
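The closed-form solution that the new `estimateAffine3D` overload follows can be written out step by step. This is a sketch in the notation of the code above, where \(x_i\) are the rows of `from`, \(y_i\) the rows of `to`, and \(n\) is `count`; the symbol names are chosen here for illustration and are not taken from the source.

```latex
\begin{aligned}
\mu_x &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\mu_y = \frac{1}{n}\sum_{i=1}^{n} y_i
  && \text{(column-wise means)} \\
\sigma_x^2 &= \frac{1}{n}\sum_{i=1}^{n} \lVert x_i - \mu_x \rVert^2
  && \text{(variance of the source set)} \\
\Sigma_{xy} &= \frac{1}{n}\sum_{i=1}^{n} (y_i - \mu_y)(x_i - \mu_x)^{\top}
  = U D V^{\top}
  && \text{(covariance and its SVD)} \\
S &= \operatorname{diag}\bigl(1,\, 1,\, \operatorname{sign}(\det U \cdot \det V^{\top})\bigr)
  && \text{(only when a proper rotation is forced)} \\
R &= U S V^{\top} \\
c &= \frac{\operatorname{tr}(D S)}{\sigma_x^2}
  && \text{(left at } 1 \text{ when no scale is requested)} \\
t &= \mu_y - c\, R\, \mu_x
\end{aligned}
```

With `force_rotation` disabled, \(S\) stays the identity, so \(R = U V^{\top}\) may be a reflection.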
diff --git a/modules/3d/src/solvepnp.cpp b/modules/3d/src/solvepnp.cpp
index 03fb6f88c0..e358557859 100644
--- a/modules/3d/src/solvepnp.cpp
+++ b/modules/3d/src/solvepnp.cpp
@@ -402,7 +402,14 @@ bool solvePnPRansac( InputArray objectPoints, InputArray imagePoints,
Ptr ransac_output;
if (usac::run(model_params, imagePoints, objectPoints, model_params->getRandomGeneratorState(),
ransac_output, cameraMatrix, noArray(), distCoeffs, noArray())) {
- usac::saveMask(inliers, ransac_output->getInliersMask());
+ if (inliers.needed()) {
+ const auto &inliers_mask = ransac_output->getInliersMask();
+ Mat inliers_;
+ for (int i = 0; i < (int)inliers_mask.size(); i++)
+ if (inliers_mask[i])
+ inliers_.push_back(i);
+ inliers_.copyTo(inliers);
+ }
const Mat &model = ransac_output->getModel();
model.col(0).copyTo(rvec);
model.col(1).copyTo(tvec);
diff --git a/modules/3d/src/usac/ransac_solvers.cpp b/modules/3d/src/usac/ransac_solvers.cpp
index 0c7637d582..b7f3e6e0c1 100644
--- a/modules/3d/src/usac/ransac_solvers.cpp
+++ b/modules/3d/src/usac/ransac_solvers.cpp
@@ -408,10 +408,11 @@ int mergePoints (InputArray pts1_, InputArray pts2_, Mat &pts, bool ispnp) {
void saveMask (OutputArray mask, const std::vector &inliers_mask) {
if (mask.needed()) {
const int points_size = (int) inliers_mask.size();
- mask.create(points_size, 1, CV_8U);
- auto * maskptr = mask.getMat().ptr();
+ Mat tmp_mask(points_size, 1, CV_8U);
+ auto * maskptr = tmp_mask.ptr();
for (int i = 0; i < points_size; i++)
maskptr[i] = (uchar) inliers_mask[i];
+ tmp_mask.copyTo(mask);
}
}
void setParameters (Ptr &params, EstimationMethod estimator, const UsacParams &usac_params,
@@ -538,23 +539,26 @@ Mat findEssentialMat (InputArray points1, InputArray points2, InputArray cameraM
bool solvePnPRansac( InputArray objectPoints, InputArray imagePoints,
InputArray cameraMatrix, InputArray distCoeffs, OutputArray rvec, OutputArray tvec,
bool /*useExtrinsicGuess*/, int max_iters, float thr, double conf,
- OutputArray mask, int method) {
+ OutputArray inliers, int method) {
Ptr params;
setParameters(method, params, cameraMatrix.empty() ? EstimationMethod ::P6P : EstimationMethod ::P3P,
- thr, max_iters, conf, mask.needed());
+ thr, max_iters, conf, inliers.needed());
Ptr ransac_output;
if (run(params, imagePoints, objectPoints, params->getRandomGeneratorState(),
ransac_output, cameraMatrix, noArray(), distCoeffs, noArray())) {
- saveMask(mask, ransac_output->getInliersMask());
+ if (inliers.needed()) {
+ const auto &inliers_mask = ransac_output->getInliersMask();
+ Mat inliers_;
+ for (int i = 0; i < (int)inliers_mask.size(); i++)
+ if (inliers_mask[i])
+ inliers_.push_back(i);
+ inliers_.copyTo(inliers);
+ }
const Mat &model = ransac_output->getModel();
model.col(0).copyTo(rvec);
model.col(1).copyTo(tvec);
return true;
}
- if (mask.needed()){
- mask.create(std::max(objectPoints.getMat().rows, objectPoints.getMat().cols), 1, CV_8U);
- mask.setTo(Scalar::all(0));
- }
return false;
}
diff --git a/modules/3d/test/test_affine3d_estimator.cpp b/modules/3d/test/test_affine3d_estimator.cpp
index 521b01ac08..f5a118da5d 100644
--- a/modules/3d/test/test_affine3d_estimator.cpp
+++ b/modules/3d/test/test_affine3d_estimator.cpp
@@ -41,6 +41,7 @@
//M*/
#include "test_precomp.hpp"
+#include "opencv2/core/affine.hpp"
namespace opencv_test { namespace {
@@ -201,4 +202,25 @@ TEST(Calib3d_EstimateAffine3D, regression_16007)
EXPECT_EQ(1, res);
}
+TEST(Calib3d_EstimateAffine3D, umeyama_3_pt)
+{
+ std::vector<cv::Vec3d> points = {{{0.80549149, 0.8225781, 0.79949521},
+ {0.28906756, 0.57158557, 0.9864789},
+ {0.58266182, 0.65474983, 0.25078834}}};
+ cv::Mat R = (cv::Mat_<double>(3,3) << 0.9689135, -0.0232753, 0.2463025,
+ 0.0236362, 0.9997195, 0.0014915,
+ -0.2462682, 0.0043765, 0.9691918);
+ cv::Vec3d t(1., 2., 3.);
+ cv::Affine3d transform(R, t);
+ std::vector<cv::Vec3d> transformed_points(points.size());
+ std::transform(points.begin(), points.end(), transformed_points.begin(), [transform](const cv::Vec3d v){return transform * v;});
+ double scale;
+ cv::Mat trafo_est = estimateAffine3D(points, transformed_points, &scale);
+ Mat R_est(trafo_est(Rect(0, 0, 3, 3)));
+ EXPECT_LE(cvtest::norm(R_est, R, NORM_INF), 1e-6);
+ Vec3d t_est = trafo_est.col(3);
+ EXPECT_LE(cvtest::norm(t_est, t, NORM_INF), 1e-6);
+ EXPECT_NEAR(scale, 1.0, 1e-6);
+}
+
}} // namespace
diff --git a/modules/3d/test/test_usac.cpp b/modules/3d/test/test_usac.cpp
index fb5641bd1e..6e6d8cecf3 100644
--- a/modules/3d/test/test_usac.cpp
+++ b/modules/3d/test/test_usac.cpp
@@ -345,11 +345,16 @@ TEST(usac_P3P, accuracy) {
log(1 - pow(inl_ratio, 3 /* sample size */));
for (auto flag : flags) {
+ std::vector inliers;
cv::Mat rvec, tvec, mask, R, P;
CV_Assert(cv::solvePnPRansac(obj_pts, img_pts, K1, cv::noArray(), rvec, tvec,
- false, (int)max_iters, (float)thr, conf, mask, flag));
+ false, (int)max_iters, (float)thr, conf, inliers, flag));
cv::Rodrigues(rvec, R);
cv::hconcat(K1 * R, K1 * tvec, P);
+ mask.create(pts_size, 1, CV_8U);
+ mask.setTo(Scalar::all(0));
+ for (auto inl : inliers)
+ mask.at(inl) = true;
checkInliersMask(TestSolver ::PnP, inl_size, thr, img_pts, obj_pts, P, mask);
}
}
@@ -416,19 +421,27 @@ TEST(usac_testUsacParams, accuracy) {
// CV_Error(cv::Error::StsError, "Essential matrix estimation failed!");
}
+ std::vector inliers(pts_size);
// P3P
inl_size = generatePoints(rng, pts1, pts2, K1, K2, false, pts_size, TestSolver::PnP,
getInlierRatio(usac_params.maxIterations, 3, usac_params.confidence), 0.01, gt_inliers);
- CV_Assert(cv::solvePnPRansac(pts2, pts1, K1, dist_coeff, rvec, tvec, mask, usac_params));
+ CV_Assert(cv::solvePnPRansac(pts2, pts1, K1, dist_coeff, rvec, tvec, inliers, usac_params));
cv::Rodrigues(rvec, R); cv::hconcat(K1 * R, K1 * tvec, model);
+ mask.create(pts_size, 1, CV_8U);
+ mask.setTo(Scalar::all(0));
+ for (auto inl : inliers)
+ mask.at(inl) = true;
checkInliersMask(TestSolver::PnP, inl_size, usac_params.threshold, pts1, pts2, model, mask);
// P6P
inl_size = generatePoints(rng, pts1, pts2, K1, K2, false, pts_size, TestSolver::PnP,
getInlierRatio(usac_params.maxIterations, 6, usac_params.confidence), 0.1, gt_inliers);
cv::Mat K_est;
- CV_Assert(cv::solvePnPRansac(pts2, pts1, K_est, dist_coeff, rvec, tvec, mask, usac_params));
+ CV_Assert(cv::solvePnPRansac(pts2, pts1, K_est, dist_coeff, rvec, tvec, inliers, usac_params));
cv::Rodrigues(rvec, R); cv::hconcat(K_est * R, K_est * tvec, model);
+ mask.setTo(Scalar::all(0));
+ for (auto inl : inliers)
+ mask.at(inl) = true;
checkInliersMask(TestSolver::PnP, inl_size, usac_params.threshold, pts1, pts2, model, mask);
// Affine2D
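The tests above rebuild a per-point mask from the list of inlier indices that `solvePnPRansac` now returns. The conversion in both directions can be sketched in plain Python; the helper names are hypothetical and plain lists stand in for `cv::Mat`.

```python
def indices_to_mask(inlier_indices, num_points):
    """Build a 0/1 mask with one entry per point from a list of inlier indices."""
    mask = [0] * num_points
    for idx in inlier_indices:
        mask[idx] = 1
    return mask


def mask_to_indices(mask):
    """Collect the positions of the non-zero mask entries, in ascending order."""
    return [i for i, flag in enumerate(mask) if flag]


inliers = [0, 2, 5]
mask = indices_to_mask(inliers, num_points=6)
print(mask)
print(mask_to_indices(mask))
```

The index list is the more compact form when inliers are sparse, while the mask is what a per-point check such as `checkInliersMask` expects.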
diff --git a/modules/calib/include/opencv2/calib.hpp b/modules/calib/include/opencv2/calib.hpp
index efcdd5d9e1..eab7097d1b 100644
--- a/modules/calib/include/opencv2/calib.hpp
+++ b/modules/calib/include/opencv2/calib.hpp
@@ -738,8 +738,8 @@ concatenated together.
@param imageSize Size of the image used only to initialize the camera intrinsic matrix.
@param cameraMatrix Input/output 3x3 floating-point camera intrinsic matrix
\f$\cameramatrix{A}\f$ . If @ref CALIB_USE_INTRINSIC_GUESS
-and/or @ref CALIB_FIX_ASPECT_RATIO are specified, some or all of fx, fy, cx, cy must be
-initialized before calling the function.
+and/or @ref CALIB_FIX_ASPECT_RATIO, @ref CALIB_FIX_PRINCIPAL_POINT or @ref CALIB_FIX_FOCAL_LENGTH
+are specified, some or all of fx, fy, cx, cy must be initialized before calling the function.
@param distCoeffs Input/output vector of distortion coefficients
\f$\distcoeffs\f$.
@param rvecs Output vector of rotation vectors (@ref Rodrigues ) estimated for each pattern view
@@ -765,7 +765,7 @@ the number of pattern views. \f$R_i, T_i\f$ are concatenated 1x3 vectors.
fx, fy, cx, cy that are optimized further. Otherwise, (cx, cy) is initially set to the image
center ( imageSize is used), and focal distances are computed in a least-squares fashion.
Note, that if intrinsic parameters are known, there is no need to use this function just to
-estimate extrinsic parameters. Use solvePnP instead.
+estimate extrinsic parameters. Use @ref solvePnP instead.
- @ref CALIB_FIX_PRINCIPAL_POINT The principal point is not changed during the global
optimization. It stays at the center or at a different location specified when
@ref CALIB_USE_INTRINSIC_GUESS is set too.
@@ -775,24 +775,23 @@ ratio fx/fy stays the same as in the input cameraMatrix . When
ignored, only their ratio is computed and used further.
- @ref CALIB_ZERO_TANGENT_DIST Tangential distortion coefficients \f$(p_1, p_2)\f$ are set
to zeros and stay zero.
+- @ref CALIB_FIX_FOCAL_LENGTH The focal length is not changed during the global optimization if
+ @ref CALIB_USE_INTRINSIC_GUESS is set.
- @ref CALIB_FIX_K1,..., @ref CALIB_FIX_K6 The corresponding radial distortion
coefficient is not changed during the optimization. If @ref CALIB_USE_INTRINSIC_GUESS is
set, the coefficient from the supplied distCoeffs matrix is used. Otherwise, it is set to 0.
- @ref CALIB_RATIONAL_MODEL Coefficients k4, k5, and k6 are enabled. To provide the
backward compatibility, this extra flag should be explicitly specified to make the
-calibration function use the rational model and return 8 coefficients. If the flag is not
-set, the function computes and returns only 5 distortion coefficients.
+calibration function use the rational model and return 8 coefficients or more.
- @ref CALIB_THIN_PRISM_MODEL Coefficients s1, s2, s3 and s4 are enabled. To provide the
backward compatibility, this extra flag should be explicitly specified to make the
-calibration function use the thin prism model and return 12 coefficients. If the flag is not
-set, the function computes and returns only 5 distortion coefficients.
+calibration function use the thin prism model and return 12 coefficients or more.
- @ref CALIB_FIX_S1_S2_S3_S4 The thin prism distortion coefficients are not changed during
the optimization. If @ref CALIB_USE_INTRINSIC_GUESS is set, the coefficient from the
supplied distCoeffs matrix is used. Otherwise, it is set to 0.
- @ref CALIB_TILTED_MODEL Coefficients tauX and tauY are enabled. To provide the
backward compatibility, this extra flag should be explicitly specified to make the
-calibration function use the tilted sensor model and return 14 coefficients. If the flag is not
-set, the function computes and returns only 5 distortion coefficients.
+calibration function use the tilted sensor model and return 14 coefficients.
- @ref CALIB_FIX_TAUX_TAUY The coefficients of the tilted sensor model are not changed during
the optimization. If @ref CALIB_USE_INTRINSIC_GUESS is set, the coefficient from the
supplied distCoeffs matrix is used. Otherwise, it is set to 0.
@@ -817,12 +816,12 @@ The algorithm performs the following steps:
zeros initially unless some of CALIB_FIX_K? are specified.
- Estimate the initial camera pose as if the intrinsic parameters have been already known. This is
- done using solvePnP .
+ done using @ref solvePnP .
- Run the global Levenberg-Marquardt optimization algorithm to minimize the reprojection error,
that is, the total sum of squared distances between the observed feature points imagePoints and
the projected (using the current estimates for camera parameters and the poses) object points
- objectPoints. See projectPoints for details.
+ objectPoints. See @ref projectPoints for details.
@note
If you use a non-square (i.e. non-N-by-N) grid and @ref findChessboardCorners for calibration,
diff --git a/modules/core/include/opencv2/core/cv_cpu_dispatch.h b/modules/core/include/opencv2/core/cv_cpu_dispatch.h
index fe15e51e4e..8365b10ba9 100644
--- a/modules/core/include/opencv2/core/cv_cpu_dispatch.h
+++ b/modules/core/include/opencv2/core/cv_cpu_dispatch.h
@@ -142,6 +142,11 @@
# define CV_NEON 1
#endif
+#if defined(__riscv) && defined(__riscv_vector) && defined(__riscv_vector_071)
+# include
+# define CV_RVV071 1
+#endif
+
#if defined(__ARM_NEON__) || defined(__aarch64__)
# include
#endif
@@ -338,6 +343,10 @@ struct VZeroUpperGuard {
# define CV_NEON 0
#endif
+#ifndef CV_RVV071
+# define CV_RVV071 0
+#endif
+
#ifndef CV_VSX
# define CV_VSX 0
#endif
diff --git a/modules/core/include/opencv2/core/cvdef.h b/modules/core/include/opencv2/core/cvdef.h
index 6a55995fc9..0b3d5328f1 100644
--- a/modules/core/include/opencv2/core/cvdef.h
+++ b/modules/core/include/opencv2/core/cvdef.h
@@ -271,6 +271,8 @@ namespace cv {
#define CV_CPU_MSA 150
+#define CV_CPU_RISCVV 170
+
#define CV_CPU_VSX 200
#define CV_CPU_VSX3 201
@@ -325,6 +327,8 @@ enum CpuFeatures {
CPU_MSA = 150,
+ CPU_RISCVV = 170,
+
CPU_VSX = 200,
CPU_VSX3 = 201,
@@ -681,7 +685,7 @@ __CV_ENUM_FLAGS_BITWISE_XOR_EQ (EnumType, EnumType)
# define CV_XADD(addr, delta) (int)_InterlockedExchangeAdd((long volatile*)addr, delta)
#else
#ifdef OPENCV_FORCE_UNSAFE_XADD
- CV_INLINE CV_XADD(int* addr, int delta) { int tmp = *addr; *addr += delta; return tmp; }
+ CV_INLINE int CV_XADD(int* addr, int delta) { int tmp = *addr; *addr += delta; return tmp; }
#else
#error "OpenCV: can't define safe CV_XADD macro for current platform (unsupported). Define CV_XADD macro through custom port header (see OPENCV_INCLUDE_PORT_FILE)"
#endif
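The `CV_XADD` change above adds the missing `int` return type to the unsafe fallback. As a rough illustration of the contract that fallback has to honour (return the value held before the addition), here is a sketch that places it next to an atomic counterpart. The names `unsafe_xadd` and `atomic_xadd` are hypothetical, and `std::atomic` is used here only for comparison; it is not what the header itself uses.

```cpp
#include <atomic>

// Non-atomic fetch-and-add with the same shape as the unsafe fallback:
// returns the value stored before the addition. Not safe across threads.
inline int unsafe_xadd(int* addr, int delta)
{
    int tmp = *addr;
    *addr += delta;
    return tmp;
}

// Atomic counterpart with the same "return the old value" contract.
inline int atomic_xadd(std::atomic<int>& value, int delta)
{
    return value.fetch_add(delta);
}
```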
diff --git a/modules/core/include/opencv2/core/hal/intrin.hpp b/modules/core/include/opencv2/core/hal/intrin.hpp
index 6f5b8e1788..ac331f2154 100644
--- a/modules/core/include/opencv2/core/hal/intrin.hpp
+++ b/modules/core/include/opencv2/core/hal/intrin.hpp
@@ -200,7 +200,7 @@ using namespace CV_CPU_OPTIMIZATION_HAL_NAMESPACE;
# undef CV_RVV
#endif
-#if (CV_SSE2 || CV_NEON || CV_VSX || CV_MSA || CV_WASM_SIMD || CV_RVV) && !defined(CV_FORCE_SIMD128_CPP)
+#if (CV_SSE2 || CV_NEON || CV_VSX || CV_MSA || CV_WASM_SIMD || CV_RVV071 || CV_RVV) && !defined(CV_FORCE_SIMD128_CPP)
#define CV__SIMD_FORWARD 128
#include "opencv2/core/hal/intrin_forward.hpp"
#endif
@@ -214,6 +214,10 @@ using namespace CV_CPU_OPTIMIZATION_HAL_NAMESPACE;
#include "opencv2/core/hal/intrin_neon.hpp"
+#elif CV_RVV071 && !defined(CV_FORCE_SIMD128_CPP)
+#define CV_SIMD128_CPP 0
+#include "opencv2/core/hal/intrin_rvv071.hpp"
+
#elif CV_VSX && !defined(CV_FORCE_SIMD128_CPP)
#include "opencv2/core/hal/intrin_vsx.hpp"
diff --git a/modules/core/include/opencv2/core/hal/intrin_neon.hpp b/modules/core/include/opencv2/core/hal/intrin_neon.hpp
index 785648575a..e17972a3fc 100644
--- a/modules/core/include/opencv2/core/hal/intrin_neon.hpp
+++ b/modules/core/include/opencv2/core/hal/intrin_neon.hpp
@@ -538,49 +538,81 @@ inline void v_mul_expand(const v_int8x16& a, const v_int8x16& b,
v_int16x8& c, v_int16x8& d)
{
c.val = vmull_s8(vget_low_s8(a.val), vget_low_s8(b.val));
+#if CV_NEON_AARCH64
+ d.val = vmull_high_s8(a.val, b.val);
+#else // #if CV_NEON_AARCH64
d.val = vmull_s8(vget_high_s8(a.val), vget_high_s8(b.val));
+#endif // #if CV_NEON_AARCH64
}
inline void v_mul_expand(const v_uint8x16& a, const v_uint8x16& b,
v_uint16x8& c, v_uint16x8& d)
{
c.val = vmull_u8(vget_low_u8(a.val), vget_low_u8(b.val));
+#if CV_NEON_AARCH64
+ d.val = vmull_high_u8(a.val, b.val);
+#else // #if CV_NEON_AARCH64
d.val = vmull_u8(vget_high_u8(a.val), vget_high_u8(b.val));
+#endif // #if CV_NEON_AARCH64
}
inline void v_mul_expand(const v_int16x8& a, const v_int16x8& b,
v_int32x4& c, v_int32x4& d)
{
c.val = vmull_s16(vget_low_s16(a.val), vget_low_s16(b.val));
+#if CV_NEON_AARCH64
+ d.val = vmull_high_s16(a.val, b.val);
+#else // #if CV_NEON_AARCH64
d.val = vmull_s16(vget_high_s16(a.val), vget_high_s16(b.val));
+#endif // #if CV_NEON_AARCH64
}
inline void v_mul_expand(const v_uint16x8& a, const v_uint16x8& b,
v_uint32x4& c, v_uint32x4& d)
{
c.val = vmull_u16(vget_low_u16(a.val), vget_low_u16(b.val));
+#if CV_NEON_AARCH64
+ d.val = vmull_high_u16(a.val, b.val);
+#else // #if CV_NEON_AARCH64
d.val = vmull_u16(vget_high_u16(a.val), vget_high_u16(b.val));
+#endif // #if CV_NEON_AARCH64
}
inline void v_mul_expand(const v_uint32x4& a, const v_uint32x4& b,
v_uint64x2& c, v_uint64x2& d)
{
c.val = vmull_u32(vget_low_u32(a.val), vget_low_u32(b.val));
+#if CV_NEON_AARCH64
+ d.val = vmull_high_u32(a.val, b.val);
+#else // #if CV_NEON_AARCH64
d.val = vmull_u32(vget_high_u32(a.val), vget_high_u32(b.val));
+#endif // #if CV_NEON_AARCH64
}
inline v_int16x8 v_mul_hi(const v_int16x8& a, const v_int16x8& b)
{
return v_int16x8(vcombine_s16(
vshrn_n_s32(vmull_s16( vget_low_s16(a.val), vget_low_s16(b.val)), 16),
- vshrn_n_s32(vmull_s16(vget_high_s16(a.val), vget_high_s16(b.val)), 16)
+ vshrn_n_s32(
+#if CV_NEON_AARCH64
+ vmull_high_s16(a.val, b.val)
+#else // #if CV_NEON_AARCH64
+ vmull_s16(vget_high_s16(a.val), vget_high_s16(b.val))
+#endif // #if CV_NEON_AARCH64
+ , 16)
));
}
inline v_uint16x8 v_mul_hi(const v_uint16x8& a, const v_uint16x8& b)
{
return v_uint16x8(vcombine_u16(
vshrn_n_u32(vmull_u16( vget_low_u16(a.val), vget_low_u16(b.val)), 16),
- vshrn_n_u32(vmull_u16(vget_high_u16(a.val), vget_high_u16(b.val)), 16)
+ vshrn_n_u32(
+#if CV_NEON_AARCH64
+ vmull_high_u16(a.val, b.val)
+#else // #if CV_NEON_AARCH64
+ vmull_u16(vget_high_u16(a.val), vget_high_u16(b.val))
+#endif // #if CV_NEON_AARCH64
+ , 16)
));
}
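Both branches of the `v_mul_expand` and `v_mul_hi` changes above are meant to produce the same lanes; the AArch64 branch only swaps `vget_high` plus `vmull` for a single `vmull_high`. A scalar reference for the unsigned 16-bit case can help when checking either branch. This is a sketch under the assumption of eight `uint16_t` lanes; the `_ref` function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar reference for v_mul_expand on 8 x uint16 lanes:
// lanes 0..3 widen into lo[], lanes 4..7 (the "high half") into hi[].
inline void mul_expand_ref(const uint16_t a[8], const uint16_t b[8],
                           uint32_t lo[4], uint32_t hi[4])
{
    for (std::size_t i = 0; i < 4; ++i)
    {
        lo[i] = static_cast<uint32_t>(a[i]) * b[i];
        hi[i] = static_cast<uint32_t>(a[i + 4]) * b[i + 4];
    }
}

// Scalar reference for v_mul_hi: upper 16 bits of each 32-bit product.
inline void mul_hi_ref(const uint16_t a[8], const uint16_t b[8], uint16_t out[8])
{
    for (std::size_t i = 0; i < 8; ++i)
    {
        // Widen first so the product cannot overflow a signed int.
        uint32_t p = static_cast<uint32_t>(a[i]) * b[i];
        out[i] = static_cast<uint16_t>(p >> 16);
    }
}
```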
@@ -1254,29 +1286,56 @@ OPENCV_HAL_IMPL_NEON_LOADSTORE_OP(v_float64x2, double, f64)
inline unsigned v_reduce_sum(const v_uint8x16& a)
{
+#if CV_NEON_AARCH64
+ uint16_t t0 = vaddlvq_u8(a.val);
+ return t0;
+#else // #if CV_NEON_AARCH64
uint32x4_t t0 = vpaddlq_u16(vpaddlq_u8(a.val));
uint32x2_t t1 = vpadd_u32(vget_low_u32(t0), vget_high_u32(t0));
return vget_lane_u32(vpadd_u32(t1, t1), 0);
+#endif // #if CV_NEON_AARCH64
}
inline int v_reduce_sum(const v_int8x16& a)
{
+#if CV_NEON_AARCH64
+ int16_t t0 = vaddlvq_s8(a.val);
+ return t0;
+#else // #if CV_NEON_AARCH64
int32x4_t t0 = vpaddlq_s16(vpaddlq_s8(a.val));
int32x2_t t1 = vpadd_s32(vget_low_s32(t0), vget_high_s32(t0));
return vget_lane_s32(vpadd_s32(t1, t1), 0);
+#endif // #if CV_NEON_AARCH64
}
inline unsigned v_reduce_sum(const v_uint16x8& a)
{
+#if CV_NEON_AARCH64
+ uint32_t t0 = vaddlvq_u16(a.val);
+ return t0;
+#else // #if CV_NEON_AARCH64
uint32x4_t t0 = vpaddlq_u16(a.val);
uint32x2_t t1 = vpadd_u32(vget_low_u32(t0), vget_high_u32(t0));
return vget_lane_u32(vpadd_u32(t1, t1), 0);
+#endif // #if CV_NEON_AARCH64
}
inline int v_reduce_sum(const v_int16x8& a)
{
+#if CV_NEON_AARCH64
+ int32_t t0 = vaddlvq_s16(a.val);
+ return t0;
+#else // #if CV_NEON_AARCH64
int32x4_t t0 = vpaddlq_s16(a.val);
int32x2_t t1 = vpadd_s32(vget_low_s32(t0), vget_high_s32(t0));
return vget_lane_s32(vpadd_s32(t1, t1), 0);
+#endif // #if CV_NEON_AARCH64
}
+#if CV_NEON_AARCH64
+#define OPENCV_HAL_IMPL_NEON_REDUCE_OP_16(_Tpvec, _Tpnvec, scalartype, func, vectorfunc, suffix) \
+inline scalartype v_reduce_##func(const _Tpvec& a) \
+{ \
+ return v##vectorfunc##vq_##suffix(a.val); \
+}
+#else // #if CV_NEON_AARCH64
#define OPENCV_HAL_IMPL_NEON_REDUCE_OP_16(_Tpvec, _Tpnvec, scalartype, func, vectorfunc, suffix) \
inline scalartype v_reduce_##func(const _Tpvec& a) \
{ \
@@ -1285,12 +1344,20 @@ inline scalartype v_reduce_##func(const _Tpvec& a) \
a0 = vp##vectorfunc##_##suffix(a0, a0); \
return (scalartype)vget_lane_##suffix(vp##vectorfunc##_##suffix(a0, a0),0); \
}
+#endif // #if CV_NEON_AARCH64
OPENCV_HAL_IMPL_NEON_REDUCE_OP_16(v_uint8x16, uint8x8, uchar, max, max, u8)
OPENCV_HAL_IMPL_NEON_REDUCE_OP_16(v_uint8x16, uint8x8, uchar, min, min, u8)
OPENCV_HAL_IMPL_NEON_REDUCE_OP_16(v_int8x16, int8x8, schar, max, max, s8)
OPENCV_HAL_IMPL_NEON_REDUCE_OP_16(v_int8x16, int8x8, schar, min, min, s8)
+#if CV_NEON_AARCH64
+#define OPENCV_HAL_IMPL_NEON_REDUCE_OP_8(_Tpvec, _Tpnvec, scalartype, func, vectorfunc, suffix) \
+inline scalartype v_reduce_##func(const _Tpvec& a) \
+{ \
+ return v##vectorfunc##vq_##suffix(a.val); \
+}
+#else // #if CV_NEON_AARCH64
#define OPENCV_HAL_IMPL_NEON_REDUCE_OP_8(_Tpvec, _Tpnvec, scalartype, func, vectorfunc, suffix) \
inline scalartype v_reduce_##func(const _Tpvec& a) \
{ \
@@ -1298,18 +1365,27 @@ inline scalartype v_reduce_##func(const _Tpvec& a) \
a0 = vp##vectorfunc##_##suffix(a0, a0); \
return (scalartype)vget_lane_##suffix(vp##vectorfunc##_##suffix(a0, a0),0); \
}
+#endif // #if CV_NEON_AARCH64
OPENCV_HAL_IMPL_NEON_REDUCE_OP_8(v_uint16x8, uint16x4, ushort, max, max, u16)
OPENCV_HAL_IMPL_NEON_REDUCE_OP_8(v_uint16x8, uint16x4, ushort, min, min, u16)
OPENCV_HAL_IMPL_NEON_REDUCE_OP_8(v_int16x8, int16x4, short, max, max, s16)
OPENCV_HAL_IMPL_NEON_REDUCE_OP_8(v_int16x8, int16x4, short, min, min, s16)
+#if CV_NEON_AARCH64
+#define OPENCV_HAL_IMPL_NEON_REDUCE_OP_4(_Tpvec, _Tpnvec, scalartype, func, vectorfunc, suffix) \
+inline scalartype v_reduce_##func(const _Tpvec& a) \
+{ \
+ return v##vectorfunc##vq_##suffix(a.val); \
+}
+#else // #if CV_NEON_AARCH64
#define OPENCV_HAL_IMPL_NEON_REDUCE_OP_4(_Tpvec, _Tpnvec, scalartype, func, vectorfunc, suffix) \
inline scalartype v_reduce_##func(const _Tpvec& a) \
{ \
_Tpnvec##_t a0 = vp##vectorfunc##_##suffix(vget_low_##suffix(a.val), vget_high_##suffix(a.val)); \
return (scalartype)vget_lane_##suffix(vp##vectorfunc##_##suffix(a0, vget_high_##suffix(a.val)),0); \
}
+#endif // #if CV_NEON_AARCH64
OPENCV_HAL_IMPL_NEON_REDUCE_OP_4(v_uint32x4, uint32x2, unsigned, sum, add, u32)
OPENCV_HAL_IMPL_NEON_REDUCE_OP_4(v_uint32x4, uint32x2, unsigned, max, max, u32)
@@ -1322,9 +1398,21 @@ OPENCV_HAL_IMPL_NEON_REDUCE_OP_4(v_float32x4, float32x2, float, max, max, f32)
OPENCV_HAL_IMPL_NEON_REDUCE_OP_4(v_float32x4, float32x2, float, min, min, f32)
inline uint64 v_reduce_sum(const v_uint64x2& a)
-{ return vget_lane_u64(vadd_u64(vget_low_u64(a.val), vget_high_u64(a.val)),0); }
+{
+#if CV_NEON_AARCH64
+ return vaddvq_u64(a.val);
+#else // #if CV_NEON_AARCH64
+ return vget_lane_u64(vadd_u64(vget_low_u64(a.val), vget_high_u64(a.val)),0);
+#endif // #if CV_NEON_AARCH64
+}
inline int64 v_reduce_sum(const v_int64x2& a)
-{ return vget_lane_s64(vadd_s64(vget_low_s64(a.val), vget_high_s64(a.val)),0); }
+{
+#if CV_NEON_AARCH64
+ return vaddvq_s64(a.val);
+#else // #if CV_NEON_AARCH64
+ return vget_lane_s64(vadd_s64(vget_low_s64(a.val), vget_high_s64(a.val)),0);
+#endif // #if CV_NEON_AARCH64
+}
#if CV_SIMD128_64F
inline double v_reduce_sum(const v_float64x2& a)
{
@@ -1335,6 +1423,11 @@ inline double v_reduce_sum(const v_float64x2& a)
inline v_float32x4 v_reduce_sum4(const v_float32x4& a, const v_float32x4& b,
const v_float32x4& c, const v_float32x4& d)
{
+#if CV_NEON_AARCH64
+ float32x4_t ab = vpaddq_f32(a.val, b.val); // a0+a1 a2+a3 b0+b1 b2+b3
+ float32x4_t cd = vpaddq_f32(c.val, d.val); // c0+c1 c2+c3 d0+d1 d2+d3
+ return v_float32x4(vpaddq_f32(ab, cd)); // sumA sumB sumC sumD
+#else // #if CV_NEON_AARCH64
float32x4x2_t ab = vtrnq_f32(a.val, b.val);
float32x4x2_t cd = vtrnq_f32(c.val, d.val);
@@ -1345,49 +1438,91 @@ inline v_float32x4 v_reduce_sum4(const v_float32x4& a, const v_float32x4& b,
float32x4_t v1 = vcombine_f32(vget_high_f32(u0), vget_high_f32(u1));
return v_float32x4(vaddq_f32(v0, v1));
+#endif // #if CV_NEON_AARCH64
}
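The AArch64 branch of `v_reduce_sum4` relies on pairwise addition: each `vpaddq_f32(x, y)` yields the adjacent-pair sums of `x` followed by those of `y`. A small scalar model of that lane order is sketched below; `paddq_model` and `reduce_sum4_model` are hypothetical helper names, not NEON or OpenCV functions.

```cpp
#include <array>

using F4 = std::array<float, 4>;

// Model of a pairwise add over two 4-lane vectors:
// {x0+x1, x2+x3, y0+y1, y2+y3}.
inline F4 paddq_model(const F4& x, const F4& y)
{
    return {{x[0] + x[1], x[2] + x[3], y[0] + y[1], y[2] + y[3]}};
}

// Two levels of pairwise adds give {sum(a), sum(b), sum(c), sum(d)}.
inline F4 reduce_sum4_model(const F4& a, const F4& b, const F4& c, const F4& d)
{
    F4 ab = paddq_model(a, b); // a0+a1 a2+a3 b0+b1 b2+b3
    F4 cd = paddq_model(c, d); // c0+c1 c2+c3 d0+d1 d2+d3
    return paddq_model(ab, cd);
}
```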
inline unsigned v_reduce_sad(const v_uint8x16& a, const v_uint8x16& b)
{
+#if CV_NEON_AARCH64
+ uint8x16_t t0 = vabdq_u8(a.val, b.val);
+ uint16_t t1 = vaddlvq_u8(t0);
+ return t1;
+#else // #if CV_NEON_AARCH64
uint32x4_t t0 = vpaddlq_u16(vpaddlq_u8(vabdq_u8(a.val, b.val)));
uint32x2_t t1 = vpadd_u32(vget_low_u32(t0), vget_high_u32(t0));
return vget_lane_u32(vpadd_u32(t1, t1), 0);
+#endif // #if CV_NEON_AARCH64
}
inline unsigned v_reduce_sad(const v_int8x16& a, const v_int8x16& b)
{
+#if CV_NEON_AARCH64
+ uint8x16_t t0 = vreinterpretq_u8_s8(vabdq_s8(a.val, b.val));
+ uint16_t t1 = vaddlvq_u8(t0);
+ return t1;
+#else // #if CV_NEON_AARCH64
uint32x4_t t0 = vpaddlq_u16(vpaddlq_u8(vreinterpretq_u8_s8(vabdq_s8(a.val, b.val))));
uint32x2_t t1 = vpadd_u32(vget_low_u32(t0), vget_high_u32(t0));
return vget_lane_u32(vpadd_u32(t1, t1), 0);
+#endif // #if CV_NEON_AARCH64
}
inline unsigned v_reduce_sad(const v_uint16x8& a, const v_uint16x8& b)
{
+#if CV_NEON_AARCH64
+ uint16x8_t t0 = vabdq_u16(a.val, b.val);
+ uint32_t t1 = vaddlvq_u16(t0);
+ return t1;
+#else // #if CV_NEON_AARCH64
uint32x4_t t0 = vpaddlq_u16(vabdq_u16(a.val, b.val));
uint32x2_t t1 = vpadd_u32(vget_low_u32(t0), vget_high_u32(t0));
return vget_lane_u32(vpadd_u32(t1, t1), 0);
+#endif // #if CV_NEON_AARCH64
}
inline unsigned v_reduce_sad(const v_int16x8& a, const v_int16x8& b)
{
+#if CV_NEON_AARCH64
+ uint16x8_t t0 = vreinterpretq_u16_s16(vabdq_s16(a.val, b.val));
+ uint32_t t1 = vaddlvq_u16(t0);
+ return t1;
+#else // #if CV_NEON_AARCH64
uint32x4_t t0 = vpaddlq_u16(vreinterpretq_u16_s16(vabdq_s16(a.val, b.val)));
uint32x2_t t1 = vpadd_u32(vget_low_u32(t0), vget_high_u32(t0));
return vget_lane_u32(vpadd_u32(t1, t1), 0);
+#endif // #if CV_NEON_AARCH64
}
inline unsigned v_reduce_sad(const v_uint32x4& a, const v_uint32x4& b)
{
+#if CV_NEON_AARCH64
+ uint32x4_t t0 = vabdq_u32(a.val, b.val);
+ uint32_t t1 = vaddvq_u32(t0);
+ return t1;
+#else // #if CV_NEON_AARCH64
uint32x4_t t0 = vabdq_u32(a.val, b.val);
uint32x2_t t1 = vpadd_u32(vget_low_u32(t0), vget_high_u32(t0));
return vget_lane_u32(vpadd_u32(t1, t1), 0);
+#endif // #if CV_NEON_AARCH64
}
inline unsigned v_reduce_sad(const v_int32x4& a, const v_int32x4& b)
{
+#if CV_NEON_AARCH64
+ uint32x4_t t0 = vreinterpretq_u32_s32(vabdq_s32(a.val, b.val));
+ uint32_t t1 = vaddvq_u32(t0);
+ return t1;
+#else // #if CV_NEON_AARCH64
uint32x4_t t0 = vreinterpretq_u32_s32(vabdq_s32(a.val, b.val));
uint32x2_t t1 = vpadd_u32(vget_low_u32(t0), vget_high_u32(t0));
return vget_lane_u32(vpadd_u32(t1, t1), 0);
+#endif // #if CV_NEON_AARCH64
}
inline float v_reduce_sad(const v_float32x4& a, const v_float32x4& b)
{
+#if CV_NEON_AARCH64
+ float32x4_t t0 = vabdq_f32(a.val, b.val);
+ return vaddvq_f32(t0);
+#else // #if CV_NEON_AARCH64
float32x4_t t0 = vabdq_f32(a.val, b.val);
float32x2_t t1 = vpadd_f32(vget_low_f32(t0), vget_high_f32(t0));
return vget_lane_f32(vpadd_f32(t1, t1), 0);
+#endif // #if CV_NEON_AARCH64
}
inline v_uint8x16 v_popcount(const v_uint8x16& a)
@@ -1409,30 +1544,54 @@ inline v_uint64x2 v_popcount(const v_int64x2& a)
inline int v_signmask(const v_uint8x16& a)
{
+#if CV_NEON_AARCH64
+ const int8x16_t signPosition = {0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7};
+ const uint8x16_t byteOrder = {0,8,1,9,2,10,3,11,4,12,5,13,6,14,7,15};
+ uint8x16_t v0 = vshlq_u8(vshrq_n_u8(a.val, 7), signPosition);
+ uint8x16_t v1 = vqtbl1q_u8(v0, byteOrder);
+ uint32_t t0 = vaddlvq_u16(vreinterpretq_u16_u8(v1));
+ return t0;
+#else // #if CV_NEON_AARCH64
int8x8_t m0 = vcreate_s8(CV_BIG_UINT(0x0706050403020100));
uint8x16_t v0 = vshlq_u8(vshrq_n_u8(a.val, 7), vcombine_s8(m0, m0));
uint64x2_t v1 = vpaddlq_u32(vpaddlq_u16(vpaddlq_u8(v0)));
return (int)vgetq_lane_u64(v1, 0) + ((int)vgetq_lane_u64(v1, 1) << 8);
+#endif // #if CV_NEON_AARCH64
}
+
inline int v_signmask(const v_int8x16& a)
{ return v_signmask(v_reinterpret_as_u8(a)); }
inline int v_signmask(const v_uint16x8& a)
{
+#if CV_NEON_AARCH64
+ const int16x8_t signPosition = {0,1,2,3,4,5,6,7};
+ uint16x8_t v0 = vshlq_u16(vshrq_n_u16(a.val, 15), signPosition);
+ uint32_t t0 = vaddlvq_u16(v0);
+ return t0;
+#else // #if CV_NEON_AARCH64
int16x4_t m0 = vcreate_s16(CV_BIG_UINT(0x0003000200010000));
uint16x8_t v0 = vshlq_u16(vshrq_n_u16(a.val, 15), vcombine_s16(m0, m0));
uint64x2_t v1 = vpaddlq_u32(vpaddlq_u16(v0));
return (int)vgetq_lane_u64(v1, 0) + ((int)vgetq_lane_u64(v1, 1) << 4);
+#endif // #if CV_NEON_AARCH64
}
inline int v_signmask(const v_int16x8& a)
{ return v_signmask(v_reinterpret_as_u16(a)); }
inline int v_signmask(const v_uint32x4& a)
{
+#if CV_NEON_AARCH64
+ const int32x4_t signPosition = {0,1,2,3};
+ uint32x4_t v0 = vshlq_u32(vshrq_n_u32(a.val, 31), signPosition);
+ uint32_t t0 = vaddvq_u32(v0);
+ return t0;
+#else // #if CV_NEON_AARCH64
int32x2_t m0 = vcreate_s32(CV_BIG_UINT(0x0000000100000000));
uint32x4_t v0 = vshlq_u32(vshrq_n_u32(a.val, 31), vcombine_s32(m0, m0));
uint64x2_t v1 = vpaddlq_u32(v0);
return (int)vgetq_lane_u64(v1, 0) + ((int)vgetq_lane_u64(v1, 1) << 2);
+#endif // #if CV_NEON_AARCH64
}
inline int v_signmask(const v_int32x4& a)
{ return v_signmask(v_reinterpret_as_u32(a)); }
@@ -1440,9 +1599,16 @@ inline int v_signmask(const v_float32x4& a)
{ return v_signmask(v_reinterpret_as_u32(a)); }
inline int v_signmask(const v_uint64x2& a)
{
+#if CV_NEON_AARCH64
+ const int64x2_t signPosition = {0,1};
+ uint64x2_t v0 = vshlq_u64(vshrq_n_u64(a.val, 63), signPosition);
+ uint64_t t0 = vaddvq_u64(v0);
+ return t0;
+#else // #if CV_NEON_AARCH64
int64x1_t m0 = vdup_n_s64(0);
uint64x2_t v0 = vshlq_u64(vshrq_n_u64(a.val, 63), vcombine_s64(m0, m0));
return (int)vgetq_lane_u64(v0, 0) + ((int)vgetq_lane_u64(v0, 1) << 1);
+#endif // #if CV_NEON_AARCH64
}
inline int v_signmask(const v_int64x2& a)
{ return v_signmask(v_reinterpret_as_u64(a)); }
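For the 8-bit case, the AArch64 `v_signmask` path shifts each sign bit into position, reorders bytes with a table lookup, and then sums the vector as 16-bit lanes. The sketch below models those three steps in plain C++ and compares them with the direct definition (bit `i` of the result is the top bit of lane `i`). Little-endian lane layout is assumed, and both function names are hypothetical.

```cpp
#include <cstdint>

// Direct definition: bit i of the mask is the most significant bit of lane i.
inline int signmask_ref(const uint8_t lanes[16])
{
    int mask = 0;
    for (int i = 0; i < 16; ++i)
        mask |= (lanes[i] >> 7) << i;
    return mask;
}

// Step-by-step model of the shift / table-lookup / widening-add sequence.
inline int signmask_model(const uint8_t lanes[16])
{
    static const int signPosition[16] = {0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7};
    static const int byteOrder[16] = {0,8,1,9,2,10,3,11,4,12,5,13,6,14,7,15};
    uint8_t v0[16], v1[16];
    for (int i = 0; i < 16; ++i)   // isolate the sign bit, move it to bit (i % 8)
        v0[i] = static_cast<uint8_t>((lanes[i] >> 7) << signPosition[i]);
    for (int i = 0; i < 16; ++i)   // byte permutation: pair lane i with lane i + 8
        v1[i] = v0[byteOrder[i]];
    uint32_t sum = 0;
    for (int i = 0; i < 8; ++i)    // read as little-endian u16 lanes and add them up
        sum += static_cast<uint32_t>(v1[2 * i]) |
               (static_cast<uint32_t>(v1[2 * i + 1]) << 8);
    return static_cast<int>(sum);
}
```

Because every 16-bit lane contributes a distinct pair of bits, the sum never carries and equals the bitwise OR of the lanes.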
@@ -1464,19 +1630,31 @@ inline int v_scan_forward(const v_uint64x2& a) { return trailingZeros32(v_signma
inline int v_scan_forward(const v_float64x2& a) { return trailingZeros32(v_signmask(a)); }
#endif
-#define OPENCV_HAL_IMPL_NEON_CHECK_ALLANY(_Tpvec, suffix, shift) \
-inline bool v_check_all(const v_##_Tpvec& a) \
-{ \
- _Tpvec##_t v0 = vshrq_n_##suffix(vmvnq_##suffix(a.val), shift); \
- uint64x2_t v1 = vreinterpretq_u64_##suffix(v0); \
- return (vgetq_lane_u64(v1, 0) | vgetq_lane_u64(v1, 1)) == 0; \
-} \
-inline bool v_check_any(const v_##_Tpvec& a) \
-{ \
- _Tpvec##_t v0 = vshrq_n_##suffix(a.val, shift); \
- uint64x2_t v1 = vreinterpretq_u64_##suffix(v0); \
- return (vgetq_lane_u64(v1, 0) | vgetq_lane_u64(v1, 1)) != 0; \
-}
+#if CV_NEON_AARCH64
+ #define OPENCV_HAL_IMPL_NEON_CHECK_ALLANY(_Tpvec, suffix, shift) \
+ inline bool v_check_all(const v_##_Tpvec& a) \
+ { \
+ return (vminvq_##suffix(a.val) >> shift) != 0; \
+ } \
+ inline bool v_check_any(const v_##_Tpvec& a) \
+ { \
+ return (vmaxvq_##suffix(a.val) >> shift) != 0; \
+ }
+#else // #if CV_NEON_AARCH64
+ #define OPENCV_HAL_IMPL_NEON_CHECK_ALLANY(_Tpvec, suffix, shift) \
+ inline bool v_check_all(const v_##_Tpvec& a) \
+ { \
+ _Tpvec##_t v0 = vshrq_n_##suffix(vmvnq_##suffix(a.val), shift); \
+ uint64x2_t v1 = vreinterpretq_u64_##suffix(v0); \
+ return (vgetq_lane_u64(v1, 0) | vgetq_lane_u64(v1, 1)) == 0; \
+ } \
+ inline bool v_check_any(const v_##_Tpvec& a) \
+ { \
+ _Tpvec##_t v0 = vshrq_n_##suffix(a.val, shift); \
+ uint64x2_t v1 = vreinterpretq_u64_##suffix(v0); \
+ return (vgetq_lane_u64(v1, 0) | vgetq_lane_u64(v1, 1)) != 0; \
+ }
+#endif // #if CV_NEON_AARCH64
OPENCV_HAL_IMPL_NEON_CHECK_ALLANY(uint8x16, u8, 7)
OPENCV_HAL_IMPL_NEON_CHECK_ALLANY(uint16x8, u16, 15)
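The AArch64 variant of `OPENCV_HAL_IMPL_NEON_CHECK_ALLANY` rests on a simple observation for unsigned lanes: every lane has its top bit set exactly when the minimum lane does, and some lane has it set exactly when the maximum lane does. A scalar sketch of that reasoning for 16 `uint8_t` lanes follows; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// All lanes have the top bit set iff the smallest lane has it set.
inline bool check_all_ref(const uint8_t lanes[16])
{
    return (*std::min_element(lanes, lanes + 16) >> 7) != 0;
}

// Some lane has the top bit set iff the largest lane has it set.
inline bool check_any_ref(const uint8_t lanes[16])
{
    return (*std::max_element(lanes, lanes + 16) >> 7) != 0;
}
```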
@@ -1829,6 +2007,37 @@ inline v_int32x4 v_trunc(const v_float64x2& a)
}
#endif
+#if CV_NEON_AARCH64
+#define OPENCV_HAL_IMPL_NEON_TRANSPOSE4x4(_Tpvec, suffix) \
+inline void v_transpose4x4(const v_##_Tpvec& a0, const v_##_Tpvec& a1, \
+ const v_##_Tpvec& a2, const v_##_Tpvec& a3, \
+ v_##_Tpvec& b0, v_##_Tpvec& b1, \
+ v_##_Tpvec& b2, v_##_Tpvec& b3) \
+{ \
+ /* -- Pass 1: 64b transpose */ \
+ _Tpvec##_t t0 = vreinterpretq_##suffix##32_##suffix##64( \
+ vtrn1q_##suffix##64(vreinterpretq_##suffix##64_##suffix##32(a0.val), \
+ vreinterpretq_##suffix##64_##suffix##32(a2.val))); \
+ _Tpvec##_t t1 = vreinterpretq_##suffix##32_##suffix##64( \
+ vtrn1q_##suffix##64(vreinterpretq_##suffix##64_##suffix##32(a1.val), \
+ vreinterpretq_##suffix##64_##suffix##32(a3.val))); \
+ _Tpvec##_t t2 = vreinterpretq_##suffix##32_##suffix##64( \
+ vtrn2q_##suffix##64(vreinterpretq_##suffix##64_##suffix##32(a0.val), \
+ vreinterpretq_##suffix##64_##suffix##32(a2.val))); \
+ _Tpvec##_t t3 = vreinterpretq_##suffix##32_##suffix##64( \
+ vtrn2q_##suffix##64(vreinterpretq_##suffix##64_##suffix##32(a1.val), \
+ vreinterpretq_##suffix##64_##suffix##32(a3.val))); \
+ /* -- Pass 2: 32b transpose */ \
+ b0.val = vtrn1q_##suffix##32(t0, t1); \
+ b1.val = vtrn2q_##suffix##32(t0, t1); \
+ b2.val = vtrn1q_##suffix##32(t2, t3); \
+ b3.val = vtrn2q_##suffix##32(t2, t3); \
+}
+
+OPENCV_HAL_IMPL_NEON_TRANSPOSE4x4(uint32x4, u)
+OPENCV_HAL_IMPL_NEON_TRANSPOSE4x4(int32x4, s)
+OPENCV_HAL_IMPL_NEON_TRANSPOSE4x4(float32x4, f)
+#else // #if CV_NEON_AARCH64
#define OPENCV_HAL_IMPL_NEON_TRANSPOSE4x4(_Tpvec, suffix) \
inline void v_transpose4x4(const v_##_Tpvec& a0, const v_##_Tpvec& a1, \
const v_##_Tpvec& a2, const v_##_Tpvec& a3, \
@@ -1854,6 +2063,7 @@ inline void v_transpose4x4(const v_##_Tpvec& a0, const v_##_Tpvec& a1, \
OPENCV_HAL_IMPL_NEON_TRANSPOSE4x4(uint32x4, u32)
OPENCV_HAL_IMPL_NEON_TRANSPOSE4x4(int32x4, s32)
OPENCV_HAL_IMPL_NEON_TRANSPOSE4x4(float32x4, f32)
+#endif // #if CV_NEON_AARCH64
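The AArch64 `v_transpose4x4` does the transpose in two passes: first it swaps 64-bit halves between rows, then it interleaves 32-bit lanes. The sketch below models the four permutes involved on plain arrays to show why the two passes compose into a full 4x4 transpose. The `trn*` helpers only imitate the lane movement of the corresponding intrinsics and are hypothetical names.

```cpp
#include <array>
#include <cstdint>

using Vec4 = std::array<uint32_t, 4>;

// Lane-movement models of TRN1/TRN2 on 64-bit pairs and on 32-bit lanes.
inline Vec4 trn1_64(const Vec4& x, const Vec4& y) { return {{x[0], x[1], y[0], y[1]}}; }
inline Vec4 trn2_64(const Vec4& x, const Vec4& y) { return {{x[2], x[3], y[2], y[3]}}; }
inline Vec4 trn1_32(const Vec4& x, const Vec4& y) { return {{x[0], y[0], x[2], y[2]}}; }
inline Vec4 trn2_32(const Vec4& x, const Vec4& y) { return {{x[1], y[1], x[3], y[3]}}; }

inline std::array<Vec4, 4> transpose4x4_model(const std::array<Vec4, 4>& a)
{
    // Pass 1: 64-bit transpose (pairs rows 0/2 and 1/3).
    Vec4 t0 = trn1_64(a[0], a[2]);
    Vec4 t1 = trn1_64(a[1], a[3]);
    Vec4 t2 = trn2_64(a[0], a[2]);
    Vec4 t3 = trn2_64(a[1], a[3]);
    // Pass 2: 32-bit transpose within each pair.
    return {{trn1_32(t0, t1), trn2_32(t0, t1), trn1_32(t2, t3), trn2_32(t2, t3)}};
}
```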
#define OPENCV_HAL_IMPL_NEON_INTERLEAVED(_Tpvec, _Tp, suffix) \
inline void v_load_deinterleave(const _Tp* ptr, v_##_Tpvec& a, v_##_Tpvec& b) \
diff --git a/modules/core/include/opencv2/core/hal/intrin_rvv071.hpp b/modules/core/include/opencv2/core/hal/intrin_rvv071.hpp
new file mode 100644
index 0000000000..2bdc622ffd
--- /dev/null
+++ b/modules/core/include/opencv2/core/hal/intrin_rvv071.hpp
@@ -0,0 +1,2545 @@
+// This file is part of OpenCV project.
+// It is subject to the license terms in the LICENSE file found in the top-level directory
+// of this distribution and at http://opencv.org/license.html
+
+// Copyright (C) 2015, PingTouGe Semiconductor Co., Ltd., all rights reserved.
+
+#ifndef OPENCV_HAL_INTRIN_RISCVV_HPP
+#define OPENCV_HAL_INTRIN_RISCVV_HPP
+
+#include
+#include
+#include "opencv2/core/utility.hpp"
+
+namespace cv
+{
+
+//! @cond IGNORED
+
+CV_CPU_OPTIMIZATION_HAL_NAMESPACE_BEGIN
+
+#define CV_SIMD128 1
+#define CV_SIMD128_64F 1
+//////////// Types ////////////
+struct v_uint8x16
+{
+ typedef uchar lane_type;
+ enum { nlanes = 16 };
+
+ v_uint8x16() {}
+ explicit v_uint8x16(vuint8m1_t v) : val(v) {}
+ v_uint8x16(uchar v0, uchar v1, uchar v2, uchar v3, uchar v4, uchar v5, uchar v6, uchar v7,
+ uchar v8, uchar v9, uchar v10, uchar v11, uchar v12, uchar v13, uchar v14, uchar v15)
+ {
+ uchar v[] = {v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15};
+ val = (vuint8m1_t)vle_v_u8m1((unsigned char*)v, 16);
+ }
+ uchar get0() const
+ {
+ return vmv_x_s_u8m1_u8(val, 16);
+ }
+
+ vuint8m1_t val;
+};
+
+struct v_int8x16
+{
+ typedef schar lane_type;
+ enum { nlanes = 16 };
+
+ v_int8x16() {}
+ explicit v_int8x16(vint8m1_t v) : val(v) {}
+ v_int8x16(schar v0, schar v1, schar v2, schar v3, schar v4, schar v5, schar v6, schar v7,
+ schar v8, schar v9, schar v10, schar v11, schar v12, schar v13, schar v14, schar v15)
+ {
+ schar v[] = {v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15};
+ val = (vint8m1_t)vle_v_i8m1((schar*)v, 16);
+ }
+ schar get0() const
+ {
+ return vmv_x_s_i8m1_i8(val, 16);
+ }
+
+ vint8m1_t val;
+};
+
+struct v_uint16x8
+{
+ typedef ushort lane_type;
+ enum { nlanes = 8 };
+
+ v_uint16x8() {}
+ explicit v_uint16x8(vuint16m1_t v) : val(v) {}
+ v_uint16x8(ushort v0, ushort v1, ushort v2, ushort v3, ushort v4, ushort v5, ushort v6, ushort v7)
+ {
+ ushort v[] = {v0, v1, v2, v3, v4, v5, v6, v7};
+ val = (vuint16m1_t)vle_v_u16m1((unsigned short*)v, 8);
+ }
+ ushort get0() const
+ {
+ return vmv_x_s_u16m1_u16(val, 8);
+ }
+
+ vuint16m1_t val;
+};
+
+struct v_int16x8
+{
+ typedef short lane_type;
+ enum { nlanes = 8 };
+
+ v_int16x8() {}
+ explicit v_int16x8(vint16m1_t v) : val(v) {}
+ v_int16x8(short v0, short v1, short v2, short v3, short v4, short v5, short v6, short v7)
+ {
+ short v[] = {v0, v1, v2, v3, v4, v5, v6, v7};
+ val = (vint16m1_t)vle_v_i16m1((signed short*)v, 8);
+ }
+ short get0() const
+ {
+ return vmv_x_s_i16m1_i16(val, 8);
+ }
+
+ vint16m1_t val;
+};
+
+struct v_uint32x4
+{
+ typedef unsigned lane_type;
+ enum { nlanes = 4 };
+
+ v_uint32x4() {}
+ explicit v_uint32x4(vuint32m1_t v) : val(v) {}
+ v_uint32x4(unsigned v0, unsigned v1, unsigned v2, unsigned v3)
+ {
+ unsigned v[] = {v0, v1, v2, v3};
+ val = (vuint32m1_t)vle_v_u32m1((unsigned int*)v, 4);
+ }
+ unsigned get0() const
+ {
+ return vmv_x_s_u32m1_u32(val, 4);
+ }
+
+ vuint32m1_t val;
+};
+
+struct v_int32x4
+{
+ typedef int lane_type;
+ enum { nlanes = 4 };
+
+ v_int32x4() {}
+ explicit v_int32x4(vint32m1_t v) : val(v) {}
+ v_int32x4(int v0, int v1, int v2, int v3)
+ {
+ int v[] = {v0, v1, v2, v3};
+ val = (vint32m1_t)vle_v_i32m1((signed int*)v, 4);
+ }
+ int get0() const
+ {
+ return vmv_x_s_i32m1_i32(val, 4);
+ }
+ vint32m1_t val;
+};
+
+struct v_float32x4
+{
+ typedef float lane_type;
+ enum { nlanes = 4 };
+
+ v_float32x4() {}
+ explicit v_float32x4(vfloat32m1_t v) : val(v) {}
+ v_float32x4(float v0, float v1, float v2, float v3)
+ {
+ float v[] = {v0, v1, v2, v3};
+ val = (vfloat32m1_t)vle_v_f32m1((float*)v, 4);
+ }
+ float get0() const
+ {
+ return vfmv_f_s_f32m1_f32(val, 4);
+ }
+ vfloat32m1_t val;
+};
+
+struct v_uint64x2
+{
+ typedef uint64 lane_type;
+ enum { nlanes = 2 };
+
+ v_uint64x2() {}
+ explicit v_uint64x2(vuint64m1_t v) : val(v) {}
+ v_uint64x2(uint64 v0, uint64 v1)
+ {
+ uint64 v[] = {v0, v1};
+ val = (vuint64m1_t)vle_v_u64m1((unsigned long*)v, 2);
+ }
+ uint64 get0() const
+ {
+ return vmv_x_s_u64m1_u64(val, 2);
+ }
+ vuint64m1_t val;
+};
+
+struct v_int64x2
+{
+ typedef int64 lane_type;
+ enum { nlanes = 2 };
+
+ v_int64x2() {}
+ explicit v_int64x2(vint64m1_t v) : val(v) {}
+ v_int64x2(int64 v0, int64 v1)
+ {
+ int64 v[] = {v0, v1};
+ val = (vint64m1_t)vle_v_i64m1((long*)v, 2);
+ }
+ int64 get0() const
+ {
+ return vmv_x_s_i64m1_i64(val, 2);
+ }
+ vint64m1_t val;
+};
+
+struct v_float64x2
+{
+ typedef double lane_type;
+ enum { nlanes = 2 };
+
+ v_float64x2() {}
+ explicit v_float64x2(vfloat64m1_t v) : val(v) {}
+ v_float64x2(double v0, double v1)
+ {
+ double v[] = {v0, v1};
+ val = (vfloat64m1_t)vle_v_f64m1((double*)v, 2);
+ }
+ double get0() const
+ {
+ return vfmv_f_s_f64m1_f64(val, 2);
+ }
+ vfloat64m1_t val;
+};
+
+#define OPENCV_HAL_IMPL_RISCVV_INIT(_Tpv, _Tp, suffix) \
+inline _Tp##m1_t vreinterpretq_##suffix##_##suffix(_Tp##m1_t v) { return v; } \
+inline v_uint8x16 v_reinterpret_as_u8(const v_##_Tpv& v) { return v_uint8x16((vuint8m1_t)(v.val)); } \
+inline v_int8x16 v_reinterpret_as_s8(const v_##_Tpv& v) { return v_int8x16((vint8m1_t)(v.val)); } \
+inline v_uint16x8 v_reinterpret_as_u16(const v_##_Tpv& v) { return v_uint16x8((vuint16m1_t)(v.val)); } \
+inline v_int16x8 v_reinterpret_as_s16(const v_##_Tpv& v) { return v_int16x8((vint16m1_t)(v.val)); } \
+inline v_uint32x4 v_reinterpret_as_u32(const v_##_Tpv& v) { return v_uint32x4((vuint32m1_t)(v.val)); } \
+inline v_int32x4 v_reinterpret_as_s32(const v_##_Tpv& v) { return v_int32x4((vint32m1_t)(v.val)); } \
+inline v_uint64x2 v_reinterpret_as_u64(const v_##_Tpv& v) { return v_uint64x2((vuint64m1_t)(v.val)); } \
+inline v_int64x2 v_reinterpret_as_s64(const v_##_Tpv& v) { return v_int64x2((vint64m1_t)(v.val)); } \
+inline v_float32x4 v_reinterpret_as_f32(const v_##_Tpv& v) { return v_float32x4((vfloat32m1_t)(v.val)); }\
+inline v_float64x2 v_reinterpret_as_f64(const v_##_Tpv& v) { return v_float64x2((vfloat64m1_t)(v.val)); }
+
+
+OPENCV_HAL_IMPL_RISCVV_INIT(uint8x16, vuint8, u8)
+OPENCV_HAL_IMPL_RISCVV_INIT(int8x16, vint8, s8)
+OPENCV_HAL_IMPL_RISCVV_INIT(uint16x8, vuint16, u16)
+OPENCV_HAL_IMPL_RISCVV_INIT(int16x8, vint16, s16)
+OPENCV_HAL_IMPL_RISCVV_INIT(uint32x4, vuint32, u32)
+OPENCV_HAL_IMPL_RISCVV_INIT(int32x4, vint32, s32)
+OPENCV_HAL_IMPL_RISCVV_INIT(uint64x2, vuint64, u64)
+OPENCV_HAL_IMPL_RISCVV_INIT(int64x2, vint64, s64)
+OPENCV_HAL_IMPL_RISCVV_INIT(float64x2, vfloat64, f64)
+OPENCV_HAL_IMPL_RISCVV_INIT(float32x4, vfloat32, f32)
+#define OPENCV_HAL_IMPL_RISCVV_INIT_SET(__Tp, _Tp, suffix, len, num) \
+inline v_##_Tp##x##num v_setzero_##suffix() { return v_##_Tp##x##num((v##_Tp##m1_t){0}); } \
+inline v_##_Tp##x##num v_setall_##suffix(__Tp v) { return v_##_Tp##x##num(vmv_v_x_##len##m1(v, num)); }
+
+OPENCV_HAL_IMPL_RISCVV_INIT_SET(uchar, uint8, u8, u8, 16)
+OPENCV_HAL_IMPL_RISCVV_INIT_SET(char, int8, s8, i8, 16)
+OPENCV_HAL_IMPL_RISCVV_INIT_SET(ushort, uint16, u16, u16, 8)
+OPENCV_HAL_IMPL_RISCVV_INIT_SET(short, int16, s16, i16, 8)
+OPENCV_HAL_IMPL_RISCVV_INIT_SET(unsigned int, uint32, u32, u32, 4)
+OPENCV_HAL_IMPL_RISCVV_INIT_SET(int, int32, s32, i32, 4)
+OPENCV_HAL_IMPL_RISCVV_INIT_SET(unsigned long, uint64, u64, u64, 2)
+OPENCV_HAL_IMPL_RISCVV_INIT_SET(long, int64, s64, i64, 2)
+inline v_float32x4 v_setzero_f32() { return v_float32x4((vfloat32m1_t){0}); }
+inline v_float32x4 v_setall_f32(float v) { return v_float32x4(vfmv_v_f_f32m1(v, 4)); }
+
+inline v_float64x2 v_setzero_f64() { return v_float64x2(vfmv_v_f_f64m1(0, 2)); }
+inline v_float64x2 v_setall_f64(double v) { return v_float64x2(vfmv_v_f_f64m1(v, 2)); }
+
+
+#define OPENCV_HAL_IMPL_RISCVV_BIN_OP(bin_op, _Tpvec, intrin) \
+inline _Tpvec operator bin_op (const _Tpvec& a, const _Tpvec& b) \
+{ \
+ return _Tpvec(intrin(a.val, b.val)); \
+} \
+inline _Tpvec& operator bin_op##= (_Tpvec& a, const _Tpvec& b) \
+{ \
+ a.val = intrin(a.val, b.val); \
+ return a; \
+}
+
+#define OPENCV_HAL_IMPL_RISCVV_BIN_OPN(bin_op, _Tpvec, intrin, num) \
+inline _Tpvec operator bin_op (const _Tpvec& a, const _Tpvec& b) \
+{ \
+ return _Tpvec(intrin(a.val, b.val, num)); \
+} \
+inline _Tpvec& operator bin_op##= (_Tpvec& a, const _Tpvec& b) \
+{ \
+ a.val = intrin(a.val, b.val, num); \
+ return a; \
+}
+
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(+, v_uint8x16, vsaddu_vv_u8m1, 16)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(-, v_uint8x16, vssubu_vv_u8m1, 16)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(+, v_int8x16, vsadd_vv_i8m1, 16)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(-, v_int8x16, vssub_vv_i8m1, 16)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(+, v_uint16x8, vsaddu_vv_u16m1, 8)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(-, v_uint16x8, vssubu_vv_u16m1, 8)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(+, v_int16x8, vsadd_vv_i16m1, 8)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(-, v_int16x8, vssub_vv_i16m1, 8)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(+, v_int32x4, vsadd_vv_i32m1, 4)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(-, v_int32x4, vssub_vv_i32m1, 4)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(*, v_int32x4, vmul_vv_i32m1, 4)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(+, v_uint32x4, vadd_vv_u32m1, 4)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(-, v_uint32x4, vsub_vv_u32m1, 4)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(*, v_uint32x4, vmul_vv_u32m1, 4)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(+, v_int64x2, vsadd_vv_i64m1, 2)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(-, v_int64x2, vssub_vv_i64m1, 2)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(+, v_uint64x2, vadd_vv_u64m1, 2)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(-, v_uint64x2, vsub_vv_u64m1, 2)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(+, v_float32x4, vfadd_vv_f32m1, 4)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(-, v_float32x4, vfsub_vv_f32m1, 4)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(*, v_float32x4, vfmul_vv_f32m1, 4)
+inline v_float32x4 operator / (const v_float32x4& a, const v_float32x4& b)
+{
+ return v_float32x4(vfdiv_vv_f32m1(a.val, b.val, 4));
+}
+inline v_float32x4& operator /= (v_float32x4& a, const v_float32x4& b)
+{
+ a.val = vfdiv_vv_f32m1(a.val, b.val, 4);
+ return a;
+}
+
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(+, v_float64x2, vfadd_vv_f64m1, 2)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(-, v_float64x2, vfsub_vv_f64m1, 2)
+OPENCV_HAL_IMPL_RISCVV_BIN_OPN(*, v_float64x2, vfmul_vv_f64m1, 2)
+inline v_float64x2 operator / (const v_float64x2& a, const v_float64x2& b)
+{
+ return v_float64x2(vfdiv_vv_f64m1(a.val, b.val, 2));
+}
+inline v_float64x2& operator /= (v_float64x2& a, const v_float64x2& b)
+{
+ a.val = vfdiv_vv_f64m1(a.val, b.val, 2);
+ return a;
+}
+// TODO: exp, log, sin, cos
+
+#define OPENCV_HAL_IMPL_RISCVV_BIN_FUNC(_Tpvec, func, intrin) \
+inline _Tpvec func(const _Tpvec& a, const _Tpvec& b) \
+{ \
+ return _Tpvec(intrin(a.val, b.val)); \
+}
+
+#define OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(_Tpvec, func, intrin, num) \
+inline _Tpvec func(const _Tpvec& a, const _Tpvec& b) \
+{ \
+ return _Tpvec(intrin(a.val, b.val, num)); \
+}
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_uint8x16, v_min, vminu_vv_u8m1, 16)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_uint8x16, v_max, vmaxu_vv_u8m1, 16)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_int8x16, v_min, vmin_vv_i8m1, 16)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_int8x16, v_max, vmax_vv_i8m1, 16)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_uint16x8, v_min, vminu_vv_u16m1, 8)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_uint16x8, v_max, vmaxu_vv_u16m1, 8)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_int16x8, v_min, vmin_vv_i16m1, 8)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_int16x8, v_max, vmax_vv_i16m1, 8)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_uint32x4, v_min, vminu_vv_u32m1, 4)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_uint32x4, v_max, vmaxu_vv_u32m1, 4)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_int32x4, v_min, vmin_vv_i32m1, 4)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_int32x4, v_max, vmax_vv_i32m1, 4)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_float32x4, v_min, vfmin_vv_f32m1, 4)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_float32x4, v_max, vfmax_vv_f32m1, 4)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_float64x2, v_min, vfmin_vv_f64m1, 2)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_float64x2, v_max, vfmax_vv_f64m1, 2)
+
+inline v_float32x4 v_sqrt(const v_float32x4& x)
+{
+ return v_float32x4(vfsqrt_v_f32m1(x.val, 4));
+}
+
+inline v_float32x4 v_invsqrt(const v_float32x4& x)
+{
+ return v_float32x4(vfrdiv_vf_f32m1(vfsqrt_v_f32m1(x.val, 4), 1, 4));
+}
+
+inline v_float32x4 v_magnitude(const v_float32x4& a, const v_float32x4& b)
+{
+ v_float32x4 x(vfmacc_vv_f32m1(vfmul_vv_f32m1(a.val, a.val, 4), b.val, b.val, 4));
+ return v_sqrt(x);
+}
+
+inline v_float32x4 v_sqr_magnitude(const v_float32x4& a, const v_float32x4& b)
+{
+ return v_float32x4(vfmacc_vv_f32m1(vfmul_vv_f32m1(a.val, a.val, 4), b.val, b.val, 4));
+}
+
+inline v_float32x4 v_fma(const v_float32x4& a, const v_float32x4& b, const v_float32x4& c)
+{
+ return v_float32x4(vfmacc_vv_f32m1(c.val, a.val, b.val, 4));
+}
+
+inline v_int32x4 v_fma(const v_int32x4& a, const v_int32x4& b, const v_int32x4& c)
+{
+ return v_int32x4(vmacc_vv_i32m1(c.val, a.val, b.val, 4));
+}
+
+inline v_float32x4 v_muladd(const v_float32x4& a, const v_float32x4& b, const v_float32x4& c)
+{
+ return v_fma(a, b, c);
+}
+
+inline v_int32x4 v_muladd(const v_int32x4& a, const v_int32x4& b, const v_int32x4& c)
+{
+ return v_fma(a, b, c);
+}
+
+inline v_float32x4 v_matmul(const v_float32x4& v, const v_float32x4& m0,
+ const v_float32x4& m1, const v_float32x4& m2,
+ const v_float32x4& m3)
+{
+ vfloat32m1_t res = vfmul_vf_f32m1(m0.val, v.val[0], 4); // res  = m0 * v[0]
+ res = vfmacc_vf_f32m1(res, v.val[1], m1.val, 4); // res += m1 * v[1]
+ res = vfmacc_vf_f32m1(res, v.val[2], m2.val, 4); // res += m2 * v[2]
+ res = vfmacc_vf_f32m1(res, v.val[3], m3.val, 4); // res += m3 * v[3]
+ return v_float32x4(res);
+}
+
+inline v_float32x4 v_matmuladd(const v_float32x4& v, const v_float32x4& m0,
+ const v_float32x4& m1, const v_float32x4& m2,
+ const v_float32x4& a)
+{
+ vfloat32m1_t res = vfmul_vf_f32m1(m0.val, v.val[0], 4); // res  = m0 * v[0]
+ res = vfmacc_vf_f32m1(res, v.val[1], m1.val, 4); // res += m1 * v[1]
+ res = vfmacc_vf_f32m1(res, v.val[2], m2.val, 4); // res += m2 * v[2]
+ res = vfadd_vv_f32m1(res, a.val, 4); // res += a
+ return v_float32x4(res);
+}
+
+inline v_float64x2 v_sqrt(const v_float64x2& x)
+{
+ return v_float64x2(vfsqrt_v_f64m1(x.val, 2));
+}
+
+inline v_float64x2 v_invsqrt(const v_float64x2& x)
+{
+ return v_float64x2(vfrdiv_vf_f64m1(vfsqrt_v_f64m1(x.val, 2), 1, 2));
+}
+
+inline v_float64x2 v_magnitude(const v_float64x2& a, const v_float64x2& b)
+{
+ v_float64x2 x(vfmacc_vv_f64m1(vfmul_vv_f64m1(a.val, a.val, 2), b.val, b.val, 2));
+ return v_sqrt(x);
+}
+
+inline v_float64x2 v_sqr_magnitude(const v_float64x2& a, const v_float64x2& b)
+{
+ return v_float64x2(vfmacc_vv_f64m1(vfmul_vv_f64m1(a.val, a.val, 2), b.val, b.val, 2));
+}
+
+inline v_float64x2 v_fma(const v_float64x2& a, const v_float64x2& b, const v_float64x2& c)
+{
+ return v_float64x2(vfmacc_vv_f64m1(c.val, a.val, b.val, 2));
+}
+
+inline v_float64x2 v_muladd(const v_float64x2& a, const v_float64x2& b, const v_float64x2& c)
+{
+ return v_fma(a, b, c);
+}
+
+#define OPENCV_HAL_IMPL_RISCVV_LOGIC_OPN(_Tpvec, suffix, num) \
+ OPENCV_HAL_IMPL_RISCVV_BIN_OPN(&, _Tpvec, vand_vv_##suffix, num) \
+ OPENCV_HAL_IMPL_RISCVV_BIN_OPN(|, _Tpvec, vor_vv_##suffix, num) \
+ OPENCV_HAL_IMPL_RISCVV_BIN_OPN(^, _Tpvec, vxor_vv_##suffix, num) \
+ inline _Tpvec operator ~ (const _Tpvec & a) \
+ { \
+ return _Tpvec(vnot_v_##suffix(a.val, num)); \
+ }
+
+OPENCV_HAL_IMPL_RISCVV_LOGIC_OPN(v_uint8x16, u8m1, 16)
+OPENCV_HAL_IMPL_RISCVV_LOGIC_OPN(v_uint16x8, u16m1, 8)
+OPENCV_HAL_IMPL_RISCVV_LOGIC_OPN(v_uint32x4, u32m1, 4)
+OPENCV_HAL_IMPL_RISCVV_LOGIC_OPN(v_uint64x2, u64m1, 2)
+OPENCV_HAL_IMPL_RISCVV_LOGIC_OPN(v_int8x16, i8m1, 16)
+OPENCV_HAL_IMPL_RISCVV_LOGIC_OPN(v_int16x8, i16m1, 8)
+OPENCV_HAL_IMPL_RISCVV_LOGIC_OPN(v_int32x4, i32m1, 4)
+OPENCV_HAL_IMPL_RISCVV_LOGIC_OPN(v_int64x2, i64m1, 2)
+
+#define OPENCV_HAL_IMPL_RISCVV_FLT_BIT_OP(bin_op, intrin) \
+inline v_float32x4 operator bin_op (const v_float32x4& a, const v_float32x4& b) \
+{ \
+ return v_float32x4(vfloat32m1_t(intrin(vint32m1_t(a.val), vint32m1_t(b.val), 4))); \
+} \
+inline v_float32x4& operator bin_op##= (v_float32x4& a, const v_float32x4& b) \
+{ \
+ a.val = vfloat32m1_t(intrin(vint32m1_t(a.val), vint32m1_t(b.val), 4)); \
+ return a; \
+}
+
+OPENCV_HAL_IMPL_RISCVV_FLT_BIT_OP(&, vand_vv_i32m1)
+OPENCV_HAL_IMPL_RISCVV_FLT_BIT_OP(|, vor_vv_i32m1)
+OPENCV_HAL_IMPL_RISCVV_FLT_BIT_OP(^, vxor_vv_i32m1)
+
+inline v_float32x4 operator ~ (const v_float32x4& a)
+{
+ return v_float32x4((vfloat32m1_t)(vnot_v_i32m1((vint32m1_t)(a.val), 4)));
+}
+
+#define OPENCV_HAL_IMPL_RISCVV_FLT_64BIT_OP(bin_op, intrin) \
+inline v_float64x2 operator bin_op (const v_float64x2& a, const v_float64x2& b) \
+{ \
+ return v_float64x2(vfloat64m1_t(intrin(vint64m1_t(a.val), vint64m1_t(b.val), 2))); \
+} \
+inline v_float64x2& operator bin_op##= (v_float64x2& a, const v_float64x2& b) \
+{ \
+ a.val = vfloat64m1_t(intrin(vint64m1_t(a.val), vint64m1_t(b.val), 2)); \
+ return a; \
+}
+
+OPENCV_HAL_IMPL_RISCVV_FLT_64BIT_OP(&, vand_vv_i64m1)
+OPENCV_HAL_IMPL_RISCVV_FLT_64BIT_OP(|, vor_vv_i64m1)
+OPENCV_HAL_IMPL_RISCVV_FLT_64BIT_OP(^, vxor_vv_i64m1)
+
+inline v_float64x2 operator ~ (const v_float64x2& a)
+{
+ return v_float64x2((vfloat64m1_t)(vnot_v_i64m1((vint64m1_t)(a.val), 2)));
+}
+inline v_int16x8 v_mul_hi(const v_int16x8& a, const v_int16x8& b)
+{
+ return v_int16x8(vmulh_vv_i16m1(a.val, b.val, 8));
+}
+inline v_uint16x8 v_mul_hi(const v_uint16x8& a, const v_uint16x8& b)
+{
+ return v_uint16x8(vmulhu_vv_u16m1(a.val, b.val, 8));
+}
+
+//#define OPENCV_HAL_IMPL_RISCVV_ABS(_Tpuvec, _Tpsvec, usuffix, ssuffix) \
+//inline _Tpuvec v_abs(const _Tpsvec& a) { \
+// E##xm1_t mask=vmflt_vf_e32xm1_f32m1(x.val, 0.0, 4);
+
+//OPENCV_HAL_IMPL_RISCVV_ABS(v_uint8x16, v_int8x16, u8, s8)
+//OPENCV_HAL_IMPL_RISCVV_ABS(v_uint16x8, v_int16x8, u16, s16)
+//OPENCV_HAL_IMPL_RISCVV_ABS(v_uint32x4, v_int32x4, u32, s32)
+
+inline v_uint32x4 v_abs(v_int32x4 x)
+{
+ vbool32_t mask=vmslt_vx_i32m1_b32(x.val, 0, 4);
+ return v_uint32x4((vuint32m1_t)vrsub_vx_i32m1_m(mask, x.val, x.val, 0, 4));
+}
+
+inline v_uint16x8 v_abs(v_int16x8 x)
+{
+ vbool16_t mask=vmslt_vx_i16m1_b16(x.val, 0, 8);
+ return v_uint16x8((vuint16m1_t)vrsub_vx_i16m1_m(mask, x.val, x.val, 0, 8));
+}
+
+inline v_uint8x16 v_abs(v_int8x16 x)
+{
+ vbool8_t mask=vmslt_vx_i8m1_b8(x.val, 0, 16);
+ return v_uint8x16((vuint8m1_t)vrsub_vx_i8m1_m(mask, x.val, x.val, 0, 16));
+}
+
+inline v_float32x4 v_abs(v_float32x4 x)
+{
+ return (v_float32x4)vfsgnjx_vv_f32m1(x.val, x.val, 4);
+}
+
+inline v_float64x2 v_abs(v_float64x2 x)
+{
+ return (v_float64x2)vfsgnjx_vv_f64m1(x.val, x.val, 2);
+}
+
+inline v_float32x4 v_absdiff(const v_float32x4& a, const v_float32x4& b)
+{
+ vfloat32m1_t ret = vfsub_vv_f32m1(a.val, b.val, 4);
+ return (v_float32x4)vfsgnjx_vv_f32m1(ret, ret, 4);
+}
+
+inline v_float64x2 v_absdiff(const v_float64x2& a, const v_float64x2& b)
+{
+ vfloat64m1_t ret = vfsub_vv_f64m1(a.val, b.val, 2);
+ return (v_float64x2)vfsgnjx_vv_f64m1(ret, ret, 2);
+}
+
+#define OPENCV_HAL_IMPL_RISCVV_ABSDIFF_U(bit, num) \
+inline v_uint##bit##x##num v_absdiff(v_uint##bit##x##num a, v_uint##bit##x##num b){ \
+ vuint##bit##m1_t vmax = vmaxu_vv_u##bit##m1(a.val, b.val, num); \
+ vuint##bit##m1_t vmin = vminu_vv_u##bit##m1(a.val, b.val, num); \
+ return v_uint##bit##x##num(vsub_vv_u##bit##m1(vmax, vmin, num));\
+}
+
+OPENCV_HAL_IMPL_RISCVV_ABSDIFF_U(8, 16)
+OPENCV_HAL_IMPL_RISCVV_ABSDIFF_U(16, 8)
+OPENCV_HAL_IMPL_RISCVV_ABSDIFF_U(32, 4)
+
+/** Saturating absolute difference **/
+inline v_int8x16 v_absdiffs(v_int8x16 a, v_int8x16 b){
+ vint8m1_t vmax = vmax_vv_i8m1(a.val, b.val, 16);
+ vint8m1_t vmin = vmin_vv_i8m1(a.val, b.val, 16);
+ return v_int8x16(vssub_vv_i8m1(vmax, vmin, 16));
+}
+inline v_int16x8 v_absdiffs(v_int16x8 a, v_int16x8 b){
+ vint16m1_t vmax = vmax_vv_i16m1(a.val, b.val, 8);
+ vint16m1_t vmin = vmin_vv_i16m1(a.val, b.val, 8);
+ return v_int16x8(vssub_vv_i16m1(vmax, vmin, 8));
+}
+
+#define OPENCV_HAL_IMPL_RISCVV_ABSDIFF(_Tpvec, _Tpv, num) \
+inline v_uint##_Tpvec v_absdiff(v_int##_Tpvec a, v_int##_Tpvec b){ \
+ vint##_Tpv##_t max = vmax_vv_i##_Tpv(a.val, b.val, num);\
+ vint##_Tpv##_t min = vmin_vv_i##_Tpv(a.val, b.val, num);\
+ return v_uint##_Tpvec((vuint##_Tpv##_t)vsub_vv_i##_Tpv(max, min, num)); \
+}
+
+OPENCV_HAL_IMPL_RISCVV_ABSDIFF(8x16, 8m1, 16)
+OPENCV_HAL_IMPL_RISCVV_ABSDIFF(16x8, 16m1, 8)
+OPENCV_HAL_IMPL_RISCVV_ABSDIFF(32x4, 32m1, 4)
+
+// Multiply and expand
+inline void v_mul_expand(const v_int8x16& a, const v_int8x16& b,
+ v_int16x8& c, v_int16x8& d)
+{
+ vint16m2_t res = vundefined_i16m2();
+ res = vwmul_vv_i16m2(a.val, b.val, 16);
+ c.val = vget_i16m2_i16m1(res, 0);
+ d.val = vget_i16m2_i16m1(res, 1);
+}
+
+inline void v_mul_expand(const v_uint8x16& a, const v_uint8x16& b,
+ v_uint16x8& c, v_uint16x8& d)
+{
+ vuint16m2_t res = vundefined_u16m2();
+ res = vwmulu_vv_u16m2(a.val, b.val, 16);
+ c.val = vget_u16m2_u16m1(res, 0);
+ d.val = vget_u16m2_u16m1(res, 1);
+}
+
+inline void v_mul_expand(const v_int16x8& a, const v_int16x8& b,
+ v_int32x4& c, v_int32x4& d)
+{
+ vint32m2_t res = vundefined_i32m2();
+ res = vwmul_vv_i32m2(a.val, b.val, 8);
+ c.val = vget_i32m2_i32m1(res, 0);
+ d.val = vget_i32m2_i32m1(res, 1);
+}
+
+inline void v_mul_expand(const v_uint16x8& a, const v_uint16x8& b,
+ v_uint32x4& c, v_uint32x4& d)
+{
+ vuint32m2_t res = vundefined_u32m2();
+ res = vwmulu_vv_u32m2(a.val, b.val, 8);
+ c.val = vget_u32m2_u32m1(res, 0);
+ d.val = vget_u32m2_u32m1(res, 1);
+}
+
+inline void v_mul_expand(const v_int32x4& a, const v_int32x4& b,
+ v_int64x2& c, v_int64x2& d)
+{
+ vint64m2_t res = vundefined_i64m2();
+ res = vwmul_vv_i64m2(a.val, b.val, 4);
+ c.val = vget_i64m2_i64m1(res, 0);
+ d.val = vget_i64m2_i64m1(res, 1);
+}
+
+inline void v_mul_expand(const v_uint32x4& a, const v_uint32x4& b,
+ v_uint64x2& c, v_uint64x2& d)
+{
+ vuint64m2_t res = vundefined_u64m2();
+ res = vwmulu_vv_u64m2(a.val, b.val, 4);
+ c.val = vget_u64m2_u64m1(res, 0);
+ d.val = vget_u64m2_u64m1(res, 1);
+}
+
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_uint8x16, v_add_wrap, vadd_vv_u8m1, 16)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_int8x16, v_add_wrap, vadd_vv_i8m1, 16)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_uint16x8, v_add_wrap, vadd_vv_u16m1, 8)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_int16x8, v_add_wrap, vadd_vv_i16m1, 8)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_uint8x16, v_sub_wrap, vsub_vv_u8m1, 16)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_int8x16, v_sub_wrap, vsub_vv_i8m1, 16)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_uint16x8, v_sub_wrap, vsub_vv_u16m1, 8)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_int16x8, v_sub_wrap, vsub_vv_i16m1, 8)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_uint8x16, v_mul_wrap, vmul_vv_u8m1, 16)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_int8x16, v_mul_wrap, vmul_vv_i8m1, 16)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_uint16x8, v_mul_wrap, vmul_vv_u16m1, 8)
+OPENCV_HAL_IMPL_RISCVV_BINN_FUNC(v_int16x8, v_mul_wrap, vmul_vv_i16m1, 8)
+//////// Dot Product ////////
+// 16 >> 32
+inline v_int32x4 v_dotprod(const v_int16x8& a, const v_int16x8& b)
+{
+ vint32m2_t res = vundefined_i32m2();
+ res = vwmul_vv_i32m2(a.val, b.val, 8);
+ res = vrgather_vv_i32m2(res, (vuint32m2_t){0, 2, 4, 6, 1, 3, 5, 7}, 8);
+ return v_int32x4(vadd_vv_i32m1(vget_i32m2_i32m1(res, 0), vget_i32m2_i32m1(res, 1), 4));
+}
+inline v_int32x4 v_dotprod(const v_int16x8& a, const v_int16x8& b, const v_int32x4& c)
+{
+ vint32m2_t res = vundefined_i32m2();
+ res = vwmul_vv_i32m2(a.val, b.val, 8);
+ res = vrgather_vv_i32m2(res, (vuint32m2_t){0, 2, 4, 6, 1, 3, 5, 7}, 8);
+ return v_int32x4(vadd_vv_i32m1(vadd_vv_i32m1(vget_i32m2_i32m1(res, 0),vget_i32m2_i32m1(res, 1), 4), c.val, 4));
+}
+
+// 32 >> 64
+inline v_int64x2 v_dotprod(const v_int32x4& a, const v_int32x4& b)
+{
+ vint64m2_t res = vundefined_i64m2();
+ res = vwmul_vv_i64m2(a.val, b.val, 4);
+ res = vrgather_vv_i64m2(res, (vuint64m2_t){0, 2, 1, 3}, 4);
+ return v_int64x2(vadd_vv_i64m1(vget_i64m2_i64m1(res, 0), vget_i64m2_i64m1(res, 1), 2));
+}
+inline v_int64x2 v_dotprod(const v_int32x4& a, const v_int32x4& b, const v_int64x2& c)
+{
+ vint64m2_t res = vundefined_i64m2();
+ res = vwmul_vv_i64m2(a.val, b.val, 4);
+ res = vrgather_vv_i64m2(res, (vuint64m2_t){0, 2, 1, 3}, 4);
+ return v_int64x2(vadd_vv_i64m1(vadd_vv_i64m1(vget_i64m2_i64m1(res, 0), vget_i64m2_i64m1(res, 1), 2), c.val, 2));
+}
+
+// 8 >> 32
+inline v_uint32x4 v_dotprod_expand(const v_uint8x16& a, const v_uint8x16& b)
+{
+ vuint16m2_t v1 = vundefined_u16m2();
+ vuint32m2_t v2 = vundefined_u32m2();
+ v1 = vwmulu_vv_u16m2(a.val, b.val, 16);
+ v1 = vrgather_vv_u16m2(v1, (vuint16m2_t){0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15}, 16);
+ v2 = vwaddu_vv_u32m2(vget_u16m2_u16m1(v1, 0), vget_u16m2_u16m1(v1, 1), 8);
+ return v_uint32x4(vadd_vv_u32m1(vget_u32m2_u32m1(v2, 0), vget_u32m2_u32m1(v2, 1), 4));
+}
+
+inline v_uint32x4 v_dotprod_expand(const v_uint8x16& a, const v_uint8x16& b,
+ const v_uint32x4& c)
+{
+ vuint16m2_t v1 = vundefined_u16m2();
+ vuint32m2_t v2 = vundefined_u32m2();
+ v1 = vwmulu_vv_u16m2(a.val, b.val, 16);
+ v1 = vrgather_vv_u16m2(v1, (vuint16m2_t){0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15}, 16);
+ v2 = vwaddu_vv_u32m2(vget_u16m2_u16m1(v1, 0), vget_u16m2_u16m1(v1, 1), 8);
+ return v_uint32x4(vadd_vv_u32m1(vadd_vv_u32m1(vget_u32m2_u32m1(v2, 0), vget_u32m2_u32m1(v2, 1), 4), c.val, 4));
+}
+
+inline v_int32x4 v_dotprod_expand(const v_int8x16& a, const v_int8x16& b)
+{
+ vint16m2_t v1 = vundefined_i16m2();
+ vint32m2_t v2 = vundefined_i32m2();
+ v1 = vwmul_vv_i16m2(a.val, b.val, 16);
+ v1 = vrgather_vv_i16m2(v1, (vuint16m2_t){0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15}, 16);
+ v2 = vwadd_vv_i32m2(vget_i16m2_i16m1(v1, 0), vget_i16m2_i16m1(v1, 1), 8);
+ return v_int32x4(vadd_vv_i32m1(vget_i32m2_i32m1(v2, 0), vget_i32m2_i32m1(v2, 1), 4));
+}
+
+inline v_int32x4 v_dotprod_expand(const v_int8x16& a, const v_int8x16& b,
+ const v_int32x4& c)
+{
+ vint16m2_t v1 = vundefined_i16m2();
+ vint32m2_t v2 = vundefined_i32m2();
+ v1 = vwmul_vv_i16m2(a.val, b.val, 16);
+ v1 = vrgather_vv_i16m2(v1, (vuint16m2_t){0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15}, 16);
+ v2 = vwadd_vv_i32m2(vget_i16m2_i16m1(v1, 0), vget_i16m2_i16m1(v1, 1), 8);
+ return v_int32x4(vadd_vv_i32m1(vadd_vv_i32m1(vget_i32m2_i32m1(v2, 0), vget_i32m2_i32m1(v2, 1), 4), c.val, 4));
+}
+
+inline v_uint64x2 v_dotprod_expand(const v_uint16x8& a, const v_uint16x8& b)
+{
+ vuint32m2_t v1 = vundefined_u32m2();
+ vuint64m2_t v2 = vundefined_u64m2();
+ v1 = vwmulu_vv_u32m2(a.val, b.val, 8);
+ v1 = vrgather_vv_u32m2(v1, (vuint32m2_t){0, 4, 1, 5, 2, 6, 3, 7}, 8);
+ v2 = vwaddu_vv_u64m2(vget_u32m2_u32m1(v1, 0), vget_u32m2_u32m1(v1, 1), 4);
+ return v_uint64x2(vadd_vv_u64m1(vget_u64m2_u64m1(v2, 0), vget_u64m2_u64m1(v2, 1), 2));
+}
+
+inline v_uint64x2 v_dotprod_expand(const v_uint16x8& a, const v_uint16x8& b,
+ const v_uint64x2& c)
+{
+ vuint32m2_t v1 = vundefined_u32m2();
+ vuint64m2_t v2 = vundefined_u64m2();
+ v1 = vwmulu_vv_u32m2(a.val, b.val, 8);
+ v1 = vrgather_vv_u32m2(v1, (vuint32m2_t){0, 4, 1, 5, 2, 6, 3, 7}, 8);
+ v2 = vwaddu_vv_u64m2(vget_u32m2_u32m1(v1, 0), vget_u32m2_u32m1(v1, 1), 4);
+ return v_uint64x2(vadd_vv_u64m1(vadd_vv_u64m1(vget_u64m2_u64m1(v2, 0), vget_u64m2_u64m1(v2, 1), 2), c.val, 2));
+}
+
+inline v_int64x2 v_dotprod_expand(const v_int16x8& a, const v_int16x8& b)
+{
+ vint32m2_t v1 = vundefined_i32m2();
+ vint64m2_t v2 = vundefined_i64m2();
+ v1 = vwmul_vv_i32m2(a.val, b.val, 8);
+ v1 = vrgather_vv_i32m2(v1, (vuint32m2_t){0, 4, 1, 5, 2, 6, 3, 7}, 8);
+ v2 = vwadd_vv_i64m2(vget_i32m2_i32m1(v1, 0), vget_i32m2_i32m1(v1, 1), 4);
+ return v_int64x2(vadd_vv_i64m1(vget_i64m2_i64m1(v2, 0), vget_i64m2_i64m1(v2, 1), 2));
+}
+
+inline v_int64x2 v_dotprod_expand(const v_int16x8& a, const v_int16x8& b,
+ const v_int64x2& c)
+{
+ vint32m2_t v1 = vundefined_i32m2();
+ vint64m2_t v2 = vundefined_i64m2();
+ v1 = vwmul_vv_i32m2(a.val, b.val, 8);
+ v1 = vrgather_vv_i32m2(v1, (vuint32m2_t){0, 4, 1, 5, 2, 6, 3, 7}, 8);
+ v2 = vwadd_vv_i64m2(vget_i32m2_i32m1(v1, 0), vget_i32m2_i32m1(v1, 1), 4);
+ return v_int64x2(vadd_vv_i64m1(vadd_vv_i64m1(vget_i64m2_i64m1(v2, 0), vget_i64m2_i64m1(v2, 1), 2), c.val, 2));
+}
+
+//////// Fast Dot Product ////////
+// 16 >> 32
+inline v_int32x4 v_dotprod_fast(const v_int16x8& a, const v_int16x8& b)
+{
+ vint32m2_t v1 = vundefined_i32m2();
+ v1 = vwmul_vv_i32m2(a.val, b.val, 8);
+ return v_int32x4(vadd_vv_i32m1(vget_i32m2_i32m1(v1, 0), vget_i32m2_i32m1(v1, 1), 4));
+}
+
+inline v_int32x4 v_dotprod_fast(const v_int16x8& a, const v_int16x8& b, const v_int32x4& c)
+{
+ vint32m2_t v1 = vundefined_i32m2();
+ v1 = vwmul_vv_i32m2(a.val, b.val, 8);
+ return v_int32x4(vadd_vv_i32m1(vadd_vv_i32m1(vget_i32m2_i32m1(v1, 0), vget_i32m2_i32m1(v1, 1), 4), c.val, 4));
+}
+
+// 32 >> 64
+inline v_int64x2 v_dotprod_fast(const v_int32x4& a, const v_int32x4& b)
+{
+ vint64m2_t v1 = vundefined_i64m2();
+ v1 = vwmul_vv_i64m2(a.val, b.val, 4);
+ return v_int64x2(vadd_vv_i64m1(vget_i64m2_i64m1(v1, 0), vget_i64m2_i64m1(v1, 1), 2));
+}
+inline v_int64x2 v_dotprod_fast(const v_int32x4& a, const v_int32x4& b, const v_int64x2& c)
+{
+ vint64m2_t v1 = vundefined_i64m2();
+ v1 = vwmul_vv_i64m2(a.val, b.val, 4);
+ return v_int64x2(vadd_vv_i64m1(vadd_vv_i64m1(vget_i64m2_i64m1(v1, 0), vget_i64m2_i64m1(v1, 1), 2), c.val, 2));
+}
+
+// 8 >> 32
+inline v_uint32x4 v_dotprod_expand_fast(const v_uint8x16& a, const v_uint8x16& b)
+{
+ vuint16m2_t v1 = vundefined_u16m2();
+ vuint32m2_t v2 = vundefined_u32m2();
+ v1 = vwmulu_vv_u16m2(a.val, b.val, 16);
+ v2 = vwaddu_vv_u32m2(vget_u16m2_u16m1(v1, 0), vget_u16m2_u16m1(v1, 1), 8);
+ return v_uint32x4(vadd_vv_u32m1(vget_u32m2_u32m1(v2, 0), vget_u32m2_u32m1(v2, 1), 4));
+}
+
+inline v_uint32x4 v_dotprod_expand_fast(const v_uint8x16& a, const v_uint8x16& b, const v_uint32x4& c)
+{
+ vuint16m2_t v1 = vundefined_u16m2();
+ vuint32m2_t v2 = vundefined_u32m2();
+ v1 = vwmulu_vv_u16m2(a.val, b.val, 16);
+ v2 = vwaddu_vv_u32m2(vget_u16m2_u16m1(v1, 0), vget_u16m2_u16m1(v1, 1), 8);
+ return v_uint32x4(vadd_vv_u32m1(vadd_vv_u32m1(vget_u32m2_u32m1(v2, 0), vget_u32m2_u32m1(v2, 1), 4), c.val, 4));
+}
+
+inline v_int32x4 v_dotprod_expand_fast(const v_int8x16& a, const v_int8x16& b)
+{
+ vint16m2_t v1 = vundefined_i16m2();
+ vint32m2_t v2 = vundefined_i32m2();
+ v1 = vwmul_vv_i16m2(a.val, b.val, 16);
+ v2 = vwadd_vv_i32m2(vget_i16m2_i16m1(v1, 0), vget_i16m2_i16m1(v1, 1), 8);
+ return v_int32x4(vadd_vv_i32m1(vget_i32m2_i32m1(v2, 0), vget_i32m2_i32m1(v2, 1), 4));
+}
+inline v_int32x4 v_dotprod_expand_fast(const v_int8x16& a, const v_int8x16& b, const v_int32x4& c)
+{
+ vint16m2_t v1 = vundefined_i16m2();
+ vint32m2_t v2 = vundefined_i32m2();
+ v1 = vwmul_vv_i16m2(a.val, b.val, 16);
+ v2 = vwadd_vv_i32m2(vget_i16m2_i16m1(v1, 0), vget_i16m2_i16m1(v1, 1), 8);
+ return v_int32x4(vadd_vv_i32m1(vadd_vv_i32m1(vget_i32m2_i32m1(v2, 0), vget_i32m2_i32m1(v2, 1), 4), c.val, 4));
+}
+
+// 16 >> 64
+inline v_uint64x2 v_dotprod_expand_fast(const v_uint16x8& a, const v_uint16x8& b)
+{
+ vuint32m2_t v1 = vundefined_u32m2();
+ vuint64m2_t v2 = vundefined_u64m2();
+ v1 = vwmulu_vv_u32m2(a.val, b.val, 8);
+ v2 = vwaddu_vv_u64m2(vget_u32m2_u32m1(v1, 0), vget_u32m2_u32m1(v1, 1), 4);
+ return v_uint64x2(vadd_vv_u64m1(vget_u64m2_u64m1(v2, 0), vget_u64m2_u64m1(v2, 1), 2));
+}
+inline v_uint64x2 v_dotprod_expand_fast(const v_uint16x8& a, const v_uint16x8& b, const v_uint64x2& c)
+{
+ vuint32m2_t v1 = vundefined_u32m2();
+ vuint64m2_t v2 = vundefined_u64m2();
+ v1 = vwmulu_vv_u32m2(a.val, b.val, 8);
+ v2 = vwaddu_vv_u64m2(vget_u32m2_u32m1(v1, 0), vget_u32m2_u32m1(v1, 1), 4);
+ return v_uint64x2(vadd_vv_u64m1(vadd_vv_u64m1(vget_u64m2_u64m1(v2, 0), vget_u64m2_u64m1(v2, 1), 2), c.val, 2));
+}
+
+inline v_int64x2 v_dotprod_expand_fast(const v_int16x8& a, const v_int16x8& b)
+{
+ vint32m2_t v1 = vundefined_i32m2();
+ vint64m2_t v2 = vundefined_i64m2();
+ v1 = vwmul_vv_i32m2(a.val, b.val, 8);
+ v2 = vwadd_vv_i64m2(vget_i32m2_i32m1(v1, 0), vget_i32m2_i32m1(v1, 1), 4);
+ return v_int64x2(vadd_vv_i64m1(vget_i64m2_i64m1(v2, 0), vget_i64m2_i64m1(v2, 1), 2));
+}
+inline v_int64x2 v_dotprod_expand_fast(const v_int16x8& a, const v_int16x8& b, const v_int64x2& c)
+{
+ vint32m2_t v1 = vundefined_i32m2();
+ vint64m2_t v2 = vundefined_i64m2();
+ v1 = vwmul_vv_i32m2(a.val, b.val, 8);
+ v2 = vwadd_vv_i64m2(vget_i32m2_i32m1(v1, 0), vget_i32m2_i32m1(v1, 1), 4);
+ return v_int64x2(vadd_vv_i64m1(vadd_vv_i64m1(vget_i64m2_i64m1(v2, 0), vget_i64m2_i64m1(v2, 1), 2), c.val, 2));
+}
+
+
+#define OPENCV_HAL_IMPL_RISCVV_REDUCE_OP_W(_Tpvec, _Tpvec2, len, scalartype, func, intrin, num) \
+inline scalartype v_reduce_##func(const v_##_Tpvec##x##num& a) \
+{\
+ v##_Tpvec2##m1_t val = vmv_v_x_##len##m1(0, num); \
+ val = intrin(val, a.val, val, num); \
+ return vmv_x_s_##len##m1_##len(val, num); \
+}
+
+
+#define OPENCV_HAL_IMPL_RISCVV_REDUCE_OP_(_Tpvec, _Tpvec2, scalartype, func, funcu, num) \
+inline scalartype v_reduce_##func(const v_##_Tpvec##x##num& a) \
+{\
+ v##_Tpvec##m1_t val = (v##_Tpvec##m1_t)vmv_v_x_i8m1(0, num); \
+ val = v##funcu##_vs_##_Tpvec2##m1_##_Tpvec2##m1(val, a.val, a.val, num); \
+ return val[0]; \
+}
+OPENCV_HAL_IMPL_RISCVV_REDUCE_OP_W(int8, int16, i16, int, sum, vwredsum_vs_i8m1_i16m1, 16)
+OPENCV_HAL_IMPL_RISCVV_REDUCE_OP_W(int16, int32, i32, int, sum, vwredsum_vs_i16m1_i32m1, 8)
+OPENCV_HAL_IMPL_RISCVV_REDUCE_OP_W(int32, int64, i64, int, sum, vwredsum_vs_i32m1_i64m1, 4)
+OPENCV_HAL_IMPL_RISCVV_REDUCE_OP_W(uint8, uint16, u16, unsigned, sum, vwredsumu_vs_u8m1_u16m1, 16)
+OPENCV_HAL_IMPL_RISCVV_REDUCE_OP_W(uint16, uint32, u32, unsigned, sum, vwredsumu_vs_u16m1_u32m1, 8)
+OPENCV_HAL_IMPL_RISCVV_REDUCE_OP_W(uint32, uint64, u64, unsigned, sum, vwredsumu_vs_u32m1_u64m1, 4)
+inline float v_reduce_sum(const v_float32x4& a)
+{
+ vfloat32m1_t val = vfmv_v_f_f32m1(0.0, 4);
+ val = vfredsum_vs_f32m1_f32m1(val, a.val, val, 4);
+ return vfmv_f_s_f32m1_f32(val, 4);
+}
+inline double v_reduce_sum(const v_float64x2& a)
+{
+ vfloat64m1_t val = vfmv_v_f_f64m1(0.0, 2);
+ val = vfredsum_vs_f64m1_f64m1(val, a.val, val, 2);
+ return vfmv_f_s_f64m1_f64(val, 2);
+}
+inline uint64 v_reduce_sum(const v_uint64x2& a)
+{ return vext_x_v_u64m1_u64((vuint64m1_t)a.val, 0, 2)+vext_x_v_u64m1_u64((vuint64m1_t)a.val, 1, 2); }
+
+inline int64 v_reduce_sum(const v_int64x2& a)
+{ return vext_x_v_i64m1_i64((vint64m1_t)a.val, 0, 2)+vext_x_v_i64m1_i64((vint64m1_t)a.val, 1, 2); }
+
+#define OPENCV_HAL_IMPL_RISCVV_REDUCE_OP(func) \
+OPENCV_HAL_IMPL_RISCVV_REDUCE_OP_(int8, i8, int, func, red##func, 16) \
+OPENCV_HAL_IMPL_RISCVV_REDUCE_OP_(int16, i16, int, func, red##func, 8) \
+OPENCV_HAL_IMPL_RISCVV_REDUCE_OP_(int32, i32, int, func, red##func, 4) \
+OPENCV_HAL_IMPL_RISCVV_REDUCE_OP_(int64, i64, int, func, red##func, 2) \
+OPENCV_HAL_IMPL_RISCVV_REDUCE_OP_(uint8, u8, unsigned, func, red##func##u, 16) \
+OPENCV_HAL_IMPL_RISCVV_REDUCE_OP_(uint16, u16, unsigned, func, red##func##u, 8) \
+OPENCV_HAL_IMPL_RISCVV_REDUCE_OP_(uint32, u32, unsigned, func, red##func##u, 4) \
+OPENCV_HAL_IMPL_RISCVV_REDUCE_OP_(float32, f32, float, func, fred##func, 4)
+OPENCV_HAL_IMPL_RISCVV_REDUCE_OP(max)
+OPENCV_HAL_IMPL_RISCVV_REDUCE_OP(min)
+
+inline v_float32x4 v_reduce_sum4(const v_float32x4& a, const v_float32x4& b,
+ const v_float32x4& c, const v_float32x4& d)
+{
+ vfloat32m1_t a0 = vfmv_v_f_f32m1(0.0, 4);
+ vfloat32m1_t b0 = vfmv_v_f_f32m1(0.0, 4);
+ vfloat32m1_t c0 = vfmv_v_f_f32m1(0.0, 4);
+ vfloat32m1_t d0 = vfmv_v_f_f32m1(0.0, 4);
+ a0 = vfredsum_vs_f32m1_f32m1(a0, a.val, a0, 4);
+ b0 = vfredsum_vs_f32m1_f32m1(b0, b.val, b0, 4);
+ c0 = vfredsum_vs_f32m1_f32m1(c0, c.val, c0, 4);
+ d0 = vfredsum_vs_f32m1_f32m1(d0, d.val, d0, 4);
+ return v_float32x4(a0[0], b0[0], c0[0], d0[0]);
+}
+
+inline float v_reduce_sad(const v_float32x4& a, const v_float32x4& b)
+{
+ vfloat32m1_t a0 = vfmv_v_f_f32m1(0.0, 4);
+ vfloat32m1_t x = vfsub_vv_f32m1(a.val, b.val, 4);
+ vbool32_t mask=vmflt_vf_f32m1_b32(x, 0, 4);
+ vfloat32m1_t val = vfrsub_vf_f32m1_m(mask, x, x, 0, 4);
+ a0 = vfredsum_vs_f32m1_f32m1(a0, val, a0, 4);
+ return a0[0];
+}
+
+#define OPENCV_HAL_IMPL_RISCVV_REDUCE_SAD(_Tpvec, _Tpvec2) \
+inline unsigned v_reduce_sad(const _Tpvec& a, const _Tpvec&b){ \
+ _Tpvec2 x = v_absdiff(a, b); \
+ return v_reduce_sum(x); \
+}
+
+OPENCV_HAL_IMPL_RISCVV_REDUCE_SAD(v_int8x16, v_uint8x16)
+OPENCV_HAL_IMPL_RISCVV_REDUCE_SAD(v_uint8x16, v_uint8x16)
+OPENCV_HAL_IMPL_RISCVV_REDUCE_SAD(v_int16x8, v_uint16x8)
+OPENCV_HAL_IMPL_RISCVV_REDUCE_SAD(v_uint16x8, v_uint16x8)
+OPENCV_HAL_IMPL_RISCVV_REDUCE_SAD(v_int32x4, v_uint32x4)
+OPENCV_HAL_IMPL_RISCVV_REDUCE_SAD(v_uint32x4, v_uint32x4)
+
+#define OPENCV_HAL_IMPL_RISCVV_INT_CMP_OP(_Tpvec, _Tp, _T, num, uv) \
+inline _Tpvec operator == (const _Tpvec& a, const _Tpvec& b) \
+{ \
+ vbool##_T##_t mask = vmseq_vv_##_Tp##_b##_T(a.val, b.val, num); \
+ return _Tpvec(vmerge_vxm_##_Tp(mask, vmv_v_x_##_Tp(0, num), -1, num)); \
+} \
+inline _Tpvec operator != (const _Tpvec& a, const _Tpvec& b) \
+{ \
+ vbool##_T##_t mask = vmsne_vv_##_Tp##_b##_T(a.val, b.val, num); \
+ return _Tpvec(vmerge_vxm_##_Tp(mask, vmv_v_x_##_Tp(0, num), -1, num)); \
+} \
+inline _Tpvec operator < (const _Tpvec& a, const _Tpvec& b) \
+{ \
+ vbool##_T##_t mask = vmslt##uv##_Tp##_b##_T(a.val, b.val, num); \
+ return _Tpvec(vmerge_vxm_##_Tp(mask, vmv_v_x_##_Tp(0, num), -1, num)); \
+} \
+inline _Tpvec operator > (const _Tpvec& a, const _Tpvec& b) \
+{ \
+ vbool##_T##_t mask = vmslt##uv##_Tp##_b##_T(b.val, a.val, num); \
+ return _Tpvec(vmerge_vxm_##_Tp(mask, vmv_v_x_##_Tp(0, num), -1, num)); \
+} \
+inline _Tpvec operator <= (const _Tpvec& a, const _Tpvec& b) \
+{ \
+ vbool##_T##_t mask = vmsle##uv##_Tp##_b##_T(a.val, b.val, num); \
+ return _Tpvec(vmerge_vxm_##_Tp(mask, vmv_v_x_##_Tp(0, num), -1, num)); \
+} \
+inline _Tpvec operator >= (const _Tpvec& a, const _Tpvec& b) \
+{ \
+ vbool##_T##_t mask = vmsle##uv##_Tp##_b##_T(b.val, a.val, num); \
+ return _Tpvec(vmerge_vxm_##_Tp(mask, vmv_v_x_##_Tp(0, num), -1, num)); \
+} \
+
+OPENCV_HAL_IMPL_RISCVV_INT_CMP_OP(v_int8x16, i8m1, 8, 16, _vv_)
+OPENCV_HAL_IMPL_RISCVV_INT_CMP_OP(v_int16x8, i16m1, 16, 8, _vv_)
+OPENCV_HAL_IMPL_RISCVV_INT_CMP_OP(v_int32x4, i32m1, 32, 4, _vv_)
+OPENCV_HAL_IMPL_RISCVV_INT_CMP_OP(v_int64x2, i64m1, 64, 2, _vv_)
+OPENCV_HAL_IMPL_RISCVV_INT_CMP_OP(v_uint8x16, u8m1, 8, 16, u_vv_)
+OPENCV_HAL_IMPL_RISCVV_INT_CMP_OP(v_uint16x8, u16m1, 16, 8, u_vv_)
+OPENCV_HAL_IMPL_RISCVV_INT_CMP_OP(v_uint32x4, u32m1, 32, 4, u_vv_)
+OPENCV_HAL_IMPL_RISCVV_INT_CMP_OP(v_uint64x2, u64m1, 64, 2, u_vv_)
+
+//TODO: ==
+inline v_float32x4 operator == (const v_float32x4& a, const v_float32x4& b)
+{
+ vbool32_t mask = vmfeq_vv_f32m1_b32(a.val, b.val, 4);
+ vint32m1_t res = vmerge_vxm_i32m1(mask, vmv_v_x_i32m1(0.0, 4), -1, 4);
+ return v_float32x4((vfloat32m1_t)res);
+}
+inline v_float32x4 operator != (const v_float32x4& a, const v_float32x4& b)
+{
+ vbool32_t mask = vmfne_vv_f32m1_b32(a.val, b.val, 4);
+ vint32m1_t res = vmerge_vxm_i32m1(mask, vmv_v_x_i32m1(0.0, 4), -1, 4);
+ return v_float32x4((vfloat32m1_t)res);
+}
+inline v_float32x4 operator < (const v_float32x4& a, const v_float32x4& b)
+{
+ vbool32_t mask = vmflt_vv_f32m1_b32(a.val, b.val, 4);
+ vint32m1_t res = vmerge_vxm_i32m1(mask, vmv_v_x_i32m1(0.0, 4), -1, 4);
+ return v_float32x4((vfloat32m1_t)res);
+}
+inline v_float32x4 operator <= (const v_float32x4& a, const v_float32x4& b)
+{
+ vbool32_t mask = vmfle_vv_f32m1_b32(a.val, b.val, 4);
+ vint32m1_t res = vmerge_vxm_i32m1(mask, vmv_v_x_i32m1(0.0, 4), -1, 4);
+ return v_float32x4((vfloat32m1_t)res);
+}
+inline v_float32x4 operator > (const v_float32x4& a, const v_float32x4& b)
+{
+ vbool32_t mask = vmfgt_vv_f32m1_b32(a.val, b.val, 4);
+ vint32m1_t res = vmerge_vxm_i32m1(mask, vmv_v_x_i32m1(0.0, 4), -1, 4);
+ return v_float32x4((vfloat32m1_t)res);
+}
+inline v_float32x4 operator >= (const v_float32x4& a, const v_float32x4& b)
+{
+ vbool32_t mask = vmfge_vv_f32m1_b32(a.val, b.val, 4);
+ vint32m1_t res = vmerge_vxm_i32m1(mask, vmv_v_x_i32m1(0.0, 4), -1, 4);
+ return v_float32x4((vfloat32m1_t)res);
+}
+inline v_float32x4 v_not_nan(const v_float32x4& a)
+{
+ vbool32_t mask = vmford_vv_f32m1_b32(a.val, a.val, 4);
+ vint32m1_t res = vmerge_vxm_i32m1(mask, vmv_v_x_i32m1(0.0, 4), -1, 4);
+ return v_float32x4((vfloat32m1_t)res);
+}
+
+//TODO: ==
+inline v_float64x2 operator == (const v_float64x2& a, const v_float64x2& b)
+{
+ vbool64_t mask = vmfeq_vv_f64m1_b64(a.val, b.val, 2);
+ vint64m1_t res = vmerge_vxm_i64m1(mask, vmv_v_x_i64m1(0.0, 2), -1, 2);
+ return v_float64x2((vfloat64m1_t)res);
+}
+inline v_float64x2 operator != (const v_float64x2& a, const v_float64x2& b)
+{
+ vbool64_t mask = vmfne_vv_f64m1_b64(a.val, b.val, 2);
+ vint64m1_t res = vmerge_vxm_i64m1(mask, vmv_v_x_i64m1(0.0, 2), -1, 2);
+ return v_float64x2((vfloat64m1_t)res);
+}
+inline v_float64x2 operator < (const v_float64x2& a, const v_float64x2& b)
+{
+ vbool64_t mask = vmflt_vv_f64m1_b64(a.val, b.val, 2);
+ vint64m1_t res = vmerge_vxm_i64m1(mask, vmv_v_x_i64m1(0.0, 2), -1, 2);
+ return v_float64x2((vfloat64m1_t)res);
+}
+inline v_float64x2 operator <= (const v_float64x2& a, const v_float64x2& b)
+{
+ vbool64_t mask = vmfle_vv_f64m1_b64(a.val, b.val, 2);
+ vint64m1_t res = vmerge_vxm_i64m1(mask, vmv_v_x_i64m1(0.0, 2), -1, 2);
+ return v_float64x2((vfloat64m1_t)res);
+}
+inline v_float64x2 operator > (const v_float64x2& a, const v_float64x2& b)
+{
+ vbool64_t mask = vmfgt_vv_f64m1_b64(a.val, b.val, 2);
+ vint64m1_t res = vmerge_vxm_i64m1(mask, vmv_v_x_i64m1(0.0, 2), -1, 2);
+ return v_float64x2((vfloat64m1_t)res);
+}
+inline v_float64x2 operator >= (const v_float64x2& a, const v_float64x2& b)
+{
+ vbool64_t mask = vmfge_vv_f64m1_b64(a.val, b.val, 2);
+ vint64m1_t res = vmerge_vxm_i64m1(mask, vmv_v_x_i64m1(0.0, 2), -1, 2);
+ return v_float64x2((vfloat64m1_t)res);
+}
+inline v_float64x2 v_not_nan(const v_float64x2& a)
+{
+ vbool64_t mask = vmford_vv_f64m1_b64(a.val, a.val, 2);
+ vint64m1_t res = vmerge_vxm_i64m1(mask, vmv_v_x_i64m1(0.0, 2), -1, 2);
+ return v_float64x2((vfloat64m1_t)res);
+}
+#define OPENCV_HAL_IMPL_RISCVV_TRANSPOSE4x4(_Tp, _T) \
+inline void v_transpose4x4(const v_##_Tp##32x4& a0, const v_##_Tp##32x4& a1, \
+ const v_##_Tp##32x4& a2, const v_##_Tp##32x4& a3, \
+ v_##_Tp##32x4& b0, v_##_Tp##32x4& b1, \
+ v_##_Tp##32x4& b2, v_##_Tp##32x4& b3) \
+{ \
+ v##_Tp##32m4_t val = vundefined_##_T##m4(); \
+ val = vset_##_T##m4(val, 0, a0.val); \
+ val = vset_##_T##m4(val, 1, a1.val); \
+ val = vset_##_T##m4(val, 2, a2.val); \
+ val = vset_##_T##m4(val, 3, a3.val); \
+ val = vrgather_vv_##_T##m4(val, (vuint32m4_t){0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15}, 16); \
+ b0.val = vget_##_T##m4_##_T##m1(val, 0); \
+ b1.val = vget_##_T##m4_##_T##m1(val, 1); \
+ b2.val = vget_##_T##m4_##_T##m1(val, 2); \
+ b3.val = vget_##_T##m4_##_T##m1(val, 3); \
+}
+OPENCV_HAL_IMPL_RISCVV_TRANSPOSE4x4(uint, u32)
+OPENCV_HAL_IMPL_RISCVV_TRANSPOSE4x4(int, i32)
+OPENCV_HAL_IMPL_RISCVV_TRANSPOSE4x4(float, f32)
+
+
+#define OPENCV_HAL_IMPL_RISCVV_SHIFT_LEFT(_Tpvec, suffix, _T, num) \
+inline _Tpvec operator << (const _Tpvec& a, int n) \
+{ return _Tpvec((vsll_vx_##_T##m1(a.val, n, num))); } \
+template<int n> inline _Tpvec v_shl(const _Tpvec& a) \
+{ return _Tpvec((vsll_vx_##_T##m1(a.val, n, num))); }
+
+#define OPENCV_HAL_IMPL_RISCVV_SHIFT_RIGHT(_Tpvec, suffix, _T, num, intric) \
+inline _Tpvec operator >> (const _Tpvec& a, int n) \
+{ return _Tpvec((v##intric##_vx_##_T##m1(a.val, n, num))); } \
+template<int n> inline _Tpvec v_shr(const _Tpvec& a) \
+{ return _Tpvec((v##intric##_vx_##_T##m1(a.val, n, num))); }\
+template<int n> inline _Tpvec v_rshr(const _Tpvec& a) \
+{ return _Tpvec((v##intric##_vx_##_T##m1(vadd_vx_##_T##m1(a.val, 1<<(n-1), num), n, num))); }
+
+// trade efficiency for convenience
+#define OPENCV_HAL_IMPL_RISCVV_SHIFT_OP(suffix, _T, num, intrin) \
+OPENCV_HAL_IMPL_RISCVV_SHIFT_LEFT(v_##suffix##x##num, suffix, _T, num) \
+OPENCV_HAL_IMPL_RISCVV_SHIFT_RIGHT(v_##suffix##x##num, suffix, _T, num, intrin)
+
+OPENCV_HAL_IMPL_RISCVV_SHIFT_OP(uint8, u8, 16, srl)
+OPENCV_HAL_IMPL_RISCVV_SHIFT_OP(uint16, u16, 8, srl)
+OPENCV_HAL_IMPL_RISCVV_SHIFT_OP(uint32, u32, 4, srl)
+OPENCV_HAL_IMPL_RISCVV_SHIFT_OP(uint64, u64, 2, srl)
+OPENCV_HAL_IMPL_RISCVV_SHIFT_OP(int8, i8, 16, sra)
+OPENCV_HAL_IMPL_RISCVV_SHIFT_OP(int16, i16, 8, sra)
+OPENCV_HAL_IMPL_RISCVV_SHIFT_OP(int32, i32, 4, sra)
+OPENCV_HAL_IMPL_RISCVV_SHIFT_OP(int64, i64, 2, sra)
+
+#if 0
+#define VUP4(n) {0, 1, 2, 3}
+#define VUP8(n) {0, 1, 2, 3, 4, 5, 6, 7}
+#define VUP16(n) {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
+#define VUP2(n) {0, 1}
+#endif
+#define OPENCV_HAL_IMPL_RISCVV_ROTATE_OP(_Tpvec, suffix, _T, num, num2, vmv, len) \
+template<int n> inline _Tpvec v_rotate_left(const _Tpvec& a) \
+{ \
+ suffix##m1_t tmp = vmv##_##_T##m1(0, num);\
+ tmp = vslideup_vx_##_T##m1_m(vmset_m_##len(num), tmp, a.val, n, num);\
+ return _Tpvec(tmp);\
+} \
+template<int n> inline _Tpvec v_rotate_right(const _Tpvec& a) \
+{ \
+ return _Tpvec(vslidedown_vx_##_T##m1(a.val, n, num));\
+} \
+template<> inline _Tpvec v_rotate_left<0>(const _Tpvec& a) \
+{ return a; } \
+template<int n> inline _Tpvec v_rotate_right(const _Tpvec& a, const _Tpvec& b) \
+{ \
+ suffix##m2_t tmp = vundefined_##_T##m2(); \
+ tmp = vset_##_T##m2(tmp, 0, a.val); \
+ tmp = vset_##_T##m2(tmp, 1, b.val); \
+ tmp = vslidedown_vx_##_T##m2(tmp, n, num2);\
+ return _Tpvec(vget_##_T##m2_##_T##m1(tmp, 0));\
+} \
+template<int n> inline _Tpvec v_rotate_left(const _Tpvec& a, const _Tpvec& b) \
+{ \
+ suffix##m2_t tmp = vundefined_##_T##m2(); \
+ tmp = vset_##_T##m2(tmp, 0, b.val); \
+ tmp = vset_##_T##m2(tmp, 1, a.val); \
+ tmp = vslideup_vx_##_T##m2(tmp, n, num2);\
+ return _Tpvec(vget_##_T##m2_##_T##m1(tmp, 1));\
+} \
+template<> inline _Tpvec v_rotate_left<0>(const _Tpvec& a, const _Tpvec& b) \
+{ \
+ CV_UNUSED(b); return a; \
+}
+
+OPENCV_HAL_IMPL_RISCVV_ROTATE_OP(v_uint8x16, vuint8, u8, 16, 32, vmv_v_x, b8)
+OPENCV_HAL_IMPL_RISCVV_ROTATE_OP(v_int8x16, vint8, i8, 16, 32, vmv_v_x, b8)
+OPENCV_HAL_IMPL_RISCVV_ROTATE_OP(v_uint16x8, vuint16, u16, 8, 16, vmv_v_x, b16)
+OPENCV_HAL_IMPL_RISCVV_ROTATE_OP(v_int16x8, vint16, i16, 8, 16, vmv_v_x, b16)
+OPENCV_HAL_IMPL_RISCVV_ROTATE_OP(v_uint32x4, vuint32, u32, 4, 8, vmv_v_x, b32)
+OPENCV_HAL_IMPL_RISCVV_ROTATE_OP(v_int32x4, vint32, i32, 4, 8, vmv_v_x, b32)
+OPENCV_HAL_IMPL_RISCVV_ROTATE_OP(v_uint64x2, vuint64, u64, 2, 4, vmv_v_x, b64)
+OPENCV_HAL_IMPL_RISCVV_ROTATE_OP(v_int64x2, vint64, i64, 2, 4, vmv_v_x, b64)
+OPENCV_HAL_IMPL_RISCVV_ROTATE_OP(v_float32x4, vfloat32, f32, 4, 8, vfmv_v_f, b32)
+OPENCV_HAL_IMPL_RISCVV_ROTATE_OP(v_float64x2, vfloat64, f64, 2, 4, vfmv_v_f, b64)
+
+#define OPENCV_HAL_IMPL_RISCVV_LOADSTORE_OP(_Tpvec, _Tp, _Tp2, len, hnum, num) \
+inline _Tpvec v_load_halves(const _Tp* ptr0, const _Tp* ptr1) \
+{ \
+ typedef uint64 CV_DECL_ALIGNED(1) unaligned_uint64; \
+ vuint64m1_t tmp = {*(unaligned_uint64*)ptr0, *(unaligned_uint64*)ptr1};\
+ return _Tpvec(_Tp2##_t(tmp)); } \
+inline _Tpvec v_load_low(const _Tp* ptr) \
+{ return _Tpvec(vle_v_##len(ptr, hnum)); }\
+inline _Tpvec v_load_aligned(const _Tp* ptr) \
+{ return _Tpvec(vle_v_##len(ptr, num)); } \
+inline _Tpvec v_load(const _Tp* ptr) \
+{ return _Tpvec((_Tp2##_t)vle_v_##len((const _Tp *)ptr, num)); } \
+inline void v_store_low(_Tp* ptr, const _Tpvec& a) \
+{ vse_v_##len(ptr, a.val, hnum);}\
+inline void v_store_high(_Tp* ptr, const _Tpvec& a) \
+{ \
+ _Tp2##_t a0 = vslidedown_vx_##len(a.val, hnum, num); \
+ vse_v_##len(ptr, a0, hnum);}\
+inline void v_store(_Tp* ptr, const _Tpvec& a) \
+{ vse_v_##len(ptr, a.val, num); } \
+inline void v_store_aligned(_Tp* ptr, const _Tpvec& a) \
+{ vse_v_##len(ptr, a.val, num); } \
+inline void v_store_aligned_nocache(_Tp* ptr, const _Tpvec& a) \
+{ vse_v_##len(ptr, a.val, num); } \
+inline void v_store(_Tp* ptr, const _Tpvec& a, hal::StoreMode /*mode*/) \
+{ vse_v_##len(ptr, a.val, num); }
+
+OPENCV_HAL_IMPL_RISCVV_LOADSTORE_OP(v_uint8x16, uchar, vuint8m1, u8m1, 8, 16)
+OPENCV_HAL_IMPL_RISCVV_LOADSTORE_OP(v_int8x16, schar, vint8m1, i8m1, 8, 16)
+OPENCV_HAL_IMPL_RISCVV_LOADSTORE_OP(v_uint16x8, ushort, vuint16m1, u16m1, 4, 8)
+OPENCV_HAL_IMPL_RISCVV_LOADSTORE_OP(v_int16x8, short, vint16m1, i16m1, 4, 8)
+OPENCV_HAL_IMPL_RISCVV_LOADSTORE_OP(v_uint32x4, unsigned, vuint32m1, u32m1, 2, 4)
+OPENCV_HAL_IMPL_RISCVV_LOADSTORE_OP(v_int32x4, int, vint32m1, i32m1, 2, 4)
+OPENCV_HAL_IMPL_RISCVV_LOADSTORE_OP(v_uint64x2, unsigned long, vuint64m1, u64m1, 1, 2)
+OPENCV_HAL_IMPL_RISCVV_LOADSTORE_OP(v_int64x2, long, vint64m1, i64m1, 1, 2)
+OPENCV_HAL_IMPL_RISCVV_LOADSTORE_OP(v_float32x4, float, vfloat32m1, f32m1, 2, 4)
+OPENCV_HAL_IMPL_RISCVV_LOADSTORE_OP(v_float64x2, double, vfloat64m1, f64m1, 1, 2)
+
+
+////////////// Lookup table access ////////////////////
+
+inline v_int8x16 v_lut(const schar* tab, const int* idx)
+{
+#if 1
+ schar CV_DECL_ALIGNED(32) elems[16] =
+ {
+ tab[idx[ 0]],
+ tab[idx[ 1]],
+ tab[idx[ 2]],
+ tab[idx[ 3]],
+ tab[idx[ 4]],
+ tab[idx[ 5]],
+ tab[idx[ 6]],
+ tab[idx[ 7]],
+ tab[idx[ 8]],
+ tab[idx[ 9]],
+ tab[idx[10]],
+ tab[idx[11]],
+ tab[idx[12]],
+ tab[idx[13]],
+ tab[idx[14]],
+ tab[idx[15]]
+ };
+ return v_int8x16(vle_v_i8m1(elems, 16));
+#else
+ int32xm4_t index32 = vlev_int32xm4(idx, 16);
+ vint16m2_t index16 = vnsra_vx_i16m2_int32xm4(index32, 0, 16);
+ vint8m1_t index = vnsra_vx_i8m1_i16m2(index16, 0, 16);
+ return v_int8x16(vlxbv_i8m1(tab, index, 16));
+#endif
+}
+
+inline v_int8x16 v_lut_pairs(const schar* tab, const int* idx){
+ schar CV_DECL_ALIGNED(32) elems[16] =
+ {
+ tab[idx[0]],
+ tab[idx[0] + 1],
+ tab[idx[1]],
+ tab[idx[1] + 1],
+ tab[idx[2]],
+ tab[idx[2] + 1],
+ tab[idx[3]],
+ tab[idx[3] + 1],
+ tab[idx[4]],
+ tab[idx[4] + 1],
+ tab[idx[5]],
+ tab[idx[5] + 1],
+ tab[idx[6]],
+ tab[idx[6] + 1],
+ tab[idx[7]],
+ tab[idx[7] + 1]
+ };
+ return v_int8x16(vle_v_i8m1(elems, 16));
+}
+inline v_int8x16 v_lut_quads(const schar* tab, const int* idx)
+{
+ schar CV_DECL_ALIGNED(32) elems[16] =
+ {
+ tab[idx[0]],
+ tab[idx[0] + 1],
+ tab[idx[0] + 2],
+ tab[idx[0] + 3],
+ tab[idx[1]],
+ tab[idx[1] + 1],
+ tab[idx[1] + 2],
+ tab[idx[1] + 3],
+ tab[idx[2]],
+ tab[idx[2] + 1],
+ tab[idx[2] + 2],
+ tab[idx[2] + 3],
+ tab[idx[3]],
+ tab[idx[3] + 1],
+ tab[idx[3] + 2],
+ tab[idx[3] + 3]
+ };
+ return v_int8x16(vle_v_i8m1(elems, 16));
+}
+
+inline v_uint8x16 v_lut(const uchar* tab, const int* idx) { return v_reinterpret_as_u8(v_lut((schar*)tab, idx)); }
+inline v_uint8x16 v_lut_pairs(const uchar* tab, const int* idx) { return v_reinterpret_as_u8(v_lut_pairs((schar*)tab, idx)); }
+inline v_uint8x16 v_lut_quads(const uchar* tab, const int* idx) { return v_reinterpret_as_u8(v_lut_quads((schar*)tab, idx)); }
+
+inline v_int16x8 v_lut(const short* tab, const int* idx)
+{
+ short CV_DECL_ALIGNED(32) elems[8] =
+ {
+ tab[idx[0]],
+ tab[idx[1]],
+ tab[idx[2]],
+ tab[idx[3]],
+ tab[idx[4]],
+ tab[idx[5]],
+ tab[idx[6]],
+ tab[idx[7]]
+ };
+ return v_int16x8(vle_v_i16m1(elems, 8));
+}
+inline v_int16x8 v_lut_pairs(const short* tab, const int* idx)
+{
+ short CV_DECL_ALIGNED(32) elems[8] =
+ {
+ tab[idx[0]],
+ tab[idx[0] + 1],
+ tab[idx[1]],
+ tab[idx[1] + 1],
+ tab[idx[2]],
+ tab[idx[2] + 1],
+ tab[idx[3]],
+ tab[idx[3] + 1]
+ };
+ return v_int16x8(vle_v_i16m1(elems, 8));
+}
+inline v_int16x8 v_lut_quads(const short* tab, const int* idx)
+{
+ short CV_DECL_ALIGNED(32) elems[8] =
+ {
+ tab[idx[0]],
+ tab[idx[0] + 1],
+ tab[idx[0] + 2],
+ tab[idx[0] + 3],
+ tab[idx[1]],
+ tab[idx[1] + 1],
+ tab[idx[1] + 2],
+ tab[idx[1] + 3]
+ };
+ return v_int16x8(vle_v_i16m1(elems, 8));
+}
+inline v_uint16x8 v_lut(const ushort* tab, const int* idx) { return v_reinterpret_as_u16(v_lut((short*)tab, idx)); }
+inline v_uint16x8 v_lut_pairs(const ushort* tab, const int* idx) { return v_reinterpret_as_u16(v_lut_pairs((short*)tab, idx)); }
+inline v_uint16x8 v_lut_quads(const ushort* tab, const int* idx) { return v_reinterpret_as_u16(v_lut_quads((short*)tab, idx)); }
+
+inline v_int32x4 v_lut(const int* tab, const int* idx)
+{
+ int CV_DECL_ALIGNED(32) elems[4] =
+ {
+ tab[idx[0]],
+ tab[idx[1]],
+ tab[idx[2]],
+ tab[idx[3]]
+ };
+ return v_int32x4(vle_v_i32m1(elems, 4));
+}
+inline v_int32x4 v_lut_pairs(const int* tab, const int* idx)
+{
+ int CV_DECL_ALIGNED(32) elems[4] =
+ {
+ tab[idx[0]],
+ tab[idx[0] + 1],
+ tab[idx[1]],
+ tab[idx[1] + 1]
+ };
+ return v_int32x4(vle_v_i32m1(elems, 4));
+}
+inline v_int32x4 v_lut_quads(const int* tab, const int* idx)
+{
+ return v_int32x4(vle_v_i32m1(tab+idx[0], 4));
+}
+inline v_uint32x4 v_lut(const unsigned* tab, const int* idx) { return v_reinterpret_as_u32(v_lut((int*)tab, idx)); }
+inline v_uint32x4 v_lut_pairs(const unsigned* tab, const int* idx) { return v_reinterpret_as_u32(v_lut_pairs((int*)tab, idx)); }
+inline v_uint32x4 v_lut_quads(const unsigned* tab, const int* idx) { return v_reinterpret_as_u32(v_lut_quads((int*)tab, idx)); }
+
+inline v_int64x2 v_lut(const int64_t* tab, const int* idx)
+{
+ vint64m1_t res = {tab[idx[0]], tab[idx[1]]};
+ return v_int64x2(res);
+}
+inline v_int64x2 v_lut_pairs(const int64_t* tab, const int* idx)
+{
+ return v_int64x2(vle_v_i64m1(tab+idx[0], 2));
+}
+
+inline v_uint64x2 v_lut(const uint64_t* tab, const int* idx)
+{
+ vuint64m1_t res = {tab[idx[0]], tab[idx[1]]};
+ return v_uint64x2(res);
+}
+inline v_uint64x2 v_lut_pairs(const uint64_t* tab, const int* idx)
+{
+ return v_uint64x2(vle_v_u64m1(tab+idx[0], 2));
+}
+
+inline v_float32x4 v_lut(const float* tab, const int* idx)
+{
+ float CV_DECL_ALIGNED(32) elems[4] =
+ {
+ tab[idx[0]],
+ tab[idx[1]],
+ tab[idx[2]],
+ tab[idx[3]]
+ };
+ return v_float32x4(vle_v_f32m1(elems, 4));
+}
+inline v_float32x4 v_lut_pairs(const float* tab, const int* idx)
+{
+ float CV_DECL_ALIGNED(32) elems[4] =
+ {
+ tab[idx[0]],
+ tab[idx[0]+1],
+ tab[idx[1]],
+ tab[idx[1]+1]
+ };
+ return v_float32x4(vle_v_f32m1(elems, 4));
+}
+inline v_float32x4 v_lut_quads(const float* tab, const int* idx)
+{
+ return v_float32x4(vle_v_f32m1(tab + idx[0], 4));
+}
+inline v_float64x2 v_lut(const double* tab, const int* idx)
+{
+ vfloat64m1_t res = {tab[idx[0]], tab[idx[1]]};
+ return v_float64x2(res);
+}
+inline v_float64x2 v_lut_pairs(const double* tab, const int* idx)
+{
+ return v_float64x2(vle_v_f64m1(tab+idx[0], 2));
+}
+
+inline v_int32x4 v_lut(const int* tab, const v_int32x4& idxvec)
+{
+ int CV_DECL_ALIGNED(32) elems[4] =
+ {
+ tab[idxvec.val[0]],
+ tab[idxvec.val[1]],
+ tab[idxvec.val[2]],
+ tab[idxvec.val[3]]
+ };
+ return v_int32x4(vle_v_i32m1(elems, 4));
+}
+
+inline v_uint32x4 v_lut(const unsigned* tab, const v_int32x4& idxvec)
+{
+ unsigned CV_DECL_ALIGNED(32) elems[4] =
+ {
+ tab[idxvec.val[0]],
+ tab[idxvec.val[1]],
+ tab[idxvec.val[2]],
+ tab[idxvec.val[3]]
+ };
+ return v_uint32x4(vle_v_u32m1(elems, 4));
+}
+
+inline v_float32x4 v_lut(const float* tab, const v_int32x4& idxvec)
+{
+ float CV_DECL_ALIGNED(32) elems[4] =
+ {
+ tab[idxvec.val[0]],
+ tab[idxvec.val[1]],
+ tab[idxvec.val[2]],
+ tab[idxvec.val[3]]
+ };
+ return v_float32x4(vle_v_f32m1(elems, 4));
+}
+inline v_float64x2 v_lut(const double* tab, const v_int32x4& idxvec)
+{
+ vfloat64m1_t res = {tab[idxvec.val[0]], tab[idxvec.val[1]]};
+ return v_float64x2(res);
+}
+inline void v_lut_deinterleave(const float* tab, const v_int32x4& idxvec, v_float32x4& x, v_float32x4& y)
+{
+ vint32m1_t index_x = vmul_vx_i32m1(idxvec.val, 4, 4);
+ vint32m1_t index_y = vadd_vx_i32m1(index_x, 4, 4);
+
+ x.val = vlxe_v_f32m1(tab, index_x, 4);
+ y.val = vlxe_v_f32m1(tab, index_y, 4);
+}
+
+inline void v_lut_deinterleave(const double* tab, const v_int32x4& idxvec, v_float64x2& x, v_float64x2& y)
+{
+ int CV_DECL_ALIGNED(32) idx[4];
+ v_store_aligned(idx, idxvec);
+
+ x = v_float64x2(tab[idx[0]], tab[idx[1]]);
+ y = v_float64x2(tab[idx[0]+1], tab[idx[1]+1]);
+}
+
+#define OPENCV_HAL_IMPL_RISCVV_PACKS(_Tp, _Tp2, _T2, num2, _T1, num, intrin, shr, _Type) \
+inline v_##_Tp##x##num v_pack(const v_##_Tp2##x##num2& a, const v_##_Tp2##x##num2& b) \
+{ \
+ v##_Tp2##m2_t tmp = vundefined_##_T2##m2(); \
+ tmp = vset_##_T2##m2(tmp, 0, a.val); \
+ tmp = vset_##_T2##m2(tmp, 1, b.val); \
+ return v_##_Tp##x##num(shr##_##_T1##m1(tmp, 0, num)); \
+}\
+template<int n> inline \
+v_##_Tp##x##num v_rshr_pack(const v_##_Tp2##x##num2& a, const v_##_Tp2##x##num2& b) \
+{ \
+ v##_Tp2##m2_t tmp = vundefined_##_T2##m2(); \
+ tmp = vset_##_T2##m2(tmp, 0, a.val); \
+ tmp = vset_##_T2##m2(tmp, 1, b.val); \
+ return v_##_Tp##x##num(intrin##_##_T1##m1(tmp, n, num)); \
+}\
+inline void v_pack_store(_Type* ptr, const v_##_Tp2##x##num2& a) \
+{ \
+ v##_Tp2##m2_t tmp = vundefined_##_T2##m2(); \
+ tmp = vset_##_T2##m2(tmp, 0, a.val); \
+ tmp = vset_##_T2##m2(tmp, 1, vmv_v_x_##_T2##m1(0, num2)); \
+ asm("" ::: "memory"); \
+ vse_v_##_T1##m1(ptr, shr##_##_T1##m1(tmp, 0, num), num2); \
+}\
+template<int n> inline \
+void v_rshr_pack_store(_Type* ptr, const v_##_Tp2##x##num2& a) \
+{ \
+ v##_Tp2##m2_t tmp = vundefined_##_T2##m2(); \
+ tmp = vset_##_T2##m2(tmp, 0, a.val); \
+ tmp = vset_##_T2##m2(tmp, 1, vmv_v_x_##_T2##m1(0, num2)); \
+ vse_v_##_T1##m1(ptr, intrin##_##_T1##m1(tmp, n, num), num2); \
+}
+OPENCV_HAL_IMPL_RISCVV_PACKS(int8, int16, i16, 8, i8, 16, vnclip_vx, vnclip_vx, signed char)
+OPENCV_HAL_IMPL_RISCVV_PACKS(int16, int32, i32, 4, i16, 8, vnclip_vx, vnclip_vx, signed short)
+OPENCV_HAL_IMPL_RISCVV_PACKS(int32, int64, i64, 2, i32, 4, vnclip_vx, vnsra_vx, int)
+OPENCV_HAL_IMPL_RISCVV_PACKS(uint8, uint16, u16, 8, u8, 16, vnclipu_vx, vnclipu_vx, unsigned char)
+OPENCV_HAL_IMPL_RISCVV_PACKS(uint16, uint32, u32, 4, u16, 8, vnclipu_vx, vnclipu_vx, unsigned short)
+OPENCV_HAL_IMPL_RISCVV_PACKS(uint32, uint64, u64, 2, u32, 4, vnclipu_vx, vnsrl_vx, unsigned int)
+
+// pack boolean
+inline v_uint8x16 v_pack_b(const v_uint16x8& a, const v_uint16x8& b)
+{
+ vuint16m2_t tmp = vundefined_u16m2();
+ tmp = vset_u16m2(tmp, 0, a.val);
+ tmp = vset_u16m2(tmp, 1, b.val);
+ return v_uint8x16(vnsrl_vx_u8m1(tmp, 0, 16));
+}
+
+inline v_uint8x16 v_pack_b(const v_uint32x4& a, const v_uint32x4& b,
+ const v_uint32x4& c, const v_uint32x4& d)
+{
+ vuint32m4_t vabcd = vundefined_u32m4();
+ vuint16m2_t v16 = vundefined_u16m2();
+ vabcd = vset_u32m4(vabcd, 0, a.val);
+ vabcd = vset_u32m4(vabcd, 1, b.val);
+ vabcd = vset_u32m4(vabcd, 2, c.val);
+ vabcd = vset_u32m4(vabcd, 3, d.val);
+ v16 = vnsrl_vx_u16m2(vabcd, 0, 16);
+ return v_uint8x16(vnsrl_vx_u8m1(v16, 0, 16));
+}
+
+inline v_uint8x16 v_pack_b(const v_uint64x2& a, const v_uint64x2& b, const v_uint64x2& c,
+ const v_uint64x2& d, const v_uint64x2& e, const v_uint64x2& f,
+ const v_uint64x2& g, const v_uint64x2& h)
+{
+ vuint64m8_t v64 = vundefined_u64m8();
+ vuint32m4_t v32 = vundefined_u32m4();
+ vuint16m2_t v16 = vundefined_u16m2();
+ v64 = vset_u64m8(v64, 0, a.val);
+ v64 = vset_u64m8(v64, 1, b.val);
+ v64 = vset_u64m8(v64, 2, c.val);
+ v64 = vset_u64m8(v64, 3, d.val);
+ v64 = vset_u64m8(v64, 4, e.val);
+ v64 = vset_u64m8(v64, 5, f.val);
+ v64 = vset_u64m8(v64, 6, g.val);
+ v64 = vset_u64m8(v64, 7, h.val);
+ v32 = vnsrl_vx_u32m4(v64, 0, 16);
+ v16 = vnsrl_vx_u16m2(v32, 0, 16);
+ return v_uint8x16(vnsrl_vx_u8m1(v16, 0, 16));
+}
+
+//inline v_uint8x16 v_pack_u(const v_int16x8& a, const v_int16x8& b) \
+//{ \
+// int16xm2_u tmp; \
+// tmp.m1[0] = (vint16m1_t)a.val; \
+// tmp.m1[1] = (vint16m1_t)b.val; \
+// e8xm1_t mask = (e8xm1_t)vmsge_vx_e16xm2_i16m2(tmp.v, 0, 16);\
+// return v_uint8x16(vnclipuvi_mask_u8m1_u16m2(vmv_v_x_u8m1(0, 16), (vuint16m2_t)tmp.v, 0, mask, 16));
+//}
+
+#define OPENCV_HAL_IMPL_RISCVV_PACK_U(tp1, num1, tp2, num2, _Tp) \
+inline v_uint##tp1##x##num1 v_pack_u(const v_int##tp2##x##num2& a, const v_int##tp2##x##num2& b) \
+{ \
+ vint##tp2##m2_t tmp = vundefined_##i##tp2##m2(); \
+ tmp = vset_##i##tp2##m2(tmp, 0, a.val); \
+ tmp = vset_##i##tp2##m2(tmp, 1, b.val); \
+ vint##tp2##m2_t val = vmax_vx_i##tp2##m2(tmp, 0, num1);\
+ return v_uint##tp1##x##num1(vnclipu_vx_u##tp1##m1((vuint##tp2##m2_t)val, 0, num1)); \
+} \
+inline void v_pack_u_store(_Tp* ptr, const v_int##tp2##x##num2& a) \
+{ \
+ vint##tp2##m2_t tmp = vundefined_##i##tp2##m2(); \
+ tmp = vset_##i##tp2##m2(tmp, 0, a.val); \
+ vint##tp2##m2_t val = vmax_vx_i##tp2##m2(tmp, 0, num1);\
+ return vse_v_u##tp1##m1(ptr, vnclipu_vx_u##tp1##m1((vuint##tp2##m2_t)val, 0, num1), num2); \
+} \
+template<int n> inline \
+v_uint##tp1##x##num1 v_rshr_pack_u(const v_int##tp2##x##num2& a, const v_int##tp2##x##num2& b) \
+{ \
+ vint##tp2##m2_t tmp = vundefined_##i##tp2##m2(); \
+ tmp = vset_##i##tp2##m2(tmp, 0, a.val); \
+ tmp = vset_##i##tp2##m2(tmp, 1, b.val); \
+ vint##tp2##m2_t val = vmax_vx_i##tp2##m2(tmp, 0, num1);\
+ return v_uint##tp1##x##num1(vnclipu_vx_u##tp1##m1((vuint##tp2##m2_t)val, n, num1)); \
+} \
+template<int n> inline \
+void v_rshr_pack_u_store(_Tp* ptr, const v_int##tp2##x##num2& a) \
+{ \
+ vint##tp2##m2_t tmp = vundefined_##i##tp2##m2(); \
+ tmp = vset_##i##tp2##m2(tmp, 0, a.val); \
+ vint##tp2##m2_t val_ = vmax_vx_i##tp2##m2(tmp, 0, num1);\
+ vuint##tp1##m1_t val = vnclipu_vx_u##tp1##m1((vuint##tp2##m2_t)val_, n, num1); \
+ return vse_v_u##tp1##m1(ptr, val, num2);\
+}
+OPENCV_HAL_IMPL_RISCVV_PACK_U(8, 16, 16, 8, unsigned char )
+OPENCV_HAL_IMPL_RISCVV_PACK_U(16, 8, 32, 4, unsigned short)
+
+#ifdef __GNUC__
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wuninitialized"
+#endif
+
+// saturating multiply 8-bit, 16-bit
+#define OPENCV_HAL_IMPL_RISCVV_MUL_SAT(_Tpvec, _Tpwvec) \
+ inline _Tpvec operator * (const _Tpvec& a, const _Tpvec& b) \
+ { \
+ _Tpwvec c, d; \
+ v_mul_expand(a, b, c, d); \
+ return v_pack(c, d); \
+ } \
+ inline _Tpvec& operator *= (_Tpvec& a, const _Tpvec& b) \
+ { a = a * b; return a; }
+
+OPENCV_HAL_IMPL_RISCVV_MUL_SAT(v_int8x16, v_int16x8)
+OPENCV_HAL_IMPL_RISCVV_MUL_SAT(v_uint8x16, v_uint16x8)
+OPENCV_HAL_IMPL_RISCVV_MUL_SAT(v_int16x8, v_int32x4)
+OPENCV_HAL_IMPL_RISCVV_MUL_SAT(v_uint16x8, v_uint32x4)
+
+#ifdef __GNUC__
+#pragma GCC diagnostic pop
+#endif
+static const signed char popCountTable[256] =
+{
+ 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
+ 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
+ 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
+ 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+ 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
+ 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+ 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+ 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
+ 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
+ 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+ 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+ 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
+ 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+ 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
+ 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
+ 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
+};
+
+inline vuint8m1_t vcnt_u8(vuint8m1_t val){
+ vuint8m1_t v0 = val & 1;
+ return vlxe_v_u8m1((unsigned char*)popCountTable, val >> 1, 16)+v0;
+}
+
+inline v_uint8x16
+v_popcount(const v_uint8x16& a)
+{
+ return v_uint8x16(vcnt_u8(a.val));
+}
+
+inline v_uint8x16
+v_popcount(const v_int8x16& a)
+{
+ return v_uint8x16(vcnt_u8((vuint8m1_t)a.val));
+}
+
+inline v_uint16x8
+v_popcount(const v_uint16x8& a)
+{
+ vuint8m2_t tmp = vundefined_u8m2();
+ tmp = vset_u8m2(tmp, 0, vcnt_u8((vuint8m1_t)a.val));
+ vuint64m2_t mask = (vuint64m2_t){0x0E0C0A0806040200, 0, 0x0F0D0B0907050301, 0};
+ tmp = vrgather_vv_u8m2(tmp, (vuint8m2_t)mask, 32);
+ vuint16m2_t res = vwaddu_vv_u16m2(vget_u8m2_u8m1(tmp, 0), vget_u8m2_u8m1(tmp, 1), 8);
+ return v_uint16x8(vget_u16m2_u16m1(res, 0));
+}
+
+inline v_uint16x8
+v_popcount(const v_int16x8& a)
+{
+ vuint8m2_t tmp = vundefined_u8m2();
+ tmp = vset_u8m2(tmp, 0, vcnt_u8((vuint8m1_t)a.val));
+ vuint64m2_t mask = (vuint64m2_t){0x0E0C0A0806040200, 0, 0x0F0D0B0907050301, 0};
+ tmp = vrgather_vv_u8m2(tmp, (vuint8m2_t)mask, 32);
+ vuint16m2_t res = vwaddu_vv_u16m2(vget_u8m2_u8m1(tmp, 0), vget_u8m2_u8m1(tmp, 1), 8);
+ return v_uint16x8(vget_u16m2_u16m1(res, 0));
+}
+
+inline v_uint32x4
+v_popcount(const v_uint32x4& a)
+{
+ vuint8m2_t tmp = vundefined_u8m2();
+ tmp = vset_u8m2(tmp, 0, vcnt_u8((vuint8m1_t)a.val));
+ vuint64m2_t mask = (vuint64m2_t){0xFFFFFFFF0C080400, 0xFFFFFFFF0D090501,
+ 0xFFFFFFFF0E0A0602, 0xFFFFFFFF0F0B0703};
+ tmp = vrgather_vv_u8m2(tmp, (vuint8m2_t)mask, 32);
+ vuint16m2_t res_ = vwaddu_vv_u16m2(vget_u8m2_u8m1(tmp, 0), vget_u8m2_u8m1(tmp, 1), 16);
+ vuint32m2_t res = vwaddu_vv_u32m2(vget_u16m2_u16m1(res_, 0), vget_u16m2_u16m1(res_, 1), 8);
+ return v_uint32x4(vget_u32m2_u32m1(res, 0));
+}
+
+inline v_uint32x4
+v_popcount(const v_int32x4& a)
+{
+ vuint8m2_t tmp = vundefined_u8m2();
+ tmp = vset_u8m2(tmp, 0, vcnt_u8((vuint8m1_t)a.val));
+ vuint64m2_t mask = (vuint64m2_t){0xFFFFFFFF0C080400, 0xFFFFFFFF0D090501,
+ 0xFFFFFFFF0E0A0602, 0xFFFFFFFF0F0B0703};
+ tmp = vrgather_vv_u8m2(tmp, (vuint8m2_t)mask, 32);
+ vuint16m2_t res_ = vwaddu_vv_u16m2(vget_u8m2_u8m1(tmp, 0), vget_u8m2_u8m1(tmp, 1), 16);
+ vuint32m2_t res = vwaddu_vv_u32m2(vget_u16m2_u16m1(res_, 0), vget_u16m2_u16m1(res_, 1), 8);
+ return v_uint32x4(vget_u32m2_u32m1(res, 0));
+}
+
+inline v_uint64x2
+v_popcount(const v_uint64x2& a)
+{
+ vuint8m2_t tmp = vundefined_u8m2();
+ tmp = vset_u8m2(tmp, 0, vcnt_u8((vuint8m1_t)a.val));
+ vuint64m2_t mask = (vuint64m2_t){0x0706050403020100, 0x0000000000000000,
+ 0x0F0E0D0C0B0A0908, 0x0000000000000000};
+ tmp = vrgather_vv_u8m2(tmp, (vuint8m2_t)mask, 32);
+ vuint8m1_t zero = vmv_v_x_u8m1(0, 16);
+ vuint8m1_t res1 = zero;
+ vuint8m1_t res2 = zero;
+ res1 = vredsum_vs_u8m1_u8m1(res1, vget_u8m2_u8m1(tmp, 0), zero, 8);
+ res2 = vredsum_vs_u8m1_u8m1(res2, vget_u8m2_u8m1(tmp, 1), zero, 8);
+
+ return v_uint64x2((unsigned long)vmv_x_s_u8m1_u8(res1, 8), (unsigned long)vmv_x_s_u8m1_u8(res2, 8));
+}
+
+inline v_uint64x2
+v_popcount(const v_int64x2& a)
+{
+ vuint8m2_t tmp = vundefined_u8m2();
+ tmp = vset_u8m2(tmp, 0, vcnt_u8((vuint8m1_t)a.val));
+ vuint64m2_t mask = (vuint64m2_t){0x0706050403020100, 0x0000000000000000,
+ 0x0F0E0D0C0B0A0908, 0x0000000000000000};
+ tmp = vrgather_vv_u8m2(tmp, (vuint8m2_t)mask, 32);
+ vuint8m1_t zero = vmv_v_x_u8m1(0, 16);
+ vuint8m1_t res1 = zero;
+ vuint8m1_t res2 = zero;
+ res1 = vredsum_vs_u8m1_u8m1(res1, vget_u8m2_u8m1(tmp, 0), zero, 8);
+ res2 = vredsum_vs_u8m1_u8m1(res2, vget_u8m2_u8m1(tmp, 1), zero, 8);
+
+ return v_uint64x2((unsigned long)vmv_x_s_u8m1_u8(res1, 8), (unsigned long)vmv_x_s_u8m1_u8(res2, 8));
+}
+
+#define SMASK 1, 2, 4, 8, 16, 32, 64, 128
+inline int v_signmask(const v_uint8x16& a)
+{
+ vuint8m1_t t0 = vsrl_vx_u8m1(a.val, 7, 16);
+ vuint8m1_t m1 = (vuint8m1_t){SMASK, SMASK};
+ vuint16m2_t t1 = vwmulu_vv_u16m2(t0, m1, 16);
+ vuint32m1_t res = vmv_v_x_u32m1(0, 4);
+ vuint32m2_t t2 = vwmulu_vx_u32m2(vget_u16m2_u16m1(t1, 1), 256, 8);
+ res = vredsum_vs_u32m2_u32m1(res, t2, res, 8);
+ res = vwredsumu_vs_u16m1_u32m1(res, vget_u16m2_u16m1(t1, 0), res, 8);
+ return vmv_x_s_u32m1_u32(res, 8);
+}
+inline int v_signmask(const v_int8x16& a)
+{
+ vuint8m1_t t0 = vsrl_vx_u8m1((vuint8m1_t)a.val, 7, 16);
+ vuint8m1_t m1 = (vuint8m1_t){SMASK, SMASK};
+ vint16m2_t t1 = (vint16m2_t)vwmulu_vv_u16m2(t0, m1, 16);
+ vint32m1_t res = vmv_v_x_i32m1(0, 4);
+ vint32m2_t t2 = vwmul_vx_i32m2(vget_i16m2_i16m1(t1, 1), 256, 8);
+ res = vredsum_vs_i32m2_i32m1(res, t2, res, 8);
+ res = vwredsum_vs_i16m1_i32m1(res, vget_i16m2_i16m1(t1, 0), res, 8);
+ return vmv_x_s_i32m1_i32(res, 8);
+}
+
+inline int v_signmask(const v_int16x8& a)
+{
+ vint16m1_t t0 = (vint16m1_t)vsrl_vx_u16m1((vuint16m1_t)a.val, 15, 8);
+ vint16m1_t m1 = (vint16m1_t){SMASK};
+ vint16m1_t t1 = vmul_vv_i16m1(t0, m1, 8);
+ vint16m1_t res = vmv_v_x_i16m1(0, 8);
+ res = vredsum_vs_i16m1_i16m1(res, t1, res, 8);
+ return vmv_x_s_i16m1_i16(res, 8);
+}
+inline int v_signmask(const v_uint16x8& a)
+{
+ vint16m1_t t0 = (vint16m1_t)vsrl_vx_u16m1((vuint16m1_t)a.val, 15, 8);
+ vint16m1_t m1 = (vint16m1_t){SMASK};
+ vint16m1_t t1 = vmul_vv_i16m1(t0, m1, 8);
+ vint16m1_t res = vmv_v_x_i16m1(0, 8);
+ res = vredsum_vs_i16m1_i16m1(res, t1, res, 8);
+ return vmv_x_s_i16m1_i16(res, 8);
+}
+inline int v_signmask(const v_int32x4& a)
+{
+ vint32m1_t t0 = (vint32m1_t)vsrl_vx_u32m1((vuint32m1_t)a.val, 31, 4);
+ vint32m1_t m1 = (vint32m1_t){1, 2, 4, 8};
+ vint32m1_t res = vmv_v_x_i32m1(0, 4);
+ vint32m1_t t1 = vmul_vv_i32m1(t0, m1, 4);
+ res = vredsum_vs_i32m1_i32m1(res, t1, res, 4);
+ return vmv_x_s_i32m1_i32(res, 4);
+}
+inline int v_signmask(const v_uint32x4& a)
+{
+ vint32m1_t t0 = (vint32m1_t)vsrl_vx_u32m1(a.val, 31, 4);
+ vint32m1_t m1 = (vint32m1_t){1, 2, 4, 8};
+ vint32m1_t res = vmv_v_x_i32m1(0, 4);
+ vint32m1_t t1 = vmul_vv_i32m1(t0, m1, 4);
+ res = vredsum_vs_i32m1_i32m1(res, t1, res, 4);
+ return vmv_x_s_i32m1_i32(res, 4);
+}
+inline int v_signmask(const v_uint64x2& a)
+{
+ vuint64m1_t v0 = vsrl_vx_u64m1(a.val, 63, 2);
+ int res = (int)vext_x_v_u64m1_u64(v0, 0, 2) + ((int)vext_x_v_u64m1_u64(v0, 1, 2) << 1);
+ return res;
+}
+inline int v_signmask(const v_int64x2& a)
+{ return v_signmask(v_reinterpret_as_u64(a)); }
+inline int v_signmask(const v_float64x2& a)
+{ return v_signmask(v_reinterpret_as_u64(a)); }
+inline int v_signmask(const v_float32x4& a)
+{
+ vint32m1_t t0 = (vint32m1_t)vsrl_vx_u32m1((vuint32m1_t)a.val, 31, 4);
+ vint32m1_t m1 = (vint32m1_t){1, 2, 4, 8};
+ vint32m1_t res = vmv_v_x_i32m1(0, 4);
+ vint32m1_t t1 = vmul_vv_i32m1(t0, m1, 4);
+ res = vredsum_vs_i32m1_i32m1(res, t1, res, 4);
+ return vmv_x_s_i32m1_i32(res, 4);
+}
+
+inline int v_scan_forward(const v_int8x16& a) {
+int val = v_signmask(a);
+if(val==0) return 0;
+else return trailingZeros32(val); }
+inline int v_scan_forward(const v_uint8x16& a) {
+int val = v_signmask(a);
+if(val==0) return 0;
+else return trailingZeros32(val); }
+inline int v_scan_forward(const v_int16x8& a) {
+int val = v_signmask(a);
+if(val==0) return 0;
+else return trailingZeros32(val); }
+inline int v_scan_forward(const v_uint16x8& a) {
+int val = v_signmask(a);
+if(val==0) return 0;
+else return trailingZeros32(val); }
+inline int v_scan_forward(const v_int32x4& a) {
+int val = v_signmask(a);
+if(val==0) return 0;
+else return trailingZeros32(val); }
+inline int v_scan_forward(const v_uint32x4& a) {
+int val = v_signmask(a);
+if(val==0) return 0;
+else return trailingZeros32(val); }
+inline int v_scan_forward(const v_float32x4& a) {
+int val = v_signmask(a);
+if(val==0) return 0;
+else return trailingZeros32(val); }
+inline int v_scan_forward(const v_int64x2& a) {
+int val = v_signmask(a);
+if(val==0) return 0;
+else return trailingZeros32(val); }
+inline int v_scan_forward(const v_uint64x2& a) {
+int val = v_signmask(a);
+if(val==0) return 0;
+else return trailingZeros32(val); }
+
+#define OPENCV_HAL_IMPL_RISCVV_CHECK_ALLANY(_Tpvec, suffix, _T, shift, num) \
+inline bool v_check_all(const v_##_Tpvec& a) \
+{ \
+ suffix##m1_t v0 = vsrl_vx_##_T(vnot_v_##_T(a.val, num), shift, num); \
+ vuint64m1_t v1 = vuint64m1_t(v0); \
+ return (v1[0] | v1[1]) == 0; \
+} \
+inline bool v_check_any(const v_##_Tpvec& a) \
+{ \
+ suffix##m1_t v0 = vsrl_vx_##_T(a.val, shift, num); \
+ vuint64m1_t v1 = vuint64m1_t(v0); \
+ return (v1[0] | v1[1]) != 0; \
+}
+
+OPENCV_HAL_IMPL_RISCVV_CHECK_ALLANY(uint8x16, vuint8, u8m1, 7, 16)
+OPENCV_HAL_IMPL_RISCVV_CHECK_ALLANY(uint16x8, vuint16, u16m1, 15, 8)
+OPENCV_HAL_IMPL_RISCVV_CHECK_ALLANY(uint32x4, vuint32, u32m1, 31, 4)
+OPENCV_HAL_IMPL_RISCVV_CHECK_ALLANY(uint64x2, vuint64, u64m1, 63, 2)
+
+inline bool v_check_all(const v_int8x16& a)
+{ return v_check_all(v_reinterpret_as_u8(a)); }
+inline bool v_check_all(const v_int16x8& a)
+{ return v_check_all(v_reinterpret_as_u16(a)); }
+inline bool v_check_all(const v_int32x4& a)
+{ return v_check_all(v_reinterpret_as_u32(a)); }
+inline bool v_check_all(const v_float32x4& a)
+{ return v_check_all(v_reinterpret_as_u32(a)); }
+inline bool v_check_all(const v_int64x2& a)
+{ return v_check_all(v_reinterpret_as_u64(a)); }
+inline bool v_check_all(const v_float64x2& a)
+{ return v_check_all(v_reinterpret_as_u64(a)); }
+
+inline bool v_check_any(const v_int8x16& a)
+{ return v_check_any(v_reinterpret_as_u8(a)); }
+inline bool v_check_any(const v_int16x8& a)
+{ return v_check_any(v_reinterpret_as_u16(a)); }
+inline bool v_check_any(const v_int32x4& a)
+{ return v_check_any(v_reinterpret_as_u32(a)); }
+inline bool v_check_any(const v_float32x4& a)
+{ return v_check_any(v_reinterpret_as_u32(a)); }
+inline bool v_check_any(const v_int64x2& a)
+{ return v_check_any(v_reinterpret_as_u64(a)); }
+inline bool v_check_any(const v_float64x2& a)
+{ return v_check_any(v_reinterpret_as_u64(a)); }
+
+#define OPENCV_HAL_IMPL_RISCVV_SELECT(_Tpvec, suffix, _Tpvec2, num) \
+inline _Tpvec v_select(const _Tpvec& mask, const _Tpvec& a, const _Tpvec& b) \
+{ \
+ return _Tpvec(vmerge_vvm_##suffix(_Tpvec2(mask.val), b.val, a.val, num)); \
+}
+
+OPENCV_HAL_IMPL_RISCVV_SELECT(v_int8x16, i8m1, vbool8_t, 16)
+OPENCV_HAL_IMPL_RISCVV_SELECT(v_int16x8, i16m1, vbool16_t, 8)
+OPENCV_HAL_IMPL_RISCVV_SELECT(v_int32x4, i32m1, vbool32_t, 4)
+OPENCV_HAL_IMPL_RISCVV_SELECT(v_uint8x16, u8m1, vbool8_t, 16)
+OPENCV_HAL_IMPL_RISCVV_SELECT(v_uint16x8, u16m1, vbool16_t, 8)
+OPENCV_HAL_IMPL_RISCVV_SELECT(v_uint32x4, u32m1, vbool32_t, 4)
+inline v_float32x4 v_select(const v_float32x4& mask, const v_float32x4& a, const v_float32x4& b)
+{
+ return v_float32x4((vfloat32m1_t)vmerge_vvm_u32m1((vbool32_t)mask.val, (vuint32m1_t)b.val, (vuint32m1_t)a.val, 4));
+}
+inline v_float64x2 v_select(const v_float64x2& mask, const v_float64x2& a, const v_float64x2& b)
+{
+ return v_float64x2((vfloat64m1_t)vmerge_vvm_u64m1((vbool64_t)mask.val, (vuint64m1_t)b.val, (vuint64m1_t)a.val, 2));
+}
+
+#define OPENCV_HAL_IMPL_RISCVV_EXPAND(add, _Tpvec, _Tpwvec, _Tp, _Tp1, num1, _Tp2, num2, _T1, _T2) \
+inline void v_expand(const _Tpvec& a, v_##_Tpwvec& b0, v_##_Tpwvec& b1) \
+{ \
+ _T1##_t b = vw##add##_vv_##_Tp2##m2(a.val, vmv_v_x_##_Tp1(0, num1), num1); \
+ b0.val = vget_##_Tp2##m2_##_Tp2##m1(b, 0); \
+ b1.val = vget_##_Tp2##m2_##_Tp2##m1(b, 1); \
+} \
+inline v_##_Tpwvec v_expand_low(const _Tpvec& a) \
+{ \
+ _T1##_t b = vw##add##_vv_##_Tp2##m2(a.val, vmv_v_x_##_Tp1(0, num2), num2); \
+ return v_##_Tpwvec(vget_##_Tp2##m2_##_Tp2##m1(b, 0)); \
+} \
+inline v_##_Tpwvec v_expand_high(const _Tpvec& a) \
+{ \
+ _T1##_t b = vw##add##_vv_##_Tp2##m2(a.val, vmv_v_x_##_Tp1(0, num1), num1); \
+ return v_##_Tpwvec(vget_##_Tp2##m2_##_Tp2##m1(b, 1)); \
+} \
+inline v_##_Tpwvec v_load_expand(const _Tp* ptr) \
+{ \
+ _T2##_t val = vle##_v_##_Tp1(ptr, num2); \
+ _T1##_t b = vw##add##_vv_##_Tp2##m2(val, vmv_v_x_##_Tp1(0, num2), num2); \
+ return v_##_Tpwvec(vget_##_Tp2##m2_##_Tp2##m1(b, 0)); \
+}
+
+OPENCV_HAL_IMPL_RISCVV_EXPAND(addu, v_uint8x16, uint16x8, uchar, u8m1, 16, u16, 8, vuint16m2, vuint8m1)
+OPENCV_HAL_IMPL_RISCVV_EXPAND(addu, v_uint16x8, uint32x4, ushort, u16m1, 8, u32, 4, vuint32m2, vuint16m1)
+OPENCV_HAL_IMPL_RISCVV_EXPAND(addu, v_uint32x4, uint64x2, uint, u32m1, 4, u64, 2, vuint64m2, vuint32m1)
+OPENCV_HAL_IMPL_RISCVV_EXPAND(add, v_int8x16, int16x8, schar, i8m1, 16, i16, 8, vint16m2, vint8m1)
+OPENCV_HAL_IMPL_RISCVV_EXPAND(add, v_int16x8, int32x4, short, i16m1, 8, i32, 4, vint32m2, vint16m1)
+OPENCV_HAL_IMPL_RISCVV_EXPAND(add, v_int32x4, int64x2, int, i32m1, 4, i64, 2, vint64m2, vint32m1)
+
+inline v_uint32x4 v_load_expand_q(const uchar* ptr)
+{
+ vuint16m2_t b = vundefined_u16m2();
+ vuint32m2_t c = vundefined_u32m2();
+ vuint8m1_t val = vle_v_u8m1(ptr, 4);
+ b = vwaddu_vv_u16m2(val, vmv_v_x_u8m1(0, 4), 4);
+ c = vwaddu_vv_u32m2(vget_u16m2_u16m1(b, 0), vmv_v_x_u16m1(0, 4), 4);
+ return v_uint32x4(vget_u32m2_u32m1(c, 0));
+}
+
+inline v_int32x4 v_load_expand_q(const schar* ptr)
+{
+ vint16m2_t b = vundefined_i16m2();
+ vint32m2_t c = vundefined_i32m2();
+ vint8m1_t val = vle_v_i8m1(ptr, 4);
+ b = vwadd_vv_i16m2(val, vmv_v_x_i8m1(0, 4), 4);
+ c = vwadd_vv_i32m2(vget_i16m2_i16m1(b, 0), vmv_v_x_i16m1(0, 4), 4);
+ return v_int32x4(vget_i32m2_i32m1(c, 0));
+}
+#define VITL_16 (vuint64m2_t){0x1303120211011000, 0x1707160615051404, 0x1B0B1A0A19091808, 0x1F0F1E0E1D0D1C0C}
+#define VITL_8 (vuint64m2_t){0x0009000100080000, 0x000B0003000A0002, 0x000D0005000C0004, 0x000F0007000E0006}
+#define VITL_4 (vuint64m2_t){0x0000000400000000, 0x0000000500000001, 0x0000000600000002, 0x0000000700000003}
+#define VITL_2 (vuint64m2_t){0, 2, 1, 3}
+#define LOW_4 0x0000000100000000, 0x0000000500000004
+#define LOW_8 0x0003000200010000, 0x000B000A00090008
+#define LOW_16 0x0706050403020100, 0x1716151413121110
+#define HIGH_4 0x0000000300000002, 0x0000000700000006
+#define HIGH_8 0x0007000600050004, 0x000F000E000D000C
+#define HIGH_16 0x0F0E0D0C0B0A0908, 0x1F1E1D1C1B1A1918
+#define OPENCV_HAL_IMPL_RISCVV_UNPACKS(_Tpvec, _Tp, _T, _UTp, _UT, num, num2, len, numh) \
+inline void v_zip(const v_##_Tpvec& a0, const v_##_Tpvec& a1, v_##_Tpvec& b0, v_##_Tpvec& b1) \
+{ \
+ v##_Tp##m2_t tmp = vundefined_##_T##m2();\
+ tmp = vset_##_T##m2(tmp, 0, a0.val); \
+ tmp = vset_##_T##m2(tmp, 1, a1.val); \
+ vuint64m2_t mask = VITL_##num; \
+ tmp = (v##_Tp##m2_t)vrgather_vv_##_T##m2((v##_Tp##m2_t)tmp, (v##_UTp##m2_t)mask, num2); \
+ b0.val = vget_##_T##m2_##_T##m1(tmp, 0); \
+ b1.val = vget_##_T##m2_##_T##m1(tmp, 1); \
+} \
+inline v_##_Tpvec v_combine_low(const v_##_Tpvec& a, const v_##_Tpvec& b) \
+{ \
+ v##_Tp##m1_t b0 = vslideup_vx_##_T##m1_m(vmset_m_##len(num), a.val, b.val, numh, num); \
+ return v_##_Tpvec(b0);\
+} \
+inline v_##_Tpvec v_combine_high(const v_##_Tpvec& a, const v_##_Tpvec& b) \
+{ \
+ v##_Tp##m1_t b0 = vslidedown_vx_##_T##m1(b.val, numh, num); \
+ v##_Tp##m1_t a0 = vslidedown_vx_##_T##m1(a.val, numh, num); \
+ v##_Tp##m1_t b1 = vslideup_vx_##_T##m1_m(vmset_m_##len(num), a0, b0, numh, num); \
+ return v_##_Tpvec(b1);\
+} \
+inline void v_recombine(const v_##_Tpvec& a, const v_##_Tpvec& b, v_##_Tpvec& c, v_##_Tpvec& d) \
+{ \
+ c.val = vslideup_vx_##_T##m1_m(vmset_m_##len(num), a.val, b.val, numh, num); \
+ v##_Tp##m1_t b0 = vslidedown_vx_##_T##m1(b.val, numh, num); \
+ v##_Tp##m1_t a0 = vslidedown_vx_##_T##m1(a.val, numh, num); \
+ d.val = vslideup_vx_##_T##m1_m(vmset_m_##len(num), a0, b0, numh, num); \
+}
+
+OPENCV_HAL_IMPL_RISCVV_UNPACKS(uint8x16, uint8, u8, uint8, u8, 16, 32, b8, 8)
+OPENCV_HAL_IMPL_RISCVV_UNPACKS(int8x16, int8, i8, uint8, u8, 16, 32, b8, 8)
+OPENCV_HAL_IMPL_RISCVV_UNPACKS(uint16x8, uint16, u16, uint16, u16, 8, 16, b16, 4)
+OPENCV_HAL_IMPL_RISCVV_UNPACKS(int16x8, int16, i16, uint16, u16, 8, 16, b16, 4)
+OPENCV_HAL_IMPL_RISCVV_UNPACKS(uint32x4, uint32, u32, uint32, u32, 4, 8, b32, 2)
+OPENCV_HAL_IMPL_RISCVV_UNPACKS(int32x4, int32, i32, uint32, u32, 4, 8, b32, 2)
+OPENCV_HAL_IMPL_RISCVV_UNPACKS(float32x4, float32, f32, uint32, u32, 4, 8, b32, 2)
+OPENCV_HAL_IMPL_RISCVV_UNPACKS(float64x2, float64, f64, uint64, u64, 2, 4, b64, 1)
+
+inline v_uint8x16 v_reverse(const v_uint8x16 &a)
+{
+ vuint64m1_t mask = (vuint64m1_t){0x08090A0B0C0D0E0F, 0x0001020304050607};
+ return v_uint8x16(vrgather_vv_u8m1(a.val, (vuint8m1_t)mask, 16));
+}
+inline v_int8x16 v_reverse(const v_int8x16 &a)
+{
+ vint64m1_t mask = (vint64m1_t){0x08090A0B0C0D0E0F, 0x0001020304050607};
+ return v_int8x16(vrgather_vv_i8m1(a.val, (vuint8m1_t)mask, 16));
+}
+
+inline v_uint16x8 v_reverse(const v_uint16x8 &a)
+{
+ vuint64m1_t mask = (vuint64m1_t){0x0004000500060007, 0x0000000100020003};
+ return v_uint16x8(vrgather_vv_u16m1(a.val, (vuint16m1_t)mask, 8));
+}
+
+inline v_int16x8 v_reverse(const v_int16x8 &a)
+{
+ vint64m1_t mask = (vint64m1_t){0x0004000500060007, 0x0000000100020003};
+ return v_int16x8(vrgather_vv_i16m1(a.val, (vuint16m1_t)mask, 8));
+}
+inline v_uint32x4 v_reverse(const v_uint32x4 &a)
+{
+ return v_uint32x4(vrgather_vv_u32m1(a.val, (vuint32m1_t){3, 2, 1, 0}, 4));
+}
+
+inline v_int32x4 v_reverse(const v_int32x4 &a)
+{
+ return v_int32x4(vrgather_vv_i32m1(a.val, (vuint32m1_t){3, 2, 1, 0}, 4));
+}
+
+inline v_float32x4 v_reverse(const v_float32x4 &a)
+{ return v_reinterpret_as_f32(v_reverse(v_reinterpret_as_u32(a))); }
+
+inline v_uint64x2 v_reverse(const v_uint64x2 &a)
+{
+ return v_uint64x2(a.val[1], a.val[0]);
+}
+
+inline v_int64x2 v_reverse(const v_int64x2 &a)
+{
+ return v_int64x2(a.val[1], a.val[0]);
+}
+
+inline v_float64x2 v_reverse(const v_float64x2 &a)
+{
+ return v_float64x2(a.val[1], a.val[0]);
+}
+
+#define OPENCV_HAL_IMPL_RISCVV_EXTRACT(_Tpvec, suffix, size) \
+template <int n> \
+inline _Tpvec v_extract(const _Tpvec& a, const _Tpvec& b) \
+{ return v_rotate_right<n>(a, b);}
+OPENCV_HAL_IMPL_RISCVV_EXTRACT(v_uint8x16, u8, 0)
+OPENCV_HAL_IMPL_RISCVV_EXTRACT(v_int8x16, s8, 0)
+OPENCV_HAL_IMPL_RISCVV_EXTRACT(v_uint16x8, u16, 1)
+OPENCV_HAL_IMPL_RISCVV_EXTRACT(v_int16x8, s16, 1)
+OPENCV_HAL_IMPL_RISCVV_EXTRACT(v_uint32x4, u32, 2)
+OPENCV_HAL_IMPL_RISCVV_EXTRACT(v_int32x4, s32, 2)
+OPENCV_HAL_IMPL_RISCVV_EXTRACT(v_uint64x2, u64, 3)
+OPENCV_HAL_IMPL_RISCVV_EXTRACT(v_int64x2, s64, 3)
+OPENCV_HAL_IMPL_RISCVV_EXTRACT(v_float32x4, f32, 2)
+OPENCV_HAL_IMPL_RISCVV_EXTRACT(v_float64x2, f64, 3)
+
+
+#define OPENCV_HAL_IMPL_RISCVV_EXTRACT_N(_Tpvec, _Tp, suffix) \
+template<int i> inline _Tp v_extract_n(_Tpvec v) { return v.val[i]; }
+
+OPENCV_HAL_IMPL_RISCVV_EXTRACT_N(v_uint8x16, uchar, u8)
+OPENCV_HAL_IMPL_RISCVV_EXTRACT_N(v_int8x16, schar, s8)
+OPENCV_HAL_IMPL_RISCVV_EXTRACT_N(v_uint16x8, ushort, u16)
+OPENCV_HAL_IMPL_RISCVV_EXTRACT_N(v_int16x8, short, s16)
+OPENCV_HAL_IMPL_RISCVV_EXTRACT_N(v_uint32x4, uint, u32)
+OPENCV_HAL_IMPL_RISCVV_EXTRACT_N(v_int32x4, int, s32)
+OPENCV_HAL_IMPL_RISCVV_EXTRACT_N(v_uint64x2, uint64, u64)
+OPENCV_HAL_IMPL_RISCVV_EXTRACT_N(v_int64x2, int64, s64)
+OPENCV_HAL_IMPL_RISCVV_EXTRACT_N(v_float32x4, float, f32)
+OPENCV_HAL_IMPL_RISCVV_EXTRACT_N(v_float64x2, double, f64)
+
+#define OPENCV_HAL_IMPL_RISCVV_BROADCAST(_Tpvec, _Tp, num) \
+template<int i> inline _Tpvec v_broadcast_element(_Tpvec v) { return _Tpvec(vrgather_vx_##_Tp##m1(v.val, i, num)); }
+
+OPENCV_HAL_IMPL_RISCVV_BROADCAST(v_uint8x16, u8, 16)
+OPENCV_HAL_IMPL_RISCVV_BROADCAST(v_int8x16, i8, 16)
+OPENCV_HAL_IMPL_RISCVV_BROADCAST(v_uint16x8, u16, 8)
+OPENCV_HAL_IMPL_RISCVV_BROADCAST(v_int16x8, i16, 8)
+OPENCV_HAL_IMPL_RISCVV_BROADCAST(v_uint32x4, u32, 4)
+OPENCV_HAL_IMPL_RISCVV_BROADCAST(v_int32x4, i32, 4)
+OPENCV_HAL_IMPL_RISCVV_BROADCAST(v_uint64x2, u64, 2)
+OPENCV_HAL_IMPL_RISCVV_BROADCAST(v_int64x2, i64, 2)
+OPENCV_HAL_IMPL_RISCVV_BROADCAST(v_float32x4, f32, 4)
+inline v_int32x4 v_round(const v_float32x4& a)
+{
+ __builtin_riscv_fsrm(0);
+ vint32m1_t nan = vand_vx_i32m1((vint32m1_t)a.val, 0x7f800000, 4);
+ vbool32_t mask = vmsne_vx_i32m1_b32(nan, 0x7f800000, 4);
+ vint32m1_t val = vfcvt_x_f_v_i32m1_m(mask, vmv_v_x_i32m1(0, 4), a.val, 4);
+ __builtin_riscv_fsrm(0);
+ return v_int32x4(val);
+}
+inline v_int32x4 v_floor(const v_float32x4& a)
+{
+ __builtin_riscv_fsrm(2);
+ vint32m1_t nan = vand_vx_i32m1((vint32m1_t)a.val, 0x7f800000, 4);
+ vbool32_t mask = vmsne_vx_i32m1_b32(nan, 0x7f800000, 4);
+ vint32m1_t val = vfcvt_x_f_v_i32m1_m(mask, vmv_v_x_i32m1(0, 4), a.val, 4);
+ __builtin_riscv_fsrm(0);
+ return v_int32x4(val);
+}
+
+inline v_int32x4 v_ceil(const v_float32x4& a)
+{
+ __builtin_riscv_fsrm(3);
+ vint32m1_t nan = vand_vx_i32m1((vint32m1_t)a.val, 0x7f800000, 4);
+ vbool32_t mask = vmsne_vx_i32m1_b32(nan, 0x7f800000, 4);
+ vint32m1_t val = vfcvt_x_f_v_i32m1_m(mask, vmv_v_x_i32m1(0, 4), a.val, 4);
+ __builtin_riscv_fsrm(0);
+ return v_int32x4(val);
+}
+
+inline v_int32x4 v_trunc(const v_float32x4& a)
+{
+ __builtin_riscv_fsrm(1);
+ vint32m1_t nan = vand_vx_i32m1((vint32m1_t)a.val, 0x7f800000, 4);
+ vbool32_t mask = vmsne_vx_i32m1_b32(nan, 0x7f800000, 4);
+ vint32m1_t val = vfcvt_x_f_v_i32m1_m(mask, vmv_v_x_i32m1(0, 4), a.val, 4);
+ __builtin_riscv_fsrm(0);
+ return v_int32x4(val);
+}
+
+inline v_int32x4 v_round(const v_float64x2& a)
+{
+ __builtin_riscv_fsrm(0);
+ vfloat64m2_t _val = vundefined_f64m2();
+ _val = vset_f64m2(_val, 0, a.val);
+ //_val = vset_f64m2(_val, 1, a.val);
+ _val = vset_f64m2(_val, 1, vfmv_v_f_f64m1(0, 2));
+ vint32m1_t val = vfncvt_x_f_v_i32m1(_val, 4);
+ __builtin_riscv_fsrm(0);
+ return v_int32x4(val);
+}
+inline v_int32x4 v_round(const v_float64x2& a, const v_float64x2& b)
+{
+ __builtin_riscv_fsrm(0);
+ vfloat64m2_t _val = vundefined_f64m2();
+ _val = vset_f64m2(_val, 0, a.val);
+ _val = vset_f64m2(_val, 1, b.val);
+ vint32m1_t val = vfncvt_x_f_v_i32m1(_val, 4);
+ __builtin_riscv_fsrm(0);
+ return v_int32x4(val);
+}
+inline v_int32x4 v_floor(const v_float64x2& a)
+{
+ __builtin_riscv_fsrm(2);
+ vfloat64m2_t _val = vundefined_f64m2();
+ _val = vset_f64m2(_val, 0, a.val);
+ vfloat32m1_t aval = vfncvt_f_f_v_f32m1(_val, 2);
+
+ vint32m1_t nan = vand_vx_i32m1((vint32m1_t)aval, 0x7f800000, 4);
+ vbool32_t mask = vmsne_vx_i32m1_b32(nan, 0x7f800000, 4);
+ vint32m1_t val = vfcvt_x_f_v_i32m1_m(mask, vmv_v_x_i32m1(0, 4), aval, 4);
+ __builtin_riscv_fsrm(0);
+ return v_int32x4(val);
+}
+
+inline v_int32x4 v_ceil(const v_float64x2& a)
+{
+ __builtin_riscv_fsrm(3);
+ vfloat64m2_t _val = vundefined_f64m2();
+ _val = vset_f64m2(_val, 0, a.val);
+ vfloat32m1_t aval = vfncvt_f_f_v_f32m1(_val, 2);
+
+ vint32m1_t nan = vand_vx_i32m1((vint32m1_t)aval, 0x7f800000, 4);
+ vbool32_t mask = vmsne_vx_i32m1_b32(nan, 0x7f800000, 4);
+ vint32m1_t val = vfcvt_x_f_v_i32m1_m(mask, vmv_v_x_i32m1(0, 4), aval, 4);
+ __builtin_riscv_fsrm(0);
+ return v_int32x4(val);
+}
+
+inline v_int32x4 v_trunc(const v_float64x2& a)
+{
+ __builtin_riscv_fsrm(1);
+ vfloat64m2_t _val = vundefined_f64m2();
+ _val = vset_f64m2(_val, 0, a.val);
+ vfloat32m1_t aval = vfncvt_f_f_v_f32m1(_val, 2);
+
+ vint32m1_t nan = vand_vx_i32m1((vint32m1_t)aval, 0x7f800000, 4);
+ vbool32_t mask = vmsne_vx_i32m1_b32(nan, 0x7f800000, 4);
+ vint32m1_t val = vfcvt_x_f_v_i32m1_m(mask, vmv_v_x_i32m1(0, 4), aval, 4);
+ __builtin_riscv_fsrm(0);
+ return v_int32x4(val);
+}
+
+#define OPENCV_HAL_IMPL_RISCVV_LOAD_DEINTERLEAVED(intrin, _Tpvec, num, _Tp, _T) \
+inline void v_load_deinterleave(const _Tp* ptr, v_##_Tpvec##x##num& a, v_##_Tpvec##x##num& b) \
+{ \
+ v##_Tpvec##m1x2_t ret = intrin##2e_v_##_T##m1x2(ptr, num);\
+ a.val = vget_##_T##m1x2_##_T##m1(ret, 0); \
+ b.val = vget_##_T##m1x2_##_T##m1(ret, 1); \
+} \
+inline void v_load_deinterleave(const _Tp* ptr, v_##_Tpvec##x##num& a, v_##_Tpvec##x##num& b, v_##_Tpvec##x##num& c) \
+{ \
+ v##_Tpvec##m1x3_t ret = intrin##3e_v_##_T##m1x3(ptr, num);\
+ a.val = vget_##_T##m1x3_##_T##m1(ret, 0); \
+ b.val = vget_##_T##m1x3_##_T##m1(ret, 1); \
+ c.val = vget_##_T##m1x3_##_T##m1(ret, 2); \
+}\
+inline void v_load_deinterleave(const _Tp* ptr, v_##_Tpvec##x##num& a, v_##_Tpvec##x##num& b, \
+ v_##_Tpvec##x##num& c, v_##_Tpvec##x##num& d) \
+{ \
+ v##_Tpvec##m1x4_t ret = intrin##4e_v_##_T##m1x4(ptr, num);\
+ a.val = vget_##_T##m1x4_##_T##m1(ret, 0); \
+ b.val = vget_##_T##m1x4_##_T##m1(ret, 1); \
+ c.val = vget_##_T##m1x4_##_T##m1(ret, 2); \
+ d.val = vget_##_T##m1x4_##_T##m1(ret, 3); \
+} \
+
+#define OPENCV_HAL_IMPL_RISCVV_STORE_INTERLEAVED(intrin, _Tpvec, num, _Tp, _T) \
+inline void v_store_interleave( _Tp* ptr, const v_##_Tpvec##x##num& a, const v_##_Tpvec##x##num& b, \
+ hal::StoreMode /*mode*/=hal::STORE_UNALIGNED) \
+{ \
+ v##_Tpvec##m1x2_t ret = vundefined_##_T##m1x2(); \
+ ret = vset_##_T##m1x2(ret, 0, a.val); \
+ ret = vset_##_T##m1x2(ret, 1, b.val); \
+ intrin##2e_v_##_T##m1x2(ptr, ret, num); \
+} \
+inline void v_store_interleave( _Tp* ptr, const v_##_Tpvec##x##num& a, const v_##_Tpvec##x##num& b, \
+ const v_##_Tpvec##x##num& c, hal::StoreMode /*mode*/=hal::STORE_UNALIGNED) \
+{ \
+ v##_Tpvec##m1x3_t ret = vundefined_##_T##m1x3(); \
+ ret = vset_##_T##m1x3(ret, 0, a.val); \
+ ret = vset_##_T##m1x3(ret, 1, b.val); \
+ ret = vset_##_T##m1x3(ret, 2, c.val); \
+ intrin##3e_v_##_T##m1x3(ptr, ret, num); \
+} \
+inline void v_store_interleave( _Tp* ptr, const v_##_Tpvec##x##num& a, const v_##_Tpvec##x##num& b, \
+ const v_##_Tpvec##x##num& c, const v_##_Tpvec##x##num& d, \
+ hal::StoreMode /*mode*/=hal::STORE_UNALIGNED ) \
+{ \
+ v##_Tpvec##m1x4_t ret = vundefined_##_T##m1x4(); \
+ ret = vset_##_T##m1x4(ret, 0, a.val); \
+ ret = vset_##_T##m1x4(ret, 1, b.val); \
+ ret = vset_##_T##m1x4(ret, 2, c.val); \
+ ret = vset_##_T##m1x4(ret, 3, d.val); \
+ intrin##4e_v_##_T##m1x4(ptr, ret, num); \
+}
+
+#define OPENCV_HAL_IMPL_RISCVV_INTERLEAVED(_Tpvec, _Tp, num, ld, st, _T) \
+OPENCV_HAL_IMPL_RISCVV_LOAD_DEINTERLEAVED(ld, _Tpvec, num, _Tp, _T) \
+OPENCV_HAL_IMPL_RISCVV_STORE_INTERLEAVED(st, _Tpvec, num, _Tp, _T)
+
+//OPENCV_HAL_IMPL_RISCVV_INTERLEAVED(uint8, uchar, )
+OPENCV_HAL_IMPL_RISCVV_INTERLEAVED(int8, schar, 16, vlseg, vsseg, i8)
+OPENCV_HAL_IMPL_RISCVV_INTERLEAVED(int16, short, 8, vlseg, vsseg, i16)
+OPENCV_HAL_IMPL_RISCVV_INTERLEAVED(int32, int, 4, vlseg, vsseg, i32)
+
+OPENCV_HAL_IMPL_RISCVV_INTERLEAVED(uint8, unsigned char, 16, vlseg, vsseg, u8)
+OPENCV_HAL_IMPL_RISCVV_INTERLEAVED(uint16, unsigned short, 8, vlseg, vsseg, u16)
+OPENCV_HAL_IMPL_RISCVV_INTERLEAVED(uint32, unsigned int, 4, vlseg, vsseg, u32)
+
+#define OPENCV_HAL_IMPL_RISCVV_INTERLEAVED_(_Tpvec, _Tp, num, _T) \
+inline void v_load_deinterleave(const _Tp* ptr, v_##_Tpvec##x##num& a, v_##_Tpvec##x##num& b) \
+{ \
+ v##_Tpvec##m1x2_t ret = vlseg2e_v_##_T##m1x2(ptr, num); \
+ a.val = vget_##_T##m1x2_##_T##m1(ret, 0); \
+ b.val = vget_##_T##m1x2_##_T##m1(ret, 1); \
+} \
+inline void v_load_deinterleave(const _Tp* ptr, v_##_Tpvec##x##num& a, v_##_Tpvec##x##num& b, v_##_Tpvec##x##num& c) \
+{ \
+ v##_Tpvec##m1x3_t ret = vlseg3e_v_##_T##m1x3(ptr, num); \
+ a.val = vget_##_T##m1x3_##_T##m1(ret, 0); \
+ b.val = vget_##_T##m1x3_##_T##m1(ret, 1); \
+ c.val = vget_##_T##m1x3_##_T##m1(ret, 2); \
+}\
+inline void v_load_deinterleave(const _Tp* ptr, v_##_Tpvec##x##num& a, v_##_Tpvec##x##num& b, \
+ v_##_Tpvec##x##num& c, v_##_Tpvec##x##num& d) \
+{ \
+ v##_Tpvec##m1x4_t ret = vlseg4e_v_##_T##m1x4(ptr, num); \
+ a.val = vget_##_T##m1x4_##_T##m1(ret, 0); \
+ b.val = vget_##_T##m1x4_##_T##m1(ret, 1); \
+ c.val = vget_##_T##m1x4_##_T##m1(ret, 2); \
+ d.val = vget_##_T##m1x4_##_T##m1(ret, 3); \
+} \
+inline void v_store_interleave( _Tp* ptr, const v_##_Tpvec##x##num& a, const v_##_Tpvec##x##num& b, \
+ hal::StoreMode /*mode*/=hal::STORE_UNALIGNED) \
+{ \
+ v##_Tpvec##m1x2_t ret = vundefined_##_T##m1x2(); \
+ ret = vset_##_T##m1x2(ret, 0, a.val); \
+ ret = vset_##_T##m1x2(ret, 1, b.val); \
+ vsseg2e_v_##_T##m1x2(ptr, ret, num); \
+} \
+inline void v_store_interleave( _Tp* ptr, const v_##_Tpvec##x##num& a, const v_##_Tpvec##x##num& b, \
+ const v_##_Tpvec##x##num& c, hal::StoreMode /*mode*/=hal::STORE_UNALIGNED) \
+{ \
+ v##_Tpvec##m1x3_t ret = vundefined_##_T##m1x3(); \
+ ret = vset_##_T##m1x3(ret, 0, a.val); \
+ ret = vset_##_T##m1x3(ret, 1, b.val); \
+ ret = vset_##_T##m1x3(ret, 2, c.val); \
+ vsseg3e_v_##_T##m1x3(ptr, ret, num); \
+} \
+inline void v_store_interleave( _Tp* ptr, const v_##_Tpvec##x##num& a, const v_##_Tpvec##x##num& b, \
+ const v_##_Tpvec##x##num& c, const v_##_Tpvec##x##num& d, \
+ hal::StoreMode /*mode*/=hal::STORE_UNALIGNED ) \
+{ \
+ v##_Tpvec##m1x4_t ret = vundefined_##_T##m1x4(); \
+ ret = vset_##_T##m1x4(ret, 0, a.val); \
+ ret = vset_##_T##m1x4(ret, 1, b.val); \
+ ret = vset_##_T##m1x4(ret, 2, c.val); \
+ ret = vset_##_T##m1x4(ret, 3, d.val); \
+ vsseg4e_v_##_T##m1x4(ptr, ret, num); \
+}
+OPENCV_HAL_IMPL_RISCVV_INTERLEAVED_(float32, float, 4, f32)
+OPENCV_HAL_IMPL_RISCVV_INTERLEAVED_(float64, double, 2, f64)
+
+OPENCV_HAL_IMPL_RISCVV_INTERLEAVED_(uint64, unsigned long, 2, u64)
+OPENCV_HAL_IMPL_RISCVV_INTERLEAVED_(int64, long, 2, i64)
+
+inline v_float32x4 v_cvt_f32(const v_int32x4& a)
+{
+ return v_float32x4(vfcvt_f_x_v_f32m1(a.val, 4));
+}
+
+#if CV_SIMD128_64F
+inline v_float32x4 v_cvt_f32(const v_float64x2& a)
+{
+ vfloat64m2_t _val = vundefined_f64m2();
+ _val = vset_f64m2(_val, 0, a.val);
+ vfloat32m1_t aval = vfncvt_f_f_v_f32m1(_val, 2);
+ return v_float32x4(aval);
+}
+
+inline v_float32x4 v_cvt_f32(const v_float64x2& a, const v_float64x2& b)
+{
+ vfloat64m2_t _val = vundefined_f64m2();
+ _val = vset_f64m2(_val, 0, a.val);
+ _val = vset_f64m2(_val, 1, b.val);
+ vfloat32m1_t aval = vfncvt_f_f_v_f32m1(_val, 4);
+ return v_float32x4(aval);
+}
+
+inline v_float64x2 v_cvt_f64(const v_int32x4& a)
+{
+ vfloat32m1_t val = vfcvt_f_x_v_f32m1(a.val, 4);
+ vfloat64m2_t _val = vfwcvt_f_f_v_f64m2(val, 4);
+ return v_float64x2(vget_f64m2_f64m1(_val, 0));
+}
+
+inline v_float64x2 v_cvt_f64_high(const v_int32x4& a)
+{
+ vfloat32m1_t val = vfcvt_f_x_v_f32m1(a.val, 4);
+ vfloat64m2_t _val = vfwcvt_f_f_v_f64m2(val, 4);
+ return v_float64x2(vget_f64m2_f64m1(_val, 1));
+}
+
+inline v_float64x2 v_cvt_f64(const v_float32x4& a)
+{
+ vfloat64m2_t _val = vfwcvt_f_f_v_f64m2(a.val, 4);
+ return v_float64x2(vget_f64m2_f64m1(_val, 0));
+}
+
+inline v_float64x2 v_cvt_f64_high(const v_float32x4& a)
+{
+ vfloat64m2_t _val = vfwcvt_f_f_v_f64m2(a.val, 4);
+ return v_float64x2(vget_f64m2_f64m1(_val, 1));
+}
+
+inline v_float64x2 v_cvt_f64(const v_int64x2& a)
+{
+ return v_float64x2(vfcvt_f_x_v_f64m1(a.val, 2));
+}
+
+#endif
+inline v_int8x16 v_interleave_pairs(const v_int8x16& vec)
+{
+ vuint64m1_t m0 = {0x0705060403010200, 0x0F0D0E0C0B090A08};
+ return v_int8x16(vrgather_vv_i8m1(vec.val, (vuint8m1_t)m0, 16));
+}
+inline v_uint8x16 v_interleave_pairs(const v_uint8x16& vec)
+{
+ return v_reinterpret_as_u8(v_interleave_pairs(v_reinterpret_as_s8(vec)));
+}
+
+inline v_int8x16 v_interleave_quads(const v_int8x16& vec)
+{
+ vuint64m1_t m0 = {0x0703060205010400, 0x0F0B0E0A0D090C08};
+ return v_int8x16(vrgather_vv_i8m1(vec.val, (vuint8m1_t)m0, 16));
+}
+inline v_uint8x16 v_interleave_quads(const v_uint8x16& vec)
+{
+ return v_reinterpret_as_u8(v_interleave_quads(v_reinterpret_as_s8(vec)));
+}
+
+inline v_int16x8 v_interleave_pairs(const v_int16x8& vec)
+{
+ vuint64m1_t m0 = {0x0706030205040100, 0x0F0E0B0A0D0C0908};
+ return v_int16x8((vint16m1_t)vrgather_vv_u8m1((vuint8m1_t)vec.val, (vuint8m1_t)m0, 16));
+}
+inline v_uint16x8 v_interleave_pairs(const v_uint16x8& vec) { return v_reinterpret_as_u16(v_interleave_pairs(v_reinterpret_as_s16(vec))); }
+inline v_int16x8 v_interleave_quads(const v_int16x8& vec)
+{
+ vuint64m1_t m0 = {0x0B0A030209080100, 0x0F0E07060D0C0504};
+ return v_int16x8((vint16m1_t)vrgather_vv_u8m1((vuint8m1_t)(vec.val), (vuint8m1_t)m0, 16));
+}
+inline v_uint16x8 v_interleave_quads(const v_uint16x8& vec) { return v_reinterpret_as_u16(v_interleave_quads(v_reinterpret_as_s16(vec))); }
+
+inline v_int32x4 v_interleave_pairs(const v_int32x4& vec)
+{
+ vuint64m1_t m0 = {0x0B0A090803020100, 0x0F0E0D0C07060504};
+ return v_int32x4((vint32m1_t)vrgather_vv_u8m1((vuint8m1_t)(vec.val), (vuint8m1_t)m0, 16));
+}
+inline v_uint32x4 v_interleave_pairs(const v_uint32x4& vec) { return v_reinterpret_as_u32(v_interleave_pairs(v_reinterpret_as_s32(vec))); }
+inline v_float32x4 v_interleave_pairs(const v_float32x4& vec) { return v_reinterpret_as_f32(v_interleave_pairs(v_reinterpret_as_s32(vec))); }
+inline v_int8x16 v_pack_triplets(const v_int8x16& vec)
+{
+ vuint64m1_t m0 = {0x0908060504020100, 0xFFFFFFFF0E0D0C0A};
+ return v_int8x16((vint8m1_t)vrgather_vv_u8m1((vuint8m1_t)(vec.val), (vuint8m1_t)m0, 16));
+}
+inline v_uint8x16 v_pack_triplets(const v_uint8x16& vec) { return v_reinterpret_as_u8(v_pack_triplets(v_reinterpret_as_s8(vec))); }
+
+inline v_int16x8 v_pack_triplets(const v_int16x8& vec)
+{
+ vuint64m1_t m0 = {0x0908050403020100, 0xFFFFFFFF0D0C0B0A};
+ return v_int16x8((vint16m1_t)vrgather_vv_u8m1((vuint8m1_t)(vec.val), (vuint8m1_t)m0, 16));
+}
+inline v_uint16x8 v_pack_triplets(const v_uint16x8& vec) { return v_reinterpret_as_u16(v_pack_triplets(v_reinterpret_as_s16(vec))); }
+
+inline v_int32x4 v_pack_triplets(const v_int32x4& vec) { return vec; }
+inline v_uint32x4 v_pack_triplets(const v_uint32x4& vec) { return vec; }
+inline v_float32x4 v_pack_triplets(const v_float32x4& vec) { return vec; }
+
+#if CV_SIMD128_64F
+inline v_float64x2 v_dotprod_expand(const v_int32x4& a, const v_int32x4& b)
+{ return v_cvt_f64(v_dotprod(a, b)); }
+inline v_float64x2 v_dotprod_expand(const v_int32x4& a, const v_int32x4& b,
+ const v_float64x2& c)
+{ return v_dotprod_expand(a, b) + c; }
+inline v_float64x2 v_dotprod_expand_fast(const v_int32x4& a, const v_int32x4& b)
+{
+ vint64m2_t v1 = vwmul_vv_i64m2(a.val, b.val, 4);
+ vfloat64m1_t res = vfcvt_f_x_v_f64m1(vadd_vv_i64m1(vget_i64m2_i64m1(v1, 0), vget_i64m2_i64m1(v1, 1), 2), 2);
+ return v_float64x2(res);
+}
+inline v_float64x2 v_dotprod_expand_fast(const v_int32x4& a, const v_int32x4& b, const v_float64x2& c)
+{ v_float64x2 res = v_dotprod_expand_fast(a, b);
+ return res + c; }
+#endif
+////// FP16 support ///////
+inline v_float32x4 v_load_expand(const float16_t* ptr)
+{
+ vfloat16m1_t v = vle_v_f16m1((__fp16*)ptr, 4);
+ vfloat32m2_t v32 = vfwcvt_f_f_v_f32m2(v, 4);
+ return v_float32x4(vget_f32m2_f32m1(v32, 0));
+}
+
+inline void v_pack_store(float16_t* ptr, const v_float32x4& v)
+{
+ vfloat32m2_t v32 = vundefined_f32m2();
+ v32 = vset_f32m2(v32, 0, v.val);
+ vfloat16m1_t hv = vfncvt_f_f_v_f16m1(v32, 4);
+ vse_v_f16m1((__fp16*)ptr, hv, 4);
+}
+
+
+inline void v_cleanup() {}
+
+CV_CPU_OPTIMIZATION_HAL_NAMESPACE_END
+
+//! @endcond
+
+}
+#endif
diff --git a/modules/core/include/opencv2/core/mat.hpp b/modules/core/include/opencv2/core/mat.hpp
index 84df297bf9..eeb83c0744 100644
--- a/modules/core/include/opencv2/core/mat.hpp
+++ b/modules/core/include/opencv2/core/mat.hpp
@@ -2011,6 +2011,11 @@ public:
template<typename _Tp> MatIterator_<_Tp> begin();
template<typename _Tp> MatConstIterator_<_Tp> begin() const;
+ /** @brief Same as begin() but for inverse traversal
+ */
+ template<typename _Tp> std::reverse_iterator<MatIterator_<_Tp>> rbegin();
+ template<typename _Tp> std::reverse_iterator<MatConstIterator_<_Tp>> rbegin() const;
+
/** @brief Returns the matrix iterator and sets it to the after-last matrix element.
The methods return the matrix read-only or read-write iterators, set to the point following the last
@@ -2019,6 +2024,12 @@ public:
template<typename _Tp> MatIterator_<_Tp> end();
template<typename _Tp> MatConstIterator_<_Tp> end() const;
+ /** @brief Same as end() but for inverse traversal
+ */
+ template<typename _Tp> std::reverse_iterator< MatIterator_<_Tp>> rend();
+ template<typename _Tp> std::reverse_iterator< MatConstIterator_<_Tp>> rend() const;
+
+
/** @brief Runs the given functor over all matrix elements in parallel.
The operation passed as argument has to be a function pointer, a function object or a lambda(C++11).
@@ -2250,6 +2261,12 @@ public:
const_iterator begin() const;
const_iterator end() const;
+ //reverse iterators
+ std::reverse_iterator<iterator> rbegin();
+ std::reverse_iterator<iterator> rend();
+ std::reverse_iterator<const_iterator> rbegin() const;
+ std::reverse_iterator<const_iterator> rend() const;
+
//! template methods for for operation over all matrix elements.
// the operations take care of skipping gaps in the end of rows (if any)
template<typename Functor> void forEach(const Functor& operation);
diff --git a/modules/core/include/opencv2/core/mat.inl.hpp b/modules/core/include/opencv2/core/mat.inl.hpp
index ff8297ffa4..886b82c6a0 100644
--- a/modules/core/include/opencv2/core/mat.inl.hpp
+++ b/modules/core/include/opencv2/core/mat.inl.hpp
@@ -863,6 +863,33 @@ const _Tp* Mat::ptr(const int* idx) const
return (const _Tp*)p;
}
+template<int n> inline
+uchar* Mat::ptr(const Vec<int, n>& idx)
+{
+ return Mat::ptr(idx.val);
+}
+
+template<int n> inline
+const uchar* Mat::ptr(const Vec<int, n>& idx) const
+{
+ return Mat::ptr(idx.val);
+}
+
+template<typename _Tp, int n> inline
+_Tp* Mat::ptr(const Vec<int, n>& idx)
+{
+ CV_DbgAssert( elemSize() == sizeof(_Tp) );
+ return Mat::ptr<_Tp>(idx.val);
+}
+
+template<typename _Tp, int n> inline
+const _Tp* Mat::ptr(const Vec<int, n>& idx) const
+{
+ CV_DbgAssert( elemSize() == sizeof(_Tp) );
+ return Mat::ptr<_Tp>(idx.val);
+}
+
+
template<typename _Tp> inline
_Tp& Mat::at(int i0, int i1)
{
@@ -988,6 +1015,17 @@ MatConstIterator_<_Tp> Mat::begin() const
return MatConstIterator_<_Tp>((const Mat_<_Tp>*)this);
}
+template<typename _Tp> inline
+std::reverse_iterator<MatConstIterator_<_Tp>> Mat::rbegin() const
+{
+ if (empty())
+ return std::reverse_iterator<MatConstIterator_<_Tp>>();
+ CV_DbgAssert( elemSize() == sizeof(_Tp) );
+ MatConstIterator_<_Tp> it((const Mat_<_Tp>*)this);
+ it += total();
+ return std::reverse_iterator<MatConstIterator_<_Tp>> (it);
+}
+
template<typename _Tp> inline
MatConstIterator_<_Tp> Mat::end() const
{
@@ -999,6 +1037,15 @@ MatConstIterator_<_Tp> Mat::end() const
return it;
}
+template<typename _Tp> inline
+std::reverse_iterator<MatConstIterator_<_Tp>> Mat::rend() const
+{
+ if (empty())
+ return std::reverse_iterator<MatConstIterator_<_Tp>>();
+ CV_DbgAssert( elemSize() == sizeof(_Tp) );
+ return std::reverse_iterator<MatConstIterator_<_Tp>>((const Mat_<_Tp>*)this);
+}
+
template<typename _Tp> inline
MatIterator_<_Tp> Mat::begin()
{
@@ -1008,6 +1055,17 @@ MatIterator_<_Tp> Mat::begin()
return MatIterator_<_Tp>((Mat_<_Tp>*)this);
}
+template<typename _Tp> inline
+std::reverse_iterator<MatIterator_<_Tp>> Mat::rbegin()
+{
+ if (empty())
+ return std::reverse_iterator<MatIterator_<_Tp>>();
+ CV_DbgAssert( elemSize() == sizeof(_Tp) );
+ MatIterator_<_Tp> it((Mat_<_Tp>*)this);
+ it += total();
+ return std::reverse_iterator<MatIterator_<_Tp>>(it);
+}
+
template<typename _Tp> inline
MatIterator_<_Tp> Mat::end()
{
@@ -1019,6 +1077,15 @@ MatIterator_<_Tp> Mat::end()
return it;
}
+template<typename _Tp> inline
+std::reverse_iterator<MatIterator_<_Tp>> Mat::rend()
+{
+ if (empty())
+ return std::reverse_iterator<MatIterator_<_Tp>>();
+ CV_DbgAssert( elemSize() == sizeof(_Tp) );
+ return std::reverse_iterator<MatIterator_<_Tp>>(MatIterator_<_Tp>((Mat_<_Tp>*)this));
+}
+
template<typename _Tp, typename Functor> inline
void Mat::forEach(const Functor& operation) {
this->forEach_impl<_Tp>(operation);
@@ -1686,24 +1753,48 @@ MatConstIterator_<_Tp> Mat_<_Tp>::begin() const
return Mat::begin<_Tp>();
}
+template<typename _Tp> inline
+std::reverse_iterator<MatConstIterator_<_Tp>> Mat_<_Tp>::rbegin() const
+{
+ return Mat::rbegin<_Tp>();
+}
+
template<typename _Tp> inline
MatConstIterator_<_Tp> Mat_<_Tp>::end() const
{
return Mat::end<_Tp>();
}
+template<typename _Tp> inline
+std::reverse_iterator<MatConstIterator_<_Tp>> Mat_<_Tp>::rend() const
+{
+ return Mat::rend<_Tp>();
+}
+
template<typename _Tp> inline
MatIterator_<_Tp> Mat_<_Tp>::begin()
{
return Mat::begin<_Tp>();
}
+template<typename _Tp> inline
+std::reverse_iterator<MatIterator_<_Tp>> Mat_<_Tp>::rbegin()
+{
+ return Mat::rbegin<_Tp>();
+}
+
template<typename _Tp> inline
MatIterator_<_Tp> Mat_<_Tp>::end()
{
return Mat::end<_Tp>();
}
+template<typename _Tp> inline
+std::reverse_iterator<MatIterator_<_Tp>> Mat_<_Tp>::rend()
+{
+ return Mat::rend<_Tp>();
+}
+
template<typename _Tp> template<typename Functor> inline
void Mat_<_Tp>::forEach(const Functor& operation) {
Mat::forEach<_Tp, Functor>(operation);
diff --git a/modules/core/include/opencv2/core/ocl.hpp b/modules/core/include/opencv2/core/ocl.hpp
index 3a76be2353..f9cc9e019a 100644
--- a/modules/core/include/opencv2/core/ocl.hpp
+++ b/modules/core/include/opencv2/core/ocl.hpp
@@ -43,6 +43,8 @@
#define OPENCV_OPENCL_HPP
#include "opencv2/core.hpp"
+#include <typeinfo>
+#include <typeindex>
namespace cv { namespace ocl {
@@ -277,6 +279,12 @@ public:
/** @returns cl_context value */
void* ptr() const;
+ /**
+ * @brief Get OpenCL context property specified on context creation
+ * @param propertyId Property id (CL_CONTEXT_* as defined in cl_context_properties type)
+ * @returns Property value if property was specified on clCreateContext, or NULL if context created without the property
+ */
+ void* getOpenCLContextProperty(int propertyId) const;
bool useSVM() const;
void setUseSVM(bool enabled);
@@ -290,6 +298,21 @@ public:
void release();
+ class CV_EXPORTS UserContext {
+ public:
+ virtual ~UserContext();
+ };
+ template <typename T>
+ inline void setUserContext(const std::shared_ptr<T>& userContext) {
+ setUserContext(typeid(T), userContext);
+ }
+ template <typename T>
+ inline std::shared_ptr<T> getUserContext() {
+ return std::dynamic_pointer_cast<T>(getUserContext(typeid(T)));
+ }
+ void setUserContext(std::type_index typeId, const std::shared_ptr<UserContext>& userContext);
+ std::shared_ptr<UserContext> getUserContext(std::type_index typeId);
+
struct Impl;
inline Impl* getImpl() const { return (Impl*)p; }
inline bool empty() const { return !p; }
diff --git a/modules/core/include/opencv2/core/types.hpp b/modules/core/include/opencv2/core/types.hpp
index 819fd52817..3f0131da8c 100644
--- a/modules/core/include/opencv2/core/types.hpp
+++ b/modules/core/include/opencv2/core/types.hpp
@@ -714,24 +714,24 @@ public:
//! the default constructor
CV_WRAP KeyPoint();
/**
- @param _pt x & y coordinates of the keypoint
- @param _size keypoint diameter
- @param _angle keypoint orientation
- @param _response keypoint detector response on the keypoint (that is, strength of the keypoint)
- @param _octave pyramid octave in which the keypoint has been detected
- @param _class_id object id
+ @param pt x & y coordinates of the keypoint
+ @param size keypoint diameter
+ @param angle keypoint orientation
+ @param response keypoint detector response on the keypoint (that is, strength of the keypoint)
+ @param octave pyramid octave in which the keypoint has been detected
+ @param class_id object id
*/
- KeyPoint(Point2f _pt, float _size, float _angle=-1, float _response=0, int _octave=0, int _class_id=-1);
+ KeyPoint(Point2f pt, float size, float angle=-1, float response=0, int octave=0, int class_id=-1);
/**
@param x x-coordinate of the keypoint
@param y y-coordinate of the keypoint
- @param _size keypoint diameter
- @param _angle keypoint orientation
- @param _response keypoint detector response on the keypoint (that is, strength of the keypoint)
- @param _octave pyramid octave in which the keypoint has been detected
- @param _class_id object id
+ @param size keypoint diameter
+ @param angle keypoint orientation
+ @param response keypoint detector response on the keypoint (that is, strength of the keypoint)
+ @param octave pyramid octave in which the keypoint has been detected
+ @param class_id object id
*/
- CV_WRAP KeyPoint(float x, float y, float _size, float _angle=-1, float _response=0, int _octave=0, int _class_id=-1);
+ CV_WRAP KeyPoint(float x, float y, float size, float angle=-1, float response=0, int octave=0, int class_id=-1);
size_t hash() const;
diff --git a/modules/core/include/opencv2/core/utils/plugin_loader.private.hpp b/modules/core/include/opencv2/core/utils/plugin_loader.private.hpp
index bc3ae4d08a..d6390fc74a 100644
--- a/modules/core/include/opencv2/core/utils/plugin_loader.private.hpp
+++ b/modules/core/include/opencv2/core/utils/plugin_loader.private.hpp
@@ -80,7 +80,9 @@ LibHandle_t libraryLoad_(const FileSystemPath_t& filename)
return LoadLibraryW(filename.c_str());
#endif
#elif defined(__linux__) || defined(__APPLE__) || defined(__OpenBSD__) || defined(__FreeBSD__) || defined(__HAIKU__) || defined(__GLIBC__)
- return dlopen(filename.c_str(), RTLD_NOW);
+ void* handle = dlopen(filename.c_str(), RTLD_NOW);
+ CV_LOG_IF_DEBUG(NULL, !handle, "dlopen() error: " << dlerror());
+ return handle;
#endif
}
diff --git a/modules/core/misc/objc/common/Point2i.h b/modules/core/misc/objc/common/Point2i.h
index e43ee3a8ec..802c99d613 100644
--- a/modules/core/misc/objc/common/Point2i.h
+++ b/modules/core/misc/objc/common/Point2i.h
@@ -21,7 +21,6 @@ NS_ASSUME_NONNULL_BEGIN
/**
* Represents a two dimensional point the coordinate values of which are of type `int`
*/
-NS_SWIFT_NAME(Point)
CV_EXPORTS @interface Point2i : NSObject
# pragma mark - Properties
diff --git a/modules/core/misc/objc/common/Rect2i.h b/modules/core/misc/objc/common/Rect2i.h
index 6ed86d50bd..a9c1c6e04a 100644
--- a/modules/core/misc/objc/common/Rect2i.h
+++ b/modules/core/misc/objc/common/Rect2i.h
@@ -22,7 +22,6 @@ NS_ASSUME_NONNULL_BEGIN
/**
* Represents a rectange the coordinate and dimension values of which are of type `int`
*/
-NS_SWIFT_NAME(Rect)
CV_EXPORTS @interface Rect2i : NSObject
#pragma mark - Properties
diff --git a/modules/core/misc/objc/common/Size2i.h b/modules/core/misc/objc/common/Size2i.h
index cd74e2c84a..473efa3b57 100644
--- a/modules/core/misc/objc/common/Size2i.h
+++ b/modules/core/misc/objc/common/Size2i.h
@@ -21,7 +21,6 @@ NS_ASSUME_NONNULL_BEGIN
/**
* Represents the dimensions of a rectangle the values of which are of type `int`
*/
-NS_SWIFT_NAME(Size)
CV_EXPORTS @interface Size2i : NSObject
#pragma mark - Properties
diff --git a/modules/core/misc/objc/common/Typealiases.swift b/modules/core/misc/objc/common/Typealiases.swift
new file mode 100644
index 0000000000..534dc492fb
--- /dev/null
+++ b/modules/core/misc/objc/common/Typealiases.swift
@@ -0,0 +1,11 @@
+//
+// Typealiases.swift
+//
+// Created by Chris Ballinger on 2020/11/18.
+//
+
+import Foundation
+
+public typealias Rect = Rect2i
+public typealias Point = Point2i
+public typealias Size = Size2i
diff --git a/modules/core/misc/objc/gen_dict.json b/modules/core/misc/objc/gen_dict.json
index c2ee554eba..a645df19f5 100644
--- a/modules/core/misc/objc/gen_dict.json
+++ b/modules/core/misc/objc/gen_dict.json
@@ -113,13 +113,13 @@
"objc_type": "Point2i*",
"to_cpp": "%(n)s.nativeRef",
"from_cpp": "[Point2i fromNative:%(n)s]",
- "swift_type": "Point"
+ "swift_type": "Point2i"
},
"Point2i": {
"objc_type": "Point2i*",
"to_cpp": "%(n)s.nativeRef",
"from_cpp": "[Point2i fromNative:%(n)s]",
- "swift_type": "Point"
+ "swift_type": "Point2i"
},
"Point2f": {
"objc_type": "Point2f*",
@@ -155,13 +155,13 @@
"objc_type": "Rect2i*",
"to_cpp": "%(n)s.nativeRef",
"from_cpp": "[Rect2i fromNative:%(n)s]",
- "swift_type": "Rect"
+ "swift_type": "Rect2i"
},
"Rect2i": {
"objc_type": "Rect2i*",
"to_cpp": "%(n)s.nativeRef",
"from_cpp": "[Rect2i fromNative:%(n)s]",
- "swift_type": "Rect"
+ "swift_type": "Rect2i"
},
"Rect2f": {
"objc_type": "Rect2f*",
@@ -187,13 +187,13 @@
"objc_type": "Size2i*",
"to_cpp": "%(n)s.nativeRef",
"from_cpp": "[Size2i fromNative:%(n)s]",
- "swift_type": "Size"
+ "swift_type": "Size2i"
},
"Size2i": {
"objc_type": "Size2i*",
"to_cpp": "%(n)s.nativeRef",
"from_cpp": "[Size2i fromNative:%(n)s]",
- "swift_type": "Size"
+ "swift_type": "Size2i"
},
"Size2f": {
"objc_type": "Size2f*",
@@ -275,7 +275,7 @@
"vector_Point": {
"objc_type": "Point2i*",
"v_type": "Point2i",
- "swift_type": "[Point]"
+ "swift_type": "[Point2i]"
},
"vector_Point2f": {
"objc_type": "Point2f*",
@@ -300,7 +300,7 @@
"vector_Rect": {
"objc_type": "Rect2i*",
"v_type": "Rect2i",
- "swift_type": "[Rect]"
+ "swift_type": "[Rect2i]"
},
"vector_Rect2d": {
"objc_type": "Rect2d*",
@@ -388,7 +388,7 @@
"vector_vector_Point": {
"objc_type": "Point2i*",
"v_v_type": "Point2i",
- "swift_type": "[[Point]]"
+ "swift_type": "[[Point2i]]"
},
"vector_vector_Point2f": {
"objc_type": "Point2f*",
diff --git a/modules/core/src/arithm.simd.hpp b/modules/core/src/arithm.simd.hpp
index 0cddc90998..f88597aacc 100644
--- a/modules/core/src/arithm.simd.hpp
+++ b/modules/core/src/arithm.simd.hpp
@@ -1910,4 +1910,4 @@ DEFINE_SIMD_ALL(recip, recip_loop)
#define SIMD_GUARD
#endif
-}} // cv::hal::
\ No newline at end of file
+}} // cv::hal::
diff --git a/modules/core/src/directx.cpp b/modules/core/src/directx.cpp
index 0173f02916..d17adc6b48 100644
--- a/modules/core/src/directx.cpp
+++ b/modules/core/src/directx.cpp
@@ -49,7 +49,6 @@
#ifdef HAVE_DIRECTX
#include
#include "directx.inc.hpp"
-#include "directx.hpp"
#else // HAVE_DIRECTX
#define NO_DIRECTX_SUPPORT_ERROR CV_Error(cv::Error::StsBadFunc, "OpenCV was build without DirectX support")
#endif
@@ -58,6 +57,8 @@
#define NO_OPENCL_SUPPORT_ERROR CV_Error(cv::Error::StsBadFunc, "OpenCV was build without OpenCL support")
#endif // HAVE_OPENCL
+using namespace cv::ocl;
+
namespace cv { namespace directx {
int getTypeFromDXGI_FORMAT(const int iDXGI_FORMAT)
@@ -236,187 +237,121 @@ int getTypeFromD3DFORMAT(const int iD3DFORMAT)
}
#if defined(HAVE_DIRECTX) && defined(HAVE_OPENCL)
-namespace internal {
-struct OpenCLDirectXImpl
+
+#ifdef HAVE_OPENCL_D3D11_NV
+class OpenCL_D3D11_NV : public ocl::Context::UserContext
{
- cl_platform_id platform_;
-
- cl_platform_id initializedPlatform9 = NULL;
- cl_platform_id initializedPlatform10 = NULL;
- cl_platform_id initializedPlatform11 = NULL;
public:
- OpenCLDirectXImpl()
- : platform_(0)
+ OpenCL_D3D11_NV(cl_platform_id platform, ID3D11Device*_device) : device(_device)
{
- }
-
- bool isDirect3DDevice9Ex = false; // Direct3DDevice9Ex or Direct3DDevice9 was used
-
-#ifdef HAVE_OPENCL_D3D11_NV
- clCreateFromD3D11Texture2DNV_fn clCreateFromD3D11Texture2DNV = NULL;
- clEnqueueAcquireD3D11ObjectsNV_fn clEnqueueAcquireD3D11ObjectsNV = NULL;
- clEnqueueReleaseD3D11ObjectsNV_fn clEnqueueReleaseD3D11ObjectsNV = NULL;
-#endif
- clCreateFromD3D11Texture2DKHR_fn clCreateFromD3D11Texture2DKHR = NULL;
- clEnqueueAcquireD3D11ObjectsKHR_fn clEnqueueAcquireD3D11ObjectsKHR = NULL;
- clEnqueueReleaseD3D11ObjectsKHR_fn clEnqueueReleaseD3D11ObjectsKHR = NULL;
-
- clCreateFromD3D10Texture2DKHR_fn clCreateFromD3D10Texture2DKHR = NULL;
- clEnqueueAcquireD3D10ObjectsKHR_fn clEnqueueAcquireD3D10ObjectsKHR = NULL;
- clEnqueueReleaseD3D10ObjectsKHR_fn clEnqueueReleaseD3D10ObjectsKHR = NULL;
-
- clCreateFromDX9MediaSurfaceKHR_fn clCreateFromDX9MediaSurfaceKHR = NULL;
- clEnqueueAcquireDX9MediaSurfacesKHR_fn clEnqueueAcquireDX9MediaSurfacesKHR = NULL;
- clEnqueueReleaseDX9MediaSurfacesKHR_fn clEnqueueReleaseDX9MediaSurfacesKHR = NULL;
-
- cl_platform_id getPlatform()
- {
- if (!platform_)
+ clCreateFromD3D11Texture2DNV = (clCreateFromD3D11Texture2DNV_fn)
+ clGetExtensionFunctionAddressForPlatform(platform, "clCreateFromD3D11Texture2DNV");
+ clEnqueueAcquireD3D11ObjectsNV = (clEnqueueAcquireD3D11ObjectsNV_fn)
+ clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueAcquireD3D11ObjectsNV");
+ clEnqueueReleaseD3D11ObjectsNV = (clEnqueueReleaseD3D11ObjectsNV_fn)
+ clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueReleaseD3D11ObjectsNV");
+ if (!clCreateFromD3D11Texture2DNV || !clEnqueueAcquireD3D11ObjectsNV || !clEnqueueReleaseD3D11ObjectsNV)
{
- CV_Assert(cv::ocl::haveOpenCL());
-
- cl_device_id device = (cl_device_id)ocl::Device::getDefault().ptr();
- CV_Assert(device);
- cl_int status = clGetDeviceInfo(device, CL_DEVICE_PLATFORM, sizeof(platform_), &platform_, NULL);
- if (status != CL_SUCCESS)
- CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't get platform corresponding to device");
+ CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't find functions for D3D11_NV");
}
-
- return platform_;
+ device->AddRef();
}
-
-
- bool initializeD3D11()
- {
- using namespace cv::ocl;
- cl_platform_id platform = getPlatform();
-
- bool useCLNVEXT = false;
- size_t exts_len;
- cl_int status = clGetPlatformInfo(platform, CL_PLATFORM_EXTENSIONS, 0, NULL, &exts_len);
- if (status != CL_SUCCESS)
- CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't get length of CL_PLATFORM_EXTENSIONS");
- cv::AutoBuffer<char> extensions(exts_len);
- status = clGetPlatformInfo(platform, CL_PLATFORM_EXTENSIONS, exts_len, static_cast<void*>(extensions.data()), NULL);
- if (status != CL_SUCCESS)
- CV_Error(cv::Error::OpenCLInitError, "OpenCL: No available CL_PLATFORM_EXTENSIONS");
- bool is_support_cl_khr_d3d11_sharing = false;
- if (strstr(extensions.data(), "cl_khr_d3d11_sharing"))
- is_support_cl_khr_d3d11_sharing = true;
-#ifdef HAVE_OPENCL_D3D11_NV
- bool is_support_cl_nv_d3d11_sharing = false;
- if (strstr(extensions.data(), "cl_nv_d3d11_sharing"))
- is_support_cl_nv_d3d11_sharing = true;
- if (!is_support_cl_nv_d3d11_sharing && !is_support_cl_khr_d3d11_sharing)
- CV_Error(cv::Error::OpenCLInitError, "OpenCL: No supported extensions");
-#else
- if (!is_support_cl_khr_d3d11_sharing)
- CV_Error(cv::Error::OpenCLInitError, "OpenCL: No supported extensions");
+ ~OpenCL_D3D11_NV() {
+ device->Release();
+ }
+ ID3D11Device* device;
+ clCreateFromD3D11Texture2DNV_fn clCreateFromD3D11Texture2DNV;
+ clEnqueueAcquireD3D11ObjectsNV_fn clEnqueueAcquireD3D11ObjectsNV;
+ clEnqueueReleaseD3D11ObjectsNV_fn clEnqueueReleaseD3D11ObjectsNV;
+};
#endif
-#ifdef HAVE_OPENCL_D3D11_NV
- if (is_support_cl_nv_d3d11_sharing)
- {
- if (initializedPlatform11 != platform)
- {
- clCreateFromD3D11Texture2DNV = (clCreateFromD3D11Texture2DNV_fn)
- clGetExtensionFunctionAddressForPlatform(platform, "clCreateFromD3D11Texture2DNV");
- clEnqueueAcquireD3D11ObjectsNV = (clEnqueueAcquireD3D11ObjectsNV_fn)
- clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueAcquireD3D11ObjectsNV");
- clEnqueueReleaseD3D11ObjectsNV = (clEnqueueReleaseD3D11ObjectsNV_fn)
- clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueReleaseD3D11ObjectsNV");
- initializedPlatform11 = platform;
- }
- if (clCreateFromD3D11Texture2DNV && clEnqueueAcquireD3D11ObjectsNV && clEnqueueReleaseD3D11ObjectsNV)
- {
- useCLNVEXT = true;
- }
- }
- else
-#endif
- {
- if (is_support_cl_khr_d3d11_sharing)
- {
- if (initializedPlatform11 != platform)
- {
- clCreateFromD3D11Texture2DKHR = (clCreateFromD3D11Texture2DKHR_fn)
- clGetExtensionFunctionAddressForPlatform(platform, "clCreateFromD3D11Texture2DKHR");
- clEnqueueAcquireD3D11ObjectsKHR = (clEnqueueAcquireD3D11ObjectsKHR_fn)
- clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueAcquireD3D11ObjectsKHR");
- clEnqueueReleaseD3D11ObjectsKHR = (clEnqueueReleaseD3D11ObjectsKHR_fn)
- clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueReleaseD3D11ObjectsKHR");
- initializedPlatform11 = platform;
- }
- if (!clCreateFromD3D11Texture2DKHR || !clEnqueueAcquireD3D11ObjectsKHR || !clEnqueueReleaseD3D11ObjectsKHR)
- {
- CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't find functions for D3D11");
- }
- }
- }
- return useCLNVEXT;
- }
-
- void initializeD3D9()
+class OpenCL_D3D11 : public ocl::Context::UserContext
+{
+public:
+ OpenCL_D3D11(cl_platform_id platform, ID3D11Device* _device) : device(_device)
{
- using namespace cv::ocl;
- cl_platform_id platform = getPlatform();
- if (initializedPlatform9 != platform)
+ clCreateFromD3D11Texture2DKHR = (clCreateFromD3D11Texture2DKHR_fn)
+ clGetExtensionFunctionAddressForPlatform(platform, "clCreateFromD3D11Texture2DKHR");
+ clEnqueueAcquireD3D11ObjectsKHR = (clEnqueueAcquireD3D11ObjectsKHR_fn)
+ clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueAcquireD3D11ObjectsKHR");
+ clEnqueueReleaseD3D11ObjectsKHR = (clEnqueueReleaseD3D11ObjectsKHR_fn)
+ clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueReleaseD3D11ObjectsKHR");
+ if (!clCreateFromD3D11Texture2DKHR || !clEnqueueAcquireD3D11ObjectsKHR || !clEnqueueReleaseD3D11ObjectsKHR)
{
- clCreateFromDX9MediaSurfaceKHR = (clCreateFromDX9MediaSurfaceKHR_fn)
- clGetExtensionFunctionAddressForPlatform(platform, "clCreateFromDX9MediaSurfaceKHR");
- clEnqueueAcquireDX9MediaSurfacesKHR = (clEnqueueAcquireDX9MediaSurfacesKHR_fn)
- clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueAcquireDX9MediaSurfacesKHR");
- clEnqueueReleaseDX9MediaSurfacesKHR = (clEnqueueReleaseDX9MediaSurfacesKHR_fn)
- clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueReleaseDX9MediaSurfacesKHR");
- initializedPlatform9 = platform;
+ CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't find functions for D3D11");
}
+ device->AddRef();
+ }
+ ~OpenCL_D3D11() {
+ device->Release();
+ }
+ ID3D11Device* device;
+ clCreateFromD3D11Texture2DKHR_fn clCreateFromD3D11Texture2DKHR;
+ clEnqueueAcquireD3D11ObjectsKHR_fn clEnqueueAcquireD3D11ObjectsKHR;
+ clEnqueueReleaseD3D11ObjectsKHR_fn clEnqueueReleaseD3D11ObjectsKHR;
+};
+
+class OpenCL_D3D9 : public ocl::Context::UserContext
+{
+public:
+ OpenCL_D3D9(cl_platform_id platform, IDirect3DDevice9* _device, IDirect3DDevice9Ex* _deviceEx)
+ : device(_device)
+ , deviceEx(_deviceEx)
+ {
+ clCreateFromDX9MediaSurfaceKHR = (clCreateFromDX9MediaSurfaceKHR_fn)
+ clGetExtensionFunctionAddressForPlatform(platform, "clCreateFromDX9MediaSurfaceKHR");
+ clEnqueueAcquireDX9MediaSurfacesKHR = (clEnqueueAcquireDX9MediaSurfacesKHR_fn)
+ clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueAcquireDX9MediaSurfacesKHR");
+ clEnqueueReleaseDX9MediaSurfacesKHR = (clEnqueueReleaseDX9MediaSurfacesKHR_fn)
+ clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueReleaseDX9MediaSurfacesKHR");
if (!clCreateFromDX9MediaSurfaceKHR || !clEnqueueAcquireDX9MediaSurfacesKHR || !clEnqueueReleaseDX9MediaSurfacesKHR)
{
CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't find functions for D3D9");
}
+ if (device)
+ device->AddRef();
+ if (deviceEx)
+ deviceEx->AddRef();
}
+ ~OpenCL_D3D9() {
+ if (device)
+ device->Release();
+ if (deviceEx)
+ deviceEx->Release();
+ }
+ IDirect3DDevice9* device;
+ IDirect3DDevice9Ex* deviceEx;
+ clCreateFromDX9MediaSurfaceKHR_fn clCreateFromDX9MediaSurfaceKHR;
+ clEnqueueAcquireDX9MediaSurfacesKHR_fn clEnqueueAcquireDX9MediaSurfacesKHR;
+ clEnqueueReleaseDX9MediaSurfacesKHR_fn clEnqueueReleaseDX9MediaSurfacesKHR;
+};
- void initializeD3D10()
+class OpenCL_D3D10 : public ocl::Context::UserContext
+{
+public:
+ OpenCL_D3D10(cl_platform_id platform, ID3D10Device* _device) : device(_device)
{
- using namespace cv::ocl;
- cl_platform_id platform = getPlatform();
- if (initializedPlatform10 != platform)
- {
- clCreateFromD3D10Texture2DKHR = (clCreateFromD3D10Texture2DKHR_fn)
- clGetExtensionFunctionAddressForPlatform(platform, "clCreateFromD3D10Texture2DKHR");
- clEnqueueAcquireD3D10ObjectsKHR = (clEnqueueAcquireD3D10ObjectsKHR_fn)
- clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueAcquireD3D10ObjectsKHR");
- clEnqueueReleaseD3D10ObjectsKHR = (clEnqueueReleaseD3D10ObjectsKHR_fn)
- clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueReleaseD3D10ObjectsKHR");
- initializedPlatform10 = platform;
- }
+ clCreateFromD3D10Texture2DKHR = (clCreateFromD3D10Texture2DKHR_fn)
+ clGetExtensionFunctionAddressForPlatform(platform, "clCreateFromD3D10Texture2DKHR");
+ clEnqueueAcquireD3D10ObjectsKHR = (clEnqueueAcquireD3D10ObjectsKHR_fn)
+ clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueAcquireD3D10ObjectsKHR");
+ clEnqueueReleaseD3D10ObjectsKHR = (clEnqueueReleaseD3D10ObjectsKHR_fn)
+ clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueReleaseD3D10ObjectsKHR");
if (!clCreateFromD3D10Texture2DKHR || !clEnqueueAcquireD3D10ObjectsKHR || !clEnqueueReleaseD3D10ObjectsKHR)
{
CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't find functions for D3D10");
}
+ device->AddRef();
}
+ ~OpenCL_D3D10() {
+ device->Release();
+ }
+ ID3D10Device* device;
+ clCreateFromD3D10Texture2DKHR_fn clCreateFromD3D10Texture2DKHR;
+ clEnqueueAcquireD3D10ObjectsKHR_fn clEnqueueAcquireD3D10ObjectsKHR;
+ clEnqueueReleaseD3D10ObjectsKHR_fn clEnqueueReleaseD3D10ObjectsKHR;
};
-
-OpenCLDirectXImpl* createDirectXImpl()
-{
- return new OpenCLDirectXImpl();
-}
-void deleteDirectXImpl(OpenCLDirectXImpl** p)
-{
- if (*p)
- {
- delete (*p);
- *p = NULL;
- }
-}
-OpenCLDirectXImpl& getImpl()
-{
- OpenCLDirectXImpl* i = getDirectXImpl(ocl::Context::getDefault());
- CV_Assert(i);
- return *i;
-}
-}
-using namespace internal;
#endif
namespace ocl {
@@ -443,95 +378,57 @@ Context& initializeContextFromD3D11Device(ID3D11Device* pD3D11Device)
// TODO Filter platforms by name from OPENCV_OPENCL_DEVICE
- size_t exts_len;
- cv::AutoBuffer<char> extensions;
- bool is_support_cl_khr_d3d11_sharing = false;
-#ifdef HAVE_OPENCL_D3D11_NV
- bool is_support_cl_nv_d3d11_sharing = false;
-#endif
for (int i = 0; i < (int)numPlatforms; i++)
{
- status = clGetPlatformInfo(platforms[i], CL_PLATFORM_EXTENSIONS, 0, NULL, &exts_len);
- if (status != CL_SUCCESS)
- CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't get length of CL_PLATFORM_EXTENSIONS");
- extensions.resize(exts_len);
- status = clGetPlatformInfo(platforms[i], CL_PLATFORM_EXTENSIONS, exts_len, static_cast<void*>(extensions.data()), NULL);
- if (status != CL_SUCCESS)
- CV_Error(cv::Error::OpenCLInitError, "OpenCL: No available CL_PLATFORM_EXTENSIONS");
- if (strstr(extensions.data(), "cl_khr_d3d11_sharing"))
- is_support_cl_khr_d3d11_sharing = true;
-#ifdef HAVE_OPENCL_D3D11_NV
- if (strstr(extensions.data(), "cl_nv_d3d11_sharing"))
- is_support_cl_nv_d3d11_sharing = true;
-#endif
- }
-#ifdef HAVE_OPENCL_D3D11_NV
- if (!is_support_cl_nv_d3d11_sharing && !is_support_cl_khr_d3d11_sharing)
- CV_Error(cv::Error::OpenCLInitError, "OpenCL: No supported extensions");
-#else
- if (!is_support_cl_khr_d3d11_sharing)
- CV_Error(cv::Error::OpenCLInitError, "OpenCL: No supported extensions");
-#endif
+ cl_platform_id platform = platforms[i];
+ std::string platformName = PlatformInfo(&platform).name();
- int found = -1;
- cl_device_id device = NULL;
- cl_uint numDevices = 0;
- cl_context context = NULL;
+ int found = -1;
+ cl_device_id device = NULL;
+ cl_uint numDevices = 0;
+ cl_context context = NULL;
#ifdef HAVE_OPENCL_D3D11_NV
- if (is_support_cl_nv_d3d11_sharing)
- {
- // try with CL_PREFERRED_DEVICES_FOR_D3D11_NV
- for (int i = 0; i < (int)numPlatforms; i++)
- {
- clGetDeviceIDsFromD3D11NV_fn clGetDeviceIDsFromD3D11NV = (clGetDeviceIDsFromD3D11NV_fn)
- clGetExtensionFunctionAddressForPlatform(platforms[i], "clGetDeviceIDsFromD3D11NV");
- if (!clGetDeviceIDsFromD3D11NV)
- continue;
-
- device = NULL;
- numDevices = 0;
- status = clGetDeviceIDsFromD3D11NV(platforms[i], CL_D3D11_DEVICE_NV, pD3D11Device,
- CL_PREFERRED_DEVICES_FOR_D3D11_NV, 1, &device, &numDevices);
- if (status != CL_SUCCESS)
- continue;
- if (numDevices > 0)
- {
- cl_context_properties properties[] = {
- CL_CONTEXT_PLATFORM, (cl_context_properties)platforms[i],
- CL_CONTEXT_D3D11_DEVICE_NV, (cl_context_properties)(pD3D11Device),
- //CL_CONTEXT_INTEROP_USER_SYNC, CL_FALSE,
- 0
- };
-
- context = clCreateContext(properties, 1, &device, NULL, NULL, &status);
- if (status != CL_SUCCESS)
- {
- clReleaseDevice(device);
- }
- else
- {
- found = i;
- break;
- }
- }
- }
- if (found < 0)
- {
- // try with CL_ALL_DEVICES_FOR_D3D11_NV
- for (int i = 0; i < (int)numPlatforms; i++)
- {
- clGetDeviceIDsFromD3D11NV_fn clGetDeviceIDsFromD3D11NV = (clGetDeviceIDsFromD3D11NV_fn)
- clGetExtensionFunctionAddressForPlatform(platforms[i], "clGetDeviceIDsFromD3D11NV");
- if (!clGetDeviceIDsFromD3D11NV)
- continue;
-
+ // Get extension function "clGetDeviceIDsFromD3D11NV" (part of OpenCL extension "cl_nv_d3d11_sharing")
+ clGetDeviceIDsFromD3D11NV_fn clGetDeviceIDsFromD3D11NV = (clGetDeviceIDsFromD3D11NV_fn)
+ clGetExtensionFunctionAddressForPlatform(platforms[i], "clGetDeviceIDsFromD3D11NV");
+ if (clGetDeviceIDsFromD3D11NV) {
+ // try with CL_PREFERRED_DEVICES_FOR_D3D11_NV
+ do {
device = NULL;
numDevices = 0;
status = clGetDeviceIDsFromD3D11NV(platforms[i], CL_D3D11_DEVICE_NV, pD3D11Device,
- CL_ALL_DEVICES_FOR_D3D11_NV, 1, &device, &numDevices);
+ CL_PREFERRED_DEVICES_FOR_D3D11_NV, 1, &device, &numDevices);
if (status != CL_SUCCESS)
- continue;
+ break;
+ if (numDevices > 0)
+ {
+ cl_context_properties properties[] = {
+ CL_CONTEXT_PLATFORM, (cl_context_properties)platforms[i],
+ CL_CONTEXT_D3D11_DEVICE_NV, (cl_context_properties)(pD3D11Device),
+ //CL_CONTEXT_INTEROP_USER_SYNC, CL_FALSE,
+ 0
+ };
+
+ context = clCreateContext(properties, 1, &device, NULL, NULL, &status);
+ if (status != CL_SUCCESS)
+ {
+ clReleaseDevice(device);
+ }
+ else
+ {
+ found = i;
+ }
+ }
+ } while (0);
+ // try with CL_ALL_DEVICES_FOR_D3D11_NV
+ if (found < 0) do {
+ device = NULL;
+ numDevices = 0;
+ status = clGetDeviceIDsFromD3D11NV(platforms[i], CL_D3D11_DEVICE_NV, pD3D11Device,
+ CL_ALL_DEVICES_FOR_D3D11_NV, 1, &device, &numDevices);
+ if (status != CL_SUCCESS)
+ break;
if (numDevices > 0)
{
cl_context_properties properties[] = {
@@ -548,33 +445,43 @@ Context& initializeContextFromD3D11Device(ID3D11Device* pD3D11Device)
else
{
found = i;
- break;
}
}
+ } while (0);
+ if (found >= 0) {
+ OpenCLExecutionContext clExecCtx;
+ try
+ {
+ clExecCtx = OpenCLExecutionContext::create(platformName, platform, context, device);
+ clExecCtx.getContext().setUserContext(std::make_shared<OpenCL_D3D11_NV>(platform, pD3D11Device));
+ }
+ catch (...)
+ {
+ clReleaseDevice(device);
+ clReleaseContext(context);
+ throw;
+ }
+ clExecCtx.bind();
+ return const_cast<Context&>(clExecCtx.getContext());
}
}
- }
#endif
- if (is_support_cl_khr_d3d11_sharing)
- {
- if (found < 0)
+ // Get extension function "clGetDeviceIDsFromD3D11KHR" (part of OpenCL extension "cl_khr_d3d11_sharing")
+ clGetDeviceIDsFromD3D11KHR_fn clGetDeviceIDsFromD3D11KHR = (clGetDeviceIDsFromD3D11KHR_fn)
+ clGetExtensionFunctionAddressForPlatform(platforms[i], "clGetDeviceIDsFromD3D11KHR");
+ if (clGetDeviceIDsFromD3D11KHR)
{
// try with CL_PREFERRED_DEVICES_FOR_D3D11_KHR
- for (int i = 0; i < (int)numPlatforms; i++)
- {
- clGetDeviceIDsFromD3D11KHR_fn clGetDeviceIDsFromD3D11KHR = (clGetDeviceIDsFromD3D11KHR_fn)
- clGetExtensionFunctionAddressForPlatform(platforms[i], "clGetDeviceIDsFromD3D11KHR");
- if (!clGetDeviceIDsFromD3D11KHR)
- continue;
+ do {
device = NULL;
numDevices = 0;
status = clGetDeviceIDsFromD3D11KHR(platforms[i], CL_D3D11_DEVICE_KHR, pD3D11Device,
- CL_PREFERRED_DEVICES_FOR_D3D11_KHR, 1, &device, &numDevices);
+ CL_PREFERRED_DEVICES_FOR_D3D11_KHR, 1, &device, &numDevices);
if (status != CL_SUCCESS)
- continue;
+ break;
if (numDevices > 0)
{
cl_context_properties properties[] = {
@@ -591,27 +498,17 @@ Context& initializeContextFromD3D11Device(ID3D11Device* pD3D11Device)
else
{
found = i;
- break;
}
}
- }
- }
- if (found < 0)
- {
+ } while (0);
// try with CL_ALL_DEVICES_FOR_D3D11_KHR
- for (int i = 0; i < (int)numPlatforms; i++)
- {
- clGetDeviceIDsFromD3D11KHR_fn clGetDeviceIDsFromD3D11KHR = (clGetDeviceIDsFromD3D11KHR_fn)
- clGetExtensionFunctionAddressForPlatform(platforms[i], "clGetDeviceIDsFromD3D11KHR");
- if (!clGetDeviceIDsFromD3D11KHR)
- continue;
-
+ if (found < 0) do {
device = NULL;
numDevices = 0;
status = clGetDeviceIDsFromD3D11KHR(platforms[i], CL_D3D11_DEVICE_KHR, pD3D11Device,
- CL_ALL_DEVICES_FOR_D3D11_KHR, 1, &device, &numDevices);
+ CL_ALL_DEVICES_FOR_D3D11_KHR, 1, &device, &numDevices);
if (status != CL_SUCCESS)
- continue;
+ break;
if (numDevices > 0)
{
cl_context_properties properties[] = {
@@ -628,33 +525,30 @@ Context& initializeContextFromD3D11Device(ID3D11Device* pD3D11Device)
else
{
found = i;
- break;
}
}
+ } while (0);
+
+ if (found >= 0) {
+ OpenCLExecutionContext clExecCtx;
+ try
+ {
+ clExecCtx = OpenCLExecutionContext::create(platformName, platform, context, device);
+ clExecCtx.getContext().setUserContext(std::make_shared<OpenCL_D3D11>(platform, pD3D11Device));
+ }
+ catch (...)
+ {
+ clReleaseDevice(device);
+ clReleaseContext(context);
+ throw;
+ }
+ clExecCtx.bind();
+ return const_cast<Context&>(clExecCtx.getContext());
}
}
}
- if (found < 0)
- {
- CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't create context for DirectX interop");
- }
- cl_platform_id platform = platforms[found];
- std::string platformName = PlatformInfo(&platform).name();
-
- OpenCLExecutionContext clExecCtx;
- try
- {
- clExecCtx = OpenCLExecutionContext::create(platformName, platform, context, device);
- }
- catch (...)
- {
- clReleaseDevice(device);
- clReleaseContext(context);
- throw;
- }
- clExecCtx.bind();
- return const_cast<Context&>(clExecCtx.getContext());
+ CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't create context for DirectX interop");
#endif
}
@@ -679,62 +573,28 @@ Context& initializeContextFromD3D10Device(ID3D10Device* pD3D10Device)
CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't get platforms");
// TODO Filter platforms by name from OPENCV_OPENCL_DEVICE
-
- int found = -1;
- cl_device_id device = NULL;
- cl_uint numDevices = 0;
- cl_context context = NULL;
-
- // try with CL_PREFERRED_DEVICES_FOR_D3D10_KHR
for (int i = 0; i < (int)numPlatforms; i++)
{
+ cl_platform_id platform = platforms[i];
+ std::string platformName = PlatformInfo(&platform).name();
+ int found = -1;
+ cl_device_id device = NULL;
+ cl_uint numDevices = 0;
+ cl_context context = NULL;
+
clGetDeviceIDsFromD3D10KHR_fn clGetDeviceIDsFromD3D10KHR = (clGetDeviceIDsFromD3D10KHR_fn)
- clGetExtensionFunctionAddressForPlatform(platforms[i], "clGetDeviceIDsFromD3D10KHR");
+ clGetExtensionFunctionAddressForPlatform(platforms[i], "clGetDeviceIDsFromD3D10KHR");
if (!clGetDeviceIDsFromD3D10KHR)
continue;
- device = NULL;
- numDevices = 0;
- status = clGetDeviceIDsFromD3D10KHR(platforms[i], CL_D3D10_DEVICE_KHR, pD3D10Device,
- CL_PREFERRED_DEVICES_FOR_D3D10_KHR, 1, &device, &numDevices);
- if (status != CL_SUCCESS)
- continue;
- if (numDevices > 0)
- {
- cl_context_properties properties[] = {
- CL_CONTEXT_PLATFORM, (cl_context_properties)platforms[i],
- CL_CONTEXT_D3D10_DEVICE_KHR, (cl_context_properties)(pD3D10Device),
- CL_CONTEXT_INTEROP_USER_SYNC, CL_FALSE,
- NULL, NULL
- };
- context = clCreateContext(properties, 1, &device, NULL, NULL, &status);
- if (status != CL_SUCCESS)
- {
- clReleaseDevice(device);
- }
- else
- {
- found = i;
- break;
- }
- }
- }
- if (found < 0)
- {
- // try with CL_ALL_DEVICES_FOR_D3D10_KHR
- for (int i = 0; i < (int)numPlatforms; i++)
- {
- clGetDeviceIDsFromD3D10KHR_fn clGetDeviceIDsFromD3D10KHR = (clGetDeviceIDsFromD3D10KHR_fn)
- clGetExtensionFunctionAddressForPlatform(platforms[i], "clGetDeviceIDsFromD3D10KHR");
- if (!clGetDeviceIDsFromD3D10KHR)
- continue;
-
+ // try with CL_PREFERRED_DEVICES_FOR_D3D10_KHR
+ do {
device = NULL;
numDevices = 0;
status = clGetDeviceIDsFromD3D10KHR(platforms[i], CL_D3D10_DEVICE_KHR, pD3D10Device,
- CL_ALL_DEVICES_FOR_D3D10_KHR, 1, &device, &numDevices);
+ CL_PREFERRED_DEVICES_FOR_D3D10_KHR, 1, &device, &numDevices);
if (status != CL_SUCCESS)
- continue;
+ break;
if (numDevices > 0)
{
cl_context_properties properties[] = {
@@ -751,30 +611,56 @@ Context& initializeContextFromD3D10Device(ID3D10Device* pD3D10Device)
else
{
found = i;
- break;
}
}
+ } while (0);
+ // try with CL_ALL_DEVICES_FOR_D3D10_KHR
+ if (found < 0) do
+ {
+ device = NULL;
+ numDevices = 0;
+ status = clGetDeviceIDsFromD3D10KHR(platforms[i], CL_D3D10_DEVICE_KHR, pD3D10Device,
+ CL_ALL_DEVICES_FOR_D3D10_KHR, 1, &device, &numDevices);
+ if (status != CL_SUCCESS)
+ break;
+ if (numDevices > 0)
+ {
+ cl_context_properties properties[] = {
+ CL_CONTEXT_PLATFORM, (cl_context_properties)platforms[i],
+ CL_CONTEXT_D3D10_DEVICE_KHR, (cl_context_properties)(pD3D10Device),
+ CL_CONTEXT_INTEROP_USER_SYNC, CL_FALSE,
+ NULL, NULL
+ };
+ context = clCreateContext(properties, 1, &device, NULL, NULL, &status);
+ if (status != CL_SUCCESS)
+ {
+ clReleaseDevice(device);
+ }
+ else
+ {
+ found = i;
+ }
+ }
+ } while (0);
+
+ if (found >= 0) {
+ OpenCLExecutionContext clExecCtx;
+ try
+ {
+ clExecCtx = OpenCLExecutionContext::create(platformName, platform, context, device);
+ clExecCtx.getContext().setUserContext(std::make_shared<OpenCL_D3D10>(platform, pD3D10Device));
+ }
+ catch (...)
+ {
+ clReleaseDevice(device);
+ clReleaseContext(context);
+ throw;
+ }
+ clExecCtx.bind();
+ return const_cast<Context&>(clExecCtx.getContext());
}
- if (found < 0)
- CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't create context for DirectX interop");
}
-
- cl_platform_id platform = platforms[found];
- std::string platformName = PlatformInfo(&platform).name();
-
- OpenCLExecutionContext clExecCtx;
- try
- {
- clExecCtx = OpenCLExecutionContext::create(platformName, platform, context, device);
- }
- catch (...)
- {
- clReleaseDevice(device);
- clReleaseContext(context);
- throw;
- }
- clExecCtx.bind();
- return const_cast<Context&>(clExecCtx.getContext());
+ CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't create context for DirectX interop");
#endif
}
@@ -799,64 +685,29 @@ Context& initializeContextFromDirect3DDevice9Ex(IDirect3DDevice9Ex* pDirect3DDev
CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't get platforms");
// TODO Filter platforms by name from OPENCV_OPENCL_DEVICE
-
- int found = -1;
- cl_device_id device = NULL;
- cl_uint numDevices = 0;
- cl_context context = NULL;
-
- // try with CL_PREFERRED_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR
for (int i = 0; i < (int)numPlatforms; i++)
{
+ cl_platform_id platform = platforms[i];
+ std::string platformName = PlatformInfo(&platform).name();
+ int found = -1;
+ cl_device_id device = NULL;
+ cl_uint numDevices = 0;
+ cl_context context = NULL;
+
clGetDeviceIDsFromDX9MediaAdapterKHR_fn clGetDeviceIDsFromDX9MediaAdapterKHR = (clGetDeviceIDsFromDX9MediaAdapterKHR_fn)
- clGetExtensionFunctionAddressForPlatform(platforms[i], "clGetDeviceIDsFromDX9MediaAdapterKHR");
+ clGetExtensionFunctionAddressForPlatform(platforms[i], "clGetDeviceIDsFromDX9MediaAdapterKHR");
if (!clGetDeviceIDsFromDX9MediaAdapterKHR)
continue;
- device = NULL;
- numDevices = 0;
- cl_dx9_media_adapter_type_khr type = CL_ADAPTER_D3D9EX_KHR;
- status = clGetDeviceIDsFromDX9MediaAdapterKHR(platforms[i], 1, &type, &pDirect3DDevice9Ex,
- CL_PREFERRED_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR, 1, &device, &numDevices);
- if (status != CL_SUCCESS)
- continue;
- if (numDevices > 0)
- {
- cl_context_properties properties[] = {
- CL_CONTEXT_PLATFORM, (cl_context_properties)platforms[i],
- CL_CONTEXT_ADAPTER_D3D9EX_KHR, (cl_context_properties)(pDirect3DDevice9Ex),
- CL_CONTEXT_INTEROP_USER_SYNC, CL_FALSE,
- NULL, NULL
- };
- context = clCreateContext(properties, 1, &device, NULL, NULL, &status);
- if (status != CL_SUCCESS)
- {
- clReleaseDevice(device);
- }
- else
- {
- found = i;
- break;
- }
- }
- }
- if (found < 0)
- {
- // try with CL_ALL_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR
- for (int i = 0; i < (int)numPlatforms; i++)
- {
- clGetDeviceIDsFromDX9MediaAdapterKHR_fn clGetDeviceIDsFromDX9MediaAdapterKHR = (clGetDeviceIDsFromDX9MediaAdapterKHR_fn)
- clGetExtensionFunctionAddressForPlatform(platforms[i], "clGetDeviceIDsFromDX9MediaAdapterKHR");
- if (!clGetDeviceIDsFromDX9MediaAdapterKHR)
- continue;
-
+ // try with CL_PREFERRED_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR
+ do {
device = NULL;
numDevices = 0;
cl_dx9_media_adapter_type_khr type = CL_ADAPTER_D3D9EX_KHR;
status = clGetDeviceIDsFromDX9MediaAdapterKHR(platforms[i], 1, &type, &pDirect3DDevice9Ex,
- CL_ALL_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR, 1, &device, &numDevices);
+ CL_PREFERRED_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR, 1, &device, &numDevices);
if (status != CL_SUCCESS)
- continue;
+ break;
if (numDevices > 0)
{
cl_context_properties properties[] = {
@@ -873,31 +724,57 @@ Context& initializeContextFromDirect3DDevice9Ex(IDirect3DDevice9Ex* pDirect3DDev
else
{
found = i;
- break;
}
}
+ } while (0);
+ // try with CL_ALL_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR
+ if (found < 0) do
+ {
+ device = NULL;
+ numDevices = 0;
+ cl_dx9_media_adapter_type_khr type = CL_ADAPTER_D3D9EX_KHR;
+ status = clGetDeviceIDsFromDX9MediaAdapterKHR(platforms[i], 1, &type, &pDirect3DDevice9Ex,
+ CL_ALL_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR, 1, &device, &numDevices);
+ if (status != CL_SUCCESS)
+ break;
+ if (numDevices > 0)
+ {
+ cl_context_properties properties[] = {
+ CL_CONTEXT_PLATFORM, (cl_context_properties)platforms[i],
+ CL_CONTEXT_ADAPTER_D3D9EX_KHR, (cl_context_properties)(pDirect3DDevice9Ex),
+ CL_CONTEXT_INTEROP_USER_SYNC, CL_FALSE,
+ NULL, NULL
+ };
+ context = clCreateContext(properties, 1, &device, NULL, NULL, &status);
+ if (status != CL_SUCCESS)
+ {
+ clReleaseDevice(device);
+ }
+ else
+ {
+ found = i;
+ }
+ }
+ } while (0);
+
+ if (found >= 0) {
+ OpenCLExecutionContext clExecCtx;
+ try
+ {
+ clExecCtx = OpenCLExecutionContext::create(platformName, platform, context, device);
+ clExecCtx.getContext().setUserContext(std::make_shared<OpenCL_D3D9>(platform, nullptr, pDirect3DDevice9Ex));
+ }
+ catch (...)
+ {
+ clReleaseDevice(device);
+ clReleaseContext(context);
+ throw;
+ }
+ clExecCtx.bind();
+ return const_cast<Context&>(clExecCtx.getContext());
}
- if (found < 0)
- CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't create context for DirectX interop");
}
-
- cl_platform_id platform = platforms[found];
- std::string platformName = PlatformInfo(&platform).name();
-
- OpenCLExecutionContext clExecCtx;
- try
- {
- clExecCtx = OpenCLExecutionContext::create(platformName, platform, context, device);
- }
- catch (...)
- {
- clReleaseDevice(device);
- clReleaseContext(context);
- throw;
- }
- clExecCtx.bind();
- getImpl().isDirect3DDevice9Ex = true;
- return const_cast<Context&>(clExecCtx.getContext());
+ CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't create context for DirectX interop");
#endif
}
@@ -922,64 +799,29 @@ Context& initializeContextFromDirect3DDevice9(IDirect3DDevice9* pDirect3DDevice9
CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't get platforms");
// TODO Filter platforms by name from OPENCV_OPENCL_DEVICE
-
- int found = -1;
- cl_device_id device = NULL;
- cl_uint numDevices = 0;
- cl_context context = NULL;
-
- // try with CL_PREFERRED_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR
for (int i = 0; i < (int)numPlatforms; i++)
{
+ cl_platform_id platform = platforms[i];
+ std::string platformName = PlatformInfo(&platform).name();
+ int found = -1;
+ cl_device_id device = NULL;
+ cl_uint numDevices = 0;
+ cl_context context = NULL;
+
clGetDeviceIDsFromDX9MediaAdapterKHR_fn clGetDeviceIDsFromDX9MediaAdapterKHR = (clGetDeviceIDsFromDX9MediaAdapterKHR_fn)
- clGetExtensionFunctionAddressForPlatform(platforms[i], "clGetDeviceIDsFromDX9MediaAdapterKHR");
+ clGetExtensionFunctionAddressForPlatform(platforms[i], "clGetDeviceIDsFromDX9MediaAdapterKHR");
if (!clGetDeviceIDsFromDX9MediaAdapterKHR)
continue;
- device = NULL;
- numDevices = 0;
- cl_dx9_media_adapter_type_khr type = CL_ADAPTER_D3D9_KHR;
- status = clGetDeviceIDsFromDX9MediaAdapterKHR(platforms[i], 1, &type, &pDirect3DDevice9,
- CL_PREFERRED_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR, 1, &device, &numDevices);
- if (status != CL_SUCCESS)
- continue;
- if (numDevices > 0)
- {
- cl_context_properties properties[] = {
- CL_CONTEXT_PLATFORM, (cl_context_properties)platforms[i],
- CL_CONTEXT_ADAPTER_D3D9_KHR, (cl_context_properties)(pDirect3DDevice9),
- CL_CONTEXT_INTEROP_USER_SYNC, CL_FALSE,
- NULL, NULL
- };
- context = clCreateContext(properties, 1, &device, NULL, NULL, &status);
- if (status != CL_SUCCESS)
- {
- clReleaseDevice(device);
- }
- else
- {
- found = i;
- break;
- }
- }
- }
- if (found < 0)
- {
- // try with CL_ALL_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR
- for (int i = 0; i < (int)numPlatforms; i++)
- {
- clGetDeviceIDsFromDX9MediaAdapterKHR_fn clGetDeviceIDsFromDX9MediaAdapterKHR = (clGetDeviceIDsFromDX9MediaAdapterKHR_fn)
- clGetExtensionFunctionAddressForPlatform(platforms[i], "clGetDeviceIDsFromDX9MediaAdapterKHR");
- if (!clGetDeviceIDsFromDX9MediaAdapterKHR)
- continue;
-
+ // try with CL_PREFERRED_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR
+ do {
device = NULL;
numDevices = 0;
cl_dx9_media_adapter_type_khr type = CL_ADAPTER_D3D9_KHR;
status = clGetDeviceIDsFromDX9MediaAdapterKHR(platforms[i], 1, &type, &pDirect3DDevice9,
- CL_ALL_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR, 1, &device, &numDevices);
+ CL_PREFERRED_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR, 1, &device, &numDevices);
if (status != CL_SUCCESS)
- continue;
+ break;
if (numDevices > 0)
{
cl_context_properties properties[] = {
@@ -999,28 +841,56 @@ Context& initializeContextFromDirect3DDevice9(IDirect3DDevice9* pDirect3DDevice9
break;
}
}
+ } while (0);
+ // try with CL_ALL_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR
+ if (found < 0) do
+ {
+ device = NULL;
+ numDevices = 0;
+ cl_dx9_media_adapter_type_khr type = CL_ADAPTER_D3D9_KHR;
+ status = clGetDeviceIDsFromDX9MediaAdapterKHR(platforms[i], 1, &type, &pDirect3DDevice9,
+ CL_ALL_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR, 1, &device, &numDevices);
+ if (status != CL_SUCCESS)
+ break;
+ if (numDevices > 0)
+ {
+ cl_context_properties properties[] = {
+ CL_CONTEXT_PLATFORM, (cl_context_properties)platforms[i],
+ CL_CONTEXT_ADAPTER_D3D9_KHR, (cl_context_properties)(pDirect3DDevice9),
+ CL_CONTEXT_INTEROP_USER_SYNC, CL_FALSE,
+ NULL, NULL
+ };
+ context = clCreateContext(properties, 1, &device, NULL, NULL, &status);
+ if (status != CL_SUCCESS)
+ {
+ clReleaseDevice(device);
+ }
+ else
+ {
+ found = i;
+ break;
+ }
+ }
+ } while (0);
+
+ if (found >= 0) {
+ OpenCLExecutionContext clExecCtx;
+ try
+ {
+ clExecCtx = OpenCLExecutionContext::create(platformName, platform, context, device);
+ clExecCtx.getContext().setUserContext(std::make_shared<OpenCL_D3D9>(platform, pDirect3DDevice9, nullptr));
+ }
+ catch (...)
+ {
+ clReleaseDevice(device);
+ clReleaseContext(context);
+ throw;
+ }
+ clExecCtx.bind();
+ return const_cast<Context&>(clExecCtx.getContext());
}
- if (found < 0)
- CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't create context for DirectX interop");
}
-
- cl_platform_id platform = platforms[found];
- std::string platformName = PlatformInfo(&platform).name();
-
- OpenCLExecutionContext clExecCtx;
- try
- {
- clExecCtx = OpenCLExecutionContext::create(platformName, platform, context, device);
- }
- catch (...)
- {
- clReleaseDevice(device);
- clReleaseContext(context);
- throw;
- }
- clExecCtx.bind();
- getImpl().isDirect3DDevice9Ex = false;
- return const_cast<Context&>(clExecCtx.getContext());
+ CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't create context for DirectX interop");
#endif
}
@@ -1104,24 +974,25 @@ static void __convertToD3D11Texture2DKHR(InputArray src, ID3D11Texture2D* pD3D11
cl_mem clBuffer = (cl_mem)u.handle(ACCESS_READ);
- using namespace cv::ocl;
- Context& ctx = Context::getDefault();
+ ocl::Context& ctx = ocl::OpenCLExecutionContext::getCurrent().getContext();
cl_context context = (cl_context)ctx.ptr();
- OpenCLDirectXImpl& impl = getImpl();
+ OpenCL_D3D11* impl = ctx.getUserContext<OpenCL_D3D11>().get();
+ if (nullptr == impl)
+ CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: Context initialized without DirectX interoperability");
cl_int status = 0;
cl_mem clImage = 0;
#ifdef HAVE_DIRECTX_NV12
cl_mem clImageUV = 0;
#endif
- clImage = impl.clCreateFromD3D11Texture2DKHR(context, CL_MEM_WRITE_ONLY, pD3D11Texture2D, 0, &status);
+ clImage = impl->clCreateFromD3D11Texture2DKHR(context, CL_MEM_WRITE_ONLY, pD3D11Texture2D, 0, &status);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clCreateFromD3D11Texture2DKHR failed");
#ifdef HAVE_DIRECTX_NV12
if(DXGI_FORMAT_NV12 == desc.Format)
{
- clImageUV = impl.clCreateFromD3D11Texture2DKHR(context, CL_MEM_WRITE_ONLY, pD3D11Texture2D, 1, &status);
+ clImageUV = impl->clCreateFromD3D11Texture2DKHR(context, CL_MEM_WRITE_ONLY, pD3D11Texture2D, 1, &status);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clCreateFromD3D11Texture2DKHR failed");
}
@@ -1129,21 +1000,21 @@ static void __convertToD3D11Texture2DKHR(InputArray src, ID3D11Texture2D* pD3D11
cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();
- status = impl.clEnqueueAcquireD3D11ObjectsKHR(q, 1, &clImage, 0, NULL, NULL);
+ status = impl->clEnqueueAcquireD3D11ObjectsKHR(q, 1, &clImage, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueAcquireD3D11ObjectsKHR failed");
#ifdef HAVE_DIRECTX_NV12
if(DXGI_FORMAT_NV12 == desc.Format)
{
- status = impl.clEnqueueAcquireD3D11ObjectsKHR(q, 1, &clImageUV, 0, NULL, NULL);
+ status = impl->clEnqueueAcquireD3D11ObjectsKHR(q, 1, &clImageUV, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueAcquireD3D11ObjectsKHR failed");
if(!ocl::ocl_convert_bgr_to_nv12(clBuffer, (int)u.step[0], u.cols, u.rows, clImage, clImageUV))
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: ocl_convert_bgr_to_nv12 failed");
- status = impl.clEnqueueReleaseD3D11ObjectsKHR(q, 1, &clImageUV, 0, NULL, NULL);
+ status = impl->clEnqueueReleaseD3D11ObjectsKHR(q, 1, &clImageUV, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueReleaseD3D11ObjectsKHR failed");
}
@@ -1159,7 +1030,7 @@ static void __convertToD3D11Texture2DKHR(InputArray src, ID3D11Texture2D* pD3D11
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueCopyBufferToImage failed");
}
- status = impl.clEnqueueReleaseD3D11ObjectsKHR(q, 1, &clImage, 0, NULL, NULL);
+ status = impl->clEnqueueReleaseD3D11ObjectsKHR(q, 1, &clImage, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueReleaseD3D11ObjectsKHR failed");
@@ -1203,44 +1074,45 @@ static void __convertToD3D11Texture2DNV(InputArray src, ID3D11Texture2D* pD3D11T
cl_mem clBuffer = (cl_mem)u.handle(ACCESS_READ);
- using namespace cv::ocl;
- Context& ctx = Context::getDefault();
+ ocl::Context& ctx = ocl::OpenCLExecutionContext::getCurrent().getContext();
cl_context context = (cl_context)ctx.ptr();
- OpenCLDirectXImpl& impl = getImpl();
+ OpenCL_D3D11_NV* impl = ctx.getUserContext<OpenCL_D3D11_NV>().get();
+ if (nullptr == impl)
+ CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: Context initialized without DirectX interoperability");
cl_int status = 0;
cl_mem clImage = 0;
#ifdef HAVE_DIRECTX_NV12
cl_mem clImageUV = 0;
#endif
- clImage = impl.clCreateFromD3D11Texture2DNV(context, CL_MEM_WRITE_ONLY, pD3D11Texture2D, 0, &status);
+ clImage = impl->clCreateFromD3D11Texture2DNV(context, CL_MEM_WRITE_ONLY, pD3D11Texture2D, 0, &status);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clCreateFromD3D11Texture2DNV failed");
#ifdef HAVE_DIRECTX_NV12
if (DXGI_FORMAT_NV12 == desc.Format)
{
- clImageUV = impl.clCreateFromD3D11Texture2DNV(context, CL_MEM_WRITE_ONLY, pD3D11Texture2D, 1, &status);
+ clImageUV = impl->clCreateFromD3D11Texture2DNV(context, CL_MEM_WRITE_ONLY, pD3D11Texture2D, 1, &status);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clCreateFromD3D11Texture2DNV failed");
}
#endif
cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();
- status = impl.clEnqueueAcquireD3D11ObjectsNV(q, 1, &clImage, 0, NULL, NULL);
+ status = impl->clEnqueueAcquireD3D11ObjectsNV(q, 1, &clImage, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueAcquireD3D11ObjectsNV failed");
#ifdef HAVE_DIRECTX_NV12
if(DXGI_FORMAT_NV12 == desc.Format)
{
- status = impl.clEnqueueAcquireD3D11ObjectsNV(q, 1, &clImageUV, 0, NULL, NULL);
+ status = impl->clEnqueueAcquireD3D11ObjectsNV(q, 1, &clImageUV, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueAcquireD3D11ObjectsNV failed");
if(!ocl::ocl_convert_bgr_to_nv12(clBuffer, (int)u.step[0], u.cols, u.rows, clImage, clImageUV))
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: ocl_convert_bgr_to_nv12 failed");
- status = impl.clEnqueueReleaseD3D11ObjectsNV(q, 1, &clImageUV, 0, NULL, NULL);
+ status = impl->clEnqueueReleaseD3D11ObjectsNV(q, 1, &clImageUV, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueReleaseD3D11ObjectsNV failed");
}
@@ -1256,7 +1128,7 @@ static void __convertToD3D11Texture2DNV(InputArray src, ID3D11Texture2D* pD3D11T
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueCopyBufferToImage failed");
}
- status = impl.clEnqueueReleaseD3D11ObjectsNV(q, 1, &clImage, 0, NULL, NULL);
+ status = impl->clEnqueueReleaseD3D11ObjectsNV(q, 1, &clImage, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueReleaseD3D11ObjectsNV failed");
@@ -1298,15 +1170,16 @@ static void __convertFromD3D11Texture2DKHR(ID3D11Texture2D* pD3D11Texture2D, Out
cl_mem clBuffer = (cl_mem)u.handle(ACCESS_READ);
- using namespace cv::ocl;
- Context& ctx = Context::getDefault();
+ ocl::Context& ctx = ocl::OpenCLExecutionContext::getCurrent().getContext();
cl_context context = (cl_context)ctx.ptr();
- OpenCLDirectXImpl& impl = getImpl();
+ OpenCL_D3D11* impl = ctx.getUserContext<OpenCL_D3D11>().get();
+ if (nullptr == impl)
+ CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: Context initialized without DirectX interoperability");
cl_int status = 0;
cl_mem clImage = 0;
- clImage = impl.clCreateFromD3D11Texture2DKHR(context, CL_MEM_READ_ONLY, pD3D11Texture2D, 0, &status);
+ clImage = impl->clCreateFromD3D11Texture2DKHR(context, CL_MEM_READ_ONLY, pD3D11Texture2D, 0, &status);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clCreateFromD3D11Texture2DKHR failed");
@@ -1314,7 +1187,7 @@ static void __convertFromD3D11Texture2DKHR(ID3D11Texture2D* pD3D11Texture2D, Out
cl_mem clImageUV = 0;
if(DXGI_FORMAT_NV12 == desc.Format)
{
- clImageUV = impl.clCreateFromD3D11Texture2DKHR(context, CL_MEM_READ_ONLY, pD3D11Texture2D, 1, &status);
+ clImageUV = impl->clCreateFromD3D11Texture2DKHR(context, CL_MEM_READ_ONLY, pD3D11Texture2D, 1, &status);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clCreateFromD3D11Texture2DKHR failed");
}
@@ -1322,21 +1195,21 @@ static void __convertFromD3D11Texture2DKHR(ID3D11Texture2D* pD3D11Texture2D, Out
cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();
- status = impl.clEnqueueAcquireD3D11ObjectsKHR(q, 1, &clImage, 0, NULL, NULL);
+ status = impl->clEnqueueAcquireD3D11ObjectsKHR(q, 1, &clImage, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueAcquireD3D11ObjectsKHR failed");
#ifdef HAVE_DIRECTX_NV12
if(DXGI_FORMAT_NV12 == desc.Format)
{
- status = impl.clEnqueueAcquireD3D11ObjectsKHR(q, 1, &clImageUV, 0, NULL, NULL);
+ status = impl->clEnqueueAcquireD3D11ObjectsKHR(q, 1, &clImageUV, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueAcquireD3D11ObjectsKHR failed");
if(!ocl::ocl_convert_nv12_to_bgr(clImage, clImageUV, clBuffer, (int)u.step[0], u.cols, u.rows))
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: ocl_convert_nv12_to_bgr failed");
- status = impl.clEnqueueReleaseD3D11ObjectsKHR(q, 1, &clImageUV, 0, NULL, NULL);
+ status = impl->clEnqueueReleaseD3D11ObjectsKHR(q, 1, &clImageUV, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueReleaseD3D11ObjectsKHR failed");
}
@@ -1352,7 +1225,7 @@ static void __convertFromD3D11Texture2DKHR(ID3D11Texture2D* pD3D11Texture2D, Out
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueCopyImageToBuffer failed");
}
- status = impl.clEnqueueReleaseD3D11ObjectsKHR(q, 1, &clImage, 0, NULL, NULL);
+ status = impl->clEnqueueReleaseD3D11ObjectsKHR(q, 1, &clImage, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueReleaseD3D11ObjectsKHR failed");
@@ -1394,15 +1267,16 @@ static void __convertFromD3D11Texture2DNV(ID3D11Texture2D* pD3D11Texture2D, Outp
cl_mem clBuffer = (cl_mem)u.handle(ACCESS_READ);
- using namespace cv::ocl;
- Context& ctx = Context::getDefault();
+ ocl::Context& ctx = ocl::OpenCLExecutionContext::getCurrent().getContext();
cl_context context = (cl_context)ctx.ptr();
- OpenCLDirectXImpl& impl = getImpl();
+ OpenCL_D3D11_NV* impl = ctx.getUserContext<OpenCL_D3D11_NV>().get();
+ if (nullptr == impl)
+ CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: Context initialized without DirectX interoperability");
cl_int status = 0;
cl_mem clImage = 0;
- clImage = impl.clCreateFromD3D11Texture2DNV(context, CL_MEM_READ_ONLY, pD3D11Texture2D, 0, &status);
+ clImage = impl->clCreateFromD3D11Texture2DNV(context, CL_MEM_READ_ONLY, pD3D11Texture2D, 0, &status);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clCreateFromD3D11Texture2DNV failed");
@@ -1410,28 +1284,28 @@ static void __convertFromD3D11Texture2DNV(ID3D11Texture2D* pD3D11Texture2D, Outp
cl_mem clImageUV = 0;
if(DXGI_FORMAT_NV12 == desc.Format)
{
- clImageUV = impl.clCreateFromD3D11Texture2DNV(context, CL_MEM_READ_ONLY, pD3D11Texture2D, 1, &status);
+ clImageUV = impl->clCreateFromD3D11Texture2DNV(context, CL_MEM_READ_ONLY, pD3D11Texture2D, 1, &status);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clCreateFromD3D11Texture2DNV failed");
}
#endif
cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();
- status = impl.clEnqueueAcquireD3D11ObjectsNV(q, 1, &clImage, 0, NULL, NULL);
+ status = impl->clEnqueueAcquireD3D11ObjectsNV(q, 1, &clImage, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueAcquireD3D11ObjectsNV failed");
#ifdef HAVE_DIRECTX_NV12
if (DXGI_FORMAT::DXGI_FORMAT_NV12 == desc.Format)
{
- status = impl.clEnqueueAcquireD3D11ObjectsNV(q, 1, &clImageUV, 0, NULL, NULL);
+ status = impl->clEnqueueAcquireD3D11ObjectsNV(q, 1, &clImageUV, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueAcquireD3D11ObjectsNV failed");
if (!ocl::ocl_convert_nv12_to_bgr(clImage, clImageUV, clBuffer, (int)u.step[0], u.cols, u.rows))
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: ocl_convert_nv12_to_bgr failed");
- status = impl.clEnqueueReleaseD3D11ObjectsNV(q, 1, &clImageUV, 0, NULL, NULL);
+ status = impl->clEnqueueReleaseD3D11ObjectsNV(q, 1, &clImageUV, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueReleaseD3D11ObjectsNV failed");
}
@@ -1447,7 +1321,7 @@ static void __convertFromD3D11Texture2DNV(ID3D11Texture2D* pD3D11Texture2D, Outp
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueCopyImageToBuffer failed");
}
- status = impl.clEnqueueReleaseD3D11ObjectsNV(q, 1, &clImage, 0, NULL, NULL);
+ status = impl->clEnqueueReleaseD3D11ObjectsNV(q, 1, &clImage, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueReleaseD3D11ObjectsNV failed");
@@ -1479,16 +1353,21 @@ void convertToD3D11Texture2D(InputArray src, ID3D11Texture2D* pD3D11Texture2D)
NO_OPENCL_SUPPORT_ERROR;
#else
- bool useCLNVEXT = getImpl().initializeD3D11();
- if(!useCLNVEXT){
- __convertToD3D11Texture2DKHR(src,pD3D11Texture2D);
- }
+ ocl::Context& ctx = ocl::OpenCLExecutionContext::getCurrent().getContext();
#ifdef HAVE_OPENCL_D3D11_NV
- else
- {
+ OpenCL_D3D11_NV* impl_nv = ctx.getUserContext<OpenCL_D3D11_NV>().get();
+ if (impl_nv) {
__convertToD3D11Texture2DNV(src,pD3D11Texture2D);
+ return;
}
#endif
+ OpenCL_D3D11* impl = ctx.getUserContext<OpenCL_D3D11>().get();
+ if (impl) {
+ __convertToD3D11Texture2DKHR(src, pD3D11Texture2D);
+ }
+ else {
+ CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: Context initialized without DirectX interoperability");
+ }
#endif
}
@@ -1501,16 +1380,20 @@ void convertFromD3D11Texture2D(ID3D11Texture2D* pD3D11Texture2D, OutputArray dst
NO_OPENCL_SUPPORT_ERROR;
#else
- bool useCLNVEXT = getImpl().initializeD3D11();
- if(!useCLNVEXT){
- __convertFromD3D11Texture2DKHR(pD3D11Texture2D,dst);
- }
+ ocl::Context& ctx = ocl::OpenCLExecutionContext::getCurrent().getContext();
#ifdef HAVE_OPENCL_D3D11_NV
- else
- {
+ OpenCL_D3D11_NV* impl_nv = ctx.getUserContext<OpenCL_D3D11_NV>().get();
+ if (impl_nv) {
__convertFromD3D11Texture2DNV(pD3D11Texture2D,dst);
}
#endif
+ OpenCL_D3D11* impl = ctx.getUserContext<OpenCL_D3D11>().get();
+ if (impl) {
+ __convertFromD3D11Texture2DKHR(pD3D11Texture2D, dst);
+ }
+ else {
+ CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: Context initialized without DirectX interoperability");
+ }
#endif
}
@@ -1520,8 +1403,11 @@ void convertToD3D10Texture2D(InputArray src, ID3D10Texture2D* pD3D10Texture2D)
#if !defined(HAVE_DIRECTX)
NO_DIRECTX_SUPPORT_ERROR;
#elif defined(HAVE_OPENCL)
- OpenCLDirectXImpl& impl = getImpl();
- impl.initializeD3D10();
+
+ ocl::Context& ctx = ocl::OpenCLExecutionContext::getCurrent().getContext();
+ OpenCL_D3D10* impl = ctx.getUserContext<OpenCL_D3D10>().get();
+ if (nullptr == impl)
+ CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: Context initialized without DirectX interoperability");
D3D10_TEXTURE2D_DESC desc = { 0 };
pD3D10Texture2D->GetDesc(&desc);
@@ -1533,8 +1419,6 @@ void convertToD3D10Texture2D(InputArray src, ID3D10Texture2D* pD3D10Texture2D)
Size srcSize = src.size();
CV_Assert(srcSize.width == (int)desc.Width && srcSize.height == (int)desc.Height);
- using namespace cv::ocl;
- Context& ctx = Context::getDefault();
cl_context context = (cl_context)ctx.ptr();
UMat u = src.getUMat();
@@ -1544,14 +1428,14 @@ void convertToD3D10Texture2D(InputArray src, ID3D10Texture2D* pD3D10Texture2D)
CV_Assert(u.isContinuous());
cl_int status = 0;
- cl_mem clImage = impl.clCreateFromD3D10Texture2DKHR(context, CL_MEM_WRITE_ONLY, pD3D10Texture2D, 0, &status);
+ cl_mem clImage = impl->clCreateFromD3D10Texture2DKHR(context, CL_MEM_WRITE_ONLY, pD3D10Texture2D, 0, &status);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clCreateFromD3D10Texture2DKHR failed");
cl_mem clBuffer = (cl_mem)u.handle(ACCESS_READ);
cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();
- status = impl.clEnqueueAcquireD3D10ObjectsKHR(q, 1, &clImage, 0, NULL, NULL);
+ status = impl->clEnqueueAcquireD3D10ObjectsKHR(q, 1, &clImage, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueAcquireD3D10ObjectsKHR failed");
size_t offset = 0; // TODO
@@ -1560,7 +1444,7 @@ void convertToD3D10Texture2D(InputArray src, ID3D10Texture2D* pD3D10Texture2D)
status = clEnqueueCopyBufferToImage(q, clBuffer, clImage, offset, dst_origin, region, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueCopyBufferToImage failed");
- status = impl.clEnqueueReleaseD3D10ObjectsKHR(q, 1, &clImage, 0, NULL, NULL);
+ status = impl->clEnqueueReleaseD3D10ObjectsKHR(q, 1, &clImage, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueReleaseD3D10ObjectsKHR failed");
@@ -1576,14 +1460,17 @@ void convertToD3D10Texture2D(InputArray src, ID3D10Texture2D* pD3D10Texture2D)
NO_OPENCL_SUPPORT_ERROR;
#endif
}
+
void convertFromD3D10Texture2D(ID3D10Texture2D* pD3D10Texture2D, OutputArray dst)
{
CV_UNUSED(pD3D10Texture2D); CV_UNUSED(dst);
#if !defined(HAVE_DIRECTX)
NO_DIRECTX_SUPPORT_ERROR;
#elif defined(HAVE_OPENCL)
- OpenCLDirectXImpl& impl = getImpl();
- impl.initializeD3D10();
+ ocl::Context& ctx = ocl::OpenCLExecutionContext::getCurrent().getContext();
+ OpenCL_D3D10* impl = ctx.getUserContext<OpenCL_D3D10>().get();
+ if (nullptr == impl)
+ CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: Context initialized without DirectX interoperability");
D3D10_TEXTURE2D_DESC desc = { 0 };
pD3D10Texture2D->GetDesc(&desc);
@@ -1591,8 +1478,6 @@ void convertFromD3D10Texture2D(ID3D10Texture2D* pD3D10Texture2D, OutputArray dst
int textureType = getTypeFromDXGI_FORMAT(desc.Format);
CV_Assert(textureType >= 0);
- using namespace cv::ocl;
- Context& ctx = Context::getDefault();
cl_context context = (cl_context)ctx.ptr();
// TODO Need to specify ACCESS_WRITE here somehow to prevent useless data copying!
@@ -1604,14 +1489,14 @@ void convertFromD3D10Texture2D(ID3D10Texture2D* pD3D10Texture2D, OutputArray dst
CV_Assert(u.isContinuous());
cl_int status = 0;
- cl_mem clImage = impl.clCreateFromD3D10Texture2DKHR(context, CL_MEM_READ_ONLY, pD3D10Texture2D, 0, &status);
+ cl_mem clImage = impl->clCreateFromD3D10Texture2DKHR(context, CL_MEM_READ_ONLY, pD3D10Texture2D, 0, &status);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clCreateFromD3D10Texture2DKHR failed");
cl_mem clBuffer = (cl_mem)u.handle(ACCESS_READ);
cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();
- status = impl.clEnqueueAcquireD3D10ObjectsKHR(q, 1, &clImage, 0, NULL, NULL);
+ status = impl->clEnqueueAcquireD3D10ObjectsKHR(q, 1, &clImage, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueAcquireD3D10ObjectsKHR failed");
size_t offset = 0; // TODO
@@ -1620,7 +1505,7 @@ void convertFromD3D10Texture2D(ID3D10Texture2D* pD3D10Texture2D, OutputArray dst
status = clEnqueueCopyImageToBuffer(q, clImage, clBuffer, src_origin, region, offset, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueCopyImageToBuffer failed");
- status = impl.clEnqueueReleaseD3D10ObjectsKHR(q, 1, &clImage, 0, NULL, NULL);
+ status = impl->clEnqueueReleaseD3D10ObjectsKHR(q, 1, &clImage, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueReleaseD3D10ObjectsKHR failed");
@@ -1637,15 +1522,17 @@ void convertFromD3D10Texture2D(ID3D10Texture2D* pD3D10Texture2D, OutputArray dst
#endif
}
-
void convertToDirect3DSurface9(InputArray src, IDirect3DSurface9* pDirect3DSurface9, void* surfaceSharedHandle)
{
CV_UNUSED(src); CV_UNUSED(pDirect3DSurface9); CV_UNUSED(surfaceSharedHandle);
#if !defined(HAVE_DIRECTX)
NO_DIRECTX_SUPPORT_ERROR;
#elif defined(HAVE_OPENCL)
- OpenCLDirectXImpl& impl = getImpl();
- impl.initializeD3D9();
+
+ ocl::Context& ctx = ocl::OpenCLExecutionContext::getCurrent().getContext();
+ OpenCL_D3D9* impl = ctx.getUserContext<OpenCL_D3D9>().get();
+ if (nullptr == impl)
+ CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: Context initialized without DirectX interoperability");
D3DSURFACE_DESC desc;
if (FAILED(pDirect3DSurface9->GetDesc(&desc)))
@@ -1660,8 +1547,6 @@ void convertToDirect3DSurface9(InputArray src, IDirect3DSurface9* pDirect3DSurfa
Size srcSize = src.size();
CV_Assert(srcSize.width == (int)desc.Width && srcSize.height == (int)desc.Height);
- using namespace cv::ocl;
- Context& ctx = Context::getDefault();
cl_context context = (cl_context)ctx.ptr();
UMat u = src.getUMat();
@@ -1672,8 +1557,8 @@ void convertToDirect3DSurface9(InputArray src, IDirect3DSurface9* pDirect3DSurfa
cl_int status = 0;
cl_dx9_surface_info_khr surfaceInfo = {pDirect3DSurface9, (HANDLE)surfaceSharedHandle};
- cl_mem clImage = impl.clCreateFromDX9MediaSurfaceKHR(context, CL_MEM_WRITE_ONLY,
- impl.isDirect3DDevice9Ex ? CL_ADAPTER_D3D9EX_KHR : CL_ADAPTER_D3D9_KHR,
+ cl_mem clImage = impl->clCreateFromDX9MediaSurfaceKHR(context, CL_MEM_WRITE_ONLY,
+ impl->deviceEx ? CL_ADAPTER_D3D9EX_KHR : CL_ADAPTER_D3D9_KHR,
&surfaceInfo, 0, &status);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clCreateFromDX9MediaSurfaceKHR failed");
@@ -1681,7 +1566,7 @@ void convertToDirect3DSurface9(InputArray src, IDirect3DSurface9* pDirect3DSurfa
cl_mem clBuffer = (cl_mem)u.handle(ACCESS_READ);
cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();
- status = impl.clEnqueueAcquireDX9MediaSurfacesKHR(q, 1, &clImage, 0, NULL, NULL);
+ status = impl->clEnqueueAcquireDX9MediaSurfacesKHR(q, 1, &clImage, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueAcquireDX9MediaSurfacesKHR failed");
size_t offset = 0; // TODO
@@ -1690,7 +1575,7 @@ void convertToDirect3DSurface9(InputArray src, IDirect3DSurface9* pDirect3DSurfa
status = clEnqueueCopyBufferToImage(q, clBuffer, clImage, offset, dst_origin, region, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueCopyBufferToImage failed");
- status = impl.clEnqueueReleaseDX9MediaSurfacesKHR(q, 1, &clImage, 0, NULL, NULL);
+ status = impl->clEnqueueReleaseDX9MediaSurfacesKHR(q, 1, &clImage, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueReleaseDX9MediaSurfacesKHR failed");
@@ -1713,8 +1598,11 @@ void convertFromDirect3DSurface9(IDirect3DSurface9* pDirect3DSurface9, OutputArr
#if !defined(HAVE_DIRECTX)
NO_DIRECTX_SUPPORT_ERROR;
#elif defined(HAVE_OPENCL)
- OpenCLDirectXImpl& impl = getImpl();
- impl.initializeD3D9();
+
+ ocl::Context& ctx = ocl::OpenCLExecutionContext::getCurrent().getContext();
+ OpenCL_D3D9* impl = ctx.getUserContext<OpenCL_D3D9>().get();
+ if (nullptr == impl)
+ CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: Context initialized without DirectX interoperability");
D3DSURFACE_DESC desc;
if (FAILED(pDirect3DSurface9->GetDesc(&desc)))
@@ -1725,8 +1613,6 @@ void convertFromDirect3DSurface9(IDirect3DSurface9* pDirect3DSurface9, OutputArr
int surfaceType = getTypeFromD3DFORMAT(desc.Format);
CV_Assert(surfaceType >= 0);
- using namespace cv::ocl;
- Context& ctx = Context::getDefault();
cl_context context = (cl_context)ctx.ptr();
// TODO Need to specify ACCESS_WRITE here somehow to prevent useless data copying!
@@ -1739,8 +1625,8 @@ void convertFromDirect3DSurface9(IDirect3DSurface9* pDirect3DSurface9, OutputArr
cl_int status = 0;
cl_dx9_surface_info_khr surfaceInfo = {pDirect3DSurface9, (HANDLE)surfaceSharedHandle};
- cl_mem clImage = impl.clCreateFromDX9MediaSurfaceKHR(context, CL_MEM_READ_ONLY,
- impl.isDirect3DDevice9Ex ? CL_ADAPTER_D3D9EX_KHR : CL_ADAPTER_D3D9_KHR,
+ cl_mem clImage = impl->clCreateFromDX9MediaSurfaceKHR(context, CL_MEM_READ_ONLY,
+ impl->deviceEx ? CL_ADAPTER_D3D9EX_KHR : CL_ADAPTER_D3D9_KHR,
&surfaceInfo, 0, &status);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clCreateFromDX9MediaSurfaceKHR failed");
@@ -1748,7 +1634,7 @@ void convertFromDirect3DSurface9(IDirect3DSurface9* pDirect3DSurface9, OutputArr
cl_mem clBuffer = (cl_mem)u.handle(ACCESS_WRITE);
cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();
- status = impl.clEnqueueAcquireDX9MediaSurfacesKHR(q, 1, &clImage, 0, NULL, NULL);
+ status = impl->clEnqueueAcquireDX9MediaSurfacesKHR(q, 1, &clImage, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueAcquireDX9MediaSurfacesKHR failed");
size_t offset = 0; // TODO
@@ -1757,7 +1643,7 @@ void convertFromDirect3DSurface9(IDirect3DSurface9* pDirect3DSurface9, OutputArr
status = clEnqueueCopyImageToBuffer(q, clImage, clBuffer, src_origin, region, offset, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueCopyImageToBuffer failed");
- status = impl.clEnqueueReleaseDX9MediaSurfacesKHR(q, 1, &clImage, 0, NULL, NULL);
+ status = impl->clEnqueueReleaseDX9MediaSurfacesKHR(q, 1, &clImage, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueReleaseDX9MediaSurfacesKHR failed");
diff --git a/modules/core/src/directx.hpp b/modules/core/src/directx.hpp
deleted file mode 100644
index 9f23352d4d..0000000000
--- a/modules/core/src/directx.hpp
+++ /dev/null
@@ -1,23 +0,0 @@
-// This file is part of OpenCV project.
-// It is subject to the license terms in the LICENSE file found in the top-level directory
-// of this distribution and at http://opencv.org/license.html.
-
-#ifndef OPENCV_CORE_SRC_DIRECTX_HPP
-#define OPENCV_CORE_SRC_DIRECTX_HPP
-
-#ifndef HAVE_DIRECTX
-#error Invalid build configuration
-#endif
-
-namespace cv {
-namespace directx {
-namespace internal {
-
-struct OpenCLDirectXImpl;
-OpenCLDirectXImpl* createDirectXImpl();
-void deleteDirectXImpl(OpenCLDirectXImpl**);
-OpenCLDirectXImpl* getDirectXImpl(ocl::Context& ctx);
-
-}}} // namespace internal
-
-#endif // OPENCV_CORE_SRC_DIRECTX_HPP
diff --git a/modules/core/src/glob.cpp b/modules/core/src/glob.cpp
index fa8592caa5..b7cf1bf236 100644
--- a/modules/core/src/glob.cpp
+++ b/modules/core/src/glob.cpp
@@ -43,7 +43,9 @@
#include "precomp.hpp"
#include "opencv2/core/utils/filesystem.hpp"
+#include "opencv2/core/utils/filesystem.private.hpp"
+#if OPENCV_HAVE_FILESYSTEM_SUPPORT
#if defined _WIN32 || defined WINCE
# include <windows.h>
const char dir_separators[] = "/\\";
@@ -131,12 +133,15 @@ namespace
}
-#else
+#else // defined _WIN32 || defined WINCE
# include <dirent.h>
# include <sys/stat.h>
const char dir_separators[] = "/";
-#endif
+#endif // defined _WIN32 || defined WINCE
+#endif // OPENCV_HAVE_FILESYSTEM_SUPPORT
+
+#if OPENCV_HAVE_FILESYSTEM_SUPPORT
static bool isDir(const cv::String& path, DIR* dir)
{
#if defined _WIN32 || defined _WIN32_WCE
@@ -168,13 +173,20 @@ static bool isDir(const cv::String& path, DIR* dir)
return is_dir != 0;
#endif
}
+#endif // OPENCV_HAVE_FILESYSTEM_SUPPORT
bool cv::utils::fs::isDirectory(const cv::String& path)
{
+#if OPENCV_HAVE_FILESYSTEM_SUPPORT
CV_INSTRUMENT_REGION();
return isDir(path, NULL);
+#else
+ CV_UNUSED(path);
+ CV_Error(Error::StsNotImplemented, "File system support is disabled in this OpenCV build!");
+#endif
}
+#if OPENCV_HAVE_FILESYSTEM_SUPPORT
static bool wildcmp(const char *string, const char *wild)
{
// Based on wildcmp written by Jack Handy - jakkhandy@hotmail.com
@@ -267,9 +279,11 @@ static void glob_rec(const cv::String& directory, const cv::String& wildchart, s
CV_Error_(CV_StsObjectNotFound, ("could not open directory: %s", directory.c_str()));
}
}
+#endif // OPENCV_HAVE_FILESYSTEM_SUPPORT
void cv::glob(String pattern, std::vector<String>& result, bool recursive)
{
+#if OPENCV_HAVE_FILESYSTEM_SUPPORT
CV_INSTRUMENT_REGION();
result.clear();
@@ -303,20 +317,44 @@ void cv::glob(String pattern, std::vector<String>& result, bool recursive)
glob_rec(path, wildchart, result, recursive, false, path);
std::sort(result.begin(), result.end());
+#else // OPENCV_HAVE_FILESYSTEM_SUPPORT
+ CV_UNUSED(pattern);
+ CV_UNUSED(result);
+ CV_UNUSED(recursive);
+ CV_Error(Error::StsNotImplemented, "File system support is disabled in this OpenCV build!");
+#endif // OPENCV_HAVE_FILESYSTEM_SUPPORT
}
void cv::utils::fs::glob(const cv::String& directory, const cv::String& pattern,
std::vector<cv::String>& result,
bool recursive, bool includeDirectories)
{
+#if OPENCV_HAVE_FILESYSTEM_SUPPORT
glob_rec(directory, pattern, result, recursive, includeDirectories, directory);
std::sort(result.begin(), result.end());
+#else // OPENCV_HAVE_FILESYSTEM_SUPPORT
+ CV_UNUSED(directory);
+ CV_UNUSED(pattern);
+ CV_UNUSED(result);
+ CV_UNUSED(recursive);
+ CV_UNUSED(includeDirectories);
+ CV_Error(Error::StsNotImplemented, "File system support is disabled in this OpenCV build!");
+#endif // OPENCV_HAVE_FILESYSTEM_SUPPORT
}
void cv::utils::fs::glob_relative(const cv::String& directory, const cv::String& pattern,
std::vector<cv::String>& result,
bool recursive, bool includeDirectories)
{
+#if OPENCV_HAVE_FILESYSTEM_SUPPORT
glob_rec(directory, pattern, result, recursive, includeDirectories, cv::String());
std::sort(result.begin(), result.end());
+#else // OPENCV_HAVE_FILESYSTEM_SUPPORT
+ CV_UNUSED(directory);
+ CV_UNUSED(pattern);
+ CV_UNUSED(result);
+ CV_UNUSED(recursive);
+ CV_UNUSED(includeDirectories);
+ CV_Error(Error::StsNotImplemented, "File system support is disabled in this OpenCV build!");
+#endif // OPENCV_HAVE_FILESYSTEM_SUPPORT
}
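The guard pattern repeated in the hunks above can be sketched in isolation. This is an illustrative stand-alone example rather than OpenCV code: `DEMO_HAVE_FILESYSTEM_SUPPORT` and `demo_is_directory` are hypothetical names, and `std::runtime_error` stands in for `CV_Error(Error::StsNotImplemented, ...)`.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical build switch standing in for OPENCV_HAVE_FILESYSTEM_SUPPORT.
#ifndef DEMO_HAVE_FILESYSTEM_SUPPORT
#define DEMO_HAVE_FILESYSTEM_SUPPORT 0
#endif

// The public signature exists in every build; only the body is conditional,
// so callers still link and get a clear runtime error instead of a missing symbol.
bool demo_is_directory(const std::string& path)
{
#if DEMO_HAVE_FILESYSTEM_SUPPORT
    // Placeholder logic for the sketch only, not a real directory check.
    return !path.empty() && path.back() == '/';
#else
    (void)path;  // plays the role of CV_UNUSED(path)
    throw std::runtime_error("File system support is disabled in this build!");
#endif
}
```

With the switch left at 0, any call to `demo_is_directory` throws. Defining it to 1 at compile time selects the first branch without touching callers.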
diff --git a/modules/core/src/ocl.cpp b/modules/core/src/ocl.cpp
index ac52eeaf99..8749b29ec8 100644
--- a/modules/core/src/ocl.cpp
+++ b/modules/core/src/ocl.cpp
@@ -113,10 +113,6 @@
#include "opencv2/core/opencl/runtime/opencl_core.hpp"
-#ifdef HAVE_DIRECTX
-#include "directx.hpp"
-#endif
-
#ifdef HAVE_OPENCL_SVM
#include "opencv2/core/opencl/runtime/opencl_svm_20.hpp"
#include "opencv2/core/opencl/runtime/opencl_svm_hsa_extension.hpp"
@@ -2367,9 +2363,6 @@ protected:
, contextId(CV_XADD(&g_contextId, 1))
, configuration(configuration_)
, handle(0)
-#ifdef HAVE_DIRECTX
- , p_directx_impl(0)
-#endif
#ifdef HAVE_OPENCL_SVM
, svmInitialized(false)
#endif
@@ -2395,11 +2388,10 @@ protected:
handle = NULL;
}
devices.clear();
-#ifdef HAVE_DIRECTX
- directx::internal::deleteDirectXImpl(&p_directx_impl);
-#endif
}
+ userContextStorage.clear();
+
{
cv::AutoLock lock(cv::getInitializationMutex());
auto& container = getGlobalContainer();
@@ -2705,18 +2697,20 @@ public:
return *bufferPoolHostPtr_.get();
}
-#ifdef HAVE_DIRECTX
- directx::internal::OpenCLDirectXImpl* p_directx_impl;
-
- directx::internal::OpenCLDirectXImpl* getDirectXImpl()
- {
- if (!p_directx_impl)
- {
- p_directx_impl = directx::internal::createDirectXImpl();
- }
- return p_directx_impl;
+ std::map<std::type_index, std::shared_ptr<UserContext>> userContextStorage;
+ cv::Mutex userContextMutex;
+ void setUserContext(std::type_index typeId, const std::shared_ptr<UserContext>& userContext) {
+ cv::AutoLock lock(userContextMutex);
+ userContextStorage[typeId] = userContext;
+ }
+ std::shared_ptr<UserContext> getUserContext(std::type_index typeId) {
+ cv::AutoLock lock(userContextMutex);
+ auto it = userContextStorage.find(typeId);
+ if (it != userContextStorage.end())
+ return it->second;
+ else
+ return nullptr;
}
-#endif
#ifdef HAVE_OPENCL_SVM
bool svmInitialized;
@@ -3036,6 +3030,25 @@ Context Context::create(const std::string& configuration)
return ctx;
}
+void* Context::getOpenCLContextProperty(int propertyId) const
+{
+ if (p == NULL)
+ return nullptr;
+ ::size_t size = 0;
+ CV_OCL_CHECK(clGetContextInfo(p->handle, CL_CONTEXT_PROPERTIES, 0, NULL, &size));
+ std::vector<cl_context_properties> prop(size / sizeof(cl_context_properties), (cl_context_properties)0);
+ CV_OCL_CHECK(clGetContextInfo(p->handle, CL_CONTEXT_PROPERTIES, size, prop.data(), NULL));
+ for (size_t i = 0; i < prop.size(); i += 2)
+ {
+ if (prop[i] == (cl_context_properties)propertyId)
+ {
+ CV_LOG_DEBUG(NULL, "OpenCL: found context property=" << propertyId << " => " << (void*)prop[i + 1]);
+ return (void*)prop[i + 1];
+ }
+ }
+ return nullptr;
+}
+
#ifdef HAVE_OPENCL_SVM
bool Context::useSVM() const
{
@@ -3097,6 +3110,21 @@ CV_EXPORTS bool useSVM(UMatUsageFlags usageFlags)
} // namespace cv::ocl::svm
#endif // HAVE_OPENCL_SVM
+Context::UserContext::~UserContext()
+{
+}
+
+void Context::setUserContext(std::type_index typeId, const std::shared_ptr<Context::UserContext>& userContext)
+{
+ CV_Assert(p);
+ p->setUserContext(typeId, userContext);
+}
+
+std::shared_ptr<Context::UserContext> Context::getUserContext(std::type_index typeId)
+{
+ CV_Assert(p);
+ return p->getUserContext(typeId);
+}
static void get_platform_name(cl_platform_id id, String& name)
{
@@ -3454,7 +3482,6 @@ struct Kernel::Impl
void registerImageArgument(int arg, const Image2D& image)
{
CV_CheckGE(arg, 0, "");
- CV_CheckLT(arg, (int)MAX_ARRS, "");
if (arg < (int)shadow_images.size() && shadow_images[arg].ptr() != image.ptr()) // TODO future: replace ptr => impl (more strong check)
{
CV_Check(arg, !isInProgress, "ocl::Kernel: clearing of pending Image2D arguments is not allowed");
@@ -7505,15 +7532,4 @@ uint64 Timer::durationNS() const
}} // namespace
-#ifdef HAVE_DIRECTX
-namespace cv { namespace directx { namespace internal {
-OpenCLDirectXImpl* getDirectXImpl(ocl::Context& ctx)
-{
- ocl::Context::Impl* i = ctx.getImpl();
- CV_Assert(i);
- return i->getDirectXImpl();
-}
-}}} // namespace cv::directx::internal
-#endif
-
#endif // HAVE_OPENCL
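The type-keyed storage that replaces the hard-wired DirectX pointer can be illustrated outside OpenCV. The sketch below is an assumption-laden stand-in: `DemoContext` and `DemoInterop` are hypothetical, `std::mutex` replaces `cv::Mutex`, and the templated `setUserContext<T>` / `getUserContext<T>` helpers are assumed wrappers around the `std::type_index` overloads shown in the hunks above.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <typeindex>
#include <typeinfo>

// Stand-in for ocl::Context::UserContext: a polymorphic base for user data.
struct UserContext
{
    virtual ~UserContext() {}
};

// Hypothetical owner class; in the patch the storage lives in ocl::Context::Impl.
class DemoContext
{
public:
    // Stores one object per concrete type T, replacing any previous one.
    template <typename T>
    void setUserContext(const std::shared_ptr<T>& userContext)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        storage_[std::type_index(typeid(T))] = userContext;
    }

    // Returns the stored object for T, or an empty pointer if none was set.
    template <typename T>
    std::shared_ptr<T> getUserContext()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = storage_.find(std::type_index(typeid(T)));
        if (it == storage_.end())
            return nullptr;
        return std::dynamic_pointer_cast<T>(it->second);
    }

private:
    std::map<std::type_index, std::shared_ptr<UserContext>> storage_;
    std::mutex mutex_;
};

// Example payload, loosely modelled on the VAAPIInterop class in va_intel.cpp.
struct DemoInterop : UserContext
{
    explicit DemoInterop(int id) : platformId(id) {}
    int platformId;
};
```

Keying by type lets each interop module attach its own state to a context without the core class knowing about any of them, which is what allows the `HAVE_DIRECTX` members to be removed here.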
diff --git a/modules/core/src/ocl_disabled.impl.hpp b/modules/core/src/ocl_disabled.impl.hpp
index b5f9c4f69b..a217979a1e 100644
--- a/modules/core/src/ocl_disabled.impl.hpp
+++ b/modules/core/src/ocl_disabled.impl.hpp
@@ -172,9 +172,16 @@ Context& Context::getDefault(bool initialize)
}
void* Context::ptr() const { return NULL; }
+void* Context::getOpenCLContextProperty(int /*propertyId*/) const { OCL_NOT_AVAILABLE(); }
+
bool Context::useSVM() const { return false; }
void Context::setUseSVM(bool enabled) { }
+Context::UserContext::~UserContext() { }
+
+void Context::setUserContext(std::type_index /*typeId*/, const std::shared_ptr<Context::UserContext>& /*userContext*/) { OCL_NOT_AVAILABLE(); }
+std::shared_ptr<Context::UserContext> Context::getUserContext(std::type_index /*typeId*/) { OCL_NOT_AVAILABLE(); }
+
/* static */ Context Context::fromHandle(void* context) { OCL_NOT_AVAILABLE(); }
/* static */ Context Context::fromDevice(const ocl::Device& device) { OCL_NOT_AVAILABLE(); }
/* static */ Context Context::create(const std::string& configuration) { OCL_NOT_AVAILABLE(); }
diff --git a/modules/core/src/precomp.hpp b/modules/core/src/precomp.hpp
index 5a0a7637c2..3057729928 100644
--- a/modules/core/src/precomp.hpp
+++ b/modules/core/src/precomp.hpp
@@ -375,6 +375,8 @@ cv::Mutex& getInitializationMutex();
#define CV_SINGLETON_LAZY_INIT(TYPE, INITIALIZER) CV_SINGLETON_LAZY_INIT_(TYPE, INITIALIZER, instance)
#define CV_SINGLETON_LAZY_INIT_REF(TYPE, INITIALIZER) CV_SINGLETON_LAZY_INIT_(TYPE, INITIALIZER, *instance)
+CV_EXPORTS void releaseTlsStorageThread();
+
int cv_snprintf(char* buf, int len, const char* fmt, ...);
int cv_vsnprintf(char* buf, int len, const char* fmt, va_list args);
}
diff --git a/modules/core/src/system.cpp b/modules/core/src/system.cpp
index 97a2a289c7..c001de3aac 100644
--- a/modules/core/src/system.cpp
+++ b/modules/core/src/system.cpp
@@ -53,6 +53,8 @@
#include
#include
+#include
+
namespace cv {
static void _initSystem()
@@ -393,6 +395,7 @@ struct HWFeatures
g_hwFeatureNames[CPU_VSX3] = "VSX3";
g_hwFeatureNames[CPU_MSA] = "CPU_MSA";
+ g_hwFeatureNames[CPU_RISCVV] = "RISCVV";
g_hwFeatureNames[CPU_AVX512_COMMON] = "AVX512-COMMON";
g_hwFeatureNames[CPU_AVX512_SKX] = "AVX512-SKX";
@@ -588,6 +591,9 @@ struct HWFeatures
#if defined _ARM_ && (defined(_WIN32_WCE) && _WIN32_WCE >= 0x800)
have[CV_CPU_NEON] = true;
#endif
+ #ifdef __riscv_vector
+ have[CV_CPU_RISCVV] = true;
+ #endif
#ifdef __mips_msa
have[CV_CPU_MSA] = true;
#endif
@@ -947,6 +953,7 @@ String format( const char* fmt, ... )
String tempfile( const char* suffix )
{
+#if OPENCV_HAVE_FILESYSTEM_SUPPORT
String fname;
#ifndef NO_GETENV
const char *temp_dir = getenv("OPENCV_TEMP_PATH");
@@ -1033,6 +1040,10 @@ String tempfile( const char* suffix )
return fname + suffix;
}
return fname;
+#else // OPENCV_HAVE_FILESYSTEM_SUPPORT
+ CV_UNUSED(suffix);
+ CV_Error(Error::StsNotImplemented, "File system support is disabled in this OpenCV build!");
+#endif // OPENCV_HAVE_FILESYSTEM_SUPPORT
}
static ErrorCallback customErrorCallback = 0;
@@ -1468,6 +1479,9 @@ struct ThreadData
size_t idx; // Thread index in TLS storage. This is not OS thread ID!
};
+
+static bool g_isTlsStorageInitialized = false;
+
// Main TLS storage class
class TlsStorage
{
@@ -1477,6 +1491,7 @@ public:
{
tlsSlots.reserve(32);
threads.reserve(32);
+ g_isTlsStorageInitialized = true;
}
~TlsStorage()
{
@@ -1681,12 +1696,31 @@ static TlsStorage &getTlsStorage()
#ifndef _WIN32 // pthread key destructor
static void opencv_tls_destructor(void* pData)
{
+ if (!g_isTlsStorageInitialized)
+ return; // nothing to release, so prefer to avoid creation of new global structures
getTlsStorage().releaseThread(pData);
}
#else // _WIN32
#ifdef CV_USE_FLS
static void WINAPI opencv_fls_destructor(void* pData)
{
+ // Empirical detection of an ExitProcess() call
+ DWORD code = STILL_ACTIVE/*259*/;
+ BOOL res = GetExitCodeProcess(GetCurrentProcess(), &code);
+ if (res && code != STILL_ACTIVE)
+ {
+ // Looks like we are inside an ExitProcess() call.
+ // This is specific to FLS because its callbacks are called before DllMain().
+ // TLS does not have this problem: DllMain() is called first and marks __termination properly.
+ // Note: this workaround conflicts with the order of ExitProcess() steps described in the documentation, but it works in practice:
+ // 3. ... called with DLL_PROCESS_DETACH
+ // 7. The termination status of the process changes from STILL_ACTIVE to the exit value of the process.
+ // (ref: https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-exitprocess)
+ cv::__termination = true;
+ }
+
+ if (!g_isTlsStorageInitialized)
+ return; // nothing to release, so prefer to avoid creation of new global structures
getTlsStorage().releaseThread(pData);
}
#endif // CV_USE_FLS
@@ -1695,6 +1729,13 @@ static void WINAPI opencv_fls_destructor(void* pData)
} // namespace details
using namespace details;
+void releaseTlsStorageThread()
+{
+ if (!g_isTlsStorageInitialized)
+ return; // nothing to release, so prefer to avoid creation of new global structures
+ getTlsStorage().releaseThread();
+}
+
TLSDataContainer::TLSDataContainer()
{
key_ = (int)getTlsStorage().reserveSlot(this); // Reserve key from TLS storage
@@ -1778,7 +1819,7 @@ BOOL WINAPI DllMain(HINSTANCE, DWORD fdwReason, LPVOID lpReserved)
{
// Not allowed to free resources if lpReserved is non-null
// http://msdn.microsoft.com/en-us/library/windows/desktop/ms682583.aspx
- cv::getTlsStorage().releaseThread();
+ releaseTlsStorageThread();
}
}
return TRUE;
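The `g_isTlsStorageInitialized` guard added in this file can be reduced to a small stand-alone sketch. All names in the `demo` namespace are hypothetical, and a function-local static is used here as an assumed stand-in for however `getTlsStorage()` actually creates its singleton.

```cpp
namespace demo {

static bool g_isStorageInitialized = false;  // set once by the constructor below
static int g_releaseCalls = 0;               // instrumentation for this sketch only

class Storage
{
public:
    Storage() { g_isStorageInitialized = true; }
    void releaseThread() { ++g_releaseCalls; }
};

// Lazily constructed singleton: the object is created on first use.
static Storage& getStorage()
{
    static Storage instance;
    return instance;
}

// Cleanup hook: must not construct the storage just to release nothing,
// because it may run during thread or process teardown.
void releaseStorageThread()
{
    if (!g_isStorageInitialized)
        return;
    getStorage().releaseThread();
}

} // namespace demo
```

Calling `demo::releaseStorageThread()` before anything has touched the storage is a no-op and leaves the singleton unconstructed, which mirrors the intent stated in the patch comment ("avoid creation of new global structures").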
diff --git a/modules/core/src/utils/datafile.cpp b/modules/core/src/utils/datafile.cpp
index 6a53c73499..3af83a5d8f 100644
--- a/modules/core/src/utils/datafile.cpp
+++ b/modules/core/src/utils/datafile.cpp
@@ -16,6 +16,7 @@
#include "opencv2/core/utils/filesystem.hpp"
#include
+#include "opencv2/core/utils/filesystem.private.hpp"
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
@@ -67,6 +68,7 @@ CV_EXPORTS void addDataSearchSubDirectory(const cv::String& subdir)
_getDataSearchSubDirectory().push_back(subdir);
}
+#if OPENCV_HAVE_FILESYSTEM_SUPPORT
static bool isPathSep(char c)
{
return c == '/' || c == '\\';
@@ -96,12 +98,14 @@ static bool isSubDirectory_(const cv::String& base_path, const cv::String& path)
}
return true;
}
+
static bool isSubDirectory(const cv::String& base_path, const cv::String& path)
{
bool res = isSubDirectory_(base_path, path);
CV_LOG_VERBOSE(NULL, 0, "isSubDirectory(): base: " << base_path << " path: " << path << " => result: " << (res ? "TRUE" : "FALSE"));
return res;
}
+#endif //OPENCV_HAVE_FILESYSTEM_SUPPORT
static cv::String getModuleLocation(const void* addr)
{
@@ -188,6 +192,7 @@ cv::String findDataFile(const cv::String& relative_path,
const std::vector* search_paths,
const std::vector* subdir_paths)
{
+#if OPENCV_HAVE_FILESYSTEM_SUPPORT
configuration_parameter = configuration_parameter ? configuration_parameter : "OPENCV_DATA_PATH";
CV_LOG_DEBUG(NULL, cv::format("utils::findDataFile('%s', %s)", relative_path.c_str(), configuration_parameter));
@@ -410,10 +415,18 @@ cv::String findDataFile(const cv::String& relative_path,
#endif
return cv::String(); // not found
+#else // OPENCV_HAVE_FILESYSTEM_SUPPORT
+ CV_UNUSED(relative_path);
+ CV_UNUSED(configuration_parameter);
+ CV_UNUSED(search_paths);
+ CV_UNUSED(subdir_paths);
+ CV_Error(Error::StsNotImplemented, "File system support is disabled in this OpenCV build!");
+#endif // OPENCV_HAVE_FILESYSTEM_SUPPORT
}
cv::String findDataFile(const cv::String& relative_path, bool required, const char* configuration_parameter)
{
+#if OPENCV_HAVE_FILESYSTEM_SUPPORT
CV_LOG_DEBUG(NULL, cv::format("cv::utils::findDataFile('%s', %s, %s)",
relative_path.c_str(), required ? "true" : "false",
configuration_parameter ? configuration_parameter : "NULL"));
@@ -424,6 +437,12 @@ cv::String findDataFile(const cv::String& relative_path, bool required, const ch
if (result.empty() && required)
CV_Error(cv::Error::StsError, cv::format("OpenCV: Can't find required data file: %s", relative_path.c_str()));
return result;
+#else // OPENCV_HAVE_FILESYSTEM_SUPPORT
+ CV_UNUSED(relative_path);
+ CV_UNUSED(required);
+ CV_UNUSED(configuration_parameter);
+ CV_Error(Error::StsNotImplemented, "File system support is disabled in this OpenCV build!");
+#endif // OPENCV_HAVE_FILESYSTEM_SUPPORT
}
}} // namespace
diff --git a/modules/core/src/utils/samples.cpp b/modules/core/src/utils/samples.cpp
index c1162f85fe..5d1ee5af8b 100644
--- a/modules/core/src/utils/samples.cpp
+++ b/modules/core/src/utils/samples.cpp
@@ -11,6 +11,7 @@
#define CV_LOG_STRIP_LEVEL CV_LOG_LEVEL_VERBOSE + 1
#include "opencv2/core/utils/logger.hpp"
#include "opencv2/core/utils/filesystem.hpp"
+#include "opencv2/core/utils/filesystem.private.hpp"
namespace cv { namespace samples {
@@ -49,6 +50,7 @@ CV_EXPORTS void addSamplesDataSearchSubDirectory(const cv::String& subdir)
cv::String findFile(const cv::String& relative_path, bool required, bool silentMode)
{
+#if OPENCV_HAVE_FILESYSTEM_SUPPORT
CV_LOG_DEBUG(NULL, cv::format("cv::samples::findFile('%s', %s)", relative_path.c_str(), required ? "true" : "false"));
cv::String result = cv::utils::findDataFile(relative_path,
"OPENCV_SAMPLES_DATA_PATH",
@@ -61,6 +63,12 @@ cv::String findFile(const cv::String& relative_path, bool required, bool silentM
if (result.empty() && required)
CV_Error(cv::Error::StsError, cv::format("OpenCV samples: Can't find required data file: %s", relative_path.c_str()));
return result;
+#else
+ CV_UNUSED(relative_path);
+ CV_UNUSED(required);
+ CV_UNUSED(silentMode);
+ CV_Error(Error::StsNotImplemented, "File system support is disabled in this OpenCV build!");
+#endif
}
diff --git a/modules/core/src/va_intel.cpp b/modules/core/src/va_intel.cpp
index 1d2b1cbf32..a7623c37f4 100644
--- a/modules/core/src/va_intel.cpp
+++ b/modules/core/src/va_intel.cpp
@@ -7,6 +7,8 @@
#include "precomp.hpp"
+#include
+
#ifdef HAVE_VA
# include
#else // HAVE_VA
@@ -48,12 +50,28 @@ namespace cv { namespace va_intel {
#ifdef HAVE_VA_INTEL
-static clGetDeviceIDsFromVA_APIMediaAdapterINTEL_fn clGetDeviceIDsFromVA_APIMediaAdapterINTEL = NULL;
-static clCreateFromVA_APIMediaSurfaceINTEL_fn clCreateFromVA_APIMediaSurfaceINTEL = NULL;
-static clEnqueueAcquireVA_APIMediaSurfacesINTEL_fn clEnqueueAcquireVA_APIMediaSurfacesINTEL = NULL;
-static clEnqueueReleaseVA_APIMediaSurfacesINTEL_fn clEnqueueReleaseVA_APIMediaSurfacesINTEL = NULL;
-
-static bool contextInitialized = false;
+class VAAPIInterop : public ocl::Context::UserContext
+{
+public:
+ VAAPIInterop(cl_platform_id platform) {
+ clCreateFromVA_APIMediaSurfaceINTEL = (clCreateFromVA_APIMediaSurfaceINTEL_fn)
+ clGetExtensionFunctionAddressForPlatform(platform, "clCreateFromVA_APIMediaSurfaceINTEL");
+ clEnqueueAcquireVA_APIMediaSurfacesINTEL = (clEnqueueAcquireVA_APIMediaSurfacesINTEL_fn)
+ clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueAcquireVA_APIMediaSurfacesINTEL");
+ clEnqueueReleaseVA_APIMediaSurfacesINTEL = (clEnqueueReleaseVA_APIMediaSurfacesINTEL_fn)
+ clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueReleaseVA_APIMediaSurfacesINTEL");
+ if (!clCreateFromVA_APIMediaSurfaceINTEL ||
+ !clEnqueueAcquireVA_APIMediaSurfacesINTEL ||
+ !clEnqueueReleaseVA_APIMediaSurfacesINTEL) {
+ CV_Error(cv::Error::OpenCLInitError, "OpenCL: Can't get extension function for VA-API interop");
+ }
+ }
+ virtual ~VAAPIInterop() {
+ }
+ clCreateFromVA_APIMediaSurfaceINTEL_fn clCreateFromVA_APIMediaSurfaceINTEL;
+ clEnqueueAcquireVA_APIMediaSurfacesINTEL_fn clEnqueueAcquireVA_APIMediaSurfacesINTEL;
+ clEnqueueReleaseVA_APIMediaSurfacesINTEL_fn clEnqueueReleaseVA_APIMediaSurfacesINTEL;
+};
#endif // HAVE_VA_INTEL
@@ -65,10 +83,8 @@ Context& initializeContextFromVA(VADisplay display, bool tryInterop)
#if !defined(HAVE_VA)
NO_VA_SUPPORT_ERROR;
#else // !HAVE_VA
- init_libva();
# ifdef HAVE_VA_INTEL
- contextInitialized = false;
if (tryInterop)
{
cl_uint numPlatforms;
@@ -97,20 +113,10 @@ Context& initializeContextFromVA(VADisplay display, bool tryInterop)
for (int i = 0; i < (int)numPlatforms; ++i)
{
// Get extension function pointers
-
+ clGetDeviceIDsFromVA_APIMediaAdapterINTEL_fn clGetDeviceIDsFromVA_APIMediaAdapterINTEL;
clGetDeviceIDsFromVA_APIMediaAdapterINTEL = (clGetDeviceIDsFromVA_APIMediaAdapterINTEL_fn)
clGetExtensionFunctionAddressForPlatform(platforms[i], "clGetDeviceIDsFromVA_APIMediaAdapterINTEL");
- clCreateFromVA_APIMediaSurfaceINTEL = (clCreateFromVA_APIMediaSurfaceINTEL_fn)
- clGetExtensionFunctionAddressForPlatform(platforms[i], "clCreateFromVA_APIMediaSurfaceINTEL");
- clEnqueueAcquireVA_APIMediaSurfacesINTEL = (clEnqueueAcquireVA_APIMediaSurfacesINTEL_fn)
- clGetExtensionFunctionAddressForPlatform(platforms[i], "clEnqueueAcquireVA_APIMediaSurfacesINTEL");
- clEnqueueReleaseVA_APIMediaSurfacesINTEL = (clEnqueueReleaseVA_APIMediaSurfacesINTEL_fn)
- clGetExtensionFunctionAddressForPlatform(platforms[i], "clEnqueueReleaseVA_APIMediaSurfacesINTEL");
-
- if (((void*)clGetDeviceIDsFromVA_APIMediaAdapterINTEL == NULL) ||
- ((void*)clCreateFromVA_APIMediaSurfaceINTEL == NULL) ||
- ((void*)clEnqueueAcquireVA_APIMediaSurfacesINTEL == NULL) ||
- ((void*)clEnqueueReleaseVA_APIMediaSurfacesINTEL == NULL))
+ if ((void*)clGetDeviceIDsFromVA_APIMediaAdapterINTEL == NULL)
{
continue;
}
@@ -151,8 +157,6 @@ Context& initializeContextFromVA(VADisplay display, bool tryInterop)
if (found >= 0)
{
- contextInitialized = true;
-
cl_platform_id platform = platforms[found];
std::string platformName = PlatformInfo(&platform).name();
@@ -160,6 +164,7 @@ Context& initializeContextFromVA(VADisplay display, bool tryInterop)
try
{
clExecCtx = OpenCLExecutionContext::create(platformName, platform, context, device);
+ clExecCtx.getContext().setUserContext(std::make_shared<VAAPIInterop>(platform));
}
catch (...)
{
@@ -520,7 +525,6 @@ void convertToVASurface(VADisplay display, InputArray src, VASurfaceID surface,
#if !defined(HAVE_VA)
NO_VA_SUPPORT_ERROR;
#else // !HAVE_VA
- init_libva();
const int stype = CV_8UC3;
@@ -531,7 +535,18 @@ void convertToVASurface(VADisplay display, InputArray src, VASurfaceID surface,
CV_Assert(srcSize.width == size.width && srcSize.height == size.height);
#ifdef HAVE_VA_INTEL
- if (contextInitialized)
+ ocl::OpenCLExecutionContext& ocl_context = ocl::OpenCLExecutionContext::getCurrent();
+ VAAPIInterop* interop = ocl_context.getContext().getUserContext<VAAPIInterop>().get();
+ CV_LOG_IF_DEBUG(NULL, !interop,
+ "OpenCL/VA_INTEL: Can't interop with current OpenCL context - missing VAAPIInterop API. "
+ "OpenCL context should be created through initializeContextFromVA()");
+ void* context_display = ocl_context.getContext().getOpenCLContextProperty(CL_CONTEXT_VA_API_DISPLAY_INTEL);
+ CV_LOG_IF_INFO(NULL, interop && !context_display,
+ "OpenCL/VA_INTEL: Can't interop with current OpenCL context - missing VA display, context re-creation is required");
+ bool isValidContextDisplay = (display == context_display);
+ CV_LOG_IF_INFO(NULL, interop && context_display && !isValidContextDisplay,
+ "OpenCL/VA_INTEL: Can't interop with current OpenCL context - VA display mismatch: " << context_display << "(context) vs " << (void*)display << "(surface)");
+ if (isValidContextDisplay && interop)
{
UMat u = src.getUMat();
@@ -541,28 +556,26 @@ void convertToVASurface(VADisplay display, InputArray src, VASurfaceID surface,
cl_mem clBuffer = (cl_mem)u.handle(ACCESS_READ);
- using namespace cv::ocl;
- Context& ctx = Context::getDefault();
- cl_context context = (cl_context)ctx.ptr();
+ cl_context context = (cl_context)ocl_context.getContext().ptr();
cl_int status = 0;
- cl_mem clImageY = clCreateFromVA_APIMediaSurfaceINTEL(context, CL_MEM_WRITE_ONLY, &surface, 0, &status);
+ cl_mem clImageY = interop->clCreateFromVA_APIMediaSurfaceINTEL(context, CL_MEM_WRITE_ONLY, &surface, 0, &status);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clCreateFromVA_APIMediaSurfaceINTEL failed (Y plane)");
- cl_mem clImageUV = clCreateFromVA_APIMediaSurfaceINTEL(context, CL_MEM_WRITE_ONLY, &surface, 1, &status);
+ cl_mem clImageUV = interop->clCreateFromVA_APIMediaSurfaceINTEL(context, CL_MEM_WRITE_ONLY, &surface, 1, &status);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clCreateFromVA_APIMediaSurfaceINTEL failed (UV plane)");
- cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();
+ cl_command_queue q = (cl_command_queue)ocl_context.getQueue().ptr();
cl_mem images[2] = { clImageY, clImageUV };
- status = clEnqueueAcquireVA_APIMediaSurfacesINTEL(q, 2, images, 0, NULL, NULL);
+ status = interop->clEnqueueAcquireVA_APIMediaSurfacesINTEL(q, 2, images, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueAcquireVA_APIMediaSurfacesINTEL failed");
if (!ocl::ocl_convert_bgr_to_nv12(clBuffer, (int)u.step[0], u.cols, u.rows, clImageY, clImageUV))
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: ocl_convert_bgr_to_nv12 failed");
- clEnqueueReleaseVA_APIMediaSurfacesINTEL(q, 2, images, 0, NULL, NULL);
+ status = interop->clEnqueueReleaseVA_APIMediaSurfacesINTEL(q, 2, images, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueReleaseVA_APIMediaSurfacesINTEL failed");
@@ -580,6 +593,7 @@ void convertToVASurface(VADisplay display, InputArray src, VASurfaceID surface,
else
# endif // HAVE_VA_INTEL
{
+ init_libva();
Mat m = src.getMat();
// TODO Add support for roi
@@ -626,7 +640,6 @@ void convertFromVASurface(VADisplay display, VASurfaceID surface, Size size, Out
#if !defined(HAVE_VA)
NO_VA_SUPPORT_ERROR;
#else // !HAVE_VA
- init_libva();
const int dtype = CV_8UC3;
@@ -634,7 +647,9 @@ void convertFromVASurface(VADisplay display, VASurfaceID surface, Size size, Out
dst.create(size, dtype);
#ifdef HAVE_VA_INTEL
- if (contextInitialized)
+ ocl::OpenCLExecutionContext& ocl_context = ocl::OpenCLExecutionContext::getCurrent();
+ VAAPIInterop* interop = ocl_context.getContext().getUserContext<VAAPIInterop>().get();
+ if (display == ocl_context.getContext().getOpenCLContextProperty(CL_CONTEXT_VA_API_DISPLAY_INTEL) && interop)
{
UMat u = dst.getUMat();
@@ -644,28 +659,26 @@ void convertFromVASurface(VADisplay display, VASurfaceID surface, Size size, Out
cl_mem clBuffer = (cl_mem)u.handle(ACCESS_WRITE);
- using namespace cv::ocl;
- Context& ctx = Context::getDefault();
- cl_context context = (cl_context)ctx.ptr();
+ cl_context context = (cl_context)ocl_context.getContext().ptr();
cl_int status = 0;
- cl_mem clImageY = clCreateFromVA_APIMediaSurfaceINTEL(context, CL_MEM_READ_ONLY, &surface, 0, &status);
+ cl_mem clImageY = interop->clCreateFromVA_APIMediaSurfaceINTEL(context, CL_MEM_READ_ONLY, &surface, 0, &status);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clCreateFromVA_APIMediaSurfaceINTEL failed (Y plane)");
- cl_mem clImageUV = clCreateFromVA_APIMediaSurfaceINTEL(context, CL_MEM_READ_ONLY, &surface, 1, &status);
+ cl_mem clImageUV = interop->clCreateFromVA_APIMediaSurfaceINTEL(context, CL_MEM_READ_ONLY, &surface, 1, &status);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clCreateFromVA_APIMediaSurfaceINTEL failed (UV plane)");
- cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();
+ cl_command_queue q = (cl_command_queue)ocl_context.getQueue().ptr();
cl_mem images[2] = { clImageY, clImageUV };
- status = clEnqueueAcquireVA_APIMediaSurfacesINTEL(q, 2, images, 0, NULL, NULL);
+ status = interop->clEnqueueAcquireVA_APIMediaSurfacesINTEL(q, 2, images, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueAcquireVA_APIMediaSurfacesINTEL failed");
if (!ocl::ocl_convert_nv12_to_bgr(clImageY, clImageUV, clBuffer, (int)u.step[0], u.cols, u.rows))
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: ocl_convert_nv12_to_bgr failed");
- status = clEnqueueReleaseVA_APIMediaSurfacesINTEL(q, 2, images, 0, NULL, NULL);
+ status = interop->clEnqueueReleaseVA_APIMediaSurfacesINTEL(q, 2, images, 0, NULL, NULL);
if (status != CL_SUCCESS)
CV_Error(cv::Error::OpenCLApiCallError, "OpenCL: clEnqueueReleaseVA_APIMediaSurfacesINTEL failed");
@@ -683,6 +696,7 @@ void convertFromVASurface(VADisplay display, VASurfaceID surface, Size size, Out
else
# endif // HAVE_VA_INTEL
{
+ init_libva();
Mat m = dst.getMat();
// TODO Add support for roi
diff --git a/modules/core/test/ocl/test_opencl.cpp b/modules/core/test/ocl/test_opencl.cpp
index e639f72948..daa023534d 100644
--- a/modules/core/test/ocl/test_opencl.cpp
+++ b/modules/core/test/ocl/test_opencl.cpp
@@ -132,6 +132,73 @@ TEST(OpenCL, support_SPIR_programs)
testOpenCLKernel(k);
}
+
+TEST(OpenCL, image2Dcount_regression_19334)
+{
+ cv::ocl::Context ctx = cv::ocl::Context::getDefault();
+ if (!ctx.ptr())
+ {
+ throw cvtest::SkipTestException("OpenCL is not available");
+ }
+ cv::ocl::Device device = cv::ocl::Device::getDefault();
+ if (!device.compilerAvailable())
+ {
+ throw cvtest::SkipTestException("OpenCL compiler is not available");
+ }
+
+ std::string module_name; // empty to disable OpenCL cache
+
+ static const char* opencl_kernel_src =
+"__kernel void test_kernel(int a,\n"
+" __global const uchar* src0, int src0_step, int src0_offset, int src0_rows, int src0_cols,\n"
+" __global const uchar* src1, int src1_step, int src1_offset, int src1_rows, int src1_cols,\n"
+" __global const uchar* src2, int src2_step, int src2_offset, int src2_rows, int src2_cols,\n"
+" __read_only image2d_t image)\n"
+"{\n"
+"}";
+ cv::ocl::ProgramSource src(module_name, "test_opencl_image_arg", opencl_kernel_src, "");
+ cv::String errmsg;
+ cv::ocl::Program program(src, "", errmsg);
+ ASSERT_TRUE(program.ptr() != NULL);
+ cv::ocl::Kernel k("test_kernel", program);
+ ASSERT_FALSE(k.empty());
+
+ std::vector<UMat> images(4);
+ for (size_t i = 0; i < images.size(); ++i)
+ images[i] = UMat(10, 10, CV_8UC1);
+ cv::ocl::Image2D image;
+ try
+ {
+ cv::ocl::Image2D image_(images.back());
+ image = image_;
+ }
+ catch (const cv::Exception&)
+ {
+ throw cvtest::SkipTestException("OpenCL images are not supported");
+ }
+
+ int nargs = 0;
+ int a = 0;
+ nargs = k.set(nargs, a);
+ ASSERT_EQ(1, nargs);
+ nargs = k.set(nargs, images[0]);
+ ASSERT_EQ(6, nargs);
+ nargs = k.set(nargs, images[1]);
+ ASSERT_EQ(11, nargs);
+ nargs = k.set(nargs, images[2]);
+ ASSERT_EQ(16, nargs);
+
+ // do not throw (issue of #19334)
+ ASSERT_NO_THROW(nargs = k.set(nargs, image));
+ ASSERT_EQ(17, nargs);
+
+ // allow replacing the image argument if the kernel is not running
+ UMat image2(10, 10, CV_8UC1);
+ ASSERT_NO_THROW(nargs = k.set(16, cv::ocl::Image2D(image2)));
+ ASSERT_EQ(17, nargs);
+}
+
+
TEST(OpenCL, move_construct_assign)
{
cv::ocl::Context ctx1 = cv::ocl::Context::getDefault();
diff --git a/modules/core/test/test_intrin_utils.hpp b/modules/core/test/test_intrin_utils.hpp
index 269ebe0f2a..5c22caaf12 100644
--- a/modules/core/test/test_intrin_utils.hpp
+++ b/modules/core/test/test_intrin_utils.hpp
@@ -577,6 +577,25 @@ template<typename R> struct TheTest
return *this;
}
+ TheTest & test_mul_hi()
+ {
+ // typedef typename V_RegTraits<R>::w_reg Rx2;
+ Data<R> dataA, dataB(32767);
+ R a = dataA, b = dataB;
+
+ R c = v_mul_hi(a, b);
+
+ Data<R> resC = c;
+ const int n = R::nlanes / 2;
+ for (int i = 0; i < n; ++i)
+ {
+ SCOPED_TRACE(cv::format("i=%d", i));
+ EXPECT_EQ((typename R::lane_type)((dataA[i] * dataB[i]) >> 16), resC[i]);
+ }
+
+ return *this;
+ }
+
TheTest & test_abs()
{
typedef typename V_RegTraits::u_reg Ru;
@@ -1663,6 +1682,7 @@ void test_hal_intrin_uint16()
.test_arithm_wrap()
.test_mul()
.test_mul_expand()
+ .test_mul_hi()
.test_cmp()
.test_shift<1>()
.test_shift<8>()
@@ -1697,6 +1717,7 @@ void test_hal_intrin_int16()
.test_arithm_wrap()
.test_mul()
.test_mul_expand()
+ .test_mul_hi()
.test_cmp()
.test_shift<1>()
.test_shift<8>()
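The expectation checked by `test_mul_hi()` above, `(dataA[i] * dataB[i]) >> 16`, is the high half of a widened 16x16-bit product. A scalar reference can be sketched as follows. The helper names `mul_hi_s16` and `mul_hi_u16` are hypothetical and are not part of the universal intrinsics API.

```cpp
#include <cstdint>

// High 16 bits of the 32-bit product of two signed 16-bit values.
// Note: right-shifting a negative value is implementation-defined before C++20;
// mainstream compilers perform an arithmetic shift.
inline int16_t mul_hi_s16(int16_t a, int16_t b)
{
    const int32_t wide = static_cast<int32_t>(a) * static_cast<int32_t>(b);
    return static_cast<int16_t>(wide >> 16);
}

// High 16 bits of the 32-bit product of two unsigned 16-bit values.
inline uint16_t mul_hi_u16(uint16_t a, uint16_t b)
{
    const uint32_t wide = static_cast<uint32_t>(a) * static_cast<uint32_t>(b);
    return static_cast<uint16_t>(wide >> 16);
}
```

Widening before the multiplication is the important step: multiplying in 16 bits first would discard exactly the bits the operation is meant to return.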
diff --git a/modules/core/test/test_mat.cpp b/modules/core/test/test_mat.cpp
index 9b6145d733..a5d844e7ad 100644
--- a/modules/core/test/test_mat.cpp
+++ b/modules/core/test/test_mat.cpp
@@ -2355,4 +2355,98 @@ TEST(Mat, regression_18473)
}
+TEST(Mat, ptrVecni_20044)
+{
+ Mat_<int> m(3,4); m << 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12;
+ Vec2i idx(1,1);
+
+ uchar *u = m.ptr(idx);
+ EXPECT_EQ(int(6), *(int*)(u));
+ const uchar *cu = m.ptr(idx);
+ EXPECT_EQ(int(6), *(int*)(cu));
+
+ int *i = m.ptr<int>(idx);
+ EXPECT_EQ(int(6), *(i));
+ const int *ci = m.ptr<int>(idx);
+ EXPECT_EQ(int(6), *(ci));
+}
+
+TEST(Mat, reverse_iterator_19967)
+{
+ // empty iterator (#16855)
+ cv::Mat m_empty;
+ EXPECT_NO_THROW(m_empty.rbegin<uchar>());
+ EXPECT_NO_THROW(m_empty.rend<uchar>());
+ EXPECT_TRUE(m_empty.rbegin<uchar>() == m_empty.rend<uchar>());
+
+ // 1D test
+ std::vector<uchar> data{0, 1, 2, 3};
+ const std::vector<int> sizes_1d{4};
+
+ //Base class
+ cv::Mat m_1d(sizes_1d, CV_8U, data.data());
+ auto mismatch_it_pair_1d = std::mismatch(data.rbegin(), data.rend(), m_1d.rbegin<uchar>());
+ EXPECT_EQ(mismatch_it_pair_1d.first, data.rend()); // expect no mismatch
+ EXPECT_EQ(mismatch_it_pair_1d.second, m_1d.rend<uchar>());
+
+ //Templated derived class
+ cv::Mat_<uchar> m_1d_t(static_cast<int>(sizes_1d.size()), sizes_1d.data(), data.data());
+ auto mismatch_it_pair_1d_t = std::mismatch(data.rbegin(), data.rend(), m_1d_t.rbegin());
+ EXPECT_EQ(mismatch_it_pair_1d_t.first, data.rend()); // expect no mismatch
+ EXPECT_EQ(mismatch_it_pair_1d_t.second, m_1d_t.rend());
+
+
+ // 2D test
+ const std::vector<int> sizes_2d{2, 2};
+
+ //Base class
+ cv::Mat m_2d(sizes_2d, CV_8U, data.data());
+ auto mismatch_it_pair_2d = std::mismatch(data.rbegin(), data.rend(), m_2d.rbegin<uchar>());
+ EXPECT_EQ(mismatch_it_pair_2d.first, data.rend());
+ EXPECT_EQ(mismatch_it_pair_2d.second, m_2d.rend<uchar>());
+
+ //Templated derived class
+ cv::Mat_<uchar> m_2d_t(static_cast<int>(sizes_2d.size()),sizes_2d.data(), data.data());
+ auto mismatch_it_pair_2d_t = std::mismatch(data.rbegin(), data.rend(), m_2d_t.rbegin());
+ EXPECT_EQ(mismatch_it_pair_2d_t.first, data.rend());
+ EXPECT_EQ(mismatch_it_pair_2d_t.second, m_2d_t.rend());
+
+ // 3D test
+ std::vector<uchar> data_3d{0, 1, 2, 3, 4, 5, 6, 7};
+ const std::vector<int> sizes_3d{2, 2, 2};
+
+ //Base class
+ cv::Mat m_3d(sizes_3d, CV_8U, data_3d.data());
+ auto mismatch_it_pair_3d = std::mismatch(data_3d.rbegin(), data_3d.rend(), m_3d.rbegin<uchar>());
+ EXPECT_EQ(mismatch_it_pair_3d.first, data_3d.rend());
+ EXPECT_EQ(mismatch_it_pair_3d.second, m_3d.rend<uchar>());
+
+ //Templated derived class
+ cv::Mat_<uchar> m_3d_t(static_cast<int>(sizes_3d.size()),sizes_3d.data(), data_3d.data());
+ auto mismatch_it_pair_3d_t = std::mismatch(data_3d.rbegin(), data_3d.rend(), m_3d_t.rbegin());
+ EXPECT_EQ(mismatch_it_pair_3d_t.first, data_3d.rend());
+ EXPECT_EQ(mismatch_it_pair_3d_t.second, m_3d_t.rend());
+
+ // const test base class
+ const cv::Mat m_1d_const(sizes_1d, CV_8U, data.data());
+
+ auto mismatch_it_pair_1d_const = std::mismatch(data.rbegin(), data.rend(), m_1d_const.rbegin<uchar>());
+ EXPECT_EQ(mismatch_it_pair_1d_const.first, data.rend()); // expect no mismatch
+ EXPECT_EQ(mismatch_it_pair_1d_const.second, m_1d_const.rend<uchar>());
+
+ EXPECT_FALSE((std::is_assignable()), uchar>::value)) << "Constness of const iterator violated.";
+ EXPECT_FALSE((std::is_assignable()), uchar>::value)) << "Constness of const iterator violated.";
+
+ // const test templated derived class
+ const cv::Mat_<uchar> m_1d_const_t(static_cast<int>(sizes_1d.size()), sizes_1d.data(), data.data());
+
+ auto mismatch_it_pair_1d_const_t = std::mismatch(data.rbegin(), data.rend(), m_1d_const_t.rbegin());
+ EXPECT_EQ(mismatch_it_pair_1d_const_t.first, data.rend()); // expect no mismatch
+ EXPECT_EQ(mismatch_it_pair_1d_const_t.second, m_1d_const_t.rend());
+
+ EXPECT_FALSE((std::is_assignable::value)) << "Constness of const iterator violated.";
+ EXPECT_FALSE((std::is_assignable::value)) << "Constness of const iterator violated.";
+
+}
+
}} // namespace
diff --git a/modules/core/test/test_utils.cpp b/modules/core/test/test_utils.cpp
index d8789ddfc2..ed5f34603d 100644
--- a/modules/core/test/test_utils.cpp
+++ b/modules/core/test/test_utils.cpp
@@ -9,6 +9,7 @@
#include "opencv2/core/utils/buffer_area.private.hpp"
#include "test_utils_tls.impl.hpp"
+#include "opencv2/core/utils/filesystem.private.hpp"
namespace opencv_test { namespace {
@@ -336,7 +337,7 @@ TEST(Logger, DISABLED_message_if)
}
}
-
+#if OPENCV_HAVE_FILESYSTEM_SUPPORT
TEST(Samples, findFile)
{
cv::utils::logging::LogLevel prev = cv::utils::logging::setLogLevel(cv::utils::logging::LOG_LEVEL_VERBOSE);
@@ -353,6 +354,7 @@ TEST(Samples, findFile_missing)
ASSERT_ANY_THROW(path = samples::findFile("non-existed.file", true));
cv::utils::logging::setLogLevel(prev);
}
+#endif // OPENCV_HAVE_FILESYSTEM_SUPPORT
template <typename T>
inline bool buffers_overlap(T * first, size_t first_num, T * second, size_t second_num)
diff --git a/modules/dnn/CMakeLists.txt b/modules/dnn/CMakeLists.txt
index b0811fb223..4c8129cbda 100644
--- a/modules/dnn/CMakeLists.txt
+++ b/modules/dnn/CMakeLists.txt
@@ -176,16 +176,20 @@ ocv_add_perf_tests(${INF_ENGINE_TARGET}
FILES Include ${perf_hdrs}
)
-ocv_option(${the_module}_PERF_CAFFE "Add performance tests of Caffe framework" OFF)
-ocv_option(${the_module}_PERF_CLCAFFE "Add performance tests of clCaffe framework" OFF)
+ocv_option(OPENCV_DNN_PERF_CAFFE "Add performance tests of Caffe framework" OFF)
+ocv_option(OPENCV_DNN_PERF_CLCAFFE "Add performance tests of clCaffe framework" OFF)
if(BUILD_PERF_TESTS)
- if (${the_module}_PERF_CAFFE)
+ if (OPENCV_DNN_PERF_CAFFE
+ OR ${the_module}_PERF_CAFFE # compatibility for deprecated option
+ )
find_package(Caffe QUIET)
if (Caffe_FOUND)
add_definitions(-DHAVE_CAFFE=1)
ocv_target_link_libraries(opencv_perf_dnn caffe)
endif()
- elseif(${the_module}_PERF_CLCAFFE)
+ elseif(OPENCV_DNN_PERF_CLCAFFE
+ OR ${the_module}_PERF_CLCAFFE # compatibility for deprecated option
+ )
find_package(Caffe QUIET)
if (Caffe_FOUND)
add_definitions(-DHAVE_CLCAFFE=1)
diff --git a/modules/dnn/include/opencv2/dnn/dnn.hpp b/modules/dnn/include/opencv2/dnn/dnn.hpp
index 0743de00ab..255b41de88 100644
--- a/modules/dnn/include/opencv2/dnn/dnn.hpp
+++ b/modules/dnn/include/opencv2/dnn/dnn.hpp
@@ -738,9 +738,11 @@ CV__DNN_INLINE_NS_BEGIN
CV_WRAP void enableFusion(bool fusion);
/** @brief Returns overall time for inference and timings (in ticks) for layers.
+ *
* Indexes in returned vector correspond to layers ids. Some layers can be fused with others,
- * in this case zero ticks count will be return for that skipped layers.
- * @param timings vector for tick timings for all layers.
+ * in this case a zero tick count is returned for the skipped layers. Supported by DNN_BACKEND_OPENCV on DNN_TARGET_CPU only.
+ *
+ * @param[out] timings vector for tick timings for all layers.
* @return overall ticks for model inference.
*/
CV_WRAP int64 getPerfProfile(CV_OUT std::vector<double>& timings);
diff --git a/modules/dnn/src/ie_ngraph.cpp b/modules/dnn/src/ie_ngraph.cpp
index 49717f8513..7484032714 100644
--- a/modules/dnn/src/ie_ngraph.cpp
+++ b/modules/dnn/src/ie_ngraph.cpp
@@ -20,6 +20,9 @@
#include
#include
+#include "opencv2/core/utils/filesystem.hpp"
+#include "opencv2/core/utils/filesystem.private.hpp"
+
namespace cv { namespace dnn {
#ifdef HAVE_DNN_NGRAPH
@@ -683,6 +686,23 @@ void InfEngineNgraphNet::initPlugin(InferenceEngine::CNNNetwork& net)
ie.SetConfig({{
InferenceEngine::PluginConfigParams::KEY_CPU_THREADS_NUM, format("%d", getNumThreads()),
}}, device_name);
+#endif
+#if INF_ENGINE_VER_MAJOR_GE(INF_ENGINE_RELEASE_2021_2)
+ if (device_name.find("GPU") == 0)
+ {
+#if OPENCV_HAVE_FILESYSTEM_SUPPORT
+ std::string cache_path = utils::fs::getCacheDirectory((std::string("dnn_ie_cache_") + device_name).c_str(), "OPENCV_DNN_IE_GPU_CACHE_DIR");
+#else
+ std::string cache_path = utils::getConfigurationParameterString("OPENCV_DNN_IE_GPU_CACHE_DIR", "");
+#endif
+ if (!cache_path.empty() && cache_path != "disabled")
+ {
+ CV_LOG_INFO(NULL, "OpenCV/nGraph: using GPU kernels cache: " << cache_path);
+ ie.SetConfig({{
+ InferenceEngine::PluginConfigParams::KEY_CACHE_DIR, cache_path,
+ }}, device_name);
+ }
+ }
#endif
}
std::map config;
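
The hunk above enables the GPU kernel cache only when the resolved path is non-empty and is not the literal string "disabled". The sketch below isolates that decision. It assumes the configured value has already been read into a `std::string`, and the function name `should_use_cache` is hypothetical, not an OpenCV or Inference Engine API.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the check made in the hunk above: returns true
// and fills `cache_dir` only when caching should be enabled.
bool should_use_cache(const std::string& configured_path, std::string& cache_dir)
{
    // An empty value means no cache location is available; "disabled" is
    // the explicit opt-out value tested in the patch.
    if (configured_path.empty() || configured_path == "disabled")
        return false;

    cache_dir = configured_path;
    return true;
}
```

Keeping the opt-out as a string value lets one configuration parameter carry both the location and the on/off switch.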
diff --git a/modules/dnn/src/layers/crop_and_resize_layer.cpp b/modules/dnn/src/layers/crop_and_resize_layer.cpp
index a4443ed3a2..eb8822870f 100644
--- a/modules/dnn/src/layers/crop_and_resize_layer.cpp
+++ b/modules/dnn/src/layers/crop_and_resize_layer.cpp
@@ -133,7 +133,8 @@ public:
auto input = nodes[0].dynamicCast()->node;
auto rois = nodes[1].dynamicCast()->node;
- std::vector dims = rois->get_shape(), offsets(4, 0);
+ auto rois_shape = rois->get_shape();
+ std::vector dims(rois_shape.begin(), rois_shape.end()), offsets(4, 0);
offsets[3] = 2;
dims[3] = 7;
@@ -147,7 +148,7 @@ public:
lower_bounds, upper_bounds, strides, std::vector{}, std::vector{});
// Reshape rois from 4D to 2D
- std::vector shapeData = {dims[2], 5};
+ std::vector shapeData = {dims[2], 5};
auto shape = std::make_shared(ngraph::element::i64, ngraph::Shape{2}, shapeData.data());
auto reshape = std::make_shared(slice, shape, true);
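
The `dims` change above replaces a direct initialization from `get_shape()` with the iterator-range constructor. A minimal sketch of why that helps follows. It assumes the shape container stores `std::size_t` while the target vector stores `std::int64_t`; the element types are not visible in this patch, so both are assumptions, and `to_int64_dims` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copy a shape into a vector with a different
// element type.
std::vector<std::int64_t> to_int64_dims(const std::vector<std::size_t>& shape)
{
    // vector<size_t> does not convert to vector<int64_t> as a whole, but
    // the iterator-range constructor converts each element individually.
    return std::vector<std::int64_t>(shape.begin(), shape.end());
}
```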
diff --git a/modules/features2d/include/opencv2/features2d.hpp b/modules/features2d/include/opencv2/features2d.hpp
index c5a94a68fd..38bd7ae487 100644
--- a/modules/features2d/include/opencv2/features2d.hpp
+++ b/modules/features2d/include/opencv2/features2d.hpp
@@ -61,25 +61,11 @@ easily switch between different algorithms solving the same problem. This sectio
matching descriptors that are represented as vectors in a multidimensional space. All objects that
implement vector descriptor matchers inherit the DescriptorMatcher interface.
-@note
- - An example explaining keypoint matching can be found at
- opencv_source_code/samples/cpp/descriptor_extractor_matcher.cpp
- - An example on descriptor matching evaluation can be found at
- opencv_source_code/samples/cpp/detector_descriptor_matcher_evaluation.cpp
- - An example on one to many image matching can be found at
- opencv_source_code/samples/cpp/matching_to_many_images.cpp
-
@defgroup features2d_draw Drawing Function of Keypoints and Matches
@defgroup features2d_category Object Categorization
This section describes approaches based on local 2D features and used to categorize objects.
-@note
- - A complete Bag-Of-Words sample can be found at
- opencv_source_code/samples/cpp/bagofwords_classification.cpp
- - (Python) An example using the features2D framework to perform object categorization can be
- found at opencv_source_code/samples/python/find_obj.py
-
@defgroup feature2d_hal Hardware Acceleration Layer
@{
@defgroup features2d_hal_interface Interface
@@ -90,7 +76,7 @@ This section describes approaches based on local 2D features and used to categor
namespace cv
{
-//! @addtogroup features2d
+//! @addtogroup features2d_main
//! @{
// //! writes vector of keypoints to the file storage
@@ -237,9 +223,6 @@ the vector descriptor extractors inherit the DescriptorExtractor interface.
*/
typedef Feature2D DescriptorExtractor;
-//! @addtogroup features2d_main
-//! @{
-
/** @brief Class for implementing the wrapper which makes detectors and extractors to be affine invariant,
described as ASIFT in @cite YM11 .
@@ -486,20 +469,20 @@ class CV_EXPORTS_W MSER : public Feature2D
public:
/** @brief Full constructor for %MSER detector
- @param _delta it compares \f$(size_{i}-size_{i-delta})/size_{i-delta}\f$
- @param _min_area prune the area which smaller than minArea
- @param _max_area prune the area which bigger than maxArea
- @param _max_variation prune the area have similar size to its children
- @param _min_diversity for color image, trace back to cut off mser with diversity less than min_diversity
- @param _max_evolution for color image, the evolution steps
- @param _area_threshold for color image, the area threshold to cause re-initialize
- @param _min_margin for color image, ignore too small margin
- @param _edge_blur_size for color image, the aperture size for edge blur
+ @param delta it compares \f$(size_{i}-size_{i-delta})/size_{i-delta}\f$
+ @param min_area prune areas smaller than min_area
+ @param max_area prune areas bigger than max_area
+ @param max_variation prune areas that have a similar size to their children
+ @param min_diversity for color images, trace back to cut off MSER regions with diversity less than min_diversity
+ @param max_evolution for color images, the number of evolution steps
+ @param area_threshold for color images, the area threshold that triggers re-initialization
+ @param min_margin for color images, ignore margins that are too small
+ @param edge_blur_size for color images, the aperture size for edge blur
*/
- CV_WRAP static Ptr<MSER> create( int _delta=5, int _min_area=60, int _max_area=14400,
- double _max_variation=0.25, double _min_diversity=.2,
- int _max_evolution=200, double _area_threshold=1.01,
- double _min_margin=0.003, int _edge_blur_size=5 );
+ CV_WRAP static Ptr<MSER> create( int delta=5, int min_area=60, int max_area=14400,
+ double max_variation=0.25, double min_diversity=.2,
+ int max_evolution=200, double area_threshold=1.01,
+ double min_margin=0.003, int edge_blur_size=5 );
/** @brief Detect %MSER regions
diff --git a/modules/flann/include/opencv2/flann/any.h b/modules/flann/include/opencv2/flann/any.h
index f5684e9962..4906fec081 100644
--- a/modules/flann/include/opencv2/flann/any.h
+++ b/modules/flann/include/opencv2/flann/any.h
@@ -167,17 +167,15 @@ class SinglePolicy
public:
static base_any_policy* get_policy();
-
-private:
- static typename choose_policy<T>::type policy;
};
-template <typename T>
-typename choose_policy<T>::type SinglePolicy<T>::policy;
-
/// This function will return a different policy for each type.
template <typename T>
-inline base_any_policy* SinglePolicy<T>::get_policy() { return &policy; }
+inline base_any_policy* SinglePolicy<T>::get_policy()
+{
+ static typename choose_policy<T>::type policy;
+ return &policy;
+}
} // namespace anyimpl
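
The change above moves the policy object from a static data member of the class template to a function-local static. The sketch below shows the same idiom in isolation. All type names are hypothetical stand-ins, not the FLANN types.

```cpp
#include <cassert>

// Hypothetical stand-ins for the policy hierarchy.
struct base_policy_sketch { virtual ~base_policy_sketch() {} };

template <typename T>
struct typed_policy_sketch : base_policy_sketch {};

template <typename T>
struct SinglePolicySketch
{
    static base_policy_sketch* get_policy()
    {
        // Constructed on first call, with one instance per instantiation
        // of T. No out-of-class definition of a static member is needed.
        static typed_policy_sketch<T> policy;
        return &policy;
    }
};
```

Since C++11, initialization of a function-local static is thread-safe, and the object is guaranteed to exist by the time its address is returned.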
diff --git a/modules/gapi/CMakeLists.txt b/modules/gapi/CMakeLists.txt
index 6b586c1f99..b26b613e72 100644
--- a/modules/gapi/CMakeLists.txt
+++ b/modules/gapi/CMakeLists.txt
@@ -162,6 +162,9 @@ set(gapi_srcs
# Python bridge
src/backends/ie/bindings_ie.cpp
src/backends/python/gpythonbackend.cpp
+
+ # Utils (ITT tracing)
+ src/utils/itt.cpp
)
ocv_add_dispatched_file(backends/fluid/gfluidimgproc_func SSE4_1 AVX2)
@@ -178,13 +181,22 @@ ocv_module_include_directories("${CMAKE_CURRENT_LIST_DIR}/src")
ocv_create_module()
ocv_target_link_libraries(${the_module} PRIVATE ade)
+
if(OPENCV_GAPI_INF_ENGINE)
ocv_target_link_libraries(${the_module} PRIVATE ${INF_ENGINE_TARGET})
endif()
+
if(HAVE_TBB)
ocv_target_link_libraries(${the_module} PRIVATE tbb)
endif()
+# TODO: Consider support of ITT in G-API standalone mode.
+if(CV_TRACE AND HAVE_ITT)
+ ocv_target_compile_definitions(${the_module} PRIVATE -DOPENCV_WITH_ITT=1)
+ ocv_module_include_directories(${ITT_INCLUDE_DIRS})
+ ocv_target_link_libraries(${the_module} PRIVATE ${ITT_LIBRARIES})
+endif()
+
set(__test_extra_deps "")
if(OPENCV_GAPI_INF_ENGINE)
list(APPEND __test_extra_deps ${INF_ENGINE_TARGET})
diff --git a/modules/gapi/include/opencv2/gapi/core.hpp b/modules/gapi/include/opencv2/gapi/core.hpp
index cb5d55d13f..cb8a6127d7 100644
--- a/modules/gapi/include/opencv2/gapi/core.hpp
+++ b/modules/gapi/include/opencv2/gapi/core.hpp
@@ -575,6 +575,12 @@ namespace core {
return std::make_tuple(empty_gopaque_desc(), empty_array_desc(), empty_array_desc());
}
};
+
+ G_TYPED_KERNEL(GTranspose, <GMat(GMat)>, "org.opencv.core.transpose") {
+ static GMatDesc outMeta(GMatDesc in) {
+ return in.withSize({in.size.height, in.size.width});
+ }
+ };
} // namespace core
namespace streaming {
@@ -1490,7 +1496,7 @@ enlarge an image, it will generally look best with cv::INTER_CUBIC (slow) or cv:
@sa warpAffine, warpPerspective, remap, resizeP
*/
-GAPI_EXPORTS GMat resize(const GMat& src, const Size& dsize, double fx = 0, double fy = 0, int interpolation = INTER_LINEAR);
+GAPI_EXPORTS_W GMat resize(const GMat& src, const Size& dsize, double fx = 0, double fy = 0, int interpolation = INTER_LINEAR);
/** @brief Resizes a planar image.
@@ -1927,6 +1933,21 @@ GAPI_EXPORTS std::tuple,GArray,GArray>
kmeans(const GArray& data, const int K, const GArray& bestLabels,
const TermCriteria& criteria, const int attempts, const KmeansFlags flags);
+
+/** @brief Transposes a matrix.
+
+The function transposes the matrix:
+\f[\texttt{dst} (i,j) = \texttt{src} (j,i)\f]
+
+@note
+ - Function textual ID is "org.opencv.core.transpose"
+ - No complex conjugation is done in case of a complex matrix. It should be done separately if needed.
+
+@param src input array.
+*/
+GAPI_EXPORTS GMat transpose(const GMat& src);
+
+
namespace streaming {
/** @brief Gets dimensions from Mat.
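
The `GTranspose` kernel added above derives its output metadata by swapping width and height, and the documented operation is dst(i,j) = src(j,i). The sketch below shows both steps on a plain row-major buffer. It does not use G-API; `SizeSketch`, `transposed_size` and `transpose_sketch` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a (width, height) size descriptor.
struct SizeSketch { int width; int height; };

// Mirrors the outMeta logic: the output width is the input height and
// the output height is the input width.
SizeSketch transposed_size(SizeSketch in)
{
    return SizeSketch{in.height, in.width};
}

// Transposes a row-major rows x cols buffer into a cols x rows buffer.
std::vector<int> transpose_sketch(const std::vector<int>& src, std::size_t rows, std::size_t cols)
{
    std::vector<int> dst(src.size());
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            dst[j * rows + i] = src[i * cols + j]; // dst(j,i) = src(i,j)
    return dst;
}
```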
diff --git a/modules/gapi/include/opencv2/gapi/gcommon.hpp b/modules/gapi/include/opencv2/gapi/gcommon.hpp
index 8119e397eb..a9cb015901 100644
--- a/modules/gapi/include/opencv2/gapi/gcommon.hpp
+++ b/modules/gapi/include/opencv2/gapi/gcommon.hpp
@@ -195,6 +195,14 @@ private:
using GCompileArgs = std::vector