[GSoC] New universal intrinsic backend for RVV
* Add new rvv backend (partially implemented).
* Modify the framework of Universal Intrinsic.
* Add CV_SIMD macro guards to current UI code.
* Use vlanes() instead of nlanes.
* Modify the UI test.
* Enable the new RVV (scalable) backend.
* Remove whitespace.
* Rename and some others modify.
* Update intrin.hpp but still not work on AVX/SSE
* Update conditional compilation macros.
* Use static variable for vlanes.
* Use max_nlanes for array defining.
/wrkdirs/usr/ports/graphics/opencv/work/opencv-4.5.5/modules/core/include/opencv2/core/vsx_utils.hpp:352:12: warning: 'vec_permi' macro redefined [-Wmacro-redefined]
# define vec_permi(a, b, c) vec_xxpermdi(b, a, (3 ^ (((c) & 1) << 1 | (c) >> 1)))
^
/usr/lib/clang/13.0.0/include/altivec.h:13077:9: note: previous definition is here
#define vec_permi(__a, __b, __c) \
^
/wrkdirs/usr/ports/graphics/opencv/work/opencv-4.5.5/modules/core/include/opencv2/core/vsx_utils.hpp:370:25: error: redefinition of 'vec_promote'
VSX_FINLINE(vec_dword2) vec_promote(long long a, int b)
^
/usr/lib/clang/13.0.0/include/altivec.h:14604:1: note: previous definition is here
vec_promote(signed long long __a, int __b) {
^
/wrkdirs/usr/ports/graphics/opencv/work/opencv-4.5.5/modules/core/include/opencv2/core/vsx_utils.hpp:377:26: error: redefinition of 'vec_promote'
VSX_FINLINE(vec_udword2) vec_promote(unsigned long long a, int b)
^
/usr/lib/clang/13.0.0/include/altivec.h:14611:1: note: previous definition is here
vec_promote(unsigned long long __a, int __b) {
^
/wrkdirs/usr/ports/graphics/opencv/work/opencv-4.5.5/modules/core/include/opencv2/core/hal/intrin_vsx.hpp:1045:22: error: call to 'vec_rsqrt' is ambiguous
{ return v_float32x4(vec_rsqrt(x.val)); }
^~~~~~~~~
/usr/lib/clang/13.0.0/include/altivec.h:8472:34: note: candidate function
static vector float __ATTRS_o_ai vec_rsqrt(vector float __a) {
^
/wrkdirs/usr/ports/graphics/opencv/work/opencv-4.5.5/modules/core/include/opencv2/core/vsx_utils.hpp:362:29: note: candidate function
VSX_FINLINE(vec_float4) vec_rsqrt(const vec_float4& a)
^
/wrkdirs/usr/ports/graphics/opencv/work/opencv-4.5.5/modules/core/include/opencv2/core/hal/intrin_vsx.hpp:1047:22: error: call to 'vec_rsqrt' is ambiguous
{ return v_float64x2(vec_rsqrt(x.val)); }
^~~~~~~~~
/usr/lib/clang/13.0.0/include/altivec.h:8477:35: note: candidate function
static vector double __ATTRS_o_ai vec_rsqrt(vector double __a) {
^
/wrkdirs/usr/ports/graphics/opencv/work/opencv-4.5.5/modules/core/include/opencv2/core/vsx_utils.hpp:365:30: note: candidate function
VSX_FINLINE(vec_double2) vec_rsqrt(const vec_double2& a)
^
1 warning and 4 errors generated.
The specific functions were added to altivec.h in LLVM's 1ff93618e58df210def48d26878c20a1b414d900, c3da07d216dd20fbdb7302fd085c0a59e189ae3d and 10cc5bcd868c433f9a781aef82178b04e98bd098.
All classes are registered in the scope that corresponds to C++
namespace or exported class.
Example:
`cv::ml::Boost` is exported as `cv.ml.Boost`
`cv::SimpleBlobDetector::Params` is exported as
`cv.SimpleBlobDetector.Params`
For backward compatibility all classes are registered in the global
module with their mangling name containing scope information.
Example:
`cv::ml::Boost` has `cv.ml_Boost` alias to `cv.ml.Boost` type
* Added NEON support in builds for Windows on ARM
* Fixed `HAVE_CPU_NEON_SUPPORT` display broken during compiler test
* Fixed a build error prior to Visual Studio 2022
4.x: submodule or a class scope for exported classes
* feature: submodule or a class scope for exported classes
All classes are registered in the scope that corresponds to C++
namespace or exported class.
Example:
`cv::ml::Boost` is exported as `cv.ml.Boost`
`cv::SimpleBlobDetector::Params` is exported as
`cv.SimpleBlobDetector.Params`
For backward compatibility all classes are registered in the global
module with their mangling name containing scope information.
Example:
`cv::ml::Boost` has `cv.ml_Boost` alias to `cv.ml.Boost` type
* refactor: remove redundant GAPI aliases
* fix: use explicit string literals in CVPY_TYPE macro
* fix: add handling for class aliases
- Add special case handling when submodule has the same name as parent
- `PyDict_SetItemString` doesn't steal reference, so reference count
should be explicitly decremented to transfer object life-time
ownership
- Add sanity checks for module registration input
- Add Python 2 and Python 3 reference counting handling
clang-cl defines both __clang__ and _MSC_VER, yet uses `#pragma GCC` to disable certain diagnostics.
At the time `-Wreturn-type-c-linkage` was reported by clang-cl.
This PR fixes this behavior by reordering defines.
- Add special case handling when submodule has the same name as parent
- `PyDict_SetItemString` doesn't steal reference, so reference count
should be explicitly decremented to transfer object life-time
ownership
- Add sanity checks for module registration input
Fow now, it is possible to define valid rectangle for which some
functions overflow (e.g. br(), ares() ...).
This patch fixes the intersection operator so that it works with
any rectangle.
Update RVV backend for using Clang.
* Update cmake file of clang.
* Modify the RVV optimization on DNN to adapt to clang.
* Modify intrin_rvv: Disable some existing types.
* Modify intrin_rvv: Reinterpret instead of load&cast.
* Modify intrin_rvv: Update load&store without cast.
* Modify intrin_rvv: Rename vfredsum to fredosum.
* Modify intrin_rvv: Rewrite Check all/any by using vpopc.
* Modify intrin_rvv: Use reinterpret instead of c-style casting.
* Remove all macros which is not used in v_reinterpret
* Rename vpopc to vcpop according to spec.
* feat: OpenCV extension with pure Python modules
* feat: cv2 is now a Python package instead of extension module
Python package cv2 now can handle both Python and C extension modules
properly without additional "subfolders" like "_extra_py_code".
* feat: can call native function from its reimplementation in Python
`PyObject*` to `std::vector<T>` conversion logic:
- If user passed Numpy Array
- If array is planar and T is a primitive type (doesn't require
constructor call) that matches with the element type of array, then
copy element one by one with the respect of the step between array
elements. If compiler is lucky (or brave enough) copy loop can be
vectorized.
For classes that require constructor calls this path is not
possible, because we can't begin an object lifetime without hacks.
- Otherwise fall-back to general case
- Otherwise - execute the general case:
If PyObject* corresponds to Sequence protocol - iterate over the
sequence elements and invoke the appropriate `pyopencv_to` function.
`std::vector<T>` to `PyObject*` conversion logic:
- If `std::vector<T>` is empty - return empty tuple.
- If `T` has a corresponding `Mat` `DataType` than return
Numpy array instance of the matching `dtype` e.g.
`std::vector<cv::Rect>` is returned as `np.ndarray` of shape `Nx4` and
`dtype=int`.
This branch helps to optimize further evaluations in user code.
- Otherwise - execute the general case:
Construct a tuple of length N = `std::vector::size` and insert
elements one by one.
Unnecessary functions were removed and code was rearranged to allow
compiler select the appropriate conversion function specialization.
Moving RGBD parts to 3d
* files moved from rgbd module in contrib repo
* header paths fixed
* perf file added
* lapack compilation fixed
* Rodrigues fixed in tests
* rgbd namespace removed
* headers fixed
* initial: rgbd files moved to 3d module
* rgbd updated from latest contrib master; less file duplication
* "std::" for sin(), cos(), etc.
* KinFu family -> back to contrib
* paths & namespaces
* removed duplicates, file version updated
* namespace kinfu removed from 3d module
* forgot to move test_colored_kinfu.cpp to contrib
* tests fixed: Params removed
* kinfu namespace removed
* it works without objc bindings
* include headers fixed
* tests: data paths fixed
* headers moved to/from public API
* Intr -> Matx33f in public API
* from kinfu_frame.hpp to utils.hpp
* submap: Intr -> Matx33f, HashTSDFVolume -> Volume; no extra headers
* no RgbdFrame class, no Mat fields & arg -> InputArray & pImpl
* get/setPyramidAt() instead of lots of methods
* Mat -> InputArray, TMat
* prepareFrameCache: refactored
* FastICPOdometry: +truncate threshold, +depthFactor; Mat/UMat choose
* Mat/UMat choose
* minor stuff related to headers
* (un)signed int warnings; compilation minor issues
* minors: submap: pyramids -> OdometryFrame; tests fix; FastICP minor; CV_EXPORTS_W for kinfu_frame.hpp
* FastICPOdometry: caching, rgbCameraMatrix
* OdometryFrame: pyramid%s% -> pyramids[]
* drop: rgbCameraMatrix from FastICP, RGB cache mode, makeColoredFrameFrom depth and all color-functions it calls
* makeFrameFromDepth, buildPyramidPointsNormals -> from public to internal utils.hpp
* minors
* FastICPOdometry: caching updated, init fields
* OdometryFrameImpl<UMat> fixed
* matrix building fixed; minors
* returning linemode back to contrib
* params.pose is Mat now
* precomp headers reorganized
* minor fixes, header paths, extra header removed
* minors: intrinsics -> utils.hpp; whitespaces; empty namespace; warning fixed
* moving declarations from/to headers
* internal headers reorganized (once again)
* fix include
* extra var fix
* fix include, fix (un)singed warning
* calibration.cpp: reverting back
* headers fix
* workaround to fix bindings
* temporary removed wrappers
* VolumeType -> VolumeParams
* (temporarily) removing wrappers for Volume and VolumeParams
* pyopencv_linemod -> contrib
* try to fix test_rgbd.py
* headers fixed
* fixing wrappers for rgbd
* fixing docs
* fixing rgbdPlane
* RgbdNormals wrapped
* wrap Volume and VolumeParams, VolumeType from enum to int
* DepthCleaner wrapped
* header folder "rgbd" -> "3d"
* fixing header path
* VolumeParams referenced by Ptr to support Python wrappers
* render...() fixed
* Ptr<VolumeParams> fixed
* makeVolume(... resolution -> [X, Y, Z])
* fixing static declaration
* try to fix ios objc bindings
* OdometryFrame::release...() removed
* fix for Odometry algos not supporting UMats: prepareFrameCache<>()
* preparePyramidMask(): fix to compile with TMat = UMat
* fixing debug guards
* removing references back; adding makeOdometryFrame() instead
* fixing OpenCL ICP hanging (some threads exit before reaching the barrier -> the rest threads hang)
* try to fix objc wrapper warnings; rerun builders
* VolumeType -> VolumeKind
* try to fix OCL bug
* prints removed
* indentation fixed
* headers fixed
* license fix
* WillowGarage licence notion removed, since it's in OpenCV's COPYRIGHT already
* KinFu license notion shortened
* debugging code removed
* include guards fixed
* KinFu license left in contrib module
* isValidDepth() moved to private header
* indentation fix
* indentation fix in src files
* RgbdNormals rewritten to pImpl
* minor
* DepthCleaner removed due to low code quality, no depthScale provided, no depth images found to be successfully filtered; can be replaced by bilateral filtering
* minors, indentation
* no "private" in public headers
* depthTo3d test moved from separate file
* Normals: setDepth() is useless, removing it
* RgbdPlane => findPlanes()
* rescaleDepth(): minor
* warpFrame: minor
* minor TODO
* all Odometries (except base abstract class) rewritten to pImpl
* FastICPOdometry now supports maxRotation and maxTranslation
* minor
* Odometry's children: now checks are done in setters
* get rid of protected members in Odometry class
* get/set cameraMatrix, transformType, maxRot/Trans, iters, minGradients -> OdometryImpl
* cameraMatrix: from double to float
* matrix exponentiation: Eigen -> dual quaternions
* Odometry evaluation fixed to reuse existing code
* "small" macro fixed by undef
* pixNorm is calculated on CPU only now (and then uploads on GPU)
* test registration: no cvtest classes
* test RgbdNormals and findPlanes(): no cvtest classes
* test_rgbd.py: minor fix
* tests for Odometry: no cvtest classes; UMat tests; logging fixed
* more CV_OVERRIDE to overriden functions
* fixing nondependent names to dependent
* more to prev commit
* forgotten fixes: overriden functions, (non)dependent names
* FastICPOdometry: fix UMat support when OpenCL is off
* try to fix compilation: missing namespaces
* Odometry: static const-mimicking functions to internal constants
* forgotten change to prev commit
* more forgotten fixes
* do not expose "submap.hpp" by default
* in-class enums: give names, CamelCase, int=>enums; minors
* namespaces, underscores, String
* std::map is used by pose graph, adding it
* compute()'s signature fixed, computeImpl()'s too
* RgbdNormals: Mat -> InputArray
* depth.hpp: Mat -> InputArray
* cameraMatrix: Matx33f -> InputArray + default value + checks
* "details" headers are not visible by default
* TSDF tests: rearranging checks
* cameraMatrix: no (realistic) default value
* renderPointsNormals*(): no wrappers for them
* debug: assert on empty frame in TSDF tests
* debugging code for TSDF GPU
* debug from integrate to raycast
* no (non-zero) default camera matrix anymore
* drop debugging code (does not help)
* try to fix TSDF GPU: constant -> global const ptr
docs(core/ocl): clarify ownership of arguments passed into OpenCL related functions
* docs(core/ocl): clarify ownership in OpenCLExecutionContext::create
Although it is technically true that OpenCLExecutionContext::create
calls `clRetainContext` on its context argument, it is misleading
because it does not increase the reference count overall. Clarify that
the ownership of one reference of the passed context and device is
taken.
* docs(core/ocl): document ownership transfer in ocl::Device::fromHandle
bug fixes for universal intrinsics of RISC-V back-end
* Align universal intrinsic comparator behaviour with other platforms
Set all bits to one for return value of int and fp comparators.
* fix v_pack_triplets, v_pack_store and v_pack_u_store
* Remove redundant CV_DECL_ALIGNED statements
Co-authored-by: Alexander Smorkalov <alexander.smorkalov@xperience.ai>
* [build][option] Introduce `OPENCV_DISABLE_THREAD_SUPPORT` option.
The option forces the library to build without thread support.
* update handling of OPENCV_DISABLE_THREAD_SUPPORT
- reduce amount of #if conditions
* [to squash] cmake: apply mode vars in toolchains too
Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
* Support cl_image conversion for CL_HALF_FLOAT (float16)
* Support cl_image conversion for additional channel orders:
CL_A, CL_INTENSITY, CL_LUMINANCE, CL_RG, CL_RA
* Comment on why cl_image conversion is unsupported for CL_RGB
* Predict optimal vector width for float16
* ocl::kernelToStr: support float16
* ocl::Device::halfFPConfig: drop artificial requirement for OpenCL
version >= 1.2. Even OpenCL 1.0 supports the underlying config
property, CL_DEVICE_HALF_FP_CONFIG.
* dumpOpenCLInformation: provide info on OpenCL half-float support
and preferred half-float vector width
* randu: support default range [-1.0, 1.0] for float16
* TestBase::warmup: support float16
Fix dynamic loading of clBLAS and clFFT (formerly, clAmdBlas and clAmdFft)
* Fix dynamic loading of clBLAS and clFFT
* Update filenames and function names for clBLAS (formerly, clAmdBlas)
* Update filenames and function names for clFFT (formerly, clAmdFft)
* Uncomment teardown of clFFT; tear down clFFT in same way as clBLAS
* Fix generators for clBLAS and clFFT headers
* Update generators to parse recent clBLAS and clFFT library headers
* Update generators to be compatible with Python 3
* Re-generate OpenCV's clBLAS and clFFT headers
* Update function calls to match names in newly generated headers
* Disable (and comment on) teardown code for clBLAS and clFFT
* Renaming *clamd* files
* Renaming *clamdblas* files to *clblas*
* Renaming *clamdfft* files to *clfft*
* Update generator for CL headers
* Update generator to be compatible with Python 3
* extended C++ version of Levenberg-Marquardt (LM) solver to accommodate all features of the C counterpart.
* removed C version of LM solver
* made a few other little changes to make the code compile and run smoothly
Improve performance on Arm64
* Improve performance on Apple silicon
This patch will
- Enable dot product intrinsics for macOS arm64 builds
- Enable for macOS arm64 builds
- Improve HAL primitives
- reduction (sum, min, max, sad)
- signmask
- mul_expand
- check_any / check_all
Results on a M1 Macbook Pro
* Updates to #20011 based on feedback
- Removes Apple Silicon specific workarounds
- Makes #ifdef sections smaller for v_mul_expand cases
- Moves dot product optimization to compiler optimization check
- Adds 4x4 matrix transpose optimization
* Remove dotprod and fix v_transpose
Based on the latest, we've removed dotprod entirely and will revisit in a future PR.
Added explicit cats with v_transpose4x4()
This should resolve all opens with this PR
* Remove commented out lines
Remove two extraneous comments
* Add the support for riscv64 vector 0.7.1.
* fixed GCC warnings
* cleaned whitespaces
* Remove the worning by the use of internal API of compiler.
* Update the license header.
* removed trailing whitespaces
Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@me.com>
Co-authored-by: yulj <linjie.ylj@alibaba-inc.com>
Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@gmail.com>
* Adding functions rbegin() and rend() functions to matrix class.
This is important to be more standard compliant with C++ and an ever increasing number of people using standard algorithms for better code readability- and maintainability.
The functions are copy pated from their counterparts (even though they should probably call the counterparts but this gave me some troube).
They return iterators using std::reverse_iterators
Follow up of an open feature request:
https://github.com/opencv/opencv/issues/4641
* Fix rbegin() and rend() and provide tests for them
* Removing unnecessary whitespaces
* Adding rbegin and rend to Mat_ class with the right parameters so we don't need to repeat the template argument.
An instantiating cv::Mat_<int> for example can call it's rbegin() function and doesn't need rbegin<int>() with this convience addition.
Follows what is done for forward iterators
* static cast the vector size (return size_t) to an int (that is required for opencv mat constructor)
Co-authored-by: Stefan <stefan.gerl@tum.de>
* Updated cpp reference implementations for a few intrinsics to address wide universal intrinsics as well
* Updated cpp reference implementations for a few more universal intrinsics
* Update polynom_solver.cpp
This pull request is in the response to Issue #19526. I have fixed the problem with the cube root calculation of 2*R. The Issue was in the usage of pow function with negative values of R, but if it is calculated for only positive values of R then changing x0 according to the parity of R, the Issue is resolved. Kindly consider it, Thanks!
* add cv::cubeRoot(double)
Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
* Add Python Bindings for getCacheDirectory function
* Added getCacheDirectory interop test with image codecs.
Co-authored-by: Sergey Slashchinin <sergei.slashchinin@xperience.ai>
- follows iso c++ guideline C.44
- enables default compiler-created constructors to
also be noexcept
original commit: 77e26a7db3
- handled KernelArg, Image2D
* [hal][neon] Optimize the v_dotprod_fast intrinsics for aarch64.
On Armv8 in AArch64 execution mode, we can skip the sequence
v<op>_<ty>(vget_high_<ty>(x), vget_high_<ty>(y))
in favour of
v<op>_high_<ty>(x, y)
This has better changes for recent compilers to use less data movement
operations and better register allocation. See for example:
https://godbolt.org/z/bPq7vd
* [hal][neon] Fix build failure on armv7.
* [hal][neon] Address review comments in PR.
PR: https://github.com/opencv/opencv/pull/19486
* [hal][neon] Define macro to check for the AArch64 execution state of Armv8.
* [hal][neon] Fix macro definition for AArch64.
The fix is needed to prevent warnings when building for Armv7.
they might be thrown from third-party code (notably Ogre in the ovis
module).
While Linux is kind enough to print them, they cause instant termination
on Windows.
Arguably, they do not origin from OpenCV itself, but still this helps
understanding what went wrong when calling an OpenCV function.