Commit Graph

9 Commits

Author SHA1 Message Date
HAN Liutong
0dd7769bb1
Merge pull request #23980 from hanliutong:rewrite-core
Rewrite Universal Intrinsic code by using new API: Core module. #23980

The goal of this PR is to match and modify all SIMD code blocks guarded by `CV_SIMD` macro in the `opencv/modules/core` folder and rewrite them by using the new Universal Intrinsic API.

The patch is almost auto-generated by using the [rewriter](https://github.com/hanliutong/rewriter), related PR #23885.

Most of the files have been rewritten, but I marked this PR as draft because, the `CV_SIMD` macro also exists in the following files, and the reasons why they are not rewrited are:

1. ~~code design for fixed-size SIMD (v_int16x8, v_float32x4, etc.), need to manually rewrite.~~ Rewrited
- ./modules/core/src/stat.simd.hpp
- ./modules/core/src/matrix_transform.cpp
- ./modules/core/src/matmul.simd.hpp

2. Vector types are wrapped in other class/struct, that are not supported by the compiler in variable-length backends. Can not be rewrited directly.
- ./modules/core/src/mathfuncs_core.simd.hpp 
```cpp
struct v_atan_f32
{
    explicit v_atan_f32(const float& scale)
    {
...
    }

    v_float32 compute(const v_float32& y, const v_float32& x)
    {
...
    }

...
    v_float32 val90; // sizeless type can not used in a class
    v_float32 val180;
    v_float32 val360;
    v_float32 s;
};
```

3. The API interface does not support/does not match

- ./modules/core/src/norm.cpp 
Use `v_popcount`, ~~waiting for #23966~~ Fixed
- ./modules/core/src/has_non_zero.simd.hpp
Use illegal Universal Intrinsic API: For float type, there is no logical operation `|`. Further discussion needed

```cpp
/** @brief Bitwise OR

Only for integer types. */
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n> operator|(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n>& operator|=(v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
```

```cpp
#if CV_SIMD
    typedef v_float32 v_type;
    const v_type v_zero = vx_setzero_f32();
    constexpr const int unrollCount = 8;
    int step = v_type::nlanes * unrollCount;
    int len0 = len & -step;
    const float* srcSimdEnd = src+len0;

    int countSIMD = static_cast<int>((srcSimdEnd-src)/step);
    while(!res && countSIMD--)
    {
        v_type v0 = vx_load(src);
        src += v_type::nlanes;
        v_type v1 = vx_load(src);
        src += v_type::nlanes;
....
        src += v_type::nlanes;
        v0 |= v1; //Illegal ?
....
        //res = v_check_any(((v0 | v4) != v_zero));//beware : (NaN != 0) returns "false" since != is mapped to _CMP_NEQ_OQ and not _CMP_NEQ_UQ
        res = !v_check_all(((v0 | v4) == v_zero));
    }

    v_cleanup();
#endif
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2023-08-11 08:33:33 +03:00
HAN Liutong
0ef803950b
Merge pull request #22179 from hanliutong:new-rvv
[GSoC] New universal intrinsic backend for RVV

* Add new rvv backend (partially implemented).

* Modify the framework of Universal Intrinsic.

* Add CV_SIMD macro guards to current UI code.

* Use vlanes() instead of nlanes.

* Modify the UI test.

* Enable the new RVV (scalable) backend.

* Remove whitespace.

* Rename and some others modify.

* Update intrin.hpp but still not work on AVX/SSE

* Update conditional compilation macros.

* Use static variable for vlanes.

* Use max_nlanes for array defining.
2022-07-19 20:02:00 +03:00
damonyu1989
5f637e5a02
Merge pull request #19778 from damonyu1989:master-riscv-0.7.1
* Add the support for riscv64 vector 0.7.1.

* fixed GCC warnings

* cleaned whitespaces

* Remove the worning by the use of internal API of compiler.

* Update the license header.

* removed trailing whitespaces

Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@me.com>
Co-authored-by: yulj <linjie.ylj@alibaba-inc.com>
Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@gmail.com>
2021-05-25 20:15:12 +03:00
Alexander Alekhin
fb61f88b9c Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2020-01-12 09:35:39 +00:00
Alexander Alekhin
e180cc050b
Merge pull request #16236 from alalek:fix_core_simd_emulator
* core: fix intrin_cpp, allow to build modules with SIMD emulator

* core(arithm): fix v_zero initialization

* core(simd): 'strict' types for binary/bitwise operations

* features2d: avoid aligned load issue in GCC 5.4 with emulated SIMD

* core(simd): alignment checks in SIMD emulator
2020-01-10 21:31:02 +03:00
Alexander Alekhin
a74fe2ec01 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-09-20 21:11:49 +00:00
mipsopen-fwu
b1ea91d8bd Merge pull request #15422 from mipsopen-fwu:msa-dev
* Added MSA implementations for mips platforms. Intrinsics for MSA and build scripts for MIPS platforms are added.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Removed some unused code in mips.toolchain.cmake.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Added comments for mips toolchain configuration and disabled compiling warnings for libpng.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Fixed the build error of unsupported opcode 'pause' when mips isa_rev is less than 2.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Removed FP16 related item in MSA option defines in OpenCVCompilerOptimizations.cmake.
2. Use CV_CPU_COMPILE_MSA instead of __mips_msa for MSA feature check in cv_cpu_dispatch.h.
3. Removed hasSIMD128() in intrin_msa.hpp.
4. Define CPU_MSA as 150.
Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Removed unnecessary CV_SIMD128_64F guarding in intrin_msa.hpp.
2. Removed unnecessary CV_MSA related code block in dotProd_8u().

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Defined CPU_MSA_FLAGS_ON as "-mmsa".
2. Removed CV_SIMD128_64F guardings in intrin_msa.hpp.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Removed unused msa_mlal_u16() and msa_mlal_s16 from msa_macros.h.

Signed-off-by: Fei Wu <fwu@wavecomp.com>
2019-09-20 19:52:48 +03:00
Alexander Alekhin
1913482cf5 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2018-11-10 20:50:26 +00:00
Sayed Adel
93ffebc273 core: reimplement SIMD arithmetic, logic and comparison operations into wide universal intrinsics
- initialize arithmetic dispatcher
  - add new universal intrinsic v_absdiffs
  - add new universal intrinsic v_pack_b
  - add accumulate version of universal intrinsic v_round
  - fix sse/avx2:uint8 multiplication overflow
  - reimplement arithmetic, logic and comparison operations into wide universal intrinsics
    with full support for all types
  - reimplement IPP arithmetic, logic and comparison operations in a sperate file arithm_ipp.hpp
  - avoid scalar multiplication if scaling factor eq 1 and use integer multiplication
  - move C arithmetic operations to precomp.hpp and delete [arithm_simd|arithm_core].hpp
  - add compatibility with new opencv4 divide policy
2018-10-30 12:48:31 +02:00