Commit Graph

9 Commits

Author SHA1 Message Date
Yuantao Feng
3afe8ddaf8
core: Rename cv::float16_t to cv::hfloat (#25217)
* rename cv::float16_t to cv::fp16_t

* add typedef fp16_t float16_t

* remove zero(), bits() from fp16_t class

* fp16_t -> hfloat

* remove cv::float16_t::fromBits; add hfloatFromBits

* undo changes in conv_winograd_f63.simd.hpp and conv_block.simd.hpp

* undo some changes in dnn
2024-03-21 23:44:19 +03:00
Liutong HAN
a287605c3e Clean up the Universal Intrinsic API. 2023-10-13 19:23:30 +08:00
HAN Liutong
0dd7769bb1
Merge pull request #23980 from hanliutong:rewrite-core
Rewrite Universal Intrinsic code by using new API: Core module. #23980

The goal of this PR is to match and modify all SIMD code blocks guarded by `CV_SIMD` macro in the `opencv/modules/core` folder and rewrite them by using the new Universal Intrinsic API.

The patch is almost auto-generated by using the [rewriter](https://github.com/hanliutong/rewriter), related PR #23885.

Most of the files have been rewritten, but I marked this PR as draft because, the `CV_SIMD` macro also exists in the following files, and the reasons why they are not rewrited are:

1. ~~code design for fixed-size SIMD (v_int16x8, v_float32x4, etc.), need to manually rewrite.~~ Rewrited
- ./modules/core/src/stat.simd.hpp
- ./modules/core/src/matrix_transform.cpp
- ./modules/core/src/matmul.simd.hpp

2. Vector types are wrapped in other class/struct, that are not supported by the compiler in variable-length backends. Can not be rewrited directly.
- ./modules/core/src/mathfuncs_core.simd.hpp 
```cpp
struct v_atan_f32
{
    explicit v_atan_f32(const float& scale)
    {
...
    }

    v_float32 compute(const v_float32& y, const v_float32& x)
    {
...
    }

...
    v_float32 val90; // sizeless type can not used in a class
    v_float32 val180;
    v_float32 val360;
    v_float32 s;
};
```

3. The API interface does not support/does not match

- ./modules/core/src/norm.cpp 
Use `v_popcount`, ~~waiting for #23966~~ Fixed
- ./modules/core/src/has_non_zero.simd.hpp
Use illegal Universal Intrinsic API: For float type, there is no logical operation `|`. Further discussion needed

```cpp
/** @brief Bitwise OR

Only for integer types. */
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n> operator|(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n>& operator|=(v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
```

```cpp
#if CV_SIMD
    typedef v_float32 v_type;
    const v_type v_zero = vx_setzero_f32();
    constexpr const int unrollCount = 8;
    int step = v_type::nlanes * unrollCount;
    int len0 = len & -step;
    const float* srcSimdEnd = src+len0;

    int countSIMD = static_cast<int>((srcSimdEnd-src)/step);
    while(!res && countSIMD--)
    {
        v_type v0 = vx_load(src);
        src += v_type::nlanes;
        v_type v1 = vx_load(src);
        src += v_type::nlanes;
....
        src += v_type::nlanes;
        v0 |= v1; //Illegal ?
....
        //res = v_check_any(((v0 | v4) != v_zero));//beware : (NaN != 0) returns "false" since != is mapped to _CMP_NEQ_OQ and not _CMP_NEQ_UQ
        res = !v_check_all(((v0 | v4) == v_zero));
    }

    v_cleanup();
#endif
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2023-08-11 08:33:33 +03:00
Vadim Pisarevsky
6d7f5871db
added basic support for CV_16F (the new datatype etc.) (#12463)
* added basic support for CV_16F (the new datatype etc.). CV_USRTYPE1 is now equal to CV_16F, which may break some [rarely used] functionality. We'll see

* fixed just introduced bug in norm; reverted errorneous changes in Torch importer (need to find a better solution)

* addressed some issues found during the PR review

* restored the patch to fix some perf test failures
2018-09-10 16:56:29 +03:00
Vadim Pisarevsky
80b62a41c6 Merge pull request #12411 from vpisarev:wide_convert
* rewrote Mat::convertTo() and convertScaleAbs() to wide universal intrinsics; added always-available and SIMD-optimized FP16<=>FP32 conversion

* fixed compile warnings

* fix some more compile errors

* slightly relaxed accuracy threshold for int->float conversion (since we now do it using single-precision arithmetics, not double-precision)

* fixed compile errors on iOS, Android and in the baseline C++ version (intrin_cpp.hpp)

* trying to fix ARM-neon builds

* trying to fix ARM-neon builds

* trying to fix ARM-neon builds

* trying to fix ARM-neon builds
2018-09-06 19:36:59 +03:00
Maksim Shabunin
221342fb25 Split convert.cpp into smaller pieces 2018-02-12 15:17:19 +03:00
Vitaly Tuzov
5448d9186a AVX and SSE4.1 optimized conversion implementations migrated to separate files 2017-07-04 14:48:01 +03:00
Alexander Alekhin
3e3e2dd512 android: make optional "cpufeatures", build fixes for NDK r15 2017-06-21 13:34:19 +03:00
Tomoaki Teshima
e269ef96cb update convertFp16 using CV_CPU_CALL_FP16
* avoid link error (move the implementation of software version to header)
 * make getConvertFuncFp16 local (move from precomp.hpp to convert.hpp)
 * fix error on 32bit x86
2017-06-06 22:26:51 +09:00