Commit Graph

43 Commits

Author SHA1 Message Date
Maksim Shabunin
bd26d02908 build: enable RISC-V FP16 support in the toolchain 2024-09-25 20:01:29 +03:00
Maksim Shabunin
4c81e174bf
Merge pull request #25901 from mshabunin:fix-riscv-aarch-baseline
RISC-V/AArch64: disable CPU features detection #25901

This PR is the first step in fixing current issues with NEON/RVV, FP16, BF16 and other CPU features on AArch64 and RISC-V platforms.

On AArch64 and RISC-V platforms we usually have the platform set by default in the toolchain when we compile it or in the cmake toolchain file or in CMAKE_CXX_FLAGS by user. Then, there are two ways to set platform options: a) "-mcpu=<some_cpu>" ; b) "-march=<arch description>" (e.g. "rv64gcv"). Furthermore, there are no similar "levels" of optimizations as for x86_64, instead we have features (RVV, FP16,...) which can be enabled or disabled. So, for example, if a user has "rv64gc" set by the toolchain and we want to enable RVV. Then we need to somehow parse their current feature set and append "v" (vector optimizations) to this string. This task is quite hard and the whole procedure is prone to errors.

I propose to use "CPU_BASELINE=DETECT" by default on AArch64 and RISC-V platforms. And somehow remove other features or make them read-only/detect-only, so that OpenCV wouldn't add any extra "-march" flags to the default configuration. We would rely only on the flags provided by the compiler and cmake toolchain file. We can have some predefined configurations in our cmake toolchain files.

Changes made by this PR:
- `CMakeLists.txt`: 
  - use `CMAKE_CROSSCOMPILING` instead of `CMAKE_TOOLCHAIN_FILE` to detect cross-compilation. This might be useful in cases of native compilation with a toolchain file
  - removed obsolete variables `ENABLE_NEON` and `ENABLE_VFPV3`, the first one have been turned ON by default on AArch64 platform which caused setting `CPU_BASELINE=NEON`
  - raise minimum cmake version allowed to 3.7 to allow using `CMAKE_CXX_FLAGS_INIT` in toolchain files
- added separate files with arch flags for native compilation on AArch64 and RISC-V, these files will be used in our toolchain files and in regular cmake
- use `DETECT` as default value for `CPU_BASELINE` also allow `NATIVE`, warn user if other values were used (only for AArch64 and RISC-V)
- for each feature listed in `CPU_DISPATCH` check if corresponding `CPU_${opt}_FLAGS_ON` has been provided, warn user if it is empty (only for AArch64 and RISC-V)
- use `CPU_BASELINE_DISABLE` variable to actually turn off macros responsible for corresponding features even if they are enabled by compiler
- removed Aarch64 feature merge procedure (it didn't support `-mcpu` and built-in `-march`)
- reworked AArch64 and two RISC-V cmake toolchain files (does not affect Android/OSX/iOS/Win):
  - use `CMAKE_CXX_FLAGS_INIT` to set compiler flags
  - use variables `ENABLE_BF16`, `ENABLE_DOTPROD`, `ENABLE_RVV`, `ENABLE_FP16` to control `-march`
  - AArch64: removed other compiler and linker flags
    - `-fdata-sections`, `-fsigned-char`, `-Wl,--no-undefined`, `-Wl,--gc-sections`   - already set by OpenCV
    - `-Wa,--noexecstack`, `-Wl,-z,noexecstack`, `-Wl,-z,relro`, `-Wl,-z,now` - can be enabled by OpenCV via `ENABLE_HARDENING`
    - `-Wno-psabi` - this option used to disable some warnings on older ARM platforms, shouldn't harm
  - ARM: removed same common flags as for AArch64, but left `-mthumb` and `--fix-cortex-a8`, `-z nocopyreloc`
2024-09-12 18:07:24 +03:00
Liutong HAN
32d81a8bae Fix toolchain. 2024-07-18 13:47:53 +00:00
Junyan721113
d9421ac148
Merge pull request #25167 from plctlab:rvp_3rdparty
3rdparty: NDSRVP - A New 3rdparty Library with Optimizations Based on RISC-V P Extension v0.5.2 - Part 1: Basic Functions #25167

# Summary

### Previous context
From PR #24556: 

>> * As you wrote, the P-extension differs from RVV thus can not be easily implemented via Universal Intrinsics mechanism, but there is another HAL mechanism for lower-level CPU optimizations which is used by the [Carotene](https://github.com/opencv/opencv/tree/4.x/3rdparty/carotene) library on ARM platforms. I suggest moving all non-dnn code to similar third-party component. For example, FAST algorithm should allow such optimization-shortcut: see https://github.com/opencv/opencv/blob/4.x/modules/features2d/src/hal_replacement.hpp
>>   Reference documentation is here:
>>   
>>   * https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html
>>   * https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html
>>   * https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html
>>   * Carotene library is turned on here: 8bbf08f0de/CMakeLists.txt (L906-L911)

> As a test outside of this PR, A 3rdparty component called ndsrvp is created, containing one of the non-dnn code (integral_SIMD), and it works very well.
> All the non-dnn code in this PR have been removed, currently this PR can be focused on dnn optinizations.
> This HAL mechanism is quite suitable for rvp optimizations, all the non-dnn code is expected to be moved into ndsrvp soon.

### Progress

#### Part 1 (This PR)

- [Core](https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html)
- [x] Element-wise add and subtract
- [x] Element-wise minimum or maximum
- [x] Element-wise absolute difference
- [x] Bitwise logical operations
- [x] Element-wise compare
- [ImgProc](https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html)
- [x] Integral
- [x] Threshold
- [x] WarpAffine
- [x] WarpPerspective
- [Features2D](https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html)

#### Part 2 (Next PR)

**Rough Estimate. Todo List May Change.**

- [Core](https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html)
- [ImgProc](https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html)
- smaller remap HAL interface
- AdaptiveThreshold
- BoxFilter
- Canny
- Convert
- Filter
- GaussianBlur
- MedianBlur
- Morph
- Pyrdown
- Resize
- Scharr
- SepFilter
- Sobel
- [Features2D](https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html)
- FAST

### Performance Tests

The optimization does not contain floating point opreations.

**Absolute Difference**

Geometric mean (ms)

|Name of Test|opencv perf core Absdiff|opencv perf core Absdiff|opencv perf core Absdiff vs opencv perf core Absdiff (x-factor)|
|---|:-:|:-:|:-:|
|Absdiff::OCL_AbsDiffFixture::(640x480, 8UC1)|23.104|5.972|3.87|
|Absdiff::OCL_AbsDiffFixture::(640x480, 32FC1)|39.500|40.830|0.97|
|Absdiff::OCL_AbsDiffFixture::(640x480, 8UC3)|69.155|15.051|4.59|
|Absdiff::OCL_AbsDiffFixture::(640x480, 32FC3)|118.715|120.509|0.99|
|Absdiff::OCL_AbsDiffFixture::(640x480, 8UC4)|93.001|19.770|4.70|
|Absdiff::OCL_AbsDiffFixture::(640x480, 32FC4)|161.136|160.791|1.00|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC1)|69.211|15.140|4.57|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC1)|118.762|119.263|1.00|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC3)|212.414|44.692|4.75|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC3)|367.512|366.569|1.00|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC4)|285.337|59.708|4.78|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC4)|490.395|491.118|1.00|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC1)|158.827|33.462|4.75|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC1)|273.503|273.668|1.00|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC3)|484.175|100.520|4.82|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC3)|828.758|829.689|1.00|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC4)|648.592|137.195|4.73|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC4)|1116.755|1109.587|1.01|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC1)|648.715|134.875|4.81|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC1)|1115.939|1113.818|1.00|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC3)|1944.791|413.420|4.70|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC3)|3354.193|3324.672|1.01|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC4)|2594.585|553.486|4.69|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC4)|4473.543|4438.453|1.01|

**Bitwise Operation**

Geometric mean (ms)

|Name of Test|opencv perf core Bit|opencv perf core Bit|opencv perf core Bit vs opencv perf core Bit (x-factor)|
|---|:-:|:-:|:-:|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC1)|22.542|4.971|4.53|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC1)|90.210|19.917|4.53|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC3)|68.429|15.037|4.55|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC3)|280.168|59.239|4.73|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC4)|90.565|19.735|4.59|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC4)|374.695|79.257|4.73|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC1)|67.824|14.873|4.56|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC1)|279.514|59.232|4.72|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC3)|208.337|44.234|4.71|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC3)|851.211|182.522|4.66|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC4)|279.529|59.095|4.73|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC4)|1132.065|244.877|4.62|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC1)|155.685|33.078|4.71|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC1)|635.253|137.482|4.62|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC3)|474.494|100.166|4.74|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC3)|1907.340|412.841|4.62|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC4)|635.538|134.544|4.72|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC4)|2552.666|556.397|4.59|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC1)|634.736|136.355|4.66|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC1)|2548.283|561.827|4.54|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC3)|1911.454|421.571|4.53|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC3)|7663.803|1677.289|4.57|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC4)|2543.983|562.780|4.52|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC4)|10211.693|2237.393|4.56|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC1)|22.341|4.811|4.64|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC1)|89.975|19.288|4.66|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC3)|67.237|14.643|4.59|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC3)|276.324|58.609|4.71|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC4)|89.587|19.554|4.58|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC4)|370.986|77.136|4.81|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC1)|67.227|14.541|4.62|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC1)|276.357|58.076|4.76|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC3)|206.752|43.376|4.77|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC3)|841.638|177.787|4.73|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC4)|276.773|57.784|4.79|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC4)|1127.740|237.472|4.75|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC1)|153.808|32.531|4.73|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC1)|627.765|129.990|4.83|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC3)|469.799|98.249|4.78|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC3)|1893.591|403.694|4.69|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC4)|627.724|129.962|4.83|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC4)|2529.967|540.744|4.68|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC1)|628.089|130.277|4.82|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC1)|2521.817|540.146|4.67|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC3)|1905.004|404.704|4.71|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC3)|7567.971|1627.898|4.65|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC4)|2531.476|540.181|4.69|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC4)|10075.594|2181.654|4.62|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC1)|22.566|5.076|4.45|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC1)|90.391|19.928|4.54|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC3)|67.758|14.740|4.60|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC3)|279.253|59.844|4.67|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC4)|90.296|19.802|4.56|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC4)|373.972|79.815|4.69|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC1)|67.815|14.865|4.56|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC1)|279.398|60.054|4.65|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC3)|208.643|45.043|4.63|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC3)|850.042|180.985|4.70|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC4)|279.363|60.385|4.63|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC4)|1134.858|243.062|4.67|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC1)|155.212|33.155|4.68|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC1)|634.985|134.911|4.71|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC3)|474.648|100.407|4.73|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC3)|1912.049|414.184|4.62|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC4)|635.252|132.587|4.79|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC4)|2544.471|560.737|4.54|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC1)|634.574|134.966|4.70|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC1)|2545.129|561.498|4.53|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC3)|1910.900|419.365|4.56|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC3)|7662.603|1685.812|4.55|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC4)|2548.971|560.787|4.55|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC4)|10201.407|2237.552|4.56|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC1)|22.718|4.961|4.58|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC1)|91.496|19.831|4.61|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC3)|67.910|15.151|4.48|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC3)|279.612|59.792|4.68|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC4)|91.073|19.853|4.59|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC4)|374.641|79.155|4.73|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC1)|67.704|15.008|4.51|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC1)|279.229|60.088|4.65|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC3)|208.156|44.426|4.69|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC3)|849.501|180.848|4.70|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC4)|279.642|59.728|4.68|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC4)|1129.826|242.880|4.65|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC1)|155.585|33.354|4.66|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC1)|634.090|134.995|4.70|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC3)|474.931|99.598|4.77|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC3)|1910.519|413.138|4.62|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC4)|635.026|135.155|4.70|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC4)|2560.167|560.838|4.56|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC1)|634.893|134.883|4.71|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC1)|2548.166|560.831|4.54|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC3)|1911.392|419.816|4.55|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC3)|7646.634|1677.988|4.56|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC4)|2560.637|560.805|4.57|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC4)|10227.044|2249.458|4.55|

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-05-28 14:25:53 +03:00
Maksim Shabunin
224b9ee33f RISC-V: updated intrin_rvv071.hpp to work with modern toolchain 2.8.0
- intrinsics implementation (071) reworked to use modern RVV intrinsics syntax
- cmake toolchain file (071) now allows selecting from predefined configurations

Co-authored-by: Fang Sun <fangsun@linux.alibaba.com>
2024-01-17 20:34:12 +03:00
llh721113
a30c987f87 feat: RVP052 Optimization for DNN int8layers 2023-12-21 14:51:41 +08:00
HAN Liutong
f2962e5875
Merge pull request #24305 from hanliutong:toolchain
cmake: Fix riscv-gnu toolchain file. #24305

cmake(3.22.1) failed without the keyword `PATHS` on my device when I manually set `TOOLCHAIN_COMPILER_LOCATION_HINT` in command. And this patch is going to fix this issue.

[CMake Doc](https://cmake.org/cmake/help/latest/command/find_program.html):
> find_program (
>           <VAR>
>           name | NAMES name1 [name2 ...] [NAMES_PER_DIR]
>           [HINTS [path | ENV var]... ]
>           [PATHS [path | ENV var]... ]

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2023-09-25 13:06:22 +03:00
Maksim Shabunin
9cfced4650 RISC-V: fix hardcoded options in RVV 0.7.1 toolchain file 2023-03-27 18:42:56 +03:00
Alexander Alekhin
941d89e06d cmake: fix RISC-V toolchains
- RVV options are moved to configuration scripts instead of toolchains
2022-12-09 12:02:28 +00:00
Alexander Alekhin
50e66a2e53 riscv: use /opt/riscv path in toolchain
- align with defaults from https://github.com/riscv-collab/riscv-gnu-toolchain
2022-09-26 10:50:01 +00:00
Dmitry Kurtaev
e0b21dccf1 Detect RISC-V compiler from PATH environment variable 2022-09-21 11:40:54 +03:00
HAN Liutong
0ef803950b
Merge pull request #22179 from hanliutong:new-rvv
[GSoC] New universal intrinsic backend for RVV

* Add new rvv backend (partially implemented).

* Modify the framework of Universal Intrinsic.

* Add CV_SIMD macro guards to current UI code.

* Use vlanes() instead of nlanes.

* Modify the UI test.

* Enable the new RVV (scalable) backend.

* Remove whitespace.

* Rename and some others modify.

* Update intrin.hpp but still not work on AVX/SSE

* Update conditional compilation macros.

* Use static variable for vlanes.

* Use max_nlanes for array defining.
2022-07-19 20:02:00 +03:00
HAN Liutong
12338c1dc4 Update clang toolchain for RVV. 2022-02-16 16:01:38 +08:00
HAN Liutong
4935b14539
Merge pull request #21012 from hanliutong:rvv_clang
Update RVV backend for using Clang.

* Update cmake file of clang.

* Modify the RVV optimization on DNN to adapt to clang.

* Modify intrin_rvv: Disable some existing types.

* Modify intrin_rvv: Reinterpret instead of load&cast.

* Modify intrin_rvv: Update load&store without cast.

* Modify intrin_rvv: Rename vfredsum to fredosum.

* Modify intrin_rvv: Rewrite Check all/any by using vpopc.

* Modify intrin_rvv: Use reinterpret instead of c-style casting.

* Remove all macros which is not used in v_reinterpret

* Rename vpopc to vcpop according to spec.
2021-12-03 15:13:24 +00:00
Alexander Alekhin
4ff76cad2a cmake: fix cross-compilation problems
- unexpected pkg-config module (we should not use host binary)
- bump cmake_minimum_required to 3.5 in toolchain files
2021-08-05 11:42:58 +00:00
Zhang Yin
3a15a3821a Update RISC-V back-end to RVV 0.10 2021-06-18 15:44:38 +08:00
HAN Liutong
8bd5405228 Fix RVV toolchain conflicts. 2021-05-30 16:00:18 +08:00
damonyu1989
5f637e5a02
Merge pull request #19778 from damonyu1989:master-riscv-0.7.1
* Add the support for riscv64 vector 0.7.1.

* fixed GCC warnings

* cleaned whitespaces

* Remove the worning by the use of internal API of compiler.

* Update the license header.

* removed trailing whitespaces

Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@me.com>
Co-authored-by: yulj <linjie.ylj@alibaba-inc.com>
Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@gmail.com>
2021-05-25 20:15:12 +03:00
Vadim Pisarevsky
e4ed1d08f4 Merge pull request #18462 from joy2myself:riscv_toolchian 2020-12-02 18:38:17 +00:00
Zhangyin
673e4e20f0 Added RISC-V backend of universal intrinsics 2020-12-02 14:25:03 +08:00
ZhangYin
b91e701f35 modified rvv option for clang to match LLVM upstream 2020-09-29 14:15:11 +08:00
Zhangyin
ff4c3873f2 Added cmake toolchain for RISC-V with clang.
- Added cross compile cmake file for target riscv64-clang
- Extended cmake for RISC-V and added instruction checks
- Created intrin_rvv.hpp with C++ version universal intrinsics
2020-08-03 20:18:56 +08:00
Alexander Smorkalov
7228d2a824 Added initial version of cmake toolchain for RISC-V architecture. 2020-04-27 12:42:38 +03:00
Alexander Alekhin
f0ffc52435 fix files permissions 2020-04-13 04:29:55 +00:00
Brian Wignall
f9c514b391 Fix spelling typos
backport commit 659ffaddb4
2019-12-27 12:46:53 +00:00
mipsopen-fwu
b1ea91d8bd Merge pull request #15422 from mipsopen-fwu:msa-dev
* Added MSA implementations for mips platforms. Intrinsics for MSA and build scripts for MIPS platforms are added.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Removed some unused code in mips.toolchain.cmake.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Added comments for mips toolchain configuration and disabled compiling warnings for libpng.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Fixed the build error of unsupported opcode 'pause' when mips isa_rev is less than 2.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Removed FP16 related item in MSA option defines in OpenCVCompilerOptimizations.cmake.
2. Use CV_CPU_COMPILE_MSA instead of __mips_msa for MSA feature check in cv_cpu_dispatch.h.
3. Removed hasSIMD128() in intrin_msa.hpp.
4. Define CPU_MSA as 150.
Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Removed unnecessary CV_SIMD128_64F guarding in intrin_msa.hpp.
2. Removed unnecessary CV_MSA related code block in dotProd_8u().

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* 1. Defined CPU_MSA_FLAGS_ON as "-mmsa".
2. Removed CV_SIMD128_64F guardings in intrin_msa.hpp.

Signed-off-by: Fei Wu <fwu@wavecomp.com>

* Removed unused msa_mlal_u16() and msa_mlal_s16 from msa_macros.h.

Signed-off-by: Fei Wu <fwu@wavecomp.com>
2019-09-20 19:52:48 +03:00
Alexander Alekhin
c85e44697e cmake: drop unconditional forcing of CMAKE_SKIP_RPATH=TRUE
CMake "cache" entry for CMAKE_SKIP_RPATH is in the end of this file
2018-09-02 21:48:10 +00:00
luz.paz
d47b1f3b70 Misc. ./apps ./doc ./platoforms typos
Found via `codespell -q 3 --skip="./3rdparty" -I ../opencv-whitelist.txt`
2018-02-08 13:04:34 -05:00
Alexander Alekhin
98bd0a06dc cmake: fix gnu.toolchain
- CMAKE_FIND_ROOT_PATH_MODE_PROGRAM: ONLY => NEVER
- update handling of CMAKE_FIND_ROOT_PATH_MODE_* variables
2018-01-30 18:29:18 +00:00
Sayed Adel
2dc76d5009 cmake: Added Power toolchain 2017-10-28 17:50:43 +00:00
Alexander Alekhin
fddc9c5839 cmake: ARM toolchain: find ld/ar 2016-11-10 18:00:32 +03:00
Alexander Alekhin
cc121e06de cmake: ARM toolchain: do not force compiler version 2016-11-10 17:58:42 +03:00
Alexander Alekhin
73e16dbc3f cmake: update arm toolchain 2016-06-28 20:06:59 +03:00
Vladislav Vinogradov
5978ef1df5 Fix arm linux toolchain file
Use find_program() to set CMAKE_C_COMPILER and CMAKE_CXX_COMPILER.

Originally the variables was set to compiler name, without absolute path.
CMake 3.0 tries to find them in CMAKE_FIND_ROOT_PATH (since 
CMAKE_FIND_ROOT_PATH_MODE_PROGRAM was set to ONLY) and fails.
2015-01-19 10:43:49 +03:00
Alexander Smorkalov
9941c6710d NEON instruction set control unified for regular and cross-compiler builds. 2013-12-20 16:02:37 +04:00
Alexander Smorkalov
f85cf5bdd9 Build fixes. Build scrips reorganized. 2013-05-28 12:27:56 +04:00
Vladislav Vinogradov
78c924baad removed obsolete CARMA toolchain and CMake variable 2013-02-14 16:27:17 +04:00
Alexander Smorkalov
192ee15520 Code review notes fixed;
HardFP and SoftFP toolchains joined to one;
RPATH skiping added.
2013-02-08 12:42:03 +04:00
Alexander Smorkalov
b81f0887f0 Carma board support fixed. 2013-02-06 14:47:42 +04:00
Alexander Smorkalov
6645f50dd0 CUDA toolkit support added to crosscompilation toolchain. 2013-02-06 14:43:57 +04:00
Alexander Smorkalov
1120289fdb Compiler and linker flags for arm cross compilation fixed. 2013-02-06 14:43:57 +04:00
Alexander Smorkalov
3ed99b7700 Code review notes applied.
Toolchain for arm hardfp added.
2013-02-06 14:43:57 +04:00
Alexander Smorkalov
60f056061a Cross compilation toolchain for arm linux added. 2013-02-06 14:43:57 +04:00