Commit Graph

28 Commits

Author SHA1 Message Date
Alexander Smorkalov
100db1bc0b Merge branch 4.x 2024-08-28 15:06:19 +03:00
Yuantao Feng
347d673a87
Merge pull request #23279 from fengyuentau:add_topk
dnn: add ONNX TopK #23279

Merge with https://github.com/opencv/opencv_extra/pull/1200

Partially fixes #22890 and #20258

To-do:

- [x] TopK forward impl
- [x] add tests
- [x] support Opset 1 & 10 if possible
- [ ] ~Support other backends~ (TopK has two outputs, which other backends such as OpenVINO do not support)


Perf:

M1 (time in milliseconds)

| input shape     | axis | dnn  | ort  |
| --------------- | ---- | ---- | ---- |
| (1000, 100)     | 0    | 1.68 | 4.07 |
| (1000, 100) K5  | 0    | 1.13 | 0.12 |
| (1000, 100)     | 1    | 0.96 | 0.77 |
| (100, 100, 100) | 0    | 10.00 | 31.13 |
| (100, 100, 100) | 1    | 7.33 | 9.17 |
| (100, 100, 100) | 2    | 7.52 | 9.48 |

M2 (time in milliseconds)

| input shape     | axis | dnn  | ort  |
| --------------- | ---- | ---- | ---- |
| (1000, 100)     | 0    | 0.76 | 2.44 |
| (1000, 100) K5  | 0    | 0.68 | 0.07 |
| (1000, 100)     | 1    | 0.41 | 0.50 |
| (100, 100, 100) | 0    | 4.83 | 17.52|
| (100, 100, 100) | 1    | 3.60 | 5.08 |
| (100, 100, 100) | 2    | 3.73 | 5.10 |

ONNXRuntime performance testing script: https://gist.github.com/fengyuentau/a119f94fd16721ec9974b8c7b0a45d4c
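
The gist above is the actual benchmark; the snippet below is only a minimal sketch of the same idea (the file name `topk.onnx` is made up here, timings depend on the machine, and the dnn part assumes a build that already contains this PR):

```
import time
import numpy as np
import onnx
import onnxruntime as ort
from onnx import helper, TensorProto

# A single TopK node; K is supplied as an initializer (opset >= 10 style).
shape, axis, k = [1000, 100], 0, 50
node = helper.make_node("TopK", ["x", "k"], ["values", "indices"], axis=axis, largest=1)
graph = helper.make_graph(
    [node], "topk_graph",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, shape)],
    [helper.make_tensor_value_info("values", TensorProto.FLOAT, None),
     helper.make_tensor_value_info("indices", TensorProto.INT64, None)],
    initializer=[helper.make_tensor("k", TensorProto.INT64, [1], [k])])
onnx.save(helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)]), "topk.onnx")

x = np.random.rand(*shape).astype(np.float32)
sess = ort.InferenceSession("topk.onnx", providers=["CPUExecutionProvider"])
sess.run(None, {"x": x})                                   # warm-up
t0 = time.perf_counter()
for _ in range(100):
    sess.run(None, {"x": x})
print("ort:", (time.perf_counter() - t0) / 100 * 1e3, "ms")

# The dnn side of the comparison, assuming the build contains the new TopK layer:
import cv2
net = cv2.dnn.readNetFromONNX("topk.onnx")
net.setInput(x)
values, indices = net.forward(["values", "indices"])
```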

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-08-21 17:03:24 +03:00
Alexander Smorkalov
672a662dff Merge branch 4.x 2024-07-26 09:10:36 +03:00
Yuantao Feng
23b244d3a3
Merge pull request #25881 from fengyuentau:dnn/cpu/optimize_activations_with_v_exp
dnn: optimize activations with v_exp #25881

Merge with https://github.com/opencv/opencv_extra/pull/1191.

This PR optimizes the following activations:

- [x] Swish
- [x] Mish
- [x] Elu
- [x] Celu
- [x] Selu
- [x] HardSwish
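
For reference, the mathematical definitions of these activations (plain numpy, not the v_exp SIMD code from this PR; default alpha/gamma values assumed). Most of them boil down to one or two exp evaluations, which is why a vectorized exp pays off:

```
import numpy as np

def swish(x):       return x / (1.0 + np.exp(-x))                 # x * sigmoid(x)
def mish(x):        return x * np.tanh(np.log1p(np.exp(x)))       # x * tanh(softplus(x))
def elu(x, a=1.0):  return np.where(x > 0, x, a * (np.exp(x) - 1.0))
def celu(x, a=1.0): return np.maximum(0, x) + np.minimum(0, a * (np.exp(x / a) - 1.0))
def selu(x, a=1.67326324, g=1.05070098):                          # ONNX default alpha/gamma
    return g * np.where(x > 0, x, a * (np.exp(x) - 1.0))
def hardswish(x):   return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0   # clamp-based, no exp

x = np.linspace(-4, 4, 9, dtype=np.float32)
print(swish(x), mish(x), elu(x), celu(x), selu(x), hardswish(x), sep="\n")
```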

### Performance (Updated on 2024-07-18)

#### AmLogic A311D2 (ARM Cortex A73 + A53)

```
Geometric mean (ms)

            Name of Test              activations activations.patch activations.patch
                                                                              vs
                                                                         activations
                                                                          (x-factor)
Celu::Layer_Elementwise::OCV/CPU        115.859          27.930              4.15
Elu::Layer_Elementwise::OCV/CPU          27.846          27.003              1.03
Gelu::Layer_Elementwise::OCV/CPU         0.657           0.602               1.09
HardSwish::Layer_Elementwise::OCV/CPU    31.885          6.781               4.70
Mish::Layer_Elementwise::OCV/CPU         35.729          32.089              1.11
Selu::Layer_Elementwise::OCV/CPU         61.955          27.850              2.22
Swish::Layer_Elementwise::OCV/CPU        30.819          26.688              1.15
```

#### Apple M1

```
Geometric mean (ms)

               Name of Test                activations activations.patch activations.patch
                                                                                   vs
                                                                              activations
                                                                               (x-factor)
Celu::Layer_Elementwise::OCV/CPU              16.184          2.118               7.64
Celu::Layer_Elementwise::OCV/CPU_FP16         16.280          2.123               7.67
Elu::Layer_Elementwise::OCV/CPU               9.123           1.878               4.86
Elu::Layer_Elementwise::OCV/CPU_FP16          9.085           1.897               4.79
Gelu::Layer_Elementwise::OCV/CPU              0.089           0.081               1.11
Gelu::Layer_Elementwise::OCV/CPU_FP16         0.086           0.074               1.17
HardSwish::Layer_Elementwise::OCV/CPU         1.560           1.555               1.00
HardSwish::Layer_Elementwise::OCV/CPU_FP16    1.536           1.523               1.01
Mish::Layer_Elementwise::OCV/CPU              6.077           2.476               2.45
Mish::Layer_Elementwise::OCV/CPU_FP16         5.990           2.496               2.40
Selu::Layer_Elementwise::OCV/CPU              11.351          1.976               5.74
Selu::Layer_Elementwise::OCV/CPU_FP16         11.533          1.985               5.81
Swish::Layer_Elementwise::OCV/CPU             4.687           1.890               2.48
Swish::Layer_Elementwise::OCV/CPU_FP16        4.715           1.873               2.52
```

#### Intel i7-12700K

```
Geometric mean (ms)

            Name of Test              activations activations.patch activations.patch
                                                                    vs
                                                               activations
                                                                (x-factor)
Celu::Layer_Elementwise::OCV/CPU        17.106       3.560         4.81
Elu::Layer_Elementwise::OCV/CPU          5.064       3.478         1.46
Gelu::Layer_Elementwise::OCV/CPU         0.036       0.035         1.04
HardSwish::Layer_Elementwise::OCV/CPU    2.914       2.893         1.01
Mish::Layer_Elementwise::OCV/CPU         3.820       3.529         1.08
Selu::Layer_Elementwise::OCV/CPU        10.799       3.593         3.01
Swish::Layer_Elementwise::OCV/CPU        3.651       3.473         1.05
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-07-19 16:03:19 +03:00
Alexander Smorkalov
fc9208cff5 Merge branch 4.x 2024-07-17 10:08:16 +03:00
Yuantao Feng
e3858cc5a3
Merge pull request #25147 from fengyuentau:dnn/elementwise_layers/speedup
* added v_erf and implemented gelu acceleration via vectorization

* remove anonymous v_erf and use v_erf from intrin_math

* enable perf for ov and cuda backend
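
For reference, the exact-GELU identity that the vectorized erf path computes (a plain numpy/scipy sketch, not the intrinsic code):

```
import numpy as np
from scipy.special import erf

def gelu(x):
    # Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

print(gelu(np.linspace(-3, 3, 7, dtype=np.float32)))
```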
2024-07-08 14:24:36 +03:00
alexlyulkov
1d1faaabef
Merge pull request #24411 from alexlyulkov:al/dnn-type-inference
Added int32, int64 support and type inference to dnn #24411

**Added type inference to dnn, similar to the shape inference, along with int32 and int64 support.**

- Added a getTypes method for layers that calculates output and internal blob types from input types (similar to getMemoryShapes). By default, output and internal types equal the type of input[0].
- Added a type inference pipeline similar to the shape inference pipeline. The LayersShapes struct (used in the shape inference pipeline) now contains both shapes and types.
- All layer output blobs are now allocated using the types calculated by the type inference (a minimal user-side sketch follows this list).
- Inputs and constants with int32 and int64 types are no longer automatically converted to float32.
- Added int32 and int64 support for all layers with indexing and for all layers required in tests.
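
A minimal user-side sketch of the effect (the file name `argmax.onnx` is made up; with this PR the output of an integer-typed operator such as ArgMax is expected to keep an integer dtype instead of being converted to float32):

```
import numpy as np
import onnx
import cv2
from onnx import helper, TensorProto

# Single ArgMax node: its ONNX output type is int64.
node = helper.make_node("ArgMax", ["x"], ["y"], axis=1, keepdims=0)
graph = helper.make_graph(
    [node], "argmax_graph",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 5])],
    [helper.make_tensor_value_info("y", TensorProto.INT64, [1])])
onnx.save(helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)]), "argmax.onnx")

net = cv2.dnn.readNetFromONNX("argmax.onnx")
net.setInput(np.random.rand(1, 5).astype(np.float32))
out = net.forward()
print(out.dtype)  # expected: an integer dtype with type inference; float32 before this PR
```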

Added  int32 and int64 support for CUDA:
- Added host<->device data moving for int32 and int64
- Added int32 and int64 support for several layers (just slightly modified CUDA C++ templates)

Passed all the accuracy tests on CPU, OCL, OCL_FP16, CUDA, CUDA_FP16 (except the RAFT model).

**CURRENT PROBLEMS**:
- The ONNX parser always converts int64 constants and layer attributes to int32, so some models with int64 constants don't work (e.g. RAFT). The solution is to disable the int64->int32 conversion and fix attribute reading in many ONNX layer parsers (https://github.com/opencv/opencv/issues/25102)
- I didn't add type inference and int support to Vulkan, so it doesn't work at all for now.
- Some layers don't support int yet, so some untested models may not work.

**CURRENT WORKAROUNDS**:
- CPU arg_layer indices are implemented in int32, followed by an int32->int64 conversion (the master branch has the same workaround with an int32->float conversion)
- CPU and OCL pooling_layer indices are implemented in float, followed by a float->int64 conversion
- CPU gather_layer indices are implemented in int32, so int64 indices are converted to int32 (the master branch has the same workaround with a float->int32 conversion)

**DISABLED TESTS**:
- RAFT model

**REMOVED TESTS**:
- Greater_input_dtype_int64 (it doesn't fit the ONNX rules; the whole test just compares a float tensor with an int constant)

**TODO IN NEXT PULL REQUESTS**:
- Add int64 support for ONNX parser
- Add int support for more layers
- Add int support for OCL (currently int layers just run on CPU)
- Add int tests
- Add int support for other backends
2024-03-01 17:07:38 +03:00
Haosonn
87f749277d
Merge pull request #24768 from Haosonn:pre-pr-2
Vulkan backend for NaryEltwiseLayer in DNN module #24768

We improve the Vulkan backend for ``NaryEltwiseLayer`` in the DNN module by:

- adding a basic framework for the Vulkan backend in ``NaryEltwiseLayer``
- adding a compute shader for binary forwarding (an imitation of what has been done in the native OpenCV backend, including broadcasting and eltwise operations)
- fixing a typo:
  - wrong info output in ``context.cpp``

Currently, our implementation (as with all layers supporting the Vulkan backend) runs rather slowly on discrete GPUs, mainly due to the I/O cost of ``copyToHost``. We plan to fix that by:

- finding out the best ``VkMemoryProperty`` for various discrete GPUs

- preventing ``copyToHost`` in middle layers during forwarding (i.e. keeping data in GPU memory)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

Co-authored-by: IskXCr <IskXCr@outlook.com>
2024-01-29 18:41:49 +03:00
Alexander Smorkalov
ac4c0bffac
Merge pull request #24813 from fengyuentau:speedup_scatter
dnn: improve scatter and scatterND speed with multi-threading
2024-01-17 17:16:50 +03:00
jimmylaw21
a7fa1e6f4b
Merge pull request #24610 from jimmylaw21:dnn-onnx-add-group-norm-layer
dnn onnx: add group norm layer #24610

Todo:

- [x] speed up by multi-threading
- [x] add perf
- [x] add backend: OpenVINO
- [x] add backend: CUDA
- [x] add backend: OpenCL (no fp16)
- [ ] add backend: CANN
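
For reference, a plain numpy sketch of the group-norm computation added here (written with per-channel scale and bias, as in the common GroupNorm formulation; the exact scale/bias shapes follow the ONNX operator definition):

```
import numpy as np

def group_norm(x, gamma, beta, num_groups, eps=1e-5):
    # x: (N, C, H, W); gamma, beta: per-channel scale and bias of shape (C,)
    n, c, h, w = x.shape
    xg = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = xg.mean(axis=(2, 3, 4), keepdims=True)
    var = xg.var(axis=(2, 3, 4), keepdims=True)
    y = ((xg - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)
    return y * gamma.reshape(1, c, 1, 1) + beta.reshape(1, c, 1, 1)

x = np.random.rand(2, 8, 4, 4).astype(np.float32)
print(group_norm(x, np.ones(8, np.float32), np.zeros(8, np.float32), num_groups=4).shape)
```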

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>
2024-01-12 15:13:26 +03:00
fengyuentau
13127365e2 better comment 2024-01-08 11:55:06 +08:00
Yuantao Feng
b7d70613e4 fix failed assertion in debug build 2024-01-05 18:33:01 +00:00
fengyuentau
2ed97b9ef3 multi-threaded scatterND and refactor perf 2024-01-05 18:15:59 +08:00
fengyuentau
63cde0b90d multi-threaded scatter and refactor perf 2024-01-05 17:24:09 +08:00
Alexander Alekhin
f49b26182b dnn(test): skip very long debug tests, reduce test time 2023-12-25 08:44:06 +00:00
Yuantao Feng
0521a3a384
Merge pull request #24476 from fengyuentau:attention_layer
dnn: add attention layer #24476

Resolves #24609

Merge with: https://github.com/opencv/opencv_extra/pull/1128.

Attention operator spec from onnxruntime: https://github.com/microsoft/onnxruntime/blob/v1.16.1/docs/ContribOperators.md#com.microsoft.Attention.
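
For orientation, the core computation the fused layer accelerates is scaled dot-product attention; a single-head numpy sketch is below (the actual com.microsoft.Attention operator additionally packs the QKV projection, bias and optional masks):

```
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def single_head_attention(q, k, v):
    # q, k, v: (seq_len, head_dim)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (seq_len, seq_len)
    return softmax(scores, axis=-1) @ v          # (seq_len, head_dim)

q = np.random.rand(197, 64).astype(np.float32)   # 197 tokens, as in the [1, 197, 768] input
k = np.random.rand(197, 64).astype(np.float32)
v = np.random.rand(197, 64).astype(np.float32)
print(single_head_attention(q, k, v).shape)      # (197, 64)
```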

TODO:
- [x] benchmark (before this PR vs. with this PR vs. ORT).
- [x] Layer fusion: Take care of Slice with end=INT64_MAX.
- [x] Layer fusion: match more potential attention (ViT) patterns.
    - [x] Single-head attention is supported.
- [x] Test AttentionSubgraph fusion.
- [x] Add acc tests for VIT_B_32 and VitTrack
- [x] Add perf tests for VIT_B_32 and VitTrack

## Benchmarks

Platform: Macbook Air M1.

### Attention Subgraph

Input shape: [1, 197, 768].

|                        | mean (ms) | median (ms) | min (ms) |
| ---------------------- | --------- | ----------- | -------- |
| w/ Attention (this PR) | 3.75      | 3.68        | 3.22     |
| w/o Attention          | 9.06      | 9.01        | 8.24     |
| ORT (python)           | 4.32      | 2.63        | 2.50     |

### ViTs

All data in millisecond (ms).

| ViTs     | With Attention | Without Attention | ORT    |
| -------- | -------------- | ----------------- | ------ |
| vit_b_16 | 302.77         | 365.35            | 109.70 |
| vit_b_32 | 89.92          | 116.22            | 30.36  |
| vit_l_16 | 1593.32        | 1730.74           | 419.92 |
| vit_l_32 | 468.11         | 577.41            | 134.12 |
| VitTrack | 3.80           | 3.87              | 2.25   |

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-12-20 19:35:07 +03:00
Yuantao Feng
ee0822dc4d
Merge pull request #24378 from fengyuentau:instance_norm
dnn onnx: add instance norm layer #24378

Resolves https://github.com/opencv/opencv/issues/24377
Related: https://github.com/opencv/opencv/pull/24092#discussion_r1349841644

| Perf | multi-thread | single-thread |
| - | - | - |
| x: [2, 64, 180, 240] | 3.95ms | 11.12ms |
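
The multi-thread vs. single-thread numbers above can be reproduced along these lines (a sketch only: the file name `inorm.onnx` is made up, it assumes a build containing this PR, and timings depend on the machine):

```
import numpy as np
import onnx
import cv2
from onnx import helper, TensorProto

c = 64
node = helper.make_node("InstanceNormalization", ["x", "scale", "bias"], ["y"], epsilon=1e-5)
graph = helper.make_graph(
    [node], "inorm_graph",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, [2, c, 180, 240])],
    [helper.make_tensor_value_info("y", TensorProto.FLOAT, [2, c, 180, 240])],
    initializer=[helper.make_tensor("scale", TensorProto.FLOAT, [c], [1.0] * c),
                 helper.make_tensor("bias", TensorProto.FLOAT, [c], [0.0] * c)])
onnx.save(helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)]), "inorm.onnx")

x = np.random.rand(2, c, 180, 240).astype(np.float32)
for threads in (cv2.getNumberOfCPUs(), 1):         # multi-thread, then single-thread
    cv2.setNumThreads(threads)
    net = cv2.dnn.readNetFromONNX("inorm.onnx")
    net.setInput(x)
    net.forward()                                  # warm-up
    tm = cv2.TickMeter()
    tm.start(); net.forward(); tm.stop()
    print(threads, "thread(s):", tm.getTimeMilli(), "ms")
```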

Todo:

- [x] speed up by multi-threading
- [x] add perf
- [x] add backend: OpenVINO
- [x] add backend: CUDA
- [x] add backend: OpenCL (no fp16)
- [ ] add backend: CANN (will be done via https://github.com/opencv/opencv/pull/24462)


### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

```
force_builders=Linux OpenCL,Win64 OpenCL,Custom
buildworker:Custom=linux-4
build_image:Custom=ubuntu:18.04
modules_filter:Custom=none
disable_ipp:Custom=ON
```
2023-11-07 12:59:10 +03:00
Wanli
ed52f7feea
Improve and refactor softmax layer (#24466)
* improve and refactor softmax layer

* fix building error

* compatible region layer

* fix axisStep when disable SIMD

* fix dynamic array

* try to fix error

* use nlanes from VTraits

* move axisBias to srcOffset

* fix bug caused by axisBias

* remove macro

* replace #ifdef with #if for CV_SIMD
2023-11-06 04:48:32 +03:00
Aser Atawya
240b245105
Merge pull request #24092 from Aser-Abdelfatah:GSoC_Support_GatherElements_ONNX
GSoC Add ONNX Support for GatherElements #24092

Merge with: https://github.com/opencv/opencv_extra/pull/1082
Adds support for the ONNX operator GatherElements ([operator docs](https://github.com/onnx/onnx/blob/main/docs/Operators.md#GatherElements)).
Added tests to opencv_extra at pull request https://github.com/opencv/opencv_extra/pull/1082
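
For reference, GatherElements has the same semantics as numpy's take_along_axis (a small sketch, not the dnn implementation):

```
import numpy as np

data = np.array([[1, 2], [3, 4]], dtype=np.float32)
indices = np.array([[0, 0], [1, 0]], dtype=np.int64)

# GatherElements with axis=1 picks, for every position, data[i, indices[i, j]].
out = np.take_along_axis(data, indices, axis=1)
print(out)  # [[1. 1.]
            #  [4. 3.]]
```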

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-10-18 10:41:47 +03:00
Dmitry Kurtaev
d88ad46978 Remove explicit transB attribute from MatMul perf test 2023-08-18 15:10:14 +03:00
Dmitry Kurtaev
96f23e3da1
Merge pull request #24080 from dkurt:dnn_cuda_layers
Resolve uncovered CUDA dnn layer #24080

### Pull Request Readiness Checklist

* Gelu activation layer on CUDA
* Try to relax GEMM from ONNX

resolves https://github.com/opencv/opencv/issues/24064

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-08-03 09:13:42 +03:00
wanli
e4360294c5 make 'abcd op 1b11' broadcast support cuda 2023-04-23 17:46:50 +08:00
wanli
c8f5e228fc release MUL and ADD operator on CUDA 2023-02-10 19:33:59 +08:00
Yuantao Feng
4d918ba40b
Merge pull request #23047 from fengyuentau:layer_norm
dnn: add layer normalization for vision transformers

* add layer norm onnx parser, impl and tests

* add onnx graph simplifier for layer norm expanded

* handle the case when constants are of type Initializer

* add test case for layer norm expanded with initializers

* use CV_Assert & CV_CheckType in place of CV_Assert_N; use forward_fallback for OCL_FP16

* use const ref / ref in parameters of invoker::run; extract inner const if from nested loop; use size_t in place of ull

* template hasBias

* remove trailing whitespace

* use pointer parameter with null check; move normSize division & mean_square division outside of loop; use std::max to ensure positive value before std::sqrt

* refactor implementation, optimize parallel_for

* disable layer norm expanded

* remove the removal of layer norm optional outputs
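
For reference, a plain numpy sketch of the computation added here, following the ONNX LayerNormalization definition with axis=-1 as used in transformer blocks:

```
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize over the last axis; gamma and beta match that axis.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta

x = np.random.rand(1, 197, 768).astype(np.float32)   # a typical ViT token sequence
print(layer_norm(x, np.ones(768, np.float32), np.zeros(768, np.float32)).shape)
```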
2023-01-27 16:35:59 +03:00
zoom
11d492b0b9 Let part of the operators in nary_eltwise support cuda 2022-11-02 14:08:21 +08:00
fengyuentau
d24d8f2abe implementation of scatter and scatternd with conformance tests enabled 2022-10-17 11:30:32 +08:00
rogday
ed69bcae2d
Merge pull request #21865 from rogday:nary_eltwise_layers
Reimplementation of Element-wise layers with broadcasting support

* init

* semi-working initial version

* add small_vector

* wip

* remove smallvec

* add nary function

* replace auto with Mat in lambda expr used in transform

* uncomment asserts

* autobuffer shape_buf & step_buf

* fix a missing bracket

* fixed a missing addLayer in parseElementWise

* solve one-dimensional broadcast

* remove pre_broadcast_transform for the case of two constants; fix missing constBlobsExtraInfo when addConstant is called

* one autobuffer for step & shape

* temporal fix for the missing original dimension information

* fix parseUnsqueeze when it gets a 1d tensor constant

* support sum/mean/min/max with only one input

* reuse old code to handle cases of two non-constant inputs

* add condition to handle div & mul of two non-constant inputs

* use || instead of or

* remove trailing spaces

* enlarge buf in binary_forward to contain other buffer

* use autobuffer in nary_forward

* generate data randomly and add more cases for perf

* add op and, or & xor

* update perf_dnn

* remove some comments

* remove legacy; add two ONNX conformance tests in filter

* move from cpu_denylist to all_denylist

* adjust parsing for inputs>=2

Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>
2022-07-19 06:14:05 +03:00
Alexander Alekhin
81e027eef7 dnn: fix OpenCL implementation of Slice layer 2020-07-16 04:33:52 +00:00