dnn: add ONNX TopK #23279
Merge with https://github.com/opencv/opencv_extra/pull/1200
Partially fixes#22890 and #20258
To-do:
- [x] TopK forward impl
- [x] add tests
- [x] support Opset 1 & 10 if possible
- [ ] ~Support other backends~ (TopK has two outputs, which is not supported by other backends, such as openvino)
Perf:
M1 (time in millisecond)
| input shape | axis | dnn | ort |
| --------------- | ---- | ---- | ---- |
| (1000, 100) | 0 | 1.68 | 4.07 |
| (1000, 100) K5 | 0 | 1.13 | 0.12 |
| (1000, 100) | 1 | 0.96 | 0.77 |
| (100, 100, 100) | 0 | 10.00 | 31.13 |
| (100, 100, 100) | 1 | 7.33 | 9.17 |
| (100, 100, 100) | 2 | 7.52 | 9.48 |
M2 (time in milisecond)
| input shape | axis | dnn | ort |
| --------------- | ---- | ---- | ---- |
| (1000, 100) | 0 | 0.76 | 2.44 |
| (1000, 100) K5 | 0 | 0.68 | 0.07 |
| (1000, 100) | 1 | 0.41 | 0.50 |
| (100, 100, 100) | 0 | 4.83 | 17.52|
| (100, 100, 100) | 1 | 3.60 | 5.08 |
| (100, 100, 100) | 2 | 3.73 | 5.10 |
ONNXRuntime performance testing script: https://gist.github.com/fengyuentau/a119f94fd16721ec9974b8c7b0a45d4c
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn: optimize activations with v_exp #25881
Merge with https://github.com/opencv/opencv_extra/pull/1191.
This PR optimizes the following activations:
- [x] Swish
- [x] Mish
- [x] Elu
- [x] Celu
- [x] Selu
- [x] HardSwish
### Performance (Updated on 2024-07-18)
#### AmLogic A311D2 (ARM Cortex A73 + A53)
```
Geometric mean (ms)
Name of Test activations activations.patch activations.patch
vs
activations
(x-factor)
Celu::Layer_Elementwise::OCV/CPU 115.859 27.930 4.15
Elu::Layer_Elementwise::OCV/CPU 27.846 27.003 1.03
Gelu::Layer_Elementwise::OCV/CPU 0.657 0.602 1.09
HardSwish::Layer_Elementwise::OCV/CPU 31.885 6.781 4.70
Mish::Layer_Elementwise::OCV/CPU 35.729 32.089 1.11
Selu::Layer_Elementwise::OCV/CPU 61.955 27.850 2.22
Swish::Layer_Elementwise::OCV/CPU 30.819 26.688 1.15
```
#### Apple M1
```
Geometric mean (ms)
Name of Test activations activations.patch activations.patch
vs
activations
(x-factor)
Celu::Layer_Elementwise::OCV/CPU 16.184 2.118 7.64
Celu::Layer_Elementwise::OCV/CPU_FP16 16.280 2.123 7.67
Elu::Layer_Elementwise::OCV/CPU 9.123 1.878 4.86
Elu::Layer_Elementwise::OCV/CPU_FP16 9.085 1.897 4.79
Gelu::Layer_Elementwise::OCV/CPU 0.089 0.081 1.11
Gelu::Layer_Elementwise::OCV/CPU_FP16 0.086 0.074 1.17
HardSwish::Layer_Elementwise::OCV/CPU 1.560 1.555 1.00
HardSwish::Layer_Elementwise::OCV/CPU_FP16 1.536 1.523 1.01
Mish::Layer_Elementwise::OCV/CPU 6.077 2.476 2.45
Mish::Layer_Elementwise::OCV/CPU_FP16 5.990 2.496 2.40
Selu::Layer_Elementwise::OCV/CPU 11.351 1.976 5.74
Selu::Layer_Elementwise::OCV/CPU_FP16 11.533 1.985 5.81
Swish::Layer_Elementwise::OCV/CPU 4.687 1.890 2.48
Swish::Layer_Elementwise::OCV/CPU_FP16 4.715 1.873 2.52
```
#### Intel i7-12700K
```
Geometric mean (ms)
Name of Test activations activations.patch activations.patch
vs
activations
(x-factor)
Celu::Layer_Elementwise::OCV/CPU 17.106 3.560 4.81
Elu::Layer_Elementwise::OCV/CPU 5.064 3.478 1.46
Gelu::Layer_Elementwise::OCV/CPU 0.036 0.035 1.04
HardSwish::Layer_Elementwise::OCV/CPU 2.914 2.893 1.01
Mish::Layer_Elementwise::OCV/CPU 3.820 3.529 1.08
Selu::Layer_Elementwise::OCV/CPU 10.799 3.593 3.01
Swish::Layer_Elementwise::OCV/CPU 3.651 3.473 1.05
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
* added v_erf and implemented gelu acceleration via vectorization
* remove anonymous v_erf and use v_erf from intrin_math
* enable perf for ov and cuda backend
Added int32, int64 support and type inference to dnn #24411
**Added a type inference to dnn similar to the shape inference, added int32 and int64 support.**
- Added getTypes method for layers that calculates layer outputs types and internals types from inputs types (Similar to getMemoryShapes). By default outputs and internals types = input[0] type
- Added type inference pipeline similar to shape inference pipeline. LayersShapes struct (that is used in shape inference pipeline) now contains both shapes and types
- All layers output blobs are now allocated using the calculated types from the type inference.
- Inputs and constants with int32 and int64 types are not automatically converted into float32 now.
- Added int32 and int64 support for all the layers with indexing and for all the layers required in tests.
Added int32 and int64 support for CUDA:
- Added host<->device data moving for int32 and int64
- Added int32 and int64 support for several layers (just slightly modified CUDA C++ templates)
Passed all the accuracy tests on CPU, OCL, OCL_FP16, CUDA, CUDA_FP16. (except RAFT model)
**CURRENT PROBLEMS**:
- ONNX parser always converts int64 constants and layers attributes to int32, so some models with int64 constants doesn't work (e.g. RAFT). The solution is to disable int64->int32 conversion and fix attributes reading in a lot of ONNX layers parsers (https://github.com/opencv/opencv/issues/25102)
- I didn't add type inference and int support to VULCAN, so it doesn't work at all now.
- Some layers don't support int yet, so some unknown models may not work.
**CURRENT WORKAROUNDS**:
- CPU arg_layer indides are implemented in int32 followed by a int32->int64 conversion (the master branch has the same workaround with int32->float conversion)
- CPU and OCL pooling_layer indices are implemented in float followed by a float->int64 conversion
- CPU gather_layer indices are implemented in int32, so int64 indices are converted to int32 (the master branch has the same workaround with float->int32 conversion)
**DISABLED TESTS**:
- RAFT model
**REMOVED TESTS**:
- Greater_input_dtype_int64 (because it doesn't fit ONNX rules, the whole test is just comparing float tensor with int constant)
**TODO IN NEXT PULL REQUESTS**:
- Add int64 support for ONNX parser
- Add int support for more layers
- Add int support for OCL (currently int layers just run on CPU)
- Add int tests
- Add int support for other backends
Vulkan backend for NaryEltwiseLayer in DNN module #24768
We improve Vulkan backend for ``NaryEltwiseLayer`` in DNN module by:
- add a basic framework for Vulkan backend in ``NaryEltwiseLayer``
- add a compute shader for binary forwarding (an imitation of what has been done in native OpenCV backend including broadcasting and eltwise-operation)
- typo fixed:
- Wrong info output in ``context.cpp``
Currently, our implementation (or all layers supporting Vulkan backend) runs pretty slow on discrete GPUs basically due to IO cost in function ``copyToHost``, and we are going to fix that by
- find out the best ``VkMemoryProperty`` for various discrete GPUs
- prevent ``copyToHost`` in middle layers during forwarding, (i.e keep data in GPU memory)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Co-authored-by: IskXCr <IskXCr@outlook.com>
dnn onnx: add group norm layer #24610
dnn onnx: add group norm layer
Todo:
- [x] speed up by multi-threading
- [x] add perf
- [x] add backend: OpenVINO
- [x] add backend: CUDA
- [x] add backend: OpenCL (no fp16)
- [ ] add backend: CANN
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>
dnn: add attention layer #24476Resolves#24609
Merge with: https://github.com/opencv/opencv_extra/pull/1128.
Attention operator spec from onnxruntime: https://github.com/microsoft/onnxruntime/blob/v1.16.1/docs/ContribOperators.md#com.microsoft.Attention.
TODO:
- [x] benchmark (before this PR vs. with this PR vs. ORT).
- [x] Layer fusion: Take care Slice with end=INT64_MAX.
- [x] Layer fusion: match more potential attention (VIT) patterns.
- [x] Single-head attention is supported.
- [x] Test AttentionSubgraph fusion.
- [x] Add acc tests for VIT_B_32 and VitTrack
- [x] Add perf tests for VIT_B_32 and VitTrack
## Benchmarks
Platform: Macbook Air M1.
### Attention Subgraph
Input scale: [1, 197, 768].
| | mean (ms) | median (ms) | min (ms) |
| ---------------------- | --------- | ----------- | -------- |
| w/ Attention (this PR) | 3.75 | 3.68 | 3.22 |
| w/o Attention | 9.06 | 9.01 | 8.24 |
| ORT (python) | 4.32 | 2.63 | 2.50 |
### ViTs
All data in millisecond (ms).
| ViTs | With Attention | Without Attention | ORT |
| -------- | -------------- | ----------------- | ------ |
| vit_b_16 | 302.77 | 365.35 | 109.70 |
| vit_b_32 | 89.92 | 116.22 | 30.36 |
| vit_l_16 | 1593.32 | 1730.74 | 419.92 |
| vit_l_32 | 468.11 | 577.41 | 134.12 |
| VitTrack | 3.80 | 3.87 | 2.25 |
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn onnx: add instance norm layer #24378
Resolves https://github.com/opencv/opencv/issues/24377
Relates https://github.com/opencv/opencv/pull/24092#discussion_r1349841644
| Perf | multi-thread | single-thread |
| - | - | - |
| x: [2, 64, 180, 240] | 3.95ms | 11.12ms |
Todo:
- [x] speed up by multi-threading
- [x] add perf
- [x] add backend: OpenVINO
- [x] add backend: CUDA
- [x] add backend: OpenCL (no fp16)
- [ ] add backend: CANN (will be done via https://github.com/opencv/opencv/pull/24462)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
```
force_builders=Linux OpenCL,Win64 OpenCL,Custom
buildworker:Custom=linux-4
build_image:Custom=ubuntu:18.04
modules_filter:Custom=none
disable_ipp:Custom=ON
```
* improve and refactor softmax layer
* fix building error
* compatible region layer
* fix axisStep when disable SIMD
* fix dynamic array
* try to fix error
* use nlanes from VTraits
* move axisBias to srcOffset
* fix bug caused by axisBias
* remove macro
* replace #ifdef with #if for CV_SIMD
GSoC Add ONNX Support for GatherElements #24092
Merge with: https://github.com/opencv/opencv_extra/pull/1082
Adds support to the ONNX operator GatherElements [operator docs](https://github.com/onnx/onnx/blob/main/docs/Operators.md#GatherElements)
Added tests to opencv_extra at pull request https://github.com/opencv/opencv_extra/pull/1082
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Resolve uncovered CUDA dnn layer #24080
### Pull Request Readiness Checklist
* Gelu activation layer on CUDA
* Try to relax GEMM from ONNX
resolves https://github.com/opencv/opencv/issues/24064
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn: add layer normalization for vision transformers
* add layer norm onnx parser, impl and tests
* add onnx graph simplifier for layer norm expanded
* handle the case when constants are of type Initializer
* add test case for layer norm expanded with initializers
* use CV_Assert & CV_CheckType in place of CV_Assert_N; use forward_fallback for OCL_FP16
* use const ref / ref in parameters of invoker::run; extract inner const if from nested loop; use size_t in place of ull
* template hasBias
* remove trailing whitespace
* use pointer parameter with null check; move normSize division & mean_square division outside of loop; use std::max to ensure positive value before std::sqrt
* refactor implementation, optimize parallel_for
* disable layer norm expanded
* remove the removal of layer norm optional outputs
Reimplementation of Element-wise layers with broadcasting support
* init
* semi-working initial version
* add small_vector
* wip
* remove smallvec
* add nary function
* replace auto with Mat in lambda expr used in transform
* uncomment asserts
* autobuffer shape_buf & step_buf
* fix a missing bracket
* fixed a missing addLayer in parseElementWise
* solve one-dimensional broadcast
* remove pre_broadcast_transform for the case of two constants; fix missing constBlobsExtraInfo when addConstant is called
* one autobuffer for step & shape
* temporal fix for the missing original dimension information
* fix parseUnsqueeze when it gets a 1d tensor constant
* support sum/mean/min/max with only one input
* reuse old code to handle cases of two non-constant inputs
* add condition to handle div & mul of two non-constant inputs
* use || instead of or
* remove trainling spaces
* enlarge buf in binary_forward to contain other buffer
* use autobuffer in nary_forward
* generate data randomly and add more cases for perf
* add op and, or & xor
* update perf_dnn
* remove some comments
* remove legacy; add two ONNX conformance tests in filter
* move from cpu_denylist to all_denylist
* adjust parsing for inputs>=2
Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>