dnn: add attention layer #24476Resolves#24609
Merge with: https://github.com/opencv/opencv_extra/pull/1128.
Attention operator spec from onnxruntime: https://github.com/microsoft/onnxruntime/blob/v1.16.1/docs/ContribOperators.md#com.microsoft.Attention.
TODO:
- [x] benchmark (before this PR vs. with this PR vs. ORT).
- [x] Layer fusion: Take care Slice with end=INT64_MAX.
- [x] Layer fusion: match more potential attention (VIT) patterns.
- [x] Single-head attention is supported.
- [x] Test AttentionSubgraph fusion.
- [x] Add acc tests for VIT_B_32 and VitTrack
- [x] Add perf tests for VIT_B_32 and VitTrack
## Benchmarks
Platform: Macbook Air M1.
### Attention Subgraph
Input scale: [1, 197, 768].
| | mean (ms) | median (ms) | min (ms) |
| ---------------------- | --------- | ----------- | -------- |
| w/ Attention (this PR) | 3.75 | 3.68 | 3.22 |
| w/o Attention | 9.06 | 9.01 | 8.24 |
| ORT (python) | 4.32 | 2.63 | 2.50 |
### ViTs
All data in millisecond (ms).
| ViTs | With Attention | Without Attention | ORT |
| -------- | -------------- | ----------------- | ------ |
| vit_b_16 | 302.77 | 365.35 | 109.70 |
| vit_b_32 | 89.92 | 116.22 | 30.36 |
| vit_l_16 | 1593.32 | 1730.74 | 419.92 |
| vit_l_32 | 468.11 | 577.41 | 134.12 |
| VitTrack | 3.80 | 3.87 | 2.25 |
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn onnx: add instance norm layer #24378
Resolves https://github.com/opencv/opencv/issues/24377
Relates https://github.com/opencv/opencv/pull/24092#discussion_r1349841644
| Perf | multi-thread | single-thread |
| - | - | - |
| x: [2, 64, 180, 240] | 3.95ms | 11.12ms |
Todo:
- [x] speed up by multi-threading
- [x] add perf
- [x] add backend: OpenVINO
- [x] add backend: CUDA
- [x] add backend: OpenCL (no fp16)
- [ ] add backend: CANN (will be done via https://github.com/opencv/opencv/pull/24462)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
```
force_builders=Linux OpenCL,Win64 OpenCL,Custom
buildworker:Custom=linux-4
build_image:Custom=ubuntu:18.04
modules_filter:Custom=none
disable_ipp:Custom=ON
```
* improve and refactor softmax layer
* fix building error
* compatible region layer
* fix axisStep when disable SIMD
* fix dynamic array
* try to fix error
* use nlanes from VTraits
* move axisBias to srcOffset
* fix bug caused by axisBias
* remove macro
* replace #ifdef with #if for CV_SIMD
GSoC Add ONNX Support for GatherElements #24092
Merge with: https://github.com/opencv/opencv_extra/pull/1082
Adds support to the ONNX operator GatherElements [operator docs](https://github.com/onnx/onnx/blob/main/docs/Operators.md#GatherElements)
Added tests to opencv_extra at pull request https://github.com/opencv/opencv_extra/pull/1082
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Resolve uncovered CUDA dnn layer #24080
### Pull Request Readiness Checklist
* Gelu activation layer on CUDA
* Try to relax GEMM from ONNX
resolves https://github.com/opencv/opencv/issues/24064
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn: add layer normalization for vision transformers
* add layer norm onnx parser, impl and tests
* add onnx graph simplifier for layer norm expanded
* handle the case when constants are of type Initializer
* add test case for layer norm expanded with initializers
* use CV_Assert & CV_CheckType in place of CV_Assert_N; use forward_fallback for OCL_FP16
* use const ref / ref in parameters of invoker::run; extract inner const if from nested loop; use size_t in place of ull
* template hasBias
* remove trailing whitespace
* use pointer parameter with null check; move normSize division & mean_square division outside of loop; use std::max to ensure positive value before std::sqrt
* refactor implementation, optimize parallel_for
* disable layer norm expanded
* remove the removal of layer norm optional outputs
Reimplementation of Element-wise layers with broadcasting support
* init
* semi-working initial version
* add small_vector
* wip
* remove smallvec
* add nary function
* replace auto with Mat in lambda expr used in transform
* uncomment asserts
* autobuffer shape_buf & step_buf
* fix a missing bracket
* fixed a missing addLayer in parseElementWise
* solve one-dimensional broadcast
* remove pre_broadcast_transform for the case of two constants; fix missing constBlobsExtraInfo when addConstant is called
* one autobuffer for step & shape
* temporal fix for the missing original dimension information
* fix parseUnsqueeze when it gets a 1d tensor constant
* support sum/mean/min/max with only one input
* reuse old code to handle cases of two non-constant inputs
* add condition to handle div & mul of two non-constant inputs
* use || instead of or
* remove trainling spaces
* enlarge buf in binary_forward to contain other buffer
* use autobuffer in nary_forward
* generate data randomly and add more cases for perf
* add op and, or & xor
* update perf_dnn
* remove some comments
* remove legacy; add two ONNX conformance tests in filter
* move from cpu_denylist to all_denylist
* adjust parsing for inputs>=2
Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>