opencv/modules
Yuantao Feng 0521a3a384
Merge pull request #24476 from fengyuentau:attention_layer
dnn: add attention layer #24476

Resolves #24609

Merge with: https://github.com/opencv/opencv_extra/pull/1128.

Attention operator spec from onnxruntime: https://github.com/microsoft/onnxruntime/blob/v1.16.1/docs/ContribOperators.md#com.microsoft.Attention.
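
For orientation, the computation that a fused attention layer replaces can be sketched in plain Python. This is a minimal illustration of unmasked multi-head self-attention with a packed QKV weight, in the spirit of the linked operator spec; it is not the code added by this PR. The helper names (`matmul`, `softmax`, `attention`) are made up for the example, and it assumes equal Q, K and V hidden sizes and a single sequence (no batch axis).

```python
import math

def matmul(a, b):
    """Multiply two 2-D matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(x, weight, bias, num_heads):
    """Unmasked multi-head self-attention for one sequence (illustrative only).

    x:      [seq_len, input_hidden]
    weight: [input_hidden, 3 * hidden], Q/K/V projections packed side by side
    bias:   [3 * hidden]
    Returns [seq_len, hidden]. Assumes Q, K and V share the same hidden size.
    """
    hidden = len(bias) // 3
    head_size = hidden // num_heads
    # One packed projection, then split the columns into Q, K and V.
    qkv = [[v + b for v, b in zip(row, bias)] for row in matmul(x, weight)]
    q = [row[:hidden] for row in qkv]
    k = [row[hidden:2 * hidden] for row in qkv]
    v = [row[2 * hidden:] for row in qkv]

    out = [[] for _ in x]
    for h in range(num_heads):
        lo, hi = h * head_size, (h + 1) * head_size
        qh = [row[lo:hi] for row in q]
        kh = [row[lo:hi] for row in k]
        vh = [row[lo:hi] for row in v]
        # Scaled dot-product scores: Q_h @ K_h^T / sqrt(head_size).
        kt = [list(col) for col in zip(*kh)]
        scores = [[s / math.sqrt(head_size) for s in row] for row in matmul(qh, kt)]
        probs = [softmax(row) for row in scores]
        # Weighted sum of values; heads are concatenated along the feature axis.
        for i, row in enumerate(matmul(probs, vh)):
            out[i].extend(row)
    return out
```

Without fusion, each of these steps (projection, split, transpose, scaling, softmax, second matmul) is typically a separate node in the exported graph; a fused layer runs them as a single operator.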

TODO:
- [x] benchmark (before this PR vs. with this PR vs. ORT).
- [x] Layer fusion: Take care of Slice with end=INT64_MAX.
- [x] Layer fusion: match more potential attention (ViT) patterns.
    - [x] Single-head attention is supported.
- [x] Test AttentionSubgraph fusion.
- [x] Add accuracy tests for VIT_B_32 and VitTrack.
- [x] Add performance tests for VIT_B_32 and VitTrack.
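
On the Slice item: ONNX Slice clamps out-of-range bounds to the axis length, so exporters commonly write `end=INT64_MAX` to mean "slice to the end of the axis". A pattern matcher that compares slice bounds therefore has to resolve such an end against the real dimension first. The sketch below shows only that resolution step for `step = 1`. `normalize_slice` is a hypothetical helper, not a function from this PR, and the sizes in the usage lines are illustrative.

```python
INT64_MAX = 2**63 - 1

def normalize_slice(start, end, dim_size):
    """Resolve ONNX-style Slice bounds (step = 1) against a concrete axis length."""
    # Negative indices count from the end of the axis.
    if start < 0:
        start += dim_size
    if end < 0:
        end += dim_size
    # Out-of-range bounds are clamped, so end=INT64_MAX means "to the end".
    start = min(max(start, 0), dim_size)
    end = min(max(end, 0), dim_size)
    return start, end

# Illustrative: splitting a packed QKV axis of width 3 * 768 into three parts.
print(normalize_slice(0, 768, 2304))
print(normalize_slice(768, 1536, 2304))
print(normalize_slice(1536, INT64_MAX, 2304))
```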

## Benchmarks

Platform: Macbook Air M1.

### Attention Subgraph

Input shape: [1, 197, 768].

|                        | mean (ms) | median (ms) | min (ms) |
| ---------------------- | --------- | ----------- | -------- |
| w/ Attention (this PR) | 3.75      | 3.68        | 3.22     |
| w/o Attention          | 9.06      | 9.01        | 8.24     |
| ORT (python)           | 4.32      | 2.63        | 2.50     |
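
The mean, median and min columns can be reproduced with a small timing harness along these lines. This is a generic sketch, not the script used for the numbers above: the `benchmark` helper, the warm-up count and the repeat count are assumptions.

```python
import statistics
import time

def benchmark(fn, warmup=3, repeats=20):
    """Time fn() and return mean, median and min latency in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are not timed
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e3)
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "min": min(samples),
    }

# Stand-in workload; for a model this would be a forward-pass callable.
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
print({name: round(value, 3) for name, value in stats.items()})
```

With OpenCV's Python bindings, the callable could be `net.forward` after loading a model with `cv.dnn.readNetFromONNX` and calling `net.setInput` on an input of the shape above; the model file and input data are left to the reader.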

### ViTs

All times are in milliseconds (ms).

| ViTs     | With Attention | Without Attention | ORT    |
| -------- | -------------- | ----------------- | ------ |
| vit_b_16 | 302.77         | 365.35            | 109.70 |
| vit_b_32 | 89.92          | 116.22            | 30.36  |
| vit_l_16 | 1593.32        | 1730.74           | 419.92 |
| vit_l_32 | 468.11         | 577.41            | 134.12 |
| VitTrack | 3.80           | 3.87              | 2.25   |
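
To read the table as ratios rather than absolute times, the relative gain from fusion and the remaining gap to ORT can be computed directly from the figures above. The snippet only does arithmetic on the published numbers; the `speedups` helper is introduced here for illustration.

```python
# Latencies in ms, copied from the table above:
# (with Attention, without Attention, ORT).
timings = {
    "vit_b_16": (302.77, 365.35, 109.70),
    "vit_b_32": (89.92, 116.22, 30.36),
    "vit_l_16": (1593.32, 1730.74, 419.92),
    "vit_l_32": (468.11, 577.41, 134.12),
    "VitTrack": (3.80, 3.87, 2.25),
}

def speedups(timings):
    """Return per-model (speedup from fusion, slowdown relative to ORT)."""
    return {
        name: (without / with_attn, with_attn / ort)
        for name, (with_attn, without, ort) in timings.items()
    }

for name, (gain, gap) in speedups(timings).items():
    print(f"{name}: {gain:.2f}x faster with fusion, {gap:.2f}x slower than ORT")
```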

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV.
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There are accuracy tests, performance tests and test data in the opencv_extra repository, if applicable.
      The patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-12-20 19:35:07 +03:00

| Module     | Latest commit                                                              | Date                      |
| ---------- | -------------------------------------------------------------------------- | ------------------------- |
| calib3d    | Merge pull request #24546 from thewoz:checkerboard                         | 2023-12-20 18:01:39 +03:00 |
| core       | Merge pull request #24136 from komakai:visionos_support                    | 2023-12-20 15:35:10 +03:00 |
| dnn        | Merge pull request #24476 from fengyuentau:attention_layer                 | 2023-12-20 19:35:07 +03:00 |
| features2d | Added Java bindings for BOWImgDescriptorExtractor constructor.             | 2023-10-31 11:23:47 +03:00 |
| flann      | Merge pull request #23109 from seanm:misc-warnings                         | 2023-10-06 13:33:21 +03:00 |
| gapi       | Merge pull request #24576 from AsyaPronina:ot_to_python                    | 2023-12-20 15:26:01 +03:00 |
| highgui    | Update window_QT.cpp                                                       | 2023-11-13 12:10:52 +03:00 |
| imgcodecs  | Merge pull request #24136 from komakai:visionos_support                    | 2023-12-20 15:35:10 +03:00 |
| imgproc    | Fix typo                                                                   | 2023-12-15 09:21:23 +08:00 |
| java       | Merge pull request #24685 from AleksandrPanov:fix_build_grandle            | 2023-12-12 09:11:22 +03:00 |
| js         | Merge pull request #24458 from laolaolulu:4.x                              | 2023-11-13 14:51:20 +03:00 |
| ml         | Merge pull request #23109 from seanm:misc-warnings                         | 2023-10-06 13:33:21 +03:00 |
| objc       | Merge pull request #24136 from komakai:visionos_support                    | 2023-12-20 15:35:10 +03:00 |
| objdetect  | Get code to compile without DNN                                            | 2023-12-08 10:54:59 +01:00 |
| photo      | Merge pull request #23109 from seanm:misc-warnings                         | 2023-10-06 13:33:21 +03:00 |
| python     | Enabled VAS OT in G-API Python interface                                   | 2023-12-19 17:51:59 +00:00 |
| stitching  | fix: supress GCC13 warnings (#24434)                                       | 2023-10-26 09:00:58 +03:00 |
| ts         | Merge pull request #23109 from seanm:misc-warnings                         | 2023-10-06 13:33:21 +03:00 |
| video      | Merge pull request #24461 from fengyuentau:tracker_vit_backend_target      | 2023-10-27 14:12:44 +03:00 |
| videoio    | Merge pull request #24136 from komakai:visionos_support                    | 2023-12-20 15:35:10 +03:00 |
| world      | cmake: use /INCREMENTAL:NO with MSVS 2015                                  | 2023-12-07 19:46:27 +00:00 |