dnn: optimize activations with v_exp #25881
Merge with https://github.com/opencv/opencv_extra/pull/1191.
This PR optimizes the following activations:
- [x] Swish
- [x] Mish
- [x] Elu
- [x] Celu
- [x] Selu
- [x] HardSwish
### Performance (Updated on 2024-07-18)
#### AmLogic A311D2 (ARM Cortex A73 + A53)
```
Geometric mean (ms)
Name of Test activations activations.patch activations.patch
vs
activations
(x-factor)
Celu::Layer_Elementwise::OCV/CPU 115.859 27.930 4.15
Elu::Layer_Elementwise::OCV/CPU 27.846 27.003 1.03
Gelu::Layer_Elementwise::OCV/CPU 0.657 0.602 1.09
HardSwish::Layer_Elementwise::OCV/CPU 31.885 6.781 4.70
Mish::Layer_Elementwise::OCV/CPU 35.729 32.089 1.11
Selu::Layer_Elementwise::OCV/CPU 61.955 27.850 2.22
Swish::Layer_Elementwise::OCV/CPU 30.819 26.688 1.15
```
#### Apple M1
```
Geometric mean (ms)
Name of Test activations activations.patch activations.patch
vs
activations
(x-factor)
Celu::Layer_Elementwise::OCV/CPU 16.184 2.118 7.64
Celu::Layer_Elementwise::OCV/CPU_FP16 16.280 2.123 7.67
Elu::Layer_Elementwise::OCV/CPU 9.123 1.878 4.86
Elu::Layer_Elementwise::OCV/CPU_FP16 9.085 1.897 4.79
Gelu::Layer_Elementwise::OCV/CPU 0.089 0.081 1.11
Gelu::Layer_Elementwise::OCV/CPU_FP16 0.086 0.074 1.17
HardSwish::Layer_Elementwise::OCV/CPU 1.560 1.555 1.00
HardSwish::Layer_Elementwise::OCV/CPU_FP16 1.536 1.523 1.01
Mish::Layer_Elementwise::OCV/CPU 6.077 2.476 2.45
Mish::Layer_Elementwise::OCV/CPU_FP16 5.990 2.496 2.40
Selu::Layer_Elementwise::OCV/CPU 11.351 1.976 5.74
Selu::Layer_Elementwise::OCV/CPU_FP16 11.533 1.985 5.81
Swish::Layer_Elementwise::OCV/CPU 4.687 1.890 2.48
Swish::Layer_Elementwise::OCV/CPU_FP16 4.715 1.873 2.52
```
#### Intel i7-12700K
```
Geometric mean (ms)
Name of Test activations activations.patch activations.patch
vs
activations
(x-factor)
Celu::Layer_Elementwise::OCV/CPU 17.106 3.560 4.81
Elu::Layer_Elementwise::OCV/CPU 5.064 3.478 1.46
Gelu::Layer_Elementwise::OCV/CPU 0.036 0.035 1.04
HardSwish::Layer_Elementwise::OCV/CPU 2.914 2.893 1.01
Mish::Layer_Elementwise::OCV/CPU 3.820 3.529 1.08
Selu::Layer_Elementwise::OCV/CPU 10.799 3.593 3.01
Swish::Layer_Elementwise::OCV/CPU 3.651 3.473 1.05
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
* added v_erf and implemented gelu acceleration via vectorization
* remove anonymous v_erf and use v_erf from intrin_math
* enable perf for ov and cuda backend
Merge pull request #25861 from Abdurrahheem:ash/torch-attention-export-fix-4x
Support for Unflatten operation requred by Attention layer - 4.x #25861
### Pull Request Readiness Checklist
All test data and models for PR are located [#1190](https://github.com/opencv/opencv_extra/pull/1190)
This PR fixes issue reised when importing batched vanilla `Attention` layer from `PyTorch` via ONNX. Currently batched version of `Attention` layer in PyTorch [has unflatten operation inside](e3b3431c42/torch/nn/functional.py (L5500C17-L5500C31)). `unflatten` operation causes issue in `reshape` layer (see the Reshape_2 in the graph below) due to incorrect output of `slice` layer. This PR particularly fixes `slice` and `concat` layers to handle `unflatten` operation.
<img width="673" alt="image" src="https://github.com/opencv/opencv/assets/44877829/5b612b31-657a-47f1-83a4-0ac35a950abd">
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn: parallelize nary elementwise forward implementation & enable related conformance tests #25630
This PR introduces the following changes:
- [x] Parallelize binary forward impl
- [x] Parallelize ternary forward impl (Where)
- [x] Parallelize nary (Operator that can take >=1 operands)
- [x] Enable conformance tests if workable
## Performance
### i7-12700K, RAM 64GB, Ubuntu 22.04
```
Geometric mean (ms)
Name of Test opencv opencv opencv
perf perf perf
core.x64.0606 core.x64.0606 core.x64.0606
vs
opencv
perf
core.x64.0606
(x-factor)
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU 16.116 11.161 1.44
NCHW_NCHW_add::Layer_NaryEltwise::OCV/CPU 17.469 11.446 1.53
NCHW_NCHW_div::Layer_NaryEltwise::OCV/CPU 17.531 11.469 1.53
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU 28.653 13.682 2.09
NCHW_NCHW_greater::Layer_NaryEltwise::OCV/CPU 21.899 13.422 1.63
NCHW_NCHW_less::Layer_NaryEltwise::OCV/CPU 21.738 13.185 1.65
NCHW_NCHW_max::Layer_NaryEltwise::OCV/CPU 16.172 11.473 1.41
NCHW_NCHW_mean::Layer_NaryEltwise::OCV/CPU 16.309 11.565 1.41
NCHW_NCHW_min::Layer_NaryEltwise::OCV/CPU 16.166 11.454 1.41
NCHW_NCHW_mul::Layer_NaryEltwise::OCV/CPU 16.157 11.443 1.41
NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU 163.459 15.234 10.73
NCHW_NCHW_ref_div::Layer_NaryEltwise::OCV/CPU 10.880 10.868 1.00
NCHW_NCHW_ref_max::Layer_NaryEltwise::OCV/CPU 10.947 11.058 0.99
NCHW_NCHW_ref_min::Layer_NaryEltwise::OCV/CPU 10.948 10.910 1.00
NCHW_NCHW_ref_mul::Layer_NaryEltwise::OCV/CPU 10.874 10.871 1.00
NCHW_NCHW_ref_sum::Layer_NaryEltwise::OCV/CPU 10.971 10.920 1.00
NCHW_NCHW_sub::Layer_NaryEltwise::OCV/CPU 17.546 11.462 1.53
NCHW_NCHW_sum::Layer_NaryEltwise::OCV/CPU 16.175 11.475 1.41
NHWC_C::Layer_NaryEltwise::OCV/CPU 11.339 11.333 1.00
NHWC_H::Layer_NaryEltwise::OCV/CPU 16.154 11.102 1.46
```
### Apple M1, RAM 16GB, macOS 14.4.1
```
Geometric mean (ms)
Name of Test opencv opencv opencv
perf perf perf
core.m1.0606 core.m1.0606.patch core.m1.0606.patch
vs
opencv
perf
core.m1.0606
(x-factor)
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU 28.418 3.768 7.54
NCHW_NCHW_add::Layer_NaryEltwise::OCV/CPU 6.942 5.679 1.22
NCHW_NCHW_div::Layer_NaryEltwise::OCV/CPU 5.822 5.653 1.03
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU 5.751 5.628 1.02
NCHW_NCHW_greater::Layer_NaryEltwise::OCV/CPU 5.797 5.599 1.04
NCHW_NCHW_less::Layer_NaryEltwise::OCV/CPU 7.272 5.578 1.30
NCHW_NCHW_max::Layer_NaryEltwise::OCV/CPU 5.777 5.562 1.04
NCHW_NCHW_mean::Layer_NaryEltwise::OCV/CPU 5.819 5.559 1.05
NCHW_NCHW_min::Layer_NaryEltwise::OCV/CPU 5.830 5.574 1.05
NCHW_NCHW_mul::Layer_NaryEltwise::OCV/CPU 5.759 5.567 1.03
NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU 342.260 74.655 4.58
NCHW_NCHW_ref_div::Layer_NaryEltwise::OCV/CPU 8.338 8.280 1.01
NCHW_NCHW_ref_max::Layer_NaryEltwise::OCV/CPU 8.359 8.309 1.01
NCHW_NCHW_ref_min::Layer_NaryEltwise::OCV/CPU 8.412 8.295 1.01
NCHW_NCHW_ref_mul::Layer_NaryEltwise::OCV/CPU 8.380 8.297 1.01
NCHW_NCHW_ref_sum::Layer_NaryEltwise::OCV/CPU 8.356 8.323 1.00
NCHW_NCHW_sub::Layer_NaryEltwise::OCV/CPU 6.818 5.561 1.23
NCHW_NCHW_sum::Layer_NaryEltwise::OCV/CPU 5.805 5.570 1.04
NHWC_C::Layer_NaryEltwise::OCV/CPU 3.834 4.817 0.80
NHWC_H::Layer_NaryEltwise::OCV/CPU 28.402 3.771 7.53
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Add sample support of YOLOv9 and YOLOv10 in OpenCV #25794
This PR adds sample support of [`YOLOv9`](https://github.com/WongKinYiu/yolov9) and [`YOLOv10`](https://github.com/THU-MIG/yolov10/tree/main)) in OpenCV. Models for this test are located in this [PR](https://github.com/opencv/opencv_extra/pull/1186).
**Running YOLOv10 using OpenCV.**
1. In oder to run `YOLOv10` one needs to cut off postporcessing with dynamic shapes from torch and then convert it to ONNX. If someone is looking for ready solution, there is [this forked branch](https://github.com/Abdurrahheem/yolov10/tree/ash/opencv-export) from official YOLOv10. Particularty follow this proceduce.
```bash
git clone git@github.com:Abdurrahheem/yolov10.git
conda create -n yolov10 python=3.9
conda activate yolov10
pip install -r requirements.txt
python export_opencv.py --model=<model-name> --imgsz=<input-img-size>
```
By default `model="yolov10s"` and `imgsz=(480,640)`. This will generate file `yolov10s.onnx`, which can be use for inference in OpenCV
2. For inference part on OpenCV. one can use `yolo_detector.cpp` [sample](https://github.com/opencv/opencv/blob/4.x/samples/dnn/yolo_detector.cpp). If you have followed above exporting procedure, then you can use following command to run the model.
``` bash
build opencv from source
cd build
./bin/example_dnn_yolo_detector --model=<path-to-yolov10s.onnx-file> --yolo=yolov10 --width=640 --height=480 --input=<path-to-image> --scale=0.003921568627 --padvalue=114
```
If you do not specify `--input` argument, OpenCV will grab first camera that is avaliable on your platform.
For more deatils on how to run the `yolo_detector.cpp` file see this [guide](https://docs.opencv.org/4.x/da/d9d/tutorial_dnn_yolo.html#autotoc_md443)
**Running YOLOv9 using OpenCV**
1. Export model following [official guide](https://github.com/WongKinYiu/yolov9)of the YOLOv9 repository. Particularly you can do following for converting.
```bash
git clone https://github.com/WongKinYiu/yolov9.git
cd yolov9
conda create -n yolov9 python=3.9
conda activate yolov9
pip install -r requirements.txt
wget https://github.com/WongKinYiu/yolov9/releases/download/v0.1/yolov9-t-converted.pt
python export.py --weights=./yolov9-t-converted.pt --include=onnx --img-size=(480,640)
```
This will generate <yolov9-t-converted.onnx> file.
2. Inference on OpenCV.
```bash
build opencv from source
cd build
./bin/example_dnn_yolo_detector --model=<path-to-yolov9-t-converted.onnx> --yolo=yolov9 --width=640 --height=480 --scale=0.003921568627 --padvalue=114 --path=<path-to-image>
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn: add DepthToSpace and SpaceToDepth #25779
We are working on updating WeChat QRCode module. One of the new models is a fully convolutional model and hence it should be able to run with different input shapes. However, it has an operator `DepthToSpace`, which is parsed as a subgraph of `Reshape -> Permute -> Reshape` with a fixed shape getting during parsing. The subgraph itself is not a problem, but the true problem is the subgraph with a fixed input and output shape regardless input changes. This does not allow the model to run with different input shapes.
Solution is to add a dedicated layer for DepthtoSpace and SpaceToDepth.
Backend support:
- [x] CPU
- [x] CUDA
- [x] OpenCL
- [x] OpenVINO
- [x] CANN
- [x] TIMVX
- ~Vulkan~ (missing fundamental tools, like permutation and reshape)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Support Global_Pool_2D ops in .tflite model #25613
### Pull Request Readiness Checklist
**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1180
This PR adds support for `GlobalAveragePooling2D` and `GlobalMaxPool2D` on the TFlite backend. When the k`eep_dims` option is enabled, the output is a 2D tensor, necessitating the inclusion of an additional flatten layer. Additionally, the names of these layers have been updated to match the output tensor names generated by `generate.py` from the opencv_extra repository.
- [X] I agree to contribute to the project under Apache 2 License.
- [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [X] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [X] The feature is well documented and sample code can be built with the project CMake
Support Transpose op in TFlite #25297
**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1168
The purpose of this PR is to introduce support for the Transpose op in TFlite format and to add a shape comparison between the output tensors and the references. In some occasional cases, the shape of the output tensor is `[1,4,1,1]`, while the shape of the reference tensor is `[1,4]`. Consequently, the norm check incorrectly reports that the test has passed, as the residual is zero.
Below is a Python script for generating testing data. The generated data can be integrated into the repo `opencv_extra`.
```python
import numpy as np
import tensorflow as tf
PREFIX_TFL = '/path/to/opencv_extra/testdata/dnn/tflite/'
def generator(input_tensor, model, saved_name):
# convert keras model to .tflite format
converter = tf.lite.TFLiteConverter.from_keras_model(model)
#converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.optimizations = [None]
tflite_model = converter.convert()
with open(f'{PREFIX_TFL}/{saved_name}.tflite', 'wb') as f:
f.write(tflite_model)
# save the input tensor to .npy
if input_tensor.ndim == 4:
opencv_tensor = np.transpose(input_tensor, (0,3,1,2))
else:
opencv_tensor = input_tensor
opencv_tensor = np.copy(opencv_tensor, order='C').astype(np.float32)
np.save(f'{PREFIX_TFL}/{saved_name}_inp.npy', opencv_tensor)
# generate output tenosr and save it to .npy
mat_out = model(input_tensor).numpy()
mat_out = np.copy(mat_out, order='C').astype(np.float32)
if mat_out.ndim == 4:
mat_out = np.transpose(mat_out, (0,3,1,2))
interpreter = tf.lite.Interpreter(model_content=tflite_model)
out_name = interpreter.get_output_details()[0]['name']
np.save(f'{PREFIX_TFL}/{saved_name}_out_{out_name}.npy', mat_out)
def build_transpose():
model_name = "keras_permute"
mat_in = np.array([[[1,2,3], [4,5,6]]], dtype=np.float32)
model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(2,3)))
model.add(tf.keras.layers.Permute((2,1)))
model.summary()
generator(mat_in, model, model_name)
if __name__ == '__main__':
build_transpose()
```
### Pull Request Readiness Checklist
- [x] I agree to contribute to the project under Apache 2 License.
- [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [X] The feature is well documented and sample code can be built with the project CMake
Fixed OpenVINO gemm layer #25518
Fixed OpenVINO gemm layer
The problem was that our layer didn't properly handle all the possible gemm options in OpenVINO mode
Fixes#25472
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Fixed ONNX range layer #25414
Partially address https://github.com/opencv/opencv/issues/25363
Fixed ONNX range layer. It should support any input type.
Added tests (extra [PR](https://github.com/opencv/opencv_extra/pull/1170))
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
[BugFix] dnn (ONNX): Foce dropping constant inputs in parseClip if they are shared #25319
Resolves https://github.com/opencv/opencv/issues/25278
Merge with https://github.com/opencv/opencv_extra/pull/1165
In Gold-YOLO ,`Div` has a constant input `B=6` which is then parsed into a `Const` layer in the ONNX importer, but `Clip` also has the shared constant input `max=6` which is already a `Const` layer and then connected to `Elementwise` layer. This should not happen because in the `forward()` of `Elementwise` layer, the legacy code goes through and apply activation to each input. More details on https://github.com/opencv/opencv/issues/25278#issuecomment-2032199630.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Ownership check in TFLite importer #25312
### Pull Request Readiness Checklist
resolves https://github.com/opencv/opencv/issues/25310
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Merge with https://github.com/opencv/opencv_extra/pull/1158
Todo:
- [x] Fix Attention pattern recognition.
- [x] Handle other backends.
Benchmark:
"VIT_B_32 OCV/CPU", M1, results in milliseconds.
| Model | 4.x | This PR |
| - | - | - |
| VIT_B_32 OCV/CPU | 87.66 | **83.83** |
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Release convolution weightsMat after usage #25181
### Pull Request Readiness Checklist
related (but not resolved): https://github.com/opencv/opencv/issues/24134
Minor memory footprint improvement. Also, adds a test for VmHWM.
RAM top memory usage (-230MB)
| YOLOv3 (237MB file) | 4.x | PR |
|---------------------|---------|---------|
| no winograd | 808 MB | 581 MB |
| winograd | 1985 MB | 1750 MB |
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Fixed ReduceMean layer behaviour #25120
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
a93c31e3c9/onnxruntime/core/providers/cpu/reduction/reduction_ops.cc (L433-L443)
Removed all pre-C++11 code, workarounds, and branches #23736
This removes a bunch of pre-C++11 workrarounds that are no longer necessary as C++11 is now required.
It is a nice clean up and simplification.
* No longer unconditionally #include <array> in cvdef.h, include explicitly where needed
* Removed deprecated CV_NODISCARD, already unused in the codebase
* Removed some pre-C++11 workarounds, and simplified some backwards compat defines
* Removed CV_CXX_STD_ARRAY
* Removed CV_CXX_MOVE_SEMANTICS and CV_CXX_MOVE
* Removed all tests of CV_CXX11, now assume it's always true. This allowed removing a lot of dead code.
* Updated some documentation consequently.
* Removed all tests of CV_CXX11, now assume it's always true
* Fixed links.
---------
Co-authored-by: Maksim Shabunin <maksim.shabunin@gmail.com>
Co-authored-by: Alexander Smorkalov <alexander.smorkalov@xperience.ai>
dnn onnx: add group norm layer #24610
dnn onnx: add group norm layer
Todo:
- [x] speed up by multi-threading
- [x] add perf
- [x] add backend: OpenVINO
- [x] add backend: CUDA
- [x] add backend: OpenCL (no fp16)
- [ ] add backend: CANN
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>
dnn: no layer norm fusion if axes.back() is not the axis of last dimension #24808
Merge with https://github.com/opencv/opencv_extra/pull/1137
Resolves https://github.com/opencv/opencv/issues/24797
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
dnn onnx: add mod #24765
Resolves https://github.com/opencv/opencv/issues/23174
TODO:
- [x] enable some conformance tests
- [x] add backends
- [x] CANN
- [x] OpenVINO
- [x] CUDA
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn onnx: support constaint inputs in einsum importer #24753
Merge with https://github.com/opencv/opencv_extra/pull/1132.
Resolves https://github.com/opencv/opencv/issues/24697
Credits to @LaurentBerger.
---
This is a workaround. I suggest to get input shapes and calculate the output shapes in `getMemoryShapes` so as to keep the best compatibility. It is not always robust getting shapes during the importer stage and we should avoid that as much as possible.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Fixes#22747. Support [crop] configuration for DarkNet #24384
Request for comments. This is my first PR.
**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1112
resolves https://github.com/opencv/opencv/issues/22747
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
dnn: add attention layer #24476Resolves#24609
Merge with: https://github.com/opencv/opencv_extra/pull/1128.
Attention operator spec from onnxruntime: https://github.com/microsoft/onnxruntime/blob/v1.16.1/docs/ContribOperators.md#com.microsoft.Attention.
TODO:
- [x] benchmark (before this PR vs. with this PR vs. ORT).
- [x] Layer fusion: Take care Slice with end=INT64_MAX.
- [x] Layer fusion: match more potential attention (VIT) patterns.
- [x] Single-head attention is supported.
- [x] Test AttentionSubgraph fusion.
- [x] Add acc tests for VIT_B_32 and VitTrack
- [x] Add perf tests for VIT_B_32 and VitTrack
## Benchmarks
Platform: Macbook Air M1.
### Attention Subgraph
Input scale: [1, 197, 768].
| | mean (ms) | median (ms) | min (ms) |
| ---------------------- | --------- | ----------- | -------- |
| w/ Attention (this PR) | 3.75 | 3.68 | 3.22 |
| w/o Attention | 9.06 | 9.01 | 8.24 |
| ORT (python) | 4.32 | 2.63 | 2.50 |
### ViTs
All data in millisecond (ms).
| ViTs | With Attention | Without Attention | ORT |
| -------- | -------------- | ----------------- | ------ |
| vit_b_16 | 302.77 | 365.35 | 109.70 |
| vit_b_32 | 89.92 | 116.22 | 30.36 |
| vit_l_16 | 1593.32 | 1730.74 | 419.92 |
| vit_l_32 | 468.11 | 577.41 | 134.12 |
| VitTrack | 3.80 | 3.87 | 2.25 |
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Add blobrecttoimage #24539
### Pull Request Readiness Checklist
resolves https://github.com/opencv/opencv/issues/14659
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work #14659
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Make default axis of softmax in onnx "-1" without opset option #24613
Try to solve problem: https://github.com/opencv/opencv/pull/24476#discussion_r1404821158
**ONNX**
`opset <= 11` use 1
`else` use -1
**TensorFlow**
`TF version = 2.x` use -1
`else` use 1
**Darknet, Caffe, Torch**
use 1 by definition
Add test for YoloX Yolo v6 and Yolo v8 #24611
This PR adds test for YOLOv6 model (which was absent before)
The onnx weights for the test are located in this PR [ #1126](https://github.com/opencv/opencv_extra/pull/1126)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Add support for custom padding in DNN preprocessing #24569
This PR add functionality for specifying value in padding.
It is required in many preprocessing pipelines in DNNs such as Yolox object detection model
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Fix graph fusion with commutative ops #24577
### Pull Request Readiness Checklist
resolves https://github.com/opencv/opencv/issues/24568
**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1125
TODO:
- [x] replace recursive function to sequential
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Add yolov5n to tests #24553
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [ X] I agree to contribute to the project under Apache 2 License.
- [ X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ X] The PR is proposed to the proper branch
- [ X] There is a reference to the original bug report and related work
- [ X] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ X] The feature is well documented and sample code can be built with the project CMake
dnn: add openvino, opencl and cuda backends for layer normalization layer #24552
Merge after https://github.com/opencv/opencv/pull/24544.
Todo:
- [x] openvino
- [x] opencl
- [x] cuda
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake