DNN(ONNX): Enabled several OpenCL conformance tests #26053
The tests also work in 5.x
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn: add ONNX TopK #23279
Merge with https://github.com/opencv/opencv_extra/pull/1200
Partially fixes#22890 and #20258
To-do:
- [x] TopK forward impl
- [x] add tests
- [x] support Opset 1 & 10 if possible
- [ ] ~Support other backends~ (TopK has two outputs, which is not supported by other backends, such as openvino)
Perf:
M1 (time in millisecond)
| input shape | axis | dnn | ort |
| --------------- | ---- | ---- | ---- |
| (1000, 100) | 0 | 1.68 | 4.07 |
| (1000, 100) K5 | 0 | 1.13 | 0.12 |
| (1000, 100) | 1 | 0.96 | 0.77 |
| (100, 100, 100) | 0 | 10.00 | 31.13 |
| (100, 100, 100) | 1 | 7.33 | 9.17 |
| (100, 100, 100) | 2 | 7.52 | 9.48 |
M2 (time in milisecond)
| input shape | axis | dnn | ort |
| --------------- | ---- | ---- | ---- |
| (1000, 100) | 0 | 0.76 | 2.44 |
| (1000, 100) K5 | 0 | 0.68 | 0.07 |
| (1000, 100) | 1 | 0.41 | 0.50 |
| (100, 100, 100) | 0 | 4.83 | 17.52|
| (100, 100, 100) | 1 | 3.60 | 5.08 |
| (100, 100, 100) | 2 | 3.73 | 5.10 |
ONNXRuntime performance testing script: https://gist.github.com/fengyuentau/a119f94fd16721ec9974b8c7b0a45d4c
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
[GSoC] dnn: Blockwise quantization support #25644
This PR introduces blockwise quantization in DNN allowing the parsing of ONNX models quantized in blockwise style. In particular it modifies the `Quantize` and `Dequantize` operations. The related PR opencv/opencv_extra#1181 contains the test data.
Additional notes:
- The original quantization issue has been fixed. Previously, for 1D scale and zero-point, the operation applied was $y = int8(x/s - z)$ instead of $y = int8(x/s + z)$. Note that the operation was already correctly implemented when the scale and zero-point were scalars. The previous implementation failed the ONNX test cases, but now all have passed successfully. [Reference](https://github.com/onnx/onnx/blob/main/docs/Operators.md#QuantizeLinear)
- the function `block_repeat` broadcasts scale and zero-point to the input shape. It repeats all the elements of a given axis n times. This function generalizes the behavior of `repeat` from the core module which is defined just for 2 axis assuming `Mat` has 2 dimensions. If appropriate and useful, you might consider moving `block_repeat` to the core module.
- Now, the scale and zero-point can be taken as layer inputs. This increases the ONNX layers' coverage and enables us to run the ONNX test cases (previously disabled) being fully compliant with ONNX standards. Since they are now supported, I have enabled the test cases for: `test_dequantizelinear`, `test_dequantizelinear_axis`, `test_dequantizelinear_blocked`, `test_quantizelinear`, `test_quantizelinear_axis`, `test_quantizelinear_blocked` just in CPU backend. All of them pass successfully.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn: optimize activations with v_exp #25881
Merge with https://github.com/opencv/opencv_extra/pull/1191.
This PR optimizes the following activations:
- [x] Swish
- [x] Mish
- [x] Elu
- [x] Celu
- [x] Selu
- [x] HardSwish
### Performance (Updated on 2024-07-18)
#### AmLogic A311D2 (ARM Cortex A73 + A53)
```
Geometric mean (ms)
Name of Test activations activations.patch activations.patch
vs
activations
(x-factor)
Celu::Layer_Elementwise::OCV/CPU 115.859 27.930 4.15
Elu::Layer_Elementwise::OCV/CPU 27.846 27.003 1.03
Gelu::Layer_Elementwise::OCV/CPU 0.657 0.602 1.09
HardSwish::Layer_Elementwise::OCV/CPU 31.885 6.781 4.70
Mish::Layer_Elementwise::OCV/CPU 35.729 32.089 1.11
Selu::Layer_Elementwise::OCV/CPU 61.955 27.850 2.22
Swish::Layer_Elementwise::OCV/CPU 30.819 26.688 1.15
```
#### Apple M1
```
Geometric mean (ms)
Name of Test activations activations.patch activations.patch
vs
activations
(x-factor)
Celu::Layer_Elementwise::OCV/CPU 16.184 2.118 7.64
Celu::Layer_Elementwise::OCV/CPU_FP16 16.280 2.123 7.67
Elu::Layer_Elementwise::OCV/CPU 9.123 1.878 4.86
Elu::Layer_Elementwise::OCV/CPU_FP16 9.085 1.897 4.79
Gelu::Layer_Elementwise::OCV/CPU 0.089 0.081 1.11
Gelu::Layer_Elementwise::OCV/CPU_FP16 0.086 0.074 1.17
HardSwish::Layer_Elementwise::OCV/CPU 1.560 1.555 1.00
HardSwish::Layer_Elementwise::OCV/CPU_FP16 1.536 1.523 1.01
Mish::Layer_Elementwise::OCV/CPU 6.077 2.476 2.45
Mish::Layer_Elementwise::OCV/CPU_FP16 5.990 2.496 2.40
Selu::Layer_Elementwise::OCV/CPU 11.351 1.976 5.74
Selu::Layer_Elementwise::OCV/CPU_FP16 11.533 1.985 5.81
Swish::Layer_Elementwise::OCV/CPU 4.687 1.890 2.48
Swish::Layer_Elementwise::OCV/CPU_FP16 4.715 1.873 2.52
```
#### Intel i7-12700K
```
Geometric mean (ms)
Name of Test activations activations.patch activations.patch
vs
activations
(x-factor)
Celu::Layer_Elementwise::OCV/CPU 17.106 3.560 4.81
Elu::Layer_Elementwise::OCV/CPU 5.064 3.478 1.46
Gelu::Layer_Elementwise::OCV/CPU 0.036 0.035 1.04
HardSwish::Layer_Elementwise::OCV/CPU 2.914 2.893 1.01
Mish::Layer_Elementwise::OCV/CPU 3.820 3.529 1.08
Selu::Layer_Elementwise::OCV/CPU 10.799 3.593 3.01
Swish::Layer_Elementwise::OCV/CPU 3.651 3.473 1.05
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
* added v_erf and implemented gelu acceleration via vectorization
* remove anonymous v_erf and use v_erf from intrin_math
* enable perf for ov and cuda backend
Merge pull request #25861 from Abdurrahheem:ash/torch-attention-export-fix-4x
Support for Unflatten operation requred by Attention layer - 4.x #25861
### Pull Request Readiness Checklist
All test data and models for PR are located [#1190](https://github.com/opencv/opencv_extra/pull/1190)
This PR fixes issue reised when importing batched vanilla `Attention` layer from `PyTorch` via ONNX. Currently batched version of `Attention` layer in PyTorch [has unflatten operation inside](e3b3431c42/torch/nn/functional.py (L5500C17-L5500C31)). `unflatten` operation causes issue in `reshape` layer (see the Reshape_2 in the graph below) due to incorrect output of `slice` layer. This PR particularly fixes `slice` and `concat` layers to handle `unflatten` operation.
<img width="673" alt="image" src="https://github.com/opencv/opencv/assets/44877829/5b612b31-657a-47f1-83a4-0ac35a950abd">
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn: parallelize nary elementwise forward implementation & enable related conformance tests #25630
This PR introduces the following changes:
- [x] Parallelize binary forward impl
- [x] Parallelize ternary forward impl (Where)
- [x] Parallelize nary (Operator that can take >=1 operands)
- [x] Enable conformance tests if workable
## Performance
### i7-12700K, RAM 64GB, Ubuntu 22.04
```
Geometric mean (ms)
Name of Test opencv opencv opencv
perf perf perf
core.x64.0606 core.x64.0606 core.x64.0606
vs
opencv
perf
core.x64.0606
(x-factor)
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU 16.116 11.161 1.44
NCHW_NCHW_add::Layer_NaryEltwise::OCV/CPU 17.469 11.446 1.53
NCHW_NCHW_div::Layer_NaryEltwise::OCV/CPU 17.531 11.469 1.53
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU 28.653 13.682 2.09
NCHW_NCHW_greater::Layer_NaryEltwise::OCV/CPU 21.899 13.422 1.63
NCHW_NCHW_less::Layer_NaryEltwise::OCV/CPU 21.738 13.185 1.65
NCHW_NCHW_max::Layer_NaryEltwise::OCV/CPU 16.172 11.473 1.41
NCHW_NCHW_mean::Layer_NaryEltwise::OCV/CPU 16.309 11.565 1.41
NCHW_NCHW_min::Layer_NaryEltwise::OCV/CPU 16.166 11.454 1.41
NCHW_NCHW_mul::Layer_NaryEltwise::OCV/CPU 16.157 11.443 1.41
NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU 163.459 15.234 10.73
NCHW_NCHW_ref_div::Layer_NaryEltwise::OCV/CPU 10.880 10.868 1.00
NCHW_NCHW_ref_max::Layer_NaryEltwise::OCV/CPU 10.947 11.058 0.99
NCHW_NCHW_ref_min::Layer_NaryEltwise::OCV/CPU 10.948 10.910 1.00
NCHW_NCHW_ref_mul::Layer_NaryEltwise::OCV/CPU 10.874 10.871 1.00
NCHW_NCHW_ref_sum::Layer_NaryEltwise::OCV/CPU 10.971 10.920 1.00
NCHW_NCHW_sub::Layer_NaryEltwise::OCV/CPU 17.546 11.462 1.53
NCHW_NCHW_sum::Layer_NaryEltwise::OCV/CPU 16.175 11.475 1.41
NHWC_C::Layer_NaryEltwise::OCV/CPU 11.339 11.333 1.00
NHWC_H::Layer_NaryEltwise::OCV/CPU 16.154 11.102 1.46
```
### Apple M1, RAM 16GB, macOS 14.4.1
```
Geometric mean (ms)
Name of Test opencv opencv opencv
perf perf perf
core.m1.0606 core.m1.0606.patch core.m1.0606.patch
vs
opencv
perf
core.m1.0606
(x-factor)
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU 28.418 3.768 7.54
NCHW_NCHW_add::Layer_NaryEltwise::OCV/CPU 6.942 5.679 1.22
NCHW_NCHW_div::Layer_NaryEltwise::OCV/CPU 5.822 5.653 1.03
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU 5.751 5.628 1.02
NCHW_NCHW_greater::Layer_NaryEltwise::OCV/CPU 5.797 5.599 1.04
NCHW_NCHW_less::Layer_NaryEltwise::OCV/CPU 7.272 5.578 1.30
NCHW_NCHW_max::Layer_NaryEltwise::OCV/CPU 5.777 5.562 1.04
NCHW_NCHW_mean::Layer_NaryEltwise::OCV/CPU 5.819 5.559 1.05
NCHW_NCHW_min::Layer_NaryEltwise::OCV/CPU 5.830 5.574 1.05
NCHW_NCHW_mul::Layer_NaryEltwise::OCV/CPU 5.759 5.567 1.03
NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU 342.260 74.655 4.58
NCHW_NCHW_ref_div::Layer_NaryEltwise::OCV/CPU 8.338 8.280 1.01
NCHW_NCHW_ref_max::Layer_NaryEltwise::OCV/CPU 8.359 8.309 1.01
NCHW_NCHW_ref_min::Layer_NaryEltwise::OCV/CPU 8.412 8.295 1.01
NCHW_NCHW_ref_mul::Layer_NaryEltwise::OCV/CPU 8.380 8.297 1.01
NCHW_NCHW_ref_sum::Layer_NaryEltwise::OCV/CPU 8.356 8.323 1.00
NCHW_NCHW_sub::Layer_NaryEltwise::OCV/CPU 6.818 5.561 1.23
NCHW_NCHW_sum::Layer_NaryEltwise::OCV/CPU 5.805 5.570 1.04
NHWC_C::Layer_NaryEltwise::OCV/CPU 3.834 4.817 0.80
NHWC_H::Layer_NaryEltwise::OCV/CPU 28.402 3.771 7.53
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Add sample support of YOLOv9 and YOLOv10 in OpenCV #25794
This PR adds sample support of [`YOLOv9`](https://github.com/WongKinYiu/yolov9) and [`YOLOv10`](https://github.com/THU-MIG/yolov10/tree/main)) in OpenCV. Models for this test are located in this [PR](https://github.com/opencv/opencv_extra/pull/1186).
**Running YOLOv10 using OpenCV.**
1. In oder to run `YOLOv10` one needs to cut off postporcessing with dynamic shapes from torch and then convert it to ONNX. If someone is looking for ready solution, there is [this forked branch](https://github.com/Abdurrahheem/yolov10/tree/ash/opencv-export) from official YOLOv10. Particularty follow this proceduce.
```bash
git clone git@github.com:Abdurrahheem/yolov10.git
conda create -n yolov10 python=3.9
conda activate yolov10
pip install -r requirements.txt
python export_opencv.py --model=<model-name> --imgsz=<input-img-size>
```
By default `model="yolov10s"` and `imgsz=(480,640)`. This will generate file `yolov10s.onnx`, which can be use for inference in OpenCV
2. For inference part on OpenCV. one can use `yolo_detector.cpp` [sample](https://github.com/opencv/opencv/blob/4.x/samples/dnn/yolo_detector.cpp). If you have followed above exporting procedure, then you can use following command to run the model.
``` bash
build opencv from source
cd build
./bin/example_dnn_yolo_detector --model=<path-to-yolov10s.onnx-file> --yolo=yolov10 --width=640 --height=480 --input=<path-to-image> --scale=0.003921568627 --padvalue=114
```
If you do not specify `--input` argument, OpenCV will grab first camera that is avaliable on your platform.
For more deatils on how to run the `yolo_detector.cpp` file see this [guide](https://docs.opencv.org/4.x/da/d9d/tutorial_dnn_yolo.html#autotoc_md443)
**Running YOLOv9 using OpenCV**
1. Export model following [official guide](https://github.com/WongKinYiu/yolov9)of the YOLOv9 repository. Particularly you can do following for converting.
```bash
git clone https://github.com/WongKinYiu/yolov9.git
cd yolov9
conda create -n yolov9 python=3.9
conda activate yolov9
pip install -r requirements.txt
wget https://github.com/WongKinYiu/yolov9/releases/download/v0.1/yolov9-t-converted.pt
python export.py --weights=./yolov9-t-converted.pt --include=onnx --img-size=(480,640)
```
This will generate <yolov9-t-converted.onnx> file.
2. Inference on OpenCV.
```bash
build opencv from source
cd build
./bin/example_dnn_yolo_detector --model=<path-to-yolov9-t-converted.onnx> --yolo=yolov9 --width=640 --height=480 --scale=0.003921568627 --padvalue=114 --path=<path-to-image>
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn: add DepthToSpace and SpaceToDepth #25779
We are working on updating WeChat QRCode module. One of the new models is a fully convolutional model and hence it should be able to run with different input shapes. However, it has an operator `DepthToSpace`, which is parsed as a subgraph of `Reshape -> Permute -> Reshape` with a fixed shape getting during parsing. The subgraph itself is not a problem, but the true problem is the subgraph with a fixed input and output shape regardless input changes. This does not allow the model to run with different input shapes.
Solution is to add a dedicated layer for DepthtoSpace and SpaceToDepth.
Backend support:
- [x] CPU
- [x] CUDA
- [x] OpenCL
- [x] OpenVINO
- [x] CANN
- [x] TIMVX
- ~Vulkan~ (missing fundamental tools, like permutation and reshape)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Support Global_Pool_2D ops in .tflite model #25613
### Pull Request Readiness Checklist
**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1180
This PR adds support for `GlobalAveragePooling2D` and `GlobalMaxPool2D` on the TFlite backend. When the k`eep_dims` option is enabled, the output is a 2D tensor, necessitating the inclusion of an additional flatten layer. Additionally, the names of these layers have been updated to match the output tensor names generated by `generate.py` from the opencv_extra repository.
- [X] I agree to contribute to the project under Apache 2 License.
- [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [X] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [X] The feature is well documented and sample code can be built with the project CMake
Support Transpose op in TFlite #25297
**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1168
The purpose of this PR is to introduce support for the Transpose op in TFlite format and to add a shape comparison between the output tensors and the references. In some occasional cases, the shape of the output tensor is `[1,4,1,1]`, while the shape of the reference tensor is `[1,4]`. Consequently, the norm check incorrectly reports that the test has passed, as the residual is zero.
Below is a Python script for generating testing data. The generated data can be integrated into the repo `opencv_extra`.
```python
import numpy as np
import tensorflow as tf
PREFIX_TFL = '/path/to/opencv_extra/testdata/dnn/tflite/'
def generator(input_tensor, model, saved_name):
# convert keras model to .tflite format
converter = tf.lite.TFLiteConverter.from_keras_model(model)
#converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.optimizations = [None]
tflite_model = converter.convert()
with open(f'{PREFIX_TFL}/{saved_name}.tflite', 'wb') as f:
f.write(tflite_model)
# save the input tensor to .npy
if input_tensor.ndim == 4:
opencv_tensor = np.transpose(input_tensor, (0,3,1,2))
else:
opencv_tensor = input_tensor
opencv_tensor = np.copy(opencv_tensor, order='C').astype(np.float32)
np.save(f'{PREFIX_TFL}/{saved_name}_inp.npy', opencv_tensor)
# generate output tenosr and save it to .npy
mat_out = model(input_tensor).numpy()
mat_out = np.copy(mat_out, order='C').astype(np.float32)
if mat_out.ndim == 4:
mat_out = np.transpose(mat_out, (0,3,1,2))
interpreter = tf.lite.Interpreter(model_content=tflite_model)
out_name = interpreter.get_output_details()[0]['name']
np.save(f'{PREFIX_TFL}/{saved_name}_out_{out_name}.npy', mat_out)
def build_transpose():
model_name = "keras_permute"
mat_in = np.array([[[1,2,3], [4,5,6]]], dtype=np.float32)
model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(2,3)))
model.add(tf.keras.layers.Permute((2,1)))
model.summary()
generator(mat_in, model, model_name)
if __name__ == '__main__':
build_transpose()
```
### Pull Request Readiness Checklist
- [x] I agree to contribute to the project under Apache 2 License.
- [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [X] The feature is well documented and sample code can be built with the project CMake
Fixed OpenVINO gemm layer #25518
Fixed OpenVINO gemm layer
The problem was that our layer didn't properly handle all the possible gemm options in OpenVINO mode
Fixes#25472
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Fixed ONNX range layer #25414
Partially address https://github.com/opencv/opencv/issues/25363
Fixed ONNX range layer. It should support any input type.
Added tests (extra [PR](https://github.com/opencv/opencv_extra/pull/1170))
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
[BugFix] dnn (ONNX): Foce dropping constant inputs in parseClip if they are shared #25319
Resolves https://github.com/opencv/opencv/issues/25278
Merge with https://github.com/opencv/opencv_extra/pull/1165
In Gold-YOLO ,`Div` has a constant input `B=6` which is then parsed into a `Const` layer in the ONNX importer, but `Clip` also has the shared constant input `max=6` which is already a `Const` layer and then connected to `Elementwise` layer. This should not happen because in the `forward()` of `Elementwise` layer, the legacy code goes through and apply activation to each input. More details on https://github.com/opencv/opencv/issues/25278#issuecomment-2032199630.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Ownership check in TFLite importer #25312
### Pull Request Readiness Checklist
resolves https://github.com/opencv/opencv/issues/25310
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Merge with https://github.com/opencv/opencv_extra/pull/1158
Todo:
- [x] Fix Attention pattern recognition.
- [x] Handle other backends.
Benchmark:
"VIT_B_32 OCV/CPU", M1, results in milliseconds.
| Model | 4.x | This PR |
| - | - | - |
| VIT_B_32 OCV/CPU | 87.66 | **83.83** |
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Release convolution weightsMat after usage #25181
### Pull Request Readiness Checklist
related (but not resolved): https://github.com/opencv/opencv/issues/24134
Minor memory footprint improvement. Also, adds a test for VmHWM.
RAM top memory usage (-230MB)
| YOLOv3 (237MB file) | 4.x | PR |
|---------------------|---------|---------|
| no winograd | 808 MB | 581 MB |
| winograd | 1985 MB | 1750 MB |
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Fixed ReduceMean layer behaviour #25120
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
a93c31e3c9/onnxruntime/core/providers/cpu/reduction/reduction_ops.cc (L433-L443)
Removed all pre-C++11 code, workarounds, and branches #23736
This removes a bunch of pre-C++11 workrarounds that are no longer necessary as C++11 is now required.
It is a nice clean up and simplification.
* No longer unconditionally #include <array> in cvdef.h, include explicitly where needed
* Removed deprecated CV_NODISCARD, already unused in the codebase
* Removed some pre-C++11 workarounds, and simplified some backwards compat defines
* Removed CV_CXX_STD_ARRAY
* Removed CV_CXX_MOVE_SEMANTICS and CV_CXX_MOVE
* Removed all tests of CV_CXX11, now assume it's always true. This allowed removing a lot of dead code.
* Updated some documentation consequently.
* Removed all tests of CV_CXX11, now assume it's always true
* Fixed links.
---------
Co-authored-by: Maksim Shabunin <maksim.shabunin@gmail.com>
Co-authored-by: Alexander Smorkalov <alexander.smorkalov@xperience.ai>
dnn onnx: add group norm layer #24610
dnn onnx: add group norm layer
Todo:
- [x] speed up by multi-threading
- [x] add perf
- [x] add backend: OpenVINO
- [x] add backend: CUDA
- [x] add backend: OpenCL (no fp16)
- [ ] add backend: CANN
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>
dnn: no layer norm fusion if axes.back() is not the axis of last dimension #24808
Merge with https://github.com/opencv/opencv_extra/pull/1137
Resolves https://github.com/opencv/opencv/issues/24797
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
dnn onnx: add mod #24765
Resolves https://github.com/opencv/opencv/issues/23174
TODO:
- [x] enable some conformance tests
- [x] add backends
- [x] CANN
- [x] OpenVINO
- [x] CUDA
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn onnx: support constaint inputs in einsum importer #24753
Merge with https://github.com/opencv/opencv_extra/pull/1132.
Resolves https://github.com/opencv/opencv/issues/24697
Credits to @LaurentBerger.
---
This is a workaround. I suggest to get input shapes and calculate the output shapes in `getMemoryShapes` so as to keep the best compatibility. It is not always robust getting shapes during the importer stage and we should avoid that as much as possible.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Fixes#22747. Support [crop] configuration for DarkNet #24384
Request for comments. This is my first PR.
**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1112
resolves https://github.com/opencv/opencv/issues/22747
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
dnn: add attention layer #24476Resolves#24609
Merge with: https://github.com/opencv/opencv_extra/pull/1128.
Attention operator spec from onnxruntime: https://github.com/microsoft/onnxruntime/blob/v1.16.1/docs/ContribOperators.md#com.microsoft.Attention.
TODO:
- [x] benchmark (before this PR vs. with this PR vs. ORT).
- [x] Layer fusion: Take care Slice with end=INT64_MAX.
- [x] Layer fusion: match more potential attention (VIT) patterns.
- [x] Single-head attention is supported.
- [x] Test AttentionSubgraph fusion.
- [x] Add acc tests for VIT_B_32 and VitTrack
- [x] Add perf tests for VIT_B_32 and VitTrack
## Benchmarks
Platform: Macbook Air M1.
### Attention Subgraph
Input scale: [1, 197, 768].
| | mean (ms) | median (ms) | min (ms) |
| ---------------------- | --------- | ----------- | -------- |
| w/ Attention (this PR) | 3.75 | 3.68 | 3.22 |
| w/o Attention | 9.06 | 9.01 | 8.24 |
| ORT (python) | 4.32 | 2.63 | 2.50 |
### ViTs
All data in millisecond (ms).
| ViTs | With Attention | Without Attention | ORT |
| -------- | -------------- | ----------------- | ------ |
| vit_b_16 | 302.77 | 365.35 | 109.70 |
| vit_b_32 | 89.92 | 116.22 | 30.36 |
| vit_l_16 | 1593.32 | 1730.74 | 419.92 |
| vit_l_32 | 468.11 | 577.41 | 134.12 |
| VitTrack | 3.80 | 3.87 | 2.25 |
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Add blobrecttoimage #24539
### Pull Request Readiness Checklist
resolves https://github.com/opencv/opencv/issues/14659
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work #14659
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake