Modified Caffe parser to support the new dnn engine #26208
Now the Caffe parser supports both the old and the new engine. It can be selected using newEngine argument in PopulateNet.
All cpu Caffe tests work fine except:
- Test_Caffe_nets.Colorization
- Test_Caffe_layers.FasterRCNN_Proposal
Both these tests doesn't work because of the bug in the new net.forward function. The function takes the name of the desired target last layer, but uses this name as the name of the desired output tensor.
Also Colorization test contains a strange model with a Silence layer in the end, so it doesn't have outputs. The old parser just ignored it. I think, the proper solution is to run this model until the (number_of_layers - 2) layer using proper net.forward arguments in the test.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Modified TFLite parser for the new dnn engine #26330
The new dnn graph is creating just by defining input and output names of each layer.
Some TFLite layers has fused activation, which doesn't have layer name and input and output names. Also some layers require additional preprocessing layers (e.g. NHWC -> NCHW). All these layers should be added to the graph with some unique layer and input and output names.
I solve this problem by adding additionalPreLayer and additionalPostLayer layers.
If a layer has a fused activation, I add additionalPostLayer and change input and output names this way:
**original**: conv_relu(conv123, conv123_input, conv123_output)
**new**: conv(conv123, conv123_input, conv123_output_additional_post_layer) + relu(conv123_relu, conv1_output_additional_post_layer, conv123_output)
If a layer has additional preprocessing layer, I change input and output names this way:
**original**: permute_reshape(reshape345, reshape345_input, reshape345_output)
**new**: permute(reshape345_permute, reshape345_input, reshape345_input_additional_pre_layer) + reshape(reshape345, reshape345_input_additional_pre_layer, reshape345_output)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
New dnn engine #26056
This is the 1st PR with the new engine; CI is green and PR is ready to be merged, I think.
Merge together with https://github.com/opencv/opencv_contrib/pull/3794
---
**Known limitations:**
* [solved] OpenVINO is temporarily disabled, but is probably easy to restore (it's not a deal breaker to merge this PR, I guess)
* The new engine does not support any backends nor any targets except for the default CPU implementation. But it's possible to choose the old engine when loading a model, then all the functionality is available.
* [Caffe patch is here: #26208] The new engine only supports ONNX. When a model is constructed manually or is loaded from a file of different format (.tf, .tflite, .caffe, .darknet), the old engine is used.
* Even in the case of ONNX some layers are not supported by the new engine, such as all quantized layers (including DequantizeLinear, QuantizeLinear, QLinearConv etc.), LSTM, GRU, .... It's planned, of course, to have full support for ONNX by OpenCV 5.0 gold release. When a loaded model contains unsupported layers, we switch to the old engine automatically (at ONNX parsing time, not at `forward()` time).
* Some layers , e.g. Expat, are only partially supported by the new engine. In the case of unsupported flavours it switches to the old engine automatically (at ONNX parsing time, not at `forward()` time).
* 'Concat' graph optimization is disabled. The optimization eliminates Concat layer and instead makes the layers that generate tensors to be concatenated to write the outputs to the final destination. Of course, it's only possible when `axis=0` or `axis=N=1`. The optimization is not compatible with dynamic shapes since we need to know in advance where to store the tensors. Because some of the layer implementations have been modified to become more compatible with the new engine, the feature appears to be broken even when the old engine is used.
* Some `dnn::Net` API is not available with the new engine. Also, shape inference may return false if some of the output or intermediate tensors' shapes cannot be inferred without running the model. Probably this can be fixed by a dummy run of the model with zero inputs.
* Some overloads of `dnn::Net::getFLOPs()` and `dnn::Net::getMemoryConsumption()` are not exposed any longer in wrapper generators; but the most useful overloads are exposed (and checked by Java tests).
* [in progress] A few Einsum tests related to empty shapes have been disabled due to crashes in the tests and in Einsum implementations. The code and the tests need to be repaired.
* OpenCL implementation of Deconvolution is disabled. It's very bad and very slow anyway; need to be completely revised.
* Deconvolution3D test is now skipped, because it was only supported by CUDA and OpenVINO backends, both of which are not supported by the new engine.
* Some tests, such as FastNeuralStyle, checked that the in the case of CUDA backend there is no fallback to CPU. Currently all layers in the new engine are processed on CPU, so there are many fallbacks. The checks, therefore, have been temporarily disabled.
---
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
❗ Potential conflicts with #25958
C-API cleanup: highgui, videoio #26025❗ Merge with: opencv/opencv_contrib#3780
This PR removes usage of C-API from highgui and videoio modules. Only source code is affected, tests were not using obsolete API.
It should be possible to backport these changes to 4.x branch preserving removed public headers and source files (`*_c.h` and `*_c.cpp`).
#### Checklist
I tried to verify as many backends as possible, though these checks were not as thorough as I'd like them to be. Below is the checklist covering all modified backends with their statuses.
> 🔹 - small changes
> 🟢 - consider working
> ⚪ - considered untested
##### highgui
Pass | Backend | Local check | CI check
-----|---------|-------------|---------
🟢 | GTK2 | build + test, plugin build | build + test ❔🟢 | GTK3 | build + test, plugin build | build + test
🟢 | QT | build + test, plugin build |
⚪ | Wayland 🔹 | |
🟢 | WIN32 🔹 | | build + test
🟢 | Cocoa 🔹 | | build + test
⚪ | WinRT | |
##### videoio
Pass | Backend | Local check | CI check
-----|---------|-------------|---------
🟢 | Android Camera/MediaNDK 🔹 | | build
🟢 | Aravis | build |
🟢 | AVFoundation OSX | | build + test
⚪ | AVFoundation iOS | | build
🟢 | DC1394 | build |
🟢 | DShow 🔹 | | build
🟢 | FFMpeg | build, plugin build | build + test
🟢 | GPhoto 🔹 | build |
🟢 | GStreamer | build, plugin build | build + test
🟢 | Images | build | build + test
🟢 | MSMF 🔹 | | build + test
🟢 | OpenNI | build |
🟢 | PVAPI | build |
🟢 | V4L | build + test | build
🟢 | XIMEA | build |
🟢 | XINE 🔹 | build |
#### Notes
- local linux build checks performed using [this framework](https://github.com/mshabunin/opencv-videoio-build-check)
- minor extra changes made in both `cap_avfoundation*.mm` to make them slightly more synchronized - it would be better to combine them into a single one in the future
- configurations with plugins have been build but not tested
- **moved unrelated changes to separate PRs** ~two issues have been fixed in separate commits:~
- ~imgproc: missing `cv::hal::` color conversion functions has been used in MediaSDK backend~
- ~videoio/V4L: wrong color conversion mode caused bad colors for NV12 camera input format (RGB instead of BGR)~
It would be nice to check following functionality manually:
- [ ] OSX: camera input
- [ ] iOS: camera and file input
- [ ] WinRT: build, some testing
- [x] Linux/Wayland: build
dnn(5.x): handle topk data type #26124
Resolves https://github.com/opencv/opencv/issues/26076
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Support for GatherND layer #26106
This PR adds support for GatherND layer. The layer was in comformance deny list initially.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Switch off OpenVINO backend for Hardmax #26110
Fix Hardmax Issue with backend mentioned in #26107
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
#25006#25314
This pull request removes hed_pretrained caffe model to the SOTA dexined onnx model for edge detection. Usage of conventional methods like canny has also been added
The obsolete cpp and python sample has been removed
TODO:
- [ ] Remove temporary hack for quantized models. Refer issue https://github.com/opencv/opencv_zoo/issues/273
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Add Support for Hardmax Layer #26079
This PR add support for `Hardmax` layer, which as previously listed in conformance deny list.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
DNN(ONNX): Enabled several OpenCL conformance tests #26053
The tests also work in 5.x
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Einsum buffer allocation fix#26059
This PR fixed buffer allocation issue in Einsum layer that causes segmentation fault on 32bit platforms. Related issue #26008
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Split Javascript white-list to support contrib modules #25986
Single whitelist converted to several per-module json files. They are concatenated automatically and can be overriden by user config.
Related to https://github.com/opencv/opencv/pull/25656
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
dnn: add ONNX TopK #23279
Merge with https://github.com/opencv/opencv_extra/pull/1200
Partially fixes#22890 and #20258
To-do:
- [x] TopK forward impl
- [x] add tests
- [x] support Opset 1 & 10 if possible
- [ ] ~Support other backends~ (TopK has two outputs, which is not supported by other backends, such as openvino)
Perf:
M1 (time in millisecond)
| input shape | axis | dnn | ort |
| --------------- | ---- | ---- | ---- |
| (1000, 100) | 0 | 1.68 | 4.07 |
| (1000, 100) K5 | 0 | 1.13 | 0.12 |
| (1000, 100) | 1 | 0.96 | 0.77 |
| (100, 100, 100) | 0 | 10.00 | 31.13 |
| (100, 100, 100) | 1 | 7.33 | 9.17 |
| (100, 100, 100) | 2 | 7.52 | 9.48 |
M2 (time in milisecond)
| input shape | axis | dnn | ort |
| --------------- | ---- | ---- | ---- |
| (1000, 100) | 0 | 0.76 | 2.44 |
| (1000, 100) K5 | 0 | 0.68 | 0.07 |
| (1000, 100) | 1 | 0.41 | 0.50 |
| (100, 100, 100) | 0 | 4.83 | 17.52|
| (100, 100, 100) | 1 | 3.60 | 5.08 |
| (100, 100, 100) | 2 | 3.73 | 5.10 |
ONNXRuntime performance testing script: https://gist.github.com/fengyuentau/a119f94fd16721ec9974b8c7b0a45d4c
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Add support for boolan input/outputs in python bindings #26026
This PR add support boolean input/output binding in python. The issue what mention in ticket https://github.com/opencv/opencv/issues/26024 and the PR soleves it. Data and models are located in [here](https://github.com/opencv/opencv_extra/pull/1201)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
[GSoC] dnn: Blockwise quantization support #25644
This PR introduces blockwise quantization in DNN allowing the parsing of ONNX models quantized in blockwise style. In particular it modifies the `Quantize` and `Dequantize` operations. The related PR opencv/opencv_extra#1181 contains the test data.
Additional notes:
- The original quantization issue has been fixed. Previously, for 1D scale and zero-point, the operation applied was $y = int8(x/s - z)$ instead of $y = int8(x/s + z)$. Note that the operation was already correctly implemented when the scale and zero-point were scalars. The previous implementation failed the ONNX test cases, but now all have passed successfully. [Reference](https://github.com/onnx/onnx/blob/main/docs/Operators.md#QuantizeLinear)
- the function `block_repeat` broadcasts scale and zero-point to the input shape. It repeats all the elements of a given axis n times. This function generalizes the behavior of `repeat` from the core module which is defined just for 2 axis assuming `Mat` has 2 dimensions. If appropriate and useful, you might consider moving `block_repeat` to the core module.
- Now, the scale and zero-point can be taken as layer inputs. This increases the ONNX layers' coverage and enables us to run the ONNX test cases (previously disabled) being fully compliant with ONNX standards. Since they are now supported, I have enabled the test cases for: `test_dequantizelinear`, `test_dequantizelinear_axis`, `test_dequantizelinear_blocked`, `test_quantizelinear`, `test_quantizelinear_axis`, `test_quantizelinear_blocked` just in CPU backend. All of them pass successfully.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn: optimize activations with v_exp #25881
Merge with https://github.com/opencv/opencv_extra/pull/1191.
This PR optimizes the following activations:
- [x] Swish
- [x] Mish
- [x] Elu
- [x] Celu
- [x] Selu
- [x] HardSwish
### Performance (Updated on 2024-07-18)
#### AmLogic A311D2 (ARM Cortex A73 + A53)
```
Geometric mean (ms)
Name of Test activations activations.patch activations.patch
vs
activations
(x-factor)
Celu::Layer_Elementwise::OCV/CPU 115.859 27.930 4.15
Elu::Layer_Elementwise::OCV/CPU 27.846 27.003 1.03
Gelu::Layer_Elementwise::OCV/CPU 0.657 0.602 1.09
HardSwish::Layer_Elementwise::OCV/CPU 31.885 6.781 4.70
Mish::Layer_Elementwise::OCV/CPU 35.729 32.089 1.11
Selu::Layer_Elementwise::OCV/CPU 61.955 27.850 2.22
Swish::Layer_Elementwise::OCV/CPU 30.819 26.688 1.15
```
#### Apple M1
```
Geometric mean (ms)
Name of Test activations activations.patch activations.patch
vs
activations
(x-factor)
Celu::Layer_Elementwise::OCV/CPU 16.184 2.118 7.64
Celu::Layer_Elementwise::OCV/CPU_FP16 16.280 2.123 7.67
Elu::Layer_Elementwise::OCV/CPU 9.123 1.878 4.86
Elu::Layer_Elementwise::OCV/CPU_FP16 9.085 1.897 4.79
Gelu::Layer_Elementwise::OCV/CPU 0.089 0.081 1.11
Gelu::Layer_Elementwise::OCV/CPU_FP16 0.086 0.074 1.17
HardSwish::Layer_Elementwise::OCV/CPU 1.560 1.555 1.00
HardSwish::Layer_Elementwise::OCV/CPU_FP16 1.536 1.523 1.01
Mish::Layer_Elementwise::OCV/CPU 6.077 2.476 2.45
Mish::Layer_Elementwise::OCV/CPU_FP16 5.990 2.496 2.40
Selu::Layer_Elementwise::OCV/CPU 11.351 1.976 5.74
Selu::Layer_Elementwise::OCV/CPU_FP16 11.533 1.985 5.81
Swish::Layer_Elementwise::OCV/CPU 4.687 1.890 2.48
Swish::Layer_Elementwise::OCV/CPU_FP16 4.715 1.873 2.52
```
#### Intel i7-12700K
```
Geometric mean (ms)
Name of Test activations activations.patch activations.patch
vs
activations
(x-factor)
Celu::Layer_Elementwise::OCV/CPU 17.106 3.560 4.81
Elu::Layer_Elementwise::OCV/CPU 5.064 3.478 1.46
Gelu::Layer_Elementwise::OCV/CPU 0.036 0.035 1.04
HardSwish::Layer_Elementwise::OCV/CPU 2.914 2.893 1.01
Mish::Layer_Elementwise::OCV/CPU 3.820 3.529 1.08
Selu::Layer_Elementwise::OCV/CPU 10.799 3.593 3.01
Swish::Layer_Elementwise::OCV/CPU 3.651 3.473 1.05
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Upgrade RISC-V Vector intrinsic and cleanup the obsolete RVV backend. #25883
This patch upgrade RISC-V Vector intrinsic from `v0.10` to `v0.12`/`v1.0`:
- Update cmake check and options;
- Upgrade RVV implement for Universal Intrinsic;
- Upgrade RVV optimized DNN kernel.
- Cleanup the obsolete RVV backend (`intrin_rvv.hpp`) and compatable header file.
With this patch, RVV backend require Clang 17+ or GCC 14+ (which means `__riscv_v_intrinsic >= 12000`, see https://godbolt.org/z/es7ncETE3)
This patch is test with Clang 17.0.6 (require extra `-DWITH_PNG=OFF` due to ICE), Clang 18.1.8 and GCC 14.1.0 on QEMU and k230 (with `--gtest_filter="*hal_*"`).
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
code clean #25931
Align code and remove redundant CMake code
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Add sample for GPT2 inference #25868
### Pull Request Readiness Checklist
This PR adds sample for inferencing GPT-2 model. More specificly implementation of GPT-2 from [this repository](https://github.com/karpathy/build-nanogpt). Currently inference in OpenCV is only possible to do with fixed window size due to not supported dynamic shapes.
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
1D test for Reduce layer #25101
This PR introduces test for `Reduce` layer to test its functionality for 1D arrays
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Fix Reduce layer for cosnt inputs #25917
### Pull Request Readiness Checklist
This PR adds support for const inputs for reducing the layer. Particularly, it fixes the following case. The test model and data are located in [1194](https://github.com/opencv/opencv_extra/pull/1194)
<img width="190" alt="image" src="https://github.com/user-attachments/assets/45a90f0a-b798-4529-bece-24c7bfc9e7ba">
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn: merge #25630 to 5.x #25900
Sync changes from https://github.com/opencv/opencv/pull/25630 to 5.x.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
* added v_erf and implemented gelu acceleration via vectorization
* remove anonymous v_erf and use v_erf from intrin_math
* enable perf for ov and cuda backend
Merge pull request #25861 from Abdurrahheem:ash/torch-attention-export-fix-4x
Support for Unflatten operation requred by Attention layer - 4.x #25861
### Pull Request Readiness Checklist
All test data and models for PR are located [#1190](https://github.com/opencv/opencv_extra/pull/1190)
This PR fixes issue reised when importing batched vanilla `Attention` layer from `PyTorch` via ONNX. Currently batched version of `Attention` layer in PyTorch [has unflatten operation inside](e3b3431c42/torch/nn/functional.py (L5500C17-L5500C31)). `unflatten` operation causes issue in `reshape` layer (see the Reshape_2 in the graph below) due to incorrect output of `slice` layer. This PR particularly fixes `slice` and `concat` layers to handle `unflatten` operation.
<img width="673" alt="image" src="https://github.com/opencv/opencv/assets/44877829/5b612b31-657a-47f1-83a4-0ac35a950abd">
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
python: attempts to fix 3d mat parsing problem for dnn #25810
Fixes https://github.com/opencv/opencv/issues/25762https://github.com/opencv/opencv/issues/23242
Relates https://github.com/opencv/opencv/issues/25763https://github.com/opencv/opencv/issues/19091
Although `cv.Mat` has already been introduced to workaround this problem, people do not know it and it kind of leads to confusion with `numpy.array`. This patch adds a "switch" to turn off the auto multichannel feature when the API is from cv::dnn::Net (more specifically, `setInput`) and the parameter is of type `Mat`. This patch only leads to changes of three places in `pyopencv_generated_types_content.h`:
```.diff
static PyObject* pyopencv_cv_dnn_dnn_Net_setInput(PyObject* self, PyObject* py_args, PyObject* kw)
{
...
- pyopencv_to_safe(pyobj_blob, blob, ArgInfo("blob", 0)) &&
+ pyopencv_to_safe(pyobj_blob, blob, ArgInfo("blob", 8)) &&
...
}
// I guess we also need to change this as one-channel blob is expected for param
static PyObject* pyopencv_cv_dnn_dnn_Net_setParam(PyObject* self, PyObject* py_args, PyObject* kw)
{
...
- pyopencv_to_safe(pyobj_blob, blob, ArgInfo("blob", 0)) )
+ pyopencv_to_safe(pyobj_blob, blob, ArgInfo("blob", 8)) )
...
- pyopencv_to_safe(pyobj_blob, blob, ArgInfo("blob", 0)) )
+ pyopencv_to_safe(pyobj_blob, blob, ArgInfo("blob", 8)) )
...
}
```
Others are unchanged, e.g. `dnn_SegmentationModel` and stuff like that.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn: parallelize nary elementwise forward implementation & enable related conformance tests #25630
This PR introduces the following changes:
- [x] Parallelize binary forward impl
- [x] Parallelize ternary forward impl (Where)
- [x] Parallelize nary (Operator that can take >=1 operands)
- [x] Enable conformance tests if workable
## Performance
### i7-12700K, RAM 64GB, Ubuntu 22.04
```
Geometric mean (ms)
Name of Test opencv opencv opencv
perf perf perf
core.x64.0606 core.x64.0606 core.x64.0606
vs
opencv
perf
core.x64.0606
(x-factor)
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU 16.116 11.161 1.44
NCHW_NCHW_add::Layer_NaryEltwise::OCV/CPU 17.469 11.446 1.53
NCHW_NCHW_div::Layer_NaryEltwise::OCV/CPU 17.531 11.469 1.53
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU 28.653 13.682 2.09
NCHW_NCHW_greater::Layer_NaryEltwise::OCV/CPU 21.899 13.422 1.63
NCHW_NCHW_less::Layer_NaryEltwise::OCV/CPU 21.738 13.185 1.65
NCHW_NCHW_max::Layer_NaryEltwise::OCV/CPU 16.172 11.473 1.41
NCHW_NCHW_mean::Layer_NaryEltwise::OCV/CPU 16.309 11.565 1.41
NCHW_NCHW_min::Layer_NaryEltwise::OCV/CPU 16.166 11.454 1.41
NCHW_NCHW_mul::Layer_NaryEltwise::OCV/CPU 16.157 11.443 1.41
NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU 163.459 15.234 10.73
NCHW_NCHW_ref_div::Layer_NaryEltwise::OCV/CPU 10.880 10.868 1.00
NCHW_NCHW_ref_max::Layer_NaryEltwise::OCV/CPU 10.947 11.058 0.99
NCHW_NCHW_ref_min::Layer_NaryEltwise::OCV/CPU 10.948 10.910 1.00
NCHW_NCHW_ref_mul::Layer_NaryEltwise::OCV/CPU 10.874 10.871 1.00
NCHW_NCHW_ref_sum::Layer_NaryEltwise::OCV/CPU 10.971 10.920 1.00
NCHW_NCHW_sub::Layer_NaryEltwise::OCV/CPU 17.546 11.462 1.53
NCHW_NCHW_sum::Layer_NaryEltwise::OCV/CPU 16.175 11.475 1.41
NHWC_C::Layer_NaryEltwise::OCV/CPU 11.339 11.333 1.00
NHWC_H::Layer_NaryEltwise::OCV/CPU 16.154 11.102 1.46
```
### Apple M1, RAM 16GB, macOS 14.4.1
```
Geometric mean (ms)
Name of Test opencv opencv opencv
perf perf perf
core.m1.0606 core.m1.0606.patch core.m1.0606.patch
vs
opencv
perf
core.m1.0606
(x-factor)
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU 28.418 3.768 7.54
NCHW_NCHW_add::Layer_NaryEltwise::OCV/CPU 6.942 5.679 1.22
NCHW_NCHW_div::Layer_NaryEltwise::OCV/CPU 5.822 5.653 1.03
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU 5.751 5.628 1.02
NCHW_NCHW_greater::Layer_NaryEltwise::OCV/CPU 5.797 5.599 1.04
NCHW_NCHW_less::Layer_NaryEltwise::OCV/CPU 7.272 5.578 1.30
NCHW_NCHW_max::Layer_NaryEltwise::OCV/CPU 5.777 5.562 1.04
NCHW_NCHW_mean::Layer_NaryEltwise::OCV/CPU 5.819 5.559 1.05
NCHW_NCHW_min::Layer_NaryEltwise::OCV/CPU 5.830 5.574 1.05
NCHW_NCHW_mul::Layer_NaryEltwise::OCV/CPU 5.759 5.567 1.03
NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU 342.260 74.655 4.58
NCHW_NCHW_ref_div::Layer_NaryEltwise::OCV/CPU 8.338 8.280 1.01
NCHW_NCHW_ref_max::Layer_NaryEltwise::OCV/CPU 8.359 8.309 1.01
NCHW_NCHW_ref_min::Layer_NaryEltwise::OCV/CPU 8.412 8.295 1.01
NCHW_NCHW_ref_mul::Layer_NaryEltwise::OCV/CPU 8.380 8.297 1.01
NCHW_NCHW_ref_sum::Layer_NaryEltwise::OCV/CPU 8.356 8.323 1.00
NCHW_NCHW_sub::Layer_NaryEltwise::OCV/CPU 6.818 5.561 1.23
NCHW_NCHW_sum::Layer_NaryEltwise::OCV/CPU 5.805 5.570 1.04
NHWC_C::Layer_NaryEltwise::OCV/CPU 3.834 4.817 0.80
NHWC_H::Layer_NaryEltwise::OCV/CPU 28.402 3.771 7.53
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Add sample support of YOLOv9 and YOLOv10 in OpenCV #25794
This PR adds sample support of [`YOLOv9`](https://github.com/WongKinYiu/yolov9) and [`YOLOv10`](https://github.com/THU-MIG/yolov10/tree/main)) in OpenCV. Models for this test are located in this [PR](https://github.com/opencv/opencv_extra/pull/1186).
**Running YOLOv10 using OpenCV.**
1. In oder to run `YOLOv10` one needs to cut off postporcessing with dynamic shapes from torch and then convert it to ONNX. If someone is looking for ready solution, there is [this forked branch](https://github.com/Abdurrahheem/yolov10/tree/ash/opencv-export) from official YOLOv10. Particularty follow this proceduce.
```bash
git clone git@github.com:Abdurrahheem/yolov10.git
conda create -n yolov10 python=3.9
conda activate yolov10
pip install -r requirements.txt
python export_opencv.py --model=<model-name> --imgsz=<input-img-size>
```
By default `model="yolov10s"` and `imgsz=(480,640)`. This will generate file `yolov10s.onnx`, which can be use for inference in OpenCV
2. For inference part on OpenCV. one can use `yolo_detector.cpp` [sample](https://github.com/opencv/opencv/blob/4.x/samples/dnn/yolo_detector.cpp). If you have followed above exporting procedure, then you can use following command to run the model.
``` bash
build opencv from source
cd build
./bin/example_dnn_yolo_detector --model=<path-to-yolov10s.onnx-file> --yolo=yolov10 --width=640 --height=480 --input=<path-to-image> --scale=0.003921568627 --padvalue=114
```
If you do not specify `--input` argument, OpenCV will grab first camera that is avaliable on your platform.
For more deatils on how to run the `yolo_detector.cpp` file see this [guide](https://docs.opencv.org/4.x/da/d9d/tutorial_dnn_yolo.html#autotoc_md443)
**Running YOLOv9 using OpenCV**
1. Export model following [official guide](https://github.com/WongKinYiu/yolov9)of the YOLOv9 repository. Particularly you can do following for converting.
```bash
git clone https://github.com/WongKinYiu/yolov9.git
cd yolov9
conda create -n yolov9 python=3.9
conda activate yolov9
pip install -r requirements.txt
wget https://github.com/WongKinYiu/yolov9/releases/download/v0.1/yolov9-t-converted.pt
python export.py --weights=./yolov9-t-converted.pt --include=onnx --img-size=(480,640)
```
This will generate <yolov9-t-converted.onnx> file.
2. Inference on OpenCV.
```bash
build opencv from source
cd build
./bin/example_dnn_yolo_detector --model=<path-to-yolov9-t-converted.onnx> --yolo=yolov9 --width=640 --height=480 --scale=0.003921568627 --padvalue=114 --path=<path-to-image>
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Add support for v_exp (exponential) #24941
This PR aims to implement `v_exp(v_float16 x)`, `v_exp(v_float32 x)` and `v_exp(v_float64 x)`.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Added more types support to dnn layers #25755
Added support of more types to dnn layers for CPU, CUDA and OpenVINO backends.
Now most of the multi-type layers support uint8, int8, int32, int64, float32, float16, bool types.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Fixed blank layer for OpenVINO 2022.1 #25739
Changed blank layer because it didn't work with old OpenVINO versions(2022.1). The blank layer was implemented using ConvertLike layer, now it is implemented using ShapeOf and Reshape layers.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake