bug fix infinite loop #24987Fixes#24967
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Bugfix to #24967
* started adding support for new types (16f, 16bf, 32u, 64u, 64s) to arithmetic functions
* fixed several tests; refactored and extended sum(), extended inRange().
* extended countNonZero(), mean(), meanStdDev(), minMaxIdx(), norm() and sum() to support new types (F16, BF16, U32, U64, S64)
* put missing CV_DEPTH_MAX to some function dispatcher tables
* extended findnonzero, hasnonzero with the new types support
* extended mixChannels() to support new types
* minor fix
* fixed a few compile errors on Linux and a few failures in core tests
* fixed a few more warnings and test failures
* trying to fix the remaining warnings and test failures. The test `MulTestGPU.MathOpTest` was disabled - not clear whether to set tolerance - it's not bit-exact operation, as possibly assumed by the test, due to the use of scale and possibly limited accuracy of the intermediate floating-point calculations.
* found that in the current snapshot G-API produces incorrect results in Mul, Div and AddWeighted (at least when using OpenCL on Windows x64 or MacOS x64). Disabled the respective tests.
Allow multiple flags with OPENCV_GRADLE_VERBOSE_OPTIONS #24969
### Pull Request Readiness Checklist
Merge with https://github.com/opencv/ci-gha-workflow/pull/144
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Fix bug in ChessBoardDetector::findQuadNeighbors #24779
### Pull Request Readiness Checklist
`corners` and `neighbors` indices means not filling order, but relative position. So, for example if `quad->count = 2`, it doesn't mean that `quad->neighbors[0]` and `quad->neighbors[1]` are filled. And we should should iterate over all four `neighbors`.
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
QR codes Structured Append decoding mode #24548
### Pull Request Readiness Checklist
resolves https://github.com/opencv/opencv/issues/23245
Merge after https://github.com/opencv/opencv/pull/24299
Current proposal is to use `detectAndDecodeMulti` or `decodeMulti` for structured append mode decoding. 0-th QR code in a sequence gets a full message while the rest of codes will correspond to empty strings.
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Modified Java tests to run on Android #24910
To run the tests you need to:
1. Build OpenCV using Android pipeline. For example:
`cmake -DBUILD_TEST=ON -DANDROID=ON -DANDROID_ABI=arm64-v8a -DCMAKE_TOOLCHAIN_FILE=/usr/lib/android-sdk/ndk/25.1.8937393/build/cmake/android.toolchain.cmake -DANDROID_NDK=/usr/lib/android-sdk/ndk/25.1.8937393 -DANDROID_SDK=/usr/lib/android-sdk ../opencv`
`make`
2. Connect Android Phone
3. Run tests:
`cd android_tests`
`./gradlew tests_module:connectedAndroidTest`
Related CI pipeline: https://github.com/opencv/ci-gha-workflow/pull/138
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
G-API: Implement concurrent executor #24845
## Overview
This PR introduces the new G-API executor called `GThreadedExecutor` which can be selected when the `GComputation` is compiled in `serial` mode (a.k.a `GComputation::compile(...)`)
### ThreadPool
`cv::gapi::own::ThreadPool` has been introduced in order to abstract usage of threads in `GThreadedExecutor`.
`ThreadPool` is implemented by using `own::concurrent_bounded_queue`
`ThreadPool` has only as single method `schedule` that will push task into the queue for the further execution.
The **important** notice is that if `Task` executed in `ThreadPool` throws exception - this is `UB`.
### GThreadedExecutor
The `GThreadedExecutor` is mostly copy-paste of `GExecutor`, should we extend `GExecutor` instead?
#### Implementation details
1. Build the dependency graph for `Island` nodes.
2. Store the tasks that don't have dependencies into separate `vector` in order to run them first.
3. at the `GThreadedExecutor::run()` schedule the tasks that don't have dependencies that will schedule their dependents and wait for the completion.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Vulkan backend for NaryEltwiseLayer in DNN module #24768
We improve Vulkan backend for ``NaryEltwiseLayer`` in DNN module by:
- add a basic framework for Vulkan backend in ``NaryEltwiseLayer``
- add a compute shader for binary forwarding (an imitation of what has been done in native OpenCV backend including broadcasting and eltwise-operation)
- typo fixed:
- Wrong info output in ``context.cpp``
Currently, our implementation (or all layers supporting Vulkan backend) runs pretty slow on discrete GPUs basically due to IO cost in function ``copyToHost``, and we are going to fix that by
- find out the best ``VkMemoryProperty`` for various discrete GPUs
- prevent ``copyToHost`` in middle layers during forwarding, (i.e keep data in GPU memory)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Co-authored-by: IskXCr <IskXCr@outlook.com>
Handle warnings in loongson-related code #24925
See https://github.com/fengyuentau/opencv/actions/runs/7665377694/job/20891162958#step:14:16
Warnings needs to be handled before we add the loongson server to our CI.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
core(OpenCL): optimize convertTo() with CV_16F (convertFp16() replacement) #24918
relates #24909
relates #24917
relates #24892
Performance changes:
- [x] 12700K (1 thread) + Intel iGPU
|Name of Test|noOCL|convertFp16|convertTo BASE|convertTo PATCH|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.152|3.127|3.136|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|3.996|3.007|2.671|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|3.101|3.056|2.854|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|3.298|2.072|2.061|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.652|2.723|2.721|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.268|2.662|2.947|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|2.601|2.603|2.528|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|3.215|1.999|1.988|
Patched version is not worse than convertFp16 and convertTo baseline (except MatUMat 32->16, baseline uses CPU code+dst buffer map).
There are still gaps against noOpenCL(CPU only) mode due to T-API implementation issues (unnecessary synchronization).
- [x] 12700K + AMD dGPU
|Name of Test|noOCL|convertFp16 dGPU|convertTo BASE dGPU|convertTo PATCH dGPU|
|---|:-:|:-:|:-:|:-:|
|ConvertFP16FP32MatMat::OCL_Core|3.130|3.133|3.172|3.087|
|ConvertFP16FP32MatUMat::OCL_Core|3.030|1.713|9.559|1.729|
|ConvertFP16FP32UMatMat::OCL_Core|3.010|6.515|6.309|4.452|
|ConvertFP16FP32UMatUMat::OCL_Core|3.016|0.242|23.597|0.170|
|ConvertFP32FP16MatMat::OCL_Core|2.697|2.641|2.713|2.689|
|ConvertFP32FP16MatUMat::OCL_Core|2.752|4.076|6.483|4.191|
|ConvertFP32FP16UMatMat::OCL_Core|2.706|9.042|16.481|1.834|
|ConvertFP32FP16UMatUMat::OCL_Core|2.704|0.229|15.730|0.176|
convertTo-baseline can't compile OpenCL kernel for FP16 properly - FIXED.
dGPU has much more power, so results are x16-17 better than single cpu core.
Patched version is not worse than convertFp16 and convertTo baseline.
There are still gaps against noOpenCL(CPU only) mode due to T-API implementation issues (unnecessary synchronization) and required memory transfers.
Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
Removed all pre-C++11 code, workarounds, and branches #23736
This removes a bunch of pre-C++11 workrarounds that are no longer necessary as C++11 is now required.
It is a nice clean up and simplification.
* No longer unconditionally #include <array> in cvdef.h, include explicitly where needed
* Removed deprecated CV_NODISCARD, already unused in the codebase
* Removed some pre-C++11 workarounds, and simplified some backwards compat defines
* Removed CV_CXX_STD_ARRAY
* Removed CV_CXX_MOVE_SEMANTICS and CV_CXX_MOVE
* Removed all tests of CV_CXX11, now assume it's always true. This allowed removing a lot of dead code.
* Updated some documentation consequently.
* Removed all tests of CV_CXX11, now assume it's always true
* Fixed links.
---------
Co-authored-by: Maksim Shabunin <maksim.shabunin@gmail.com>
Co-authored-by: Alexander Smorkalov <alexander.smorkalov@xperience.ai>
Added screen rotation support to JavaCamera2View amd NativeCameraView. Fixed JavaCamera2View initialization. #24869
Added automatic image rotation to JavaCamera2View and NativeCameraView so the video preview was matched with screen orientation.
Fixed double preview initialization bug in JavaCamera2View.
Added proper cameraID parsing to NativeCameraView similar to JavaCameraView
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
- intrinsics implementation (071) reworked to use modern RVV intrinsics syntax
- cmake toolchain file (071) now allows selecting from predefined configurations
Co-authored-by: Fang Sun <fangsun@linux.alibaba.com>
Make \epsilon parameter accessible in VariationalRefinement #24852Resolves#24847
I believe this is necessary to expose \epsilon parameter.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
python: accept path-like objects wherever file names are expected #24773
Merry Christmas, all 🎄
Implements #15731
Support is enabled for all arguments named `filename` or `filepath` (case-insensitive), or annotated with `CV_WRAP_FILE_PATH`.
Support is based on `PyOS_FSPath`, which is available in Python 3.6+. When running on older Python versions the arguments must have a `str` value as before.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
dnn onnx: add group norm layer #24610
dnn onnx: add group norm layer
Todo:
- [x] speed up by multi-threading
- [x] add perf
- [x] add backend: OpenVINO
- [x] add backend: CUDA
- [x] add backend: OpenCL (no fp16)
- [ ] add backend: CANN
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>
Replace interactive batched Matrix Multiply. #24812
This PR replaces iterative batch matrix multiplication which `FastGemmBatch` in Einsum layer.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn: no layer norm fusion if axes.back() is not the axis of last dimension #24808
Merge with https://github.com/opencv/opencv_extra/pull/1137
Resolves https://github.com/opencv/opencv/issues/24797
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
dnn onnx: add mod #24765
Resolves https://github.com/opencv/opencv/issues/23174
TODO:
- [x] enable some conformance tests
- [x] add backends
- [x] CANN
- [x] OpenVINO
- [x] CUDA
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
FreeBSD does not have the /proc file system. FreeBSD was added to the code path
for aarch64 before the use of the /proc file system with f7b4b750d8
but then /proc usage was added not long after with b3269b08a1
- Added JavaDoc package build and publishing
- Added Source package build and publishing
- More metadata for publishing
- Disable native samples build with aar, because prefab is not complete yet
dnn onnx: support constaint inputs in einsum importer #24753
Merge with https://github.com/opencv/opencv_extra/pull/1132.
Resolves https://github.com/opencv/opencv/issues/24697
Credits to @LaurentBerger.
---
This is a workaround. I suggest to get input shapes and calculate the output shapes in `getMemoryShapes` so as to keep the best compatibility. It is not always robust getting shapes during the importer stage and we should avoid that as much as possible.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Fix to convert float32 to int32/uint32 with rounding to nearest (ties to even). #24271
Fix https://github.com/opencv/opencv/issues/24163
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
(carotene is BSD)
Fix mismatch and simplify code in ChessBoardDetector::findQuadNeighbors #24667
### Pull Request Readiness Checklist
Сode doesn't match comment.
If we want check `1:4` edges ratio and `edge_len` is squared edge length, then we should check
```
ediff > 15*edge_len
```
with constant `15`, not `32`, because
```
ediff > 15*edge_len2 <=> edge_len1 - edge_len2 > 15*edge_len2 <=> edge_len1 > 16*edge_len2 <=> 1:4 edges ratio
```
But for me it's better and simpler to directly check `edge_len1 > 16*edge_len2`
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Currently, if `PNG_FOUND`, cmake scripts will check include and parse
header while we can use `PNG_VERSION_STRING` conveniently. If
`BUILD_PNG`, parse version from `PNG_LIBPNG_VER_STRING` directly is more
convenient than parsing major, minor and patch and concatenate them.
The comment of png.h also supports this.
```
/* These should match the first 3 components of PNG_LIBPNG_VER_STRING: */
```
https://github.com/glennrp/libpng/blob/libpng16/png.h#L287
This patch also modifies `ocv_parse_header_version` macro to receive
another parameter to make it more general.
The reason why changing `PNG_VERSION` to `PNG_VERSION_STRING` is to be
consistent with cmake's FindPNG.
This patch removes `HAVE_LIBPNG_PNG_H` variable because `PNG_INCLUDE_DIR`
is where to find png.h, etc according to
https://cmake.org/cmake/help/latest/module/FindPNG.html.
This patch also removes `PNG_PNG_INCLUDE_DIR` variable which is an
advanced variable used in cmake's FindPNG and is not used in opencv.
Fixes#22747. Support [crop] configuration for DarkNet #24384
Request for comments. This is my first PR.
**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1112
resolves https://github.com/opencv/opencv/issues/22747
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Try to enable Winograd by default in FP32 mode and disable it by default in FP16 mode #24709
Hopefully, it will resolve regressions since 4.8.1 (see also https://github.com/opencv/opencv/pull/24587)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
dnn: add attention layer #24476Resolves#24609
Merge with: https://github.com/opencv/opencv_extra/pull/1128.
Attention operator spec from onnxruntime: https://github.com/microsoft/onnxruntime/blob/v1.16.1/docs/ContribOperators.md#com.microsoft.Attention.
TODO:
- [x] benchmark (before this PR vs. with this PR vs. ORT).
- [x] Layer fusion: Take care Slice with end=INT64_MAX.
- [x] Layer fusion: match more potential attention (VIT) patterns.
- [x] Single-head attention is supported.
- [x] Test AttentionSubgraph fusion.
- [x] Add acc tests for VIT_B_32 and VitTrack
- [x] Add perf tests for VIT_B_32 and VitTrack
## Benchmarks
Platform: Macbook Air M1.
### Attention Subgraph
Input scale: [1, 197, 768].
| | mean (ms) | median (ms) | min (ms) |
| ---------------------- | --------- | ----------- | -------- |
| w/ Attention (this PR) | 3.75 | 3.68 | 3.22 |
| w/o Attention | 9.06 | 9.01 | 8.24 |
| ORT (python) | 4.32 | 2.63 | 2.50 |
### ViTs
All data in millisecond (ms).
| ViTs | With Attention | Without Attention | ORT |
| -------- | -------------- | ----------------- | ------ |
| vit_b_16 | 302.77 | 365.35 | 109.70 |
| vit_b_32 | 89.92 | 116.22 | 30.36 |
| vit_l_16 | 1593.32 | 1730.74 | 419.92 |
| vit_l_32 | 468.11 | 577.41 | 134.12 |
| VitTrack | 3.80 | 3.87 | 2.25 |
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Check Checkerboard Corners #24546
What I did was get you to pull out of findChessboardCorners cornres the whole part that "checks" and sorts the corners of the checkerboard if present.
The main reason for this is that findChessboardCorners is often very slow to find the corners and this depends in that the size the contrast etc of the checkerboards can be very different from each other and writing a function that works on all kinds of images is complicated.
So I find it very useful to have the ability to write your own code to process the image and then have a function that controls or orders the corners.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Add experimental support for Apple VisionOS platform #24136
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
This is dependent on cmake support for VisionOs which is currently in progress.
Creating PR now to test that there are no regressions in iOS and macOS builds
Add blobrecttoimage #24539
### Pull Request Readiness Checklist
resolves https://github.com/opencv/opencv/issues/14659
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work #14659
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn: refactor ONNX MatMul with fastGemm #24694
Done:
- [x] add backends
- [x] CUDA
- [x] OpenVINO
- [x] CANN
- [x] OpenCL
- [x] Vulkan
- [x] add perf tests
- [x] const B case
### Benchmark
Tests are done on M1. All data is in milliseconds (ms).
| Configuration | MatMul (Prepacked) | MatMul | InnerProduct |
| - | - | - | - |
| A=[12, 197, 197], B=[12, 197, 64], trans_a=0, trans_b=0 | **0.39** | 0.41 | 1.33 |
| A=[12, 197, 64], B=[12, 64, 197], trans_a=0, trans_b=0 | **0.42** | 0.42 | 1.17 |
| A=[12, 50, 64], B=[12, 64, 50], trans_a=0, trans_b=0 | **0.13** | 0.15 | 0.33 |
| A=[12, 50, 50], B=[12, 50, 64], trans_a=0, trans_b=0 | **0.11** | 0.13 | 0.22 |
| A=[16, 197, 197], B=[16, 197, 64], trans_a=0, trans_b=0 | **0.46** | 0.54 | 1.46 |
| A=[16, 197, 64], B=[16, 64, 197], trans_a=0, trans_b=0 | **0.46** | 0.95 | 1.74 |
| A=[16, 50, 64], B=[16, 64, 50], trans_a=0, trans_b=0 | **0.18** | 0.32 | 0.43 |
| A=[16, 50, 50], B=[16, 50, 64], trans_a=0, trans_b=0 | **0.15** | 0.25 | 0.25 |
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Add support for Orbbec Gemini2 and Gemini2 XL camera #24666
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
Make default axis of softmax in onnx "-1" without opset option #24613
Try to solve problem: https://github.com/opencv/opencv/pull/24476#discussion_r1404821158
**ONNX**
`opset <= 11` use 1
`else` use -1
**TensorFlow**
`TF version = 2.x` use -1
`else` use 1
**Darknet, Caffe, Torch**
use 1 by definition
G-API: Support CoreML Execution Providers for ONNXRT Backend #24068
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
G-API: Get input model layout from the IR if possible in OV 2.0 backend #24658
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Classify and extend convolution and depthwise performance tests #24547
This PR aims to:
1. Extend the test cases from models: `YOLOv5`, `YOLOv8`, `EfficientNet`, `YOLOX`, `YuNet`, `SFace`, `MPPalm`, `MPHand`, `MPPose`, `ViTTrack`, `PPOCRv3`, `CRNN`, `PPHumanSeg`. (371 new test cases are added)
2. Classify the existing convolution performance test to below cases
- CONV_1x1
- CONV_3x3_S1_D1 (winograd)
- CONV
- DEPTHWISE
3. Reduce unnecessary test cases by follow 3 rules (366 test cases are pruned):
(i). For all tests, except for pad and bias related parameters, all other parameters are the same. Only one case can be reserved.
(ii). When the only difference is the channel of input shape, and other parameters are the same. Only one case can be reserved in each range `[1, 3], [4, 7], [8, 15], [16, 31], [32, 63], [64, 127], [128, 255], [256, 511], [512, 1023], [1024, 2047], [2048, 4095]`
(iii). When the only difference is the width and height of input shape, and other parameters are the same. Only one case can be reserved in each range `[1, 31], [32, 63], [64, 95]... `
> **Reproduced**: 1. follow step in https://github.com/alalek/opencv/commit/dnn_dump_conv_kernels to dump all convolution cases from new models. (declared flops may not right, need to be checked manually) 2 and 3. Use the script from python code [classify conv.txt](https://github.com/opencv/opencv/files/13522228/classify.conv.txt)
**Performance test result on Apple M2**
**Test result details**: [M2.md](https://github.com/opencv/opencv/files/13379189/M2.md)
**Additional test result details with FP16**: [m2_results_with_fp16.zip](https://github.com/opencv/opencv/files/13491070/m2_results_with_fp16.zip)
**Brief summary for 4.8.1 vs 4.7.0 or 4.6.0**:
1. `CONV_1x1_S1_D1` dropped significant with small or large input shape.
2. `DEPTHWISE_5x5 ` dropped a little compared with 4.7.0.
---
**Performance test result on [Intel Core i7-12700K](https://www.intel.com/content/www/us/en/products/sku/134594/intel-core-i712700k-processor-25m-cache-up-to-5-00-ghz/specifications.html)**: 8 Performance-cores (3.60 GHz, turbo up to 4.90 GHz), 4 Efficient-cores (2.70 GHz, turbo up to 3.80 GHz), 20 threads.
**Test result details**: [INTEL.md](https://github.com/opencv/opencv/files/13374093/INTEL.md)
**Brief summary for 4.8.1 vs 4.5.5**:
1. `CONV_5x5_S1_D1` dropped significant.
2. `CONV_1x1_S1_D1`, `CONV_3x3_S1_D1`, `DEPTHWISE_3x3_S1_D1`, `DEPTHWISW_3x3_S2_D1` dropped with small input shape.
---
TODO:
- [x] Perform tests on arm with each opencv version
- [x] Perform tests on x86 with each opencv version
- [x] Split each test classification with single test config
- [x] test enable fp16
Speed up ChessBoardDetector::findQuadNeighbors #24605
### Pull Request Readiness Checklist
Replaced brute-force algorithm with O(N^2) time complexity with kd-tree with something like O(N * log N) time complexity (maybe only in average case).
For example, on image from #23558 without quads filtering (by using `CALIB_CB_FILTER_QUADS` flag) finding chessboards corners took ~770 seconds on my laptop, of which finding quads neighbors took ~620 seconds.
Now finding chessboards corners takes ~155-160 seconds, of which finding quads neighbors takes only ~5-10 seconds.
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Add test for YoloX Yolo v6 and Yolo v8 #24611
This PR adds test for YOLOv6 model (which was absent before)
The onnx weights for the test are located in this PR [ #1126](https://github.com/opencv/opencv_extra/pull/1126)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn cuda: support Sub #24647
Related https://github.com/opencv/opencv/issues/24606#issuecomment-1837390257
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn onnx graph simplifier: handle optional inputs of Slice #24655
Resolves https://github.com/opencv/opencv/issues/24609
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Fix bug in ChessBoardDetector::findQuadNeighbors #24597
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
I do not have more info on the platform as it is internal.
Without this fix, the error is:
core/src/arithm.simd.hpp:868:1: error: too few arguments provided to function-like macro invocation
868 | DEFINE_SIMD_ALL(cmp)
| ^
./third_party/OpenCV/public/modules/./core/src/arithm.simd.hpp:93:5: note: expanded from macro 'DEFINE_SIMD_ALL'
93 | DEFINE_SIMD_NSAT(fun, __VA_ARGS__)
| ^
./third_party/OpenCV/public/modules/./core/src/arithm.simd.hpp:89:5: note: expanded from macro 'DEFINE_SIMD_NSAT'
89 | DEFINE_SIMD_F64(fun, __VA_ARGS__)
| ^
./third_party/OpenCV/public/modules/./core/src/arithm.simd.hpp:77:9: note: expanded from macro 'DEFINE_SIMD_F64'
77 | DEFINE_NOSIMD(__CV_CAT(fun, 64f), double, __VA_ARGS__)
| ^
./third_party/OpenCV/public/modules/./core/src/arithm.simd.hpp:47:56: note: expanded from macro 'DEFINE_NOSIMD'
47 | DEFINE_NOSIMD_FUN(fun_name, c_type, __VA_ARGS__)
| ^
./third_party/OpenCV/public/modules/./core/src/arithm.simd.hpp:860:9: note: macro 'DEFINE_NOSIMD_FUN' defined here
860 | #define DEFINE_NOSIMD_FUN(fun, _T1, _Tvec, ...) \
G-API: Implement inference only mode for OV backend #24584
### Changes overview
Introduced `cv::gapi::wip::ov::benchmark_mode{}` compile argument which if enabled force `OpenVINO` backend to run only inference without populating input and copying back output tensors.
This mode is only relevant for measuring the performance of pure inference without data transfers. Similar approach is using on OpenVINO side in `benchmark_app`: https://github.com/openvinotoolkit/openvino/blob/master/samples/cpp/benchmark_app/benchmark_app.hpp#L134-L139
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Fix typo in ChessBoardDetector::generateQuads #24595
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Fix race condition in color_lab.cpp initLabTabs(). #24581
There is a race condition between when the static bool is initialized (which is thread safe) and its value check. This PR changes the static bool to a static lambda call to make it thread safe. The static_cast<void> in the end is to prevent unused variables warnings.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
Add support for custom padding in DNN preprocessing #24569
This PR add functionality for specifying value in padding.
It is required in many preprocessing pipelines in DNNs such as Yolox object detection model
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Fix graph fusion with commutative ops #24577
### Pull Request Readiness Checklist
resolves https://github.com/opencv/opencv/issues/24568
**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1125
TODO:
- [x] replace recursive function to sequential
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Add yolov5n to tests #24553
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [ X] I agree to contribute to the project under Apache 2 License.
- [ X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ X] The PR is proposed to the proper branch
- [ X] There is a reference to the original bug report and related work
- [ X] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ X] The feature is well documented and sample code can be built with the project CMake
Fix verify unsupported new mat depth for nonzero/minmax/lut #24578
`cv::LUI()`, `cv::minMaxLoc()`, `cv::minMaxIdx()`, `cv::countNonZero()`, `cv::findNonZero()` and `cv::hasNonZero()` uses depth-based function table. However, it is too short for `CV_16BF`, `CV_Bool`, `CV_64U`, `CV_64S` and `CV_32U` and it may occur out-boundary-access. This patch fix it. And If necessary, when someone extends these functions to support, please relax this test.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Fix out of image corners in cv::cornerSubPix #24527
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
dnn: add openvino, opencl and cuda backends for layer normalization layer #24552
Merge after https://github.com/opencv/opencv/pull/24544.
Todo:
- [x] openvino
- [x] opencl
- [x] cuda
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
This patch change lsx to baseline feature, and lasx to dispatch
feature. Additionally, the runtime detection methods for lasx and
lsx have been modified.
Replace double atomic in USAC #24499
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Reference to issue with atomic variable: #24281
Reference to bug with essential matrix: #24482
* add Winograd FP16 implementation
* fixed dispatching of FP16 code paths in dnn; use dynamic dispatcher only when NEON_FP16 is enabled in the build and the feature is present in the host CPU at runtime
* fixed some warnings
* hopefully fixed winograd on x64 (and maybe other platforms)
---------
Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@gmail.com>
dnn test: move layer norm tests into conformance tests #24544
Merge with https://github.com/opencv/opencv_extra/pull/1122
## Motivation
Some ONNX operators, such as `LayerNormalization`, `BatchNormalization` and so on, produce outputs for training (mean, stdev). So they have reference outputs of conformance tests for those training outputs as well. However, when it comes to inference, we do not need and produce those outputs for training here in dnn. Hence, output size does not match if we use dnn to infer those conformance models. This has become the barrier if we want to test these operators using their conformance tests.
<!--
| Operator | Inference needed | Outputs (required - total) | Optional outputs for training? |
| ----------------------- | ----------------------------------- | -------------------------- | ------------------------------ |
| BatchNormalization | Yes | 1 - 3 | Yes |
| Dropout | Maybe, can be eliminated via fusion | 1 - 2 | Yes |
| GRU | Yes | 0 - 2 | No |
| LSTM | Yes | 0 - 3 | No |
| LayerNormalization | Yes | 1 - 3 | Yes |
| MaxPool | Yes | 1 - 2 | Yes |
| RNN | Yes | 0 - 2 | No |
| SoftmaxCrossEntropyLoss | No | 1 - 2 | -- |
-->
**I checked all ONNX operators with optional outputs. Turns out there are only `BatchNormalization`, `Dropout`, `LayerNormalization` and `MaxPool` has optional outputs for training. All except `LayerNormalization` have models set for training mode and eval mode. Blame ONNX for that.**
## Solution
In this pull request, we remove graph outputs if the graph looks like the following:
```
[X] [Scale] [Bias] [X] [Scale] [Bias]
\ | / this patch \ | /
LayerNormalization -----------> LayerNormalization
/ | \ |
[Y] [Mean] [Stdev] [Y]
```
We can update conformance tests and turn on some cases as well if extending to more layers.
Notes:
1. This workaround does not solve expanded function operators if they are fused into a single operator, such as `$onnx/onnx/backend/test/data/node/test_layer_normalization_2d_axis1_expanded`, but they can be run without fusion. Note that either dnn or onnxruntime does not fuse those expanded function operators.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Bugfix/qrcode version estimator #24364
Fixes https://github.com/opencv/opencv/issues/24366
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Fast gemm for einsum #24509
## This PR adds performance tests for Einsum Layer with FastGemm. See below results of performance test on different inputs
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
G-API: Advanced device selection for ONNX DirectML Execution Provider #24060
### Overview
Extend `cv::gapi::onnx::ep::DirectML` to accept `adapter name` as `ctor` parameter in order to select execution device by `name`.
E.g:
```
pp.cfgAddExecutionProvider(cv::gapi::onnx::ep::DirectML("Intel Graphics"));
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Handle huge images in IPP distanceTransform #24535
### Pull Request Readiness Checklist
* Do not use IPP for huge Mat (reproduced with https://github.com/opencv/opencv/issues/23895#issuecomment-1708132367 on `DIST_MASK_5`)
I have observed two types of errors on the reproducer from the issue:
1. When `temp` is not allocated:
```
Thread 1 "app" received signal SIGSEGV, Segmentation fault.
0x00007ffff65dc755 in icv_l9_ownDistanceTransform_5x5_8u32f_C1R_21B_g9e9 () from /home/dkurtaev/opencv_install/bin/../lib/libopencv_imgproc.so.408
(gdb) bt
#0 0x00007ffff65dc755 in icv_l9_ownDistanceTransform_5x5_8u32f_C1R_21B_g9e9 () from /home/dkurtaev/opencv_install/bin/../lib/libopencv_imgproc.so.408
#1 0x00007ffff659e8df in icv_l9_ippiDistanceTransform_5x5_8u32f_C1R () from /home/dkurtaev/opencv_install/bin/../lib/libopencv_imgproc.so.408
#2 0x00007ffff5c390f0 in cv::distanceTransform (_src=..., _dst=..., _labels=..., distType=2, maskSize=5, labelType=1) at /home/dkurtaev/opencv/modules/imgproc/src/distransform.cpp:854
#3 0x00007ffff5c396ef in cv::distanceTransform (_src=..., _dst=..., distanceType=2, maskSize=5, dstType=5) at /home/dkurtaev/opencv/modules/imgproc/src/distransform.cpp:903
#4 0x000055555555669e in main (argc=1, argv=0x7fffffffdef8) at /home/dkurtaev/main.cpp:18
```
2. When we keep `temp` allocated every time:
```
OpenCV(4.8.0-dev) Error: Assertion failed (udata < (uchar*)ptr && ((uchar*)ptr - udata) <= (ptrdiff_t)(sizeof(void*)+64)) in fastFree, file /home/dkurtaev/opencv/modules/core/src/alloc.cpp, line 191
terminate called after throwing an instance of 'cv::Exception'
what(): OpenCV(4.8.0-dev) /home/dkurtaev/opencv/modules/core/src/alloc.cpp:191: error: (-215:Assertion failed) udata < (uchar*)ptr && ((uchar*)ptr - udata) <= (ptrdiff_t)(sizeof(void*)+64) in function 'fastFree'
```
* Try enable IPP for 3x3 (see https://github.com/opencv/opencv/issues/15904)
* Reduce memory footprint with IPP
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Enable softmax layer vectorization on RISC-V RVV #24510
Related: https://github.com/opencv/opencv/pull/24466
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake