Added int32, int64 support and type inference to dnn #24411
**Added a type inference to dnn similar to the shape inference, added int32 and int64 support.**
- Added getTypes method for layers that calculates layer outputs types and internals types from inputs types (Similar to getMemoryShapes). By default outputs and internals types = input[0] type
- Added type inference pipeline similar to shape inference pipeline. LayersShapes struct (that is used in shape inference pipeline) now contains both shapes and types
- All layers output blobs are now allocated using the calculated types from the type inference.
- Inputs and constants with int32 and int64 types are not automatically converted into float32 now.
- Added int32 and int64 support for all the layers with indexing and for all the layers required in tests.
Added int32 and int64 support for CUDA:
- Added host<->device data moving for int32 and int64
- Added int32 and int64 support for several layers (just slightly modified CUDA C++ templates)
Passed all the accuracy tests on CPU, OCL, OCL_FP16, CUDA, CUDA_FP16. (except RAFT model)
**CURRENT PROBLEMS**:
- ONNX parser always converts int64 constants and layers attributes to int32, so some models with int64 constants doesn't work (e.g. RAFT). The solution is to disable int64->int32 conversion and fix attributes reading in a lot of ONNX layers parsers (https://github.com/opencv/opencv/issues/25102)
- I didn't add type inference and int support to VULCAN, so it doesn't work at all now.
- Some layers don't support int yet, so some unknown models may not work.
**CURRENT WORKAROUNDS**:
- CPU arg_layer indides are implemented in int32 followed by a int32->int64 conversion (the master branch has the same workaround with int32->float conversion)
- CPU and OCL pooling_layer indices are implemented in float followed by a float->int64 conversion
- CPU gather_layer indices are implemented in int32, so int64 indices are converted to int32 (the master branch has the same workaround with float->int32 conversion)
**DISABLED TESTS**:
- RAFT model
**REMOVED TESTS**:
- Greater_input_dtype_int64 (because it doesn't fit ONNX rules, the whole test is just comparing float tensor with int constant)
**TODO IN NEXT PULL REQUESTS**:
- Add int64 support for ONNX parser
- Add int support for more layers
- Add int support for OCL (currently int layers just run on CPU)
- Add int tests
- Add int support for other backends
dnn: try improving performance of Attention layer #25076
Checklist:
- [x] Use `Mat` over `Mat::zeros` for temporary buffer in forward
- [x] Use layer internal buffer over temporary Mat buffer
- [x] Try a single fastGemmBatch on the Q/K/V calculation
Performance:
Performance test case is `Layer_Attention.VisionTransformer/0`, which has input of shape {1, 197, 768}, weight of shape {768, 2304} and bias {2304}.
Data is in millisecond.
| | macOS 14.2.1, Apple M1 | Ubuntu 22.04.2, Intel i7 12700K |
| - | - | - |
| Current | 10.96 | 1.58 |
| w/ Mat | 6.27 | 1.41 |
| w/ Internals | 5.87 | 1.38 |
| w/ fastGemmBatch | 6.12 | 2.14 |
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Fix issue #25077#25100
Fixes https://github.com/opencv/opencv/issues/25077
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Primitive 1D Tests #24977
This PR is designed to add tests for 1D inputs for layer, which is required after introducing 1d support in 5.x. Currently tests are written for following layers:
- [x] `Add`, `Sub`
- [x] `Product`, `Div`
- [x] `Min`, `Max`
- [x] `Argmin`, `Argmax`
- [x] `Gather`
This list is to be extended for more layer such `gemm`, `conv` etc.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn cleanup: On-fly-quantization removal #2498
On-fly-quantization is first introduced via https://github.com/opencv/opencv/pull/20228.
We decided to remove it but keep int8 layers implementation because on-fly-quantization
is less practical given the fact that there has been so many dedicated tools for model
quantization.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Fixes#24974 support HardSwishInt8 #24985
As given very clearly in the issue #24974 I made the required 2 changes to implement HardSwish Layer in INT8. Requesting comments.
resolves https://github.com/opencv/opencv/issues/24974
- [X] I agree to contribute to the project under Apache 2 License.
- [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [X] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Co-authored-by: Dhanwanth1803 <dhanwanthvarala@gmail,com>
Vulkan backend for NaryEltwiseLayer in DNN module #24768
We improve Vulkan backend for ``NaryEltwiseLayer`` in DNN module by:
- add a basic framework for Vulkan backend in ``NaryEltwiseLayer``
- add a compute shader for binary forwarding (an imitation of what has been done in native OpenCV backend including broadcasting and eltwise-operation)
- typo fixed:
- Wrong info output in ``context.cpp``
Currently, our implementation (or all layers supporting Vulkan backend) runs pretty slow on discrete GPUs basically due to IO cost in function ``copyToHost``, and we are going to fix that by
- find out the best ``VkMemoryProperty`` for various discrete GPUs
- prevent ``copyToHost`` in middle layers during forwarding, (i.e keep data in GPU memory)
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Co-authored-by: IskXCr <IskXCr@outlook.com>
Handle warnings in loongson-related code #24925
See https://github.com/fengyuentau/opencv/actions/runs/7665377694/job/20891162958#step:14:16
Warnings needs to be handled before we add the loongson server to our CI.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Removed all pre-C++11 code, workarounds, and branches #23736
This removes a bunch of pre-C++11 workrarounds that are no longer necessary as C++11 is now required.
It is a nice clean up and simplification.
* No longer unconditionally #include <array> in cvdef.h, include explicitly where needed
* Removed deprecated CV_NODISCARD, already unused in the codebase
* Removed some pre-C++11 workarounds, and simplified some backwards compat defines
* Removed CV_CXX_STD_ARRAY
* Removed CV_CXX_MOVE_SEMANTICS and CV_CXX_MOVE
* Removed all tests of CV_CXX11, now assume it's always true. This allowed removing a lot of dead code.
* Updated some documentation consequently.
* Removed all tests of CV_CXX11, now assume it's always true
* Fixed links.
---------
Co-authored-by: Maksim Shabunin <maksim.shabunin@gmail.com>
Co-authored-by: Alexander Smorkalov <alexander.smorkalov@xperience.ai>
python: accept path-like objects wherever file names are expected #24773
Merry Christmas, all 🎄
Implements #15731
Support is enabled for all arguments named `filename` or `filepath` (case-insensitive), or annotated with `CV_WRAP_FILE_PATH`.
Support is based on `PyOS_FSPath`, which is available in Python 3.6+. When running on older Python versions the arguments must have a `str` value as before.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
dnn onnx: add group norm layer #24610
dnn onnx: add group norm layer
Todo:
- [x] speed up by multi-threading
- [x] add perf
- [x] add backend: OpenVINO
- [x] add backend: CUDA
- [x] add backend: OpenCL (no fp16)
- [ ] add backend: CANN
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>
Replace interactive batched Matrix Multiply. #24812
This PR replaces iterative batch matrix multiplication which `FastGemmBatch` in Einsum layer.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn: no layer norm fusion if axes.back() is not the axis of last dimension #24808
Merge with https://github.com/opencv/opencv_extra/pull/1137
Resolves https://github.com/opencv/opencv/issues/24797
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
dnn onnx: add mod #24765
Resolves https://github.com/opencv/opencv/issues/23174
TODO:
- [x] enable some conformance tests
- [x] add backends
- [x] CANN
- [x] OpenVINO
- [x] CUDA
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake