Remove torch (old torch7) from dnn in 5.x #24294
Merge with https://github.com/opencv/opencv_extra/pull/1097
Completely removed torch (old torch7) from dnn:
- removed modules/dnn/src/torch directory that contained torch7 model parser
- removed readNetFromTorch() and readTorchBlob() public functions
- removed torch7 references from comments and help texts
- replaced links to t7 models with links to similar ONNX models in the js_style_transfer tutorial (similar to https://github.com/opencv/opencv/pull/24245/files)
dnn: cleanup of halide backend for 5.x #24231
Merge with https://github.com/opencv/opencv_extra/pull/1092.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
* remove Conformance from test names
* integrate neon optimization into default
* quick fix: define CV_NEON_AARCH64 0 for non NEON platforms
* remove var batch that leads to memory leak
* put neon code back to fast_gemm_kernels.simd
* reorganize code to reduce duplicate code
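The commit does not say why `CV_NEON_AARCH64` needs an explicit `0`. One plausible reason is that a macro defined as `0` (rather than left undefined) can be used in ordinary C++ expressions as well as in `#if` directives. A minimal sketch of that fallback pattern; `selectGemmKernel()` is a hypothetical dispatch helper, and only the macro name comes from the commit list above.

```cpp
#include <cassert>
#include <string>

// Fallback: if the build did not define CV_NEON_AARCH64 (non-NEON platform),
// define it as 0 so both `#if CV_NEON_AARCH64` and `if (CV_NEON_AARCH64)` compile.
#ifndef CV_NEON_AARCH64
#define CV_NEON_AARCH64 0
#endif

// Hypothetical dispatcher that picks a kernel name from compile-time flags.
// Without the fallback above, this would fail to compile on x86 because
// CV_NEON_AARCH64 would be an undeclared identifier in a C++ expression.
static std::string selectGemmKernel()
{
    if (CV_NEON_AARCH64)
        return "neon_aarch64";
    return "default";
}
```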
Rewrite Universal Intrinsic code: float related part #24325
The goal of this series of PRs is to modify the SIMD code blocks guarded by the CV_SIMD macro, rewriting them with the new Universal Intrinsic API.
The series of PRs is listed below:
#23885 First patch, an example
#23980 Core module
#24058 ImgProc module, part 1
#24132 ImgProc module, part 2
#24166 ImgProc module, part 3
#24301 Features2d and calib3d module
#24324 Gapi module
This patch (hopefully) is the last one in the series.
This patch consists of three main parts:
1. Modifications related to floating point (`CV_SIMD_64F`).
2. Use `#if (CV_SIMD || CV_SIMD_SCALABLE)` instead of `#if CV_SIMD || CV_SIMD_SCALABLE`, so that the `CV_SIMD` blocks not yet enabled for `CV_SIMD_SCALABLE` can be found by searching for `if CV_SIMD`.
3. Summary of the `CV_SIMD` blocks that remain unmodified (comments updated):
   - Some blocks cause test failures when enabled for RVV; they are marked as `TODO: enable for CV_SIMD_SCALABLE, ....`
   - Some blocks cannot be rewritten directly (not commented in the source code, only listed here):
     - ./modules/core/src/mathfuncs_core.simd.hpp (vector type wrapped in class/struct)
     - ./modules/imgproc/src/color_lab.cpp (array of vector type)
     - ./modules/imgproc/src/color_rgb.simd.hpp (array of vector type)
     - ./modules/imgproc/src/sumpixels.simd.hpp (fixed-length algorithm, strongly related to `CV_SIMD_WIDTH`)
These algorithms will need to be redesigned to accommodate scalable backends.
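The parenthesized guard in item 2 works as a textual marker: a plain search for `if CV_SIMD` still matches unported guards such as `#if CV_SIMD` or `#if CV_SIMD_64F`, but not ported `#if (CV_SIMD || CV_SIMD_SCALABLE)` guards. A small sketch of that check; `isUnportedSimdGuard` is a hypothetical helper, not part of OpenCV.

```cpp
#include <cassert>
#include <string>

// Returns true if a source line is a CV_SIMD guard that has not been
// ported to the scalable API yet. Ported guards are written as
// "#if (CV_SIMD || CV_SIMD_SCALABLE)", so the substring "if CV_SIMD"
// does not occur in them.
static bool isUnportedSimdGuard(const std::string& line)
{
    return line.find("if CV_SIMD") != std::string::npos;
}
```

The same substring test is what a `grep "if CV_SIMD"` over the source tree performs.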
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Fixed CumSum dnn layer #24353
Fixes #20110
The algorithm had several errors, so I rewrote it.
The layer also did not work with a non-constant axis tensor; that is fixed too.
Enabled the CumSum layer tests from the ONNX conformance suite.
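For reference, the ONNX `CumSum` operator takes the axis as a runtime input tensor (which may be negative) and has `exclusive` and `reverse` attributes. The sketch below is a naive reference implementation of those semantics on a contiguous row-major buffer; `cumsum` and its signature are illustrative only and are not the code of the rewritten layer.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Naive reference CumSum over a row-major tensor of the given shape.
// `axis` is read at run time and may be negative (counted from the end).
// exclusive: the current element is not included in its own sum.
// reverse:   accumulate from the end of the axis toward the beginning.
static std::vector<float> cumsum(const std::vector<float>& src,
                                 const std::vector<int>& shape,
                                 int axis, bool exclusive = false, bool reverse = false)
{
    const int rank = static_cast<int>(shape.size());
    if (axis < 0)
        axis += rank;
    if (axis < 0 || axis >= rank)
        throw std::out_of_range("cumsum: axis out of range");

    // View the tensor as [outer, len, inner] around the chosen axis.
    std::size_t outer = 1, inner = 1;
    for (int i = 0; i < axis; ++i) outer *= static_cast<std::size_t>(shape[i]);
    for (int i = axis + 1; i < rank; ++i) inner *= static_cast<std::size_t>(shape[i]);
    const std::size_t len = static_cast<std::size_t>(shape[axis]);
    assert(src.size() == outer * len * inner);

    std::vector<float> dst(src.size());
    for (std::size_t o = 0; o < outer; ++o)
        for (std::size_t in = 0; in < inner; ++in)
        {
            float acc = 0.f;
            for (std::size_t k = 0; k < len; ++k)
            {
                const std::size_t j = reverse ? len - 1 - k : k;
                const std::size_t idx = (o * len + j) * inner + in;
                if (exclusive) { dst[idx] = acc; acc += src[idx]; }
                else           { acc += src[idx]; dst[idx] = acc; }
            }
        }
    return dst;
}
```

For example, `{1, 2, 3, 4}` gives `{1, 3, 6, 10}` by default and `{9, 7, 4, 0}` with both `exclusive` and `reverse` set.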
OpenVINO backend for INT8 models #23987
### Pull Request Readiness Checklist
TODO:
- [x] DetectionOutput layer (https://github.com/opencv/opencv/pull/24069)
- [x] Fewer FP32 fallbacks (e.g. Sigmoid, eltwise sum)
- [x] Accuracy, performance tests (https://github.com/opencv/opencv/pull/24039)
- [x] Single layer tests (convolution)
- [x] ~~Fixes for OpenVINO 2022.1 (https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/100334)~~
Performance results for the object detection model `coco_efficientdet_lite0_v1_1.0_quant_2021_09_06.tflite`:
| backend | performance (median time) |
|---|---|
| OpenCV | 77.42ms |
| OpenVINO 2023.0 | 10.90ms |
CPU: `11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz`
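From the two medians in the table, the speedup of the OpenVINO backend on this model and CPU works out to roughly 7x:

$$
\frac{77.42\ \text{ms}}{10.90\ \text{ms}} \approx 7.1
$$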
Serialized model per-layer stats (note that Convolution should use `*_I8` primitives if they are quantized correctly): https://gist.github.com/dkurt/7772bbf1907035441bb5454f19f0feef
---
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
dnn: merge tests from test_halide_layers to test_backends #24283
Context: https://github.com/opencv/opencv/pull/24231#pullrequestreview-1628649980
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Add Support for Einsum Layer #24037
### This PR adds support for the [Einsum Layer](https://pytorch.org/docs/stable/generated/torch.einsum.html) (in progress).
This PR is currently for review only, not for merging. Test cases are located in the [#1090](https://github.com/opencv/opencv_extra/pull/1090) PR in opencv_extra.
**DONE**:
- [x] 2-5D GMM support added
- [x] Matrix transpose support added
- [x] Reduction-type computation 'ij->j'
- [x] 2nd shape computation - during forward
**Next PRs**:
- [ ] Broadcasting reduction "...ii ->...i"
- [ ] Add lazy shape deduction. "...ij, ...jk->...ik"
- [ ] Add implicit output computation support. "bij,bjk ->" (output subscripts should be "bik")
- [ ] Add support for CUDA backend
- [ ] BatchWiseMultiply optimize
**Later in 5.x version (requires support for 1D matrices)**:
- [ ] Add 1D vector multiplication support
- [ ] Inner product "i, i" (problems with 1D shapes)
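To make the subscript notation in the lists above concrete: `'ij->j'` sums over `i` because `i` is absent from the output subscripts, and `'bij,bjk->bik'` is a batched matrix product that contracts the shared index `j`. A naive sketch of those two equations on row-major buffers; the function names are hypothetical and this is not the layer's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// einsum("ij->j"): X has shape [I, J]; the output keeps only j,
// so index i is summed out.
static std::vector<float> einsum_ij_j(const std::vector<float>& X, std::size_t I, std::size_t J)
{
    assert(X.size() == I * J);
    std::vector<float> y(J, 0.f);
    for (std::size_t i = 0; i < I; ++i)
        for (std::size_t j = 0; j < J; ++j)
            y[j] += X[i * J + j];
    return y;
}

// einsum("bij,bjk->bik"): batched matrix multiplication.
// A is [B, I, J], C is [B, J, K]; the shared index j is contracted.
static std::vector<float> einsum_bij_bjk_bik(const std::vector<float>& A,
                                              const std::vector<float>& C,
                                              std::size_t B, std::size_t I,
                                              std::size_t J, std::size_t K)
{
    assert(A.size() == B * I * J && C.size() == B * J * K);
    std::vector<float> out(B * I * K, 0.f);
    for (std::size_t b = 0; b < B; ++b)
        for (std::size_t i = 0; i < I; ++i)
            for (std::size_t k = 0; k < K; ++k)
            {
                float acc = 0.f;
                for (std::size_t j = 0; j < J; ++j)
                    acc += A[(b * I + i) * J + j] * C[(b * J + j) * K + k];
                out[(b * I + i) * K + k] = acc;
            }
    return out;
}
```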
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
* attempt to add 0d/1d mat support to OpenCV
* revised the patch; now 1D mat is treated as 1xN 2D mat rather than Nx1.
* a step towards 'green' tests
* another little step towards 'green' tests
* calib test failures seem to be fixed now
* more fixes _core & _dnn
* another step towards green ci; even 0D mat's (a.k.a. scalars) are now partly supported!
* fixed strange bug in aruco/charuco detector, not sure why it did not work
* also fixed a few remaining failures (hopefully) in dnn & core
* disabled failing GAPI tests - too complex to dig into this compiler pipeline
* hopefully fixed java tests
* trying to fix some more tests
* quick followup fix
* continue to fix test failures and warnings
* quick followup fix
* trying to fix some more tests
* partly fixed support for 0D/scalar UMat's
* use updated parseReduce() from upstream
* trying to fix the remaining test failures
* fixed [ch]aruco tests in Python
* still trying to fix tests
* revert "fix" in dnn's CUDA tensor
* trying to fix dnn+CUDA test failures
* fixed 1D umat creation
* hopefully fixed remaining cuda test failures
* removed trailing whitespaces
* first commit
* turned C from input to constant; force C constant in impl; better handling 0d/1d cases
* integrate with gemm from ficus nn
* fix const inputs
* adjust threshold for int8 tryQuantize
* adjust threshold for int8 quantized 2
* support batched gemm and matmul; tune threshold for rcnn_ilsvrc13; update googlenet
* add gemm perf against innerproduct
* add perf tests for innerproduct with bias
* fix perf
* add memset
* renamings for next step
* add dedicated perf gemm
* add innerproduct in perf_gemm
* remove gemm and innerproduct perf tests from perf_layer
* add perf cases for vit sizes; prepack constants
* remove batched gemm; fix wrong trans; optimize KC
* remove prepacking for const A; several fixes for const B prepacking
* add todos and gemm expression
* add optimized branch for avx/avx2
* trigger build
* update macros and signature
* update signature
* fix macro
* fix bugs for neon aarch64 & x64
* add backends: cuda, cann, inf_ngraph and vkcom
* fix cuda backend
* test commit for cuda
* test cuda backend
* remove debug message from cuda backend
* use cpu dispatcher
* fix neon macro undef in dispatcher
* fix dispatcher
* fix inner kernel for neon aarch64
* fix compiling issue on armv7; try fixing accuracy issue on other platforms
* broadcast C with beta multiplied; improve func namings
* fix bug for avx and avx2
* put all platform-specific kernels in dispatcher
* fix typos
* attempt to fix compile issues on x64
* run old gemm when neon, avx, avx2 are all not available; add kernel for armv7 neon
* fix typo
* quick fix: add macros for pack4
* quick fix: use vmlaq_f32 for armv7
* quick fix for missing macro of fast gemm pack f32 4
* disable conformance tests when optimized branches are not supported
* disable perf tests when optimized branches are not supported
* decouple cv_try_neon and cv_neon_aarch64
* drop googlenet_2023; add fastGemmBatched
* fix step in fastGemmBatched
* cpu: fix initialization of b; gpu: support batch
* quick followup fix for cuda
* add default kernels
* quick followup fix to avoid macro redef
* optimized kernels for lasx
* resolve mis-alignment; remove comments
* tune performance for x64 platform
* tune performance for neon aarch64
* tune for armv7
* comment time consuming tests
* quick follow-up fix
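Several commits above (const C input, `trans` flags, "broadcast C with beta multiplied") line up with the ONNX `Gemm` definition, Y = alpha * op(A) * op(B) + beta * C, where op() optionally transposes its argument and C is unidirectionally broadcast to [M, N]. The naive reference below is only a correctness baseline under that assumption, not the optimized fast-gemm kernels from this PR; `gemmRef` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive reference GEMM: Y[M x N] = alpha * op(A) * op(B) + beta * C.
// A is stored row-major as [M x K] (or [K x M] when transA is set),
// B as [K x N] (or [N x K] when transB is set).
// C has shape [cm x cn] with cm in {1, M} and cn in {1, N}; it is
// broadcast to [M x N]. An empty C means "no bias".
static std::vector<float> gemmRef(const std::vector<float>& A, const std::vector<float>& B,
                                  const std::vector<float>& C, std::size_t cm, std::size_t cn,
                                  std::size_t M, std::size_t N, std::size_t K,
                                  bool transA, bool transB, float alpha, float beta)
{
    assert(A.size() == M * K && B.size() == K * N);
    assert(C.empty() || ((cm == 1 || cm == M) && (cn == 1 || cn == N) && C.size() == cm * cn));
    std::vector<float> Y(M * N, 0.f);
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n)
        {
            float acc = 0.f;
            for (std::size_t k = 0; k < K; ++k)
            {
                const float a = transA ? A[k * M + m] : A[m * K + k];
                const float b = transB ? B[n * K + k] : B[k * N + n];
                acc += a * b;
            }
            float bias = 0.f;
            if (!C.empty())
            {
                // Broadcast: a dimension of size 1 is reused for every row/column.
                const std::size_t ci = (cm == 1 ? 0 : m);
                const std::size_t cj = (cn == 1 ? 0 : n);
                bias = C[ci * cn + cj];
            }
            Y[m * N + n] = alpha * acc + beta * bias;
        }
    return Y;
}
```

A 1-D C of length N, which ONNX also allows, corresponds to `cm = 1, cn = N` here.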
In the previous code, there was a memory leak issue where the
previously allocated memory was not freed upon a failed realloc
operation. This commit addresses the problem by releasing the old
memory before setting the pointer to NULL in case of a realloc failure.
This ensures that memory is properly managed and avoids potential
memory leaks.
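The pattern described above can be sketched as follows. `std::realloc` leaves the original block untouched when it fails, so writing `p = realloc(p, n)` loses the only pointer to that block; the old block has to be freed before the pointer is dropped. `growOrRelease` is a hypothetical helper name, not the function changed by the commit.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Grows (or shrinks) a malloc'ed buffer. On failure the old block is
// released and nullptr is returned, so the caller never leaks it by
// overwriting its only pointer with the failed result.
// new_size must be non-zero: realloc(ptr, 0) is implementation-defined.
static void* growOrRelease(void* ptr, std::size_t new_size)
{
    assert(new_size != 0);
    void* grown = std::realloc(ptr, new_size);
    if (grown == nullptr)
    {
        std::free(ptr);  // realloc failed: the old block is still ours to free
        return nullptr;
    }
    return grown;
}
```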
Added default dimension value to tensorflow ArgMax and ArgMin layers #24266
Added default dimension value to tensorflow ArgMax and ArgMin layers.
Added an exception when a layer input is accessed with an out-of-range index.
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48452
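A minimal sketch of both changes, under stated assumptions: the default axis is taken to be 0 (matching `tf.math.argmax`, which reduces over axis 0 when no axis is given), and `LayerInputs` and `argmax` are hypothetical helpers, not OpenCV's TensorFlow importer code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for a layer's inputs: indexing past the end
// throws instead of reading out of bounds.
struct LayerInputs
{
    std::vector<std::string> names;

    const std::string& input(std::size_t idx) const
    {
        if (idx >= names.size())
            throw std::out_of_range("input index " + std::to_string(idx) +
                                    " is out of range (" + std::to_string(names.size()) + " inputs)");
        return names[idx];
    }
};

// Index of the maximum along `axis` of a row-major tensor; ties keep the
// first index. The axis defaults to 0 (assumption, see above).
static std::vector<int> argmax(const std::vector<float>& src, const std::vector<int>& shape, int axis = 0)
{
    const int rank = static_cast<int>(shape.size());
    if (axis < 0)
        axis += rank;
    if (axis < 0 || axis >= rank)
        throw std::out_of_range("argmax: axis out of range");

    std::size_t outer = 1, inner = 1;
    for (int i = 0; i < axis; ++i) outer *= static_cast<std::size_t>(shape[i]);
    for (int i = axis + 1; i < rank; ++i) inner *= static_cast<std::size_t>(shape[i]);
    const std::size_t len = static_cast<std::size_t>(shape[axis]);
    assert(len > 0 && src.size() == outer * len * inner);

    std::vector<int> dst(outer * inner);
    for (std::size_t o = 0; o < outer; ++o)
        for (std::size_t in = 0; in < inner; ++in)
        {
            std::size_t best = 0;
            for (std::size_t k = 1; k < len; ++k)
                if (src[(o * len + k) * inner + in] > src[(o * len + best) * inner + in])
                    best = k;
            dst[o * inner + in] = static_cast<int>(best);
        }
    return dst;
}
```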
Use ngraph::Output in OpenVINO backend wrapper #24196
### Pull Request Readiness Checklist
resolves https://github.com/opencv/opencv/issues/24102
* Use `ngraph::Output<ngraph::Node>` instead of `std::shared_ptr<ngraph::Node>` as a backend wrapper. It gives access to multi-output nodes: 588ddf1b18/modules/dnn/src/net_openvino.cpp (L501-L504)
* All layers can be customized with OpenVINO >= 2022.1. The nGraph reference code used for the default layer implementation does not require the CPU plugin either (this can be tested by commenting out the CPU plugin in `/opt/intel/openvino/runtime/lib/intel64/plugins.xml`).
* Correct inference when only intermediate blobs are requested.
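Why a (node, output index) handle matters: a `std::shared_ptr` to a node identifies only the node, so a consumer of a multi-output node cannot say which output it reads. The toy model below illustrates the idea with hypothetical `ToyNode`/`ToyOutput` types; it is not the OpenVINO/nGraph API, where `ngraph::Output<ngraph::Node>` plays the role of `ToyOutput`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// A node that produces several output values (e.g. a Split-like op).
struct ToyNode
{
    std::vector<float> outputs;
};

// Handle to one specific output of a node: the node plus an output index.
// A bare shared_ptr<ToyNode> could not distinguish outputs 0 and 1.
struct ToyOutput
{
    std::shared_ptr<ToyNode> node;
    std::size_t index;

    float value() const
    {
        if (!node || index >= node->outputs.size())
            throw std::out_of_range("invalid output handle");
        return node->outputs[index];
    }
};
```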
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
If building with -mcpu=native or any other setting which implies the current
CPU has FP16 but with intrinsics disabled, we mistakenly try to use it even
though convolution.hpp conditionally defines it correctly based on whether
we should *use it*. convolution.cpp, on the other hand, was mismatched and
tried to use it whenever the CPU supported it, even if it was not enabled in
the build system.
Make the guards match.
Bug: https://bugs.gentoo.org/913031
Signed-off-by: Sam James <sam@gentoo.org>
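The fix amounts to deriving one "use FP16" decision in a single place and testing that same macro everywhere. A minimal sketch with hypothetical macro names (`BUILD_ENABLES_FP16` stands in for the build-system switch, `CPU_HAS_FP16` for what the compiler reports under e.g. `-mcpu=native`); the real guards in convolution.hpp/.cpp use OpenCV's own macros.

```cpp
#include <cassert>

// Single source of truth, as a shared header (convolution.hpp) would hold it:
// FP16 kernels are used only if the CPU supports FP16 *and* the build enabled it.
#if defined(BUILD_ENABLES_FP16) && defined(CPU_HAS_FP16)
#define USE_FP16_CONV 1
#else
#define USE_FP16_CONV 0
#endif

// Source file (convolution.cpp): tests the same derived macro instead of
// re-checking CPU support alone, so it cannot disagree with the header.
static bool convolutionUsesFp16()
{
#if USE_FP16_CONV
    return true;
#else
    return false;
#endif
}
```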
OCL_FP16 MatMul with large batch
* Workaround FP16 MatMul with large batch
* Fix OCL reinitialization
* Higher thresholds for INT8 quantization
* Try fix gemm_buffer_NT for half (columns)
* Fix GEMM by rows
* Add batch dimension to InnerProduct layer test
* Fix Test_ONNX_conformance.Layer_Test/test_basic_conv_with_padding
* Batch 16
* Replace all vload4
* Version suffix for MobileNetSSD_deploy Caffe model
dnn: cleanup of tengine backend #24122
🚀 Cleanup for OpenCV 5.0. The Tengine backend was added to speed up the convolution layer on ARM CPUs, but it is not maintained, and the convolution layer in our default backend has reached performance similar to Tengine's.
Tengine backend related PRs:
- https://github.com/opencv/opencv/pull/16724
- https://github.com/opencv/opencv/pull/18323
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake