Commit Graph

2133 Commits

Author SHA1 Message Date
Yuantao Feng
8a96e34e33
dnn: add gemm_layer in place of fully_connected_layer for onnx models (#23897)
* first commit

* turned C from input to constant; force C constant in impl; better handling 0d/1d cases

* integrate with gemm from ficus nn

* fix const inputs

* adjust threshold for int8 tryQuantize

* adjust threshold for int8 quantized 2

* support batched gemm and matmul; tune threshold for rcnn_ilsvrc13; update googlenet

* add gemm perf against innerproduct

* add perf tests for innerproduct with bias

* fix perf

* add memset

* renamings for next step

* add dedicated perf gemm

* add innerproduct in perf_gemm

* remove gemm and innerproduct perf tests from perf_layer

* add perf cases for vit sizes; prepack constants

* remove batched gemm; fix wrong trans; optimize KC

* remove prepacking for const A; several fixes for const B prepacking

* add todos and gemm expression

* add optimized branch for avx/avx2

* trigger build

* update macros and signature

* update signature

* fix macro

* fix bugs for neon aarch64 & x64

* add backends: cuda, cann, inf_ngraph and vkcom

* fix cuda backend

* test commit for cuda

* test cuda backend

* remove debug message from cuda backend

* use cpu dispatcher

* fix neon macro undef in dispatcher

* fix dispatcher

* fix inner kernel for neon aarch64

* fix compiling issue on armv7; try fixing accuracy issue on other platforms

* broadcast C with beta multiplied; improve func namings

* fix bug for avx and avx2

* put all platform-specific kernels in dispatcher

* fix typos

* attempt to fix compile issues on x64

* run old gemm when neon, avx, avx2 are all not available; add kernel for armv7 neon

* fix typo

* quick fix: add macros for pack4

* quick fix: use vmlaq_f32 for armv7

* quick fix for missing macro of fast gemm pack f32 4

* disable conformance tests when optimized branches are not supported

* disable perf tests when optimized branches are not supported

* decouple cv_try_neon and cv_neon_aarch64

* drop googlenet_2023; add fastGemmBatched

* fix step in fastGemmBatched

* cpu: fix initialization ofb; gpu: support batch

* quick followup fix for cuda

* add default kernels

* quick followup fix to avoid macro redef

* optmized kernels for lasx

* resolve mis-alignment; remove comments

* tune performance for x64 platform

* tune performance for neon aarch64

* tune for armv7

* comment time consuming tests

* quick follow-up fix
2023-09-20 00:53:34 +03:00
Alexander Smorkalov
157b0e7760
Merge pull request #24275 from alexlyulkov:al/fix-tf-graph-simplifier
Fixed removePhaseSwitches in tf_graph_simplifier
2023-09-18 11:02:44 +03:00
Alexander Lyulkov
d4cb564ce2 Fixed removePhaseSwitches in tf_graph_simplifier 2023-09-15 14:22:21 +07:00
Dmitry Kurtaev
c5edd20354 Higher threshold for FasterRCNN_vgg16 2023-09-14 13:11:53 +03:00
alexlyulkov
1e54e56579
Merge pull request #24266 from alexlyulkov:al/tf-argmax-default-dim
Added default dimension value to tensorflow ArgMax and ArgMin layers #24266

Added default dimension value to tensorflow ArgMax and ArgMin layers.
Added exception when accessing layer's input with out of range index.
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48452
2023-09-14 10:25:24 +03:00
Alexander Smorkalov
62c0556c58
Merge pull request #24252 from opencv-pushbot:gitee/alalek/refactor_24218
cmake: revise OPENCV_DNN_BACKEND_DEFAULT integration
2023-09-11 08:55:19 +03:00
Alexander Alekhin
02525abd9f cmake: revise OPENCV_DNN_BACKEND_DEFAULT integration
- disable message on default value
2023-09-10 13:11:36 +00:00
Dmitry Kurtaev
5dc5b27858 Enable build with OpenVINO in Debug 2023-09-09 20:38:59 +03:00
Alexander Smorkalov
e60825e75b
Merge pull request #24218 from CSBVision:patch-5
Added CMake configuration OPENCV_DNN_BACKEND_DEFAULT
2023-09-08 14:21:39 +03:00
Alexander Smorkalov
5350fba319
Merge pull request #24128 from CSBVision:CSBVision-patch-1
Fix bug at blobFromImagesWithParams
2023-09-06 16:20:37 +03:00
CSBVision
674c618471 Update dnn_utils.cpp 2023-09-06 10:01:07 +03:00
Dmitry Kurtaev
178fdbbda8
Merge pull request #24196 from dkurt:ov_backend_cleanups
Use ngraph::Output in OpenVINO backend wrapper #24196

### Pull Request Readiness Checklist

resolves https://github.com/opencv/opencv/issues/24102

* Use `ngraph::Output<ngraph::Node>>` insead of `std::shared_ptr<ngraph::Node>` as a backend wrapper. It lets access to multi-output nodes: 588ddf1b18/modules/dnn/src/net_openvino.cpp (L501-L504)
* All layers can be customizable with OpenVINO >= 2022.1. nGraph reference code used for default layer implementation does not required CPU plugin also (might be tested by commenting CPU plugin at `/opt/intel/openvino/runtime/lib/intel64/plugins.xml`).
* Correct inference if only intermediate blobs requested.


See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-09-05 18:08:28 +03:00
Björn Böken
639836ebf0 Added CMake configuration OPENCV_DNN_BACKEND_DEFAULT 2023-09-05 10:05:12 +02:00
Wanli
84f32bbb24 increase Fast Math threshold 2023-09-05 14:03:54 +08:00
Sam James
c20febdbb0
Fix compilation on arm64 with FP16 when disabled
If building with -mcpu=native or any other setting which implies the current
CPU has FP16 but with intrinsics disabled, we mistakenly try to use it even
though convolution.hpp conditionally defines it correctly based on whether
we should *use it*. convolution.cpp on the other hand was mismatched and
trying to use it if the CPU supported it, even if not enabled in the build
system.

Make the guards match.

Bug: https://bugs.gentoo.org/913031
Signed-off-by: Sam James <sam@gentoo.org>
2023-08-29 03:05:49 +01:00
Dmitry Kurtaev
a0debc3a9a Enable OpenVINO max pooling with indices since 2022.1 2023-08-23 10:39:38 +03:00
Dmitry Kurtaev
d88ad46978 Remove explitit transB attribute from MatMul perf test 2023-08-18 15:10:14 +03:00
autoantwort
f5a14532c2
Merge pull request #24167 from autoantwort:missing-include
* add missing include

* Apply CR
2023-08-17 09:34:19 +00:00
Dmitry Kurtaev
8ad5eb521a
Merge pull request #24120 from dkurt:actualize_dnn_links
OCL_FP16 MatMul with large batch

* Workaround FP16 MatMul with large batch

* Fix OCL reinitialization

* Higher thresholds for INT8 quantization

* Try fix gemm_buffer_NT for half (columns)

* Fix GEMM by rows

* Add batch dimension to InnerProduct layer test

* Fix Test_ONNX_conformance.Layer_Test/test_basic_conv_with_padding

* Batch 16

* Replace all vload4

* Version suffix for MobileNetSSD_deploy Caffe model
2023-08-16 15:46:11 +03:00
MuZihao
16681d1080 fix the issue in layer fused 2023-08-16 09:34:59 +08:00
Yuantao Feng
ba70ec99b3
Merge pull request #24122 from fengyuentau:remove_tengine
dnn: cleanup of tengine backend #24122

🚀 Cleanup for OpenCV 5.0. Tengine backend is added for convolution layer speedup on ARM CPUs, but it is not maintained and the convolution layer on our default backend has reached similar performance to that of Tengine.

Tengine backend related PRs:
- https://github.com/opencv/opencv/pull/16724
- https://github.com/opencv/opencv/pull/18323

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-08-09 09:26:02 +03:00
unknown
87b7ce4415 Solved issue 24044 2023-08-04 21:57:22 +02:00
Laurent Berger
2ff16d4c45
Merge pull request #24101 from LaurentBerger:I24076
Invalid memory access fix for ONNX split layer parser #24076 #24101

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work https://github.com/opencv/opencv/issues/24076
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-08-04 12:18:49 +03:00
Alexander Smorkalov
5466fd2606
Merge pull request #24104 from cudawarped:cuda_fix_cuda_toolkit_12_2
`cuda`: fix for compatibility with CUDA Toolkit >= 12.2.0
2023-08-04 12:11:15 +03:00
Dmitry Kurtaev
4b8aeb1129
Merge pull request #24039 from dkurt:tflite_test_backends
TFLite models on different backends (tests and improvements) #24039

### Pull Request Readiness Checklist

* MaxUnpooling with OpenVINO
* Fully connected with transposed inputs/weights with OpenVINO
* Enable backends tests for TFLite (related to https://github.com/opencv/opencv/issues/23992#issuecomment-1640691722)
* Increase existing tests thresholds

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-08-04 11:28:51 +03:00
Dmitry Kurtaev
96f23e3da1
Merge pull request #24080 from dkurt:dnn_cuda_layers
Resolve uncovered CUDA dnn layer #24080

### Pull Request Readiness Checklist

* Gelu activation layer on CUDA
* Try to relax GEMM from ONNX

resolves https://github.com/opencv/opencv/issues/24064

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-08-03 09:13:42 +03:00
Dmitry Kurtaev
0245c0cd10
Merge pull request #24072 from dkurt:openvino_cpu_tests
Remove legacy nGraph logic #24072

### Pull Request Readiness Checklist

TODO:
- [x] Test with OpenVINO 2021.4 (tested locally)

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-08-02 14:39:11 +03:00
Dmitry Kurtaev
195aad8e6a
Merge pull request #24069 from dkurt:openvino_detection_layer
DetectionOutput layer on OpenVINO without limitations #24069

### Pull Request Readiness Checklist

required for https://github.com/opencv/opencv/pull/23987

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-08-02 14:28:47 +03:00
cudawarped
ab8cb6f8a9 cuda: fix for compatibility with CUDA Toolkit >= 12.2.0 2023-08-01 13:02:42 +03:00
Dmitry Kurtaev
677a28fd2a
Merge pull request #24056 from dkurt:eltwise_prelu
PReLU with element-wise scales #24056

### Pull Request Readiness Checklist

resolves https://github.com/opencv/opencv/issues/24051

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-07-27 16:36:40 +03:00
SaltFish-T
ab6bffc6f8
Merge pull request #23936 from SaltFish-T:4.x
Update opencv dnn to support cann version >=6.3 #23936

1.modify the search path of "libopsproto.so" in OpenCVFindCANN.cmake
2.add the search path of "libgraph_base.so" in OpenCVFindCANN.cmake
3.automatic check Ascend socVersion,and test on Ascend310/Ascend310B/Ascend910B well
2023-07-27 14:21:30 +03:00
Alexander Smorkalov
e5e1a3bfde
Merge pull request #24043 from zixianweei:use-vaddq_f32-on-arm64
fix compilation error on Windows ARM, use vaddq_f32 instead of +=
2023-07-25 11:41:09 +03:00
zixgo
ec7689421d fix compilation error on Windows ARM, use vaddq_f32 instead of += 2023-07-21 19:54:43 +08:00
Alexander Smorkalov
d96ff496b4 Increase eps for Test_Torch_nets.FastNeuralStyle_accuracy to prevent sporadic test failres with CUDA. 2023-07-21 13:51:03 +03:00
Dmitry Kurtaev
e41ba90f17
Merge pull request #24004 from dkurt:tflite_new_layers
[TFLite] Pack layer and other fixes for SSD from Keras #24004

### Pull Request Readiness Checklist

resolves https://github.com/opencv/opencv/issues/23992

**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1076

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-07-21 09:13:37 +03:00
Alexander Smorkalov
23f27d8dbe Use OpenCV logging instead of std::cerr. 2023-07-19 10:49:54 +03:00
Zihao Mu
1920993525
Merge pull request #23952 from zihaomu:fix_depth_conv_5x5
DNN: optimize the speed of general Depth-wise #23952

Try to solve the issue: https://github.com/opencv/opencv/issues/23941

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2023-07-14 17:34:39 +03:00
Alexander Smorkalov
bf06bc92aa Merge branch '3.4' into merge-3.4 2023-06-23 20:12:58 +03:00
Yuantao Feng
aff420329c
Merge pull request #23853 from fengyuentau:disable_fp16_warning
dnn: disable warning when loading a fp16 model #23853

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2023-06-23 19:52:04 +03:00
Alexander Smorkalov
d9a5603fa3
Merge pull request #23860 from fengyuentau:fix_overflow_sigmoid_v3.4
dnn: fix overflow in sigmoid layer for 3.4
2023-06-23 19:47:42 +03:00
fengyuentau
29388f80a5 fix overflow 2023-06-23 21:22:21 +08:00
Alexander Smorkalov
51702ffd92 pre: OpenCV 4.8.0 (version++) 2023-06-20 15:52:57 +03:00
Dmitry Kurtaev
433c364456
Merge pull request #23724 from dkurt:java_without_ant
Build Java without ANT #23724

### Pull Request Readiness Checklist

Enables a path of building Java bindings without ANT

* Able to build OpenCV JAR and Docs without ANT
  ```
  --   Java:
  --     ant:                         NO
  --     JNI:                         /usr/lib/jvm/default-java/include /usr/lib/jvm/default-java/include/linux /usr/lib/jvm/default-java/include
  --     Java wrappers:               YES
  --     Java tests:                  NO
  ```
* Possible to build OpenCV JAR without ANT but tests still require ANT

**Merge with**: https://github.com/opencv/opencv_contrib/pull/3502

Notes:
- Use `OPENCV_JAVA_IGNORE_ANT=1` to force "Java" flow for building Java bindings
- Java tests still require Apache ANT
- JAR doesn't include `.java` source code files.


See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-06-16 19:58:20 +03:00
Alexander Smorkalov
3c0b71bcec
Merge pull request #23795 from dkurt:tf_half_pixel_for_nn
Consider half pixel mode in ONNX resize
2023-06-16 10:21:20 +03:00
Dmitry Kurtaev
924c01dbec
Replace CV_Assert_N 2023-06-15 17:30:33 +03:00
Wang Kai
fc2d933224 removing unreachable code and fixing a typo 2023-06-15 01:09:02 +08:00
Dmitry Kurtaev
6909fffde1 Consider half pixel mode in ONNX resize 2023-06-14 14:21:28 +03:00
Dmitry Kurtaev
f9d7f47e28 Change Scalar assignment in Python from single value 2023-06-13 10:45:03 +03:00
Wang Kai
4622f1e89b fixing typo of a variable name in dnn::runFastConv 2023-06-11 01:54:03 +08:00
Zihao Mu
eec8a20c33
Merge pull request #23763 from zihaomu:add_runtime_check
DNN: fix bug for X86 Winograd #23763

Address https://github.com/opencv/opencv/issues/23760
The patch aims to add a runtime check for X86 platform without AVX(2).

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2023-06-09 09:18:12 +03:00