Commit Graph

19 Commits

Author SHA1 Message Date
Yuantao Feng
3f13ce797b
Merge pull request #25779 from fengyuentau:dnn/fix_onnx_depthtospace
dnn: add DepthToSpace and SpaceToDepth #25779

We are working on updating WeChat QRCode module. One of the new models is a fully convolutional model and hence it should be able to run with different input shapes. However,  it has an operator `DepthToSpace`, which is parsed as a subgraph of `Reshape -> Permute -> Reshape` with a fixed shape getting during parsing. The subgraph itself is not a problem, but the true problem is the subgraph with a fixed input and output shape regardless input changes. This does not allow the model to run with different input shapes.

Solution is to add a dedicated layer for DepthtoSpace and SpaceToDepth.

Backend support:

- [x] CPU
- [x] CUDA
- [x] OpenCL
- [x] OpenVINO
- [x] CANN
- [x] TIMVX
-  ~Vulkan~ (missing fundamental tools, like permutation and reshape)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-06-21 19:28:22 +03:00
Alexander Alekhin
f49b26182b dnn(test): skip very long debug tests, reduce test time 2023-12-25 08:44:06 +00:00
Dmitry Kurtaev
c7ec0d599a
Merge pull request #23987 from dkurt:openvino_int8_backend
OpenVINO backend for INT8 models #23987

### Pull Request Readiness Checklist

TODO:
- [x] DetectionOutput layer (https://github.com/opencv/opencv/pull/24069)
- [x] Less FP32 fallbacks (i.e. Sigmoid, eltwise sum)
- [x] Accuracy, performance tests (https://github.com/opencv/opencv/pull/24039)
- [x] Single layer tests (convolution)
- [x] ~~Fixes for OpenVINO 2022.1 (https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/100334)~~


Performace results for object detection model `coco_efficientdet_lite0_v1_1.0_quant_2021_09_06.tflite`:
| backend | performance (median time) |
|---|---|
| OpenCV | 77.42ms |
| OpenVINO 2023.0 | 10.90ms |

CPU: `11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz`

Serialized model per-layer stats (note that Convolution should use `*_I8` primitives if they are quantized correctly): https://gist.github.com/dkurt/7772bbf1907035441bb5454f19f0feef

---

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-09-28 16:24:43 +03:00
Yuantao Feng
8a96e34e33
dnn: add gemm_layer in place of fully_connected_layer for onnx models (#23897)
* first commit

* turned C from input to constant; force C constant in impl; better handling 0d/1d cases

* integrate with gemm from ficus nn

* fix const inputs

* adjust threshold for int8 tryQuantize

* adjust threshold for int8 quantized 2

* support batched gemm and matmul; tune threshold for rcnn_ilsvrc13; update googlenet

* add gemm perf against innerproduct

* add perf tests for innerproduct with bias

* fix perf

* add memset

* renamings for next step

* add dedicated perf gemm

* add innerproduct in perf_gemm

* remove gemm and innerproduct perf tests from perf_layer

* add perf cases for vit sizes; prepack constants

* remove batched gemm; fix wrong trans; optimize KC

* remove prepacking for const A; several fixes for const B prepacking

* add todos and gemm expression

* add optimized branch for avx/avx2

* trigger build

* update macros and signature

* update signature

* fix macro

* fix bugs for neon aarch64 & x64

* add backends: cuda, cann, inf_ngraph and vkcom

* fix cuda backend

* test commit for cuda

* test cuda backend

* remove debug message from cuda backend

* use cpu dispatcher

* fix neon macro undef in dispatcher

* fix dispatcher

* fix inner kernel for neon aarch64

* fix compiling issue on armv7; try fixing accuracy issue on other platforms

* broadcast C with beta multiplied; improve func namings

* fix bug for avx and avx2

* put all platform-specific kernels in dispatcher

* fix typos

* attempt to fix compile issues on x64

* run old gemm when neon, avx, avx2 are all not available; add kernel for armv7 neon

* fix typo

* quick fix: add macros for pack4

* quick fix: use vmlaq_f32 for armv7

* quick fix for missing macro of fast gemm pack f32 4

* disable conformance tests when optimized branches are not supported

* disable perf tests when optimized branches are not supported

* decouple cv_try_neon and cv_neon_aarch64

* drop googlenet_2023; add fastGemmBatched

* fix step in fastGemmBatched

* cpu: fix initialization ofb; gpu: support batch

* quick followup fix for cuda

* add default kernels

* quick followup fix to avoid macro redef

* optmized kernels for lasx

* resolve mis-alignment; remove comments

* tune performance for x64 platform

* tune performance for neon aarch64

* tune for armv7

* comment time consuming tests

* quick follow-up fix
2023-09-20 00:53:34 +03:00
Dmitry Kurtaev
8ad5eb521a
Merge pull request #24120 from dkurt:actualize_dnn_links
OCL_FP16 MatMul with large batch

* Workaround FP16 MatMul with large batch

* Fix OCL reinitialization

* Higher thresholds for INT8 quantization

* Try fix gemm_buffer_NT for half (columns)

* Fix GEMM by rows

* Add batch dimension to InnerProduct layer test

* Fix Test_ONNX_conformance.Layer_Test/test_basic_conv_with_padding

* Batch 16

* Replace all vload4

* Version suffix for MobileNetSSD_deploy Caffe model
2023-08-16 15:46:11 +03:00
Alexander Alekhin
18cbfa4a4f Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2023-01-23 00:11:12 +00:00
Maksim Shabunin
d35fbe6bfc dnn: updated YOLOv4-tiny model and tests 2022-12-22 15:49:21 +03:00
Zihao Mu
bb64db98d8
Further optimization of Conv2D, fused Conv_Add_Activation, bring latest code from ficus OpConv.fx. (#22401) 2022-08-26 12:57:25 +03:00
Zihao Mu
0614c40b42 add more skip for very long test case in test_dnn. 2022-08-02 14:58:05 +08:00
Zihao Mu
a80fcacd90
Merge pull request #21372 from zihaomu:dnn_quantize_per_tensor
Add per_tensor_quantize to int8 quantize

* add per_tensor_quantize to dnn int8 module.

* change api flag from perTensor to perChannel, and recognize quantize type and onnx importer.

* change the default to hpp
2022-07-05 19:14:42 +03:00
Zihao Mu
59b870a87a
Merge pull request #21910 from zihaomu:fast_conv_ARM
DNN: Accelerating convolution

* Fast Conv of ARM, X86 and universal intrinsics.

* improve code style.

* error fixed.

* improve the License

* optimize memory allocated and Adjust the threshold.

* change FasterRCNN_vgg16 to 2GB memory.
2022-07-01 13:03:15 +03:00
Alexander Alekhin
d9bf522b27 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2022-05-23 16:06:14 +00:00
OpenCV Developers
be4a432bea dnn(test): update opencv_face_detector checks 2022-04-11 20:26:25 +00:00
Zihao Mu
7b582b71ba
Merge pull request #21036 from fengyuentau:timvx_backend_support
dnn: TIM-VX NPU backend support

* Add TimVX NPU backend for DNN module.

* use official branch from tim-vx repo; fix detecting viv sdk

Co-authored-by: fytao <yuantao.feng@outlook.com>
2022-03-31 21:42:11 +00:00
Alexander Alekhin
870c8d3c4e dnn(test): fix int8 tolerances 2022-01-31 12:54:01 +00:00
Alexander Alekhin
f55c9ed1ba dnn(test): drop non OCV/CPU cases for Int8
- zero code coverage and up to x3-x8 tests slowdown
- implementation executes OCV/CPU in all cases
- wrong skip conditions
2021-12-02 06:27:10 +00:00
Alexander Alekhin
94e92cd6c0 dnn(ocl): skip int8 tests due to memory access issues 2021-10-06 21:27:18 +00:00
Jebastin Nadar
cce78cc5e2
Merge pull request #20535 from SamFC10:onnx-q
dnn : int8 quantized layers support in onnx importer

* added quantized layers support in onnx importer

* added more cases in eltwise node, some more checks

* added tests for quantized nodes

* relax thresholds for failed tests, address review comments

* refactoring based on review comments

* added support for unsupported cases and pre-quantized resnet50 test

* relax thresholds due to int8 resize layer
2021-10-04 18:07:38 +00:00
SamFC10
fa90e14b06 int8 layers and 8-bit quantization support 2021-08-19 09:56:47 +05:30