opencv/modules/dnn/include/opencv2/dnn
Yuantao Feng 8a96e34e33
dnn: add gemm_layer in place of fully_connected_layer for onnx models (#23897)
* first commit

* turned C from input to constant; force C constant in impl; better handling 0d/1d cases

* integrate with gemm from ficus nn

* fix const inputs

* adjust threshold for int8 tryQuantize

* adjust threshold for int8 quantized 2

* support batched gemm and matmul; tune threshold for rcnn_ilsvrc13; update googlenet

* add gemm perf against innerproduct

* add perf tests for innerproduct with bias

* fix perf

* add memset

* renamings for next step

* add dedicated perf gemm

* add innerproduct in perf_gemm

* remove gemm and innerproduct perf tests from perf_layer

* add perf cases for vit sizes; prepack constants

* remove batched gemm; fix wrong trans; optimize KC

* remove prepacking for const A; several fixes for const B prepacking

* add todos and gemm expression

* add optimized branch for avx/avx2

* trigger build

* update macros and signature

* update signature

* fix macro

* fix bugs for neon aarch64 & x64

* add backends: cuda, cann, inf_ngraph and vkcom

* fix cuda backend

* test commit for cuda

* test cuda backend

* remove debug message from cuda backend

* use cpu dispatcher

* fix neon macro undef in dispatcher

* fix dispatcher

* fix inner kernel for neon aarch64

* fix compiling issue on armv7; try fixing accuracy issue on other platforms

* broadcast C with beta multiplied; improve func namings

* fix bug for avx and avx2

* put all platform-specific kernels in dispatcher

* fix typos

* attempt to fix compile issues on x64

* run old gemm when neon, avx, avx2 are all not available; add kernel for armv7 neon

* fix typo

* quick fix: add macros for pack4

* quick fix: use vmlaq_f32 for armv7

* quick fix for missing macro of fast gemm pack f32 4

* disable conformance tests when optimized branches are not supported

* disable perf tests when optimized branches are not supported

* decouple cv_try_neon and cv_neon_aarch64

* drop googlenet_2023; add fastGemmBatched

* fix step in fastGemmBatched

* cpu: fix initialization ofb; gpu: support batch

* quick followup fix for cuda

* add default kernels

* quick followup fix to avoid macro redef

* optmized kernels for lasx

* resolve mis-alignment; remove comments

* tune performance for x64 platform

* tune performance for neon aarch64

* tune for armv7

* comment time consuming tests

* quick follow-up fix
2023-09-20 00:53:34 +03:00
..
utils Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2022-02-11 17:32:37 +00:00
all_layers.hpp dnn: add gemm_layer in place of fully_connected_layer for onnx models (#23897) 2023-09-20 00:53:34 +03:00
dict.hpp dnn: fix API - explicit ctors, const methods 2022-01-21 12:38:51 +00:00
dnn.hpp Added CMake configuration OPENCV_DNN_BACKEND_DEFAULT 2023-09-05 10:05:12 +02:00
dnn.inl.hpp build warnings 2020-12-05 20:08:29 +00:00
layer_reg.private.hpp Merge pull request #20494 from rogday:onnx_diagnostic_fix 2021-08-20 14:43:47 +00:00
layer.details.hpp dnn: update "guard" inline namespace 2018-09-03 20:46:57 +00:00
layer.hpp fix model diagnostic tool 2022-01-18 01:22:22 +03:00
shape_utils.hpp Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2022-10-15 16:44:47 +00:00
version.hpp pre: OpenCV 4.8.0 (version++) 2023-06-20 15:52:57 +03:00