opencv/modules/dnn/perf
Yuantao Feng 8a96e34e33
dnn: add gemm_layer in place of fully_connected_layer for onnx models (#23897)
* first commit

* turned C from input to constant; force C constant in impl; better handling 0d/1d cases

* integrate with gemm from ficus nn

* fix const inputs

* adjust threshold for int8 tryQuantize

* adjust threshold for int8 quantized 2

* support batched gemm and matmul; tune threshold for rcnn_ilsvrc13; update googlenet

* add gemm perf against innerproduct

* add perf tests for innerproduct with bias

* fix perf

* add memset

* renamings for next step

* add dedicated perf gemm

* add innerproduct in perf_gemm

* remove gemm and innerproduct perf tests from perf_layer

* add perf cases for vit sizes; prepack constants

* remove batched gemm; fix wrong trans; optimize KC

* remove prepacking for const A; several fixes for const B prepacking

* add todos and gemm expression

* add optimized branch for avx/avx2

* trigger build

* update macros and signature

* update signature

* fix macro

* fix bugs for neon aarch64 & x64

* add backends: cuda, cann, inf_ngraph and vkcom

* fix cuda backend

* test commit for cuda

* test cuda backend

* remove debug message from cuda backend

* use cpu dispatcher

* fix neon macro undef in dispatcher

* fix dispatcher

* fix inner kernel for neon aarch64

* fix compiling issue on armv7; try fixing accuracy issue on other platforms

* broadcast C with beta multiplied; improve func namings

* fix bug for avx and avx2

* put all platform-specific kernels in dispatcher

* fix typos

* attempt to fix compile issues on x64

* run old gemm when neon, avx, avx2 are all not available; add kernel for armv7 neon

* fix typo

* quick fix: add macros for pack4

* quick fix: use vmlaq_f32 for armv7

* quick fix for missing macro of fast gemm pack f32 4

* disable conformance tests when optimized branches are not supported

* disable perf tests when optimized branches are not supported

* decouple cv_try_neon and cv_neon_aarch64

* drop googlenet_2023; add fastGemmBatched

* fix step in fastGemmBatched

* cpu: fix initialization of b; gpu: support batch

* quick followup fix for cuda

* add default kernels

* quick followup fix to avoid macro redef

* optimized kernels for lasx

* resolve mis-alignment; remove comments

* tune performance for x64 platform

* tune performance for neon aarch64

* tune for armv7

* comment time consuming tests

* quick follow-up fix
2023-09-20 00:53:34 +03:00
perf_caffe.cpp Merge pull request #24120 from dkurt:actualize_dnn_links 2023-08-16 15:46:11 +03:00
perf_common.cpp cmake: fix build of dnn tests with shared common code 2019-03-31 08:52:25 +00:00
perf_convolution1d.cpp Merge pull request #18783 from sl-sergei:fix_conv1d 2020-11-13 22:22:10 +00:00
perf_convolution3d.cpp Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2020-11-13 22:29:14 +00:00
perf_convolution.cpp Merge pull request #23952 from zihaomu:fix_depth_conv_5x5 2023-07-14 17:34:39 +03:00
perf_gemm.cpp dnn: add gemm_layer in place of fully_connected_layer for onnx models (#23897) 2023-09-20 00:53:34 +03:00
perf_layer.cpp Remove explicit transB attribute from MatMul perf test 2023-08-18 15:10:14 +03:00
perf_main.cpp Merge pull request #11897 from Jakub-Golinowski:hpx_backend 2018-08-31 16:23:26 +03:00
perf_net.cpp Merge pull request #24120 from dkurt:actualize_dnn_links 2023-08-16 15:46:11 +03:00
perf_precomp.hpp dnn(perf): fix and merge Convolution tests 2018-08-31 15:02:19 +03:00
perf_recurrent.cpp Merge pull request #20658 from smbz:lstm_optimisation 2021-11-29 21:43:00 +00:00