opencv

mirror of https://github.com/opencv/opencv.git synced 2025-07-20 19:17:36 +08:00


Yuantao Feng 8a96e34e33 dnn: add gemm_layer in place of fully_connected_layer for onnx models (#23897)		2023-09-20 00:53:34 +03:00

* first commit
* turned C from input to constant; force C constant in impl; better handling 0d/1d cases
* integrate with gemm from ficus nn
* fix const inputs
* adjust threshold for int8 tryQuantize
* adjust threshold for int8 quantized 2
* support batched gemm and matmul; tune threshold for rcnn_ilsvrc13; update googlenet
* add gemm perf against innerproduct
* add perf tests for innerproduct with bias
* fix perf
* add memset
* renamings for next step
* add dedicated perf gemm
* add innerproduct in perf_gemm
* remove gemm and innerproduct perf tests from perf_layer
* add perf cases for vit sizes; prepack constants
* remove batched gemm; fix wrong trans; optimize KC
* remove prepacking for const A; several fixes for const B prepacking
* add todos and gemm expression
* add optimized branch for avx/avx2
* trigger build
* update macros and signature
* update signature
* fix macro
* fix bugs for neon aarch64 & x64
* add backends: cuda, cann, inf_ngraph and vkcom
* fix cuda backend
* test commit for cuda
* test cuda backend
* remove debug message from cuda backend
* use cpu dispatcher
* fix neon macro undef in dispatcher
* fix dispatcher
* fix inner kernel for neon aarch64
* fix compiling issue on armv7; try fixing accuracy issue on other platforms
* broadcast C with beta multiplied; improve func namings
* fix bug for avx and avx2
* put all platform-specific kernels in dispatcher
* fix typos
* attempt to fix compile issues on x64
* run old gemm when neon, avx, avx2 are all not available; add kernel for armv7 neon
* fix typo
* quick fix: add macros for pack4
* quick fix: use vmlaq_f32 for armv7
* quick fix for missing macro of fast gemm pack f32 4
* disable conformance tests when optimized branches are not supported
* disable perf tests when optimized branches are not supported
* decouple cv_try_neon and cv_neon_aarch64
* drop googlenet_2023; add fastGemmBatched
* fix step in fastGemmBatched
* cpu: fix initialization ofb; gpu: support batch
* quick followup fix for cuda
* add default kernels
* quick followup fix to avoid macro redef
* optmized kernels for lasx
* resolve mis-alignment; remove comments
* tune performance for x64 platform
* tune performance for neon aarch64
* tune for armv7
* comment time consuming tests
* quick follow-up fix
perf_caffe.cpp	Merge pull request #24120 from dkurt:actualize_dnn_links	2023-08-16 15:46:11 +03:00
perf_common.cpp	cmake: fix build of dnn tests with shared common code	2019-03-31 08:52:25 +00:00
perf_convolution1d.cpp	Merge pull request #18783 from sl-sergei:fix_conv1d	2020-11-13 22:22:10 +00:00
perf_convolution3d.cpp	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2020-11-13 22:29:14 +00:00
perf_convolution.cpp	Merge pull request #23952 from zihaomu:fix_depth_conv_5x5	2023-07-14 17:34:39 +03:00
perf_gemm.cpp	dnn: add gemm_layer in place of fully_connected_layer for onnx models (#23897)	2023-09-20 00:53:34 +03:00
perf_layer.cpp	Remove explitit transB attribute from MatMul perf test	2023-08-18 15:10:14 +03:00
perf_main.cpp	Merge pull request #11897 from Jakub-Golinowski:hpx_backend	2018-08-31 16:23:26 +03:00
perf_net.cpp	Merge pull request #24120 from dkurt:actualize_dnn_links	2023-08-16 15:46:11 +03:00
perf_precomp.hpp	dnn(perf): fix and merge Convolution tests	2018-08-31 15:02:19 +03:00
perf_recurrent.cpp	Merge pull request #20658 from smbz:lstm_optimisation	2021-11-29 21:43:00 +00:00