opencv/modules
Yuantao Feng 8a96e34e33
dnn: add gemm_layer in place of fully_connected_layer for onnx models (#23897)
* first commit

* turned C from input to constant; force C constant in impl; better handling 0d/1d cases

* integrate with gemm from ficus nn

* fix const inputs

* adjust threshold for int8 tryQuantize

* adjust threshold for int8 quantized 2

* support batched gemm and matmul; tune threshold for rcnn_ilsvrc13; update googlenet

* add gemm perf against innerproduct

* add perf tests for innerproduct with bias

* fix perf

* add memset

* renamings for next step

* add dedicated perf gemm

* add innerproduct in perf_gemm

* remove gemm and innerproduct perf tests from perf_layer

* add perf cases for vit sizes; prepack constants

* remove batched gemm; fix wrong trans; optimize KC

* remove prepacking for const A; several fixes for const B prepacking

* add todos and gemm expression

* add optimized branch for avx/avx2

* trigger build

* update macros and signature

* update signature

* fix macro

* fix bugs for neon aarch64 & x64

* add backends: cuda, cann, inf_ngraph and vkcom

* fix cuda backend

* test commit for cuda

* test cuda backend

* remove debug message from cuda backend

* use cpu dispatcher

* fix neon macro undef in dispatcher

* fix dispatcher

* fix inner kernel for neon aarch64

* fix compiling issue on armv7; try fixing accuracy issue on other platforms

* broadcast C with beta multiplied; improve func namings

* fix bug for avx and avx2

* put all platform-specific kernels in dispatcher

* fix typos

* attempt to fix compile issues on x64

* run old gemm when none of neon, avx, avx2 is available; add kernel for armv7 neon

* fix typo

* quick fix: add macros for pack4

* quick fix: use vmlaq_f32 for armv7

* quick fix for missing macro of fast gemm pack f32 4

* disable conformance tests when optimized branches are not supported

* disable perf tests when optimized branches are not supported

* decouple cv_try_neon and cv_neon_aarch64

* drop googlenet_2023; add fastGemmBatched

* fix step in fastGemmBatched

* cpu: fix initialization of b; gpu: support batch

* quick followup fix for cuda

* add default kernels

* quick followup fix to avoid macro redef

* optimized kernels for lasx

* resolve mis-alignment; remove comments

* tune performance for x64 platform

* tune performance for neon aarch64

* tune for armv7

* comment out time-consuming tests

* quick follow-up fix
2023-09-20 00:53:34 +03:00
| Directory | Last commit | Date |
| --- | --- | --- |
| calib3d | More fixes for iterators-are-pointers case | 2023-09-15 12:37:43 +03:00 |
| core | Merge pull request #24074 from Kumataro/fix24057 | 2023-09-19 10:32:47 +03:00 |
| dnn | dnn: add gemm_layer in place of fully_connected_layer for onnx models (#23897) | 2023-09-20 00:53:34 +03:00 |
| features2d | Add missing std namespace qualifiers | 2023-09-06 13:46:39 +03:00 |
| flann | Merge pull request #24028 from VadimLevin:dev/vlevin/fix-flann-python-bindings | 2023-07-21 12:44:56 +03:00 |
| gapi | Merge pull request #23904 from kai-waang:removing-unreachable | 2023-09-04 17:06:45 +03:00 |
| highgui | style: remove extraneous std::cout | 2023-08-14 19:11:14 -04:00 |
| imgcodecs | imgcodecs: fix libtiff homepage | 2023-08-27 19:49:37 +09:00 |
| imgproc | Merge pull request #24166 from hanliutong:rewrite-remaining | 2023-09-19 15:12:52 +03:00 |
| java | build: w/a compiler warnings for GCC 11-12 and Clang 13, reduce build output | 2023-07-10 11:27:59 +03:00 |
| js | Merge pull request #24288 from tailsu:sd/emscripten-3.1.45-fixes | 2023-09-19 08:09:18 +03:00 |
| ml | Merge remote-tracking branch 'origin/3.4' into merge-3.4 | 2023-04-21 10:55:04 +03:00 |
| objc | Backport 5.x: Support for module names that start from digit in ObjC bindings generator. | 2023-05-25 11:45:59 +03:00 |
| objdetect | More fixes for iterators-are-pointers case | 2023-09-15 12:37:43 +03:00 |
| photo | Deprecated convertTypeStr and made new variant that also takes the buffer size | 2023-04-26 09:48:15 -04:00 |
| python | Merge pull request #24074 from Kumataro/fix24057 | 2023-09-19 10:32:47 +03:00 |
| stitching | Merge pull request #23740 from Peekabooc:4.x | 2023-06-09 13:40:02 +03:00 |
| ts | Merge pull request #24250 from dkurt:ts_fixture_constructor_skip_2 | 2023-09-18 10:23:24 +03:00 |
| video | Merge pull request #24201 from lpylpy0514:4.x | 2023-09-19 15:36:38 +03:00 |
| videoio | Merge pull request #24239 from asmorkalov:as/msmf_returned_fourcc | 2023-09-11 11:00:54 +03:00 |
| world | cmake: VERSION_GREATER_EQUAL is not supported in CMake 3.5.1 | 2022-12-26 17:41:53 +00:00 |