opencv

mirror/opencv

Fork 0

mirror of https://github.com/opencv/opencv.git synced 2024-12-14 00:39:13 +08:00

Commit Graph

Author	SHA1	Message	Date
Yuantao Feng	fa5ed62a66	Merge pull request #24694 from fengyuentau:matmul_refactor dnn: refactor ONNX MatMul with fastGemm #24694 Done: - [x] add backends - [x] CUDA - [x] OpenVINO - [x] CANN - [x] OpenCL - [x] Vulkan - [x] add perf tests - [x] const B case ### Benchmark Tests are done on M1. All data is in milliseconds (ms). \| Configuration \| MatMul (Prepacked) \| MatMul \| InnerProduct \| \| - \| - \| - \| - \| \| A=[12, 197, 197], B=[12, 197, 64], trans_a=0, trans_b=0 \| 0.39 \| 0.41 \| 1.33 \| \| A=[12, 197, 64], B=[12, 64, 197], trans_a=0, trans_b=0 \| 0.42 \| 0.42 \| 1.17 \| \| A=[12, 50, 64], B=[12, 64, 50], trans_a=0, trans_b=0 \| 0.13 \| 0.15 \| 0.33 \| \| A=[12, 50, 50], B=[12, 50, 64], trans_a=0, trans_b=0 \| 0.11 \| 0.13 \| 0.22 \| \| A=[16, 197, 197], B=[16, 197, 64], trans_a=0, trans_b=0 \| 0.46 \| 0.54 \| 1.46 \| \| A=[16, 197, 64], B=[16, 64, 197], trans_a=0, trans_b=0 \| 0.46 \| 0.95 \| 1.74 \| \| A=[16, 50, 64], B=[16, 64, 50], trans_a=0, trans_b=0 \| 0.18 \| 0.32 \| 0.43 \| \| A=[16, 50, 50], B=[16, 50, 64], trans_a=0, trans_b=0 \| 0.15 \| 0.25 \| 0.25 \| ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-12-19 19:36:41 +03:00
Dmitry Kurtaev	2b6d0f36f0	Merge pull request #24309 from dkurt:gemm_ov_hotfix Update OpenVINO init of new GEMM layer #24309 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request CI validation: - [x] 2022.1.0: https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/100368 - [ ] 2021.4.2: https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/100373 Checklist: - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-09-27 10:25:45 +03:00
Yuantao Feng	8a96e34e33	dnn: add gemm_layer in place of fully_connected_layer for onnx models (#23897 ) * first commit * turned C from input to constant; force C constant in impl; better handling 0d/1d cases * integrate with gemm from ficus nn * fix const inputs * adjust threshold for int8 tryQuantize * adjust threshold for int8 quantized 2 * support batched gemm and matmul; tune threshold for rcnn_ilsvrc13; update googlenet * add gemm perf against innerproduct * add perf tests for innerproduct with bias * fix perf * add memset * renamings for next step * add dedicated perf gemm * add innerproduct in perf_gemm * remove gemm and innerproduct perf tests from perf_layer * add perf cases for vit sizes; prepack constants * remove batched gemm; fix wrong trans; optimize KC * remove prepacking for const A; several fixes for const B prepacking * add todos and gemm expression * add optimized branch for avx/avx2 * trigger build * update macros and signature * update signature * fix macro * fix bugs for neon aarch64 & x64 * add backends: cuda, cann, inf_ngraph and vkcom * fix cuda backend * test commit for cuda * test cuda backend * remove debug message from cuda backend * use cpu dispatcher * fix neon macro undef in dispatcher * fix dispatcher * fix inner kernel for neon aarch64 * fix compiling issue on armv7; try fixing accuracy issue on other platforms * broadcast C with beta multiplied; improve func namings * fix bug for avx and avx2 * put all platform-specific kernels in dispatcher * fix typos * attempt to fix compile issues on x64 * run old gemm when neon, avx, avx2 are all not available; add kernel for armv7 neon * fix typo * quick fix: add macros for pack4 * quick fix: use vmlaq_f32 for armv7 * quick fix for missing macro of fast gemm pack f32 4 * disable conformance tests when optimized branches are not supported * disable perf tests when optimized branches are not supported * decouple cv_try_neon and cv_neon_aarch64 * drop googlenet_2023; add fastGemmBatched * fix step in fastGemmBatched * cpu: fix initialization ofb; gpu: support batch * quick followup fix for cuda * add default kernels * quick followup fix to avoid macro redef * optmized kernels for lasx * resolve mis-alignment; remove comments * tune performance for x64 platform * tune performance for neon aarch64 * tune for armv7 * comment time consuming tests * quick follow-up fix	2023-09-20 00:53:34 +03:00

Author

SHA1

Message

Date

Yuantao Feng

fa5ed62a66

Merge pull request #24694 from fengyuentau:matmul_refactor

dnn: refactor ONNX MatMul with fastGemm #24694

Done:
- [x] add backends
    - [x] CUDA
    - [x] OpenVINO
    - [x] CANN
    - [x] OpenCL
    - [x] Vulkan
- [x] add perf tests
- [x] const B case

### Benchmark

Tests are done on M1. All data is in milliseconds (ms).

| Configuration | MatMul (Prepacked) | MatMul | InnerProduct |
| - | - | - | - |
| A=[12, 197, 197], B=[12, 197, 64], trans_a=0, trans_b=0 | **0.39** | 0.41 | 1.33 |
| A=[12, 197, 64], B=[12, 64, 197], trans_a=0, trans_b=0  | **0.42** | 0.42 | 1.17 |
| A=[12, 50, 64], B=[12, 64, 50], trans_a=0, trans_b=0    | **0.13** | 0.15 | 0.33 |
| A=[12, 50, 50], B=[12, 50, 64], trans_a=0, trans_b=0    | **0.11** | 0.13 | 0.22 |
| A=[16, 197, 197], B=[16, 197, 64], trans_a=0, trans_b=0 | **0.46** | 0.54 | 1.46 |
| A=[16, 197, 64], B=[16, 64, 197], trans_a=0, trans_b=0  | **0.46** | 0.95 | 1.74 |
| A=[16, 50, 64], B=[16, 64, 50], trans_a=0, trans_b=0    | **0.18** | 0.32 | 0.43 |
| A=[16, 50, 50], B=[16, 50, 64], trans_a=0, trans_b=0    | **0.15** | 0.25 | 0.25 |

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

2023-12-19 19:36:41 +03:00

Dmitry Kurtaev

2b6d0f36f0

Merge pull request #24309 from dkurt:gemm_ov_hotfix

Update OpenVINO init of new GEMM layer #24309

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

CI validation:

- [x] 2022.1.0: https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/100368
- [ ] 2021.4.2: https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/100373

Checklist:
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

2023-09-27 10:25:45 +03:00

Yuantao Feng

8a96e34e33

dnn: add gemm_layer in place of fully_connected_layer for onnx models (#23897 )

* first commit

* turned C from input to constant; force C constant in impl; better handling 0d/1d cases

* integrate with gemm from ficus nn

* fix const inputs

* adjust threshold for int8 tryQuantize

* adjust threshold for int8 quantized 2

* support batched gemm and matmul; tune threshold for rcnn_ilsvrc13; update googlenet

* add gemm perf against innerproduct

* add perf tests for innerproduct with bias

* fix perf

* add memset

* renamings for next step

* add dedicated perf gemm

* add innerproduct in perf_gemm

* remove gemm and innerproduct perf tests from perf_layer

* add perf cases for vit sizes; prepack constants

* remove batched gemm; fix wrong trans; optimize KC

* remove prepacking for const A; several fixes for const B prepacking

* add todos and gemm expression

* add optimized branch for avx/avx2

* trigger build

* update macros and signature

* update signature

* fix macro

* fix bugs for neon aarch64 & x64

* add backends: cuda, cann, inf_ngraph and vkcom

* fix cuda backend

* test commit for cuda

* test cuda backend

* remove debug message from cuda backend

* use cpu dispatcher

* fix neon macro undef in dispatcher

* fix dispatcher

* fix inner kernel for neon aarch64

* fix compiling issue on armv7; try fixing accuracy issue on other platforms

* broadcast C with beta multiplied; improve func namings

* fix bug for avx and avx2

* put all platform-specific kernels in dispatcher

* fix typos

* attempt to fix compile issues on x64

* run old gemm when neon, avx, avx2 are all not available; add kernel for armv7 neon

* fix typo

* quick fix: add macros for pack4

* quick fix: use vmlaq_f32 for armv7

* quick fix for missing macro of fast gemm pack f32 4

* disable conformance tests when optimized branches are not supported

* disable perf tests when optimized branches are not supported

* decouple cv_try_neon and cv_neon_aarch64

* drop googlenet_2023; add fastGemmBatched

* fix step in fastGemmBatched

* cpu: fix initialization ofb; gpu: support batch

* quick followup fix for cuda

* add default kernels

* quick followup fix to avoid macro redef

* optmized kernels for lasx

* resolve mis-alignment; remove comments

* tune performance for x64 platform

* tune performance for neon aarch64

* tune for armv7

* comment time consuming tests

* quick follow-up fix

2023-09-20 00:53:34 +03:00

3 Commits