opencv

mirror of https://github.com/opencv/opencv.git synced 2024-12-16 18:39:12 +08:00

Author	SHA1	Message	Date
Yuantao Feng	347d673a87	Merge pull request #23279 from fengyuentau:add_topk dnn: add ONNX TopK #23279 Merge with https://github.com/opencv/opencv_extra/pull/1200 Partially fixes #22890 and #20258 To-do: - [x] TopK forward impl - [x] add tests - [x] support Opset 1 & 10 if possible - [ ] ~Support other backends~ (TopK has two outputs, which is not supported by other backends, such as openvino) Perf: M1 (time in milliseconds) \| input shape \| axis \| dnn \| ort \| \| --------------- \| ---- \| ---- \| ---- \| \| (1000, 100) \| 0 \| 1.68 \| 4.07 \| \| (1000, 100) K5 \| 0 \| 1.13 \| 0.12 \| \| (1000, 100) \| 1 \| 0.96 \| 0.77 \| \| (100, 100, 100) \| 0 \| 10.00 \| 31.13 \| \| (100, 100, 100) \| 1 \| 7.33 \| 9.17 \| \| (100, 100, 100) \| 2 \| 7.52 \| 9.48 \| M2 (time in milliseconds) \| input shape \| axis \| dnn \| ort \| \| --------------- \| ---- \| ---- \| ---- \| \| (1000, 100) \| 0 \| 0.76 \| 2.44 \| \| (1000, 100) K5 \| 0 \| 0.68 \| 0.07 \| \| (1000, 100) \| 1 \| 0.41 \| 0.50 \| \| (100, 100, 100) \| 0 \| 4.83 \| 17.52 \| \| (100, 100, 100) \| 1 \| 3.60 \| 5.08 \| \| (100, 100, 100) \| 2 \| 3.73 \| 5.10 \| ONNXRuntime performance testing script: https://gist.github.com/fengyuentau/a119f94fd16721ec9974b8c7b0a45d4c ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-08-21 17:03:24 +03:00
Yuantao Feng	23b244d3a3	Merge pull request #25881 from fengyuentau:dnn/cpu/optimize_activations_with_v_exp dnn: optimize activations with v_exp #25881 Merge with https://github.com/opencv/opencv_extra/pull/1191. This PR optimizes the following activations: - [x] Swish - [x] Mish - [x] Elu - [x] Celu - [x] Selu - [x] HardSwish ### Performance (Updated on 2024-07-18) #### AmLogic A311D2 (ARM Cortex A73 + A53) ``` Geometric mean (ms) Name of Test activations activations.patch activations.patch vs activations (x-factor) Celu::Layer_Elementwise::OCV/CPU 115.859 27.930 4.15 Elu::Layer_Elementwise::OCV/CPU 27.846 27.003 1.03 Gelu::Layer_Elementwise::OCV/CPU 0.657 0.602 1.09 HardSwish::Layer_Elementwise::OCV/CPU 31.885 6.781 4.70 Mish::Layer_Elementwise::OCV/CPU 35.729 32.089 1.11 Selu::Layer_Elementwise::OCV/CPU 61.955 27.850 2.22 Swish::Layer_Elementwise::OCV/CPU 30.819 26.688 1.15 ``` #### Apple M1 ``` Geometric mean (ms) Name of Test activations activations.patch activations.patch vs activations (x-factor) Celu::Layer_Elementwise::OCV/CPU 16.184 2.118 7.64 Celu::Layer_Elementwise::OCV/CPU_FP16 16.280 2.123 7.67 Elu::Layer_Elementwise::OCV/CPU 9.123 1.878 4.86 Elu::Layer_Elementwise::OCV/CPU_FP16 9.085 1.897 4.79 Gelu::Layer_Elementwise::OCV/CPU 0.089 0.081 1.11 Gelu::Layer_Elementwise::OCV/CPU_FP16 0.086 0.074 1.17 HardSwish::Layer_Elementwise::OCV/CPU 1.560 1.555 1.00 HardSwish::Layer_Elementwise::OCV/CPU_FP16 1.536 1.523 1.01 Mish::Layer_Elementwise::OCV/CPU 6.077 2.476 2.45 Mish::Layer_Elementwise::OCV/CPU_FP16 5.990 2.496 2.40 Selu::Layer_Elementwise::OCV/CPU 11.351 1.976 5.74 Selu::Layer_Elementwise::OCV/CPU_FP16 11.533 1.985 5.81 Swish::Layer_Elementwise::OCV/CPU 4.687 1.890 2.48 Swish::Layer_Elementwise::OCV/CPU_FP16 4.715 1.873 2.52 ``` #### Intel i7-12700K ``` Geometric mean (ms) Name of Test activations activations.patch activations.patch vs activations (x-factor) Celu::Layer_Elementwise::OCV/CPU 17.106 3.560 4.81 Elu::Layer_Elementwise::OCV/CPU 5.064 3.478 1.46 Gelu::Layer_Elementwise::OCV/CPU 0.036 0.035 1.04 HardSwish::Layer_Elementwise::OCV/CPU 2.914 2.893 1.01 Mish::Layer_Elementwise::OCV/CPU 3.820 3.529 1.08 Selu::Layer_Elementwise::OCV/CPU 10.799 3.593 3.01 Swish::Layer_Elementwise::OCV/CPU 3.651 3.473 1.05 ``` ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-07-19 16:03:19 +03:00
Yuantao Feng	e3858cc5a3	Merge pull request #25147 from fengyuentau:dnn/elementwise_layers/speedup * added v_erf and implemented gelu acceleration via vectorization * remove anonymous v_erf and use v_erf from intrin_math * enable perf for ov and cuda backend	2024-07-08 14:24:36 +03:00
Haosonn	87f749277d	Merge pull request #24768 from Haosonn:pre-pr-2 Vulkan backend for NaryEltwiseLayer in DNN module #24768 We improve Vulkan backend for ``NaryEltwiseLayer`` in DNN module by: - add a basic framework for Vulkan backend in ``NaryEltwiseLayer`` - add a compute shader for binary forwarding (an imitation of what has been done in native OpenCV backend including broadcasting and eltwise-operation) - typo fixed: - Wrong info output in ``context.cpp`` Currently, our implementation (or all layers supporting Vulkan backend) runs pretty slow on discrete GPUs basically due to IO cost in function ``copyToHost``, and we are going to fix that by - find out the best ``VkMemoryProperty`` for various discrete GPUs - prevent ``copyToHost`` in middle layers during forwarding (i.e., keep data in GPU memory) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake Co-authored-by: IskXCr <IskXCr@outlook.com>	2024-01-29 18:41:49 +03:00
Alexander Smorkalov	ac4c0bffac	Merge pull request #24813 from fengyuentau:speedup_scatter dnn: improve scatter and scatterND speed with multi-threading	2024-01-17 17:16:50 +03:00
jimmylaw21	a7fa1e6f4b	Merge pull request #24610 from jimmylaw21:dnn-onnx-add-group-norm-layer dnn onnx: add group norm layer #24610 dnn onnx: add group norm layer Todo: - [x] speed up by multi-threading - [x] add perf - [x] add backend: OpenVINO - [x] add backend: CUDA - [x] add backend: OpenCL (no fp16) - [ ] add backend: CANN ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>	2024-01-12 15:13:26 +03:00
fengyuentau	13127365e2	better comment	2024-01-08 11:55:06 +08:00
Yuantao Feng	b7d70613e4	fix failed assertion in debug build	2024-01-05 18:33:01 +00:00
fengyuentau	2ed97b9ef3	multi-threaded scatterND and refactor perf	2024-01-05 18:15:59 +08:00
fengyuentau	63cde0b90d	multi-threaded scatter and refactor perf	2024-01-05 17:24:09 +08:00
Alexander Alekhin	f49b26182b	dnn(test): skip very long debug tests, reduce test time	2023-12-25 08:44:06 +00:00
Yuantao Feng	0521a3a384	Merge pull request #24476 from fengyuentau:attention_layer dnn: add attention layer #24476 Resolves #24609 Merge with: https://github.com/opencv/opencv_extra/pull/1128. Attention operator spec from onnxruntime: https://github.com/microsoft/onnxruntime/blob/v1.16.1/docs/ContribOperators.md#com.microsoft.Attention. TODO: - [x] benchmark (before this PR vs. with this PR vs. ORT). - [x] Layer fusion: Take care of Slice with end=INT64_MAX. - [x] Layer fusion: match more potential attention (VIT) patterns. - [x] Single-head attention is supported. - [x] Test AttentionSubgraph fusion. - [x] Add acc tests for VIT_B_32 and VitTrack - [x] Add perf tests for VIT_B_32 and VitTrack ## Benchmarks Platform: Macbook Air M1. ### Attention Subgraph Input scale: [1, 197, 768]. \| \| mean (ms) \| median (ms) \| min (ms) \| \| ---------------------- \| --------- \| ----------- \| -------- \| \| w/ Attention (this PR) \| 3.75 \| 3.68 \| 3.22 \| \| w/o Attention \| 9.06 \| 9.01 \| 8.24 \| \| ORT (python) \| 4.32 \| 2.63 \| 2.50 \| ### ViTs All data in milliseconds (ms). \| ViTs \| With Attention \| Without Attention \| ORT \| \| -------- \| -------------- \| ----------------- \| ------ \| \| vit_b_16 \| 302.77 \| 365.35 \| 109.70 \| \| vit_b_32 \| 89.92 \| 116.22 \| 30.36 \| \| vit_l_16 \| 1593.32 \| 1730.74 \| 419.92 \| \| vit_l_32 \| 468.11 \| 577.41 \| 134.12 \| \| VitTrack \| 3.80 \| 3.87 \| 2.25 \| ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-12-20 19:35:07 +03:00
Yuantao Feng	ee0822dc4d	Merge pull request #24378 from fengyuentau:instance_norm dnn onnx: add instance norm layer #24378 Resolves https://github.com/opencv/opencv/issues/24377 Relates https://github.com/opencv/opencv/pull/24092#discussion_r1349841644 \| Perf \| multi-thread \| single-thread \| \| - \| - \| - \| \| x: [2, 64, 180, 240] \| 3.95ms \| 11.12ms \| Todo: - [x] speed up by multi-threading - [x] add perf - [x] add backend: OpenVINO - [x] add backend: CUDA - [x] add backend: OpenCL (no fp16) - [ ] add backend: CANN (will be done via https://github.com/opencv/opencv/pull/24462) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake ``` force_builders=Linux OpenCL,Win64 OpenCL,Custom buildworker:Custom=linux-4 build_image:Custom=ubuntu:18.04 modules_filter:Custom=none disable_ipp:Custom=ON ```	2023-11-07 12:59:10 +03:00
Wanli	ed52f7feea	Improve and refactor softmax layer (#24466) * improve and refactor softmax layer * fix building error * compatible region layer * fix axisStep when disable SIMD * fix dynamic array * try to fix error * use nlanes from VTraits * move axisBias to srcOffset * fix bug caused by axisBias * remove macro * replace #ifdef with #if for CV_SIMD	2023-11-06 04:48:32 +03:00
Aser Atawya	240b245105	Merge pull request #24092 from Aser-Abdelfatah:GSoC_Support_GatherElements_ONNX GSoC Add ONNX Support for GatherElements #24092 Merge with: https://github.com/opencv/opencv_extra/pull/1082 Adds support to the ONNX operator GatherElements [operator docs](https://github.com/onnx/onnx/blob/main/docs/Operators.md#GatherElements) Added tests to opencv_extra at pull request https://github.com/opencv/opencv_extra/pull/1082 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-10-18 10:41:47 +03:00
Dmitry Kurtaev	d88ad46978	Remove explicit transB attribute from MatMul perf test	2023-08-18 15:10:14 +03:00
Dmitry Kurtaev	96f23e3da1	Merge pull request #24080 from dkurt:dnn_cuda_layers Resolve uncovered CUDA dnn layer #24080 ### Pull Request Readiness Checklist * Gelu activation layer on CUDA * Try to relax GEMM from ONNX resolves https://github.com/opencv/opencv/issues/24064 See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-08-03 09:13:42 +03:00
wanli	e4360294c5	make 'abcd op 1b11' broadcast support cuda	2023-04-23 17:46:50 +08:00
wanli	c8f5e228fc	release MUL and ADD operator on CUDA	2023-02-10 19:33:59 +08:00
Yuantao Feng	4d918ba40b	Merge pull request #23047 from fengyuentau:layer_norm dnn: add layer normalization for vision transformers * add layer norm onnx parser, impl and tests * add onnx graph simplifier for layer norm expanded * handle the case when constants are of type Initializer * add test case for layer norm expanded with initializers * use CV_Assert & CV_CheckType in place of CV_Assert_N; use forward_fallback for OCL_FP16 * use const ref / ref in parameters of invoker::run; extract inner const if from nested loop; use size_t in place of ull * template hasBias * remove trailing whitespace * use pointer parameter with null check; move normSize division & mean_square division outside of loop; use std::max to ensure positive value before std::sqrt * refactor implementation, optimize parallel_for * disable layer norm expanded * remove the removal of layer norm optional outputs	2023-01-27 16:35:59 +03:00
zoom	11d492b0b9	Let part of the operators in nary_eltwise support cuda	2022-11-02 14:08:21 +08:00
fengyuentau	d24d8f2abe	implementation of scatter and scatternd with conformance tests enabled	2022-10-17 11:30:32 +08:00
rogday	ed69bcae2d	Merge pull request #21865 from rogday:nary_eltwise_layers Reimplementation of Element-wise layers with broadcasting support * init * semi-working initial version * add small_vector * wip * remove smallvec * add nary function * replace auto with Mat in lambda expr used in transform * uncomment asserts * autobuffer shape_buf & step_buf * fix a missing bracket * fixed a missing addLayer in parseElementWise * solve one-dimensional broadcast * remove pre_broadcast_transform for the case of two constants; fix missing constBlobsExtraInfo when addConstant is called * one autobuffer for step & shape * temporary fix for the missing original dimension information * fix parseUnsqueeze when it gets a 1d tensor constant * support sum/mean/min/max with only one input * reuse old code to handle cases of two non-constant inputs * add condition to handle div & mul of two non-constant inputs * use \|\| instead of or * remove trailing spaces * enlarge buf in binary_forward to contain other buffer * use autobuffer in nary_forward * generate data randomly and add more cases for perf * add op and, or & xor * update perf_dnn * remove some comments * remove legacy; add two ONNX conformance tests in filter * move from cpu_denylist to all_denylist * adjust parsing for inputs>=2 Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>	2022-07-19 06:14:05 +03:00
Alexander Alekhin	81e027eef7	dnn: fix OpenCL implementation of Slice layer	2020-07-16 04:33:52 +00:00

24 Commits