mirror of https://github.com/opencv/opencv.git
(synced 2024-12-17 02:48:01 +08:00)
23b244d3a3
dnn: optimize activations with v_exp #25881

Merge with https://github.com/opencv/opencv_extra/pull/1191.

This PR optimizes the following activations:

- [x] Swish
- [x] Mish
- [x] Elu
- [x] Celu
- [x] Selu
- [x] HardSwish

### Performance (Updated on 2024-07-18)

#### AmLogic A311D2 (ARM Cortex A73 + A53)

```
Geometric mean (ms)

Name of Test                            activations  activations.patch  activations.patch vs activations (x-factor)
Celu::Layer_Elementwise::OCV/CPU            115.859             27.930                                         4.15
Elu::Layer_Elementwise::OCV/CPU              27.846             27.003                                         1.03
Gelu::Layer_Elementwise::OCV/CPU              0.657              0.602                                         1.09
HardSwish::Layer_Elementwise::OCV/CPU        31.885              6.781                                         4.70
Mish::Layer_Elementwise::OCV/CPU             35.729             32.089                                         1.11
Selu::Layer_Elementwise::OCV/CPU             61.955             27.850                                         2.22
Swish::Layer_Elementwise::OCV/CPU            30.819             26.688                                         1.15
```

#### Apple M1

```
Geometric mean (ms)

Name of Test                                 activations  activations.patch  activations.patch vs activations (x-factor)
Celu::Layer_Elementwise::OCV/CPU                  16.184              2.118                                         7.64
Celu::Layer_Elementwise::OCV/CPU_FP16             16.280              2.123                                         7.67
Elu::Layer_Elementwise::OCV/CPU                    9.123              1.878                                         4.86
Elu::Layer_Elementwise::OCV/CPU_FP16               9.085              1.897                                         4.79
Gelu::Layer_Elementwise::OCV/CPU                   0.089              0.081                                         1.11
Gelu::Layer_Elementwise::OCV/CPU_FP16              0.086              0.074                                         1.17
HardSwish::Layer_Elementwise::OCV/CPU              1.560              1.555                                         1.00
HardSwish::Layer_Elementwise::OCV/CPU_FP16         1.536              1.523                                         1.01
Mish::Layer_Elementwise::OCV/CPU                   6.077              2.476                                         2.45
Mish::Layer_Elementwise::OCV/CPU_FP16              5.990              2.496                                         2.40
Selu::Layer_Elementwise::OCV/CPU                  11.351              1.976                                         5.74
Selu::Layer_Elementwise::OCV/CPU_FP16             11.533              1.985                                         5.81
Swish::Layer_Elementwise::OCV/CPU                  4.687              1.890                                         2.48
Swish::Layer_Elementwise::OCV/CPU_FP16             4.715              1.873                                         2.52
```

#### Intel i7-12700K

```
Geometric mean (ms)

Name of Test                            activations  activations.patch  activations.patch vs activations (x-factor)
Celu::Layer_Elementwise::OCV/CPU             17.106              3.560                                         4.81
Elu::Layer_Elementwise::OCV/CPU               5.064              3.478                                         1.46
Gelu::Layer_Elementwise::OCV/CPU              0.036              0.035                                         1.04
HardSwish::Layer_Elementwise::OCV/CPU         2.914              2.893                                         1.01
Mish::Layer_Elementwise::OCV/CPU              3.820              3.529                                         1.08
Selu::Layer_Elementwise::OCV/CPU             10.799              3.593                                         3.01
Swish::Layer_Elementwise::OCV/CPU             3.651              3.473                                         1.05
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

||
---|---|---|
caffe | ||
cuda | ||
cuda4dnn | ||
darknet | ||
int8layers | ||
layers | ||
ocl4dnn | ||
onnx | ||
opencl | ||
tensorflow | ||
tflite | ||
torch | ||
vkcom | ||
webnn | ||
backend.cpp | ||
backend.hpp | ||
debug_utils.cpp | ||
dnn_common.hpp | ||
dnn_params.cpp | ||
dnn_read.cpp | ||
dnn_utils.cpp | ||
dnn.cpp | ||
factory.hpp | ||
graph_simplifier.cpp | ||
graph_simplifier.hpp | ||
halide_scheduler.cpp | ||
halide_scheduler.hpp | ||
ie_ngraph.cpp | ||
ie_ngraph.hpp | ||
init.cpp | ||
layer_factory.cpp | ||
layer_internals.hpp | ||
layer.cpp | ||
legacy_backend.cpp | ||
legacy_backend.hpp | ||
math_utils.hpp | ||
model.cpp | ||
net_cann.cpp | ||
net_impl_backend.cpp | ||
net_impl_fuse.cpp | ||
net_impl.cpp | ||
net_impl.hpp | ||
net_openvino.cpp | ||
net_quantization.cpp | ||
net.cpp | ||
nms.cpp | ||
nms.inl.hpp | ||
op_cann.cpp | ||
op_cann.hpp | ||
op_cuda.cpp | ||
op_cuda.hpp | ||
op_halide.cpp | ||
op_halide.hpp | ||
op_inf_engine.cpp | ||
op_inf_engine.hpp | ||
op_timvx.cpp | ||
op_timvx.hpp | ||
op_vkcom.cpp | ||
op_vkcom.hpp | ||
op_webnn.cpp | ||
op_webnn.hpp | ||
plugin_api.hpp | ||
plugin_wrapper.impl.hpp | ||
precomp.hpp | ||
registry.cpp |