opencv/modules/dnn/src
Yuantao Feng 23b244d3a3
Merge pull request #25881 from fengyuentau:dnn/cpu/optimize_activations_with_v_exp
dnn: optimize activations with v_exp #25881

Merge with https://github.com/opencv/opencv_extra/pull/1191.

This PR optimizes the following activations:

- [x] Swish
- [x] Mish
- [x] Elu
- [x] Celu
- [x] Selu
- [x] HardSwish
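
For reference, the listed activations can be written as plain scalar functions. The block below is a minimal sketch under stated assumptions, not the code from this PR: it uses the textbook (ONNX-style) definitions with `std::exp`, whereas the PR replaces such scalar calls with the vectorized `v_exp` intrinsic. The function names and default parameter values here are illustrative assumptions.

```cpp
#include <algorithm>
#include <cmath>

// Scalar reference definitions (assumed textbook/ONNX-style formulas).
// These are NOT the vectorized kernels from the PR; they only show the math.

// Swish: x * sigmoid(x)
inline float swish(float x) { return x / (1.0f + std::exp(-x)); }

// Mish: x * tanh(softplus(x)), with softplus(x) = log(1 + exp(x))
inline float mish(float x) {
    return x * std::tanh(std::log1p(std::exp(x)));
}

// ELU: x for x >= 0, alpha * (exp(x) - 1) otherwise
inline float elu(float x, float alpha = 1.0f) {
    return x >= 0.0f ? x : alpha * (std::exp(x) - 1.0f);
}

// CELU: max(0, x) + min(0, alpha * (exp(x / alpha) - 1))
inline float celu(float x, float alpha = 1.0f) {
    return std::max(0.0f, x) +
           std::min(0.0f, alpha * (std::exp(x / alpha) - 1.0f));
}

// SELU: gamma * (x for x > 0, alpha * (exp(x) - 1) otherwise).
// The defaults are the usual SELU constants, assumed here.
inline float selu(float x, float alpha = 1.67326319f,
                  float gamma = 1.05070102f) {
    return gamma * (x > 0.0f ? x : alpha * (std::exp(x) - 1.0f));
}

// HardSwish: x * clamp(x / 6 + 0.5, 0, 1); note that it needs no exp at all
inline float hardswish(float x) {
    return x * std::max(0.0f, std::min(1.0f, x / 6.0f + 0.5f));
}
```

Every function except `hardswish` spends most of its time in `exp`, which is why a faster vectorized exponential can speed up the whole family.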

### Performance (Updated on 2024-07-18)

#### AmLogic A311D2 (ARM Cortex A73 + A53)

```
Geometric mean (ms)

            Name of Test              activations activations.patch activations.patch
                                                                              vs
                                                                         activations
                                                                          (x-factor)
Celu::Layer_Elementwise::OCV/CPU        115.859          27.930              4.15
Elu::Layer_Elementwise::OCV/CPU          27.846          27.003              1.03
Gelu::Layer_Elementwise::OCV/CPU         0.657           0.602               1.09
HardSwish::Layer_Elementwise::OCV/CPU    31.885          6.781               4.70
Mish::Layer_Elementwise::OCV/CPU         35.729          32.089              1.11
Selu::Layer_Elementwise::OCV/CPU         61.955          27.850              2.22
Swish::Layer_Elementwise::OCV/CPU        30.819          26.688              1.15
```
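
The x-factor column appears to be the baseline time divided by the patched time, so values above 1 mean the patch is faster. A small sketch of that arithmetic follows; the helper names `x_factor` and `round2` are hypothetical and are not part of OpenCV's tooling.

```cpp
#include <cmath>

// Assumed definition of the x-factor column: baseline time / patched time.
inline double x_factor(double baseline_ms, double patched_ms) {
    return baseline_ms / patched_ms;
}

// Round to two decimals, matching the precision printed in the tables.
inline double round2(double v) { return std::round(v * 100.0) / 100.0; }
```

Applied to the Celu row of the table above, `round2(x_factor(115.859, 27.930))` reproduces the reported 4.15.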

#### Apple M1

```
Geometric mean (ms)

               Name of Test                activations activations.patch activations.patch
                                                                                   vs
                                                                              activations
                                                                               (x-factor)
Celu::Layer_Elementwise::OCV/CPU              16.184          2.118               7.64
Celu::Layer_Elementwise::OCV/CPU_FP16         16.280          2.123               7.67
Elu::Layer_Elementwise::OCV/CPU               9.123           1.878               4.86
Elu::Layer_Elementwise::OCV/CPU_FP16          9.085           1.897               4.79
Gelu::Layer_Elementwise::OCV/CPU              0.089           0.081               1.11
Gelu::Layer_Elementwise::OCV/CPU_FP16         0.086           0.074               1.17
HardSwish::Layer_Elementwise::OCV/CPU         1.560           1.555               1.00
HardSwish::Layer_Elementwise::OCV/CPU_FP16    1.536           1.523               1.01
Mish::Layer_Elementwise::OCV/CPU              6.077           2.476               2.45
Mish::Layer_Elementwise::OCV/CPU_FP16         5.990           2.496               2.40
Selu::Layer_Elementwise::OCV/CPU              11.351          1.976               5.74
Selu::Layer_Elementwise::OCV/CPU_FP16         11.533          1.985               5.81
Swish::Layer_Elementwise::OCV/CPU             4.687           1.890               2.48
Swish::Layer_Elementwise::OCV/CPU_FP16        4.715           1.873               2.52
```

#### Intel i7-12700K

```
Geometric mean (ms)

            Name of Test              activations activations.patch activations.patch
                                                                    vs
                                                               activations
                                                                (x-factor)
Celu::Layer_Elementwise::OCV/CPU        17.106       3.560         4.81
Elu::Layer_Elementwise::OCV/CPU          5.064       3.478         1.46
Gelu::Layer_Elementwise::OCV/CPU         0.036       0.035         1.04
HardSwish::Layer_Elementwise::OCV/CPU    2.914       2.893         1.01
Mish::Layer_Elementwise::OCV/CPU         3.820       3.529         1.08
Selu::Layer_Elementwise::OCV/CPU        10.799       3.593         3.01
Swish::Layer_Elementwise::OCV/CPU        3.651       3.473         1.05
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV.
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is an accuracy test, a performance test, and test data in the opencv_extra repository, if applicable.
      The patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-07-19 16:03:19 +03:00
caffe Merge pull request #24892 from opencv-pushbot:gitee/alalek/dnn_avoid_16s_usage 2024-01-26 16:34:17 +03:00
cuda Merge pull request #25630 from fengyuentau:nary-multi-thread 2024-07-03 10:09:05 +03:00
cuda4dnn Merge pull request #25880 from Jamim:fix/cuda-no-fp16 2024-07-10 12:39:30 +03:00
darknet Merge pull request #24384 from Dhanwanth1803:feat-crop 2023-12-22 14:55:01 +03:00
int8layers Merge pull request #25883 from hanliutong:rvv-intrin-upgrade 2024-07-19 11:41:42 +03:00
layers Merge pull request #25881 from fengyuentau:dnn/cpu/optimize_activations_with_v_exp 2024-07-19 16:03:19 +03:00
ocl4dnn Merge pull request #24892 from opencv-pushbot:gitee/alalek/dnn_avoid_16s_usage 2024-01-26 16:34:17 +03:00
onnx Merge pull request #25630 from fengyuentau:nary-multi-thread 2024-07-03 10:09:05 +03:00
opencl Merge pull request #25147 from fengyuentau:dnn/elementwise_layers/speedup 2024-07-08 14:24:36 +03:00
tensorflow Merge pull request #25686 from Kumataro:fix25674 2024-06-02 14:14:04 +03:00
tflite Merge pull request #25613 from CNOCycle:tflite/ops 2024-05-31 19:31:21 +03:00
torch Merge pull request #25686 from Kumataro:fix25674 2024-06-02 14:14:04 +03:00
vkcom Partially back-port #25075 to 4.x 2024-03-05 12:15:39 +03:00
webnn Merge pull request #20406 from MarkGHX:gsoc_2021_webnn 2021-11-23 21:15:31 +00:00
backend.cpp dnn: plugin support for OpenVINO 2022-10-07 16:57:31 +00:00
backend.hpp dnn: plugin support for OpenVINO 2022-10-07 16:57:31 +00:00
debug_utils.cpp fix model diagnostic tool 2022-01-18 01:22:22 +03:00
dnn_common.hpp speed up vulkan dnn, and support ios and apple m1 chip. (#23349) 2023-05-18 20:02:27 +03:00
dnn_params.cpp cmake: revise OPENCV_DNN_BACKEND_DEFAULT integration 2023-09-10 13:11:36 +00:00
dnn_read.cpp Migrate Android Face Detection sample to DNN. 2023-11-29 11:02:44 +03:00
dnn_utils.cpp Partially back-port #25075 to 4.x 2024-03-05 12:15:39 +03:00
dnn.cpp dnn: fix index access 2022-03-19 06:54:07 +00:00
factory.hpp dnn: plugin support for OpenVINO 2022-10-07 16:57:31 +00:00
graph_simplifier.cpp Merge pull request #24577 from dkurt:dnn_graph_match_stack 2023-11-24 10:40:32 +03:00
graph_simplifier.hpp Merge pull request #24483 from dkurt:dnn_fusion_commutative_ops 2023-11-08 16:26:33 +03:00
halide_scheduler.cpp Merge pull request #22656 from dkurt:halide_fixes 2022-10-21 17:49:49 +03:00
halide_scheduler.hpp
ie_ngraph.cpp Fix for OpenVINO 2024.0 2024-03-18 15:05:50 +04:00
ie_ngraph.hpp Fix for OpenVINO 2024.0 2024-03-18 15:05:50 +04:00
init.cpp Merge pull request #25779 from fengyuentau:dnn/fix_onnx_depthtospace 2024-06-21 19:28:22 +03:00
layer_factory.cpp dnn: plugin support for OpenVINO 2022-10-07 16:57:31 +00:00
layer_internals.hpp Merge pull request #24892 from opencv-pushbot:gitee/alalek/dnn_avoid_16s_usage 2024-01-26 16:34:17 +03:00
layer.cpp Merge pull request #24892 from opencv-pushbot:gitee/alalek/dnn_avoid_16s_usage 2024-01-26 16:34:17 +03:00
legacy_backend.cpp Merge pull request #23109 from seanm:misc-warnings 2023-10-06 13:33:21 +03:00
legacy_backend.hpp dnn: split dnn.cpp code 2022-03-08 19:22:46 +00:00
math_utils.hpp Implement ctc prefix beam search decode for TextRecognitionModel. 2021-08-12 20:33:31 +08:00
model.cpp change fcn8s-heavy-pascal tests from caffe to onnx 2024-05-03 00:15:09 +08:00
net_cann.cpp Merge pull request #23936 from SaltFish-T:4.x 2023-07-27 14:21:30 +03:00
net_impl_backend.cpp Merge pull request #25880 from Jamim:fix/cuda-no-fp16 2024-07-10 12:39:30 +03:00
net_impl_fuse.cpp Merge pull request #24834 from fengyuentau:cuda_naryeltwise_broadcast 2024-01-11 10:04:46 +03:00
net_impl.cpp Merge pull request #25582 from fengyuentau:dnn/dump_pbtxt 2024-05-17 11:07:05 +03:00
net_impl.hpp Merge pull request #25582 from fengyuentau:dnn/dump_pbtxt 2024-05-17 11:07:05 +03:00
net_openvino.cpp Fix for OpenVINO 2024.0 2024-03-18 15:05:50 +04:00
net_quantization.cpp add enableWinograd API for Net. 2022-10-09 09:33:07 +08:00
net.cpp Merge pull request #25582 from fengyuentau:dnn/dump_pbtxt 2024-05-17 11:07:05 +03:00
nms.cpp batched nms impl 2022-11-29 15:32:34 +08:00
nms.inl.hpp boost NMS performance 2021-03-10 15:59:26 +00:00
op_cann.cpp Merge pull request #23319 from fengyuentau:fix_zoo_issue_136 2023-03-13 21:46:33 +03:00
op_cann.hpp Merge pull request #23936 from SaltFish-T:4.x 2023-07-27 14:21:30 +03:00
op_cuda.cpp Let part of the operators in nary_eltwise support cuda 2022-11-02 14:08:21 +08:00
op_cuda.hpp transfer output blobs in background 2020-07-04 12:55:12 +05:30
op_halide.cpp Merge pull request #24167 from autoantwort:missing-include 2023-08-17 09:34:19 +00:00
op_halide.hpp
op_inf_engine.cpp Fix for OpenVINO 2024.0 2024-03-18 15:05:50 +04:00
op_inf_engine.hpp Fix for OpenVINO 2024.0 2024-03-18 15:05:50 +04:00
op_timvx.cpp Merge pull request #21036 from fengyuentau:timvx_backend_support 2022-03-31 21:42:11 +00:00
op_timvx.hpp Merge pull request #21036 from fengyuentau:timvx_backend_support 2022-03-31 21:42:11 +00:00
op_vkcom.cpp speed up vulkan dnn, and support ios and apple m1 chip. (#23349) 2023-05-18 20:02:27 +03:00
op_vkcom.hpp speed up vulkan dnn, and support ios and apple m1 chip. (#23349) 2023-05-18 20:02:27 +03:00
op_webnn.cpp dnn: split dnn.cpp code 2022-03-08 19:22:46 +00:00
op_webnn.hpp Fix for OpenVINO 2024.0 2024-03-18 15:05:50 +04:00
plugin_api.hpp dnn: plugin support for OpenVINO 2022-10-07 16:57:31 +00:00
plugin_wrapper.impl.hpp dnn: plugin support for OpenVINO 2022-10-07 16:57:31 +00:00
precomp.hpp speed up vulkan dnn, and support ios and apple m1 chip. (#23349) 2023-05-18 20:02:27 +03:00
registry.cpp Merge pull request #25880 from Jamim:fix/cuda-no-fp16 2024-07-10 12:39:30 +03:00