opencv/modules/dnn/test
Yuantao Feng a7fd9446cf
Merge pull request #25630 from fengyuentau:nary-multi-thread
dnn: parallelize nary elementwise forward implementation & enable related conformance tests #25630

This PR introduces the following changes:

- [x] Parallelize binary forward impl
- [x] Parallelize ternary forward impl (Where)
- [x] Parallelize nary forward impl (operators that can take >=1 operands)
- [x] Enable conformance tests if workable
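
As a rough illustration of what parallelizing an elementwise forward pass can look like, the sketch below splits the output index range into contiguous chunks and processes each chunk on its own thread. It is not the code from this PR: `parallel_binary` is a hypothetical helper, it uses `std::thread` rather than OpenCV's own parallel framework, and it ignores broadcasting and strides.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not an OpenCV API): applies `op` elementwise to two
// inputs, splitting the index range into contiguous chunks that are
// processed by separate threads. Extra trailing elements of the longer
// input are ignored.
template <typename T, typename Op>
std::vector<T> parallel_binary(const std::vector<T>& a,
                               const std::vector<T>& b,
                               Op op,
                               unsigned num_threads = 4)
{
    const std::size_t n = std::min(a.size(), b.size());
    std::vector<T> out(n);
    if (n == 0) return out;

    num_threads = std::max(1u, num_threads);
    // Ceiling division so that every element belongs to exactly one chunk.
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        // Each worker writes a disjoint slice of `out`, so no locking is needed.
        workers.emplace_back([&a, &b, &out, op, begin, end]() {
            for (std::size_t i = begin; i < end; ++i)
                out[i] = op(a[i], b[i]);
        });
    }
    for (std::thread& w : workers) w.join();
    return out;
}
```

For example, `parallel_binary(a, b, std::plus<float>())` would compute an elementwise sum. Because the chunks are disjoint, the result does not depend on the number of threads.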

## Performance

### i7-12700K, RAM 64GB, Ubuntu 22.04

```
Geometric mean (ms)

                Name of Test                     opencv        opencv        opencv
                                                  perf          perf          perf
                                              core.x64.0606 core.x64.0606 core.x64.0606
                                                                               vs
                                                                             opencv
                                                                              perf
                                                                          core.x64.0606
                                                                           (x-factor)
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU           16.116        11.161         1.44
NCHW_NCHW_add::Layer_NaryEltwise::OCV/CPU        17.469        11.446         1.53
NCHW_NCHW_div::Layer_NaryEltwise::OCV/CPU        17.531        11.469         1.53
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU      28.653        13.682         2.09
NCHW_NCHW_greater::Layer_NaryEltwise::OCV/CPU    21.899        13.422         1.63
NCHW_NCHW_less::Layer_NaryEltwise::OCV/CPU       21.738        13.185         1.65
NCHW_NCHW_max::Layer_NaryEltwise::OCV/CPU        16.172        11.473         1.41
NCHW_NCHW_mean::Layer_NaryEltwise::OCV/CPU       16.309        11.565         1.41
NCHW_NCHW_min::Layer_NaryEltwise::OCV/CPU        16.166        11.454         1.41
NCHW_NCHW_mul::Layer_NaryEltwise::OCV/CPU        16.157        11.443         1.41
NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU        163.459       15.234         10.73
NCHW_NCHW_ref_div::Layer_NaryEltwise::OCV/CPU    10.880        10.868         1.00
NCHW_NCHW_ref_max::Layer_NaryEltwise::OCV/CPU    10.947        11.058         0.99
NCHW_NCHW_ref_min::Layer_NaryEltwise::OCV/CPU    10.948        10.910         1.00
NCHW_NCHW_ref_mul::Layer_NaryEltwise::OCV/CPU    10.874        10.871         1.00
NCHW_NCHW_ref_sum::Layer_NaryEltwise::OCV/CPU    10.971        10.920         1.00
NCHW_NCHW_sub::Layer_NaryEltwise::OCV/CPU        17.546        11.462         1.53
NCHW_NCHW_sum::Layer_NaryEltwise::OCV/CPU        16.175        11.475         1.41
NHWC_C::Layer_NaryEltwise::OCV/CPU               11.339        11.333         1.00
NHWC_H::Layer_NaryEltwise::OCV/CPU               16.154        11.102         1.46
```

### Apple M1, RAM 16GB, macOS 14.4.1

```
Geometric mean (ms)

                Name of Test                     opencv          opencv             opencv      
                                                  perf            perf               perf       
                                              core.m1.0606 core.m1.0606.patch core.m1.0606.patch
                                                                                      vs        
                                                                                    opencv      
                                                                                     perf       
                                                                                 core.m1.0606   
                                                                                  (x-factor)    
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU           28.418          3.768               7.54       
NCHW_NCHW_add::Layer_NaryEltwise::OCV/CPU        6.942           5.679               1.22       
NCHW_NCHW_div::Layer_NaryEltwise::OCV/CPU        5.822           5.653               1.03       
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU      5.751           5.628               1.02       
NCHW_NCHW_greater::Layer_NaryEltwise::OCV/CPU    5.797           5.599               1.04       
NCHW_NCHW_less::Layer_NaryEltwise::OCV/CPU       7.272           5.578               1.30       
NCHW_NCHW_max::Layer_NaryEltwise::OCV/CPU        5.777           5.562               1.04       
NCHW_NCHW_mean::Layer_NaryEltwise::OCV/CPU       5.819           5.559               1.05       
NCHW_NCHW_min::Layer_NaryEltwise::OCV/CPU        5.830           5.574               1.05       
NCHW_NCHW_mul::Layer_NaryEltwise::OCV/CPU        5.759           5.567               1.03       
NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU       342.260          74.655              4.58       
NCHW_NCHW_ref_div::Layer_NaryEltwise::OCV/CPU    8.338           8.280               1.01       
NCHW_NCHW_ref_max::Layer_NaryEltwise::OCV/CPU    8.359           8.309               1.01       
NCHW_NCHW_ref_min::Layer_NaryEltwise::OCV/CPU    8.412           8.295               1.01       
NCHW_NCHW_ref_mul::Layer_NaryEltwise::OCV/CPU    8.380           8.297               1.01       
NCHW_NCHW_ref_sum::Layer_NaryEltwise::OCV/CPU    8.356           8.323               1.00       
NCHW_NCHW_sub::Layer_NaryEltwise::OCV/CPU        6.818           5.561               1.23       
NCHW_NCHW_sum::Layer_NaryEltwise::OCV/CPU        5.805           5.570               1.04       
NHWC_C::Layer_NaryEltwise::OCV/CPU               3.834           4.817               0.80       
NHWC_H::Layer_NaryEltwise::OCV/CPU               28.402          3.771               7.53
```
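
The x-factor column in the tables above appears to be the baseline time divided by the patched time, so values above 1 mean the patched build is faster. A minimal sketch of that arithmetic, with `x_factor` and `round2` as hypothetical helper names:

```cpp
#include <cmath>

// Hypothetical helper: baseline time divided by patched time.
// Values greater than 1 mean the patched build is faster.
double x_factor(double baseline_ms, double patched_ms)
{
    return baseline_ms / patched_ms;
}

// Rounds to two decimals, matching the precision shown in the tables.
double round2(double v)
{
    return std::round(v * 100.0) / 100.0;
}
```

With the `NCHW_C_sum` row from the i7-12700K table, `round2(x_factor(16.116, 11.161))` gives 1.44, which matches the reported value.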

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There are accuracy tests, performance tests, and test data in the opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-07-03 10:09:05 +03:00
cityscapes_semsegm_test_enet.py Misc. modules/ typos 2018-02-12 07:09:43 -05:00
imagenet_cls_test_alexnet.py change fcn8s-heavy-pascal tests from caffe to onnx 2024-05-03 00:15:09 +08:00
imagenet_cls_test_googlenet.py Misc. modules/ typos 2018-02-12 07:09:43 -05:00
imagenet_cls_test_inception.py fix 4.x links 2021-12-22 13:24:30 +00:00
npy_blob.cpp dnn: fix precomp.hpp usage 2018-02-28 17:06:26 +03:00
npy_blob.hpp dnn: fix precomp.hpp usage 2018-02-28 17:06:26 +03:00
pascal_semsegm_test_fcn.py change fcn8s-heavy-pascal tests from caffe to onnx 2024-05-03 00:15:09 +08:00
test_backends.cpp Merge pull request #24834 from fengyuentau:cuda_naryeltwise_broadcast 2024-01-11 10:04:46 +03:00
test_caffe_importer.cpp dnn(test): skip very long debug tests, reduce test time 2023-12-25 08:44:06 +00:00
test_common.cpp cmake: fix build of dnn tests with shared common code 2019-03-31 08:52:25 +00:00
test_common.hpp Merge pull request #25181 from dkurt:release_conv_weights 2024-03-25 09:03:28 +03:00
test_common.impl.hpp Merge pull request #25181 from dkurt:release_conv_weights 2024-03-25 09:03:28 +03:00
test_darknet_importer.cpp dnn(test): skip very long debug tests, reduce test time 2023-12-25 08:44:06 +00:00
test_googlenet.cpp Merge pull request #22275 from zihaomu:fp16_support_conv 2023-05-17 09:38:33 +03:00
test_graph_simplifier.cpp Merge pull request #25271 from fengyuentau:matmul_bias 2024-03-29 17:35:23 +03:00
test_ie_models.cpp Fix for OpenVINO 2024.0 2024-03-18 15:05:50 +04:00
test_int8_layers.cpp Merge pull request #25779 from fengyuentau:dnn/fix_onnx_depthtospace 2024-06-21 19:28:22 +03:00
test_layers.cpp Skip Test_Caffe_layers.Concat with Vulkan due to sporadic failures. 2024-05-17 11:54:25 +03:00
test_main.cpp Merge pull request #23109 from seanm:misc-warnings 2023-10-06 13:33:21 +03:00
test_misc.cpp Merge pull request #23736 from seanm:c++11-simplifications 2024-01-19 16:53:08 +03:00
test_model.cpp Made fcn-resnet50-12.onnx model optional. 2024-05-03 16:14:22 +03:00
test_nms.cpp batched nms impl 2022-11-29 15:32:34 +08:00
test_onnx_conformance_layer_filter__cuda_denylist.inl.hpp Merge pull request #25630 from fengyuentau:nary-multi-thread 2024-07-03 10:09:05 +03:00
test_onnx_conformance_layer_filter__cuda_fp16_denylist.inl.hpp Merge pull request #25630 from fengyuentau:nary-multi-thread 2024-07-03 10:09:05 +03:00
test_onnx_conformance_layer_filter__halide_denylist.inl.hpp Merge pull request #24353 from alexlyulkov:al/fixed-cumsum-layer 2023-10-03 13:58:25 +03:00
test_onnx_conformance_layer_filter__openvino.inl.hpp Merge pull request #25630 from fengyuentau:nary-multi-thread 2024-07-03 10:09:05 +03:00
test_onnx_conformance_layer_filter__vulkan_denylist.inl.hpp Merge pull request #25630 from fengyuentau:nary-multi-thread 2024-07-03 10:09:05 +03:00
test_onnx_conformance_layer_filter_opencv_all_denylist.inl.hpp Merge pull request #25630 from fengyuentau:nary-multi-thread 2024-07-03 10:09:05 +03:00
test_onnx_conformance_layer_filter_opencv_cpu_denylist.inl.hpp Merge pull request #21865 from rogday:nary_eltwise_layers 2022-07-19 06:14:05 +03:00
test_onnx_conformance_layer_filter_opencv_denylist.inl.hpp move global skip out of if loop, and add opencv_deny_list 2023-03-13 22:16:51 +08:00
test_onnx_conformance_layer_filter_opencv_ocl_fp16_denylist.inl.hpp Merge pull request #25630 from fengyuentau:nary-multi-thread 2024-07-03 10:09:05 +03:00
test_onnx_conformance_layer_filter_opencv_ocl_fp32_denylist.inl.hpp implementation of scatter and scatternd with conformance tests enabled 2022-10-17 11:30:32 +08:00
test_onnx_conformance_layer_parser_denylist.inl.hpp Merge pull request #25630 from fengyuentau:nary-multi-thread 2024-07-03 10:09:05 +03:00
test_onnx_conformance.cpp Merge pull request #25630 from fengyuentau:nary-multi-thread 2024-07-03 10:09:05 +03:00
test_onnx_importer.cpp Merge pull request #25794 from Abdurrahheem:ash/yolov10-support 2024-07-02 18:26:34 +03:00
test_precomp.hpp dnn: reduce set of ignored warnings 2018-11-15 13:15:59 +03:00
test_tf_importer.cpp dnn(test): skip very long debug tests, reduce test time 2023-12-25 08:44:06 +00:00
test_tflite_importer.cpp Merge pull request #25613 from CNOCycle:tflite/ops 2024-05-31 19:31:21 +03:00
test_torch_importer.cpp DNN: add the Winograd fp16 support (#23654) 2023-11-20 13:45:37 +03:00