opencv/modules
Yuantao Feng a7fd9446cf
Merge pull request #25630 from fengyuentau:nary-multi-thread
dnn: parallelize nary elementwise forward implementation & enable related conformance tests #25630

This PR introduces the following changes:

- [x] Parallelize binary forward impl
- [x] Parallelize ternary forward impl (Where)
- [x] Parallelize nary (Operator that can take >=1 operands)
- [x] Enable conformance tests if workable

## Performance

### i7-12700K, RAM 64GB, Ubuntu 22.04

```
Geometric mean (ms)

                Name of Test                     opencv        opencv        opencv
                                                  perf          perf          perf
                                              core.x64.0606 core.x64.0606 core.x64.0606
                                                                               vs
                                                                             opencv
                                                                              perf
                                                                          core.x64.0606
                                                                           (x-factor)
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU           16.116        11.161         1.44
NCHW_NCHW_add::Layer_NaryEltwise::OCV/CPU        17.469        11.446         1.53
NCHW_NCHW_div::Layer_NaryEltwise::OCV/CPU        17.531        11.469         1.53
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU      28.653        13.682         2.09
NCHW_NCHW_greater::Layer_NaryEltwise::OCV/CPU    21.899        13.422         1.63
NCHW_NCHW_less::Layer_NaryEltwise::OCV/CPU       21.738        13.185         1.65
NCHW_NCHW_max::Layer_NaryEltwise::OCV/CPU        16.172        11.473         1.41
NCHW_NCHW_mean::Layer_NaryEltwise::OCV/CPU       16.309        11.565         1.41
NCHW_NCHW_min::Layer_NaryEltwise::OCV/CPU        16.166        11.454         1.41
NCHW_NCHW_mul::Layer_NaryEltwise::OCV/CPU        16.157        11.443         1.41
NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU        163.459       15.234         10.73
NCHW_NCHW_ref_div::Layer_NaryEltwise::OCV/CPU    10.880        10.868         1.00
NCHW_NCHW_ref_max::Layer_NaryEltwise::OCV/CPU    10.947        11.058         0.99
NCHW_NCHW_ref_min::Layer_NaryEltwise::OCV/CPU    10.948        10.910         1.00
NCHW_NCHW_ref_mul::Layer_NaryEltwise::OCV/CPU    10.874        10.871         1.00
NCHW_NCHW_ref_sum::Layer_NaryEltwise::OCV/CPU    10.971        10.920         1.00
NCHW_NCHW_sub::Layer_NaryEltwise::OCV/CPU        17.546        11.462         1.53
NCHW_NCHW_sum::Layer_NaryEltwise::OCV/CPU        16.175        11.475         1.41
NHWC_C::Layer_NaryEltwise::OCV/CPU               11.339        11.333         1.00
NHWC_H::Layer_NaryEltwise::OCV/CPU               16.154        11.102         1.46
```

### Apple M1, RAM 16GB, macOS 14.4.1

```
Geometric mean (ms)

                Name of Test                     opencv          opencv             opencv      
                                                  perf            perf               perf       
                                              core.m1.0606 core.m1.0606.patch core.m1.0606.patch
                                                                                      vs        
                                                                                    opencv      
                                                                                     perf       
                                                                                 core.m1.0606   
                                                                                  (x-factor)    
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU           28.418          3.768               7.54       
NCHW_NCHW_add::Layer_NaryEltwise::OCV/CPU        6.942           5.679               1.22       
NCHW_NCHW_div::Layer_NaryEltwise::OCV/CPU        5.822           5.653               1.03       
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU      5.751           5.628               1.02       
NCHW_NCHW_greater::Layer_NaryEltwise::OCV/CPU    5.797           5.599               1.04       
NCHW_NCHW_less::Layer_NaryEltwise::OCV/CPU       7.272           5.578               1.30       
NCHW_NCHW_max::Layer_NaryEltwise::OCV/CPU        5.777           5.562               1.04       
NCHW_NCHW_mean::Layer_NaryEltwise::OCV/CPU       5.819           5.559               1.05       
NCHW_NCHW_min::Layer_NaryEltwise::OCV/CPU        5.830           5.574               1.05       
NCHW_NCHW_mul::Layer_NaryEltwise::OCV/CPU        5.759           5.567               1.03       
NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU       342.260          74.655              4.58       
NCHW_NCHW_ref_div::Layer_NaryEltwise::OCV/CPU    8.338           8.280               1.01       
NCHW_NCHW_ref_max::Layer_NaryEltwise::OCV/CPU    8.359           8.309               1.01       
NCHW_NCHW_ref_min::Layer_NaryEltwise::OCV/CPU    8.412           8.295               1.01       
NCHW_NCHW_ref_mul::Layer_NaryEltwise::OCV/CPU    8.380           8.297               1.01       
NCHW_NCHW_ref_sum::Layer_NaryEltwise::OCV/CPU    8.356           8.323               1.00       
NCHW_NCHW_sub::Layer_NaryEltwise::OCV/CPU        6.818           5.561               1.23       
NCHW_NCHW_sum::Layer_NaryEltwise::OCV/CPU        5.805           5.570               1.04       
NHWC_C::Layer_NaryEltwise::OCV/CPU               3.834           4.817               0.80       
NHWC_H::Layer_NaryEltwise::OCV/CPU               28.402          3.771               7.53
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-07-03 10:09:05 +03:00
..
calib3d Merge pull request #25427 from MaximSmolskiy:make-finding-corner-neighbor-symmetrical-in-ChessBoardDetector-findQuadNeighbors 2024-06-10 09:42:56 +03:00
core Merge pull request #24941 from WanliZhong:v_exp 2024-07-02 12:32:49 +03:00
dnn Merge pull request #25630 from fengyuentau:nary-multi-thread 2024-07-03 10:09:05 +03:00
features2d Merge pull request #25424 from mshabunin:fix-features2d-test 2024-04-17 14:19:05 +03:00
flann Prevent signed integer overflow in LshTable 2024-05-24 23:47:36 +02:00
gapi Merge pull request #25791 from ujjayant-kadian:uk/extend-gapi-onnx-params-arbitrary-session-options 2024-06-21 14:34:26 +03:00
highgui Fix #25833: The correct way to disable top-most state is with HWND_NOTOPMOST, not HWND_TOP. 2024-06-29 21:39:49 +02:00
imgcodecs Fix file descriptor leak in HDR decoder 2024-06-30 18:43:04 +03:00
imgproc Merge pull request #25820 from asmorkalov:as/HAL_non_strict_equalizeHist 2024-06-28 16:51:15 +03:00
java doc: fix formulas in JavaDoc broken after Doxygen upgrade 2024-03-11 23:47:23 +03:00
js Merge pull request #25757 from dkurt:d.kurtaev/opencv_js_tests_old_emsdk 2024-06-17 12:46:10 +03:00
ml Partially back-port #25075 to 4.x 2024-03-05 12:15:39 +03:00
objc Merge pull request #24136 from komakai:visionos_support 2023-12-20 15:35:10 +03:00
objdetect Merge pull request #25722 from AleksandrPanov:update_testSeveralBoardsWithCustomIds 2024-06-06 20:01:33 +03:00
photo photo: doc: Fix window range for fastNlMeansDenoisingMulti 2024-06-21 21:04:22 +08:00
python Merge pull request #25706 from cudawarped:fix_cuda_first_python_dep 2024-06-11 10:49:14 +03:00
stitching Partially back-port #25075 to 4.x 2024-03-05 12:15:39 +03:00
ts test: use cv::theRNG instead of own generator 2024-06-07 13:36:11 +03:00
video Merge pull request #25771 from fengyuentau:vittrack_black_input 2024-06-18 12:48:28 +03:00
videoio Set using Orbbec SDK on MacOS OFF by default. 2024-07-02 17:23:20 +08:00
world cmake: use /INCREMENTAL:NO with MSVS 2015 2023-12-07 19:46:27 +00:00