opencv

mirror of https://github.com/opencv/opencv.git synced 2024-12-15 18:09:11 +08:00

Author	SHA1	Message	Date
Yuantao Feng	0507043a55	Merge pull request #24386 from fengyuentau:fix_dtype_nary_eltwise dnn: fix inconsistent input dtype for nary eltwise layers #24386 Resolves https://github.com/opencv/opencv/issues/24385 Merge with https://github.com/opencv/opencv_extra/pull/1107 Relates https://github.com/opencv/opencv/pull/24092#discussion_r1353964405 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-10-13 11:56:18 +03:00
Alexander Smorkalov	58285e5468	Merge pull request #24359 from asmorkalov:as/FastNeuralStyle_eccv16_tuning Tuned threshold for FastNeuralStyle_eccv16 test	2023-10-13 10:29:41 +03:00
Yuantao Feng	590f150d5e	dnn: hotfixes for fast gemm (#24315) * remove Conformance from test names * integrate neon optimization into default * quick fix: define CV_NEON_AARCH64 0 for non NEON platforms * remove var batch that leads to memory leak * put neon code back to fast_gemm_kernels.simd * reorganize code to reduce duplicate code	2023-10-07 21:48:44 +03:00
Sean McBride	5fb3869775	Merge pull request #23109 from seanm:misc-warnings * Fixed clang -Wnewline-eof warnings * Fixed all trivial clang -Wextra-semi and -Wc++98-compat-extra-semi warnings * Removed trailing semi from various macros * Fixed various -Wunused-macros warnings * Fixed some trivial -Wdocumentation warnings * Fixed some -Wdocumentation-deprecated-sync warnings * Fixed incorrect indentation * Suppressed some clang warnings in 3rd party code * Fixed QRCodeEncoder::Params documentation. --------- Co-authored-by: Alexander Smorkalov <alexander.smorkalov@xperience.ai>	2023-10-06 13:33:21 +03:00
HAN Liutong	07bf9cb013	Merge pull request #24325 from hanliutong:rewrite Rewrite Universal Intrinsic code: float related part #24325 The goal of this series of PRs is to modify the SIMD code blocks guarded by the CV_SIMD macro by rewriting them with the new Universal Intrinsic API. The series of PRs is listed below: #23885 First patch, an example #23980 Core module #24058 ImgProc module, part 1 #24132 ImgProc module, part 2 #24166 ImgProc module, part 3 #24301 Features2d and calib3d module #24324 Gapi module This patch is (hopefully) the last one in the series. It mainly involves 3 parts: 1. Add some modifications related to float (CV_SIMD_64F). 2. Use `#if (CV_SIMD || CV_SIMD_SCALABLE)` instead of `#if CV_SIMD || CV_SIMD_SCALABLE`, so that the `CV_SIMD` blocks not yet enabled for `CV_SIMD_SCALABLE` can be found by searching for `if CV_SIMD`. 3. Summary of the `CV_SIMD` blocks that remain unmodified: Updated comments - Some blocks cause test failures when enabled for RVV; they are marked as `TODO: enable for CV_SIMD_SCALABLE, ....` - Some blocks cannot be rewritten directly. (Not commented in the source code, just listed here) - ./modules/core/src/mathfuncs_core.simd.hpp (Vector type wrapped in class/struct) - ./modules/imgproc/src/color_lab.cpp (Array of vector type) - ./modules/imgproc/src/color_rgb.simd.hpp (Array of vector type) - ./modules/imgproc/src/sumpixels.simd.hpp (fixed length algorithm, strongly related to `CV_SIMD_WIDTH`) These algorithms will need to be redesigned to accommodate scalable backends.	2023-10-05 17:57:25 +03:00
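The guard convention described in this commit can be illustrated with a small sketch. It is not taken from the OpenCV sources: `add_f32` is a hypothetical kernel, and the two macros are defined as placeholders only so that the snippet compiles without OpenCV headers.

```cpp
#include <cassert>
#include <cstddef>

// Placeholders so the sketch builds without OpenCV; in a real build these
// macros come from OpenCV's own headers.
#ifndef CV_SIMD
#define CV_SIMD 0
#endif
#ifndef CV_SIMD_SCALABLE
#define CV_SIMD_SCALABLE 0
#endif

// Hypothetical kernel: element-wise sum of two float arrays.
inline void add_f32(const float* a, const float* b, float* dst, std::size_t n)
{
    assert(n == 0 || (a && b && dst));
    std::size_t i = 0;
#if (CV_SIMD || CV_SIMD_SCALABLE) // parenthesized form: block already rewritten
    // A vectorized loop written with the Universal Intrinsic API would go
    // here and advance i; it is omitted because it needs OpenCV headers.
#endif
    for (; i < n; ++i) // scalar tail, also the fallback without SIMD
        dst[i] = a[i] + b[i];
}
```

With this convention, a plain text search for `if CV_SIMD` matches only the blocks that still use the unparenthesized guard.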
Dmitry Kurtaev	2c92eb3175	Enable more tests for OpenVINO 2023.0	2023-10-05 12:51:55 +03:00
Alexander Smorkalov	33d64d0491	Tuned threshold for FastNeuralStyle_eccv16 test for systems without AVX2.	2023-10-04 16:19:13 +03:00
Wanli	62b5470b78	Merge pull request #24298 from WanliZhong:extend_perf_net_test Extend performance test models #24298	2023-10-04 13:05:32 +03:00

Merged With https://github.com/opencv/opencv_extra/pull/1095

This PR aims to extend the performance tests.

- YOLOv5 for object detection
- YOLOv8 for object detection
- EfficientNet for classification

Models from OpenCV Zoo:

- YOLOX for object detection
- YuNet for face detection
- SFace for face recognition
- MPPalm for palm detection
- MPHand for hand landmark
- MPPose for pose estimation
- ViTTrack for object tracking
- PPOCRv3 for text detection
- CRNN for text recognition
- PPHumanSeg for human segmentation

If other models should be added, please leave some comments. Thanks!

Build opencv with script:

```shell
-DBUILD_opencv_python2=OFF -DBUILD_opencv_python3=OFF -DBUILD_opencv_gapi=OFF -DINSTALL_PYTHON_EXAMPLES=OFF -DINSTALL_C_EXAMPLES=OFF -DBUILD_DOCS=OFF -DBUILD_EXAMPLES=OFF -DBUILD_ZLIB=OFF -DWITH_FFMPEG=OFF
```

Performance Test on Apple M2 CPU

```shell
MacOS 14.0
8 threads
```

1 thread:

| Name of Test | 4.5.5-1th | 4.6.0-1th | 4.7.0-1th | 4.8.0-1th | 4.8.1-1th |
|--------------|:---------:|:---------:|:---------:|:---------:|:---------:|
| CRNN | 76.244 | 76.611 | 62.534 | 57.678 | 57.238 |
| EfficientNet | --- | --- | 109.224 | 130.753 | 109.076 |
| MPHand | --- | --- | 19.289 | 22.727 | 27.593 |
| MPPalm | 47.150 | 47.061 | 41.064 | 65.598 | 40.109 |
| MPPose | --- | --- | 26.592 | 32.022 | 26.956 |
| PPHumanSeg | 41.672 | 41.790 | 27.819 | 27.212 | 30.461 |
| PPOCRv3 | --- | --- | 140.371 | 187.922 | 170.026 |
| SFace | 43.830 | 43.834 | 27.575 | 30.653 | 26.387 |
| ViTTrack | --- | --- | --- | 14.617 | 15.028 |
| YOLOX | 1060.507 | 1061.361 | 495.816 | 533.309 | 549.713 |
| YOLOv5 | --- | --- | --- | 191.350 | 193.261 |
| YOLOv8 | --- | --- | 198.893 | 218.733 | 223.142 |
| YuNet | 27.084 | 27.095 | 26.238 | 30.512 | 34.439 |
| MobileNet_SSD_Caffe | 44.742 | 44.565 | 33.005 | 29.421 | 29.286 |
| MobileNet_SSD_v1_TensorFlow | 49.352 | 49.274 | 35.163 | 32.134 | 31.904 |
| MobileNet_SSD_v2_TensorFlow | 83.537 | 83.379 | 56.403 | 42.947 | 42.148 |
| ResNet_50 | 148.872 | 148.817 | 77.331 | 67.682 | 67.760 |

n threads:

| Name of Test | 4.5.5-nth | 4.6.0-nth | 4.7.0-nth | 4.8.0-nth | 4.8.1-nth |
|--------------|:---------:|:---------:|:---------:|:---------:|:---------:|
| CRNN | 44.262 | 44.408 | 41.540 | 40.731 | 41.151 |
| EfficientNet | --- | --- | 28.683 | 42.676 | 38.204 |
| MPHand | --- | --- | 6.738 | 13.126 | 8.155 |
| MPPalm | 16.613 | 16.588 | 12.477 | 31.370 | 17.048 |
| MPPose | --- | --- | 12.985 | 19.700 | 16.537 |
| PPHumanSeg | 14.993 | 15.133 | 13.438 | 15.269 | 15.252 |
| PPOCRv3 | --- | --- | 63.752 | 85.469 | 76.190 |
| SFace | 10.685 | 10.822 | 8.127 | 8.318 | 7.934 |
| ViTTrack | --- | --- | --- | 10.079 | 9.579 |
| YOLOX | 417.358 | 422.977 | 230.036 | 234.662 | 228.555 |
| YOLOv5 | --- | --- | --- | 74.249 | 75.480 |
| YOLOv8 | --- | --- | 63.762 | 88.770 | 70.927 |
| YuNet | 8.589 | 8.731 | 11.269 | 16.466 | 14.513 |
| MobileNet_SSD_Caffe | 12.575 | 12.636 | 11.529 | 12.114 | 12.236 |
| MobileNet_SSD_v1_TensorFlow | 13.922 | 14.160 | 13.078 | 12.124 | 13.298 |
| MobileNet_SSD_v2_TensorFlow | 25.096 | 24.836 | 22.823 | 20.238 | 20.319 |
| ResNet_50 | 41.561 | 41.296 | 29.092 | 30.412 | 29.339 |

Performance Test on [Intel Core i7-12700K](https://www.intel.com/content/www/us/en/products/sku/134594/intel-core-i712700k-processor-25m-cache-up-to-5-00-ghz/specifications.html)

```shell
Ubuntu 22.04.2 LTS
8 Performance-cores (3.60 GHz, turbo up to 4.90 GHz)
4 Efficient-cores (2.70 GHz, turbo up to 3.80 GHz)
20 threads
```

1 thread:

| Name of Test | 4.5.5-1th | 4.6.0-1th | 4.7.0-1th | 4.8.0-1th | 4.8.1-1th |
|--------------|:---------:|:---------:|:---------:|:---------:|:---------:|
| CRNN | 16.752 | 16.851 | 16.840 | 16.625 | 16.663 |
| EfficientNet | --- | --- | 61.107 | 76.037 | 53.890 |
| MPHand | --- | --- | 8.906 | 9.969 | 8.403 |
| MPPalm | 24.243 | 24.638 | 18.104 | 35.140 | 18.387 |
| MPPose | --- | --- | 12.322 | 16.515 | 12.355 |
| PPHumanSeg | 15.249 | 15.303 | 10.203 | 10.298 | 10.353 |
| PPOCRv3 | --- | --- | 87.788 | 144.253 | 90.648 |
| SFace | 15.583 | 15.884 | 13.957 | 13.298 | 13.284 |
| ViTTrack | --- | --- | --- | 11.760 | 11.710 |
| YOLOX | 324.927 | 325.173 | 235.986 | 253.653 | 254.472 |
| YOLOv5 | --- | --- | --- | 102.163 | 102.621 |
| YOLOv8 | --- | --- | 87.013 | 103.182 | 103.146 |
| YuNet | 12.806 | 12.645 | 10.515 | 12.647 | 12.711 |
| MobileNet_SSD_Caffe | 23.556 | 23.768 | 24.304 | 22.569 | 22.602 |
| MobileNet_SSD_v1_TensorFlow | 26.136 | 26.276 | 26.854 | 24.828 | 24.961 |
| MobileNet_SSD_v2_TensorFlow | 43.521 | 43.614 | 46.892 | 44.044 | 44.682 |
| ResNet_50 | 73.588 | 73.501 | 75.191 | 66.893 | 65.144 |

n threads:

| Name of Test | 4.5.5-nth | 4.6.0-nth | 4.7.0-nth | 4.8.0-nth | 4.8.1-nth |
|--------------|:---------:|:---------:|:---------:|:---------:|:---------:|
| CRNN | 8.665 | 8.827 | 10.643 | 7.703 | 7.743 |
| EfficientNet | --- | --- | 16.591 | 12.715 | 9.022 |
| MPHand | --- | --- | 2.678 | 2.785 | 1.680 |
| MPPalm | 5.309 | 5.319 | 3.822 | 10.568 | 4.467 |
| MPPose | --- | --- | 3.644 | 6.088 | 4.608 |
| PPHumanSeg | 4.756 | 4.865 | 5.084 | 5.179 | 5.148 |
| PPOCRv3 | --- | --- | 32.023 | 50.591 | 32.414 |
| SFace | 3.838 | 3.980 | 4.629 | 3.145 | 3.155 |
| ViTTrack | --- | --- | --- | 10.335 | 10.357 |
| YOLOX | 68.314 | 68.081 | 82.801 | 74.219 | 73.970 |
| YOLOv5 | --- | --- | --- | 47.150 | 47.523 |
| YOLOv8 | --- | --- | 32.195 | 30.359 | 30.267 |
| YuNet | 2.604 | 2.644 | 2.622 | 3.278 | 3.349 |
| MobileNet_SSD_Caffe | 13.005 | 5.935 | 8.586 | 4.629 | 4.713 |
| MobileNet_SSD_v1_TensorFlow | 7.002 | 7.129 | 9.314 | 5.271 | 5.213 |
| MobileNet_SSD_v2_TensorFlow | 11.939 | 12.111 | 22.688 | 12.038 | 12.086 |
| ResNet_50 | 18.227 | 18.600 | 26.150 | 15.584 | 15.706 |
alexlyulkov	9bd14d5417	Merge pull request #24353 from alexlyulkov:al/fixed-cumsum-layer Fixed CumSum dnn layer #24353 Fixes #20110 The algorithm had several errors, so I rewrote it. The layer also did not work with a non-constant axis tensor; this is fixed. Enabled CumSum layer tests from ONNX conformance.	2023-10-03 13:58:25 +03:00
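For reference, the operation this layer computes can be sketched in a few lines. This is a minimal 1-D illustration of cumulative sum with the `exclusive` and `reverse` options defined for the ONNX CumSum operator. It is not the rewritten OpenCV implementation, it ignores the axis tensor mentioned in the commit, and `cumsum1d` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Cumulative sum over a 1-D sequence.
// exclusive: each output excludes the element at its own position.
// reverse:   accumulate from the last element towards the first.
std::vector<float> cumsum1d(const std::vector<float>& x,
                            bool exclusive = false, bool reverse = false)
{
    const std::size_t n = x.size();
    std::vector<float> y(n);
    float acc = 0.f;
    for (std::size_t k = 0; k < n; ++k)
    {
        const std::size_t i = reverse ? n - 1 - k : k; // visiting order
        if (exclusive) { y[i] = acc; acc += x[i]; }
        else           { acc += x[i]; y[i] = acc; }
    }
    return y;
}
```

For the input `{1, 2, 3}` the four flag combinations give `{1, 3, 6}`, `{0, 1, 3}` (exclusive), `{6, 5, 3}` (reverse) and `{5, 3, 0}` (both).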
Alexander Smorkalov	163d544ecf	Merge branch 4.x	2023-10-02 10:17:23 +03:00
Alexander Smorkalov	5caee5cc64	Fixed OpenCL FP16 fallback in Einsum layer.	2023-09-29 15:52:23 +03:00
Dmitry Kurtaev	c7ec0d599a	Merge pull request #23987 from dkurt:openvino_int8_backend OpenVINO backend for INT8 models #23987 TODO: - [x] DetectionOutput layer (https://github.com/opencv/opencv/pull/24069) - [x] Fewer FP32 fallbacks (e.g. Sigmoid, eltwise sum) - [x] Accuracy, performance tests (https://github.com/opencv/opencv/pull/24039) - [x] Single layer tests (convolution) - [x] ~~Fixes for OpenVINO 2022.1 (https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/100334)~~ Performance results (median time) for object detection model `coco_efficientdet_lite0_v1_1.0_quant_2021_09_06.tflite`: OpenCV backend 77.42ms, OpenVINO 2023.0 backend 10.90ms. CPU: `11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz` Serialized model per-layer stats (note that Convolution should use `*_I8` primitives if they are quantized correctly): https://gist.github.com/dkurt/7772bbf1907035441bb5454f19f0feef	2023-09-28 16:24:43 +03:00
Alexander Smorkalov	b8d4ac589d	Merge pull request #24334 from fengyuentau:fix_24319 dnn onnx: fix not-found constant indices for Gather if shared	2023-09-28 13:08:26 +03:00
fengyuentau	7fa0493ca0	init commit	2023-09-28 11:50:21 +08:00
Yuantao Feng	307324f4ac	Merge pull request #24283 from fengyuentau:halide_tests dnn: merge tests from test_halide_layers to test_backends #24283 Context: https://github.com/opencv/opencv/pull/24231#pullrequestreview-1628649980	2023-09-27 14:09:47 +03:00
Dmitry Kurtaev	2b6d0f36f0	Merge pull request #24309 from dkurt:gemm_ov_hotfix Update OpenVINO init of new GEMM layer #24309 CI validation: - [x] 2022.1.0: https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/100368 - [ ] 2021.4.2: https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/100373	2023-09-27 10:25:45 +03:00
Yuantao Feng	bb171a0c05	dnn: expand refactor with cv::broadcast for onnx models (#24295) * add expand impl with cv::broadcast * remove expandMid * deduce shape from -1 * add constant folding * handle input constant; handle input constant 1d * add expand conformance tests; add checks to disallow shape of neg values; add early copy for unchanged total elements * fix ExpandSubgraph * dummy commit to trigger build * dummy commit to trigger build 1 * remove conformance from test names	2023-09-27 09:28:52 +03:00
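The shape arithmetic behind an Expand operation can be sketched as follows. This assumes the NumPy-style broadcasting rule that the ONNX Expand operator specifies. It does not call `cv::broadcast` and is not the code from this commit, and `broadcastShape` is a hypothetical helper. Negative dimensions are rejected, in line with the check the commit message mentions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Compute the broadcast result shape of two shapes (NumPy-style rule).
std::vector<int> broadcastShape(const std::vector<int>& a, const std::vector<int>& b)
{
    const std::size_t n = std::max(a.size(), b.size());
    std::vector<int> out(n);
    for (std::size_t k = 0; k < n; ++k)
    {
        // Dimensions are aligned from the right; missing leading ones count as 1.
        const int da = k < a.size() ? a[a.size() - 1 - k] : 1;
        const int db = k < b.size() ? b[b.size() - 1 - k] : 1;
        if (da < 0 || db < 0)
            throw std::invalid_argument("negative dimension");
        if (da != db && da != 1 && db != 1)
            throw std::invalid_argument("shapes are not broadcastable");
        out[n - 1 - k] = (da == 1) ? db : da;
    }
    return out;
}
```

For example, an input of shape `{3, 1}` expanded with the target shape `{2, 1, 6}` yields `{2, 3, 6}`, while `{3}` against `{4}` throws.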
Alexander Smorkalov	9942757bab	Merge pull request #24316 from alexlyulkov:al/fix-caffe-read-segfault Fixed segfault when reading Caffe model	2023-09-25 17:53:54 +03:00
Alexander Lyulkov	72e7672a6c	Fixed segfault when reading Caffe model	2023-09-25 12:55:11 +07:00
Abduragim Shtanchaev	865e7cacca	Merge pull request #24037 from Abdurrahheem:ash/dev_einsum Add Support for Einsum Layer #24037 This PR adds support for the [Einsum Layer](https://pytorch.org/docs/stable/generated/torch.einsum.html) (in progress). This PR is currently not to be merged but only reviewed. Test cases are located in PR [#1090](https://github.com/opencv/opencv_extra/pull/1090) in OpenCV extra. DONE: - [x] 2-5D GMM support added - [x] Matrix transpose support added - [x] Reduction type compute 'ij->j' - [x] 2nd shape computation - during forward Next PRs: - [ ] Broadcasting reduction "...ii ->...i" - [ ] Add lazy shape deduction. "...ij, ...jk->...ik" - [ ] Add implicit output computation support. "bij,bjk ->" (output subscripts should be "bik") - [ ] Add support for CUDA backend - [ ] BatchWiseMultiply optimize Later in 5.x version (requires support for 1D matrices): - [ ] Add 1D vector multiplication support - [ ] Inner product "i, i" (problems with 1D shapes)	2023-09-22 11:25:02 +03:00
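To make the subscript notation concrete, here is a minimal sketch of two of the equations listed above, written as plain loops over a row-major matrix. It only illustrates what `ij->j` and the transpose `ij->ji` mean. It is not the layer's implementation, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// "ij->j": the index i is absent from the output, so it is summed over.
std::vector<float> einsum_ij_to_j(const std::vector<float>& a,
                                  std::size_t rows, std::size_t cols)
{
    assert(a.size() == rows * cols);
    std::vector<float> out(cols, 0.f);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            out[j] += a[i * cols + j];
    return out;
}

// "ij->ji": both indices are kept but swapped, which is a transpose.
std::vector<float> einsum_ij_to_ji(const std::vector<float>& a,
                                   std::size_t rows, std::size_t cols)
{
    assert(a.size() == rows * cols);
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            out[j * rows + i] = a[i * cols + j];
    return out;
}
```

For the 2x3 matrix `{1, 2, 3, 4, 5, 6}`, `ij->j` gives the column sums `{5, 7, 9}` and `ij->ji` gives the 3x2 matrix `{1, 4, 2, 5, 3, 6}`.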
Vadim Pisarevsky	416bf3253d	attempt to add 0d/1d mat support to OpenCV (#23473) * attempt to add 0d/1d mat support to OpenCV * revised the patch; now 1D mat is treated as 1xN 2D mat rather than Nx1. * a step towards 'green' tests * another little step towards 'green' tests * calib test failures seem to be fixed now * more fixes _core & _dnn * another step towards green ci; even 0D mat's (a.k.a. scalars) are now partly supported! * fixed strange bug in aruco/charuco detector, not sure why it did not work * also fixed a few remaining failures (hopefully) in dnn & core * disabled failing GAPI tests - too complex to dig into this compiler pipeline * hopefully fixed java tests * trying to fix some more tests * quick followup fix * continue to fix test failures and warnings * quick followup fix * trying to fix some more tests * partly fixed support for 0D/scalar UMat's * use updated parseReduce() from upstream * trying to fix the remaining test failures * fixed [ch]aruco tests in Python * still trying to fix tests * revert "fix" in dnn's CUDA tensor * trying to fix dnn+CUDA test failures * fixed 1D umat creation * hopefully fixed remaining cuda test failures * removed trailing whitespaces	2023-09-21 18:24:38 +03:00
Alexander Smorkalov	799bb0cd18	Merge pull request #24291 from visitorckw:fix-memory-leak Fix memory leak and handle realloc failure	2023-09-20 08:49:56 +03:00
Yuantao Feng	8a96e34e33	dnn: add gemm_layer in place of fully_connected_layer for onnx models (#23897) * first commit * turned C from input to constant; force C constant in impl; better handling 0d/1d cases * integrate with gemm from ficus nn * fix const inputs * adjust threshold for int8 tryQuantize * adjust threshold for int8 quantized 2 * support batched gemm and matmul; tune threshold for rcnn_ilsvrc13; update googlenet * add gemm perf against innerproduct * add perf tests for innerproduct with bias * fix perf * add memset * renamings for next step * add dedicated perf gemm * add innerproduct in perf_gemm * remove gemm and innerproduct perf tests from perf_layer * add perf cases for vit sizes; prepack constants * remove batched gemm; fix wrong trans; optimize KC * remove prepacking for const A; several fixes for const B prepacking * add todos and gemm expression * add optimized branch for avx/avx2 * trigger build * update macros and signature * update signature * fix macro * fix bugs for neon aarch64 & x64 * add backends: cuda, cann, inf_ngraph and vkcom * fix cuda backend * test commit for cuda * test cuda backend * remove debug message from cuda backend * use cpu dispatcher * fix neon macro undef in dispatcher * fix dispatcher * fix inner kernel for neon aarch64 * fix compiling issue on armv7; try fixing accuracy issue on other platforms * broadcast C with beta multiplied; improve func namings * fix bug for avx and avx2 * put all platform-specific kernels in dispatcher * fix typos * attempt to fix compile issues on x64 * run old gemm when neon, avx, avx2 are all not available; add kernel for armv7 neon * fix typo * quick fix: add macros for pack4 * quick fix: use vmlaq_f32 for armv7 * quick fix for missing macro of fast gemm pack f32 4 * disable conformance tests when optimized branches are not supported * disable perf tests when optimized branches are not supported * decouple cv_try_neon and cv_neon_aarch64 * drop googlenet_2023; add fastGemmBatched * fix step in fastGemmBatched * cpu: fix initialization ofb; gpu: support batch * quick followup fix for cuda * add default kernels * quick followup fix to avoid macro redef * optimized kernels for lasx * resolve mis-alignment; remove comments * tune performance for x64 platform * tune performance for neon aarch64 * tune for armv7 * comment time consuming tests * quick follow-up fix	2023-09-20 00:53:34 +03:00
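As a reference point for what a GEMM layer computes, the sketch below evaluates the ONNX Gemm expression `Y = alpha * op(A) * op(B) + beta * C` with a naive triple loop. It is only a readable baseline under stated assumptions: `gemmRef` is a hypothetical name, `C` is limited to an empty vector, a length-N row that is broadcast over rows, or a full M x N matrix, and none of the packing or SIMD kernels from the commit are reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference GEMM on row-major data: Y (M x N) = alpha * op(A) * op(B) + beta * C.
// A is stored as M x K (or K x M when transA), B as K x N (or N x K when transB).
std::vector<float> gemmRef(const std::vector<float>& A, const std::vector<float>& B,
                           const std::vector<float>& C,
                           std::size_t M, std::size_t N, std::size_t K,
                           bool transA = false, bool transB = false,
                           float alpha = 1.f, float beta = 1.f)
{
    assert(A.size() == M * K && B.size() == K * N);
    assert(C.empty() || C.size() == N || C.size() == M * N);
    std::vector<float> Y(M * N);
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n)
        {
            float acc = 0.f;
            for (std::size_t k = 0; k < K; ++k)
            {
                const float a = transA ? A[k * M + m] : A[m * K + k];
                const float b = transB ? B[n * K + k] : B[k * N + n];
                acc += a * b;
            }
            // A length-N C is broadcast across all rows of Y.
            const float c = C.empty() ? 0.f : (C.size() == N ? C[n] : C[m * N + n]);
            Y[m * N + n] = alpha * acc + beta * c;
        }
    return Y;
}
```

With `A = {1, 2, 3, 4}`, `B = {5, 6, 7, 8}` (both 2x2) and `C = {1, 1}`, the result is `{20, 23, 44, 51}`. A baseline like this is mainly useful for checking an optimized kernel against known values.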
Kuan-Wei Chiu	e16ca08b33	Fix memory leak and handle realloc failure In the previous code, there was a memory leak issue where the previously allocated memory was not freed upon a failed realloc operation. This commit addresses the problem by releasing the old memory before setting the pointer to NULL in case of a realloc failure. This ensures that memory is properly managed and avoids potential memory leaks.	2023-09-18 22:43:44 +08:00
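The pattern described in this commit message can be sketched as below. The helper name `growBuffer` and its signature are hypothetical and not taken from the patched file. The key point is that `realloc` leaves the original block allocated when it fails, so assigning its result straight back to the only pointer would leak that block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Resize *buf to new_size bytes (new_size must be non-zero).
// On success *buf points to the resized block and true is returned.
// On failure the old block is freed, *buf is set to nullptr, and false is returned.
bool growBuffer(void** buf, std::size_t new_size)
{
    assert(buf != nullptr && new_size > 0);
    void* tmp = std::realloc(*buf, new_size); // keep the old pointer until we know the result
    if (tmp == nullptr)
    {
        std::free(*buf); // realloc failed: the old block is still ours to release
        *buf = nullptr;
        return false;
    }
    *buf = tmp;
    return true;
}
```

A caller that would rather keep the old data on failure could skip the `free` and leave `*buf` untouched; the variant above follows the behaviour the commit message describes.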
Alexander Smorkalov	157b0e7760	Merge pull request #24275 from alexlyulkov:al/fix-tf-graph-simplifier Fixed removePhaseSwitches in tf_graph_simplifier	2023-09-18 11:02:44 +03:00
Alexander Lyulkov	d4cb564ce2	Fixed removePhaseSwitches in tf_graph_simplifier	2023-09-15 14:22:21 +07:00
Dmitry Kurtaev	c5edd20354	Higher threshold for FasterRCNN_vgg16	2023-09-14 13:11:53 +03:00
alexlyulkov	1e54e56579	Merge pull request #24266 from alexlyulkov:al/tf-argmax-default-dim Added default dimension value to tensorflow ArgMax and ArgMin layers #24266 Added default dimension value to tensorflow ArgMax and ArgMin layers. Added exception when accessing layer's input with out of range index. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48452	2023-09-14 10:25:24 +03:00
Alexander Smorkalov	fdab565711	Merge branch 4.x	2023-09-13 14:49:25 +03:00
Alexander Smorkalov	62c0556c58	Merge pull request #24252 from opencv-pushbot:gitee/alalek/refactor_24218 cmake: revise OPENCV_DNN_BACKEND_DEFAULT integration	2023-09-11 08:55:19 +03:00
Alexander Alekhin	02525abd9f	cmake: revise OPENCV_DNN_BACKEND_DEFAULT integration - disable message on default value	2023-09-10 13:11:36 +00:00
Dmitry Kurtaev	5dc5b27858	Enable build with OpenVINO in Debug	2023-09-09 20:38:59 +03:00
Alexander Smorkalov	e60825e75b	Merge pull request #24218 from CSBVision:patch-5 Added CMake configuration OPENCV_DNN_BACKEND_DEFAULT	2023-09-08 14:21:39 +03:00
Alexander Smorkalov	5350fba319	Merge pull request #24128 from CSBVision:CSBVision-patch-1 Fix bug at blobFromImagesWithParams	2023-09-06 16:20:37 +03:00
CSBVision	674c618471	Update dnn_utils.cpp	2023-09-06 10:01:07 +03:00
Dmitry Kurtaev	178fdbbda8	Merge pull request #24196 from dkurt:ov_backend_cleanups Use ngraph::Output in OpenVINO backend wrapper #24196 resolves https://github.com/opencv/opencv/issues/24102 * Use `ngraph::Output<ngraph::Node>` instead of `std::shared_ptr<ngraph::Node>` as a backend wrapper. This gives access to multi-output nodes: `588ddf1b18/modules/dnn/src/net_openvino.cpp (L501-L504)` * All layers are customizable with OpenVINO >= 2022.1. The nGraph reference code used for the default layer implementation also does not require the CPU plugin (this can be tested by commenting out the CPU plugin in `/opt/intel/openvino/runtime/lib/intel64/plugins.xml`). * Correct inference if only intermediate blobs are requested.	2023-09-05 18:08:28 +03:00
Björn Böken	639836ebf0	Added CMake configuration OPENCV_DNN_BACKEND_DEFAULT	2023-09-05 10:05:12 +02:00
Wanli	84f32bbb24	increase Fast Math threshold	2023-09-05 14:03:54 +08:00
Sam James	c20febdbb0	Fix compilation on arm64 with FP16 when disabled If building with -mcpu=native or any other setting which implies the current CPU has FP16 but with intrinsics disabled, we mistakenly try to use it even though convolution.hpp conditionally defines it correctly based on whether we should use it. convolution.cpp on the other hand was mismatched and trying to use it if the CPU supported it, even if not enabled in the build system. Make the guards match. Bug: https://bugs.gentoo.org/913031 Signed-off-by: Sam James <sam@gentoo.org>	2023-08-29 03:05:49 +01:00
Dmitry Kurtaev	a0debc3a9a	Enable OpenVINO max pooling with indices since 2022.1	2023-08-23 10:39:38 +03:00
Dmitry Kurtaev	d88ad46978	Remove explicit transB attribute from MatMul perf test	2023-08-18 15:10:14 +03:00
autoantwort	f5a14532c2	Merge pull request #24167 from autoantwort:missing-include * add missing include * Apply CR	2023-08-17 09:34:19 +00:00
Dmitry Kurtaev	8ad5eb521a	Merge pull request #24120 from dkurt:actualize_dnn_links OCL_FP16 MatMul with large batch * Workaround FP16 MatMul with large batch * Fix OCL reinitialization * Higher thresholds for INT8 quantization * Try fix gemm_buffer_NT for half (columns) * Fix GEMM by rows * Add batch dimension to InnerProduct layer test * Fix Test_ONNX_conformance.Layer_Test/test_basic_conv_with_padding * Batch 16 * Replace all vload4 * Version suffix for MobileNetSSD_deploy Caffe model	2023-08-16 15:46:11 +03:00
MuZihao	16681d1080	fix the issue in layer fused	2023-08-16 09:34:59 +08:00
Yuantao Feng	ba70ec99b3	Merge pull request #24122 from fengyuentau:remove_tengine dnn: cleanup of tengine backend #24122 🚀 Cleanup for OpenCV 5.0. The Tengine backend was added to speed up the convolution layer on ARM CPUs, but it is not maintained, and the convolution layer on our default backend has reached performance similar to that of Tengine. Tengine backend related PRs: - https://github.com/opencv/opencv/pull/16724 - https://github.com/opencv/opencv/pull/18323	2023-08-09 09:26:02 +03:00
Alexander Smorkalov	a6748df587	Merge branch 4.x	2023-08-08 17:32:17 +03:00
unknown	87b7ce4415	Solved issue 24044	2023-08-04 21:57:22 +02:00
Laurent Berger	2ff16d4c45	Merge pull request #24101 from LaurentBerger:I24076 Invalid memory access fix for ONNX split layer parser #24076 #24101 Original bug report: https://github.com/opencv/opencv/issues/24076	2023-08-04 12:18:49 +03:00
Alexander Smorkalov	5466fd2606	Merge pull request #24104 from cudawarped:cuda_fix_cuda_toolkit_12_2 `cuda`: fix for compatibility with CUDA Toolkit >= 12.2.0	2023-08-04 12:11:15 +03:00
Dmitry Kurtaev	4b8aeb1129	Merge pull request #24039 from dkurt:tflite_test_backends TFLite models on different backends (tests and improvements) #24039 * MaxUnpooling with OpenVINO * Fully connected with transposed inputs/weights with OpenVINO * Enable backends tests for TFLite (related to https://github.com/opencv/opencv/issues/23992#issuecomment-1640691722) * Increase existing tests thresholds	2023-08-04 11:28:51 +03:00
Dmitry Kurtaev	96f23e3da1	Merge pull request #24080 from dkurt:dnn_cuda_layers Resolve uncovered CUDA dnn layer #24080 * Gelu activation layer on CUDA * Try to relax GEMM from ONNX resolves https://github.com/opencv/opencv/issues/24064	2023-08-03 09:13:42 +03:00
Dmitry Kurtaev	0245c0cd10	Merge pull request #24072 from dkurt:openvino_cpu_tests Remove legacy nGraph logic #24072 TODO: - [x] Test with OpenVINO 2021.4 (tested locally)	2023-08-02 14:39:11 +03:00
Dmitry Kurtaev	195aad8e6a	Merge pull request #24069 from dkurt:openvino_detection_layer DetectionOutput layer on OpenVINO without limitations #24069 ### Pull Request Readiness Checklist required for https://github.com/opencv/opencv/pull/23987 See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-08-02 14:28:47 +03:00
cudawarped	ab8cb6f8a9	cuda: fix for compatibility with CUDA Toolkit >= 12.2.0	2023-08-01 13:02:42 +03:00
Alexander Smorkalov	47188b7c7e	Merge branch 4.x	2023-07-28 13:05:36 +03:00
Dmitry Kurtaev	677a28fd2a	Merge pull request #24056 from dkurt:eltwise_prelu PReLU with element-wise scales #24056 ### Pull Request Readiness Checklist resolves https://github.com/opencv/opencv/issues/24051 See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-07-27 16:36:40 +03:00
SaltFish-T	ab6bffc6f8	Merge pull request #23936 from SaltFish-T:4.x Update opencv dnn to support cann version >=6.3 #23936 1. Modify the search path of "libopsproto.so" in OpenCVFindCANN.cmake 2. Add the search path of "libgraph_base.so" in OpenCVFindCANN.cmake 3. Automatically check the Ascend socVersion; tested successfully on Ascend310/Ascend310B/Ascend910B	2023-07-27 14:21:30 +03:00
Alexander Smorkalov	e5e1a3bfde	Merge pull request #24043 from zixianweei:use-vaddq_f32-on-arm64 fix compilation error on Windows ARM, use vaddq_f32 instead of +=	2023-07-25 11:41:09 +03:00
zixgo	ec7689421d	fix compilation error on Windows ARM, use vaddq_f32 instead of +=	2023-07-21 19:54:43 +08:00
Alexander Smorkalov	d96ff496b4	Increase eps for Test_Torch_nets.FastNeuralStyle_accuracy to prevent sporadic test failures with CUDA.	2023-07-21 13:51:03 +03:00
Dmitry Kurtaev	e41ba90f17	Merge pull request #24004 from dkurt:tflite_new_layers [TFLite] Pack layer and other fixes for SSD from Keras #24004 ### Pull Request Readiness Checklist resolves https://github.com/opencv/opencv/issues/23992 Merge with extra: https://github.com/opencv/opencv_extra/pull/1076 See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-07-21 09:13:37 +03:00
Alexander Smorkalov	23f27d8dbe	Use OpenCV logging instead of std::cerr.	2023-07-19 10:49:54 +03:00
Zihao Mu	1920993525	Merge pull request #23952 from zihaomu:fix_depth_conv_5x5 DNN: optimize the speed of general Depth-wise #23952 Try to solve the issue: https://github.com/opencv/opencv/issues/23941 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-07-14 17:34:39 +03:00
Alexander Smorkalov	cea26341a5	Merge branch 4.x	2023-07-13 09:28:36 +03:00
Alexander Smorkalov	5af40a0269	Merge branch 4.x	2023-07-05 15:51:10 +03:00
Alexander Smorkalov	bf06bc92aa	Merge branch '3.4' into merge-3.4	2023-06-23 20:12:58 +03:00
Yuantao Feng	aff420329c	Merge pull request #23853 from fengyuentau:disable_fp16_warning dnn: disable warning when loading a fp16 model #23853 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-06-23 19:52:04 +03:00
Alexander Smorkalov	d9a5603fa3	Merge pull request #23860 from fengyuentau:fix_overflow_sigmoid_v3.4 dnn: fix overflow in sigmoid layer for 3.4	2023-06-23 19:47:42 +03:00
fengyuentau	29388f80a5	fix overflow	2023-06-23 21:22:21 +08:00
Alexander Smorkalov	51702ffd92	pre: OpenCV 4.8.0 (version++)	2023-06-20 15:52:57 +03:00
Dmitry Kurtaev	433c364456	Merge pull request #23724 from dkurt:java_without_ant Build Java without ANT #23724 ### Pull Request Readiness Checklist Enables a path of building Java bindings without ANT * Able to build OpenCV JAR and Docs without ANT ``` -- Java: -- ant: NO -- JNI: /usr/lib/jvm/default-java/include /usr/lib/jvm/default-java/include/linux /usr/lib/jvm/default-java/include -- Java wrappers: YES -- Java tests: NO ``` * Possible to build OpenCV JAR without ANT but tests still require ANT Merge with: https://github.com/opencv/opencv_contrib/pull/3502 Notes: - Use `OPENCV_JAVA_IGNORE_ANT=1` to force "Java" flow for building Java bindings - Java tests still require Apache ANT - JAR doesn't include `.java` source code files. See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-06-16 19:58:20 +03:00
Alexander Smorkalov	3c0b71bcec	Merge pull request #23795 from dkurt:tf_half_pixel_for_nn Consider half pixel mode in ONNX resize	2023-06-16 10:21:20 +03:00
Dmitry Kurtaev	924c01dbec	Replace CV_Assert_N	2023-06-15 17:30:33 +03:00
Wang Kai	fc2d933224	removing unreachable code and fixing a typo	2023-06-15 01:09:02 +08:00
Dmitry Kurtaev	6909fffde1	Consider half pixel mode in ONNX resize	2023-06-14 14:21:28 +03:00
Dmitry Kurtaev	f9d7f47e28	Change Scalar assignment in Python from single value	2023-06-13 10:45:03 +03:00
Wang Kai	4622f1e89b	fixing typo of a variable name in dnn::runFastConv	2023-06-11 01:54:03 +08:00
Zihao Mu	eec8a20c33	Merge pull request #23763 from zihaomu:add_runtime_check DNN: fix bug for X86 Winograd #23763 Address https://github.com/opencv/opencv/issues/23760 The patch aims to add a runtime check for X86 platform without AVX(2). ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-06-09 09:18:12 +03:00
Alexander Smorkalov	6d2cbc4055	Merge pull request #23761 from LaurentBerger:typeblobfromimages checktype in blobFromImages and blobFromImagesWithParams	2023-06-08 09:59:01 +03:00
unknown	5f8e43da85	checktype in blobFromImages and blobFromImagesWithParams	2023-06-07 16:15:58 +02:00
Abduragim Shtanchaev	6b53fe8f7b	Merge pull request #23746 from Abdurrahheem:ash/graph_simplifier Assertion Fix in Split Layer #23746 ### Pull Request Readiness Checklist This PR fixes issue mentioned in [#23663](https://github.com/opencv/opencv/issues/23663) Merge with https://github.com/opencv/opencv_extra/pull/1067 See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-06-07 16:01:42 +03:00
Olivier Hotel	0442c6fa81	Addition of normalize_axis to ONNXImporter::parseSqueeze to support negative values for the axes attribute. Negative values are part of ONNX opset >= 11. Signed-off-by: Olivier Hotel <olivier.hotel@orange.com>	2023-05-30 10:21:27 +02:00
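The negative-axis handling mentioned in the commit above can be illustrated with a minimal sketch. The helper name `normalizeAxisSketch` is hypothetical and is not OpenCV's `normalize_axis` implementation; it only assumes the usual convention that an axis in `[-rank, rank)` counts from the end when negative.

```cpp
#include <stdexcept>

// Hypothetical helper for illustration only (not OpenCV's code):
// maps an axis in [-rank, rank) to its non-negative equivalent in [0, rank).
int normalizeAxisSketch(int axis, int rank)
{
    if (rank <= 0 || axis < -rank || axis >= rank)
        throw std::out_of_range("axis is outside [-rank, rank)");
    // A negative axis counts from the last dimension: -1 is the last one.
    return axis < 0 ? axis + rank : axis;
}
```

Under this convention, squeezing axis `-1` of a 4-D tensor addresses the same dimension as axis `3`.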
Abduragim Shtanchaev	ecd2e8ff47	added index that check all inputs of nodes that match	2023-05-29 14:48:42 +03:00
Alexander Smorkalov	cf0ba039c3	Merge pull request #23625 from zihaomu:improve_conv DNN: Remove unnecessary flags for convolution	2023-05-26 12:59:36 +03:00
Alexander Smorkalov	26a7b332cb	Merge pull request #23671 from zihaomu:fix_potential_bug DNN: fix potential bug, stride should not be set as 0.	2023-05-25 13:36:37 +03:00
Yuantao Feng	f07b01cc34	Merge pull request #23655 from fengyuentau:qlinearsoftmax Support ONNX operator QLinearSoftmax in dnn #23655 Resolves https://github.com/opencv/opencv/issues/23636. Merge with https://github.com/opencv/opencv_extra/pull/1064. This PR maps the QLinearSoftmax (from com.microsoft domain) to SoftmaxInt8 in dnn along with some speed optimization. Todo: - [x] support QLinearSoftmax with opset = 13 - [x] add model and test data for QLinearSoftmax with opset = 13 - [x] ensure all models have dims >= 3. - [x] add the script to generate model and test data ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-05-25 13:35:58 +03:00
zihaomu	4384e77bd1	when stride == 0, it should be treated as a bug	2023-05-24 21:57:59 +08:00
Alexander Smorkalov	4a559bc2ab	Merge pull request #23656 from peters:patch-2 Build fix for AVX 256	2023-05-23 09:20:34 +03:00
Alexander Smorkalov	b122a4b436	Merge pull request #23646 from dkurt:dnn_ie_region_fix Fix Region layer with OpenVINO in case of different width/height	2023-05-22 16:22:50 +03:00
Peter Rekdal Khan-Sunde	04970490ec	Build fix /build/build_cuda/3p/opencv/linux-x64/ubuntu22.04/Debug/modules/dnn/src/layers/cpu_kernels/convolution.cpp: In function 'void cv::dnn::packData8(char&, float&, int&, int&, int&, const int, int, int, int)': /build/build_cuda/3p/opencv/linux-x64/ubuntu22.04/Debug/modules/dnn/src/layers/cpu_kernels/convolution.cpp:448:43: error: 'CONV_NR' was not declared in this scope; did you mean 'CONV_3D'? 448 \| vx_store(inpbufC_FP32 + kCONV_NR, vx_load(inptrInC + k1)); \| ^~~~~~~ \| CONV_3D	2023-05-22 11:25:04 +02:00
Alexander Smorkalov	f2311d1bfd	Merge pull request #23645 from Abdurrahheem:ash/tf_init_input_check Add assert to check if layer input size is not empty	2023-05-19 13:28:24 +03:00
Zihao Mu	5025f29378	Speed up Vulkan dnn, and support iOS and the Apple M1 chip. (#23349)	2023-05-18 20:02:27 +03:00
Dmitry Kurtaev	af14780526	Fix Region layer with OpenVINO in case of different width/height	2023-05-18 17:45:30 +03:00
Abduragim Shtanchaev	2b9d2c726a	add assert to check if layer input size is not empty	2023-05-18 16:17:57 +03:00
Abduragim Shtanchaev	d2143bcd44	Merge pull request #23614 from Abdurrahheem:lstm_layout_attribute LSTM ONNX Layout Attribute Support #23614 ### Explanation This PR contains the changes necessary to support the `layout` attribute. This attribute is present in the [ONNX](https://github.com/onnx/onnx/blob/main/docs/Operators.md#lstm) and [Torch](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html#lstm) libraries (in Torch it is named `batch_first=True`). When `layout = 1`, the input to the LSTM layer is expected to have the batch dimension first, `[batch_size, sequence_length, features]`; with `layout = 0` (the default) it is `[sequence_length, batch_size, features]`. ### Test Data Test data and the data generator for this PR are located in [#1063](https://github.com/opencv/opencv_extra/pull/1063) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-05-17 22:46:56 +03:00
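The two LSTM input layouts described in the commit above differ only in the order of the first two dimensions. The sketch below shows that relationship on a flat row-major buffer. The function name `batchFirstToSeqFirst` is hypothetical, and the code is not taken from the PR; it assumes dense float data with no padding.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Illustrative only: reorders a flat row-major tensor from layout = 1
// ([batch_size, sequence_length, features]) to layout = 0
// ([sequence_length, batch_size, features]).
std::vector<float> batchFirstToSeqFirst(const std::vector<float>& src,
                                        std::size_t batch,
                                        std::size_t seq,
                                        std::size_t feat)
{
    if (src.size() != batch * seq * feat)
        throw std::invalid_argument("buffer size does not match the given shape");

    std::vector<float> dst(src.size());
    for (std::size_t b = 0; b < batch; ++b)
        for (std::size_t s = 0; s < seq; ++s)
            for (std::size_t f = 0; f < feat; ++f)
                // Swap the roles of the batch and sequence indices.
                dst[(s * batch + b) * feat + f] = src[(b * seq + s) * feat + f];
    return dst;
}
```

Applying the same index swap in the other direction converts `layout = 0` data back to `layout = 1`.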
Yuantao Feng	eefee8574a	dnn: refactor reduce (#23613 ) * initial impl * remove reduce in8; fix reduce importer * fix bugs and add log sum exp * remove unnecessary header and fix indentation	2023-05-17 10:03:45 +03:00
Zihao Mu	5229312ad2	Merge pull request #22275 from zihaomu:fp16_support_conv DNN: FP16 support on Convolution 2D #22275 ## FP16 support on ARM platform This PR proposes to support an FP16 backend in Convolution. For now, FP16 is only supported on ARM aarch64. In addition to adding FP16, I also added the `seperateIm2col` optimization in this patch. ## How to use FP16 to speed up convolution? ``` Net net = readNet(modelPath); net.setPreferableTarget(DNN_TARGET_CPU_FP16); net.setInput(blob); Mat output = net.forward(); ``` ### TODO List \| Task \| Status \| Remarks \| \|:-------:\|:--------:\|:------------:\| \| Convolution 2D FP16 \| ✔️ \| Done \| \| Winograd FP16 \| \| Because the current modification has reached 2k lines, Winograd FP16 will be completed in the next PR. \| \| Accuracy Test \| ✔️ \| Done \| \| Performance Test \| ✔️ \| Done \| \| Compiler bug \| ✔️ \| Done \| ### Speed Test for FP16. Test on M1 chip, 4 threads. \| Model Name \| FP32 (Conv+Wino) \| Conv(FP16) + Wino(FP32) \| \|:-------:\|:--------:\|:------------:\| \| ResNet 50 \| 26.0 ms \| 18.05 ms (25% speed up) \| \| MobileNet V2 \| 4.17 ms \| 3.09 ms (29% speed up) \| ### Speed Test for `seperateIm2col` trick on X86. Test on AMD 5600x, 12 threads. 
\| Model Name \| 4.x \| Patch \| \|:-------:\|:--------:\|:------------:\| \| MobileNet V2 \| 5.6 ms \| 3.0 ms (46% speed up) \| ### Performance Test #### Performance Test of X86 platform: AMD 5600X, with `-perf_threas=1` \|Name of Test\|4.x\|patch\|patch vs 4.x (x-factor)\| \|---\|:-:\|:-:\|:-:\| \|Name of Test\|4.x 0\|fp16pr final\|fp16pr final vs 4.x 0 (x-factor)\| \|---\|:-:\|:-:\|:-:\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 19}, OCN=2, G=2, S=2, P=(1, 1), BIAS, OCV/CPU)\|0.001\|0.001\|1.00\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 25}, OCN=2, G=2, P=(2, 2), PM=SAME, OCV/CPU)\|0.001\|0.001\|1.03\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 6, 10}, OCN=6, PM=VALID, BIAS, OCV/CPU)\|0.001\|0.001\|0.92\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 4, 9, 10, 10}, OCN=4, S=[1 x 1 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)\|0.002\|0.003\|0.95\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 8, 1, 10, 10}, OCN=8, G=8, P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)\|0.006\|0.006\|1.00\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 3 x 3], IN={1, 2, 19, 19, 19}, OCN=2, G=2, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)\|0.045\|0.033\|1.39\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 4 x 2], IN={1, 4, 8, 10, 10}, OCN=4, G=4, S=[1 x 2 x 1], BIAS, OCV/CPU)\|0.011\|0.009\|1.17\| \|conv3d::Conv3D::(GFLOPS=0.001, K=[3 x 3 x 3], IN={1, 2, 25, 19, 19}, OCN=2, G=2, S=[1 x 2 x 2], P=(2, 2) x (2, 2) x (2, 2), PM=SAME, OCV/CPU)\|0.109\|0.078\|1.39\| \|conv3d::Conv3D::(GFLOPS=0.002, K=[3 x 1 x 4], IN={1, 14, 5, 10, 10}, OCN=14, PM=SAME, OCV/CPU)\|0.040\|0.042\|0.94\| \|conv3d::Conv3D::(GFLOPS=0.006, K=[5 x 5 x 5], IN={1, 4, 50, 19, 19}, OCN=4, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)\|0.326\|0.342\|0.95\| \|conv3d::Conv3D::(GFLOPS=0.027, K=[3 x 3 x 3], IN={1, 6, 10, 38, 50}, OCN=6, PM=VALID, BIAS, OCV/CPU)\|0.580\|0.589\|0.99\| \|conv3d::Conv3D::(GFLOPS=0.030, K=[5 x 5 x 5], IN={1, 6, 19, 19, 19}, OCN=6, G=2, 
OCV/CPU)\|1.293\|1.382\|0.94\| \|conv3d::Conv3D::(GFLOPS=0.045, K=[7 x 7 x 7], IN={1, 2, 38, 38, 38}, OCN=2, S=[1 x 2 x 1], OCV/CPU)\|3.590\|3.710\|0.97\| \|conv3d::Conv3D::(GFLOPS=0.053, K=[3 x 3 x 3], IN={1, 10, 98, 10, 10}, OCN=10, PM=SAME, OCV/CPU)\|1.120\|1.191\|0.94\| \|conv3d::Conv3D::(GFLOPS=0.071, K=[7 x 7 x 7], IN={1, 6, 15, 19, 19}, OCN=6, S=[2 x 1 x 1], P=(3, 3) x (3, 3) x (3, 3), PM=SAME, BIAS, OCV/CPU)\|2.576\|2.872\|0.90\| \|conv3d::Conv3D::(GFLOPS=0.093, K=[5 x 5 x 5], IN={1, 4, 40, 75, 75}, OCN=4, S=[2 x 2 x 2], OCV/CPU)\|4.599\|4.670\|0.98\| \|conv3d::Conv3D::(GFLOPS=0.116, K=[5 x 5 x 5], IN={1, 2, 21, 75, 100}, OCN=2, BIAS, OCV/CPU)\|9.230\|9.582\|0.96\| \|conv3d::Conv3D::(GFLOPS=1.267, K=[5 x 5 x 5], IN={1, 3, 75, 75, 100}, OCN=3, PM=SAME, BIAS, OCV/CPU)\|65.946\|69.381\|0.95\| \|conv3d::Conv3D::(GFLOPS=1.343, K=[3 x 3 x 3], IN={1, 11, 9, 150, 200}, OCN=11, PM=VALID, BIAS, OCV/CPU)\|18.915\|19.289\|0.98\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 512, 26, 26}, OCN=256, OCV/CPU)\|1.404\|1.457\|0.96\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 1024, 13, 13}, OCN=512, OCV/CPU)\|2.060\|1.501\|1.37\| \|conv::Conv::(GFLOPS=0.178, K=[1 x 1], IN={1, 256, 52, 52}, OCN=128, OCV/CPU)\|1.409\|1.464\|0.96\| \|conv::Conv::(GFLOPS=0.210, K=[1 x 1], IN={1, 576, 38, 50}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|1.793\|1.838\|0.98\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 128, 56, 56}, OCN=32, P=[1 x 1], OCV/CPU)\|1.207\|1.199\|1.01\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 256, 14, 14}, OCN=256, P=[1 x 1], OCV/CPU)\|1.277\|1.275\|1.00\| \|conv::Conv::(GFLOPS=0.280, K=[1 x 1], IN={1, 576, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|2.319\|2.370\|0.98\| \|conv::Conv::(GFLOPS=0.302, K=[3 x 3], IN={1, 64, 64, 64}, OCN=64, PM=SAME, OCV/CPU)\|1.351\|1.346\|1.00\| \|conv::Conv::(GFLOPS=0.357, K=[1 x 1], IN={1, 64, 208, 208}, OCN=64, OCV/CPU)\|3.520\|3.612\|0.97\| \|conv::Conv::(GFLOPS=0.420, K=[3 x 3], IN={1, 96, 38, 50}, OCN=128, PM=SAME, BIAS, 
OCV/CPU)\|1.876\|1.880\|1.00\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 128, 40, 40}, OCN=128, PM=SAME, OCV/CPU)\|1.981\|1.995\|0.99\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 256, 20, 20}, OCN=256, PM=SAME, OCV/CPU)\|2.620\|2.627\|1.00\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 512, 10, 10}, OCN=512, PM=SAME, OCV/CPU)\|4.202\|4.123\|1.02\| \|conv::Conv::(GFLOPS=0.561, K=[3 x 3], IN={1, 128, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|2.429\|2.445\|0.99\| \|conv::Conv::(GFLOPS=0.624, K=[3 x 3], IN={1, 128, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|2.591\|2.576\|1.01\| \|conv::Conv::(GFLOPS=0.701, K=[3 x 3], IN={1, 128, 38, 50}, OCN=160, PM=SAME, BIAS, OCV/CPU)\|3.005\|2.998\|1.00\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 64, 104, 104}, OCN=64, P=[1 x 1], OCV/CPU)\|3.515\|3.532\|1.00\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 128, 52, 52}, OCN=128, P=[1 x 1], OCV/CPU)\|3.115\|3.134\|0.99\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 256, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)\|3.937\|3.899\|1.01\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 512, 13, 13}, OCN=512, P=[1 x 1], OCV/CPU)\|5.533\|5.471\|1.01\| \|conv::Conv::(GFLOPS=0.830, K=[3 x 3], IN={1, 64, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|3.472\|3.464\|1.00\| \|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 192, 38, 38}, OCN=192, PM=SAME, OCV/CPU)\|4.302\|4.322\|1.00\| \|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 384, 19, 19}, OCN=384, PM=SAME, OCV/CPU)\|6.100\|6.035\|1.01\| \|conv::Conv::(GFLOPS=1.022, K=[3 x 3], IN={1, 576, 19, 19}, OCN=273, PM=SAME, BIAS, OCV/CPU)\|6.580\|6.484\|1.01\| \|conv::Conv::(GFLOPS=1.112, K=[3 x 3], IN={1, 512, 10, 10}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)\|9.741\|9.634\|1.01\| \|conv::Conv::(GFLOPS=1.181, K=[3 x 3], IN={1, 64, 160, 200}, OCN=128, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)\|10.131\|10.156\|1.00\| \|conv::Conv::(GFLOPS=1.182, K=[3 x 3], IN={1, 32, 320, 400}, OCN=64, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)\|12.391\|12.350\|1.00\| 
\|conv::Conv::(GFLOPS=1.195, K=[9 x 9], IN={1, 32, 240, 320}, OCN=3, P=[4 x 4], BIAS, OCV/CPU)\|91.074\|87.893\|1.04\| \|conv::Conv::(GFLOPS=1.196, K=[3 x 3], IN={1, 384, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)\|5.903\|5.903\|1.00\| \|conv::Conv::(GFLOPS=1.210, K=[3 x 3], IN={1, 32, 256, 256}, OCN=32, PM=SAME, OCV/CPU)\|6.890\|6.794\|1.01\| \|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 64, 75, 75}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|5.160\|5.131\|1.01\| \|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 96, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|4.970\|5.036\|0.99\| \|conv::Conv::(GFLOPS=1.248, K=[3 x 3], IN={1, 256, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|5.045\|5.015\|1.01\| \|conv::Conv::(GFLOPS=1.258, K=[3 x 3], IN={1, 1280, 10, 10}, OCN=546, PM=SAME, BIAS, OCV/CPU)\|11.583\|11.343\|1.02\| \|conv::Conv::(GFLOPS=1.261, K=[3 x 3], IN={1, 192, 38, 50}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|5.348\|5.320\|1.01\| \|conv::Conv::(GFLOPS=1.416, K=[3 x 3], IN={1, 128, 62, 82}, OCN=128, BIAS, OCV/CPU)\|5.357\|5.396\|0.99\| \|conv::Conv::(GFLOPS=1.500, K=[3 x 3], IN={1, 128, 64, 84}, OCN=128, BIAS, OCV/CPU)\|6.050\|6.006\|1.01\| \|conv::Conv::(GFLOPS=1.586, K=[3 x 3], IN={1, 128, 66, 86}, OCN=128, BIAS, OCV/CPU)\|5.952\|5.953\|1.00\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 26, 26}, OCN=512, P=[1 x 1], OCV/CPU)\|8.014\|8.014\|1.00\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 52, 52}, OCN=512, S=[2 x 2], P=[1 x 1], OCV/CPU)\|12.472\|12.577\|0.99\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 13, 13}, OCN=1024, P=[1 x 1], OCV/CPU)\|10.803\|10.655\|1.01\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 26, 26}, OCN=1024, S=[2 x 2], P=[1 x 1], OCV/CPU)\|18.429\|13.405\|1.37\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 104, 104}, OCN=128, P=[1 x 1], OCV/CPU)\|6.659\|6.647\|1.00\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 208, 208}, OCN=128, S=[2 x 2], P=[1 x 1], OCV/CPU)\|14.192\|13.819\|1.03\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 
3], IN={1, 128, 52, 52}, OCN=256, P=[1 x 1], OCV/CPU)\|6.045\|6.068\|1.00\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 104, 104}, OCN=256, S=[2 x 2], P=[1 x 1], OCV/CPU)\|12.742\|12.828\|0.99\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 208, 208}, OCN=64, P=[1 x 1], OCV/CPU)\|8.046\|7.773\|1.04\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 416, 416}, OCN=64, S=[2 x 2], P=[1 x 1], OCV/CPU)\|17.440\|17.192\|1.01\| \|conv::Conv::(GFLOPS=1.659, K=[3 x 3], IN={1, 960, 10, 10}, OCN=960, PM=SAME, OCV/CPU)\|15.418\|14.972\|1.03\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, G=128, P=[1 x 1], BIAS, OCV/CPU)\|0.430\|0.430\|1.00\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, PM=SAME, OCV/CPU)\|6.692\|6.663\|1.00\| \|conv::Conv::(GFLOPS=1.675, K=[3 x 3], IN={1, 128, 68, 88}, OCN=128, BIAS, OCV/CPU)\|6.350\|6.347\|1.00\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, G=256, P=[1 x 1], BIAS, OCV/CPU)\|0.267\|0.265\|1.01\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, PM=SAME, OCV/CPU)\|7.755\|7.558\|1.03\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, G=512, P=[1 x 1], BIAS, OCV/CPU)\|0.203\|0.202\|1.00\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|10.663\|10.576\|1.01\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, PM=SAME, OCV/CPU)\|10.827\|10.614\|1.02\| \|conv::Conv::(GFLOPS=1.766, K=[3 x 3], IN={1, 128, 70, 90}, OCN=128, BIAS, OCV/CPU)\|7.049\|6.947\|1.01\| \|conv::Conv::(GFLOPS=1.859, K=[3 x 3], IN={1, 128, 72, 92}, OCN=128, BIAS, OCV/CPU)\|6.900\|6.901\|1.00\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, G=1024, P=[1 x 1], BIAS, OCV/CPU)\|0.165\|0.165\|1.00\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, PM=SAME, OCV/CPU)\|17.953\|17.251\|1.04\| \|conv::Conv::(GFLOPS=1.954, K=[3 x 3], IN={1, 128, 74, 
94}, OCN=128, BIAS, OCV/CPU)\|7.430\|7.320\|1.01\| \|conv::Conv::(GFLOPS=1.995, K=[9 x 9], IN={1, 3, 320, 400}, OCN=32, P=[4 x 4], BIAS, OCV/CPU)\|22.187\|21.705\|1.02\| \|conv::Conv::(GFLOPS=2.052, K=[3 x 3], IN={1, 128, 76, 96}, OCN=128, BIAS, OCV/CPU)\|8.349\|8.126\|1.03\| \|conv::Conv::(GFLOPS=2.100, K=[3 x 3], IN={1, 144, 75, 75}, OCN=144, PM=SAME, OCV/CPU)\|8.273\|8.297\|1.00\| \|conv::Conv::(GFLOPS=2.153, K=[3 x 3], IN={1, 128, 78, 98}, OCN=128, BIAS, OCV/CPU)\|8.169\|8.094\|1.01\| \|conv::Conv::(GFLOPS=2.156, K=[3 x 3], IN={1, 576, 19, 19}, OCN=576, PM=SAME, OCV/CPU)\|13.602\|13.359\|1.02\| \|conv::Conv::(GFLOPS=2.255, K=[3 x 3], IN={1, 128, 80, 100}, OCN=128, BIAS, OCV/CPU)\|8.633\|8.584\|1.01\| \|conv::Conv::(GFLOPS=2.719, K=[3 x 3], IN={1, 96, 256, 256}, OCN=96, S=[2 x 2], PM=SAME, OCV/CPU)\|29.339\|28.897\|1.02\| \|conv::Conv::(GFLOPS=3.319, K=[3 x 3], IN={1, 128, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|13.000\|12.920\|1.01\| \|conv::Conv::(GFLOPS=3.321, K=[3 x 3], IN={1, 64, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|14.262\|13.319\|1.07\| \|conv::Conv::(GFLOPS=3.398, K=[7 x 7], IN={1, 128, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)\|27.453\|27.253\|1.01\| \|conv::Conv::(GFLOPS=3.407, K=[3 x 3], IN={1, 512, 19, 19}, OCN=1024, D=[6 x 6], P=[6 x 6], BIAS, OCV/CPU)\|32.052\|27.269\|1.18\| \|conv::Conv::(GFLOPS=3.408, K=[3 x 3], IN={1, 256, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|15.363\|15.208\|1.01\| \|conv::Conv::(GFLOPS=4.247, K=[3 x 3], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)\|18.543\|18.434\|1.01\| \|conv::Conv::(GFLOPS=4.247, K=[5 x 5], IN={1, 144, 128, 128}, OCN=144, S=[2 x 2], PM=SAME, OCV/CPU)\|39.114\|37.954\|1.03\| \|conv::Conv::(GFLOPS=4.566, K=[7 x 7], IN={1, 172, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)\|36.271\|36.972\|0.98\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 256, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|19.262\|19.427\|0.99\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 512, 46, 46}, OCN=256, 
P=[1 x 1], BIAS, OCV/CPU)\|19.298\|19.349\|1.00\| \|conv::Conv::(GFLOPS=4.994, K=[3 x 3], IN={1, 128, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|20.261\|19.847\|1.02\| \|conv::Conv::(GFLOPS=4.997, K=[3 x 3], IN={1, 64, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|21.867\|21.525\|1.02\| \|conv::Conv::(GFLOPS=5.780, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, S=[2 x 2], PM=SAME, OCV/CPU)\|51.756\|49.979\|1.04\| \|conv::Conv::(GFLOPS=6.116, K=[3 x 3], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)\|28.133\|27.060\|1.04\| \|conv::Conv::(GFLOPS=6.118, K=[3 x 3], IN={1, 144, 128, 128}, OCN=144, PM=SAME, OCV/CPU)\|25.035\|24.980\|1.00\| \|conv::Conv::(GFLOPS=6.637, K=[3 x 3], IN={1, 256, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|25.858\|25.821\|1.00\| \|conv::Conv::(GFLOPS=6.638, K=[3 x 3], IN={1, 128, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|27.313\|27.149\|1.01\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 150, 200}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|28.219\|28.111\|1.00\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 300, 300}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)\|46.025\|46.674\|0.99\| \|conv::Conv::(GFLOPS=6.814, K=[3 x 3], IN={1, 512, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|30.220\|29.446\|1.03\| \|conv::Conv::(GFLOPS=8.025, K=[3 x 3], IN={1, 1024, 19, 19}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)\|49.410\|48.708\|1.01\| \|conv::Conv::(GFLOPS=9.986, K=[3 x 3], IN={1, 512, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|38.203\|38.001\|1.01\| \|conv::Conv::(GFLOPS=9.987, K=[3 x 3], IN={1, 256, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|39.961\|39.021\|1.02\| \|conv::Conv::(GFLOPS=9.989, K=[3 x 3], IN={1, 128, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|48.685\|47.075\|1.03\| \|conv::Conv::(GFLOPS=9.993, K=[3 x 3], IN={1, 64, 368, 368}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)\|75.114\|72.586\|1.03\| \|conv::Conv::(GFLOPS=10.087, K=[3 x 3], IN={1, 576, 38, 50}, OCN=512, PM=SAME, BIAS, OCV/CPU)\|41.222\|41.144\|1.00\| \|conv::Conv::(GFLOPS=10.701, 
K=[3 x 3], IN={1, 512, 38, 38}, OCN=804, P=[1 x 1], BIAS, OCV/CPU)\|46.220\|46.353\|1.00\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 240, 64, 64}, OCN=240, PM=SAME, OCV/CPU)\|98.201\|98.771\|0.99\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)\|100.106\|96.971\|1.03\| \|conv::Conv::(GFLOPS=16.987, K=[5 x 5], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)\|146.977\|140.445\|1.05\| \|conv::Conv::(GFLOPS=23.122, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, PM=SAME, OCV/CPU)\|198.618\|194.665\|1.02\| #### Performance Test of ARM platform: apple M1, with `-perf_threas=1` Min (ms) \|Name of Test\|4.x\|patch\|4.x vs patch (x-factor)\| \|---\|:-:\|:-:\|:-:\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 19}, OCN=2, G=2, S=2, P=(1, 1), BIAS, OCV/CPU)\|0.001\|0.001\|1.07\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 25}, OCN=2, G=2, P=(2, 2), PM=SAME, OCV/CPU)\|0.001\|0.001\|1.10\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 6, 10}, OCN=6, PM=VALID, BIAS, OCV/CPU)\|0.002\|0.002\|0.97\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 4, 9, 10, 10}, OCN=4, S=[1 x 1 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)\|0.003\|0.003\|0.84\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 8, 1, 10, 10}, OCN=8, G=8, P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)\|0.009\|0.009\|1.00\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 3 x 3], IN={1, 2, 19, 19, 19}, OCN=2, G=2, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)\|0.027\|0.030\|0.90\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 4 x 2], IN={1, 4, 8, 10, 10}, OCN=4, G=4, S=[1 x 2 x 1], BIAS, OCV/CPU)\|0.008\|0.007\|1.07\| \|conv3d::Conv3D::(GFLOPS=0.001, K=[3 x 3 x 3], IN={1, 2, 25, 19, 19}, OCN=2, G=2, S=[1 x 2 x 2], P=(2, 2) x (2, 2) x (2, 2), PM=SAME, OCV/CPU)\|0.066\|0.072\|0.91\| \|conv3d::Conv3D::(GFLOPS=0.002, K=[3 x 1 x 4], IN={1, 14, 5, 10, 10}, OCN=14, PM=SAME, OCV/CPU)\|0.090\|0.054\|1.68\| \|conv3d::Conv3D::(GFLOPS=0.006, K=[5 x 5 x 5], IN={1, 4, 
50, 19, 19}, OCN=4, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)\|0.328\|0.409\|0.80\| \|conv3d::Conv3D::(GFLOPS=0.027, K=[3 x 3 x 3], IN={1, 6, 10, 38, 50}, OCN=6, PM=VALID, BIAS, OCV/CPU)\|0.659\|0.697\|0.95\| \|conv3d::Conv3D::(GFLOPS=0.030, K=[5 x 5 x 5], IN={1, 6, 19, 19, 19}, OCN=6, G=2, OCV/CPU)\|1.266\|1.403\|0.90\| \|conv3d::Conv3D::(GFLOPS=0.045, K=[7 x 7 x 7], IN={1, 2, 38, 38, 38}, OCN=2, S=[1 x 2 x 1], OCV/CPU)\|3.550\|4.145\|0.86\| \|conv3d::Conv3D::(GFLOPS=0.053, K=[3 x 3 x 3], IN={1, 10, 98, 10, 10}, OCN=10, PM=SAME, OCV/CPU)\|1.188\|1.375\|0.86\| \|conv3d::Conv3D::(GFLOPS=0.071, K=[7 x 7 x 7], IN={1, 6, 15, 19, 19}, OCN=6, S=[2 x 1 x 1], P=(3, 3) x (3, 3) x (3, 3), PM=SAME, BIAS, OCV/CPU)\|2.683\|3.236\|0.83\| \|conv3d::Conv3D::(GFLOPS=0.093, K=[5 x 5 x 5], IN={1, 4, 40, 75, 75}, OCN=4, S=[2 x 2 x 2], OCV/CPU)\|4.491\|5.501\|0.82\| \|conv3d::Conv3D::(GFLOPS=0.116, K=[5 x 5 x 5], IN={1, 2, 21, 75, 100}, OCN=2, BIAS, OCV/CPU)\|8.916\|10.181\|0.88\| \|conv3d::Conv3D::(GFLOPS=1.267, K=[5 x 5 x 5], IN={1, 3, 75, 75, 100}, OCN=3, PM=SAME, BIAS, OCV/CPU)\|69.995\|72.296\|0.97\| \|conv3d::Conv3D::(GFLOPS=1.343, K=[3 x 3 x 3], IN={1, 11, 9, 150, 200}, OCN=11, PM=VALID, BIAS, OCV/CPU)\|22.531\|23.139\|0.97\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 512, 26, 26}, OCN=256, OCV/CPU)\|2.239\|1.933\|1.16\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 512, 26, 26}, OCN=256, OCV/CPU_FP16)\|-\|1.010\|-\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 1024, 13, 13}, OCN=512, OCV/CPU)\|3.134\|2.068\|1.52\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 1024, 13, 13}, OCN=512, OCV/CPU_FP16)\|-\|1.062\|-\| \|conv::Conv::(GFLOPS=0.178, K=[1 x 1], IN={1, 256, 52, 52}, OCN=128, OCV/CPU)\|1.918\|1.920\|1.00\| \|conv::Conv::(GFLOPS=0.178, K=[1 x 1], IN={1, 256, 52, 52}, OCN=128, OCV/CPU_FP16)\|-\|1.014\|-\| \|conv::Conv::(GFLOPS=0.210, K=[1 x 1], IN={1, 576, 38, 50}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|2.340\|2.352\|0.99\| \|conv::Conv::(GFLOPS=0.210, 
K=[1 x 1], IN={1, 576, 38, 50}, OCN=96, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|1.247\|-\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 128, 56, 56}, OCN=32, P=[1 x 1], OCV/CPU)\|1.116\|1.111\|1.00\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 128, 56, 56}, OCN=32, P=[1 x 1], OCV/CPU_FP16)\|-\|1.114\|-\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 256, 14, 14}, OCN=256, P=[1 x 1], OCV/CPU)\|1.116\|1.112\|1.00\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 256, 14, 14}, OCN=256, P=[1 x 1], OCV/CPU_FP16)\|-\|1.113\|-\| \|conv::Conv::(GFLOPS=0.280, K=[1 x 1], IN={1, 576, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|3.067\|3.085\|0.99\| \|conv::Conv::(GFLOPS=0.280, K=[1 x 1], IN={1, 576, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|1.622\|-\| \|conv::Conv::(GFLOPS=0.302, K=[3 x 3], IN={1, 64, 64, 64}, OCN=64, PM=SAME, OCV/CPU)\|1.153\|1.187\|0.97\| \|conv::Conv::(GFLOPS=0.302, K=[3 x 3], IN={1, 64, 64, 64}, OCN=64, PM=SAME, OCV/CPU_FP16)\|-\|1.150\|-\| \|conv::Conv::(GFLOPS=0.357, K=[1 x 1], IN={1, 64, 208, 208}, OCN=64, OCV/CPU)\|4.804\|4.849\|0.99\| \|conv::Conv::(GFLOPS=0.357, K=[1 x 1], IN={1, 64, 208, 208}, OCN=64, OCV/CPU_FP16)\|-\|2.922\|-\| \|conv::Conv::(GFLOPS=0.420, K=[3 x 3], IN={1, 96, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|1.463\|1.469\|1.00\| \|conv::Conv::(GFLOPS=0.420, K=[3 x 3], IN={1, 96, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|1.459\|-\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 128, 40, 40}, OCN=128, PM=SAME, OCV/CPU)\|1.577\|1.580\|1.00\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 128, 40, 40}, OCN=128, PM=SAME, OCV/CPU_FP16)\|-\|1.580\|-\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 256, 20, 20}, OCN=256, PM=SAME, OCV/CPU)\|1.826\|1.818\|1.00\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 256, 20, 20}, OCN=256, PM=SAME, OCV/CPU_FP16)\|-\|1.817\|-\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 512, 10, 10}, OCN=512, PM=SAME, OCV/CPU)\|6.541\|5.081\|1.29\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 512, 10, 
10}, OCN=512, PM=SAME, OCV/CPU_FP16)\|-\|2.809\|-\| \|conv::Conv::(GFLOPS=0.561, K=[3 x 3], IN={1, 128, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|1.912\|1.919\|1.00\| \|conv::Conv::(GFLOPS=0.561, K=[3 x 3], IN={1, 128, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|1.919\|-\| \|conv::Conv::(GFLOPS=0.624, K=[3 x 3], IN={1, 128, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|1.961\|1.971\|0.99\| \|conv::Conv::(GFLOPS=0.624, K=[3 x 3], IN={1, 128, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|1.961\|-\| \|conv::Conv::(GFLOPS=0.701, K=[3 x 3], IN={1, 128, 38, 50}, OCN=160, PM=SAME, BIAS, OCV/CPU)\|2.317\|2.329\|0.99\| \|conv::Conv::(GFLOPS=0.701, K=[3 x 3], IN={1, 128, 38, 50}, OCN=160, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|2.322\|-\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 64, 104, 104}, OCN=64, P=[1 x 1], OCV/CPU)\|2.920\|2.947\|0.99\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 64, 104, 104}, OCN=64, P=[1 x 1], OCV/CPU_FP16)\|-\|2.924\|-\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 128, 52, 52}, OCN=128, P=[1 x 1], OCV/CPU)\|2.467\|2.466\|1.00\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 128, 52, 52}, OCN=128, P=[1 x 1], OCV/CPU_FP16)\|-\|2.496\|-\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 256, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)\|3.028\|2.997\|1.01\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 256, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU_FP16)\|-\|2.986\|-\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 512, 13, 13}, OCN=512, P=[1 x 1], OCV/CPU)\|4.353\|4.355\|1.00\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 512, 13, 13}, OCN=512, P=[1 x 1], OCV/CPU_FP16)\|-\|4.355\|-\| \|conv::Conv::(GFLOPS=0.830, K=[3 x 3], IN={1, 64, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|2.762\|2.793\|0.99\| \|conv::Conv::(GFLOPS=0.830, K=[3 x 3], IN={1, 64, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|2.797\|-\| \|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 192, 38, 38}, OCN=192, PM=SAME, OCV/CPU)\|3.428\|3.226\|1.06\| 
\|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 192, 38, 38}, OCN=192, PM=SAME, OCV/CPU_FP16)\|-\|3.223\|-\| \|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 384, 19, 19}, OCN=384, PM=SAME, OCV/CPU)\|3.967\|3.957\|1.00\| \|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 384, 19, 19}, OCN=384, PM=SAME, OCV/CPU_FP16)\|-\|3.960\|-\| \|conv::Conv::(GFLOPS=1.022, K=[3 x 3], IN={1, 576, 19, 19}, OCN=273, PM=SAME, BIAS, OCV/CPU)\|4.806\|4.387\|1.10\| \|conv::Conv::(GFLOPS=1.022, K=[3 x 3], IN={1, 576, 19, 19}, OCN=273, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|4.366\|-\| \|conv::Conv::(GFLOPS=1.112, K=[3 x 3], IN={1, 512, 10, 10}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)\|14.509\|11.756\|1.23\| \|conv::Conv::(GFLOPS=1.112, K=[3 x 3], IN={1, 512, 10, 10}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|6.510\|-\| \|conv::Conv::(GFLOPS=1.181, K=[3 x 3], IN={1, 64, 160, 200}, OCN=128, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)\|13.718\|13.287\|1.03\| \|conv::Conv::(GFLOPS=1.181, K=[3 x 3], IN={1, 64, 160, 200}, OCN=128, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|7.190\|-\| \|conv::Conv::(GFLOPS=1.182, K=[3 x 3], IN={1, 32, 320, 400}, OCN=64, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)\|15.133\|14.853\|1.02\| \|conv::Conv::(GFLOPS=1.182, K=[3 x 3], IN={1, 32, 320, 400}, OCN=64, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|8.671\|-\| \|conv::Conv::(GFLOPS=1.195, K=[9 x 9], IN={1, 32, 240, 320}, OCN=3, P=[4 x 4], BIAS, OCV/CPU)\|41.928\|43.328\|0.97\| \|conv::Conv::(GFLOPS=1.195, K=[9 x 9], IN={1, 32, 240, 320}, OCN=3, P=[4 x 4], BIAS, OCV/CPU_FP16)\|-\|38.072\|-\| \|conv::Conv::(GFLOPS=1.196, K=[3 x 3], IN={1, 384, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)\|4.409\|4.428\|1.00\| \|conv::Conv::(GFLOPS=1.196, K=[3 x 3], IN={1, 384, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU_FP16)\|-\|4.427\|-\| \|conv::Conv::(GFLOPS=1.210, K=[3 x 3], IN={1, 32, 256, 256}, OCN=32, PM=SAME, OCV/CPU)\|6.144\|5.363\|1.15\| \|conv::Conv::(GFLOPS=1.210, K=[3 x 3], IN={1, 32, 256, 256}, OCN=32, PM=SAME, OCV/CPU_FP16)\|-\|5.368\|-\| 
\|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 64, 75, 75}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|3.926\|3.932\|1.00\| \|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 64, 75, 75}, OCN=192, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|3.938\|-\| \|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 96, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|3.920\|3.915\|1.00\| \|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 96, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|3.950\|-\| \|conv::Conv::(GFLOPS=1.248, K=[3 x 3], IN={1, 256, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|3.767\|3.764\|1.00\| \|conv::Conv::(GFLOPS=1.248, K=[3 x 3], IN={1, 256, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|3.762\|-\| \|conv::Conv::(GFLOPS=1.258, K=[3 x 3], IN={1, 1280, 10, 10}, OCN=546, PM=SAME, BIAS, OCV/CPU)\|19.959\|13.875\|1.44\| \|conv::Conv::(GFLOPS=1.258, K=[3 x 3], IN={1, 1280, 10, 10}, OCN=546, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|7.781\|-\| \|conv::Conv::(GFLOPS=1.261, K=[3 x 3], IN={1, 192, 38, 50}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|3.951\|3.955\|1.00\| \|conv::Conv::(GFLOPS=1.261, K=[3 x 3], IN={1, 192, 38, 50}, OCN=192, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|3.969\|-\| \|conv::Conv::(GFLOPS=1.416, K=[3 x 3], IN={1, 128, 62, 82}, OCN=128, BIAS, OCV/CPU)\|4.050\|4.034\|1.00\| \|conv::Conv::(GFLOPS=1.416, K=[3 x 3], IN={1, 128, 62, 82}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|4.093\|-\| \|conv::Conv::(GFLOPS=1.500, K=[3 x 3], IN={1, 128, 64, 84}, OCN=128, BIAS, OCV/CPU)\|4.923\|4.506\|1.09\| \|conv::Conv::(GFLOPS=1.500, K=[3 x 3], IN={1, 128, 64, 84}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|4.509\|-\| \|conv::Conv::(GFLOPS=1.586, K=[3 x 3], IN={1, 128, 66, 86}, OCN=128, BIAS, OCV/CPU)\|4.759\|4.476\|1.06\| \|conv::Conv::(GFLOPS=1.586, K=[3 x 3], IN={1, 128, 66, 86}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|4.447\|-\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 26, 26}, OCN=512, P=[1 x 1], OCV/CPU)\|6.079\|5.628\|1.08\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 26, 26}, OCN=512, P=[1 x 1], 
OCV/CPU_FP16)\|-\|5.625\|-\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 52, 52}, OCN=512, S=[2 x 2], P=[1 x 1], OCV/CPU)\|19.843\|17.523\|1.13\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 52, 52}, OCN=512, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)\|-\|8.917\|-\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 13, 13}, OCN=1024, P=[1 x 1], OCV/CPU)\|8.334\|8.247\|1.01\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 13, 13}, OCN=1024, P=[1 x 1], OCV/CPU_FP16)\|-\|8.246\|-\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 26, 26}, OCN=1024, S=[2 x 2], P=[1 x 1], OCV/CPU)\|23.164\|18.199\|1.27\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 26, 26}, OCN=1024, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)\|-\|9.305\|-\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 104, 104}, OCN=128, P=[1 x 1], OCV/CPU)\|5.184\|5.178\|1.00\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 104, 104}, OCN=128, P=[1 x 1], OCV/CPU_FP16)\|-\|5.149\|-\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 208, 208}, OCN=128, S=[2 x 2], P=[1 x 1], OCV/CPU)\|17.990\|18.103\|0.99\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 208, 208}, OCN=128, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)\|-\|9.777\|-\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 52, 52}, OCN=256, P=[1 x 1], OCV/CPU)\|4.831\|4.522\|1.07\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 52, 52}, OCN=256, P=[1 x 1], OCV/CPU_FP16)\|-\|4.523\|-\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 104, 104}, OCN=256, S=[2 x 2], P=[1 x 1], OCV/CPU)\|17.328\|17.319\|1.00\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 104, 104}, OCN=256, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)\|-\|8.948\|-\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 208, 208}, OCN=64, P=[1 x 1], OCV/CPU)\|5.944\|5.961\|1.00\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 208, 208}, OCN=64, P=[1 x 1], OCV/CPU_FP16)\|-\|5.936\|-\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 416, 416}, OCN=64, S=[2 x 2], P=[1 
x 1], OCV/CPU)\|19.811\|20.064\|0.99\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 416, 416}, OCN=64, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)\|-\|11.705\|-\| \|conv::Conv::(GFLOPS=1.659, K=[3 x 3], IN={1, 960, 10, 10}, OCN=960, PM=SAME, OCV/CPU)\|22.398\|17.686\|1.27\| \|conv::Conv::(GFLOPS=1.659, K=[3 x 3], IN={1, 960, 10, 10}, OCN=960, PM=SAME, OCV/CPU_FP16)\|-\|9.859\|-\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, G=128, P=[1 x 1], BIAS, OCV/CPU)\|0.416\|0.416\|1.00\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, G=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|0.417\|-\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, PM=SAME, OCV/CPU)\|5.356\|5.110\|1.05\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, PM=SAME, OCV/CPU_FP16)\|-\|5.114\|-\| \|conv::Conv::(GFLOPS=1.675, K=[3 x 3], IN={1, 128, 68, 88}, OCN=128, BIAS, OCV/CPU)\|5.092\|4.748\|1.07\| \|conv::Conv::(GFLOPS=1.675, K=[3 x 3], IN={1, 128, 68, 88}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|4.754\|-\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, G=256, P=[1 x 1], BIAS, OCV/CPU)\|0.260\|0.229\|1.13\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, G=256, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|0.229\|-\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, PM=SAME, OCV/CPU)\|5.872\|5.460\|1.08\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, PM=SAME, OCV/CPU_FP16)\|-\|5.460\|-\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, G=512, P=[1 x 1], BIAS, OCV/CPU)\|0.161\|0.161\|1.00\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, G=512, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|0.161\|-\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|7.176\|7.175\|1.00\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, P=[1 x 1], BIAS, 
OCV/CPU_FP16)\|-\|7.162\|-\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, PM=SAME, OCV/CPU)\|7.174\|7.185\|1.00\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, PM=SAME, OCV/CPU_FP16)\|-\|7.157\|-\| \|conv::Conv::(GFLOPS=1.766, K=[3 x 3], IN={1, 128, 70, 90}, OCN=128, BIAS, OCV/CPU)\|5.400\|5.180\|1.04\| \|conv::Conv::(GFLOPS=1.766, K=[3 x 3], IN={1, 128, 70, 90}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|5.201\|-\| \|conv::Conv::(GFLOPS=1.859, K=[3 x 3], IN={1, 128, 72, 92}, OCN=128, BIAS, OCV/CPU)\|5.330\|5.188\|1.03\| \|conv::Conv::(GFLOPS=1.859, K=[3 x 3], IN={1, 128, 72, 92}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|5.177\|-\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, G=1024, P=[1 x 1], BIAS, OCV/CPU)\|0.115\|0.115\|1.00\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, G=1024, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|0.115\|-\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, PM=SAME, OCV/CPU)\|26.156\|20.222\|1.29\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, PM=SAME, OCV/CPU_FP16)\|-\|11.203\|-\| \|conv::Conv::(GFLOPS=1.954, K=[3 x 3], IN={1, 128, 74, 94}, OCN=128, BIAS, OCV/CPU)\|5.627\|5.543\|1.02\| \|conv::Conv::(GFLOPS=1.954, K=[3 x 3], IN={1, 128, 74, 94}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|5.506\|-\| \|conv::Conv::(GFLOPS=1.995, K=[9 x 9], IN={1, 3, 320, 400}, OCN=32, P=[4 x 4], BIAS, OCV/CPU)\|27.925\|27.741\|1.01\| \|conv::Conv::(GFLOPS=1.995, K=[9 x 9], IN={1, 3, 320, 400}, OCN=32, P=[4 x 4], BIAS, OCV/CPU_FP16)\|-\|17.217\|-\| \|conv::Conv::(GFLOPS=2.052, K=[3 x 3], IN={1, 128, 76, 96}, OCN=128, BIAS, OCV/CPU)\|6.359\|6.062\|1.05\| \|conv::Conv::(GFLOPS=2.052, K=[3 x 3], IN={1, 128, 76, 96}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|6.048\|-\| \|conv::Conv::(GFLOPS=2.100, K=[3 x 3], IN={1, 144, 75, 75}, OCN=144, PM=SAME, OCV/CPU)\|6.559\|6.322\|1.04\| \|conv::Conv::(GFLOPS=2.100, K=[3 x 3], IN={1, 144, 75, 75}, OCN=144, PM=SAME, 
OCV/CPU_FP16)\|-\|6.280\|-\| \|conv::Conv::(GFLOPS=2.153, K=[3 x 3], IN={1, 128, 78, 98}, OCN=128, BIAS, OCV/CPU)\|6.412\|6.200\|1.03\| \|conv::Conv::(GFLOPS=2.153, K=[3 x 3], IN={1, 128, 78, 98}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|6.197\|-\| \|conv::Conv::(GFLOPS=2.156, K=[3 x 3], IN={1, 576, 19, 19}, OCN=576, PM=SAME, OCV/CPU)\|9.167\|8.624\|1.06\| \|conv::Conv::(GFLOPS=2.156, K=[3 x 3], IN={1, 576, 19, 19}, OCN=576, PM=SAME, OCV/CPU_FP16)\|-\|8.626\|-\| \|conv::Conv::(GFLOPS=2.255, K=[3 x 3], IN={1, 128, 80, 100}, OCN=128, BIAS, OCV/CPU)\|6.755\|6.491\|1.04\| \|conv::Conv::(GFLOPS=2.255, K=[3 x 3], IN={1, 128, 80, 100}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|6.520\|-\| \|conv::Conv::(GFLOPS=2.719, K=[3 x 3], IN={1, 96, 256, 256}, OCN=96, S=[2 x 2], PM=SAME, OCV/CPU)\|35.664\|34.752\|1.03\| \|conv::Conv::(GFLOPS=2.719, K=[3 x 3], IN={1, 96, 256, 256}, OCN=96, S=[2 x 2], PM=SAME, OCV/CPU_FP16)\|-\|20.260\|-\| \|conv::Conv::(GFLOPS=3.319, K=[3 x 3], IN={1, 128, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|9.514\|9.414\|1.01\| \|conv::Conv::(GFLOPS=3.319, K=[3 x 3], IN={1, 128, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|9.462\|-\| \|conv::Conv::(GFLOPS=3.321, K=[3 x 3], IN={1, 64, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|10.631\|9.963\|1.07\| \|conv::Conv::(GFLOPS=3.321, K=[3 x 3], IN={1, 64, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|9.935\|-\| \|conv::Conv::(GFLOPS=3.398, K=[7 x 7], IN={1, 128, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)\|37.465\|36.798\|1.02\| \|conv::Conv::(GFLOPS=3.398, K=[7 x 7], IN={1, 128, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU_FP16)\|-\|19.569\|-\| \|conv::Conv::(GFLOPS=3.407, K=[3 x 3], IN={1, 512, 19, 19}, OCN=1024, D=[6 x 6], P=[6 x 6], BIAS, OCV/CPU)\|38.157\|36.157\|1.06\| \|conv::Conv::(GFLOPS=3.407, K=[3 x 3], IN={1, 512, 19, 19}, OCN=1024, D=[6 x 6], P=[6 x 6], BIAS, OCV/CPU_FP16)\|-\|18.902\|-\| \|conv::Conv::(GFLOPS=3.408, K=[3 x 3], IN={1, 256, 38, 38}, OCN=512, P=[1 x 1], BIAS, 
OCV/CPU)\|10.356\|10.401\|1.00\| \|conv::Conv::(GFLOPS=3.408, K=[3 x 3], IN={1, 256, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|10.360\|-\| \|conv::Conv::(GFLOPS=4.247, K=[3 x 3], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)\|12.641\|12.150\|1.04\| \|conv::Conv::(GFLOPS=4.247, K=[3 x 3], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU_FP16)\|-\|12.162\|-\| \|conv::Conv::(GFLOPS=4.247, K=[5 x 5], IN={1, 144, 128, 128}, OCN=144, S=[2 x 2], PM=SAME, OCV/CPU)\|50.545\|50.505\|1.00\| \|conv::Conv::(GFLOPS=4.247, K=[5 x 5], IN={1, 144, 128, 128}, OCN=144, S=[2 x 2], PM=SAME, OCV/CPU_FP16)\|-\|27.950\|-\| \|conv::Conv::(GFLOPS=4.566, K=[7 x 7], IN={1, 172, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)\|54.233\|49.603\|1.09\| \|conv::Conv::(GFLOPS=4.566, K=[7 x 7], IN={1, 172, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU_FP16)\|-\|26.515\|-\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 256, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|13.779\|12.968\|1.06\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 256, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|12.984\|-\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 512, 46, 46}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|15.809\|15.329\|1.03\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 512, 46, 46}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|15.433\|-\| \|conv::Conv::(GFLOPS=4.994, K=[3 x 3], IN={1, 128, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|14.563\|14.527\|1.00\| \|conv::Conv::(GFLOPS=4.994, K=[3 x 3], IN={1, 128, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|14.480\|-\| \|conv::Conv::(GFLOPS=4.997, K=[3 x 3], IN={1, 64, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|16.714\|16.484\|1.01\| \|conv::Conv::(GFLOPS=4.997, K=[3 x 3], IN={1, 64, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|16.362\|-\| \|conv::Conv::(GFLOPS=5.780, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, S=[2 x 2], PM=SAME, OCV/CPU)\|77.832\|65.729\|1.18\| \|conv::Conv::(GFLOPS=5.780, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, 
S=[2 x 2], PM=SAME, OCV/CPU_FP16)\|-\|32.065\|-\| \|conv::Conv::(GFLOPS=6.116, K=[3 x 3], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)\|21.903\|20.386\|1.07\| \|conv::Conv::(GFLOPS=6.116, K=[3 x 3], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU_FP16)\|-\|20.416\|-\| \|conv::Conv::(GFLOPS=6.118, K=[3 x 3], IN={1, 144, 128, 128}, OCN=144, PM=SAME, OCV/CPU)\|20.405\|18.148\|1.12\| \|conv::Conv::(GFLOPS=6.118, K=[3 x 3], IN={1, 144, 128, 128}, OCN=144, PM=SAME, OCV/CPU_FP16)\|-\|18.128\|-\| \|conv::Conv::(GFLOPS=6.637, K=[3 x 3], IN={1, 256, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|20.334\|18.521\|1.10\| \|conv::Conv::(GFLOPS=6.637, K=[3 x 3], IN={1, 256, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|18.495\|-\| \|conv::Conv::(GFLOPS=6.638, K=[3 x 3], IN={1, 128, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|21.527\|19.584\|1.10\| \|conv::Conv::(GFLOPS=6.638, K=[3 x 3], IN={1, 128, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|19.630\|-\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 150, 200}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|22.715\|20.057\|1.13\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 150, 200}, OCN=192, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|20.068\|-\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 300, 300}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)\|26.228\|24.992\|1.05\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 300, 300}, OCN=64, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|24.957\|-\| \|conv::Conv::(GFLOPS=6.814, K=[3 x 3], IN={1, 512, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|21.524\|21.581\|1.00\| \|conv::Conv::(GFLOPS=6.814, K=[3 x 3], IN={1, 512, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|21.782\|-\| \|conv::Conv::(GFLOPS=8.025, K=[3 x 3], IN={1, 1024, 19, 19}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)\|34.094\|31.964\|1.07\| \|conv::Conv::(GFLOPS=8.025, K=[3 x 3], IN={1, 1024, 19, 19}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|31.925\|-\| \|conv::Conv::(GFLOPS=9.986, K=[3 x 3], IN={1, 512, 46, 46}, OCN=512, 
P=[1 x 1], BIAS, OCV/CPU)\|28.677\|27.813\|1.03\| \|conv::Conv::(GFLOPS=9.986, K=[3 x 3], IN={1, 512, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|27.808\|-\| \|conv::Conv::(GFLOPS=9.987, K=[3 x 3], IN={1, 256, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|31.274\|27.892\|1.12\| \|conv::Conv::(GFLOPS=9.987, K=[3 x 3], IN={1, 256, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|27.910\|-\| \|conv::Conv::(GFLOPS=9.989, K=[3 x 3], IN={1, 128, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|30.533\|30.007\|1.02\| \|conv::Conv::(GFLOPS=9.989, K=[3 x 3], IN={1, 128, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|30.089\|-\| \|conv::Conv::(GFLOPS=9.993, K=[3 x 3], IN={1, 64, 368, 368}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)\|39.837\|38.312\|1.04\| \|conv::Conv::(GFLOPS=9.993, K=[3 x 3], IN={1, 64, 368, 368}, OCN=64, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|38.477\|-\| \|conv::Conv::(GFLOPS=10.087, K=[3 x 3], IN={1, 576, 38, 50}, OCN=512, PM=SAME, BIAS, OCV/CPU)\|32.480\|29.237\|1.11\| \|conv::Conv::(GFLOPS=10.087, K=[3 x 3], IN={1, 576, 38, 50}, OCN=512, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|29.452\|-\| \|conv::Conv::(GFLOPS=10.701, K=[3 x 3], IN={1, 512, 38, 38}, OCN=804, P=[1 x 1], BIAS, OCV/CPU)\|33.544\|32.832\|1.02\| \|conv::Conv::(GFLOPS=10.701, K=[3 x 3], IN={1, 512, 38, 38}, OCN=804, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|32.784\|-\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 240, 64, 64}, OCN=240, PM=SAME, OCV/CPU)\|134.481\|130.678\|1.03\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 240, 64, 64}, OCN=240, PM=SAME, OCV/CPU_FP16)\|-\|70.134\|-\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)\|127.930\|126.530\|1.01\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU_FP16)\|-\|65.261\|-\| \|conv::Conv::(GFLOPS=16.987, K=[5 x 5], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)\|201.346\|187.007\|1.08\| \|conv::Conv::(GFLOPS=16.987, K=[5 x 5], IN={1, 1152, 16, 16}, OCN=1152, 
PM=SAME, OCV/CPU_FP16)\|-\|91.525\|-\| \|conv::Conv::(GFLOPS=23.122, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, PM=SAME, OCV/CPU)\|252.038\|245.587\|1.03\| \|conv::Conv::(GFLOPS=23.122, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, PM=SAME, OCV/CPU_FP16)\|-\|125.477\|-\| ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-05-17 09:38:33 +03:00
Alexander Smorkalov	59ca444b26	Merge pull request #23560 from WanliZhong:eltwise_cuda_bug DNN/CUDA: Solve the bug of same shape broadcast with CUDA	2023-05-16 14:16:37 +03:00
zihaomu	91b6c8507a	remove flag of convolution	2023-05-16 15:29:20 +08:00
Dmitry Kurtaev	a8d3d1f6f9	Merge pull request #23604 from dkurt:dnn_no_protobuf Build DNN without Protobuf DNN module can be built without Protobuf for Darknet, TFLite, OpenVINO, Torch (not PyTorch) models. ``` cmake \ -DCMAKE_BUILD_TYPE=Release \ -DBUILD_LIST=dnn \ -DWITH_PROTOBUF=OFF \ -DWITH_OPENCL=OFF 7.1M lib/libopencv_dnn.so.4.7.0 ``` ``` cmake \ -DCMAKE_BUILD_TYPE=Release \ -DBUILD_LIST=dnn \ -DWITH_OPENCL=OFF 3.9M lib/libopencv_dnn.so.4.7.0 ``` ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-05-15 12:23:18 +03:00
wanli	46991bcd62	Solve the bug of same shape broadcast with CUDA	2023-05-15 13:55:38 +08:00
Alexander Smorkalov	85b04f0b4d	Merge pull request #23557 from WanliZhong:eltwise_cpu_bug fix nary elementwise bug in cpu	2023-05-11 15:56:46 +03:00
Dmitry Kurtaev	676afdc494	Update FlatBuffers source code to 23.5.9	2023-05-10 14:39:36 +03:00
wanli	85cc4086c8	fix nary elementwise bug in cpu	2023-05-10 14:29:33 +08:00
Alexander Smorkalov	25c28c5da4	Merge pull request #23485 from zihaomu:add_onnx_where DNN: add ONNX where node support	2023-05-05 09:21:07 +03:00
zihaomu	0513741a85	add broadcast where node	2023-05-05 11:16:19 +08:00
Alexander Smorkalov	351589e5fb	Merge pull request #23491 from fengyuentau:patch_for_segment_anything Fixes for Segment Anything	2023-05-04 21:07:58 +03:00
Alexander Alekhin	3c76b33532	Merge pull request #22614 from zihaomu:add_std2DB_API	2023-05-01 19:37:23 +00:00
zihaomu	8be93a6de7	add scale factor to DB demo.	2023-04-30 22:03:21 +08:00
Abduragim Shtanchaev	3b1ee0549b	added test for lstm without hidden states initialization	2023-04-25 16:01:13 +03:00
Alexander Smorkalov	e3e1f704a4	Merge pull request #23528 from WanliZhong:issue23278 DNN/CUDA: make 'abcd op 1b11' broadcast eltwise operator support cuda	2023-04-24 19:31:55 +03:00
Dmitry Kurtaev	aa57833ad5	Merge pull request #23409 from dkurt:dnn_tflite_quant Import and inference INT8 quantized TFLite model #23409 ### Pull Request Readiness Checklist * Support quantized TFLite models * Enable fused activations (FP32, INT8) Merge with extra: https://github.com/opencv/opencv_extra/pull/1048 ![res](https://user-images.githubusercontent.com/25801568/231433201-566b4bd6-ccff-462c-9e74-adbdcdf3648b.png) In the image, green boxes are from TFLite and red boxes are from OpenCV. See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-04-24 13:44:10 +03:00
Abduragim Shtanchaev	e4e774d42b	Merge pull request #23475 from Abdurrahheem:lstm_fix_initialization Fix ONNX parser for single-layer LSTM hidden and cell states #23475 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake This PR addresses issue [#21118](https://github.com/opencv/opencv/issues/21118). The problem is that the ONNX parser is unable to read the hidden state and cell state for single-layer LSTMs. This PR fixes the issue by updating the parser to correctly read hidden and cell states.	2023-04-24 13:39:41 +03:00
wanli	e4360294c5	make 'abcd op 1b11' broadcast support cuda	2023-04-23 17:46:50 +08:00
Alexander Alekhin	9ab0ff6cf2	Merge pull request #23511 from zihaomu:issue_23465	2023-04-22 04:01:26 +00:00
Zihao Mu	601778e0e6	Merge pull request #22750 from zihaomu:improve_blobFromImage DNN: Add New API blobFromImageParam #22750 The purpose of this PR: 1. Add a new API, `blobFromImageParam`, that extends the `blobFromImage` API. It supports different data layouts (NCHW or NHWC) and letter_box. 2. ~~`blobFromImage` can output `CV_16F`~~ ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-04-21 19:10:17 +03:00
zihaomu	54e1a8709d	fix the bug, disable the fast1x1 when padding is not 0.	2023-04-21 10:55:07 +08:00
Yuantao Feng	3c1fcd5deb	Merge pull request #23401 from fengyuentau:fix_cann_layer_support dnn: Support more operators in CANN backend #23401 This PR adds support for the following layers: - [x] Sub - [x] PRelu - [x] DeConv - [x] Also warn users when the backend is switched back to the default because some of the layers are not supported. - [ ] [Dropped] LSTM: some hacks (adding layers) were introduced, which make it even harder to build the graph for the CANN backend. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-04-20 10:18:35 +03:00
Abduragim Shtanchaev	b3a2444bcf	Merge pull request #23501 from Abdurrahheem:additional_lstm_tests Added LSTM and GRU tests for various batch and input length sizes #23501 Added tests with various sequence lengths and batch sizes. Test data: https://github.com/opencv/opencv_extra/pull/1057 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-04-20 10:11:33 +03:00
Alexander Smorkalov	aa17f881b1	Merge pull request #23482 from zihaomu:onnx_opset13_split DNN: support the split node of onnx opset >= 13	2023-04-14 11:59:57 +03:00
fengyuentau	4f99e5ab37	allow null constant_value in Pad and ignore it when loading	2023-04-14 16:50:16 +08:00
fengyuentau	88cacd35c5	support broadcast on axis > 1 for Expand	2023-04-14 15:52:27 +08:00
Alexander Smorkalov	136121f6ee	Merge pull request #22660 from zhouzq-thu:4.x Fix objectness is not assigned in dnn::region_layer	2023-04-12 09:34:58 +03:00
Alexander Smorkalov	3f02c9d5b9	Merge pull request #23310 from hanliutong:fix_hal_compatibility Fix HAL compatibility layer	2023-04-11 12:43:54 +03:00
zihaomu	51281f8d69	support the split node of onnx opset >= 13	2023-04-11 16:18:50 +08:00
Yuantao Feng	3a83a35ab0	Merge pull request #23296 from fengyuentau:fix_identifying_constant Fix identifying initializers in ONNX graph simplification #23296 Fixes https://github.com/opencv/opencv/issues/23295 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-04-06 15:35:31 +03:00
Dmitry Kurtaev	5e1d33329b	Several fixes for ONNX importer: Expand, Gather	2023-03-27 22:15:26 +03:00
HAN Liutong	a809ae4e88	Fix HAL compatibility layer and modify use cases.	2023-03-27 21:30:47 +08:00
Dmitry Kurtaev	5df6b4a756	Merge pull request #23325 from dkurt:dnn_input_info Propagate inputs info for ONNX and TFLite models Needed for generic applications such as benchmarking pipelines, so that OpenCV can report the default input shapes specified in the models. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-03-21 14:50:53 +03:00
Alexander Smorkalov	924a65413a	Merge pull request #23357 from zihaomu:fix_winograd_error_32bit DNN : fix bug in 32 bit cpu	2023-03-15 11:24:54 +03:00
zihaomu	6bac5453d1	fix bug in 32 bit cpu	2023-03-15 08:24:55 +08:00
Alexander Smorkalov	ccbc784195	Merge pull request #23354 from zihaomu:issue_23351 DNN : fix bug in layer fusion	2023-03-14 17:23:25 +03:00
zihaomu	386be97ce2	fix bug in layer fusion	2023-03-14 19:06:06 +08:00
tingbo.liao	7d032de7e8	Fix test case failures. 4 failed tests in opencv_test_dnn are listed below: * Test_Caffe_layers.Conv_Elu/0, where GetParam() = OCV/CPU * Test_ONNX_layers.ConvResizePool1d/0, where GetParam() = OCV/CPU * Test_TensorFlow_layers.tf_reshape_nhwc/0, where GetParam() = OCV/CPU * Test_Torch_layers.net_inception_block/0, where GetParam() = OCV/CPU In the winofunc_AtXA_8x8_f32 and winofunc_BtXB_8x8_f32 implementations, incorrect input parameters cause the test failures. Add four new, distinct variables for the last four input parameters of v_transpose4x4 to fix the bugs, and update the related comments. Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>	2023-03-14 17:05:19 +08:00
Alexander Smorkalov	22a52766dc	Merge pull request #23343 from zihaomu:fix_test_onnx_conf DNN Test ONNX: Fix the logic of the test case	2023-03-13 21:48:41 +03:00
Yuantao Feng	b94e13c8ae	Merge pull request #23319 from fengyuentau:fix_zoo_issue_136 Related issue: https://github.com/opencv/opencv_zoo/issues/136 Features added: - Support operators with multiple outputs: ONNX Split. - Support Slice without steps. Bugs fixed: - Wrong settings in ClipByValue (Relu6). - Wrong calculation of pads in the convolution layer (it is wrong in general, but for now it is fixed only for CANN). ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-03-13 21:46:33 +03:00
zihaomu	ee3740af00	move global skip out of if block, and add opencv_deny_list	2023-03-13 22:16:51 +08:00
Zihao Mu	e03e2e7f94	Merge pull request #23192 from zihaomu:clean_up_SIMD_code ### Purpose of this PR: - Move all dispatch and SIMD code of `convolution layer` into the `simd.hpp` file. - Support Winograd on AVX-only machines. - Rename the folder from `fast_conv` to `cpu_kernels`. In the future, we can put CPU optimizations for other layers into it, like `GEMM` or `MatMul`. ## Performance Test Since this patch focuses only on code style, performance is expected to be the same as before. Test with the following script: `./bin/opencv_perf_dnn '--gtest_filter=conv' --gtest_output="xml:../1-0th.xml" --perf_threads=1` ### Test on X86 platform Min (ms) \|Name of Test\|4.x \| patch \| 4.x vs patch (x-factor)\| \|---\|:-:\|:-:\|:-:\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 19}, OCN=2, G=2, S=2, P=(1, 1), BIAS, OCV/CPU)\|0.001\|0.001\|0.98\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 25}, OCN=2, G=2, P=(2, 2), PM=SAME, OCV/CPU)\|0.001\|0.001\|0.95\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 6, 10}, OCN=6, PM=VALID, BIAS, OCV/CPU)\|0.001\|0.001\|0.97\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 4, 9, 10, 10}, OCN=4, S=[1 x 1 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)\|0.002\|0.002\|1.04\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 8, 1, 10, 10}, OCN=8, G=8, P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)\|0.002\|0.002\|0.94\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 3 x 3], IN={1, 2, 19, 19, 19}, OCN=2, G=2, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)\|0.040\|0.044\|0.93\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 4 x 2], IN={1, 4, 8, 10, 10}, OCN=4, G=4, S=[1 x 2 x 1], BIAS, OCV/CPU)\|0.010\|0.010\|1.00\| \|conv3d::Conv3D::(GFLOPS=0.001, K=[3 x 3 x 3], IN={1, 2, 25, 19, 19}, OCN=2, G=2, S=[1 x 2 x 2], P=(2, 2) x (2, 2) x (2, 2), PM=SAME, OCV/CPU)\|0.106\|0.103\|1.03\| \|conv3d::Conv3D::(GFLOPS=0.002, K=[3 x 1 x 4], IN={1, 14, 5, 10, 10}, OCN=14, PM=SAME, OCV/CPU)\|0.041\|0.040\|1.03\| 
\|conv3d::Conv3D::(GFLOPS=0.006, K=[5 x 5 x 5], IN={1, 4, 50, 19, 19}, OCN=4, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)\|0.340\|0.329\|1.03\| \|conv3d::Conv3D::(GFLOPS=0.027, K=[3 x 3 x 3], IN={1, 6, 10, 38, 50}, OCN=6, PM=VALID, BIAS, OCV/CPU)\|0.590\|0.567\|1.04\| \|conv3d::Conv3D::(GFLOPS=0.030, K=[5 x 5 x 5], IN={1, 6, 19, 19, 19}, OCN=6, G=2, OCV/CPU)\|1.374\|1.314\|1.05\| \|conv3d::Conv3D::(GFLOPS=0.045, K=[7 x 7 x 7], IN={1, 2, 38, 38, 38}, OCN=2, S=[1 x 2 x 1], OCV/CPU)\|3.715\|3.528\|1.05\| \|conv3d::Conv3D::(GFLOPS=0.053, K=[3 x 3 x 3], IN={1, 10, 98, 10, 10}, OCN=10, PM=SAME, OCV/CPU)\|1.181\|1.166\|1.01\| \|conv3d::Conv3D::(GFLOPS=0.071, K=[7 x 7 x 7], IN={1, 6, 15, 19, 19}, OCN=6, S=[2 x 1 x 1], P=(3, 3) x (3, 3) x (3, 3), PM=SAME, BIAS, OCV/CPU)\|2.689\|2.587\|1.04\| \|conv3d::Conv3D::(GFLOPS=0.093, K=[5 x 5 x 5], IN={1, 4, 40, 75, 75}, OCN=4, S=[2 x 2 x 2], OCV/CPU)\|4.754\|4.500\|1.06\| \|conv3d::Conv3D::(GFLOPS=0.116, K=[5 x 5 x 5], IN={1, 2, 21, 75, 100}, OCN=2, BIAS, OCV/CPU)\|9.612\|9.112\|1.05\| \|conv3d::Conv3D::(GFLOPS=1.267, K=[5 x 5 x 5], IN={1, 3, 75, 75, 100}, OCN=3, PM=SAME, BIAS, OCV/CPU)\|69.000\|64.676\|1.07\| \|conv3d::Conv3D::(GFLOPS=1.343, K=[3 x 3 x 3], IN={1, 11, 9, 150, 200}, OCN=11, PM=VALID, BIAS, OCV/CPU)\|20.248\|18.451\|1.10\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 512, 26, 26}, OCN=256, OCV/CPU)\|1.395\|1.392\|1.00\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 1024, 13, 13}, OCN=512, OCV/CPU)\|1.990\|1.984\|1.00\| \|conv::Conv::(GFLOPS=0.178, K=[1 x 1], IN={1, 256, 52, 52}, OCN=128, OCV/CPU)\|1.393\|1.360\|1.02\| \|conv::Conv::(GFLOPS=0.210, K=[1 x 1], IN={1, 576, 38, 50}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|1.813\|1.744\|1.04\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 128, 56, 56}, OCN=32, P=[1 x 1], OCV/CPU)\|1.190\|1.191\|1.00\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 256, 14, 14}, OCN=256, P=[1 x 1], OCV/CPU)\|1.286\|1.284\|1.00\| \|conv::Conv::(GFLOPS=0.280, K=[1 x 1], IN={1, 
576, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|2.295\|2.279\|1.01\| \|conv::Conv::(GFLOPS=0.302, K=[3 x 3], IN={1, 64, 64, 64}, OCN=64, PM=SAME, OCV/CPU)\|1.322\|1.331\|0.99\| \|conv::Conv::(GFLOPS=0.357, K=[1 x 1], IN={1, 64, 208, 208}, OCN=64, OCV/CPU)\|3.784\|3.533\|1.07\| \|conv::Conv::(GFLOPS=0.420, K=[3 x 3], IN={1, 96, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|1.838\|1.844\|1.00\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 128, 40, 40}, OCN=128, PM=SAME, OCV/CPU)\|1.957\|1.959\|1.00\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 256, 20, 20}, OCN=256, PM=SAME, OCV/CPU)\|2.596\|2.573\|1.01\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 512, 10, 10}, OCN=512, PM=SAME, OCV/CPU)\|4.183\|4.083\|1.02\| \|conv::Conv::(GFLOPS=0.561, K=[3 x 3], IN={1, 128, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|2.413\|2.406\|1.00\| \|conv::Conv::(GFLOPS=0.624, K=[3 x 3], IN={1, 128, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|2.538\|2.546\|1.00\| \|conv::Conv::(GFLOPS=0.701, K=[3 x 3], IN={1, 128, 38, 50}, OCN=160, PM=SAME, BIAS, OCV/CPU)\|2.972\|2.980\|1.00\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 64, 104, 104}, OCN=64, P=[1 x 1], OCV/CPU)\|3.452\|3.464\|1.00\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 128, 52, 52}, OCN=128, P=[1 x 1], OCV/CPU)\|3.082\|3.105\|0.99\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 256, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)\|4.043\|3.919\|1.03\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 512, 13, 13}, OCN=512, P=[1 x 1], OCV/CPU)\|5.538\|5.531\|1.00\| \|conv::Conv::(GFLOPS=0.830, K=[3 x 3], IN={1, 64, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|3.393\|3.418\|0.99\| \|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 192, 38, 38}, OCN=192, PM=SAME, OCV/CPU)\|4.325\|4.234\|1.02\| \|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 384, 19, 19}, OCN=384, PM=SAME, OCV/CPU)\|6.009\|5.908\|1.02\| \|conv::Conv::(GFLOPS=1.022, K=[3 x 3], IN={1, 576, 19, 19}, OCN=273, PM=SAME, BIAS, OCV/CPU)\|6.557\|6.376\|1.03\| 
\|conv::Conv::(GFLOPS=1.112, K=[3 x 3], IN={1, 512, 10, 10}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)\|10.114\|9.472\|1.07\| \|conv::Conv::(GFLOPS=1.181, K=[3 x 3], IN={1, 64, 160, 200}, OCN=128, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)\|10.373\|9.879\|1.05\| \|conv::Conv::(GFLOPS=1.182, K=[3 x 3], IN={1, 32, 320, 400}, OCN=64, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)\|12.782\|11.624\|1.10\| \|conv::Conv::(GFLOPS=1.195, K=[9 x 9], IN={1, 32, 240, 320}, OCN=3, P=[4 x 4], BIAS, OCV/CPU)\|90.931\|90.552\|1.00\| \|conv::Conv::(GFLOPS=1.196, K=[3 x 3], IN={1, 384, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)\|6.091\|5.818\|1.05\| \|conv::Conv::(GFLOPS=1.210, K=[3 x 3], IN={1, 32, 256, 256}, OCN=32, PM=SAME, OCV/CPU)\|7.083\|6.643\|1.07\| \|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 64, 75, 75}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|5.054\|5.059\|1.00\| \|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 96, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|5.005\|4.931\|1.02\| \|conv::Conv::(GFLOPS=1.248, K=[3 x 3], IN={1, 256, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|4.951\|5.065\|0.98\| \|conv::Conv::(GFLOPS=1.258, K=[3 x 3], IN={1, 1280, 10, 10}, OCN=546, PM=SAME, BIAS, OCV/CPU)\|11.957\|11.293\|1.06\| \|conv::Conv::(GFLOPS=1.261, K=[3 x 3], IN={1, 192, 38, 50}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|5.328\|5.250\|1.01\| \|conv::Conv::(GFLOPS=1.416, K=[3 x 3], IN={1, 128, 62, 82}, OCN=128, BIAS, OCV/CPU)\|5.544\|5.292\|1.05\| \|conv::Conv::(GFLOPS=1.500, K=[3 x 3], IN={1, 128, 64, 84}, OCN=128, BIAS, OCV/CPU)\|6.186\|5.893\|1.05\| \|conv::Conv::(GFLOPS=1.586, K=[3 x 3], IN={1, 128, 66, 86}, OCN=128, BIAS, OCV/CPU)\|6.153\|5.834\|1.05\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 26, 26}, OCN=512, P=[1 x 1], OCV/CPU)\|8.154\|8.107\|1.01\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 52, 52}, OCN=512, S=[2 x 2], P=[1 x 1], OCV/CPU)\|12.699\|12.256\|1.04\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 13, 13}, OCN=1024, P=[1 x 1], OCV/CPU)\|11.355\|11.217\|1.01\| 
\|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 26, 26}, OCN=1024, S=[2 x 2], P=[1 x 1], OCV/CPU)\|19.062\|17.814\|1.07\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 104, 104}, OCN=128, P=[1 x 1], OCV/CPU)\|6.820\|6.531\|1.04\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 208, 208}, OCN=128, S=[2 x 2], P=[1 x 1], OCV/CPU)\|14.502\|13.483\|1.08\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 52, 52}, OCN=256, P=[1 x 1], OCV/CPU)\|6.270\|6.123\|1.02\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 104, 104}, OCN=256, S=[2 x 2], P=[1 x 1], OCV/CPU)\|13.173\|12.451\|1.06\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 208, 208}, OCN=64, P=[1 x 1], OCV/CPU)\|8.326\|7.652\|1.09\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 416, 416}, OCN=64, S=[2 x 2], P=[1 x 1], OCV/CPU)\|17.605\|16.465\|1.07\| \|conv::Conv::(GFLOPS=1.659, K=[3 x 3], IN={1, 960, 10, 10}, OCN=960, PM=SAME, OCV/CPU)\|15.675\|14.771\|1.06\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, G=128, P=[1 x 1], BIAS, OCV/CPU)\|0.420\|0.423\|0.99\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, PM=SAME, OCV/CPU)\|6.788\|6.491\|1.05\| \|conv::Conv::(GFLOPS=1.675, K=[3 x 3], IN={1, 128, 68, 88}, OCN=128, BIAS, OCV/CPU)\|6.456\|6.168\|1.05\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, G=256, P=[1 x 1], BIAS, OCV/CPU)\|0.263\|0.261\|1.01\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, PM=SAME, OCV/CPU)\|7.690\|7.398\|1.04\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, G=512, P=[1 x 1], BIAS, OCV/CPU)\|0.200\|0.202\|0.99\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|10.542\|10.464\|1.01\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, PM=SAME, OCV/CPU)\|10.876\|10.728\|1.01\| \|conv::Conv::(GFLOPS=1.766, K=[3 x 3], IN={1, 128, 70, 90}, OCN=128, BIAS, OCV/CPU)\|7.194\|6.768\|1.06\| 
\|conv::Conv::(GFLOPS=1.859, K=[3 x 3], IN={1, 128, 72, 92}, OCN=128, BIAS, OCV/CPU)\|7.099\|6.731\|1.05\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, G=1024, P=[1 x 1], BIAS, OCV/CPU)\|0.147\|0.162\|0.91\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, PM=SAME, OCV/CPU)\|18.558\|17.141\|1.08\| \|conv::Conv::(GFLOPS=1.954, K=[3 x 3], IN={1, 128, 74, 94}, OCN=128, BIAS, OCV/CPU)\|7.641\|7.219\|1.06\| \|conv::Conv::(GFLOPS=1.995, K=[9 x 9], IN={1, 3, 320, 400}, OCN=32, P=[4 x 4], BIAS, OCV/CPU)\|22.666\|20.999\|1.08\| \|conv::Conv::(GFLOPS=2.052, K=[3 x 3], IN={1, 128, 76, 96}, OCN=128, BIAS, OCV/CPU)\|8.523\|7.921\|1.08\| \|conv::Conv::(GFLOPS=2.100, K=[3 x 3], IN={1, 144, 75, 75}, OCN=144, PM=SAME, OCV/CPU)\|8.514\|8.109\|1.05\| \|conv::Conv::(GFLOPS=2.153, K=[3 x 3], IN={1, 128, 78, 98}, OCN=128, BIAS, OCV/CPU)\|8.300\|7.878\|1.05\| \|conv::Conv::(GFLOPS=2.156, K=[3 x 3], IN={1, 576, 19, 19}, OCN=576, PM=SAME, OCV/CPU)\|13.403\|13.131\|1.02\| \|conv::Conv::(GFLOPS=2.255, K=[3 x 3], IN={1, 128, 80, 100}, OCN=128, BIAS, OCV/CPU)\|8.920\|8.357\|1.07\| \|conv::Conv::(GFLOPS=2.719, K=[3 x 3], IN={1, 96, 256, 256}, OCN=96, S=[2 x 2], PM=SAME, OCV/CPU)\|28.827\|27.616\|1.04\| \|conv::Conv::(GFLOPS=3.319, K=[3 x 3], IN={1, 128, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|12.895\|12.670\|1.02\| \|conv::Conv::(GFLOPS=3.321, K=[3 x 3], IN={1, 64, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|14.120\|13.078\|1.08\| \|conv::Conv::(GFLOPS=3.398, K=[7 x 7], IN={1, 128, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)\|27.541\|27.582\|1.00\| \|conv::Conv::(GFLOPS=3.407, K=[3 x 3], IN={1, 512, 19, 19}, OCN=1024, D=[6 x 6], P=[6 x 6], BIAS, OCV/CPU)\|32.367\|31.140\|1.04\| \|conv::Conv::(GFLOPS=3.408, K=[3 x 3], IN={1, 256, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|14.934\|14.910\|1.00\| \|conv::Conv::(GFLOPS=4.247, K=[3 x 3], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)\|18.289\|18.491\|0.99\| \|conv::Conv::(GFLOPS=4.247, 
K=[5 x 5], IN={1, 144, 128, 128}, OCN=144, S=[2 x 2], PM=SAME, OCV/CPU)\|37.857\|36.845\|1.03\| \|conv::Conv::(GFLOPS=4.566, K=[7 x 7], IN={1, 172, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)\|37.402\|36.566\|1.02\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 256, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|19.031\|19.164\|0.99\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 512, 46, 46}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|19.019\|19.135\|0.99\| \|conv::Conv::(GFLOPS=4.994, K=[3 x 3], IN={1, 128, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|20.077\|19.400\|1.03\| \|conv::Conv::(GFLOPS=4.997, K=[3 x 3], IN={1, 64, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|21.883\|21.302\|1.03\| \|conv::Conv::(GFLOPS=5.780, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, S=[2 x 2], PM=SAME, OCV/CPU)\|51.288\|49.851\|1.03\| \|conv::Conv::(GFLOPS=6.116, K=[3 x 3], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)\|27.349\|28.359\|0.96\| \|conv::Conv::(GFLOPS=6.118, K=[3 x 3], IN={1, 144, 128, 128}, OCN=144, PM=SAME, OCV/CPU)\|24.915\|25.130\|0.99\| \|conv::Conv::(GFLOPS=6.637, K=[3 x 3], IN={1, 256, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|25.488\|25.899\|0.98\| \|conv::Conv::(GFLOPS=6.638, K=[3 x 3], IN={1, 128, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|27.346\|27.390\|1.00\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 150, 200}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|28.033\|28.301\|0.99\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 300, 300}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)\|50.216\|49.970\|1.00\| \|conv::Conv::(GFLOPS=6.814, K=[3 x 3], IN={1, 512, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|29.670\|29.513\|1.01\| \|conv::Conv::(GFLOPS=8.025, K=[3 x 3], IN={1, 1024, 19, 19}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)\|50.565\|49.634\|1.02\| \|conv::Conv::(GFLOPS=9.986, K=[3 x 3], IN={1, 512, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|37.900\|37.814\|1.00\| \|conv::Conv::(GFLOPS=9.987, K=[3 x 3], IN={1, 256, 92, 92}, OCN=256, P=[1 x 1], BIAS, 
OCV/CPU)\|41.367\|39.742\|1.04\| \|conv::Conv::(GFLOPS=9.989, K=[3 x 3], IN={1, 128, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|49.128\|50.350\|0.98\| \|conv::Conv::(GFLOPS=9.993, K=[3 x 3], IN={1, 64, 368, 368}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)\|79.643\|80.645\|0.99\| \|conv::Conv::(GFLOPS=10.087, K=[3 x 3], IN={1, 576, 38, 50}, OCN=512, PM=SAME, BIAS, OCV/CPU)\|41.439\|40.895\|1.01\| \|conv::Conv::(GFLOPS=10.701, K=[3 x 3], IN={1, 512, 38, 38}, OCN=804, P=[1 x 1], BIAS, OCV/CPU)\|46.504\|46.220\|1.01\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 240, 64, 64}, OCN=240, PM=SAME, OCV/CPU)\|98.086\|96.842\|1.01\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)\|102.447\|97.299\|1.05\| \|conv::Conv::(GFLOPS=16.987, K=[5 x 5], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)\|145.047\|144.996\|1.00\| \|conv::Conv::(GFLOPS=23.122, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, PM=SAME, OCV/CPU)\|206.104\|195.543\|1.05\| ### Test on M1(ARM) platform \|Name of Test\|4.x\|patch\|4.x vs patch (x-factor)\| \|---\|:-:\|:-:\|:-:\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 19}, OCN=2, G=2, S=2, P=(1, 1), BIAS, OCV/CPU)\|0.001\|0.001\|0.97\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 25}, OCN=2, G=2, P=(2, 2), PM=SAME, OCV/CPU)\|0.001\|0.001\|0.94\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 6, 10}, OCN=6, PM=VALID, BIAS, OCV/CPU)\|0.002\|0.002\|0.92\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 4, 9, 10, 10}, OCN=4, S=[1 x 1 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)\|0.003\|0.003\|1.00\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 8, 1, 10, 10}, OCN=8, G=8, P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)\|0.003\|0.003\|1.00\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 3 x 3], IN={1, 2, 19, 19, 19}, OCN=2, G=2, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)\|0.031\|0.031\|1.00\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 4 x 2], IN={1, 4, 8, 10, 10}, OCN=4, G=4, S=[1 x 2 x 1], BIAS, 
OCV/CPU)\|0.009\|0.009\|1.00\| \|conv3d::Conv3D::(GFLOPS=0.001, K=[3 x 3 x 3], IN={1, 2, 25, 19, 19}, OCN=2, G=2, S=[1 x 2 x 2], P=(2, 2) x (2, 2) x (2, 2), PM=SAME, OCV/CPU)\|0.066\|0.066\|1.01\| \|conv3d::Conv3D::(GFLOPS=0.002, K=[3 x 1 x 4], IN={1, 14, 5, 10, 10}, OCN=14, PM=SAME, OCV/CPU)\|0.102\|0.102\|1.00\| \|conv3d::Conv3D::(GFLOPS=0.006, K=[5 x 5 x 5], IN={1, 4, 50, 19, 19}, OCN=4, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)\|0.328\|0.328\|1.00\| \|conv3d::Conv3D::(GFLOPS=0.027, K=[3 x 3 x 3], IN={1, 6, 10, 38, 50}, OCN=6, PM=VALID, BIAS, OCV/CPU)\|0.693\|0.747\|0.93\| \|conv3d::Conv3D::(GFLOPS=0.030, K=[5 x 5 x 5], IN={1, 6, 19, 19, 19}, OCN=6, G=2, OCV/CPU)\|1.268\|1.266\|1.00\| \|conv3d::Conv3D::(GFLOPS=0.045, K=[7 x 7 x 7], IN={1, 2, 38, 38, 38}, OCN=2, S=[1 x 2 x 1], OCV/CPU)\|3.530\|3.581\|0.99\| \|conv3d::Conv3D::(GFLOPS=0.053, K=[3 x 3 x 3], IN={1, 10, 98, 10, 10}, OCN=10, PM=SAME, OCV/CPU)\|1.186\|1.188\|1.00\| \|conv3d::Conv3D::(GFLOPS=0.071, K=[7 x 7 x 7], IN={1, 6, 15, 19, 19}, OCN=6, S=[2 x 1 x 1], P=(3, 3) x (3, 3) x (3, 3), PM=SAME, BIAS, OCV/CPU)\|2.682\|2.683\|1.00\| \|conv3d::Conv3D::(GFLOPS=0.093, K=[5 x 5 x 5], IN={1, 4, 40, 75, 75}, OCN=4, S=[2 x 2 x 2], OCV/CPU)\|4.490\|4.501\|1.00\| \|conv3d::Conv3D::(GFLOPS=0.116, K=[5 x 5 x 5], IN={1, 2, 21, 75, 100}, OCN=2, BIAS, OCV/CPU)\|8.914\|8.938\|1.00\| \|conv3d::Conv3D::(GFLOPS=1.267, K=[5 x 5 x 5], IN={1, 3, 75, 75, 100}, OCN=3, PM=SAME, BIAS, OCV/CPU)\|69.819\|69.876\|1.00\| \|conv3d::Conv3D::(GFLOPS=1.343, K=[3 x 3 x 3], IN={1, 11, 9, 150, 200}, OCN=11, PM=VALID, BIAS, OCV/CPU)\|24.058\|22.420\|1.07\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 512, 26, 26}, OCN=256, OCV/CPU)\|2.240\|2.236\|1.00\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 1024, 13, 13}, OCN=512, OCV/CPU)\|3.132\|3.136\|1.00\| \|conv::Conv::(GFLOPS=0.178, K=[1 x 1], IN={1, 256, 52, 52}, OCN=128, OCV/CPU)\|1.920\|1.919\|1.00\| \|conv::Conv::(GFLOPS=0.210, K=[1 x 1], IN={1, 576, 38, 50}, OCN=96, 
PM=SAME, BIAS, OCV/CPU)\|2.343\|2.346\|1.00\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 128, 56, 56}, OCN=32, P=[1 x 1], OCV/CPU)\|1.234\|1.116\|1.11\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 256, 14, 14}, OCN=256, P=[1 x 1], OCV/CPU)\|1.109\|1.121\|0.99\| \|conv::Conv::(GFLOPS=0.280, K=[1 x 1], IN={1, 576, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|3.197\|3.084\|1.04\| \|conv::Conv::(GFLOPS=0.302, K=[3 x 3], IN={1, 64, 64, 64}, OCN=64, PM=SAME, OCV/CPU)\|1.123\|1.148\|0.98\| \|conv::Conv::(GFLOPS=0.357, K=[1 x 1], IN={1, 64, 208, 208}, OCN=64, OCV/CPU)\|4.836\|5.061\|0.96\| \|conv::Conv::(GFLOPS=0.420, K=[3 x 3], IN={1, 96, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|1.535\|1.463\|1.05\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 128, 40, 40}, OCN=128, PM=SAME, OCV/CPU)\|1.756\|1.584\|1.11\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 256, 20, 20}, OCN=256, PM=SAME, OCV/CPU)\|1.821\|1.820\|1.00\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 512, 10, 10}, OCN=512, PM=SAME, OCV/CPU)\|7.049\|6.672\|1.06\| \|conv::Conv::(GFLOPS=0.561, K=[3 x 3], IN={1, 128, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|1.967\|1.922\|1.02\| \|conv::Conv::(GFLOPS=0.624, K=[3 x 3], IN={1, 128, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|1.943\|1.977\|0.98\| \|conv::Conv::(GFLOPS=0.701, K=[3 x 3], IN={1, 128, 38, 50}, OCN=160, PM=SAME, BIAS, OCV/CPU)\|2.464\|2.310\|1.07\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 64, 104, 104}, OCN=64, P=[1 x 1], OCV/CPU)\|2.860\|2.904\|0.98\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 128, 52, 52}, OCN=128, P=[1 x 1], OCV/CPU)\|2.428\|2.483\|0.98\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 256, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)\|2.955\|2.983\|0.99\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 512, 13, 13}, OCN=512, P=[1 x 1], OCV/CPU)\|4.328\|4.484\|0.97\| \|conv::Conv::(GFLOPS=0.830, K=[3 x 3], IN={1, 64, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|2.712\|2.778\|0.98\| \|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 
192, 38, 38}, OCN=192, PM=SAME, OCV/CPU)\|3.205\|3.331\|0.96\| \|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 384, 19, 19}, OCN=384, PM=SAME, OCV/CPU)\|4.193\|4.412\|0.95\| \|conv::Conv::(GFLOPS=1.022, K=[3 x 3], IN={1, 576, 19, 19}, OCN=273, PM=SAME, BIAS, OCV/CPU)\|5.026\|4.565\|1.10\| \|conv::Conv::(GFLOPS=1.112, K=[3 x 3], IN={1, 512, 10, 10}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)\|14.490\|14.213\|1.02\| \|conv::Conv::(GFLOPS=1.181, K=[3 x 3], IN={1, 64, 160, 200}, OCN=128, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)\|14.886\|14.003\|1.06\| \|conv::Conv::(GFLOPS=1.182, K=[3 x 3], IN={1, 32, 320, 400}, OCN=64, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)\|15.923\|15.184\|1.05\| \|conv::Conv::(GFLOPS=1.195, K=[9 x 9], IN={1, 32, 240, 320}, OCN=3, P=[4 x 4], BIAS, OCV/CPU)\|45.136\|41.696\|1.08\| \|conv::Conv::(GFLOPS=1.196, K=[3 x 3], IN={1, 384, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)\|4.995\|4.631\|1.08\| \|conv::Conv::(GFLOPS=1.210, K=[3 x 3], IN={1, 32, 256, 256}, OCN=32, PM=SAME, OCV/CPU)\|6.402\|6.261\|1.02\| \|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 64, 75, 75}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|4.478\|3.965\|1.13\| \|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 96, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|3.908\|3.978\|0.98\| \|conv::Conv::(GFLOPS=1.248, K=[3 x 3], IN={1, 256, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|4.176\|4.206\|0.99\| \|conv::Conv::(GFLOPS=1.258, K=[3 x 3], IN={1, 1280, 10, 10}, OCN=546, PM=SAME, BIAS, OCV/CPU)\|21.509\|21.136\|1.02\| \|conv::Conv::(GFLOPS=1.261, K=[3 x 3], IN={1, 192, 38, 50}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|4.426\|4.082\|1.08\| \|conv::Conv::(GFLOPS=1.416, K=[3 x 3], IN={1, 128, 62, 82}, OCN=128, BIAS, OCV/CPU)\|4.098\|4.289\|0.96\| \|conv::Conv::(GFLOPS=1.500, K=[3 x 3], IN={1, 128, 64, 84}, OCN=128, BIAS, OCV/CPU)\|4.646\|5.105\|0.91\| \|conv::Conv::(GFLOPS=1.586, K=[3 x 3], IN={1, 128, 66, 86}, OCN=128, BIAS, OCV/CPU)\|4.746\|4.724\|1.00\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 26, 26}, OCN=512, P=[1 x 1], 
OCV/CPU)\|5.614\|5.779\|0.97\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 52, 52}, OCN=512, S=[2 x 2], P=[1 x 1], OCV/CPU)\|21.909\|20.718\|1.06\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 13, 13}, OCN=1024, P=[1 x 1], OCV/CPU)\|8.256\|8.290\|1.00\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 26, 26}, OCN=1024, S=[2 x 2], P=[1 x 1], OCV/CPU)\|25.196\|23.267\|1.08\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 104, 104}, OCN=128, P=[1 x 1], OCV/CPU)\|5.721\|5.172\|1.11\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 208, 208}, OCN=128, S=[2 x 2], P=[1 x 1], OCV/CPU)\|20.066\|18.322\|1.10\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 52, 52}, OCN=256, P=[1 x 1], OCV/CPU)\|4.448\|4.542\|0.98\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 104, 104}, OCN=256, S=[2 x 2], P=[1 x 1], OCV/CPU)\|19.193\|19.013\|1.01\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 208, 208}, OCN=64, P=[1 x 1], OCV/CPU)\|6.009\|5.964\|1.01\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 416, 416}, OCN=64, S=[2 x 2], P=[1 x 1], OCV/CPU)\|20.169\|20.009\|1.01\| \|conv::Conv::(GFLOPS=1.659, K=[3 x 3], IN={1, 960, 10, 10}, OCN=960, PM=SAME, OCV/CPU)\|22.584\|23.423\|0.96\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, G=128, P=[1 x 1], BIAS, OCV/CPU)\|0.372\|0.504\|0.74\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, PM=SAME, OCV/CPU)\|5.426\|5.456\|0.99\| \|conv::Conv::(GFLOPS=1.675, K=[3 x 3], IN={1, 128, 68, 88}, OCN=128, BIAS, OCV/CPU)\|4.945\|5.221\|0.95\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, G=256, P=[1 x 1], BIAS, OCV/CPU)\|0.210\|0.261\|0.81\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, PM=SAME, OCV/CPU)\|5.720\|5.997\|0.95\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, G=512, P=[1 x 1], BIAS, OCV/CPU)\|0.149\|0.161\|0.93\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, P=[1 
x 1], BIAS, OCV/CPU)\|7.154\|7.225\|0.99\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, PM=SAME, OCV/CPU)\|7.184\|7.223\|0.99\| \|conv::Conv::(GFLOPS=1.766, K=[3 x 3], IN={1, 128, 70, 90}, OCN=128, BIAS, OCV/CPU)\|5.324\|5.343\|1.00\| \|conv::Conv::(GFLOPS=1.859, K=[3 x 3], IN={1, 128, 72, 92}, OCN=128, BIAS, OCV/CPU)\|5.114\|5.238\|0.98\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, G=1024, P=[1 x 1], BIAS, OCV/CPU)\|0.111\|0.121\|0.92\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, PM=SAME, OCV/CPU)\|25.907\|26.804\|0.97\| \|conv::Conv::(GFLOPS=1.954, K=[3 x 3], IN={1, 128, 74, 94}, OCN=128, BIAS, OCV/CPU)\|5.695\|5.654\|1.01\| \|conv::Conv::(GFLOPS=1.995, K=[9 x 9], IN={1, 3, 320, 400}, OCN=32, P=[4 x 4], BIAS, OCV/CPU)\|27.435\|27.566\|1.00\| \|conv::Conv::(GFLOPS=2.052, K=[3 x 3], IN={1, 128, 76, 96}, OCN=128, BIAS, OCV/CPU)\|6.944\|6.164\|1.13\| \|conv::Conv::(GFLOPS=2.100, K=[3 x 3], IN={1, 144, 75, 75}, OCN=144, PM=SAME, OCV/CPU)\|7.180\|6.717\|1.07\| \|conv::Conv::(GFLOPS=2.153, K=[3 x 3], IN={1, 128, 78, 98}, OCN=128, BIAS, OCV/CPU)\|6.817\|6.050\|1.13\| \|conv::Conv::(GFLOPS=2.156, K=[3 x 3], IN={1, 576, 19, 19}, OCN=576, PM=SAME, OCV/CPU)\|9.225\|8.660\|1.07\| \|conv::Conv::(GFLOPS=2.255, K=[3 x 3], IN={1, 128, 80, 100}, OCN=128, BIAS, OCV/CPU)\|7.496\|6.625\|1.13\| \|conv::Conv::(GFLOPS=2.719, K=[3 x 3], IN={1, 96, 256, 256}, OCN=96, S=[2 x 2], PM=SAME, OCV/CPU)\|35.520\|36.056\|0.99\| \|conv::Conv::(GFLOPS=3.319, K=[3 x 3], IN={1, 128, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|9.990\|9.702\|1.03\| \|conv::Conv::(GFLOPS=3.321, K=[3 x 3], IN={1, 64, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|10.517\|10.746\|0.98\| \|conv::Conv::(GFLOPS=3.398, K=[7 x 7], IN={1, 128, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)\|36.702\|36.731\|1.00\| \|conv::Conv::(GFLOPS=3.407, K=[3 x 3], IN={1, 512, 19, 19}, OCN=1024, D=[6 x 6], P=[6 x 6], BIAS, OCV/CPU)\|41.035\|38.280\|1.07\| 
\|conv::Conv::(GFLOPS=3.408, K=[3 x 3], IN={1, 256, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|10.981\|10.573\|1.04\| \|conv::Conv::(GFLOPS=4.247, K=[3 x 3], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)\|12.863\|12.384\|1.04\| \|conv::Conv::(GFLOPS=4.247, K=[5 x 5], IN={1, 144, 128, 128}, OCN=144, S=[2 x 2], PM=SAME, OCV/CPU)\|50.437\|54.088\|0.93\| \|conv::Conv::(GFLOPS=4.566, K=[7 x 7], IN={1, 172, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)\|50.650\|50.635\|1.00\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 256, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|14.696\|14.606\|1.01\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 512, 46, 46}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|16.201\|15.426\|1.05\| \|conv::Conv::(GFLOPS=4.994, K=[3 x 3], IN={1, 128, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|16.061\|14.292\|1.12\| \|conv::Conv::(GFLOPS=4.997, K=[3 x 3], IN={1, 64, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|17.743\|18.250\|0.97\| \|conv::Conv::(GFLOPS=5.780, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, S=[2 x 2], PM=SAME, OCV/CPU)\|77.909\|78.165\|1.00\| \|conv::Conv::(GFLOPS=6.116, K=[3 x 3], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)\|21.579\|21.879\|0.99\| \|conv::Conv::(GFLOPS=6.118, K=[3 x 3], IN={1, 144, 128, 128}, OCN=144, PM=SAME, OCV/CPU)\|20.424\|19.589\|1.04\| \|conv::Conv::(GFLOPS=6.637, K=[3 x 3], IN={1, 256, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|19.389\|19.461\|1.00\| \|conv::Conv::(GFLOPS=6.638, K=[3 x 3], IN={1, 128, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|21.319\|20.358\|1.05\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 150, 200}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|22.609\|21.826\|1.04\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 300, 300}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)\|25.497\|25.789\|0.99\| \|conv::Conv::(GFLOPS=6.814, K=[3 x 3], IN={1, 512, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|21.966\|22.108\|0.99\| \|conv::Conv::(GFLOPS=8.025, K=[3 x 3], IN={1, 1024, 19, 19}, OCN=1206, P=[1 x 1], BIAS, 
OCV/CPU)\|35.883\|33.470\|1.07\| \|conv::Conv::(GFLOPS=9.986, K=[3 x 3], IN={1, 512, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|31.041\|29.314\|1.06\| \|conv::Conv::(GFLOPS=9.987, K=[3 x 3], IN={1, 256, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|29.922\|28.145\|1.06\| \|conv::Conv::(GFLOPS=9.989, K=[3 x 3], IN={1, 128, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|31.624\|31.148\|1.02\| \|conv::Conv::(GFLOPS=9.993, K=[3 x 3], IN={1, 64, 368, 368}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)\|38.564\|39.164\|0.98\| \|conv::Conv::(GFLOPS=10.087, K=[3 x 3], IN={1, 576, 38, 50}, OCN=512, PM=SAME, BIAS, OCV/CPU)\|31.502\|30.269\|1.04\| \|conv::Conv::(GFLOPS=10.701, K=[3 x 3], IN={1, 512, 38, 38}, OCN=804, P=[1 x 1], BIAS, OCV/CPU)\|34.248\|34.589\|0.99\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 240, 64, 64}, OCN=240, PM=SAME, OCV/CPU)\|130.211\|134.120\|0.97\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)\|127.490\|132.874\|0.96\| \|conv::Conv::(GFLOPS=16.987, K=[5 x 5], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)\|199.834\|200.081\|1.00\| \|conv::Conv::(GFLOPS=23.122, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, PM=SAME, OCV/CPU)\|247.346\|247.523\|1.00\| ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. 
- [ ] The feature is well documented and sample code can be built with the project CMake ``` force_builders=Linux AVX2,Custom Win build_image:Custom Win=msvs2019 CPU_BASELINE:Custom Win=AVX512_SKX ```	2023-03-10 11:59:49 +03:00
Alexander Alekhin	9eb5e39ff3	dnn(tflite): fix wrong axis normalization	2023-02-21 21:20:37 +00:00
Alexander Alekhin	bdff0949bb	dnn(tflite): add 3rdparty flatbuffers with pre-generated schema	2023-02-21 16:06:19 +00:00
Zihao Mu	20dac7ea48	Merge pull request #23255 from zihaomu:fused_cuda_naryeltwise DNN: fuse conv+naryEletwise on CUDA backend.	2023-02-17 10:18:13 +00:00
Alexander Alekhin	58d8a2702a	Merge pull request #23243 from WanliZhong:accelerate_palm_det	2023-02-14 16:25:02 +00:00
Dmitry Kurtaev	76350cd30f	Merge pull request #23161 from dkurt:dnn_tflite TFLite models importer * initial commit * Refactor TFLiteImporter * Better FlatBuffers detection * Add permute before 4D->3D reshape * Track layers layout * TFLite Convolution2DTransposeBias layer * Skip TFLite tests without FlatBuffers * Fix check of FlatBuffers in tests. Add readNetFromTFLite from buffer * TFLite Max Unpooling test * Add skip for TFLite unpooling test * Revert DW convolution workaround * Fix ObjC bindings * Better errors handling * Regenerate TFLite schema using flatc * dnn(tflite): more checks, better logging * Checks for unimplemented fusion. Fix tests	2023-02-13 14:00:20 +00:00
Yuantao Feng	c2b7c1f13b	Merge pull request #23219 from fengyuentau:add_gelu Add GELU layer for vision transformers * add gelu and gelu approximation * drop setKernelParams	2023-02-10 18:03:29 +00:00
wanli	c8f5e228fc	release MUL and ADD operator on CUDA	2023-02-10 19:33:59 +08:00
Alexander Alekhin	96a45e842e	Merge pull request #23061 from WanliZhong:gemm_cuda DNN: make GEMM can be supported with transA and transB in CUDA	2023-02-09 00:06:32 +03:00
wanli	4718a4bf81	make GEMM can be supported with transA and transB in CUDA	2023-01-31 15:14:17 +08:00
Alexander Alekhin	f33598f55e	Merge branch 4.x	2023-01-28 17:31:32 +00:00
Alexander Alekhin	cd44aa0bb1	Merge pull request #23162 from zihaomu:issue_23151	2023-01-28 13:00:43 +00:00
zihaomu	f45a12439a	fix depth wise issue.	2023-01-28 11:41:00 +08:00

2275 Commits