opencv

mirror of https://github.com/opencv/opencv.git synced 2024-12-12 07:09:12 +08:00

Author	SHA1	Message	Date
Abduragim Shtanchaev	a3b3a589f9	Merge pull request #24322 from Abdurrahheem:ash/dev_einsum_ellips Ellipsis support added for Einsum layer #24322 This PR addresses issues not covered in #24037. Test cases for this patch are in opencv_extra PR [#1106](https://github.com/opencv/opencv_extra/pull/1106). Added: - [x] Broadcasting reduction "...ii ->...i" - [x] Lazy shape deduction "...ij, ...jk->...ik" Features to add: - [ ] Implicit output computation support: "bij,bjk ->" (output subscripts should be "bik") - [ ] Support for the CUDA backend - [ ] BatchWiseMultiply optimization - [ ] Performance test ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-10-24 16:47:00 +03:00
andrewerf	b44cb33d2f	Merge pull request #21066 from andrewerf:21052-openvino-native-onnx Native ONNX to Inference Engine backend #21066 Resolves #21052 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable - [ ] The feature is well documented and sample code can be built with the project CMake	2023-10-20 11:49:27 +03:00
Kumataro	6e4280ea81	Merge pull request #24372 from Kumataro:fix24369 Supporting protobuf v22 and later (with abseil-cpp/C++17) #24372 fix https://github.com/opencv/opencv/issues/24369 related https://github.com/opencv/opencv/issues/23791 1. This patch supports external protobuf v22 and later, which requires abseil-cpp and C++17. Even if the built-in protobuf is upgraded to v22 or later, the dependency on abseil-cpp and the C++17 requirement will remain. 2. Some Caffe tests require a patched protobuf, so this patch disables them. This patch was tested with the following libraries: - Protobuf: /usr/local/lib/libprotobuf.so (4.24.4) - abseil-cpp: YES (20230125) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-10-19 08:45:08 +03:00
Aser Atawya	240b245105	Merge pull request #24092 from Aser-Abdelfatah:GSoC_Support_GatherElements_ONNX GSoC Add ONNX Support for GatherElements #24092 Merge with: https://github.com/opencv/opencv_extra/pull/1082 Adds support for the ONNX operator GatherElements ([operator docs](https://github.com/onnx/onnx/blob/main/docs/Operators.md#GatherElements)). Tests were added to opencv_extra in pull request https://github.com/opencv/opencv_extra/pull/1082 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-10-18 10:41:47 +03:00
alexlyulkov	014e8485b5	Merge pull request #24367 from alexlyulkov:al/fixed-cumsum-inplace-flag Fixed CumSum layer inplace flag #24367 When exclusive is false: dst[i] = dst[i-1] + src[i]. When exclusive is true: dst[i] = dst[i-1] + src[i-1]. In exclusive mode, writing dst[i] in place would overwrite src[i] before it is read to compute dst[i+1], so the CumSum layer can run in place only when the exclusive flag is false.	2023-10-18 09:21:40 +03:00
Yuantao Feng	0507043a55	Merge pull request #24386 from fengyuentau:fix_dtype_nary_eltwise dnn: fix inconsistent input dtype for nary eltwise layers #24386 Resolves https://github.com/opencv/opencv/issues/24385 Merge with https://github.com/opencv/opencv_extra/pull/1107 Relates https://github.com/opencv/opencv/pull/24092#discussion_r1353964405 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-10-13 11:56:18 +03:00
Alexander Smorkalov	58285e5468	Merge pull request #24359 from asmorkalov:as/FastNeuralStyle_eccv16_tuning Tuned threshold for FastNeuralStyle_eccv16 test	2023-10-13 10:29:41 +03:00
Yuantao Feng	590f150d5e	dnn: hotfixes for fast gemm (#24315) * remove Conformance from test names * integrate neon optimization into default * quick fix: define CV_NEON_AARCH64 0 for non-NEON platforms * remove var batch that leads to memory leak * put neon code back to fast_gemm_kernels.simd * reorganize code to reduce duplicate code	2023-10-07 21:48:44 +03:00
Sean McBride	5fb3869775	Merge pull request #23109 from seanm:misc-warnings * Fixed clang -Wnewline-eof warnings * Fixed all trivial clang -Wextra-semi and -Wc++98-compat-extra-semi warnings * Removed trailing semi from various macros * Fixed various -Wunused-macros warnings * Fixed some trivial -Wdocumentation warnings * Fixed some -Wdocumentation-deprecated-sync warnings * Fixed incorrect indentation * Suppressed some clang warnings in 3rd party code * Fixed QRCodeEncoder::Params documentation. --------- Co-authored-by: Alexander Smorkalov <alexander.smorkalov@xperience.ai>	2023-10-06 13:33:21 +03:00
Dmitry Kurtaev	2c92eb3175	Enable more tests for OpenVINO 2023.0	2023-10-05 12:51:55 +03:00
Alexander Smorkalov	33d64d0491	Tuned threshold for FastNeuralStyle_eccv16 test for systems without AVX2.	2023-10-04 16:19:13 +03:00
alexlyulkov	9bd14d5417	Merge pull request #24353 from alexlyulkov:al/fixed-cumsum-layer Fixed CumSum dnn layer #24353 Fixes #20110 The algorithm had several errors, so I rewrote it. Also, the layer did not work with a non-constant axis tensor; this is now fixed. Enabled CumSum layer tests from the ONNX conformance suite.	2023-10-03 13:58:25 +03:00
Dmitry Kurtaev	c7ec0d599a	Merge pull request #23987 from dkurt:openvino_int8_backend OpenVINO backend for INT8 models #23987 ### Pull Request Readiness Checklist TODO: - [x] DetectionOutput layer (https://github.com/opencv/opencv/pull/24069) - [x] Less FP32 fallbacks (i.e. Sigmoid, eltwise sum) - [x] Accuracy, performance tests (https://github.com/opencv/opencv/pull/24039) - [x] Single layer tests (convolution) - [x] ~~Fixes for OpenVINO 2022.1 (https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/100334)~~ Performance results for object detection model `coco_efficientdet_lite0_v1_1.0_quant_2021_09_06.tflite`: \| backend \| performance (median time) \| \|---\|---\| \| OpenCV \| 77.42ms \| \| OpenVINO 2023.0 \| 10.90ms \| CPU: `11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz` Serialized model per-layer stats (note that Convolution should use `*_I8` primitives if they are quantized correctly): https://gist.github.com/dkurt/7772bbf1907035441bb5454f19f0feef --- See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-09-28 16:24:43 +03:00
Alexander Smorkalov	b8d4ac589d	Merge pull request #24334 from fengyuentau:fix_24319 dnn onnx: fix not-found constant indices for Gather if shared	2023-09-28 13:08:26 +03:00
fengyuentau	7fa0493ca0	init commit	2023-09-28 11:50:21 +08:00
Yuantao Feng	307324f4ac	Merge pull request #24283 from fengyuentau:halide_tests dnn: merge tests from test_halide_layers to test_backends #24283 Context: https://github.com/opencv/opencv/pull/24231#pullrequestreview-1628649980 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-09-27 14:09:47 +03:00
Yuantao Feng	bb171a0c05	dnn: expand refactor with cv::broadcast for onnx models (#24295) * add expand impl with cv::broadcast * remove expandMid * deduce shape from -1 * add constant folding * handle input constant; handle input constant 1d * add expand conformance tests; add checks to disallow shapes with negative values; add early copy for unchanged total elements * fix ExpandSubgraph * dummy commit to trigger build * dummy commit to trigger build 1 * remove conformance from test names	2023-09-27 09:28:52 +03:00
Abduragim Shtanchaev	865e7cacca	Merge pull request #24037 from Abdurrahheem:ash/dev_einsum Add Support for Einsum Layer #24037 ### This PR adds support for the [Einsum Layer](https://pytorch.org/docs/stable/generated/torch.einsum.html) (in progress). This PR is currently not to be merged, only reviewed. Test cases are located in opencv_extra PR [#1090](https://github.com/opencv/opencv_extra/pull/1090). DONE: - [x] 2-5D GMM support added - [x] Matrix transpose support added - [x] Reduction type compute 'ij->j' - [x] 2nd shape computation - during forward Next PRs: - [ ] Broadcasting reduction "...ii ->...i" - [ ] Add lazy shape deduction. "...ij, ...jk->...ik" - [ ] Add implicit output computation support. "bij,bjk ->" (output subscripts should be "bik") - [ ] Add support for CUDA backend - [ ] BatchWiseMultiply optimize Later in 5.x version (requires support for 1D matrices): - [ ] Add 1D vector multiplication support - [ ] Inner product "i, i" (problems with 1D shapes) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-09-22 11:25:02 +03:00
Yuantao Feng	8a96e34e33	dnn: add gemm_layer in place of fully_connected_layer for onnx models (#23897) * first commit * turned C from input to constant; force C constant in impl; better handling 0d/1d cases * integrate with gemm from ficus nn * fix const inputs * adjust threshold for int8 tryQuantize * adjust threshold for int8 quantized 2 * support batched gemm and matmul; tune threshold for rcnn_ilsvrc13; update googlenet * add gemm perf against innerproduct * add perf tests for innerproduct with bias * fix perf * add memset * renamings for next step * add dedicated perf gemm * add innerproduct in perf_gemm * remove gemm and innerproduct perf tests from perf_layer * add perf cases for vit sizes; prepack constants * remove batched gemm; fix wrong trans; optimize KC * remove prepacking for const A; several fixes for const B prepacking * add todos and gemm expression * add optimized branch for avx/avx2 * trigger build * update macros and signature * update signature * fix macro * fix bugs for neon aarch64 & x64 * add backends: cuda, cann, inf_ngraph and vkcom * fix cuda backend * test commit for cuda * test cuda backend * remove debug message from cuda backend * use cpu dispatcher * fix neon macro undef in dispatcher * fix dispatcher * fix inner kernel for neon aarch64 * fix compiling issue on armv7; try fixing accuracy issue on other platforms * broadcast C with beta multiplied; improve func namings * fix bug for avx and avx2 * put all platform-specific kernels in dispatcher * fix typos * attempt to fix compile issues on x64 * run old gemm when neon, avx, avx2 are all not available; add kernel for armv7 neon * fix typo * quick fix: add macros for pack4 * quick fix: use vmlaq_f32 for armv7 * quick fix for missing macro of fast gemm pack f32 4 * disable conformance tests when optimized branches are not supported * disable perf tests when optimized branches are not supported * decouple cv_try_neon and cv_neon_aarch64 * drop googlenet_2023; add fastGemmBatched * fix step in fastGemmBatched * cpu: fix initialization of b; gpu: support batch * quick followup fix for cuda * add default kernels * quick followup fix to avoid macro redef * optimized kernels for lasx * resolve mis-alignment; remove comments * tune performance for x64 platform * tune performance for neon aarch64 * tune for armv7 * comment out time-consuming tests * quick follow-up fix	2023-09-20 00:53:34 +03:00
Dmitry Kurtaev	c5edd20354	Higher threshold for FasterRCNN_vgg16	2023-09-14 13:11:53 +03:00
Alexander Smorkalov	5350fba319	Merge pull request #24128 from CSBVision:CSBVision-patch-1 Fix bug at blobFromImagesWithParams	2023-09-06 16:20:37 +03:00
CSBVision	674c618471	Update dnn_utils.cpp	2023-09-06 10:01:07 +03:00
Dmitry Kurtaev	178fdbbda8	Merge pull request #24196 from dkurt:ov_backend_cleanups Use ngraph::Output in OpenVINO backend wrapper #24196 ### Pull Request Readiness Checklist resolves https://github.com/opencv/opencv/issues/24102 * Use `ngraph::Output<ngraph::Node>` instead of `std::shared_ptr<ngraph::Node>` as a backend wrapper. This allows access to multi-output nodes: `588ddf1b18/modules/dnn/src/net_openvino.cpp (L501-L504)` * All layers can be customized with OpenVINO >= 2022.1. The nGraph reference code used for the default layer implementation also does not require the CPU plugin (this can be tested by commenting out the CPU plugin in `/opt/intel/openvino/runtime/lib/intel64/plugins.xml`). * Correct inference if only intermediate blobs are requested. See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-09-05 18:08:28 +03:00
Wanli	84f32bbb24	increase Fast Math threshold	2023-09-05 14:03:54 +08:00
Dmitry Kurtaev	a0debc3a9a	Enable OpenVINO max pooling with indices since 2022.1	2023-08-23 10:39:38 +03:00
Dmitry Kurtaev	8ad5eb521a	Merge pull request #24120 from dkurt:actualize_dnn_links OCL_FP16 MatMul with large batch * Workaround FP16 MatMul with large batch * Fix OCL reinitialization * Higher thresholds for INT8 quantization * Try fix gemm_buffer_NT for half (columns) * Fix GEMM by rows * Add batch dimension to InnerProduct layer test * Fix Test_ONNX_conformance.Layer_Test/test_basic_conv_with_padding * Batch 16 * Replace all vload4 * Version suffix for MobileNetSSD_deploy Caffe model	2023-08-16 15:46:11 +03:00
Dmitry Kurtaev	4b8aeb1129	Merge pull request #24039 from dkurt:tflite_test_backends TFLite models on different backends (tests and improvements) #24039 ### Pull Request Readiness Checklist * MaxUnpooling with OpenVINO * Fully connected with transposed inputs/weights with OpenVINO * Enable backends tests for TFLite (related to https://github.com/opencv/opencv/issues/23992#issuecomment-1640691722) * Increase existing tests thresholds See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-08-04 11:28:51 +03:00
Dmitry Kurtaev	96f23e3da1	Merge pull request #24080 from dkurt:dnn_cuda_layers Resolve uncovered CUDA dnn layer #24080 ### Pull Request Readiness Checklist * Gelu activation layer on CUDA * Try to relax GEMM from ONNX resolves https://github.com/opencv/opencv/issues/24064 See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-08-03 09:13:42 +03:00
Dmitry Kurtaev	0245c0cd10	Merge pull request #24072 from dkurt:openvino_cpu_tests Remove legacy nGraph logic #24072 ### Pull Request Readiness Checklist TODO: - [x] Test with OpenVINO 2021.4 (tested locally) See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-08-02 14:39:11 +03:00
Dmitry Kurtaev	195aad8e6a	Merge pull request #24069 from dkurt:openvino_detection_layer DetectionOutput layer on OpenVINO without limitations #24069 ### Pull Request Readiness Checklist required for https://github.com/opencv/opencv/pull/23987 See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-08-02 14:28:47 +03:00
Dmitry Kurtaev	677a28fd2a	Merge pull request #24056 from dkurt:eltwise_prelu PReLU with element-wise scales #24056 ### Pull Request Readiness Checklist resolves https://github.com/opencv/opencv/issues/24051 See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-07-27 16:36:40 +03:00
Alexander Smorkalov	d96ff496b4	Increase eps for Test_Torch_nets.FastNeuralStyle_accuracy to prevent sporadic test failures with CUDA.	2023-07-21 13:51:03 +03:00
Dmitry Kurtaev	e41ba90f17	Merge pull request #24004 from dkurt:tflite_new_layers [TFLite] Pack layer and other fixes for SSD from Keras #24004 ### Pull Request Readiness Checklist resolves https://github.com/opencv/opencv/issues/23992 Merge with extra: https://github.com/opencv/opencv_extra/pull/1076 See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-07-21 09:13:37 +03:00
Dmitry Kurtaev	6909fffde1	Consider half pixel mode in ONNX resize	2023-06-14 14:21:28 +03:00
Abduragim Shtanchaev	6b53fe8f7b	Merge pull request #23746 from Abdurrahheem:ash/graph_simplifier Assertion Fix in Split Layer #23746 ### Pull Request Readiness Checklist This PR fixes the issue reported in [#23663](https://github.com/opencv/opencv/issues/23663). Merge with https://github.com/opencv/opencv_extra/pull/1067 See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-06-07 16:01:42 +03:00
Yuantao Feng	f07b01cc34	Merge pull request #23655 from fengyuentau:qlinearsoftmax Support ONNX operator QLinearSoftmax in dnn #23655 Resolves https://github.com/opencv/opencv/issues/23636. Merge with https://github.com/opencv/opencv_extra/pull/1064. This PR maps the QLinearSoftmax (from com.microsoft domain) to SoftmaxInt8 in dnn along with some speed optimization. Todo: - [x] support QLinearSoftmax with opset = 13 - [x] add model and test data for QLinearSoftmax with opset = 13 - [x] ensure all models have dims >= 3. - [x] add the script to generate model and test data ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-05-25 13:35:58 +03:00
Alexander Smorkalov	b122a4b436	Merge pull request #23646 from dkurt:dnn_ie_region_fix Fix Region layer with OpenVINO in case of different width/height	2023-05-22 16:22:50 +03:00
Zihao Mu	5025f29378	speed up vulkan dnn, and support ios and apple m1 chip. (#23349 )	2023-05-18 20:02:27 +03:00
Dmitry Kurtaev	af14780526	Fix Region layer with OpenVINO in case of different width/height	2023-05-18 17:45:30 +03:00
Abduragim Shtanchaev	d2143bcd44	Merge pull request #23614 from Abdurrahheem:lstm_layout_attribute LSTM ONNX Layout Attribute Support #23614 ### Explanation This PR contains the changes needed to support the `layout` attribute. This attribute exists in [ONNX](https://github.com/onnx/onnx/blob/main/docs/Operators.md#lstm) and [Torch](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html#lstm) (in Torch it is named `batch_first=True`). With `layout = 1`, the input to the LSTM layer is expected to have the batch dimension first, `[batch_size, sequence_length, features]`. With `layout = 0` (the default), the input is `[sequence_length, batch_size, features]`. ### Test Data Test data and the data generator for this PR are located here: [#1063](https://github.com/opencv/opencv_extra/pull/1063) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-05-17 22:46:56 +03:00
Zihao Mu	5229312ad2	Merge pull request #22275 from zihaomu:fp16_support_conv DNN: FP16 support on Convolution 2D #22275 ## FP16 support on ARM platform This PR adds FP16 support to the Convolution layer. For now, FP16 is supported only on ARM AArch64. In addition to FP16, this patch also adds the `seperateIm2col` optimization. ## How to use FP16 to speed up convolution? ``` Net net = readNet(modelPath); net.setPreferableTarget(DNN_TARGET_CPU_FP16); net.setInput(blob); Mat output = net.forward(); ``` ### TODO List \| Task \| Status \| Remarks \| \|:-------:\|:--------:\|:------------:\| \| Convolution 2D FP16 \| ✔️ \| Done \| \| Winograd FP16 \| \| Deferred to the next PR, because the current change has already reached 2k lines. \| \| Accuracy Test \| ✔️ \| Done \| \| Performance Test \| ✔️ \| Done \| \| Compiler bug \| ✔️ \| Done \| ### Speed Test for FP16. Tested on an Apple M1 chip, 4 threads. \| Model Name \| FP32 (Conv+Wino) \| Conv(FP16) + Wino(FP32) \| \|:-------:\|:--------:\|:------------:\| \| ResNet-50 \| 26.0 ms \| 18.05 ms (25% speed up)\| \| MobileNet V2 \| 4.17 ms \| 3.09 ms (29% speed up) \| ### Speed Test for `seperateIm2col` trick on X86. Tested on AMD 5600X, 12 threads. 
\| Model Name \| 4.x \| Patch \| \|:-------:\|:--------:\|:------------:\| \| MobileNet V2 \| 5.6 ms \| 3.0 ms (46% speed up) \| ### Performance Test #### Performance Test on X86 platform: AMD 5600X, with `--perf_threads=1` \|Name of Test\|4.x\|patch\|patch vs 4.x (x-factor)\| \|---\|:-:\|:-:\|:-:\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 19}, OCN=2, G=2, S=2, P=(1, 1), BIAS, OCV/CPU)\|0.001\|0.001\|1.00\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 25}, OCN=2, G=2, P=(2, 2), PM=SAME, OCV/CPU)\|0.001\|0.001\|1.03\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 6, 10}, OCN=6, PM=VALID, BIAS, OCV/CPU)\|0.001\|0.001\|0.92\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 4, 9, 10, 10}, OCN=4, S=[1 x 1 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)\|0.002\|0.003\|0.95\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 8, 1, 10, 10}, OCN=8, G=8, P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)\|0.006\|0.006\|1.00\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 3 x 3], IN={1, 2, 19, 19, 19}, OCN=2, G=2, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)\|0.045\|0.033\|1.39\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 4 x 2], IN={1, 4, 8, 10, 10}, OCN=4, G=4, S=[1 x 2 x 1], BIAS, OCV/CPU)\|0.011\|0.009\|1.17\| \|conv3d::Conv3D::(GFLOPS=0.001, K=[3 x 3 x 3], IN={1, 2, 25, 19, 19}, OCN=2, G=2, S=[1 x 2 x 2], P=(2, 2) x (2, 2) x (2, 2), PM=SAME, OCV/CPU)\|0.109\|0.078\|1.39\| \|conv3d::Conv3D::(GFLOPS=0.002, K=[3 x 1 x 4], IN={1, 14, 5, 10, 10}, OCN=14, PM=SAME, OCV/CPU)\|0.040\|0.042\|0.94\| \|conv3d::Conv3D::(GFLOPS=0.006, K=[5 x 5 x 5], IN={1, 4, 50, 19, 19}, OCN=4, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)\|0.326\|0.342\|0.95\| \|conv3d::Conv3D::(GFLOPS=0.027, K=[3 x 3 x 3], IN={1, 6, 10, 38, 50}, OCN=6, PM=VALID, BIAS, OCV/CPU)\|0.580\|0.589\|0.99\| \|conv3d::Conv3D::(GFLOPS=0.030, K=[5 x 5 x 5], IN={1, 6, 19, 19, 19}, OCN=6, G=2, 
OCV/CPU)\|1.293\|1.382\|0.94\| \|conv3d::Conv3D::(GFLOPS=0.045, K=[7 x 7 x 7], IN={1, 2, 38, 38, 38}, OCN=2, S=[1 x 2 x 1], OCV/CPU)\|3.590\|3.710\|0.97\| \|conv3d::Conv3D::(GFLOPS=0.053, K=[3 x 3 x 3], IN={1, 10, 98, 10, 10}, OCN=10, PM=SAME, OCV/CPU)\|1.120\|1.191\|0.94\| \|conv3d::Conv3D::(GFLOPS=0.071, K=[7 x 7 x 7], IN={1, 6, 15, 19, 19}, OCN=6, S=[2 x 1 x 1], P=(3, 3) x (3, 3) x (3, 3), PM=SAME, BIAS, OCV/CPU)\|2.576\|2.872\|0.90\| \|conv3d::Conv3D::(GFLOPS=0.093, K=[5 x 5 x 5], IN={1, 4, 40, 75, 75}, OCN=4, S=[2 x 2 x 2], OCV/CPU)\|4.599\|4.670\|0.98\| \|conv3d::Conv3D::(GFLOPS=0.116, K=[5 x 5 x 5], IN={1, 2, 21, 75, 100}, OCN=2, BIAS, OCV/CPU)\|9.230\|9.582\|0.96\| \|conv3d::Conv3D::(GFLOPS=1.267, K=[5 x 5 x 5], IN={1, 3, 75, 75, 100}, OCN=3, PM=SAME, BIAS, OCV/CPU)\|65.946\|69.381\|0.95\| \|conv3d::Conv3D::(GFLOPS=1.343, K=[3 x 3 x 3], IN={1, 11, 9, 150, 200}, OCN=11, PM=VALID, BIAS, OCV/CPU)\|18.915\|19.289\|0.98\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 512, 26, 26}, OCN=256, OCV/CPU)\|1.404\|1.457\|0.96\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 1024, 13, 13}, OCN=512, OCV/CPU)\|2.060\|1.501\|1.37\| \|conv::Conv::(GFLOPS=0.178, K=[1 x 1], IN={1, 256, 52, 52}, OCN=128, OCV/CPU)\|1.409\|1.464\|0.96\| \|conv::Conv::(GFLOPS=0.210, K=[1 x 1], IN={1, 576, 38, 50}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|1.793\|1.838\|0.98\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 128, 56, 56}, OCN=32, P=[1 x 1], OCV/CPU)\|1.207\|1.199\|1.01\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 256, 14, 14}, OCN=256, P=[1 x 1], OCV/CPU)\|1.277\|1.275\|1.00\| \|conv::Conv::(GFLOPS=0.280, K=[1 x 1], IN={1, 576, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|2.319\|2.370\|0.98\| \|conv::Conv::(GFLOPS=0.302, K=[3 x 3], IN={1, 64, 64, 64}, OCN=64, PM=SAME, OCV/CPU)\|1.351\|1.346\|1.00\| \|conv::Conv::(GFLOPS=0.357, K=[1 x 1], IN={1, 64, 208, 208}, OCN=64, OCV/CPU)\|3.520\|3.612\|0.97\| \|conv::Conv::(GFLOPS=0.420, K=[3 x 3], IN={1, 96, 38, 50}, OCN=128, PM=SAME, BIAS, 
OCV/CPU)\|1.876\|1.880\|1.00\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 128, 40, 40}, OCN=128, PM=SAME, OCV/CPU)\|1.981\|1.995\|0.99\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 256, 20, 20}, OCN=256, PM=SAME, OCV/CPU)\|2.620\|2.627\|1.00\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 512, 10, 10}, OCN=512, PM=SAME, OCV/CPU)\|4.202\|4.123\|1.02\| \|conv::Conv::(GFLOPS=0.561, K=[3 x 3], IN={1, 128, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|2.429\|2.445\|0.99\| \|conv::Conv::(GFLOPS=0.624, K=[3 x 3], IN={1, 128, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|2.591\|2.576\|1.01\| \|conv::Conv::(GFLOPS=0.701, K=[3 x 3], IN={1, 128, 38, 50}, OCN=160, PM=SAME, BIAS, OCV/CPU)\|3.005\|2.998\|1.00\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 64, 104, 104}, OCN=64, P=[1 x 1], OCV/CPU)\|3.515\|3.532\|1.00\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 128, 52, 52}, OCN=128, P=[1 x 1], OCV/CPU)\|3.115\|3.134\|0.99\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 256, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)\|3.937\|3.899\|1.01\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 512, 13, 13}, OCN=512, P=[1 x 1], OCV/CPU)\|5.533\|5.471\|1.01\| \|conv::Conv::(GFLOPS=0.830, K=[3 x 3], IN={1, 64, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|3.472\|3.464\|1.00\| \|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 192, 38, 38}, OCN=192, PM=SAME, OCV/CPU)\|4.302\|4.322\|1.00\| \|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 384, 19, 19}, OCN=384, PM=SAME, OCV/CPU)\|6.100\|6.035\|1.01\| \|conv::Conv::(GFLOPS=1.022, K=[3 x 3], IN={1, 576, 19, 19}, OCN=273, PM=SAME, BIAS, OCV/CPU)\|6.580\|6.484\|1.01\| \|conv::Conv::(GFLOPS=1.112, K=[3 x 3], IN={1, 512, 10, 10}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)\|9.741\|9.634\|1.01\| \|conv::Conv::(GFLOPS=1.181, K=[3 x 3], IN={1, 64, 160, 200}, OCN=128, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)\|10.131\|10.156\|1.00\| \|conv::Conv::(GFLOPS=1.182, K=[3 x 3], IN={1, 32, 320, 400}, OCN=64, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)\|12.391\|12.350\|1.00\| 
\|conv::Conv::(GFLOPS=1.195, K=[9 x 9], IN={1, 32, 240, 320}, OCN=3, P=[4 x 4], BIAS, OCV/CPU)\|91.074\|87.893\|1.04\| \|conv::Conv::(GFLOPS=1.196, K=[3 x 3], IN={1, 384, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)\|5.903\|5.903\|1.00\| \|conv::Conv::(GFLOPS=1.210, K=[3 x 3], IN={1, 32, 256, 256}, OCN=32, PM=SAME, OCV/CPU)\|6.890\|6.794\|1.01\| \|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 64, 75, 75}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|5.160\|5.131\|1.01\| \|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 96, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|4.970\|5.036\|0.99\| \|conv::Conv::(GFLOPS=1.248, K=[3 x 3], IN={1, 256, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|5.045\|5.015\|1.01\| \|conv::Conv::(GFLOPS=1.258, K=[3 x 3], IN={1, 1280, 10, 10}, OCN=546, PM=SAME, BIAS, OCV/CPU)\|11.583\|11.343\|1.02\| \|conv::Conv::(GFLOPS=1.261, K=[3 x 3], IN={1, 192, 38, 50}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|5.348\|5.320\|1.01\| \|conv::Conv::(GFLOPS=1.416, K=[3 x 3], IN={1, 128, 62, 82}, OCN=128, BIAS, OCV/CPU)\|5.357\|5.396\|0.99\| \|conv::Conv::(GFLOPS=1.500, K=[3 x 3], IN={1, 128, 64, 84}, OCN=128, BIAS, OCV/CPU)\|6.050\|6.006\|1.01\| \|conv::Conv::(GFLOPS=1.586, K=[3 x 3], IN={1, 128, 66, 86}, OCN=128, BIAS, OCV/CPU)\|5.952\|5.953\|1.00\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 26, 26}, OCN=512, P=[1 x 1], OCV/CPU)\|8.014\|8.014\|1.00\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 52, 52}, OCN=512, S=[2 x 2], P=[1 x 1], OCV/CPU)\|12.472\|12.577\|0.99\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 13, 13}, OCN=1024, P=[1 x 1], OCV/CPU)\|10.803\|10.655\|1.01\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 26, 26}, OCN=1024, S=[2 x 2], P=[1 x 1], OCV/CPU)\|18.429\|13.405\|1.37\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 104, 104}, OCN=128, P=[1 x 1], OCV/CPU)\|6.659\|6.647\|1.00\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 208, 208}, OCN=128, S=[2 x 2], P=[1 x 1], OCV/CPU)\|14.192\|13.819\|1.03\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 
3], IN={1, 128, 52, 52}, OCN=256, P=[1 x 1], OCV/CPU)\|6.045\|6.068\|1.00\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 104, 104}, OCN=256, S=[2 x 2], P=[1 x 1], OCV/CPU)\|12.742\|12.828\|0.99\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 208, 208}, OCN=64, P=[1 x 1], OCV/CPU)\|8.046\|7.773\|1.04\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 416, 416}, OCN=64, S=[2 x 2], P=[1 x 1], OCV/CPU)\|17.440\|17.192\|1.01\| \|conv::Conv::(GFLOPS=1.659, K=[3 x 3], IN={1, 960, 10, 10}, OCN=960, PM=SAME, OCV/CPU)\|15.418\|14.972\|1.03\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, G=128, P=[1 x 1], BIAS, OCV/CPU)\|0.430\|0.430\|1.00\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, PM=SAME, OCV/CPU)\|6.692\|6.663\|1.00\| \|conv::Conv::(GFLOPS=1.675, K=[3 x 3], IN={1, 128, 68, 88}, OCN=128, BIAS, OCV/CPU)\|6.350\|6.347\|1.00\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, G=256, P=[1 x 1], BIAS, OCV/CPU)\|0.267\|0.265\|1.01\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, PM=SAME, OCV/CPU)\|7.755\|7.558\|1.03\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, G=512, P=[1 x 1], BIAS, OCV/CPU)\|0.203\|0.202\|1.00\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|10.663\|10.576\|1.01\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, PM=SAME, OCV/CPU)\|10.827\|10.614\|1.02\| \|conv::Conv::(GFLOPS=1.766, K=[3 x 3], IN={1, 128, 70, 90}, OCN=128, BIAS, OCV/CPU)\|7.049\|6.947\|1.01\| \|conv::Conv::(GFLOPS=1.859, K=[3 x 3], IN={1, 128, 72, 92}, OCN=128, BIAS, OCV/CPU)\|6.900\|6.901\|1.00\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, G=1024, P=[1 x 1], BIAS, OCV/CPU)\|0.165\|0.165\|1.00\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, PM=SAME, OCV/CPU)\|17.953\|17.251\|1.04\| \|conv::Conv::(GFLOPS=1.954, K=[3 x 3], IN={1, 128, 74, 
94}, OCN=128, BIAS, OCV/CPU)\|7.430\|7.320\|1.01\| \|conv::Conv::(GFLOPS=1.995, K=[9 x 9], IN={1, 3, 320, 400}, OCN=32, P=[4 x 4], BIAS, OCV/CPU)\|22.187\|21.705\|1.02\| \|conv::Conv::(GFLOPS=2.052, K=[3 x 3], IN={1, 128, 76, 96}, OCN=128, BIAS, OCV/CPU)\|8.349\|8.126\|1.03\| \|conv::Conv::(GFLOPS=2.100, K=[3 x 3], IN={1, 144, 75, 75}, OCN=144, PM=SAME, OCV/CPU)\|8.273\|8.297\|1.00\| \|conv::Conv::(GFLOPS=2.153, K=[3 x 3], IN={1, 128, 78, 98}, OCN=128, BIAS, OCV/CPU)\|8.169\|8.094\|1.01\| \|conv::Conv::(GFLOPS=2.156, K=[3 x 3], IN={1, 576, 19, 19}, OCN=576, PM=SAME, OCV/CPU)\|13.602\|13.359\|1.02\| \|conv::Conv::(GFLOPS=2.255, K=[3 x 3], IN={1, 128, 80, 100}, OCN=128, BIAS, OCV/CPU)\|8.633\|8.584\|1.01\| \|conv::Conv::(GFLOPS=2.719, K=[3 x 3], IN={1, 96, 256, 256}, OCN=96, S=[2 x 2], PM=SAME, OCV/CPU)\|29.339\|28.897\|1.02\| \|conv::Conv::(GFLOPS=3.319, K=[3 x 3], IN={1, 128, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|13.000\|12.920\|1.01\| \|conv::Conv::(GFLOPS=3.321, K=[3 x 3], IN={1, 64, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|14.262\|13.319\|1.07\| \|conv::Conv::(GFLOPS=3.398, K=[7 x 7], IN={1, 128, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)\|27.453\|27.253\|1.01\| \|conv::Conv::(GFLOPS=3.407, K=[3 x 3], IN={1, 512, 19, 19}, OCN=1024, D=[6 x 6], P=[6 x 6], BIAS, OCV/CPU)\|32.052\|27.269\|1.18\| \|conv::Conv::(GFLOPS=3.408, K=[3 x 3], IN={1, 256, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|15.363\|15.208\|1.01\| \|conv::Conv::(GFLOPS=4.247, K=[3 x 3], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)\|18.543\|18.434\|1.01\| \|conv::Conv::(GFLOPS=4.247, K=[5 x 5], IN={1, 144, 128, 128}, OCN=144, S=[2 x 2], PM=SAME, OCV/CPU)\|39.114\|37.954\|1.03\| \|conv::Conv::(GFLOPS=4.566, K=[7 x 7], IN={1, 172, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)\|36.271\|36.972\|0.98\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 256, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|19.262\|19.427\|0.99\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 512, 46, 46}, OCN=256, 
P=[1 x 1], BIAS, OCV/CPU)\|19.298\|19.349\|1.00\| \|conv::Conv::(GFLOPS=4.994, K=[3 x 3], IN={1, 128, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|20.261\|19.847\|1.02\| \|conv::Conv::(GFLOPS=4.997, K=[3 x 3], IN={1, 64, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|21.867\|21.525\|1.02\| \|conv::Conv::(GFLOPS=5.780, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, S=[2 x 2], PM=SAME, OCV/CPU)\|51.756\|49.979\|1.04\| \|conv::Conv::(GFLOPS=6.116, K=[3 x 3], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)\|28.133\|27.060\|1.04\| \|conv::Conv::(GFLOPS=6.118, K=[3 x 3], IN={1, 144, 128, 128}, OCN=144, PM=SAME, OCV/CPU)\|25.035\|24.980\|1.00\| \|conv::Conv::(GFLOPS=6.637, K=[3 x 3], IN={1, 256, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|25.858\|25.821\|1.00\| \|conv::Conv::(GFLOPS=6.638, K=[3 x 3], IN={1, 128, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|27.313\|27.149\|1.01\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 150, 200}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|28.219\|28.111\|1.00\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 300, 300}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)\|46.025\|46.674\|0.99\| \|conv::Conv::(GFLOPS=6.814, K=[3 x 3], IN={1, 512, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|30.220\|29.446\|1.03\| \|conv::Conv::(GFLOPS=8.025, K=[3 x 3], IN={1, 1024, 19, 19}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)\|49.410\|48.708\|1.01\| \|conv::Conv::(GFLOPS=9.986, K=[3 x 3], IN={1, 512, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|38.203\|38.001\|1.01\| \|conv::Conv::(GFLOPS=9.987, K=[3 x 3], IN={1, 256, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|39.961\|39.021\|1.02\| \|conv::Conv::(GFLOPS=9.989, K=[3 x 3], IN={1, 128, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|48.685\|47.075\|1.03\| \|conv::Conv::(GFLOPS=9.993, K=[3 x 3], IN={1, 64, 368, 368}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)\|75.114\|72.586\|1.03\| \|conv::Conv::(GFLOPS=10.087, K=[3 x 3], IN={1, 576, 38, 50}, OCN=512, PM=SAME, BIAS, OCV/CPU)\|41.222\|41.144\|1.00\| \|conv::Conv::(GFLOPS=10.701, 
K=[3 x 3], IN={1, 512, 38, 38}, OCN=804, P=[1 x 1], BIAS, OCV/CPU)\|46.220\|46.353\|1.00\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 240, 64, 64}, OCN=240, PM=SAME, OCV/CPU)\|98.201\|98.771\|0.99\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)\|100.106\|96.971\|1.03\| \|conv::Conv::(GFLOPS=16.987, K=[5 x 5], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)\|146.977\|140.445\|1.05\| \|conv::Conv::(GFLOPS=23.122, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, PM=SAME, OCV/CPU)\|198.618\|194.665\|1.02\| #### Performance test on ARM platform: Apple M1, with `-perf_threads=1`, min time (ms) \|Name of Test\|4.x\|patch\|patch vs 4.x (x-factor)\| \|---\|:-:\|:-:\|:-:\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 19}, OCN=2, G=2, S=2, P=(1, 1), BIAS, OCV/CPU)\|0.001\|0.001\|1.07\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 25}, OCN=2, G=2, P=(2, 2), PM=SAME, OCV/CPU)\|0.001\|0.001\|1.10\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 6, 10}, OCN=6, PM=VALID, BIAS, OCV/CPU)\|0.002\|0.002\|0.97\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 4, 9, 10, 10}, OCN=4, S=[1 x 1 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)\|0.003\|0.003\|0.84\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 8, 1, 10, 10}, OCN=8, G=8, P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)\|0.009\|0.009\|1.00\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 3 x 3], IN={1, 2, 19, 19, 19}, OCN=2, G=2, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)\|0.027\|0.030\|0.90\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 4 x 2], IN={1, 4, 8, 10, 10}, OCN=4, G=4, S=[1 x 2 x 1], BIAS, OCV/CPU)\|0.008\|0.007\|1.07\| \|conv3d::Conv3D::(GFLOPS=0.001, K=[3 x 3 x 3], IN={1, 2, 25, 19, 19}, OCN=2, G=2, S=[1 x 2 x 2], P=(2, 2) x (2, 2) x (2, 2), PM=SAME, OCV/CPU)\|0.066\|0.072\|0.91\| \|conv3d::Conv3D::(GFLOPS=0.002, K=[3 x 1 x 4], IN={1, 14, 5, 10, 10}, OCN=14, PM=SAME, OCV/CPU)\|0.090\|0.054\|1.68\| \|conv3d::Conv3D::(GFLOPS=0.006, K=[5 x 5 x 5], IN={1, 4, 
50, 19, 19}, OCN=4, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)\|0.328\|0.409\|0.80\| \|conv3d::Conv3D::(GFLOPS=0.027, K=[3 x 3 x 3], IN={1, 6, 10, 38, 50}, OCN=6, PM=VALID, BIAS, OCV/CPU)\|0.659\|0.697\|0.95\| \|conv3d::Conv3D::(GFLOPS=0.030, K=[5 x 5 x 5], IN={1, 6, 19, 19, 19}, OCN=6, G=2, OCV/CPU)\|1.266\|1.403\|0.90\| \|conv3d::Conv3D::(GFLOPS=0.045, K=[7 x 7 x 7], IN={1, 2, 38, 38, 38}, OCN=2, S=[1 x 2 x 1], OCV/CPU)\|3.550\|4.145\|0.86\| \|conv3d::Conv3D::(GFLOPS=0.053, K=[3 x 3 x 3], IN={1, 10, 98, 10, 10}, OCN=10, PM=SAME, OCV/CPU)\|1.188\|1.375\|0.86\| \|conv3d::Conv3D::(GFLOPS=0.071, K=[7 x 7 x 7], IN={1, 6, 15, 19, 19}, OCN=6, S=[2 x 1 x 1], P=(3, 3) x (3, 3) x (3, 3), PM=SAME, BIAS, OCV/CPU)\|2.683\|3.236\|0.83\| \|conv3d::Conv3D::(GFLOPS=0.093, K=[5 x 5 x 5], IN={1, 4, 40, 75, 75}, OCN=4, S=[2 x 2 x 2], OCV/CPU)\|4.491\|5.501\|0.82\| \|conv3d::Conv3D::(GFLOPS=0.116, K=[5 x 5 x 5], IN={1, 2, 21, 75, 100}, OCN=2, BIAS, OCV/CPU)\|8.916\|10.181\|0.88\| \|conv3d::Conv3D::(GFLOPS=1.267, K=[5 x 5 x 5], IN={1, 3, 75, 75, 100}, OCN=3, PM=SAME, BIAS, OCV/CPU)\|69.995\|72.296\|0.97\| \|conv3d::Conv3D::(GFLOPS=1.343, K=[3 x 3 x 3], IN={1, 11, 9, 150, 200}, OCN=11, PM=VALID, BIAS, OCV/CPU)\|22.531\|23.139\|0.97\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 512, 26, 26}, OCN=256, OCV/CPU)\|2.239\|1.933\|1.16\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 512, 26, 26}, OCN=256, OCV/CPU_FP16)\|-\|1.010\|-\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 1024, 13, 13}, OCN=512, OCV/CPU)\|3.134\|2.068\|1.52\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 1024, 13, 13}, OCN=512, OCV/CPU_FP16)\|-\|1.062\|-\| \|conv::Conv::(GFLOPS=0.178, K=[1 x 1], IN={1, 256, 52, 52}, OCN=128, OCV/CPU)\|1.918\|1.920\|1.00\| \|conv::Conv::(GFLOPS=0.178, K=[1 x 1], IN={1, 256, 52, 52}, OCN=128, OCV/CPU_FP16)\|-\|1.014\|-\| \|conv::Conv::(GFLOPS=0.210, K=[1 x 1], IN={1, 576, 38, 50}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|2.340\|2.352\|0.99\| \|conv::Conv::(GFLOPS=0.210, 
K=[1 x 1], IN={1, 576, 38, 50}, OCN=96, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|1.247\|-\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 128, 56, 56}, OCN=32, P=[1 x 1], OCV/CPU)\|1.116\|1.111\|1.00\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 128, 56, 56}, OCN=32, P=[1 x 1], OCV/CPU_FP16)\|-\|1.114\|-\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 256, 14, 14}, OCN=256, P=[1 x 1], OCV/CPU)\|1.116\|1.112\|1.00\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 256, 14, 14}, OCN=256, P=[1 x 1], OCV/CPU_FP16)\|-\|1.113\|-\| \|conv::Conv::(GFLOPS=0.280, K=[1 x 1], IN={1, 576, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|3.067\|3.085\|0.99\| \|conv::Conv::(GFLOPS=0.280, K=[1 x 1], IN={1, 576, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|1.622\|-\| \|conv::Conv::(GFLOPS=0.302, K=[3 x 3], IN={1, 64, 64, 64}, OCN=64, PM=SAME, OCV/CPU)\|1.153\|1.187\|0.97\| \|conv::Conv::(GFLOPS=0.302, K=[3 x 3], IN={1, 64, 64, 64}, OCN=64, PM=SAME, OCV/CPU_FP16)\|-\|1.150\|-\| \|conv::Conv::(GFLOPS=0.357, K=[1 x 1], IN={1, 64, 208, 208}, OCN=64, OCV/CPU)\|4.804\|4.849\|0.99\| \|conv::Conv::(GFLOPS=0.357, K=[1 x 1], IN={1, 64, 208, 208}, OCN=64, OCV/CPU_FP16)\|-\|2.922\|-\| \|conv::Conv::(GFLOPS=0.420, K=[3 x 3], IN={1, 96, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|1.463\|1.469\|1.00\| \|conv::Conv::(GFLOPS=0.420, K=[3 x 3], IN={1, 96, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|1.459\|-\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 128, 40, 40}, OCN=128, PM=SAME, OCV/CPU)\|1.577\|1.580\|1.00\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 128, 40, 40}, OCN=128, PM=SAME, OCV/CPU_FP16)\|-\|1.580\|-\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 256, 20, 20}, OCN=256, PM=SAME, OCV/CPU)\|1.826\|1.818\|1.00\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 256, 20, 20}, OCN=256, PM=SAME, OCV/CPU_FP16)\|-\|1.817\|-\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 512, 10, 10}, OCN=512, PM=SAME, OCV/CPU)\|6.541\|5.081\|1.29\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 512, 10, 
10}, OCN=512, PM=SAME, OCV/CPU_FP16)\|-\|2.809\|-\| \|conv::Conv::(GFLOPS=0.561, K=[3 x 3], IN={1, 128, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|1.912\|1.919\|1.00\| \|conv::Conv::(GFLOPS=0.561, K=[3 x 3], IN={1, 128, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|1.919\|-\| \|conv::Conv::(GFLOPS=0.624, K=[3 x 3], IN={1, 128, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|1.961\|1.971\|0.99\| \|conv::Conv::(GFLOPS=0.624, K=[3 x 3], IN={1, 128, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|1.961\|-\| \|conv::Conv::(GFLOPS=0.701, K=[3 x 3], IN={1, 128, 38, 50}, OCN=160, PM=SAME, BIAS, OCV/CPU)\|2.317\|2.329\|0.99\| \|conv::Conv::(GFLOPS=0.701, K=[3 x 3], IN={1, 128, 38, 50}, OCN=160, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|2.322\|-\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 64, 104, 104}, OCN=64, P=[1 x 1], OCV/CPU)\|2.920\|2.947\|0.99\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 64, 104, 104}, OCN=64, P=[1 x 1], OCV/CPU_FP16)\|-\|2.924\|-\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 128, 52, 52}, OCN=128, P=[1 x 1], OCV/CPU)\|2.467\|2.466\|1.00\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 128, 52, 52}, OCN=128, P=[1 x 1], OCV/CPU_FP16)\|-\|2.496\|-\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 256, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)\|3.028\|2.997\|1.01\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 256, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU_FP16)\|-\|2.986\|-\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 512, 13, 13}, OCN=512, P=[1 x 1], OCV/CPU)\|4.353\|4.355\|1.00\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 512, 13, 13}, OCN=512, P=[1 x 1], OCV/CPU_FP16)\|-\|4.355\|-\| \|conv::Conv::(GFLOPS=0.830, K=[3 x 3], IN={1, 64, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|2.762\|2.793\|0.99\| \|conv::Conv::(GFLOPS=0.830, K=[3 x 3], IN={1, 64, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|2.797\|-\| \|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 192, 38, 38}, OCN=192, PM=SAME, OCV/CPU)\|3.428\|3.226\|1.06\| 
\|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 192, 38, 38}, OCN=192, PM=SAME, OCV/CPU_FP16)\|-\|3.223\|-\| \|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 384, 19, 19}, OCN=384, PM=SAME, OCV/CPU)\|3.967\|3.957\|1.00\| \|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 384, 19, 19}, OCN=384, PM=SAME, OCV/CPU_FP16)\|-\|3.960\|-\| \|conv::Conv::(GFLOPS=1.022, K=[3 x 3], IN={1, 576, 19, 19}, OCN=273, PM=SAME, BIAS, OCV/CPU)\|4.806\|4.387\|1.10\| \|conv::Conv::(GFLOPS=1.022, K=[3 x 3], IN={1, 576, 19, 19}, OCN=273, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|4.366\|-\| \|conv::Conv::(GFLOPS=1.112, K=[3 x 3], IN={1, 512, 10, 10}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)\|14.509\|11.756\|1.23\| \|conv::Conv::(GFLOPS=1.112, K=[3 x 3], IN={1, 512, 10, 10}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|6.510\|-\| \|conv::Conv::(GFLOPS=1.181, K=[3 x 3], IN={1, 64, 160, 200}, OCN=128, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)\|13.718\|13.287\|1.03\| \|conv::Conv::(GFLOPS=1.181, K=[3 x 3], IN={1, 64, 160, 200}, OCN=128, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|7.190\|-\| \|conv::Conv::(GFLOPS=1.182, K=[3 x 3], IN={1, 32, 320, 400}, OCN=64, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)\|15.133\|14.853\|1.02\| \|conv::Conv::(GFLOPS=1.182, K=[3 x 3], IN={1, 32, 320, 400}, OCN=64, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|8.671\|-\| \|conv::Conv::(GFLOPS=1.195, K=[9 x 9], IN={1, 32, 240, 320}, OCN=3, P=[4 x 4], BIAS, OCV/CPU)\|41.928\|43.328\|0.97\| \|conv::Conv::(GFLOPS=1.195, K=[9 x 9], IN={1, 32, 240, 320}, OCN=3, P=[4 x 4], BIAS, OCV/CPU_FP16)\|-\|38.072\|-\| \|conv::Conv::(GFLOPS=1.196, K=[3 x 3], IN={1, 384, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)\|4.409\|4.428\|1.00\| \|conv::Conv::(GFLOPS=1.196, K=[3 x 3], IN={1, 384, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU_FP16)\|-\|4.427\|-\| \|conv::Conv::(GFLOPS=1.210, K=[3 x 3], IN={1, 32, 256, 256}, OCN=32, PM=SAME, OCV/CPU)\|6.144\|5.363\|1.15\| \|conv::Conv::(GFLOPS=1.210, K=[3 x 3], IN={1, 32, 256, 256}, OCN=32, PM=SAME, OCV/CPU_FP16)\|-\|5.368\|-\| 
\|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 64, 75, 75}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|3.926\|3.932\|1.00\| \|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 64, 75, 75}, OCN=192, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|3.938\|-\| \|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 96, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|3.920\|3.915\|1.00\| \|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 96, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|3.950\|-\| \|conv::Conv::(GFLOPS=1.248, K=[3 x 3], IN={1, 256, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|3.767\|3.764\|1.00\| \|conv::Conv::(GFLOPS=1.248, K=[3 x 3], IN={1, 256, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|3.762\|-\| \|conv::Conv::(GFLOPS=1.258, K=[3 x 3], IN={1, 1280, 10, 10}, OCN=546, PM=SAME, BIAS, OCV/CPU)\|19.959\|13.875\|1.44\| \|conv::Conv::(GFLOPS=1.258, K=[3 x 3], IN={1, 1280, 10, 10}, OCN=546, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|7.781\|-\| \|conv::Conv::(GFLOPS=1.261, K=[3 x 3], IN={1, 192, 38, 50}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|3.951\|3.955\|1.00\| \|conv::Conv::(GFLOPS=1.261, K=[3 x 3], IN={1, 192, 38, 50}, OCN=192, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|3.969\|-\| \|conv::Conv::(GFLOPS=1.416, K=[3 x 3], IN={1, 128, 62, 82}, OCN=128, BIAS, OCV/CPU)\|4.050\|4.034\|1.00\| \|conv::Conv::(GFLOPS=1.416, K=[3 x 3], IN={1, 128, 62, 82}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|4.093\|-\| \|conv::Conv::(GFLOPS=1.500, K=[3 x 3], IN={1, 128, 64, 84}, OCN=128, BIAS, OCV/CPU)\|4.923\|4.506\|1.09\| \|conv::Conv::(GFLOPS=1.500, K=[3 x 3], IN={1, 128, 64, 84}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|4.509\|-\| \|conv::Conv::(GFLOPS=1.586, K=[3 x 3], IN={1, 128, 66, 86}, OCN=128, BIAS, OCV/CPU)\|4.759\|4.476\|1.06\| \|conv::Conv::(GFLOPS=1.586, K=[3 x 3], IN={1, 128, 66, 86}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|4.447\|-\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 26, 26}, OCN=512, P=[1 x 1], OCV/CPU)\|6.079\|5.628\|1.08\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 26, 26}, OCN=512, P=[1 x 1], 
OCV/CPU_FP16)\|-\|5.625\|-\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 52, 52}, OCN=512, S=[2 x 2], P=[1 x 1], OCV/CPU)\|19.843\|17.523\|1.13\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 52, 52}, OCN=512, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)\|-\|8.917\|-\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 13, 13}, OCN=1024, P=[1 x 1], OCV/CPU)\|8.334\|8.247\|1.01\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 13, 13}, OCN=1024, P=[1 x 1], OCV/CPU_FP16)\|-\|8.246\|-\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 26, 26}, OCN=1024, S=[2 x 2], P=[1 x 1], OCV/CPU)\|23.164\|18.199\|1.27\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 26, 26}, OCN=1024, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)\|-\|9.305\|-\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 104, 104}, OCN=128, P=[1 x 1], OCV/CPU)\|5.184\|5.178\|1.00\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 104, 104}, OCN=128, P=[1 x 1], OCV/CPU_FP16)\|-\|5.149\|-\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 208, 208}, OCN=128, S=[2 x 2], P=[1 x 1], OCV/CPU)\|17.990\|18.103\|0.99\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 208, 208}, OCN=128, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)\|-\|9.777\|-\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 52, 52}, OCN=256, P=[1 x 1], OCV/CPU)\|4.831\|4.522\|1.07\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 52, 52}, OCN=256, P=[1 x 1], OCV/CPU_FP16)\|-\|4.523\|-\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 104, 104}, OCN=256, S=[2 x 2], P=[1 x 1], OCV/CPU)\|17.328\|17.319\|1.00\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 104, 104}, OCN=256, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)\|-\|8.948\|-\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 208, 208}, OCN=64, P=[1 x 1], OCV/CPU)\|5.944\|5.961\|1.00\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 208, 208}, OCN=64, P=[1 x 1], OCV/CPU_FP16)\|-\|5.936\|-\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 416, 416}, OCN=64, S=[2 x 2], P=[1 
x 1], OCV/CPU)\|19.811\|20.064\|0.99\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 416, 416}, OCN=64, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)\|-\|11.705\|-\| \|conv::Conv::(GFLOPS=1.659, K=[3 x 3], IN={1, 960, 10, 10}, OCN=960, PM=SAME, OCV/CPU)\|22.398\|17.686\|1.27\| \|conv::Conv::(GFLOPS=1.659, K=[3 x 3], IN={1, 960, 10, 10}, OCN=960, PM=SAME, OCV/CPU_FP16)\|-\|9.859\|-\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, G=128, P=[1 x 1], BIAS, OCV/CPU)\|0.416\|0.416\|1.00\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, G=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|0.417\|-\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, PM=SAME, OCV/CPU)\|5.356\|5.110\|1.05\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, PM=SAME, OCV/CPU_FP16)\|-\|5.114\|-\| \|conv::Conv::(GFLOPS=1.675, K=[3 x 3], IN={1, 128, 68, 88}, OCN=128, BIAS, OCV/CPU)\|5.092\|4.748\|1.07\| \|conv::Conv::(GFLOPS=1.675, K=[3 x 3], IN={1, 128, 68, 88}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|4.754\|-\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, G=256, P=[1 x 1], BIAS, OCV/CPU)\|0.260\|0.229\|1.13\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, G=256, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|0.229\|-\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, PM=SAME, OCV/CPU)\|5.872\|5.460\|1.08\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, PM=SAME, OCV/CPU_FP16)\|-\|5.460\|-\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, G=512, P=[1 x 1], BIAS, OCV/CPU)\|0.161\|0.161\|1.00\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, G=512, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|0.161\|-\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|7.176\|7.175\|1.00\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, P=[1 x 1], BIAS, 
OCV/CPU_FP16)\|-\|7.162\|-\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, PM=SAME, OCV/CPU)\|7.174\|7.185\|1.00\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, PM=SAME, OCV/CPU_FP16)\|-\|7.157\|-\| \|conv::Conv::(GFLOPS=1.766, K=[3 x 3], IN={1, 128, 70, 90}, OCN=128, BIAS, OCV/CPU)\|5.400\|5.180\|1.04\| \|conv::Conv::(GFLOPS=1.766, K=[3 x 3], IN={1, 128, 70, 90}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|5.201\|-\| \|conv::Conv::(GFLOPS=1.859, K=[3 x 3], IN={1, 128, 72, 92}, OCN=128, BIAS, OCV/CPU)\|5.330\|5.188\|1.03\| \|conv::Conv::(GFLOPS=1.859, K=[3 x 3], IN={1, 128, 72, 92}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|5.177\|-\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, G=1024, P=[1 x 1], BIAS, OCV/CPU)\|0.115\|0.115\|1.00\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, G=1024, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|0.115\|-\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, PM=SAME, OCV/CPU)\|26.156\|20.222\|1.29\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, PM=SAME, OCV/CPU_FP16)\|-\|11.203\|-\| \|conv::Conv::(GFLOPS=1.954, K=[3 x 3], IN={1, 128, 74, 94}, OCN=128, BIAS, OCV/CPU)\|5.627\|5.543\|1.02\| \|conv::Conv::(GFLOPS=1.954, K=[3 x 3], IN={1, 128, 74, 94}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|5.506\|-\| \|conv::Conv::(GFLOPS=1.995, K=[9 x 9], IN={1, 3, 320, 400}, OCN=32, P=[4 x 4], BIAS, OCV/CPU)\|27.925\|27.741\|1.01\| \|conv::Conv::(GFLOPS=1.995, K=[9 x 9], IN={1, 3, 320, 400}, OCN=32, P=[4 x 4], BIAS, OCV/CPU_FP16)\|-\|17.217\|-\| \|conv::Conv::(GFLOPS=2.052, K=[3 x 3], IN={1, 128, 76, 96}, OCN=128, BIAS, OCV/CPU)\|6.359\|6.062\|1.05\| \|conv::Conv::(GFLOPS=2.052, K=[3 x 3], IN={1, 128, 76, 96}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|6.048\|-\| \|conv::Conv::(GFLOPS=2.100, K=[3 x 3], IN={1, 144, 75, 75}, OCN=144, PM=SAME, OCV/CPU)\|6.559\|6.322\|1.04\| \|conv::Conv::(GFLOPS=2.100, K=[3 x 3], IN={1, 144, 75, 75}, OCN=144, PM=SAME, 
OCV/CPU_FP16)\|-\|6.280\|-\| \|conv::Conv::(GFLOPS=2.153, K=[3 x 3], IN={1, 128, 78, 98}, OCN=128, BIAS, OCV/CPU)\|6.412\|6.200\|1.03\| \|conv::Conv::(GFLOPS=2.153, K=[3 x 3], IN={1, 128, 78, 98}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|6.197\|-\| \|conv::Conv::(GFLOPS=2.156, K=[3 x 3], IN={1, 576, 19, 19}, OCN=576, PM=SAME, OCV/CPU)\|9.167\|8.624\|1.06\| \|conv::Conv::(GFLOPS=2.156, K=[3 x 3], IN={1, 576, 19, 19}, OCN=576, PM=SAME, OCV/CPU_FP16)\|-\|8.626\|-\| \|conv::Conv::(GFLOPS=2.255, K=[3 x 3], IN={1, 128, 80, 100}, OCN=128, BIAS, OCV/CPU)\|6.755\|6.491\|1.04\| \|conv::Conv::(GFLOPS=2.255, K=[3 x 3], IN={1, 128, 80, 100}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|6.520\|-\| \|conv::Conv::(GFLOPS=2.719, K=[3 x 3], IN={1, 96, 256, 256}, OCN=96, S=[2 x 2], PM=SAME, OCV/CPU)\|35.664\|34.752\|1.03\| \|conv::Conv::(GFLOPS=2.719, K=[3 x 3], IN={1, 96, 256, 256}, OCN=96, S=[2 x 2], PM=SAME, OCV/CPU_FP16)\|-\|20.260\|-\| \|conv::Conv::(GFLOPS=3.319, K=[3 x 3], IN={1, 128, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|9.514\|9.414\|1.01\| \|conv::Conv::(GFLOPS=3.319, K=[3 x 3], IN={1, 128, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|9.462\|-\| \|conv::Conv::(GFLOPS=3.321, K=[3 x 3], IN={1, 64, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|10.631\|9.963\|1.07\| \|conv::Conv::(GFLOPS=3.321, K=[3 x 3], IN={1, 64, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|9.935\|-\| \|conv::Conv::(GFLOPS=3.398, K=[7 x 7], IN={1, 128, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)\|37.465\|36.798\|1.02\| \|conv::Conv::(GFLOPS=3.398, K=[7 x 7], IN={1, 128, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU_FP16)\|-\|19.569\|-\| \|conv::Conv::(GFLOPS=3.407, K=[3 x 3], IN={1, 512, 19, 19}, OCN=1024, D=[6 x 6], P=[6 x 6], BIAS, OCV/CPU)\|38.157\|36.157\|1.06\| \|conv::Conv::(GFLOPS=3.407, K=[3 x 3], IN={1, 512, 19, 19}, OCN=1024, D=[6 x 6], P=[6 x 6], BIAS, OCV/CPU_FP16)\|-\|18.902\|-\| \|conv::Conv::(GFLOPS=3.408, K=[3 x 3], IN={1, 256, 38, 38}, OCN=512, P=[1 x 1], BIAS, 
OCV/CPU)\|10.356\|10.401\|1.00\| \|conv::Conv::(GFLOPS=3.408, K=[3 x 3], IN={1, 256, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|10.360\|-\| \|conv::Conv::(GFLOPS=4.247, K=[3 x 3], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)\|12.641\|12.150\|1.04\| \|conv::Conv::(GFLOPS=4.247, K=[3 x 3], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU_FP16)\|-\|12.162\|-\| \|conv::Conv::(GFLOPS=4.247, K=[5 x 5], IN={1, 144, 128, 128}, OCN=144, S=[2 x 2], PM=SAME, OCV/CPU)\|50.545\|50.505\|1.00\| \|conv::Conv::(GFLOPS=4.247, K=[5 x 5], IN={1, 144, 128, 128}, OCN=144, S=[2 x 2], PM=SAME, OCV/CPU_FP16)\|-\|27.950\|-\| \|conv::Conv::(GFLOPS=4.566, K=[7 x 7], IN={1, 172, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)\|54.233\|49.603\|1.09\| \|conv::Conv::(GFLOPS=4.566, K=[7 x 7], IN={1, 172, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU_FP16)\|-\|26.515\|-\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 256, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|13.779\|12.968\|1.06\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 256, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|12.984\|-\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 512, 46, 46}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|15.809\|15.329\|1.03\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 512, 46, 46}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|15.433\|-\| \|conv::Conv::(GFLOPS=4.994, K=[3 x 3], IN={1, 128, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|14.563\|14.527\|1.00\| \|conv::Conv::(GFLOPS=4.994, K=[3 x 3], IN={1, 128, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|14.480\|-\| \|conv::Conv::(GFLOPS=4.997, K=[3 x 3], IN={1, 64, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|16.714\|16.484\|1.01\| \|conv::Conv::(GFLOPS=4.997, K=[3 x 3], IN={1, 64, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|16.362\|-\| \|conv::Conv::(GFLOPS=5.780, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, S=[2 x 2], PM=SAME, OCV/CPU)\|77.832\|65.729\|1.18\| \|conv::Conv::(GFLOPS=5.780, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, 
S=[2 x 2], PM=SAME, OCV/CPU_FP16)\|-\|32.065\|-\| \|conv::Conv::(GFLOPS=6.116, K=[3 x 3], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)\|21.903\|20.386\|1.07\| \|conv::Conv::(GFLOPS=6.116, K=[3 x 3], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU_FP16)\|-\|20.416\|-\| \|conv::Conv::(GFLOPS=6.118, K=[3 x 3], IN={1, 144, 128, 128}, OCN=144, PM=SAME, OCV/CPU)\|20.405\|18.148\|1.12\| \|conv::Conv::(GFLOPS=6.118, K=[3 x 3], IN={1, 144, 128, 128}, OCN=144, PM=SAME, OCV/CPU_FP16)\|-\|18.128\|-\| \|conv::Conv::(GFLOPS=6.637, K=[3 x 3], IN={1, 256, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|20.334\|18.521\|1.10\| \|conv::Conv::(GFLOPS=6.637, K=[3 x 3], IN={1, 256, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|18.495\|-\| \|conv::Conv::(GFLOPS=6.638, K=[3 x 3], IN={1, 128, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|21.527\|19.584\|1.10\| \|conv::Conv::(GFLOPS=6.638, K=[3 x 3], IN={1, 128, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|19.630\|-\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 150, 200}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|22.715\|20.057\|1.13\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 150, 200}, OCN=192, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|20.068\|-\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 300, 300}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)\|26.228\|24.992\|1.05\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 300, 300}, OCN=64, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|24.957\|-\| \|conv::Conv::(GFLOPS=6.814, K=[3 x 3], IN={1, 512, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|21.524\|21.581\|1.00\| \|conv::Conv::(GFLOPS=6.814, K=[3 x 3], IN={1, 512, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|21.782\|-\| \|conv::Conv::(GFLOPS=8.025, K=[3 x 3], IN={1, 1024, 19, 19}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)\|34.094\|31.964\|1.07\| \|conv::Conv::(GFLOPS=8.025, K=[3 x 3], IN={1, 1024, 19, 19}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|31.925\|-\| \|conv::Conv::(GFLOPS=9.986, K=[3 x 3], IN={1, 512, 46, 46}, OCN=512, 
P=[1 x 1], BIAS, OCV/CPU)\|28.677\|27.813\|1.03\| \|conv::Conv::(GFLOPS=9.986, K=[3 x 3], IN={1, 512, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|27.808\|-\| \|conv::Conv::(GFLOPS=9.987, K=[3 x 3], IN={1, 256, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|31.274\|27.892\|1.12\| \|conv::Conv::(GFLOPS=9.987, K=[3 x 3], IN={1, 256, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|27.910\|-\| \|conv::Conv::(GFLOPS=9.989, K=[3 x 3], IN={1, 128, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|30.533\|30.007\|1.02\| \|conv::Conv::(GFLOPS=9.989, K=[3 x 3], IN={1, 128, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|30.089\|-\| \|conv::Conv::(GFLOPS=9.993, K=[3 x 3], IN={1, 64, 368, 368}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)\|39.837\|38.312\|1.04\| \|conv::Conv::(GFLOPS=9.993, K=[3 x 3], IN={1, 64, 368, 368}, OCN=64, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|38.477\|-\| \|conv::Conv::(GFLOPS=10.087, K=[3 x 3], IN={1, 576, 38, 50}, OCN=512, PM=SAME, BIAS, OCV/CPU)\|32.480\|29.237\|1.11\| \|conv::Conv::(GFLOPS=10.087, K=[3 x 3], IN={1, 576, 38, 50}, OCN=512, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|29.452\|-\| \|conv::Conv::(GFLOPS=10.701, K=[3 x 3], IN={1, 512, 38, 38}, OCN=804, P=[1 x 1], BIAS, OCV/CPU)\|33.544\|32.832\|1.02\| \|conv::Conv::(GFLOPS=10.701, K=[3 x 3], IN={1, 512, 38, 38}, OCN=804, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|32.784\|-\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 240, 64, 64}, OCN=240, PM=SAME, OCV/CPU)\|134.481\|130.678\|1.03\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 240, 64, 64}, OCN=240, PM=SAME, OCV/CPU_FP16)\|-\|70.134\|-\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)\|127.930\|126.530\|1.01\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU_FP16)\|-\|65.261\|-\| \|conv::Conv::(GFLOPS=16.987, K=[5 x 5], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)\|201.346\|187.007\|1.08\| \|conv::Conv::(GFLOPS=16.987, K=[5 x 5], IN={1, 1152, 16, 16}, OCN=1152, 
PM=SAME, OCV/CPU_FP16)\|-\|91.525\|-\| \|conv::Conv::(GFLOPS=23.122, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, PM=SAME, OCV/CPU)\|252.038\|245.587\|1.03\| \|conv::Conv::(GFLOPS=23.122, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, PM=SAME, OCV/CPU_FP16)\|-\|125.477\|-\| ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-05-17 09:38:33 +03:00
wanli	46991bcd62	Fix the same-shape broadcast bug with CUDA	2023-05-15 13:55:38 +08:00
Alexander Smorkalov	85b04f0b4d	Merge pull request #23557 from WanliZhong:eltwise_cpu_bug fix nary elementwise bug in cpu	2023-05-11 15:56:46 +03:00
wanli	85cc4086c8	fix nary elementwise bug in cpu	2023-05-10 14:29:33 +08:00
Alexander Smorkalov	25c28c5da4	Merge pull request #23485 from zihaomu:add_onnx_where DNN: add ONNX where node support	2023-05-05 09:21:07 +03:00
zihaomu	0513741a85	add broadcast where node	2023-05-05 11:16:19 +08:00
Alexander Alekhin	3c76b33532	Merge pull request #22614 from zihaomu:add_std2DB_API	2023-05-01 19:37:23 +00:00
zihaomu	8be93a6de7	add scale factor to DB demo.	2023-04-30 22:03:21 +08:00
Abduragim Shtanchaev	3b1ee0549b	added test for lstm without hidden states initialization	2023-04-25 16:01:13 +03:00
Dmitry Kurtaev	aa57833ad5	Merge pull request #23409 from dkurt:dnn_tflite_quant Import and run inference on INT8 quantized TFLite models #23409 ### Pull Request Readiness Checklist * Support quantized TFLite models * Enable fused activations (FP32, INT8) Merge with extra: https://github.com/opencv/opencv_extra/pull/1048 ![res](https://user-images.githubusercontent.com/25801568/231433201-566b4bd6-ccff-462c-9e74-adbdcdf3648b.png) On the image, green boxes are from TFLite and red boxes are from OpenCV. See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-04-24 13:44:10 +03:00
Alexander Alekhin	9ab0ff6cf2	Merge pull request #23511 from zihaomu:issue_23465	2023-04-22 04:01:26 +00:00
Zihao Mu	601778e0e6	Merge pull request #22750 from zihaomu:improve_blobFromImage DNN: Add New API blobFromImageParam #22750 The purpose of this PR: 1. Add new API `blobFromImageParam` to extend the `blobFromImage` API. It supports different data layouts (NCHW or NHWC) and letter_box. 2. ~~`blobFromImage` can output `CV_16F`~~ ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-04-21 19:10:17 +03:00
zihaomu	54e1a8709d	fix the bug, disable the fast1x1 when padding is not 0.	2023-04-21 10:55:07 +08:00
Abduragim Shtanchaev	b3a2444bcf	Merge pull request #23501 from Abdurrahheem:additional_lstm_tests Added LSTM and GRU tests for various batch and input length sizes #23501 Added tests with various sequence lengths and batch sizes Test data: https://github.com/opencv/opencv_extra/pull/1057 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-04-20 10:11:33 +03:00
zihaomu	51281f8d69	support the split node of onnx opset >= 13	2023-04-11 16:18:50 +08:00
Dmitry Kurtaev	5e1d33329b	Several fixes for ONNX importer: Expand, Gather	2023-03-27 22:15:26 +03:00
Dmitry Kurtaev	5df6b4a756	Merge pull request #23325 from dkurt:dnn_input_info Propagate inputs info for ONNX and TFLite models ### Pull Request Readiness Checklist Needed for generic applications such as benchmarking pipelines, so that OpenCV can report the default input shapes specified in the models. See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-03-21 14:50:53 +03:00
zihaomu	ee3740af00	move global skip out of if loop, and add opencv_deny_list	2023-03-13 22:16:51 +08:00
Alexander Alekhin	bdff0949bb	dnn(tflite): add 3rdparty flatbuffers with pre-generated schema	2023-02-21 16:06:19 +00:00
Dmitry Kurtaev	76350cd30f	Merge pull request #23161 from dkurt:dnn_tflite TFLite models importer * initial commit * Refactor TFLiteImporter * Better FlatBuffers detection * Add permute before 4D->3D reshape * Track layers layout * TFLite Convolution2DTransposeBias layer * Skip TFLite tests without FlatBuffers * Fix check of FlatBuffers in tests. Add readNetFromTFLite from buffer * TFLite Max Unpooling test * Add skip for TFLite unpooling test * Revert DW convolution workaround * Fix ObjC bindings * Better errors handling * Regenerate TFLite schema using flatc * dnn(tflite): more checks, better logging * Checks for unimplemented fusion. Fix tests	2023-02-13 14:00:20 +00:00
Yuantao Feng	c2b7c1f13b	Merge pull request #23219 from fengyuentau:add_gelu Add GELU layer for vision transformers * add gelu and gelu approximation * drop setKernelParams	2023-02-10 18:03:29 +00:00
Alexander Alekhin	96a45e842e	Merge pull request #23061 from WanliZhong:gemm_cuda DNN: make GEMM support transA and transB in CUDA	2023-02-09 00:06:32 +03:00
wanli	4718a4bf81	make GEMM support transA and transB in CUDA	2023-01-31 15:14:17 +08:00
Alexander Alekhin	cd44aa0bb1	Merge pull request #23162 from zihaomu:issue_23151	2023-01-28 13:00:43 +00:00
zihaomu	f45a12439a	fix depthwise issue.	2023-01-28 11:41:00 +08:00
Yuantao Feng	4d918ba40b	Merge pull request #23047 from fengyuentau:layer_norm dnn: add layer normalization for vision transformers * add layer norm onnx parser, impl and tests * add onnx graph simplifier for layer norm expanded * handle the case when constants are of type Initializer * add test case for layer norm expanded with initializers * use CV_Assert & CV_CheckType in place of CV_Assert_N; use forward_fallback for OCL_FP16 * use const ref / ref in parameters of invoker::run; extract inner const if from nested loop; use size_t in place of ull * template hasBias * remove trailing whitespace * use pointer parameter with null check; move normSize division & mean_square division outside of loop; use std::max to ensure positive value before std::sqrt * refactor implementation, optimize parallel_for * disable layer norm expanded * remove the removal of layer norm optional outputs	2023-01-27 16:35:59 +03:00
Alexander Alekhin	18cbfa4a4f	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2023-01-23 00:11:12 +00:00
zihaomu	840b1d5c94	add depthwise add fuse	2023-01-11 08:42:51 +08:00
Dmitry Kurtaev	8681686d8f	Merge pull request #22957 from dkurt:new_openvino_api Switch to new OpenVINO API after 2022.1 release * Pass Layer_Test_Convolution_DLDT.Accuracy/0 test * Pass test Test_Caffe_layers.Softmax * Failed 136 tests * Fix Concat. Failed 120 tests * Custom nGraph ops. 19 failed tests * Set and get properties from Core * Read model from buffer * Change MaxPooling layer output names. Restore reshape * Cosmetic changes * Cosmetic changes * Override getOutputsInfo * Fixes for OpenVINO < 2022.1 * Async inference for 2021.4 and less * Compile model with config * Fix serialize for 2022.1 * Asynchronous inference with 2022.1 * Handle 1d outputs * Work with model with dynamic output shape * Fixes with 1d output for old API * Control outputs by nGraph function for all OpenVINO versions * Refer inputs in PrePostProcessor by indices * Fix cycled dependency between InfEngineNgraphNode and InfEngineNgraphNet. Add InferRequest callback only for async inference. Do not capture InferRequest object. * Fix tests thresholds * Fix HETERO:GPU,CPU plugin issues with unsupported layer	2022-12-23 16:58:41 +00:00
Alexander Smorkalov	9012e6dd9b	Merge pull request #22965 from vrabaud:numpy_fix Remove references to deprecated NumPy type aliases.	2022-12-23 15:34:02 +03:00
Alexander Smorkalov	4930516652	Merge pull request #22898 from fengyuentau:slice_neg_steps dnn: support ONNX Slice with negative steps by adding and using cv::flipND	2022-12-23 14:15:06 +03:00
Vincent Rabaud	ad568edd7f	Remove references to deprecated NumPy type aliases. This change replaces references to a number of deprecated NumPy type aliases (np.bool, np.int, np.float, np.complex, np.object, np.str) with their recommended replacement (bool, int, float, complex, object, str). Those types were deprecated in 1.20 and are removed in 1.24, cf https://github.com/numpy/numpy/pull/22607.	2022-12-23 13:53:49 +03:00
Alexander Alekhin	1f41d06f9a	Merge pull request #23008 from mshabunin:fix-yolov4-tiny-hash	2022-12-23 10:14:25 +00:00
fengyuentau	34a0897f90	add cv::flipND; support onnx slice with negative steps via cv::flipND	2022-12-23 16:39:53 +08:00
Maksim Shabunin	d35fbe6bfc	dnn: updated YOLOv4-tiny model and tests	2022-12-22 15:49:21 +03:00
Alexander Alekhin	6b4f3e5fab	Merge pull request #22993 from alalek:fixup_21738	2022-12-21 19:50:51 +00:00
Yuantao Feng	a2b3acfc6e	dnn: add the CANN backend (#22634) * cann backend impl v1 * cann backend impl v2: use opencv parsers to build models for cann * adjust fc according to the new transA and transB * put cann net in cann backend node and reuse forwardLayer * use fork() to create a child process and compile cann model * remove legacy code * remove debug code * fall back to CPU backend if there is one layer not supported by CANN backend * fix netInput forward	2022-12-21 09:04:41 +03:00
Alexander Alekhin	cdbb893b27	dnn: disable OpenCL code path in MatMul processing - this mode is not supported by 22828	2022-12-20 09:46:48 +00:00
Alexander Alekhin	1102b7eff8	dnn: fix gather layer implementation - support FP16 data	2022-12-20 06:09:34 +00:00
zoom	4891818114	make MatMul support 3D or 4D with broadcast	2022-12-15 10:36:08 +08:00
Alexander Alekhin	8ba44e7d55	Merge pull request #22882 from zihaomu:gemm_first_const	2022-12-08 14:18:33 +00:00
Zihao Mu	0a650b573b	Merge pull request #22840 from zihaomu:optimze_conv_memory_usage DNN: reduce the memory used in convolution layer * reduce the memory in winograd and disable the test when memory usage is larger than 2 GB. * remove VERY_LOG tag	2022-12-08 12:57:13 +00:00
Alexander Alekhin	b16f76eede	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2022-12-03 12:39:41 +00:00
Alexander Alekhin	d16b3b2487	dnn(test): restore openvino tests with 'Cannot get memory' message	2022-12-03 01:34:48 +00:00
Alexander Smorkalov	e14ca39fd7	Merge pull request #22857 from fengyuentau:batched_nms dnn: add batched nms	2022-11-30 12:37:49 +03:00
Alexander Smorkalov	421ba8730a	Merge pull request #22809 from fengyuentau:tile dnn: support ONNX Tile	2022-11-29 14:42:28 +03:00
zihaomu	0d56524b72	gemm: support transA and transB, and a constant first input.	2022-11-29 17:13:36 +08:00
fengyuentau	9fded9ca53	batched nms impl	2022-11-29 15:32:34 +08:00
fengyuentau	441624a5fb	tile impl	2022-11-29 11:15:38 +08:00
zoom	5044af69d1	let MatMul work when both inputs are const	2022-11-27 17:32:41 +08:00
zoom	ef2677b0a6	Make MatMul layer support 3d or 4d operation with const input	2022-11-10 11:41:44 +08:00
Zihao Mu	903bf0147e	Merge pull request #22666 from zihaomu:support_onnx_qdq_model DNN: let Quant and Dequant of ONNX_importer support the Constant input. * let Quant and Dequant support the Constant input. * fix negative value of axis.	2022-10-31 16:06:31 +00:00
Alexander Smorkalov	22f8fb4d5c	Do not fail tests if Yolo v7 model was not found.	2022-10-24 17:59:18 +03:00
Dmitry Kurtaev	35b2cff295	Merge pull request #22656 from dkurt:halide_fixes * Fixes for Halide * Enable some Halide tests	2022-10-21 17:49:49 +03:00
Alexander Smorkalov	5d292826b2	Merge pull request #22593 from zihaomu:optimize_wino optimize winograd further	2022-10-19 13:08:32 +03:00
Alexander Smorkalov	f378f02954	Merge pull request #22652 from rogday:cuda_test_fixes Address CUDA-related errors	2022-10-19 09:37:12 +03:00
Smirnov Egor	dd14cf6a9c	address CUDA-related errors and enable cuda in elementwise ops	2022-10-18 16:54:42 +03:00
Alexander Smorkalov	ec7fc5adca	Merge pull request #22529 from fengyuentau:scatter_scatternd DNN: supports Scatter and ScatterND from ONNX	2022-10-17 14:57:46 +03:00
fengyuentau	d24d8f2abe	implementation of scatter and scatternd with conformance tests enabled	2022-10-17 11:30:32 +08:00
zoom	d816442e4d	Make Unsqueeze layer support negative axes.	2022-10-14 18:00:19 +08:00

1086 Commits