opencv

mirror of https://github.com/opencv/opencv.git synced 2024-12-15 09:49:13 +08:00

Author	SHA1	Message	Date
Alexander Smorkalov	23f27d8dbe	Use OpenCV logging instead of std::cerr.	2023-07-19 10:49:54 +03:00
Zihao Mu	1920993525	Merge pull request #23952 from zihaomu:fix_depth_conv_5x5 DNN: optimize the speed of general Depth-wise #23952 Try to solve the issue: https://github.com/opencv/opencv/issues/23941 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-07-14 17:34:39 +03:00
Alexander Smorkalov	bf06bc92aa	Merge branch '3.4' into merge-3.4	2023-06-23 20:12:58 +03:00
Yuantao Feng	aff420329c	Merge pull request #23853 from fengyuentau:disable_fp16_warning dnn: disable warning when loading a fp16 model #23853 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-06-23 19:52:04 +03:00
Alexander Smorkalov	d9a5603fa3	Merge pull request #23860 from fengyuentau:fix_overflow_sigmoid_v3.4 dnn: fix overflow in sigmoid layer for 3.4	2023-06-23 19:47:42 +03:00
fengyuentau	29388f80a5	fix overflow	2023-06-23 21:22:21 +08:00
Alexander Smorkalov	51702ffd92	pre: OpenCV 4.8.0 (version++)	2023-06-20 15:52:57 +03:00
Dmitry Kurtaev	433c364456	Merge pull request #23724 from dkurt:java_without_ant Build Java without ANT #23724 ### Pull Request Readiness Checklist Enables a path of building Java bindings without ANT * Able to build OpenCV JAR and Docs without ANT ``` -- Java: -- ant: NO -- JNI: /usr/lib/jvm/default-java/include /usr/lib/jvm/default-java/include/linux /usr/lib/jvm/default-java/include -- Java wrappers: YES -- Java tests: NO ``` * Possible to build OpenCV JAR without ANT but tests still require ANT Merge with: https://github.com/opencv/opencv_contrib/pull/3502 Notes: - Use `OPENCV_JAVA_IGNORE_ANT=1` to force "Java" flow for building Java bindings - Java tests still require Apache ANT - JAR doesn't include `.java` source code files. See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-06-16 19:58:20 +03:00
Alexander Smorkalov	3c0b71bcec	Merge pull request #23795 from dkurt:tf_half_pixel_for_nn Consider half pixel mode in ONNX resize	2023-06-16 10:21:20 +03:00
Dmitry Kurtaev	924c01dbec	Replace CV_Assert_N	2023-06-15 17:30:33 +03:00
Wang Kai	fc2d933224	removing unreachable code and fixing a typo	2023-06-15 01:09:02 +08:00
Dmitry Kurtaev	6909fffde1	Consider half pixel mode in ONNX resize	2023-06-14 14:21:28 +03:00
Dmitry Kurtaev	f9d7f47e28	Change Scalar assignment in Python from single value	2023-06-13 10:45:03 +03:00
Wang Kai	4622f1e89b	fixing typo of a variable name in dnn::runFastConv	2023-06-11 01:54:03 +08:00
Zihao Mu	eec8a20c33	Merge pull request #23763 from zihaomu:add_runtime_check DNN: fix bug for X86 Winograd #23763 Address https://github.com/opencv/opencv/issues/23760 The patch aims to add a runtime check for X86 platform without AVX(2). ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-06-09 09:18:12 +03:00
Alexander Smorkalov	6d2cbc4055	Merge pull request #23761 from LaurentBerger:typeblobfromimages checktype in blobFromImages and blobFromImagesWithParams	2023-06-08 09:59:01 +03:00
unknown	5f8e43da85	checktype in blobFromImages and blobFromImagesWithParams	2023-06-07 16:15:58 +02:00
Abduragim Shtanchaev	6b53fe8f7b	Merge pull request #23746 from Abdurrahheem:ash/graph_simplifier Assertion Fix in Split Layer #23746 ### Pull Request Readiness Checklist This PR fixes issue mentioned in [#23663](https://github.com/opencv/opencv/issues/23663) Merge with https://github.com/opencv/opencv_extra/pull/1067 See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-06-07 16:01:42 +03:00
Olivier Hotel	0442c6fa81	Addition of normalize_axis to ONNXImporter::parseSqueeze to support negative values for the axes attribute. Negative values are supported in ONNX opset >= 11. Signed-off-by: Olivier Hotel <olivier.hotel@orange.com>	2023-05-30 10:21:27 +02:00
Abduragim Shtanchaev	ecd2e8ff47	added index that checks all inputs of nodes that match	2023-05-29 14:48:42 +03:00
Alexander Smorkalov	cf0ba039c3	Merge pull request #23625 from zihaomu:improve_conv DNN: Remove unnecessary flags for convolution	2023-05-26 12:59:36 +03:00
Alexander Smorkalov	26a7b332cb	Merge pull request #23671 from zihaomu:fix_potential_bug DNN: fix potential bug, stride should not be set as 0.	2023-05-25 13:36:37 +03:00
Yuantao Feng	f07b01cc34	Merge pull request #23655 from fengyuentau:qlinearsoftmax Support ONNX operator QLinearSoftmax in dnn #23655 Resolves https://github.com/opencv/opencv/issues/23636. Merge with https://github.com/opencv/opencv_extra/pull/1064. This PR maps the QLinearSoftmax (from com.microsoft domain) to SoftmaxInt8 in dnn along with some speed optimization. Todo: - [x] support QLinearSoftmax with opset = 13 - [x] add model and test data for QLinearSoftmax with opset = 13 - [x] ensure all models have dims >= 3. - [x] add the script to generate model and test data ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-05-25 13:35:58 +03:00
zihaomu	4384e77bd1	when stride == 0, it should be treated as a bug	2023-05-24 21:57:59 +08:00
Alexander Smorkalov	4a559bc2ab	Merge pull request #23656 from peters:patch-2 Build fix for AVX 256	2023-05-23 09:20:34 +03:00
Alexander Smorkalov	b122a4b436	Merge pull request #23646 from dkurt:dnn_ie_region_fix Fix Region layer with OpenVINO in case of different width/height	2023-05-22 16:22:50 +03:00
Peter Rekdal Khan-Sunde	04970490ec	Build fix /build/build_cuda/3p/opencv/linux-x64/ubuntu22.04/Debug/modules/dnn/src/layers/cpu_kernels/convolution.cpp: In function 'void cv::dnn::packData8(char&, float&, int&, int&, int&, const int, int, int, int)': /build/build_cuda/3p/opencv/linux-x64/ubuntu22.04/Debug/modules/dnn/src/layers/cpu_kernels/convolution.cpp:448:43: error: 'CONV_NR' was not declared in this scope; did you mean 'CONV_3D'? 448 \| vx_store(inpbufC_FP32 + kCONV_NR, vx_load(inptrInC + k1)); \| ^~~~~~~ \| CONV_3D	2023-05-22 11:25:04 +02:00
Alexander Smorkalov	f2311d1bfd	Merge pull request #23645 from Abdurrahheem:ash/tf_init_input_check Add assert to check if layer input size is not empty	2023-05-19 13:28:24 +03:00
Zihao Mu	5025f29378	speed up vulkan dnn, and support ios and apple m1 chip. (#23349)	2023-05-18 20:02:27 +03:00
Dmitry Kurtaev	af14780526	Fix Region layer with OpenVINO in case of different width/height	2023-05-18 17:45:30 +03:00
Abduragim Shtanchaev	2b9d2c726a	add assert to check if layer input size is not empty	2023-05-18 16:17:57 +03:00
Abduragim Shtanchaev	d2143bcd44	Merge pull request #23614 from Abdurrahheem:lstm_layout_attribute LSTM ONNX Layout Attribute Support #23614 ### Explanation This PR contains the changes necessary to support the `layout` attribute. This attribute is present in [ONNX](https://github.com/onnx/onnx/blob/main/docs/Operators.md#lstm) and [Torch](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html#lstm) (in Torch it is exposed as `batch_first=True`). When `layout = 1`, the input to the LSTM layer is expected to have the batch dimension first, `[batch_size, sequence_length, features]`; with the default `layout = 0` it is `[sequence_length, batch_size, features]`. ### Test Data Test data and the data generator for this PR are located in [#1063](https://github.com/opencv/opencv_extra/pull/1063) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-05-17 22:46:56 +03:00
Yuantao Feng	eefee8574a	dnn: refactor reduce (#23613) * initial impl * remove reduce in8; fix reduce importer * fix bugs and add log sum exp * remove unnecessary header and fix indentation	2023-05-17 10:03:45 +03:00
Zihao Mu	5229312ad2	Merge pull request #22275 from zihaomu:fp16_support_conv DNN: FP16 support on Convolution 2D #22275 ## FP16 support on ARM platform This PR proposes to support an FP16 backend in Convolution. For now, we only support FP16 on ARM aarch64. In addition to adding FP16, I also added the `seperateIm2col` optimization in this patch. ## How to use FP16 to speed up convolution? ``` Net net = readNet(modelPath); net.setPreferableTarget(DNN_TARGET_CPU_FP16); net.setInput(blob); Mat output = net.forward(); ``` ### TODO List \| Task \| Status \| Remarks \| \|:-------:\|:--------:\|:------------:\| \| Convolution 2D FP16 \| ✔️ \| Done \| \| Winograd FP16 \| Because the current modification has reached 2k lines, winograd fp16 will be completed in the next PR. \| \| \| Accuracy Test \| ✔️ \| Done \| \| Performance Test \| ✔️ \| Done \| \| Compiler bug \| ✔️ \| Done \| ### Speed Test for FP16. Tested on an M1 chip, 4 threads. \| Model Name \| FP32 (Conv+Wino) \| Conv(FP16) + Wino(FP32) \| \|:-------:\|:--------:\|:------------:\| \| ResNet 50 \| 26.0 ms \| 18.05 ms (25% speed up)\| \| MobileNet V2 \| 4.17 ms \| 3.09 ms (29% speed up) \| ### Speed Test for `seperateIm2col` trick on X86. Tested on AMD 5600x, 12 threads. 
\| Model Name \| 4.x \| Patch \| \|:-------:\|:--------:\|:------------:\| \| MobileNet V2 \| 5.6 ms \| 3.0 ms (46% speed up) \| ### Performance Test #### Performance Test of X86 platform: AMD 5600X, with `-perf_threas=1` \|Name of Test\|4.x\|patch\|patch vs 4.x (x-factor)\| \|---\|:-:\|:-:\|:-:\| \|Name of Test\|4.x 0\|fp16pr final\|fp16pr final vs 4.x 0 (x-factor)\| \|---\|:-:\|:-:\|:-:\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 19}, OCN=2, G=2, S=2, P=(1, 1), BIAS, OCV/CPU)\|0.001\|0.001\|1.00\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 25}, OCN=2, G=2, P=(2, 2), PM=SAME, OCV/CPU)\|0.001\|0.001\|1.03\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 6, 10}, OCN=6, PM=VALID, BIAS, OCV/CPU)\|0.001\|0.001\|0.92\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 4, 9, 10, 10}, OCN=4, S=[1 x 1 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)\|0.002\|0.003\|0.95\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 8, 1, 10, 10}, OCN=8, G=8, P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)\|0.006\|0.006\|1.00\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 3 x 3], IN={1, 2, 19, 19, 19}, OCN=2, G=2, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)\|0.045\|0.033\|1.39\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 4 x 2], IN={1, 4, 8, 10, 10}, OCN=4, G=4, S=[1 x 2 x 1], BIAS, OCV/CPU)\|0.011\|0.009\|1.17\| \|conv3d::Conv3D::(GFLOPS=0.001, K=[3 x 3 x 3], IN={1, 2, 25, 19, 19}, OCN=2, G=2, S=[1 x 2 x 2], P=(2, 2) x (2, 2) x (2, 2), PM=SAME, OCV/CPU)\|0.109\|0.078\|1.39\| \|conv3d::Conv3D::(GFLOPS=0.002, K=[3 x 1 x 4], IN={1, 14, 5, 10, 10}, OCN=14, PM=SAME, OCV/CPU)\|0.040\|0.042\|0.94\| \|conv3d::Conv3D::(GFLOPS=0.006, K=[5 x 5 x 5], IN={1, 4, 50, 19, 19}, OCN=4, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)\|0.326\|0.342\|0.95\| \|conv3d::Conv3D::(GFLOPS=0.027, K=[3 x 3 x 3], IN={1, 6, 10, 38, 50}, OCN=6, PM=VALID, BIAS, OCV/CPU)\|0.580\|0.589\|0.99\| \|conv3d::Conv3D::(GFLOPS=0.030, K=[5 x 5 x 5], IN={1, 6, 19, 19, 19}, OCN=6, G=2, 
OCV/CPU)\|1.293\|1.382\|0.94\| \|conv3d::Conv3D::(GFLOPS=0.045, K=[7 x 7 x 7], IN={1, 2, 38, 38, 38}, OCN=2, S=[1 x 2 x 1], OCV/CPU)\|3.590\|3.710\|0.97\| \|conv3d::Conv3D::(GFLOPS=0.053, K=[3 x 3 x 3], IN={1, 10, 98, 10, 10}, OCN=10, PM=SAME, OCV/CPU)\|1.120\|1.191\|0.94\| \|conv3d::Conv3D::(GFLOPS=0.071, K=[7 x 7 x 7], IN={1, 6, 15, 19, 19}, OCN=6, S=[2 x 1 x 1], P=(3, 3) x (3, 3) x (3, 3), PM=SAME, BIAS, OCV/CPU)\|2.576\|2.872\|0.90\| \|conv3d::Conv3D::(GFLOPS=0.093, K=[5 x 5 x 5], IN={1, 4, 40, 75, 75}, OCN=4, S=[2 x 2 x 2], OCV/CPU)\|4.599\|4.670\|0.98\| \|conv3d::Conv3D::(GFLOPS=0.116, K=[5 x 5 x 5], IN={1, 2, 21, 75, 100}, OCN=2, BIAS, OCV/CPU)\|9.230\|9.582\|0.96\| \|conv3d::Conv3D::(GFLOPS=1.267, K=[5 x 5 x 5], IN={1, 3, 75, 75, 100}, OCN=3, PM=SAME, BIAS, OCV/CPU)\|65.946\|69.381\|0.95\| \|conv3d::Conv3D::(GFLOPS=1.343, K=[3 x 3 x 3], IN={1, 11, 9, 150, 200}, OCN=11, PM=VALID, BIAS, OCV/CPU)\|18.915\|19.289\|0.98\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 512, 26, 26}, OCN=256, OCV/CPU)\|1.404\|1.457\|0.96\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 1024, 13, 13}, OCN=512, OCV/CPU)\|2.060\|1.501\|1.37\| \|conv::Conv::(GFLOPS=0.178, K=[1 x 1], IN={1, 256, 52, 52}, OCN=128, OCV/CPU)\|1.409\|1.464\|0.96\| \|conv::Conv::(GFLOPS=0.210, K=[1 x 1], IN={1, 576, 38, 50}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|1.793\|1.838\|0.98\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 128, 56, 56}, OCN=32, P=[1 x 1], OCV/CPU)\|1.207\|1.199\|1.01\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 256, 14, 14}, OCN=256, P=[1 x 1], OCV/CPU)\|1.277\|1.275\|1.00\| \|conv::Conv::(GFLOPS=0.280, K=[1 x 1], IN={1, 576, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|2.319\|2.370\|0.98\| \|conv::Conv::(GFLOPS=0.302, K=[3 x 3], IN={1, 64, 64, 64}, OCN=64, PM=SAME, OCV/CPU)\|1.351\|1.346\|1.00\| \|conv::Conv::(GFLOPS=0.357, K=[1 x 1], IN={1, 64, 208, 208}, OCN=64, OCV/CPU)\|3.520\|3.612\|0.97\| \|conv::Conv::(GFLOPS=0.420, K=[3 x 3], IN={1, 96, 38, 50}, OCN=128, PM=SAME, BIAS, 
OCV/CPU)\|1.876\|1.880\|1.00\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 128, 40, 40}, OCN=128, PM=SAME, OCV/CPU)\|1.981\|1.995\|0.99\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 256, 20, 20}, OCN=256, PM=SAME, OCV/CPU)\|2.620\|2.627\|1.00\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 512, 10, 10}, OCN=512, PM=SAME, OCV/CPU)\|4.202\|4.123\|1.02\| \|conv::Conv::(GFLOPS=0.561, K=[3 x 3], IN={1, 128, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|2.429\|2.445\|0.99\| \|conv::Conv::(GFLOPS=0.624, K=[3 x 3], IN={1, 128, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|2.591\|2.576\|1.01\| \|conv::Conv::(GFLOPS=0.701, K=[3 x 3], IN={1, 128, 38, 50}, OCN=160, PM=SAME, BIAS, OCV/CPU)\|3.005\|2.998\|1.00\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 64, 104, 104}, OCN=64, P=[1 x 1], OCV/CPU)\|3.515\|3.532\|1.00\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 128, 52, 52}, OCN=128, P=[1 x 1], OCV/CPU)\|3.115\|3.134\|0.99\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 256, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)\|3.937\|3.899\|1.01\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 512, 13, 13}, OCN=512, P=[1 x 1], OCV/CPU)\|5.533\|5.471\|1.01\| \|conv::Conv::(GFLOPS=0.830, K=[3 x 3], IN={1, 64, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|3.472\|3.464\|1.00\| \|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 192, 38, 38}, OCN=192, PM=SAME, OCV/CPU)\|4.302\|4.322\|1.00\| \|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 384, 19, 19}, OCN=384, PM=SAME, OCV/CPU)\|6.100\|6.035\|1.01\| \|conv::Conv::(GFLOPS=1.022, K=[3 x 3], IN={1, 576, 19, 19}, OCN=273, PM=SAME, BIAS, OCV/CPU)\|6.580\|6.484\|1.01\| \|conv::Conv::(GFLOPS=1.112, K=[3 x 3], IN={1, 512, 10, 10}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)\|9.741\|9.634\|1.01\| \|conv::Conv::(GFLOPS=1.181, K=[3 x 3], IN={1, 64, 160, 200}, OCN=128, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)\|10.131\|10.156\|1.00\| \|conv::Conv::(GFLOPS=1.182, K=[3 x 3], IN={1, 32, 320, 400}, OCN=64, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)\|12.391\|12.350\|1.00\| 
\|conv::Conv::(GFLOPS=1.195, K=[9 x 9], IN={1, 32, 240, 320}, OCN=3, P=[4 x 4], BIAS, OCV/CPU)\|91.074\|87.893\|1.04\| \|conv::Conv::(GFLOPS=1.196, K=[3 x 3], IN={1, 384, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)\|5.903\|5.903\|1.00\| \|conv::Conv::(GFLOPS=1.210, K=[3 x 3], IN={1, 32, 256, 256}, OCN=32, PM=SAME, OCV/CPU)\|6.890\|6.794\|1.01\| \|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 64, 75, 75}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|5.160\|5.131\|1.01\| \|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 96, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|4.970\|5.036\|0.99\| \|conv::Conv::(GFLOPS=1.248, K=[3 x 3], IN={1, 256, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|5.045\|5.015\|1.01\| \|conv::Conv::(GFLOPS=1.258, K=[3 x 3], IN={1, 1280, 10, 10}, OCN=546, PM=SAME, BIAS, OCV/CPU)\|11.583\|11.343\|1.02\| \|conv::Conv::(GFLOPS=1.261, K=[3 x 3], IN={1, 192, 38, 50}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|5.348\|5.320\|1.01\| \|conv::Conv::(GFLOPS=1.416, K=[3 x 3], IN={1, 128, 62, 82}, OCN=128, BIAS, OCV/CPU)\|5.357\|5.396\|0.99\| \|conv::Conv::(GFLOPS=1.500, K=[3 x 3], IN={1, 128, 64, 84}, OCN=128, BIAS, OCV/CPU)\|6.050\|6.006\|1.01\| \|conv::Conv::(GFLOPS=1.586, K=[3 x 3], IN={1, 128, 66, 86}, OCN=128, BIAS, OCV/CPU)\|5.952\|5.953\|1.00\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 26, 26}, OCN=512, P=[1 x 1], OCV/CPU)\|8.014\|8.014\|1.00\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 52, 52}, OCN=512, S=[2 x 2], P=[1 x 1], OCV/CPU)\|12.472\|12.577\|0.99\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 13, 13}, OCN=1024, P=[1 x 1], OCV/CPU)\|10.803\|10.655\|1.01\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 26, 26}, OCN=1024, S=[2 x 2], P=[1 x 1], OCV/CPU)\|18.429\|13.405\|1.37\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 104, 104}, OCN=128, P=[1 x 1], OCV/CPU)\|6.659\|6.647\|1.00\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 208, 208}, OCN=128, S=[2 x 2], P=[1 x 1], OCV/CPU)\|14.192\|13.819\|1.03\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 
3], IN={1, 128, 52, 52}, OCN=256, P=[1 x 1], OCV/CPU)\|6.045\|6.068\|1.00\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 104, 104}, OCN=256, S=[2 x 2], P=[1 x 1], OCV/CPU)\|12.742\|12.828\|0.99\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 208, 208}, OCN=64, P=[1 x 1], OCV/CPU)\|8.046\|7.773\|1.04\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 416, 416}, OCN=64, S=[2 x 2], P=[1 x 1], OCV/CPU)\|17.440\|17.192\|1.01\| \|conv::Conv::(GFLOPS=1.659, K=[3 x 3], IN={1, 960, 10, 10}, OCN=960, PM=SAME, OCV/CPU)\|15.418\|14.972\|1.03\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, G=128, P=[1 x 1], BIAS, OCV/CPU)\|0.430\|0.430\|1.00\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, PM=SAME, OCV/CPU)\|6.692\|6.663\|1.00\| \|conv::Conv::(GFLOPS=1.675, K=[3 x 3], IN={1, 128, 68, 88}, OCN=128, BIAS, OCV/CPU)\|6.350\|6.347\|1.00\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, G=256, P=[1 x 1], BIAS, OCV/CPU)\|0.267\|0.265\|1.01\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, PM=SAME, OCV/CPU)\|7.755\|7.558\|1.03\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, G=512, P=[1 x 1], BIAS, OCV/CPU)\|0.203\|0.202\|1.00\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|10.663\|10.576\|1.01\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, PM=SAME, OCV/CPU)\|10.827\|10.614\|1.02\| \|conv::Conv::(GFLOPS=1.766, K=[3 x 3], IN={1, 128, 70, 90}, OCN=128, BIAS, OCV/CPU)\|7.049\|6.947\|1.01\| \|conv::Conv::(GFLOPS=1.859, K=[3 x 3], IN={1, 128, 72, 92}, OCN=128, BIAS, OCV/CPU)\|6.900\|6.901\|1.00\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, G=1024, P=[1 x 1], BIAS, OCV/CPU)\|0.165\|0.165\|1.00\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, PM=SAME, OCV/CPU)\|17.953\|17.251\|1.04\| \|conv::Conv::(GFLOPS=1.954, K=[3 x 3], IN={1, 128, 74, 
94}, OCN=128, BIAS, OCV/CPU)\|7.430\|7.320\|1.01\| \|conv::Conv::(GFLOPS=1.995, K=[9 x 9], IN={1, 3, 320, 400}, OCN=32, P=[4 x 4], BIAS, OCV/CPU)\|22.187\|21.705\|1.02\| \|conv::Conv::(GFLOPS=2.052, K=[3 x 3], IN={1, 128, 76, 96}, OCN=128, BIAS, OCV/CPU)\|8.349\|8.126\|1.03\| \|conv::Conv::(GFLOPS=2.100, K=[3 x 3], IN={1, 144, 75, 75}, OCN=144, PM=SAME, OCV/CPU)\|8.273\|8.297\|1.00\| \|conv::Conv::(GFLOPS=2.153, K=[3 x 3], IN={1, 128, 78, 98}, OCN=128, BIAS, OCV/CPU)\|8.169\|8.094\|1.01\| \|conv::Conv::(GFLOPS=2.156, K=[3 x 3], IN={1, 576, 19, 19}, OCN=576, PM=SAME, OCV/CPU)\|13.602\|13.359\|1.02\| \|conv::Conv::(GFLOPS=2.255, K=[3 x 3], IN={1, 128, 80, 100}, OCN=128, BIAS, OCV/CPU)\|8.633\|8.584\|1.01\| \|conv::Conv::(GFLOPS=2.719, K=[3 x 3], IN={1, 96, 256, 256}, OCN=96, S=[2 x 2], PM=SAME, OCV/CPU)\|29.339\|28.897\|1.02\| \|conv::Conv::(GFLOPS=3.319, K=[3 x 3], IN={1, 128, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|13.000\|12.920\|1.01\| \|conv::Conv::(GFLOPS=3.321, K=[3 x 3], IN={1, 64, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|14.262\|13.319\|1.07\| \|conv::Conv::(GFLOPS=3.398, K=[7 x 7], IN={1, 128, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)\|27.453\|27.253\|1.01\| \|conv::Conv::(GFLOPS=3.407, K=[3 x 3], IN={1, 512, 19, 19}, OCN=1024, D=[6 x 6], P=[6 x 6], BIAS, OCV/CPU)\|32.052\|27.269\|1.18\| \|conv::Conv::(GFLOPS=3.408, K=[3 x 3], IN={1, 256, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|15.363\|15.208\|1.01\| \|conv::Conv::(GFLOPS=4.247, K=[3 x 3], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)\|18.543\|18.434\|1.01\| \|conv::Conv::(GFLOPS=4.247, K=[5 x 5], IN={1, 144, 128, 128}, OCN=144, S=[2 x 2], PM=SAME, OCV/CPU)\|39.114\|37.954\|1.03\| \|conv::Conv::(GFLOPS=4.566, K=[7 x 7], IN={1, 172, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)\|36.271\|36.972\|0.98\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 256, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|19.262\|19.427\|0.99\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 512, 46, 46}, OCN=256, 
P=[1 x 1], BIAS, OCV/CPU)\|19.298\|19.349\|1.00\| \|conv::Conv::(GFLOPS=4.994, K=[3 x 3], IN={1, 128, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|20.261\|19.847\|1.02\| \|conv::Conv::(GFLOPS=4.997, K=[3 x 3], IN={1, 64, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|21.867\|21.525\|1.02\| \|conv::Conv::(GFLOPS=5.780, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, S=[2 x 2], PM=SAME, OCV/CPU)\|51.756\|49.979\|1.04\| \|conv::Conv::(GFLOPS=6.116, K=[3 x 3], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)\|28.133\|27.060\|1.04\| \|conv::Conv::(GFLOPS=6.118, K=[3 x 3], IN={1, 144, 128, 128}, OCN=144, PM=SAME, OCV/CPU)\|25.035\|24.980\|1.00\| \|conv::Conv::(GFLOPS=6.637, K=[3 x 3], IN={1, 256, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|25.858\|25.821\|1.00\| \|conv::Conv::(GFLOPS=6.638, K=[3 x 3], IN={1, 128, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|27.313\|27.149\|1.01\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 150, 200}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|28.219\|28.111\|1.00\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 300, 300}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)\|46.025\|46.674\|0.99\| \|conv::Conv::(GFLOPS=6.814, K=[3 x 3], IN={1, 512, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|30.220\|29.446\|1.03\| \|conv::Conv::(GFLOPS=8.025, K=[3 x 3], IN={1, 1024, 19, 19}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)\|49.410\|48.708\|1.01\| \|conv::Conv::(GFLOPS=9.986, K=[3 x 3], IN={1, 512, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|38.203\|38.001\|1.01\| \|conv::Conv::(GFLOPS=9.987, K=[3 x 3], IN={1, 256, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|39.961\|39.021\|1.02\| \|conv::Conv::(GFLOPS=9.989, K=[3 x 3], IN={1, 128, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|48.685\|47.075\|1.03\| \|conv::Conv::(GFLOPS=9.993, K=[3 x 3], IN={1, 64, 368, 368}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)\|75.114\|72.586\|1.03\| \|conv::Conv::(GFLOPS=10.087, K=[3 x 3], IN={1, 576, 38, 50}, OCN=512, PM=SAME, BIAS, OCV/CPU)\|41.222\|41.144\|1.00\| \|conv::Conv::(GFLOPS=10.701, 
K=[3 x 3], IN={1, 512, 38, 38}, OCN=804, P=[1 x 1], BIAS, OCV/CPU)\|46.220\|46.353\|1.00\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 240, 64, 64}, OCN=240, PM=SAME, OCV/CPU)\|98.201\|98.771\|0.99\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)\|100.106\|96.971\|1.03\| \|conv::Conv::(GFLOPS=16.987, K=[5 x 5], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)\|146.977\|140.445\|1.05\| \|conv::Conv::(GFLOPS=23.122, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, PM=SAME, OCV/CPU)\|198.618\|194.665\|1.02\| #### Performance Test of ARM platform: apple M1, with `-perf_threas=1` Min (ms) \|Name of Test\|4.x\|patch\|4.x vs patch (x-factor)\| \|---\|:-:\|:-:\|:-:\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 19}, OCN=2, G=2, S=2, P=(1, 1), BIAS, OCV/CPU)\|0.001\|0.001\|1.07\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 25}, OCN=2, G=2, P=(2, 2), PM=SAME, OCV/CPU)\|0.001\|0.001\|1.10\| \|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 6, 10}, OCN=6, PM=VALID, BIAS, OCV/CPU)\|0.002\|0.002\|0.97\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 4, 9, 10, 10}, OCN=4, S=[1 x 1 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)\|0.003\|0.003\|0.84\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 8, 1, 10, 10}, OCN=8, G=8, P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)\|0.009\|0.009\|1.00\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 3 x 3], IN={1, 2, 19, 19, 19}, OCN=2, G=2, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)\|0.027\|0.030\|0.90\| \|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 4 x 2], IN={1, 4, 8, 10, 10}, OCN=4, G=4, S=[1 x 2 x 1], BIAS, OCV/CPU)\|0.008\|0.007\|1.07\| \|conv3d::Conv3D::(GFLOPS=0.001, K=[3 x 3 x 3], IN={1, 2, 25, 19, 19}, OCN=2, G=2, S=[1 x 2 x 2], P=(2, 2) x (2, 2) x (2, 2), PM=SAME, OCV/CPU)\|0.066\|0.072\|0.91\| \|conv3d::Conv3D::(GFLOPS=0.002, K=[3 x 1 x 4], IN={1, 14, 5, 10, 10}, OCN=14, PM=SAME, OCV/CPU)\|0.090\|0.054\|1.68\| \|conv3d::Conv3D::(GFLOPS=0.006, K=[5 x 5 x 5], IN={1, 4, 
50, 19, 19}, OCN=4, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)\|0.328\|0.409\|0.80\| \|conv3d::Conv3D::(GFLOPS=0.027, K=[3 x 3 x 3], IN={1, 6, 10, 38, 50}, OCN=6, PM=VALID, BIAS, OCV/CPU)\|0.659\|0.697\|0.95\| \|conv3d::Conv3D::(GFLOPS=0.030, K=[5 x 5 x 5], IN={1, 6, 19, 19, 19}, OCN=6, G=2, OCV/CPU)\|1.266\|1.403\|0.90\| \|conv3d::Conv3D::(GFLOPS=0.045, K=[7 x 7 x 7], IN={1, 2, 38, 38, 38}, OCN=2, S=[1 x 2 x 1], OCV/CPU)\|3.550\|4.145\|0.86\| \|conv3d::Conv3D::(GFLOPS=0.053, K=[3 x 3 x 3], IN={1, 10, 98, 10, 10}, OCN=10, PM=SAME, OCV/CPU)\|1.188\|1.375\|0.86\| \|conv3d::Conv3D::(GFLOPS=0.071, K=[7 x 7 x 7], IN={1, 6, 15, 19, 19}, OCN=6, S=[2 x 1 x 1], P=(3, 3) x (3, 3) x (3, 3), PM=SAME, BIAS, OCV/CPU)\|2.683\|3.236\|0.83\| \|conv3d::Conv3D::(GFLOPS=0.093, K=[5 x 5 x 5], IN={1, 4, 40, 75, 75}, OCN=4, S=[2 x 2 x 2], OCV/CPU)\|4.491\|5.501\|0.82\| \|conv3d::Conv3D::(GFLOPS=0.116, K=[5 x 5 x 5], IN={1, 2, 21, 75, 100}, OCN=2, BIAS, OCV/CPU)\|8.916\|10.181\|0.88\| \|conv3d::Conv3D::(GFLOPS=1.267, K=[5 x 5 x 5], IN={1, 3, 75, 75, 100}, OCN=3, PM=SAME, BIAS, OCV/CPU)\|69.995\|72.296\|0.97\| \|conv3d::Conv3D::(GFLOPS=1.343, K=[3 x 3 x 3], IN={1, 11, 9, 150, 200}, OCN=11, PM=VALID, BIAS, OCV/CPU)\|22.531\|23.139\|0.97\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 512, 26, 26}, OCN=256, OCV/CPU)\|2.239\|1.933\|1.16\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 512, 26, 26}, OCN=256, OCV/CPU_FP16)\|-\|1.010\|-\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 1024, 13, 13}, OCN=512, OCV/CPU)\|3.134\|2.068\|1.52\| \|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 1024, 13, 13}, OCN=512, OCV/CPU_FP16)\|-\|1.062\|-\| \|conv::Conv::(GFLOPS=0.178, K=[1 x 1], IN={1, 256, 52, 52}, OCN=128, OCV/CPU)\|1.918\|1.920\|1.00\| \|conv::Conv::(GFLOPS=0.178, K=[1 x 1], IN={1, 256, 52, 52}, OCN=128, OCV/CPU_FP16)\|-\|1.014\|-\| \|conv::Conv::(GFLOPS=0.210, K=[1 x 1], IN={1, 576, 38, 50}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|2.340\|2.352\|0.99\| \|conv::Conv::(GFLOPS=0.210, 
K=[1 x 1], IN={1, 576, 38, 50}, OCN=96, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|1.247\|-\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 128, 56, 56}, OCN=32, P=[1 x 1], OCV/CPU)\|1.116\|1.111\|1.00\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 128, 56, 56}, OCN=32, P=[1 x 1], OCV/CPU_FP16)\|-\|1.114\|-\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 256, 14, 14}, OCN=256, P=[1 x 1], OCV/CPU)\|1.116\|1.112\|1.00\| \|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 256, 14, 14}, OCN=256, P=[1 x 1], OCV/CPU_FP16)\|-\|1.113\|-\| \|conv::Conv::(GFLOPS=0.280, K=[1 x 1], IN={1, 576, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|3.067\|3.085\|0.99\| \|conv::Conv::(GFLOPS=0.280, K=[1 x 1], IN={1, 576, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|1.622\|-\| \|conv::Conv::(GFLOPS=0.302, K=[3 x 3], IN={1, 64, 64, 64}, OCN=64, PM=SAME, OCV/CPU)\|1.153\|1.187\|0.97\| \|conv::Conv::(GFLOPS=0.302, K=[3 x 3], IN={1, 64, 64, 64}, OCN=64, PM=SAME, OCV/CPU_FP16)\|-\|1.150\|-\| \|conv::Conv::(GFLOPS=0.357, K=[1 x 1], IN={1, 64, 208, 208}, OCN=64, OCV/CPU)\|4.804\|4.849\|0.99\| \|conv::Conv::(GFLOPS=0.357, K=[1 x 1], IN={1, 64, 208, 208}, OCN=64, OCV/CPU_FP16)\|-\|2.922\|-\| \|conv::Conv::(GFLOPS=0.420, K=[3 x 3], IN={1, 96, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|1.463\|1.469\|1.00\| \|conv::Conv::(GFLOPS=0.420, K=[3 x 3], IN={1, 96, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|1.459\|-\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 128, 40, 40}, OCN=128, PM=SAME, OCV/CPU)\|1.577\|1.580\|1.00\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 128, 40, 40}, OCN=128, PM=SAME, OCV/CPU_FP16)\|-\|1.580\|-\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 256, 20, 20}, OCN=256, PM=SAME, OCV/CPU)\|1.826\|1.818\|1.00\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 256, 20, 20}, OCN=256, PM=SAME, OCV/CPU_FP16)\|-\|1.817\|-\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 512, 10, 10}, OCN=512, PM=SAME, OCV/CPU)\|6.541\|5.081\|1.29\| \|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 512, 10, 
10}, OCN=512, PM=SAME, OCV/CPU_FP16)\|-\|2.809\|-\| \|conv::Conv::(GFLOPS=0.561, K=[3 x 3], IN={1, 128, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)\|1.912\|1.919\|1.00\| \|conv::Conv::(GFLOPS=0.561, K=[3 x 3], IN={1, 128, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|1.919\|-\| \|conv::Conv::(GFLOPS=0.624, K=[3 x 3], IN={1, 128, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|1.961\|1.971\|0.99\| \|conv::Conv::(GFLOPS=0.624, K=[3 x 3], IN={1, 128, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|1.961\|-\| \|conv::Conv::(GFLOPS=0.701, K=[3 x 3], IN={1, 128, 38, 50}, OCN=160, PM=SAME, BIAS, OCV/CPU)\|2.317\|2.329\|0.99\| \|conv::Conv::(GFLOPS=0.701, K=[3 x 3], IN={1, 128, 38, 50}, OCN=160, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|2.322\|-\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 64, 104, 104}, OCN=64, P=[1 x 1], OCV/CPU)\|2.920\|2.947\|0.99\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 64, 104, 104}, OCN=64, P=[1 x 1], OCV/CPU_FP16)\|-\|2.924\|-\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 128, 52, 52}, OCN=128, P=[1 x 1], OCV/CPU)\|2.467\|2.466\|1.00\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 128, 52, 52}, OCN=128, P=[1 x 1], OCV/CPU_FP16)\|-\|2.496\|-\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 256, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)\|3.028\|2.997\|1.01\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 256, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU_FP16)\|-\|2.986\|-\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 512, 13, 13}, OCN=512, P=[1 x 1], OCV/CPU)\|4.353\|4.355\|1.00\| \|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 512, 13, 13}, OCN=512, P=[1 x 1], OCV/CPU_FP16)\|-\|4.355\|-\| \|conv::Conv::(GFLOPS=0.830, K=[3 x 3], IN={1, 64, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|2.762\|2.793\|0.99\| \|conv::Conv::(GFLOPS=0.830, K=[3 x 3], IN={1, 64, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|2.797\|-\| \|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 192, 38, 38}, OCN=192, PM=SAME, OCV/CPU)\|3.428\|3.226\|1.06\| 
\|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 192, 38, 38}, OCN=192, PM=SAME, OCV/CPU_FP16)\|-\|3.223\|-\| \|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 384, 19, 19}, OCN=384, PM=SAME, OCV/CPU)\|3.967\|3.957\|1.00\| \|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 384, 19, 19}, OCN=384, PM=SAME, OCV/CPU_FP16)\|-\|3.960\|-\| \|conv::Conv::(GFLOPS=1.022, K=[3 x 3], IN={1, 576, 19, 19}, OCN=273, PM=SAME, BIAS, OCV/CPU)\|4.806\|4.387\|1.10\| \|conv::Conv::(GFLOPS=1.022, K=[3 x 3], IN={1, 576, 19, 19}, OCN=273, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|4.366\|-\| \|conv::Conv::(GFLOPS=1.112, K=[3 x 3], IN={1, 512, 10, 10}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)\|14.509\|11.756\|1.23\| \|conv::Conv::(GFLOPS=1.112, K=[3 x 3], IN={1, 512, 10, 10}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|6.510\|-\| \|conv::Conv::(GFLOPS=1.181, K=[3 x 3], IN={1, 64, 160, 200}, OCN=128, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)\|13.718\|13.287\|1.03\| \|conv::Conv::(GFLOPS=1.181, K=[3 x 3], IN={1, 64, 160, 200}, OCN=128, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|7.190\|-\| \|conv::Conv::(GFLOPS=1.182, K=[3 x 3], IN={1, 32, 320, 400}, OCN=64, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)\|15.133\|14.853\|1.02\| \|conv::Conv::(GFLOPS=1.182, K=[3 x 3], IN={1, 32, 320, 400}, OCN=64, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|8.671\|-\| \|conv::Conv::(GFLOPS=1.195, K=[9 x 9], IN={1, 32, 240, 320}, OCN=3, P=[4 x 4], BIAS, OCV/CPU)\|41.928\|43.328\|0.97\| \|conv::Conv::(GFLOPS=1.195, K=[9 x 9], IN={1, 32, 240, 320}, OCN=3, P=[4 x 4], BIAS, OCV/CPU_FP16)\|-\|38.072\|-\| \|conv::Conv::(GFLOPS=1.196, K=[3 x 3], IN={1, 384, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)\|4.409\|4.428\|1.00\| \|conv::Conv::(GFLOPS=1.196, K=[3 x 3], IN={1, 384, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU_FP16)\|-\|4.427\|-\| \|conv::Conv::(GFLOPS=1.210, K=[3 x 3], IN={1, 32, 256, 256}, OCN=32, PM=SAME, OCV/CPU)\|6.144\|5.363\|1.15\| \|conv::Conv::(GFLOPS=1.210, K=[3 x 3], IN={1, 32, 256, 256}, OCN=32, PM=SAME, OCV/CPU_FP16)\|-\|5.368\|-\| 
\|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 64, 75, 75}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|3.926\|3.932\|1.00\| \|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 64, 75, 75}, OCN=192, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|3.938\|-\| \|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 96, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)\|3.920\|3.915\|1.00\| \|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 96, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|3.950\|-\| \|conv::Conv::(GFLOPS=1.248, K=[3 x 3], IN={1, 256, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|3.767\|3.764\|1.00\| \|conv::Conv::(GFLOPS=1.248, K=[3 x 3], IN={1, 256, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|3.762\|-\| \|conv::Conv::(GFLOPS=1.258, K=[3 x 3], IN={1, 1280, 10, 10}, OCN=546, PM=SAME, BIAS, OCV/CPU)\|19.959\|13.875\|1.44\| \|conv::Conv::(GFLOPS=1.258, K=[3 x 3], IN={1, 1280, 10, 10}, OCN=546, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|7.781\|-\| \|conv::Conv::(GFLOPS=1.261, K=[3 x 3], IN={1, 192, 38, 50}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|3.951\|3.955\|1.00\| \|conv::Conv::(GFLOPS=1.261, K=[3 x 3], IN={1, 192, 38, 50}, OCN=192, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|3.969\|-\| \|conv::Conv::(GFLOPS=1.416, K=[3 x 3], IN={1, 128, 62, 82}, OCN=128, BIAS, OCV/CPU)\|4.050\|4.034\|1.00\| \|conv::Conv::(GFLOPS=1.416, K=[3 x 3], IN={1, 128, 62, 82}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|4.093\|-\| \|conv::Conv::(GFLOPS=1.500, K=[3 x 3], IN={1, 128, 64, 84}, OCN=128, BIAS, OCV/CPU)\|4.923\|4.506\|1.09\| \|conv::Conv::(GFLOPS=1.500, K=[3 x 3], IN={1, 128, 64, 84}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|4.509\|-\| \|conv::Conv::(GFLOPS=1.586, K=[3 x 3], IN={1, 128, 66, 86}, OCN=128, BIAS, OCV/CPU)\|4.759\|4.476\|1.06\| \|conv::Conv::(GFLOPS=1.586, K=[3 x 3], IN={1, 128, 66, 86}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|4.447\|-\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 26, 26}, OCN=512, P=[1 x 1], OCV/CPU)\|6.079\|5.628\|1.08\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 26, 26}, OCN=512, P=[1 x 1], 
OCV/CPU_FP16)\|-\|5.625\|-\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 52, 52}, OCN=512, S=[2 x 2], P=[1 x 1], OCV/CPU)\|19.843\|17.523\|1.13\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 52, 52}, OCN=512, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)\|-\|8.917\|-\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 13, 13}, OCN=1024, P=[1 x 1], OCV/CPU)\|8.334\|8.247\|1.01\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 13, 13}, OCN=1024, P=[1 x 1], OCV/CPU_FP16)\|-\|8.246\|-\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 26, 26}, OCN=1024, S=[2 x 2], P=[1 x 1], OCV/CPU)\|23.164\|18.199\|1.27\| \|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 26, 26}, OCN=1024, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)\|-\|9.305\|-\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 104, 104}, OCN=128, P=[1 x 1], OCV/CPU)\|5.184\|5.178\|1.00\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 104, 104}, OCN=128, P=[1 x 1], OCV/CPU_FP16)\|-\|5.149\|-\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 208, 208}, OCN=128, S=[2 x 2], P=[1 x 1], OCV/CPU)\|17.990\|18.103\|0.99\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 208, 208}, OCN=128, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)\|-\|9.777\|-\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 52, 52}, OCN=256, P=[1 x 1], OCV/CPU)\|4.831\|4.522\|1.07\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 52, 52}, OCN=256, P=[1 x 1], OCV/CPU_FP16)\|-\|4.523\|-\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 104, 104}, OCN=256, S=[2 x 2], P=[1 x 1], OCV/CPU)\|17.328\|17.319\|1.00\| \|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 104, 104}, OCN=256, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)\|-\|8.948\|-\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 208, 208}, OCN=64, P=[1 x 1], OCV/CPU)\|5.944\|5.961\|1.00\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 208, 208}, OCN=64, P=[1 x 1], OCV/CPU_FP16)\|-\|5.936\|-\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 416, 416}, OCN=64, S=[2 x 2], P=[1 
x 1], OCV/CPU)\|19.811\|20.064\|0.99\| \|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 416, 416}, OCN=64, S=[2 x 2], P=[1 x 1], OCV/CPU_FP16)\|-\|11.705\|-\| \|conv::Conv::(GFLOPS=1.659, K=[3 x 3], IN={1, 960, 10, 10}, OCN=960, PM=SAME, OCV/CPU)\|22.398\|17.686\|1.27\| \|conv::Conv::(GFLOPS=1.659, K=[3 x 3], IN={1, 960, 10, 10}, OCN=960, PM=SAME, OCV/CPU_FP16)\|-\|9.859\|-\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, G=128, P=[1 x 1], BIAS, OCV/CPU)\|0.416\|0.416\|1.00\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, G=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|0.417\|-\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, PM=SAME, OCV/CPU)\|5.356\|5.110\|1.05\| \|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, PM=SAME, OCV/CPU_FP16)\|-\|5.114\|-\| \|conv::Conv::(GFLOPS=1.675, K=[3 x 3], IN={1, 128, 68, 88}, OCN=128, BIAS, OCV/CPU)\|5.092\|4.748\|1.07\| \|conv::Conv::(GFLOPS=1.675, K=[3 x 3], IN={1, 128, 68, 88}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|4.754\|-\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, G=256, P=[1 x 1], BIAS, OCV/CPU)\|0.260\|0.229\|1.13\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, G=256, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|0.229\|-\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, PM=SAME, OCV/CPU)\|5.872\|5.460\|1.08\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, PM=SAME, OCV/CPU_FP16)\|-\|5.460\|-\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, G=512, P=[1 x 1], BIAS, OCV/CPU)\|0.161\|0.161\|1.00\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, G=512, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|0.161\|-\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|7.176\|7.175\|1.00\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, P=[1 x 1], BIAS, 
OCV/CPU_FP16)\|-\|7.162\|-\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, PM=SAME, OCV/CPU)\|7.174\|7.185\|1.00\| \|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, PM=SAME, OCV/CPU_FP16)\|-\|7.157\|-\| \|conv::Conv::(GFLOPS=1.766, K=[3 x 3], IN={1, 128, 70, 90}, OCN=128, BIAS, OCV/CPU)\|5.400\|5.180\|1.04\| \|conv::Conv::(GFLOPS=1.766, K=[3 x 3], IN={1, 128, 70, 90}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|5.201\|-\| \|conv::Conv::(GFLOPS=1.859, K=[3 x 3], IN={1, 128, 72, 92}, OCN=128, BIAS, OCV/CPU)\|5.330\|5.188\|1.03\| \|conv::Conv::(GFLOPS=1.859, K=[3 x 3], IN={1, 128, 72, 92}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|5.177\|-\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, G=1024, P=[1 x 1], BIAS, OCV/CPU)\|0.115\|0.115\|1.00\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, G=1024, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|0.115\|-\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, PM=SAME, OCV/CPU)\|26.156\|20.222\|1.29\| \|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, PM=SAME, OCV/CPU_FP16)\|-\|11.203\|-\| \|conv::Conv::(GFLOPS=1.954, K=[3 x 3], IN={1, 128, 74, 94}, OCN=128, BIAS, OCV/CPU)\|5.627\|5.543\|1.02\| \|conv::Conv::(GFLOPS=1.954, K=[3 x 3], IN={1, 128, 74, 94}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|5.506\|-\| \|conv::Conv::(GFLOPS=1.995, K=[9 x 9], IN={1, 3, 320, 400}, OCN=32, P=[4 x 4], BIAS, OCV/CPU)\|27.925\|27.741\|1.01\| \|conv::Conv::(GFLOPS=1.995, K=[9 x 9], IN={1, 3, 320, 400}, OCN=32, P=[4 x 4], BIAS, OCV/CPU_FP16)\|-\|17.217\|-\| \|conv::Conv::(GFLOPS=2.052, K=[3 x 3], IN={1, 128, 76, 96}, OCN=128, BIAS, OCV/CPU)\|6.359\|6.062\|1.05\| \|conv::Conv::(GFLOPS=2.052, K=[3 x 3], IN={1, 128, 76, 96}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|6.048\|-\| \|conv::Conv::(GFLOPS=2.100, K=[3 x 3], IN={1, 144, 75, 75}, OCN=144, PM=SAME, OCV/CPU)\|6.559\|6.322\|1.04\| \|conv::Conv::(GFLOPS=2.100, K=[3 x 3], IN={1, 144, 75, 75}, OCN=144, PM=SAME, 
OCV/CPU_FP16)\|-\|6.280\|-\| \|conv::Conv::(GFLOPS=2.153, K=[3 x 3], IN={1, 128, 78, 98}, OCN=128, BIAS, OCV/CPU)\|6.412\|6.200\|1.03\| \|conv::Conv::(GFLOPS=2.153, K=[3 x 3], IN={1, 128, 78, 98}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|6.197\|-\| \|conv::Conv::(GFLOPS=2.156, K=[3 x 3], IN={1, 576, 19, 19}, OCN=576, PM=SAME, OCV/CPU)\|9.167\|8.624\|1.06\| \|conv::Conv::(GFLOPS=2.156, K=[3 x 3], IN={1, 576, 19, 19}, OCN=576, PM=SAME, OCV/CPU_FP16)\|-\|8.626\|-\| \|conv::Conv::(GFLOPS=2.255, K=[3 x 3], IN={1, 128, 80, 100}, OCN=128, BIAS, OCV/CPU)\|6.755\|6.491\|1.04\| \|conv::Conv::(GFLOPS=2.255, K=[3 x 3], IN={1, 128, 80, 100}, OCN=128, BIAS, OCV/CPU_FP16)\|-\|6.520\|-\| \|conv::Conv::(GFLOPS=2.719, K=[3 x 3], IN={1, 96, 256, 256}, OCN=96, S=[2 x 2], PM=SAME, OCV/CPU)\|35.664\|34.752\|1.03\| \|conv::Conv::(GFLOPS=2.719, K=[3 x 3], IN={1, 96, 256, 256}, OCN=96, S=[2 x 2], PM=SAME, OCV/CPU_FP16)\|-\|20.260\|-\| \|conv::Conv::(GFLOPS=3.319, K=[3 x 3], IN={1, 128, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|9.514\|9.414\|1.01\| \|conv::Conv::(GFLOPS=3.319, K=[3 x 3], IN={1, 128, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|9.462\|-\| \|conv::Conv::(GFLOPS=3.321, K=[3 x 3], IN={1, 64, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|10.631\|9.963\|1.07\| \|conv::Conv::(GFLOPS=3.321, K=[3 x 3], IN={1, 64, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|9.935\|-\| \|conv::Conv::(GFLOPS=3.398, K=[7 x 7], IN={1, 128, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)\|37.465\|36.798\|1.02\| \|conv::Conv::(GFLOPS=3.398, K=[7 x 7], IN={1, 128, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU_FP16)\|-\|19.569\|-\| \|conv::Conv::(GFLOPS=3.407, K=[3 x 3], IN={1, 512, 19, 19}, OCN=1024, D=[6 x 6], P=[6 x 6], BIAS, OCV/CPU)\|38.157\|36.157\|1.06\| \|conv::Conv::(GFLOPS=3.407, K=[3 x 3], IN={1, 512, 19, 19}, OCN=1024, D=[6 x 6], P=[6 x 6], BIAS, OCV/CPU_FP16)\|-\|18.902\|-\| \|conv::Conv::(GFLOPS=3.408, K=[3 x 3], IN={1, 256, 38, 38}, OCN=512, P=[1 x 1], BIAS, 
OCV/CPU)\|10.356\|10.401\|1.00\| \|conv::Conv::(GFLOPS=3.408, K=[3 x 3], IN={1, 256, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|10.360\|-\| \|conv::Conv::(GFLOPS=4.247, K=[3 x 3], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)\|12.641\|12.150\|1.04\| \|conv::Conv::(GFLOPS=4.247, K=[3 x 3], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU_FP16)\|-\|12.162\|-\| \|conv::Conv::(GFLOPS=4.247, K=[5 x 5], IN={1, 144, 128, 128}, OCN=144, S=[2 x 2], PM=SAME, OCV/CPU)\|50.545\|50.505\|1.00\| \|conv::Conv::(GFLOPS=4.247, K=[5 x 5], IN={1, 144, 128, 128}, OCN=144, S=[2 x 2], PM=SAME, OCV/CPU_FP16)\|-\|27.950\|-\| \|conv::Conv::(GFLOPS=4.566, K=[7 x 7], IN={1, 172, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)\|54.233\|49.603\|1.09\| \|conv::Conv::(GFLOPS=4.566, K=[7 x 7], IN={1, 172, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU_FP16)\|-\|26.515\|-\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 256, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|13.779\|12.968\|1.06\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 256, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|12.984\|-\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 512, 46, 46}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|15.809\|15.329\|1.03\| \|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 512, 46, 46}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|15.433\|-\| \|conv::Conv::(GFLOPS=4.994, K=[3 x 3], IN={1, 128, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|14.563\|14.527\|1.00\| \|conv::Conv::(GFLOPS=4.994, K=[3 x 3], IN={1, 128, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|14.480\|-\| \|conv::Conv::(GFLOPS=4.997, K=[3 x 3], IN={1, 64, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|16.714\|16.484\|1.01\| \|conv::Conv::(GFLOPS=4.997, K=[3 x 3], IN={1, 64, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|16.362\|-\| \|conv::Conv::(GFLOPS=5.780, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, S=[2 x 2], PM=SAME, OCV/CPU)\|77.832\|65.729\|1.18\| \|conv::Conv::(GFLOPS=5.780, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, 
S=[2 x 2], PM=SAME, OCV/CPU_FP16)\|-\|32.065\|-\| \|conv::Conv::(GFLOPS=6.116, K=[3 x 3], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)\|21.903\|20.386\|1.07\| \|conv::Conv::(GFLOPS=6.116, K=[3 x 3], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU_FP16)\|-\|20.416\|-\| \|conv::Conv::(GFLOPS=6.118, K=[3 x 3], IN={1, 144, 128, 128}, OCN=144, PM=SAME, OCV/CPU)\|20.405\|18.148\|1.12\| \|conv::Conv::(GFLOPS=6.118, K=[3 x 3], IN={1, 144, 128, 128}, OCN=144, PM=SAME, OCV/CPU_FP16)\|-\|18.128\|-\| \|conv::Conv::(GFLOPS=6.637, K=[3 x 3], IN={1, 256, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|20.334\|18.521\|1.10\| \|conv::Conv::(GFLOPS=6.637, K=[3 x 3], IN={1, 256, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|18.495\|-\| \|conv::Conv::(GFLOPS=6.638, K=[3 x 3], IN={1, 128, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|21.527\|19.584\|1.10\| \|conv::Conv::(GFLOPS=6.638, K=[3 x 3], IN={1, 128, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|19.630\|-\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 150, 200}, OCN=192, PM=SAME, BIAS, OCV/CPU)\|22.715\|20.057\|1.13\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 150, 200}, OCN=192, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|20.068\|-\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 300, 300}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)\|26.228\|24.992\|1.05\| \|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 300, 300}, OCN=64, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|24.957\|-\| \|conv::Conv::(GFLOPS=6.814, K=[3 x 3], IN={1, 512, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)\|21.524\|21.581\|1.00\| \|conv::Conv::(GFLOPS=6.814, K=[3 x 3], IN={1, 512, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|21.782\|-\| \|conv::Conv::(GFLOPS=8.025, K=[3 x 3], IN={1, 1024, 19, 19}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)\|34.094\|31.964\|1.07\| \|conv::Conv::(GFLOPS=8.025, K=[3 x 3], IN={1, 1024, 19, 19}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|31.925\|-\| \|conv::Conv::(GFLOPS=9.986, K=[3 x 3], IN={1, 512, 46, 46}, OCN=512, 
P=[1 x 1], BIAS, OCV/CPU)\|28.677\|27.813\|1.03\| \|conv::Conv::(GFLOPS=9.986, K=[3 x 3], IN={1, 512, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|27.808\|-\| \|conv::Conv::(GFLOPS=9.987, K=[3 x 3], IN={1, 256, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)\|31.274\|27.892\|1.12\| \|conv::Conv::(GFLOPS=9.987, K=[3 x 3], IN={1, 256, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|27.910\|-\| \|conv::Conv::(GFLOPS=9.989, K=[3 x 3], IN={1, 128, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)\|30.533\|30.007\|1.02\| \|conv::Conv::(GFLOPS=9.989, K=[3 x 3], IN={1, 128, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|30.089\|-\| \|conv::Conv::(GFLOPS=9.993, K=[3 x 3], IN={1, 64, 368, 368}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)\|39.837\|38.312\|1.04\| \|conv::Conv::(GFLOPS=9.993, K=[3 x 3], IN={1, 64, 368, 368}, OCN=64, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|38.477\|-\| \|conv::Conv::(GFLOPS=10.087, K=[3 x 3], IN={1, 576, 38, 50}, OCN=512, PM=SAME, BIAS, OCV/CPU)\|32.480\|29.237\|1.11\| \|conv::Conv::(GFLOPS=10.087, K=[3 x 3], IN={1, 576, 38, 50}, OCN=512, PM=SAME, BIAS, OCV/CPU_FP16)\|-\|29.452\|-\| \|conv::Conv::(GFLOPS=10.701, K=[3 x 3], IN={1, 512, 38, 38}, OCN=804, P=[1 x 1], BIAS, OCV/CPU)\|33.544\|32.832\|1.02\| \|conv::Conv::(GFLOPS=10.701, K=[3 x 3], IN={1, 512, 38, 38}, OCN=804, P=[1 x 1], BIAS, OCV/CPU_FP16)\|-\|32.784\|-\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 240, 64, 64}, OCN=240, PM=SAME, OCV/CPU)\|134.481\|130.678\|1.03\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 240, 64, 64}, OCN=240, PM=SAME, OCV/CPU_FP16)\|-\|70.134\|-\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)\|127.930\|126.530\|1.01\| \|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU_FP16)\|-\|65.261\|-\| \|conv::Conv::(GFLOPS=16.987, K=[5 x 5], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)\|201.346\|187.007\|1.08\| \|conv::Conv::(GFLOPS=16.987, K=[5 x 5], IN={1, 1152, 16, 16}, OCN=1152, 
PM=SAME, OCV/CPU_FP16)\|-\|91.525\|-\| \|conv::Conv::(GFLOPS=23.122, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, PM=SAME, OCV/CPU)\|252.038\|245.587\|1.03\| \|conv::Conv::(GFLOPS=23.122, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, PM=SAME, OCV/CPU_FP16)\|-\|125.477\|-\| ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-05-17 09:38:33 +03:00
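The timing rows in the benchmark table of the commit above appear to follow a `before | after | ratio` pattern, with the `OCV/CPU_FP16` rows reporting only the new timing. The column headers are not visible in this part of the log, so the following Python sketch assumes the third column is the first timing divided by the second. It recomputes that ratio and the FP32-to-FP16 gain for one configuration. The helper name `speedup` is illustrative and not part of OpenCV.

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """Return before/after rounded to two decimals (assumed table convention)."""
    return round(before_ms / after_ms, 2)


# Timings copied from the K=[3 x 3], IN={1, 512, 10, 10}, OCN=512, PM=SAME rows.
fp32_before, fp32_after, fp16_after = 6.541, 5.081, 2.809

fp32_ratio = speedup(fp32_before, fp32_after)  # patched FP32 vs. baseline FP32
fp16_gain = speedup(fp32_after, fp16_after)    # patched FP16 vs. patched FP32
print(fp32_ratio, fp16_gain)
```

Under this reading, the configurations with small spatial sizes and many channels gain the most from the FP16 path, while most of the other 3 x 3 rows show FP16 timings close to FP32.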
Alexander Smorkalov	59ca444b26	Merge pull request #23560 from WanliZhong:eltwise_cuda_bug DNN/CUDA: Solve the bug of same shape broadcast with CUDA	2023-05-16 14:16:37 +03:00
zihaomu	91b6c8507a	remove flag of convolution	2023-05-16 15:29:20 +08:00
Dmitry Kurtaev	a8d3d1f6f9	Merge pull request #23604 from dkurt:dnn_no_protobuf Build DNN without Protobuf DNN module can be built without Protobuf for Darknet, TFLite, OpenVINO, Torch (not PyTorch) models. ``` cmake \ -DCMAKE_BUILD_TYPE=Release \ -DBUILD_LIST=dnn \ -DWITH_PROTOBUF=OFF \ -DWITH_OPENCL=OFF 7.1M lib/libopencv_dnn.so.4.7.0 ``` ``` cmake \ -DCMAKE_BUILD_TYPE=Release \ -DBUILD_LIST=dnn \ -DWITH_OPENCL=OFF 3.9M lib/libopencv_dnn.so.4.7.0 ``` ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-05-15 12:23:18 +03:00
wanli	46991bcd62	Solve the bug of same shape broadcast with CUDA	2023-05-15 13:55:38 +08:00
Alexander Smorkalov	85b04f0b4d	Merge pull request #23557 from WanliZhong:eltwise_cpu_bug fix nary elementwise bug in cpu	2023-05-11 15:56:46 +03:00
Dmitry Kurtaev	676afdc494	Update FlatBuffers source code to 23.5.9	2023-05-10 14:39:36 +03:00
wanli	85cc4086c8	fix nary elementwise bug in cpu	2023-05-10 14:29:33 +08:00
Alexander Smorkalov	25c28c5da4	Merge pull request #23485 from zihaomu:add_onnx_where DNN: add ONNX where node support	2023-05-05 09:21:07 +03:00
zihaomu	0513741a85	add broadcast where node	2023-05-05 11:16:19 +08:00
Alexander Smorkalov	351589e5fb	Merge pull request #23491 from fengyuentau:patch_for_segment_anything Fixes for Segment Anything	2023-05-04 21:07:58 +03:00
Alexander Alekhin	3c76b33532	Merge pull request #22614 from zihaomu:add_std2DB_API	2023-05-01 19:37:23 +00:00
zihaomu	8be93a6de7	add scale factor to DB demo.	2023-04-30 22:03:21 +08:00
Abduragim Shtanchaev	3b1ee0549b	added test for lstm without hidden states initialization	2023-04-25 16:01:13 +03:00
Alexander Smorkalov	e3e1f704a4	Merge pull request #23528 from WanliZhong:issue23278 DNN/CUDA: make 'abcd op 1b11' broadcast eltwise operator support cuda	2023-04-24 19:31:55 +03:00
Dmitry Kurtaev	aa57833ad5	Merge pull request #23409 from dkurt:dnn_tflite_quant Import and inference INT8 quantized TFLite model #23409 ### Pull Request Readiness Checklist * Support quantized TFLite models * Enable fused activations (FP32, INT8) Merge with extra: https://github.com/opencv/opencv_extra/pull/1048 ![res](https://user-images.githubusercontent.com/25801568/231433201-566b4bd6-ccff-462c-9e74-adbdcdf3648b.png) on the image, green boxes are from TFLite and red boxes from OpenCV See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-04-24 13:44:10 +03:00
Abduragim Shtanchaev	e4e774d42b	Merge pull request #23475 from Abdurrahheem:lstm_fix_initialization Fix ONNX parser for single-layer LSTM hidden and cell states #23475 ### Fix ONNX parser for single-layer LSTM hidden and cell states ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake This PR addresses #21118 [issue](https://github.com/opencv/opencv/issues/21118). The problem is that the ONNX parser is unable to read the hidden state and cell state for single-layer LSTMs. This PR fixes the issue by updating the parser to correctly read hidden and cell states.	2023-04-24 13:39:41 +03:00
wanli	e4360294c5	make 'abcd op 1b11' broadcast support cuda	2023-04-23 17:46:50 +08:00
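The 'abcd op 1b11' pattern named in this commit presumably describes an element-wise operation between a 4-D tensor of shape (a, b, c, d) and an operand of shape (1, b, 1, 1), that is, one value per channel. A minimal sketch of the NumPy/ONNX-style broadcasting rule that makes those shapes compatible, in plain Python; `broadcast_shape` is a hypothetical helper, not an OpenCV function:

```python
from itertools import zip_longest


def broadcast_shape(lhs, rhs):
    """Result shape under right-aligned broadcasting, where a size of 1 stretches."""
    out = []
    for a, b in zip_longest(reversed(lhs), reversed(rhs), fillvalue=1):
        if a != b and 1 not in (a, b):
            raise ValueError(f"incompatible dimensions {a} and {b}")
        out.append(max(a, b))
    return tuple(reversed(out))


# (a, b, c, d) op (1, b, 1, 1): the second operand is stretched along a, c and d.
result = broadcast_shape((2, 3, 4, 5), (1, 3, 1, 1))
print(result)
```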
Alexander Alekhin	9ab0ff6cf2	Merge pull request #23511 from zihaomu:issue_23465	2023-04-22 04:01:26 +00:00
Zihao Mu	601778e0e6	Merge pull request #22750 from zihaomu:improve_blobFromImage DNN: Add New API blobFromImageParam #22750 The purpose of this PR: 1. Add a new API, `blobFromImageParam`, to extend the `blobFromImage` API. It supports different data layouts (NCHW or NHWC) and letter_box. 2. ~~`blobFromImage` can output `CV_16F`~~ ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-04-21 19:10:17 +03:00
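The letter_box option mentioned in this commit presumably refers to the common preprocessing step of resizing an image while preserving its aspect ratio and padding the remainder. The sketch below only computes the geometry of such a resize in plain Python. It is an illustration under that assumption, not OpenCV's implementation, and `letterbox_geometry` is a hypothetical name.

```python
def letterbox_geometry(src_w, src_h, dst_w, dst_h):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving resize."""
    # Use the smaller scale so the resized image fits inside the target on both axes.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    # Centre the resized image; any odd leftover pixel goes to the right/bottom.
    pad_left = (dst_w - new_w) // 2
    pad_top = (dst_h - new_h) // 2
    return new_w, new_h, pad_left, pad_top


# A 1280x720 frame placed into a 640x640 network input.
geometry = letterbox_geometry(1280, 720, 640, 640)
print(geometry)
```

Centring the image is one possible choice; some pipelines pad only on the right and bottom instead.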
zihaomu	54e1a8709d	fix the bug, disable the fast1x1 when padding is not 0.	2023-04-21 10:55:07 +08:00
Yuantao Feng	3c1fcd5deb	Merge pull request #23401 from fengyuentau:fix_cann_layer_support dnn: Support more operators in CANN backend #23401 This PR adds support for the following layers: - [x] Sub - [x] PRelu - [x] DeConv - [x] Also warn users when the backend is switched back to the default because some of the layers are not supported. - [ ] [Dropped] LSTM: some hacks (adding layers) were introduced, which make it even harder to build the graph for the CANN backend. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-04-20 10:18:35 +03:00
Abduragim Shtanchaev	b3a2444bcf	Merge pull request #23501 from Abdurrahheem:additional_lstm_tests Added LSTM and GRU tests for various batch and input length sizes #23501 Added tests with various sequence length and batch sizes Test data: https://github.com/opencv/opencv_extra/pull/1057 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-04-20 10:11:33 +03:00
Alexander Smorkalov	aa17f881b1	Merge pull request #23482 from zihaomu:onnx_opset13_split DNN: support the split node of onnx opset >= 13	2023-04-14 11:59:57 +03:00
fengyuentau	4f99e5ab37	allow null constant_value in Pad and ignore it when loading	2023-04-14 16:50:16 +08:00
fengyuentau	88cacd35c5	support broadcast on axis > 1 for Expand	2023-04-14 15:52:27 +08:00
Alexander Smorkalov	136121f6ee	Merge pull request #22660 from zhouzq-thu:4.x Fix objectness is not assigned in dnn::region_layer	2023-04-12 09:34:58 +03:00
Alexander Smorkalov	3f02c9d5b9	Merge pull request #23310 from hanliutong:fix_hal_compatibility Fix HAL compatibility layer	2023-04-11 12:43:54 +03:00
zihaomu	51281f8d69	support the split node of onnx opset >= 13	2023-04-11 16:18:50 +08:00
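For context on this commit: starting with ONNX opset 13, the Split operator takes its `split` sizes as an optional input rather than as an attribute, and when the sizes are absent the axis is divided equally. A minimal sketch of how the slice ranges along the split axis can be derived in either case; `split_ranges` is a hypothetical helper and does not mirror the OpenCV importer code.

```python
def split_ranges(dim, num_outputs, split=None):
    """Return (start, end) index pairs along the split axis."""
    if split is None:
        # No explicit sizes: divide the axis equally among the outputs.
        if dim % num_outputs:
            raise ValueError("axis length is not evenly divisible")
        split = [dim // num_outputs] * num_outputs
    if sum(split) != dim:
        raise ValueError("split sizes must sum to the axis length")
    ranges, start = [], 0
    for size in split:
        ranges.append((start, start + size))
        start += size
    return ranges


equal_parts = split_ranges(6, 3)            # sizes omitted
explicit_parts = split_ranges(6, 2, [1, 5])  # sizes given explicitly
print(equal_parts, explicit_parts)
```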
Yuantao Feng	3a83a35ab0	Merge pull request #23296 from fengyuentau:fix_identifying_constant Fix identifying initializers in ONNX graph simplification #23296 Fixes https://github.com/opencv/opencv/issues/23295 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-04-06 15:35:31 +03:00
Dmitry Kurtaev	5e1d33329b	Several fixes for ONNX importer: Expand, Gather	2023-03-27 22:15:26 +03:00
HAN Liutong	a809ae4e88	Fix HAL compatibility layer and modify use cases.	2023-03-27 21:30:47 +08:00
Dmitry Kurtaev	5df6b4a756	Merge pull request #23325 from dkurt:dnn_input_info Propagate inputs info for ONNX and TFLite models ### Pull Request Readiness Checklist Needed for generic applications such as benchmarking pipelines, so that OpenCV can report the default input shapes specified in the models. See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-03-21 14:50:53 +03:00
Alexander Smorkalov	924a65413a	Merge pull request #23357 from zihaomu:fix_winograd_error_32bit DNN : fix bug in 32 bit cpu	2023-03-15 11:24:54 +03:00
zihaomu	6bac5453d1	fix bug in 32 bit cpu	2023-03-15 08:24:55 +08:00
Alexander Smorkalov	ccbc784195	Merge pull request #23354 from zihaomu:issue_23351 DNN : fix bug in layer fusion	2023-03-14 17:23:25 +03:00
zihaomu	386be97ce2	fix bug in layer fusion	2023-03-14 19:06:06 +08:00
tingbo.liao	7d032de7e8	Fix bugs of test case failure 4 failed tests in open_test_dnn listed below: * Test_Caffe_layers.Conv_Elu/0, where GetParam() = OCV/CPU * Test_ONNX_layers.ConvResizePool1d/0, where GetParam() = OCV/CPU * Test_TensorFlow_layers.tf_reshape_nhwc/0, where GetParam() = OCV/CPU * Test_Torch_layers.net_inception_block/0, where GetParam() = OCV/CPU In winofunc_AtXA_8x8_f32 and winofunc_BtXB_8x8_f32 implementation, incorrect input parameters cause tests failure. Add four new different variables for the last four input parameters of v_transpose4x4 to fix bugs, and update related comments. Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>	2023-03-14 17:05:19 +08:00
Alexander Smorkalov	22a52766dc	Merge pull request #23343 from zihaomu:fix_test_onnx_conf DNN Test ONNX: Fix the logic of the test case	2023-03-13 21:48:41 +03:00
Yuantao Feng	b94e13c8ae	Merge pull request #23319 from fengyuentau:fix_zoo_issue_136 Related issue: https://github.com/opencv/opencv_zoo/issues/136 Features added: - Support operators with multiple output: ONNX Split. - Support Slice without steps. Bugs fixed: - Wrong settings in ClipByValue (Relu6). - Wrong calculation of pads in convolution layer (It is wrong generally but only fixed specifically for CANN for now). ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-03-13 21:46:33 +03:00
zihaomu	ee3740af00	move global skip out of if loop, and add opencv_deny_list	2023-03-13 22:16:51 +08:00
Zihao Mu	e03e2e7f94	Merge pull request #23192 from zihaomu:clean_up_SIMD_code

### Purpose of this PR:
- Move all dispatch and SIMD code of `convolution layer` into `simd.hpp` file.
- Support Winograd at AVX-only machine.
- Re-name the folder from `fast_conv` to `cpu_kernels`. In the future, we can put other layers of CPU optimization into it, like `GEMM` or `MatMul`.

## Performance Test
Since this patch just focuses on the code style, the performance is expected as the same as before.
Test with the following script: `./bin/opencv_perf_dnn '--gtest_filter=conv' --gtest_output="xml:../1-0th.xml" --perf_threads=1`

### Test on X86 platform
Min (ms)

|Name of Test|4.x|patch|4.x vs patch (x-factor)|
|---|:-:|:-:|:-:|
|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 19}, OCN=2, G=2, S=2, P=(1, 1), BIAS, OCV/CPU)|0.001|0.001|0.98|
|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 25}, OCN=2, G=2, P=(2, 2), PM=SAME, OCV/CPU)|0.001|0.001|0.95|
|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 6, 10}, OCN=6, PM=VALID, BIAS, OCV/CPU)|0.001|0.001|0.97|
|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 4, 9, 10, 10}, OCN=4, S=[1 x 1 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)|0.002|0.002|1.04|
|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 8, 1, 10, 10}, OCN=8, G=8, P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)|0.002|0.002|0.94|
|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 3 x 3], IN={1, 2, 19, 19, 19}, OCN=2, G=2, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)|0.040|0.044|0.93|
|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 4 x 2], IN={1, 4, 8, 10, 10}, OCN=4, G=4, S=[1 x 2 x 1], BIAS, OCV/CPU)|0.010|0.010|1.00|
|conv3d::Conv3D::(GFLOPS=0.001, K=[3 x 3 x 3], IN={1, 2, 25, 19, 19}, OCN=2, G=2, S=[1 x 2 x 2], P=(2, 2) x (2, 2) x (2, 2), PM=SAME, OCV/CPU)|0.106|0.103|1.03|
|conv3d::Conv3D::(GFLOPS=0.002, K=[3 x 1 x 4], IN={1, 14, 5, 10, 10}, OCN=14, PM=SAME, OCV/CPU)|0.041|0.040|1.03|
|conv3d::Conv3D::(GFLOPS=0.006, K=[5 x 5 x 5], IN={1, 4, 50, 19, 19}, OCN=4, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)|0.340|0.329|1.03|
|conv3d::Conv3D::(GFLOPS=0.027, K=[3 x 3 x 3], IN={1, 6, 10, 38, 50}, OCN=6, PM=VALID, BIAS, OCV/CPU)|0.590|0.567|1.04|
|conv3d::Conv3D::(GFLOPS=0.030, K=[5 x 5 x 5], IN={1, 6, 19, 19, 19}, OCN=6, G=2, OCV/CPU)|1.374|1.314|1.05|
|conv3d::Conv3D::(GFLOPS=0.045, K=[7 x 7 x 7], IN={1, 2, 38, 38, 38}, OCN=2, S=[1 x 2 x 1], OCV/CPU)|3.715|3.528|1.05|
|conv3d::Conv3D::(GFLOPS=0.053, K=[3 x 3 x 3], IN={1, 10, 98, 10, 10}, OCN=10, PM=SAME, OCV/CPU)|1.181|1.166|1.01|
|conv3d::Conv3D::(GFLOPS=0.071, K=[7 x 7 x 7], IN={1, 6, 15, 19, 19}, OCN=6, S=[2 x 1 x 1], P=(3, 3) x (3, 3) x (3, 3), PM=SAME, BIAS, OCV/CPU)|2.689|2.587|1.04|
|conv3d::Conv3D::(GFLOPS=0.093, K=[5 x 5 x 5], IN={1, 4, 40, 75, 75}, OCN=4, S=[2 x 2 x 2], OCV/CPU)|4.754|4.500|1.06|
|conv3d::Conv3D::(GFLOPS=0.116, K=[5 x 5 x 5], IN={1, 2, 21, 75, 100}, OCN=2, BIAS, OCV/CPU)|9.612|9.112|1.05|
|conv3d::Conv3D::(GFLOPS=1.267, K=[5 x 5 x 5], IN={1, 3, 75, 75, 100}, OCN=3, PM=SAME, BIAS, OCV/CPU)|69.000|64.676|1.07|
|conv3d::Conv3D::(GFLOPS=1.343, K=[3 x 3 x 3], IN={1, 11, 9, 150, 200}, OCN=11, PM=VALID, BIAS, OCV/CPU)|20.248|18.451|1.10|
|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 512, 26, 26}, OCN=256, OCV/CPU)|1.395|1.392|1.00|
|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 1024, 13, 13}, OCN=512, OCV/CPU)|1.990|1.984|1.00|
|conv::Conv::(GFLOPS=0.178, K=[1 x 1], IN={1, 256, 52, 52}, OCN=128, OCV/CPU)|1.393|1.360|1.02|
|conv::Conv::(GFLOPS=0.210, K=[1 x 1], IN={1, 576, 38, 50}, OCN=96, PM=SAME, BIAS, OCV/CPU)|1.813|1.744|1.04|
|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 128, 56, 56}, OCN=32, P=[1 x 1], OCV/CPU)|1.190|1.191|1.00|
|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 256, 14, 14}, OCN=256, P=[1 x 1], OCV/CPU)|1.286|1.284|1.00|
|conv::Conv::(GFLOPS=0.280, K=[1 x 1], IN={1, 576, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)|2.295|2.279|1.01|
|conv::Conv::(GFLOPS=0.302, K=[3 x 3], IN={1, 64, 64, 64}, OCN=64, PM=SAME, OCV/CPU)|1.322|1.331|0.99|
|conv::Conv::(GFLOPS=0.357, K=[1 x 1], IN={1, 64, 208, 208}, OCN=64, OCV/CPU)|3.784|3.533|1.07|
|conv::Conv::(GFLOPS=0.420, K=[3 x 3], IN={1, 96, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)|1.838|1.844|1.00|
|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 128, 40, 40}, OCN=128, PM=SAME, OCV/CPU)|1.957|1.959|1.00|
|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 256, 20, 20}, OCN=256, PM=SAME, OCV/CPU)|2.596|2.573|1.01|
|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 512, 10, 10}, OCN=512, PM=SAME, OCV/CPU)|4.183|4.083|1.02|
|conv::Conv::(GFLOPS=0.561, K=[3 x 3], IN={1, 128, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)|2.413|2.406|1.00|
|conv::Conv::(GFLOPS=0.624, K=[3 x 3], IN={1, 128, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|2.538|2.546|1.00|
|conv::Conv::(GFLOPS=0.701, K=[3 x 3], IN={1, 128, 38, 50}, OCN=160, PM=SAME, BIAS, OCV/CPU)|2.972|2.980|1.00|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 64, 104, 104}, OCN=64, P=[1 x 1], OCV/CPU)|3.452|3.464|1.00|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 128, 52, 52}, OCN=128, P=[1 x 1], OCV/CPU)|3.082|3.105|0.99|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 256, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)|4.043|3.919|1.03|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 512, 13, 13}, OCN=512, P=[1 x 1], OCV/CPU)|5.538|5.531|1.00|
|conv::Conv::(GFLOPS=0.830, K=[3 x 3], IN={1, 64, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)|3.393|3.418|0.99|
|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 192, 38, 38}, OCN=192, PM=SAME, OCV/CPU)|4.325|4.234|1.02|
|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 384, 19, 19}, OCN=384, PM=SAME, OCV/CPU)|6.009|5.908|1.02|
|conv::Conv::(GFLOPS=1.022, K=[3 x 3], IN={1, 576, 19, 19}, OCN=273, PM=SAME, BIAS, OCV/CPU)|6.557|6.376|1.03|
|conv::Conv::(GFLOPS=1.112, K=[3 x 3], IN={1, 512, 10, 10}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)|10.114|9.472|1.07|
|conv::Conv::(GFLOPS=1.181, K=[3 x 3], IN={1, 64, 160, 200}, OCN=128, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)|10.373|9.879|1.05|
|conv::Conv::(GFLOPS=1.182, K=[3 x 3], IN={1, 32, 320, 400}, OCN=64, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)|12.782|11.624|1.10|
|conv::Conv::(GFLOPS=1.195, K=[9 x 9], IN={1, 32, 240, 320}, OCN=3, P=[4 x 4], BIAS, OCV/CPU)|90.931|90.552|1.00|
|conv::Conv::(GFLOPS=1.196, K=[3 x 3], IN={1, 384, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)|6.091|5.818|1.05|
|conv::Conv::(GFLOPS=1.210, K=[3 x 3], IN={1, 32, 256, 256}, OCN=32, PM=SAME, OCV/CPU)|7.083|6.643|1.07|
|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 64, 75, 75}, OCN=192, PM=SAME, BIAS, OCV/CPU)|5.054|5.059|1.00|
|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 96, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)|5.005|4.931|1.02|
|conv::Conv::(GFLOPS=1.248, K=[3 x 3], IN={1, 256, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|4.951|5.065|0.98|
|conv::Conv::(GFLOPS=1.258, K=[3 x 3], IN={1, 1280, 10, 10}, OCN=546, PM=SAME, BIAS, OCV/CPU)|11.957|11.293|1.06|
|conv::Conv::(GFLOPS=1.261, K=[3 x 3], IN={1, 192, 38, 50}, OCN=192, PM=SAME, BIAS, OCV/CPU)|5.328|5.250|1.01|
|conv::Conv::(GFLOPS=1.416, K=[3 x 3], IN={1, 128, 62, 82}, OCN=128, BIAS, OCV/CPU)|5.544|5.292|1.05|
|conv::Conv::(GFLOPS=1.500, K=[3 x 3], IN={1, 128, 64, 84}, OCN=128, BIAS, OCV/CPU)|6.186|5.893|1.05|
|conv::Conv::(GFLOPS=1.586, K=[3 x 3], IN={1, 128, 66, 86}, OCN=128, BIAS, OCV/CPU)|6.153|5.834|1.05|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 26, 26}, OCN=512, P=[1 x 1], OCV/CPU)|8.154|8.107|1.01|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 52, 52}, OCN=512, S=[2 x 2], P=[1 x 1], OCV/CPU)|12.699|12.256|1.04|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 13, 13}, OCN=1024, P=[1 x 1], OCV/CPU)|11.355|11.217|1.01|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 26, 26}, OCN=1024, S=[2 x 2], P=[1 x 1], OCV/CPU)|19.062|17.814|1.07|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 104, 104}, OCN=128, P=[1 x 1], OCV/CPU)|6.820|6.531|1.04|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 208, 208}, OCN=128, S=[2 x 2], P=[1 x 1], OCV/CPU)|14.502|13.483|1.08|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 52, 52}, OCN=256, P=[1 x 1], OCV/CPU)|6.270|6.123|1.02|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 104, 104}, OCN=256, S=[2 x 2], P=[1 x 1], OCV/CPU)|13.173|12.451|1.06|
|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 208, 208}, OCN=64, P=[1 x 1], OCV/CPU)|8.326|7.652|1.09|
|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 416, 416}, OCN=64, S=[2 x 2], P=[1 x 1], OCV/CPU)|17.605|16.465|1.07|
|conv::Conv::(GFLOPS=1.659, K=[3 x 3], IN={1, 960, 10, 10}, OCN=960, PM=SAME, OCV/CPU)|15.675|14.771|1.06|
|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, G=128, P=[1 x 1], BIAS, OCV/CPU)|0.420|0.423|0.99|
|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, PM=SAME, OCV/CPU)|6.788|6.491|1.05|
|conv::Conv::(GFLOPS=1.675, K=[3 x 3], IN={1, 128, 68, 88}, OCN=128, BIAS, OCV/CPU)|6.456|6.168|1.05|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, G=256, P=[1 x 1], BIAS, OCV/CPU)|0.263|0.261|1.01|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, PM=SAME, OCV/CPU)|7.690|7.398|1.04|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, G=512, P=[1 x 1], BIAS, OCV/CPU)|0.200|0.202|0.99|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|10.542|10.464|1.01|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, PM=SAME, OCV/CPU)|10.876|10.728|1.01|
|conv::Conv::(GFLOPS=1.766, K=[3 x 3], IN={1, 128, 70, 90}, OCN=128, BIAS, OCV/CPU)|7.194|6.768|1.06|
|conv::Conv::(GFLOPS=1.859, K=[3 x 3], IN={1, 128, 72, 92}, OCN=128, BIAS, OCV/CPU)|7.099|6.731|1.05|
|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, G=1024, P=[1 x 1], BIAS, OCV/CPU)|0.147|0.162|0.91|
|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, PM=SAME, OCV/CPU)|18.558|17.141|1.08|
|conv::Conv::(GFLOPS=1.954, K=[3 x 3], IN={1, 128, 74, 94}, OCN=128, BIAS, OCV/CPU)|7.641|7.219|1.06|
|conv::Conv::(GFLOPS=1.995, K=[9 x 9], IN={1, 3, 320, 400}, OCN=32, P=[4 x 4], BIAS, OCV/CPU)|22.666|20.999|1.08|
|conv::Conv::(GFLOPS=2.052, K=[3 x 3], IN={1, 128, 76, 96}, OCN=128, BIAS, OCV/CPU)|8.523|7.921|1.08|
|conv::Conv::(GFLOPS=2.100, K=[3 x 3], IN={1, 144, 75, 75}, OCN=144, PM=SAME, OCV/CPU)|8.514|8.109|1.05|
|conv::Conv::(GFLOPS=2.153, K=[3 x 3], IN={1, 128, 78, 98}, OCN=128, BIAS, OCV/CPU)|8.300|7.878|1.05|
|conv::Conv::(GFLOPS=2.156, K=[3 x 3], IN={1, 576, 19, 19}, OCN=576, PM=SAME, OCV/CPU)|13.403|13.131|1.02|
|conv::Conv::(GFLOPS=2.255, K=[3 x 3], IN={1, 128, 80, 100}, OCN=128, BIAS, OCV/CPU)|8.920|8.357|1.07|
|conv::Conv::(GFLOPS=2.719, K=[3 x 3], IN={1, 96, 256, 256}, OCN=96, S=[2 x 2], PM=SAME, OCV/CPU)|28.827|27.616|1.04|
|conv::Conv::(GFLOPS=3.319, K=[3 x 3], IN={1, 128, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|12.895|12.670|1.02|
|conv::Conv::(GFLOPS=3.321, K=[3 x 3], IN={1, 64, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|14.120|13.078|1.08|
|conv::Conv::(GFLOPS=3.398, K=[7 x 7], IN={1, 128, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)|27.541|27.582|1.00|
|conv::Conv::(GFLOPS=3.407, K=[3 x 3], IN={1, 512, 19, 19}, OCN=1024, D=[6 x 6], P=[6 x 6], BIAS, OCV/CPU)|32.367|31.140|1.04|
|conv::Conv::(GFLOPS=3.408, K=[3 x 3], IN={1, 256, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|14.934|14.910|1.00|
|conv::Conv::(GFLOPS=4.247, K=[3 x 3], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)|18.289|18.491|0.99|
|conv::Conv::(GFLOPS=4.247, K=[5 x 5], IN={1, 144, 128, 128}, OCN=144, S=[2 x 2], PM=SAME, OCV/CPU)|37.857|36.845|1.03|
|conv::Conv::(GFLOPS=4.566, K=[7 x 7], IN={1, 172, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)|37.402|36.566|1.02|
|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 256, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|19.031|19.164|0.99|
|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 512, 46, 46}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|19.019|19.135|0.99|
|conv::Conv::(GFLOPS=4.994, K=[3 x 3], IN={1, 128, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|20.077|19.400|1.03|
|conv::Conv::(GFLOPS=4.997, K=[3 x 3], IN={1, 64, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|21.883|21.302|1.03|
|conv::Conv::(GFLOPS=5.780, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, S=[2 x 2], PM=SAME, OCV/CPU)|51.288|49.851|1.03|
|conv::Conv::(GFLOPS=6.116, K=[3 x 3], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)|27.349|28.359|0.96|
|conv::Conv::(GFLOPS=6.118, K=[3 x 3], IN={1, 144, 128, 128}, OCN=144, PM=SAME, OCV/CPU)|24.915|25.130|0.99|
|conv::Conv::(GFLOPS=6.637, K=[3 x 3], IN={1, 256, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|25.488|25.899|0.98|
|conv::Conv::(GFLOPS=6.638, K=[3 x 3], IN={1, 128, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|27.346|27.390|1.00|
|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 150, 200}, OCN=192, PM=SAME, BIAS, OCV/CPU)|28.033|28.301|0.99|
|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 300, 300}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)|50.216|49.970|1.00|
|conv::Conv::(GFLOPS=6.814, K=[3 x 3], IN={1, 512, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|29.670|29.513|1.01|
|conv::Conv::(GFLOPS=8.025, K=[3 x 3], IN={1, 1024, 19, 19}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)|50.565|49.634|1.02|
|conv::Conv::(GFLOPS=9.986, K=[3 x 3], IN={1, 512, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|37.900|37.814|1.00|
|conv::Conv::(GFLOPS=9.987, K=[3 x 3], IN={1, 256, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|41.367|39.742|1.04|
|conv::Conv::(GFLOPS=9.989, K=[3 x 3], IN={1, 128, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|49.128|50.350|0.98|
|conv::Conv::(GFLOPS=9.993, K=[3 x 3], IN={1, 64, 368, 368}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)|79.643|80.645|0.99|
|conv::Conv::(GFLOPS=10.087, K=[3 x 3], IN={1, 576, 38, 50}, OCN=512, PM=SAME, BIAS, OCV/CPU)|41.439|40.895|1.01|
|conv::Conv::(GFLOPS=10.701, K=[3 x 3], IN={1, 512, 38, 38}, OCN=804, P=[1 x 1], BIAS, OCV/CPU)|46.504|46.220|1.01|
|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 240, 64, 64}, OCN=240, PM=SAME, OCV/CPU)|98.086|96.842|1.01|
|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)|102.447|97.299|1.05|
|conv::Conv::(GFLOPS=16.987, K=[5 x 5], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)|145.047|144.996|1.00|
|conv::Conv::(GFLOPS=23.122, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, PM=SAME, OCV/CPU)|206.104|195.543|1.05|

### Test on M1(ARM) platform

|Name of Test|4.x|patch|4.x vs patch (x-factor)|
|---|:-:|:-:|:-:|
|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 19}, OCN=2, G=2, S=2, P=(1, 1), BIAS, OCV/CPU)|0.001|0.001|0.97|
|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 2, 25}, OCN=2, G=2, P=(2, 2), PM=SAME, OCV/CPU)|0.001|0.001|0.94|
|conv1d::Conv1D::(GFLOPS=0.000, K=[3], IN={1, 6, 10}, OCN=6, PM=VALID, BIAS, OCV/CPU)|0.002|0.002|0.92|
|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 4, 9, 10, 10}, OCN=4, S=[1 x 1 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)|0.003|0.003|1.00|
|conv3d::Conv3D::(GFLOPS=0.000, K=[1 x 1 x 1], IN={1, 8, 1, 10, 10}, OCN=8, G=8, P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)|0.003|0.003|1.00|
|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 3 x 3], IN={1, 2, 19, 19, 19}, OCN=2, G=2, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), BIAS, OCV/CPU)|0.031|0.031|1.00|
|conv3d::Conv3D::(GFLOPS=0.000, K=[3 x 4 x 2], IN={1, 4, 8, 10, 10}, OCN=4, G=4, S=[1 x 2 x 1], BIAS, OCV/CPU)|0.009|0.009|1.00|
|conv3d::Conv3D::(GFLOPS=0.001, K=[3 x 3 x 3], IN={1, 2, 25, 19, 19}, OCN=2, G=2, S=[1 x 2 x 2], P=(2, 2) x (2, 2) x (2, 2), PM=SAME, OCV/CPU)|0.066|0.066|1.01|
|conv3d::Conv3D::(GFLOPS=0.002, K=[3 x 1 x 4], IN={1, 14, 5, 10, 10}, OCN=14, PM=SAME, OCV/CPU)|0.102|0.102|1.00|
|conv3d::Conv3D::(GFLOPS=0.006, K=[5 x 5 x 5], IN={1, 4, 50, 19, 19}, OCN=4, S=[2 x 2 x 2], P=(1, 1) x (1, 1) x (1, 1), PM=VALID, OCV/CPU)|0.328|0.328|1.00|
|conv3d::Conv3D::(GFLOPS=0.027, K=[3 x 3 x 3], IN={1, 6, 10, 38, 50}, OCN=6, PM=VALID, BIAS, OCV/CPU)|0.693|0.747|0.93|
|conv3d::Conv3D::(GFLOPS=0.030, K=[5 x 5 x 5], IN={1, 6, 19, 19, 19}, OCN=6, G=2, OCV/CPU)|1.268|1.266|1.00|
|conv3d::Conv3D::(GFLOPS=0.045, K=[7 x 7 x 7], IN={1, 2, 38, 38, 38}, OCN=2, S=[1 x 2 x 1], OCV/CPU)|3.530|3.581|0.99|
|conv3d::Conv3D::(GFLOPS=0.053, K=[3 x 3 x 3], IN={1, 10, 98, 10, 10}, OCN=10, PM=SAME, OCV/CPU)|1.186|1.188|1.00|
|conv3d::Conv3D::(GFLOPS=0.071, K=[7 x 7 x 7], IN={1, 6, 15, 19, 19}, OCN=6, S=[2 x 1 x 1], P=(3, 3) x (3, 3) x (3, 3), PM=SAME, BIAS, OCV/CPU)|2.682|2.683|1.00|
|conv3d::Conv3D::(GFLOPS=0.093, K=[5 x 5 x 5], IN={1, 4, 40, 75, 75}, OCN=4, S=[2 x 2 x 2], OCV/CPU)|4.490|4.501|1.00|
|conv3d::Conv3D::(GFLOPS=0.116, K=[5 x 5 x 5], IN={1, 2, 21, 75, 100}, OCN=2, BIAS, OCV/CPU)|8.914|8.938|1.00|
|conv3d::Conv3D::(GFLOPS=1.267, K=[5 x 5 x 5], IN={1, 3, 75, 75, 100}, OCN=3, PM=SAME, BIAS, OCV/CPU)|69.819|69.876|1.00|
|conv3d::Conv3D::(GFLOPS=1.343, K=[3 x 3 x 3], IN={1, 11, 9, 150, 200}, OCN=11, PM=VALID, BIAS, OCV/CPU)|24.058|22.420|1.07|
|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 512, 26, 26}, OCN=256, OCV/CPU)|2.240|2.236|1.00|
|conv::Conv::(GFLOPS=0.177, K=[1 x 1], IN={1, 1024, 13, 13}, OCN=512, OCV/CPU)|3.132|3.136|1.00|
|conv::Conv::(GFLOPS=0.178, K=[1 x 1], IN={1, 256, 52, 52}, OCN=128, OCV/CPU)|1.920|1.919|1.00|
|conv::Conv::(GFLOPS=0.210, K=[1 x 1], IN={1, 576, 38, 50}, OCN=96, PM=SAME, BIAS, OCV/CPU)|2.343|2.346|1.00|
|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 128, 56, 56}, OCN=32, P=[1 x 1], OCV/CPU)|1.234|1.116|1.11|
|conv::Conv::(GFLOPS=0.231, K=[3 x 3], IN={1, 256, 14, 14}, OCN=256, P=[1 x 1], OCV/CPU)|1.109|1.121|0.99|
|conv::Conv::(GFLOPS=0.280, K=[1 x 1], IN={1, 576, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)|3.197|3.084|1.04|
|conv::Conv::(GFLOPS=0.302, K=[3 x 3], IN={1, 64, 64, 64}, OCN=64, PM=SAME, OCV/CPU)|1.123|1.148|0.98|
|conv::Conv::(GFLOPS=0.357, K=[1 x 1], IN={1, 64, 208, 208}, OCN=64, OCV/CPU)|4.836|5.061|0.96|
|conv::Conv::(GFLOPS=0.420, K=[3 x 3], IN={1, 96, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)|1.535|1.463|1.05|
|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 128, 40, 40}, OCN=128, PM=SAME, OCV/CPU)|1.756|1.584|1.11|
|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 256, 20, 20}, OCN=256, PM=SAME, OCV/CPU)|1.821|1.820|1.00|
|conv::Conv::(GFLOPS=0.472, K=[3 x 3], IN={1, 512, 10, 10}, OCN=512, PM=SAME, OCV/CPU)|7.049|6.672|1.06|
|conv::Conv::(GFLOPS=0.561, K=[3 x 3], IN={1, 128, 38, 50}, OCN=128, PM=SAME, BIAS, OCV/CPU)|1.967|1.922|1.02|
|conv::Conv::(GFLOPS=0.624, K=[3 x 3], IN={1, 128, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|1.943|1.977|0.98|
|conv::Conv::(GFLOPS=0.701, K=[3 x 3], IN={1, 128, 38, 50}, OCN=160, PM=SAME, BIAS, OCV/CPU)|2.464|2.310|1.07|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 64, 104, 104}, OCN=64, P=[1 x 1], OCV/CPU)|2.860|2.904|0.98|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 128, 52, 52}, OCN=128, P=[1 x 1], OCV/CPU)|2.428|2.483|0.98|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 256, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)|2.955|2.983|0.99|
|conv::Conv::(GFLOPS=0.798, K=[3 x 3], IN={1, 512, 13, 13}, OCN=512, P=[1 x 1], OCV/CPU)|4.328|4.484|0.97|
|conv::Conv::(GFLOPS=0.830, K=[3 x 3], IN={1, 64, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)|2.712|2.778|0.98|
|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 192, 38, 38}, OCN=192, PM=SAME, OCV/CPU)|3.205|3.331|0.96|
|conv::Conv::(GFLOPS=0.958, K=[3 x 3], IN={1, 384, 19, 19}, OCN=384, PM=SAME, OCV/CPU)|4.193|4.412|0.95|
|conv::Conv::(GFLOPS=1.022, K=[3 x 3], IN={1, 576, 19, 19}, OCN=273, PM=SAME, BIAS, OCV/CPU)|5.026|4.565|1.10|
|conv::Conv::(GFLOPS=1.112, K=[3 x 3], IN={1, 512, 10, 10}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)|14.490|14.213|1.02|
|conv::Conv::(GFLOPS=1.181, K=[3 x 3], IN={1, 64, 160, 200}, OCN=128, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)|14.886|14.003|1.06|
|conv::Conv::(GFLOPS=1.182, K=[3 x 3], IN={1, 32, 320, 400}, OCN=64, S=[2 x 2], P=[1 x 1], BIAS, OCV/CPU)|15.923|15.184|1.05|
|conv::Conv::(GFLOPS=1.195, K=[9 x 9], IN={1, 32, 240, 320}, OCN=3, P=[4 x 4], BIAS, OCV/CPU)|45.136|41.696|1.08|
|conv::Conv::(GFLOPS=1.196, K=[3 x 3], IN={1, 384, 26, 26}, OCN=256, P=[1 x 1], OCV/CPU)|4.995|4.631|1.08|
|conv::Conv::(GFLOPS=1.210, K=[3 x 3], IN={1, 32, 256, 256}, OCN=32, PM=SAME, OCV/CPU)|6.402|6.261|1.02|
|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 64, 75, 75}, OCN=192, PM=SAME, BIAS, OCV/CPU)|4.478|3.965|1.13|
|conv::Conv::(GFLOPS=1.245, K=[3 x 3], IN={1, 96, 75, 100}, OCN=96, PM=SAME, BIAS, OCV/CPU)|3.908|3.978|0.98|
|conv::Conv::(GFLOPS=1.248, K=[3 x 3], IN={1, 256, 46, 46}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|4.176|4.206|0.99|
|conv::Conv::(GFLOPS=1.258, K=[3 x 3], IN={1, 1280, 10, 10}, OCN=546, PM=SAME, BIAS, OCV/CPU)|21.509|21.136|1.02|
|conv::Conv::(GFLOPS=1.261, K=[3 x 3], IN={1, 192, 38, 50}, OCN=192, PM=SAME, BIAS, OCV/CPU)|4.426|4.082|1.08|
|conv::Conv::(GFLOPS=1.416, K=[3 x 3], IN={1, 128, 62, 82}, OCN=128, BIAS, OCV/CPU)|4.098|4.289|0.96|
|conv::Conv::(GFLOPS=1.500, K=[3 x 3], IN={1, 128, 64, 84}, OCN=128, BIAS, OCV/CPU)|4.646|5.105|0.91|
|conv::Conv::(GFLOPS=1.586, K=[3 x 3], IN={1, 128, 66, 86}, OCN=128, BIAS, OCV/CPU)|4.746|4.724|1.00|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 26, 26}, OCN=512, P=[1 x 1], OCV/CPU)|5.614|5.779|0.97|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 256, 52, 52}, OCN=512, S=[2 x 2], P=[1 x 1], OCV/CPU)|21.909|20.718|1.06|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 13, 13}, OCN=1024, P=[1 x 1], OCV/CPU)|8.256|8.290|1.00|
|conv::Conv::(GFLOPS=1.595, K=[3 x 3], IN={1, 512, 26, 26}, OCN=1024, S=[2 x 2], P=[1 x 1], OCV/CPU)|25.196|23.267|1.08|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 104, 104}, OCN=128, P=[1 x 1], OCV/CPU)|5.721|5.172|1.11|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 64, 208, 208}, OCN=128, S=[2 x 2], P=[1 x 1], OCV/CPU)|20.066|18.322|1.10|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 52, 52}, OCN=256, P=[1 x 1], OCV/CPU)|4.448|4.542|0.98|
|conv::Conv::(GFLOPS=1.596, K=[3 x 3], IN={1, 128, 104, 104}, OCN=256, S=[2 x 2], P=[1 x 1], OCV/CPU)|19.193|19.013|1.01|
|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 208, 208}, OCN=64, P=[1 x 1], OCV/CPU)|6.009|5.964|1.01|
|conv::Conv::(GFLOPS=1.598, K=[3 x 3], IN={1, 32, 416, 416}, OCN=64, S=[2 x 2], P=[1 x 1], OCV/CPU)|20.169|20.009|1.01|
|conv::Conv::(GFLOPS=1.659, K=[3 x 3], IN={1, 960, 10, 10}, OCN=960, PM=SAME, OCV/CPU)|22.584|23.423|0.96|
|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, G=128, P=[1 x 1], BIAS, OCV/CPU)|0.372|0.504|0.74|
|conv::Conv::(GFLOPS=1.660, K=[3 x 3], IN={1, 128, 75, 75}, OCN=128, PM=SAME, OCV/CPU)|5.426|5.456|0.99|
|conv::Conv::(GFLOPS=1.675, K=[3 x 3], IN={1, 128, 68, 88}, OCN=128, BIAS, OCV/CPU)|4.945|5.221|0.95|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, G=256, P=[1 x 1], BIAS, OCV/CPU)|0.210|0.261|0.81|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 256, 38, 38}, OCN=256, PM=SAME, OCV/CPU)|5.720|5.997|0.95|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, G=512, P=[1 x 1], BIAS, OCV/CPU)|0.149|0.161|0.93|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|7.154|7.225|0.99|
|conv::Conv::(GFLOPS=1.704, K=[3 x 3], IN={1, 512, 19, 19}, OCN=512, PM=SAME, OCV/CPU)|7.184|7.223|0.99|
|conv::Conv::(GFLOPS=1.766, K=[3 x 3], IN={1, 128, 70, 90}, OCN=128, BIAS, OCV/CPU)|5.324|5.343|1.00|
|conv::Conv::(GFLOPS=1.859, K=[3 x 3], IN={1, 128, 72, 92}, OCN=128, BIAS, OCV/CPU)|5.114|5.238|0.98|
|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, G=1024, P=[1 x 1], BIAS, OCV/CPU)|0.111|0.121|0.92|
|conv::Conv::(GFLOPS=1.888, K=[3 x 3], IN={1, 1024, 10, 10}, OCN=1024, PM=SAME, OCV/CPU)|25.907|26.804|0.97|
|conv::Conv::(GFLOPS=1.954, K=[3 x 3], IN={1, 128, 74, 94}, OCN=128, BIAS, OCV/CPU)|5.695|5.654|1.01|
|conv::Conv::(GFLOPS=1.995, K=[9 x 9], IN={1, 3, 320, 400}, OCN=32, P=[4 x 4], BIAS, OCV/CPU)|27.435|27.566|1.00|
|conv::Conv::(GFLOPS=2.052, K=[3 x 3], IN={1, 128, 76, 96}, OCN=128, BIAS, OCV/CPU)|6.944|6.164|1.13|
|conv::Conv::(GFLOPS=2.100, K=[3 x 3], IN={1, 144, 75, 75}, OCN=144, PM=SAME, OCV/CPU)|7.180|6.717|1.07|
|conv::Conv::(GFLOPS=2.153, K=[3 x 3], IN={1, 128, 78, 98}, OCN=128, BIAS, OCV/CPU)|6.817|6.050|1.13|
|conv::Conv::(GFLOPS=2.156, K=[3 x 3], IN={1, 576, 19, 19}, OCN=576, PM=SAME, OCV/CPU)|9.225|8.660|1.07|
|conv::Conv::(GFLOPS=2.255, K=[3 x 3], IN={1, 128, 80, 100}, OCN=128, BIAS, OCV/CPU)|7.496|6.625|1.13|
|conv::Conv::(GFLOPS=2.719, K=[3 x 3], IN={1, 96, 256, 256}, OCN=96, S=[2 x 2], PM=SAME, OCV/CPU)|35.520|36.056|0.99|
|conv::Conv::(GFLOPS=3.319, K=[3 x 3], IN={1, 128, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|9.990|9.702|1.03|
|conv::Conv::(GFLOPS=3.321, K=[3 x 3], IN={1, 64, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|10.517|10.746|0.98|
|conv::Conv::(GFLOPS=3.398, K=[7 x 7], IN={1, 128, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)|36.702|36.731|1.00|
|conv::Conv::(GFLOPS=3.407, K=[3 x 3], IN={1, 512, 19, 19}, OCN=1024, D=[6 x 6], P=[6 x 6], BIAS, OCV/CPU)|41.035|38.280|1.07|
|conv::Conv::(GFLOPS=3.408, K=[3 x 3], IN={1, 256, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|10.981|10.573|1.04|
|conv::Conv::(GFLOPS=4.247, K=[3 x 3], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)|12.863|12.384|1.04|
|conv::Conv::(GFLOPS=4.247, K=[5 x 5], IN={1, 144, 128, 128}, OCN=144, S=[2 x 2], PM=SAME, OCV/CPU)|50.437|54.088|0.93|
|conv::Conv::(GFLOPS=4.566, K=[7 x 7], IN={1, 172, 46, 46}, OCN=128, P=[3 x 3], BIAS, OCV/CPU)|50.650|50.635|1.00|
|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 256, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|14.696|14.606|1.01|
|conv::Conv::(GFLOPS=4.993, K=[3 x 3], IN={1, 512, 46, 46}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|16.201|15.426|1.05|
|conv::Conv::(GFLOPS=4.994, K=[3 x 3], IN={1, 128, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|16.061|14.292|1.12|
|conv::Conv::(GFLOPS=4.997, K=[3 x 3], IN={1, 64, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|17.743|18.250|0.97|
|conv::Conv::(GFLOPS=5.780, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, S=[2 x 2], PM=SAME, OCV/CPU)|77.909|78.165|1.00|
|conv::Conv::(GFLOPS=6.116, K=[3 x 3], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)|21.579|21.879|0.99|
|conv::Conv::(GFLOPS=6.118, K=[3 x 3], IN={1, 144, 128, 128}, OCN=144, PM=SAME, OCV/CPU)|20.424|19.589|1.04|
|conv::Conv::(GFLOPS=6.637, K=[3 x 3], IN={1, 256, 75, 75}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|19.389|19.461|1.00|
|conv::Conv::(GFLOPS=6.638, K=[3 x 3], IN={1, 128, 150, 150}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|21.319|20.358|1.05|
|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 150, 200}, OCN=192, PM=SAME, BIAS, OCV/CPU)|22.609|21.826|1.04|
|conv::Conv::(GFLOPS=6.641, K=[3 x 3], IN={1, 64, 300, 300}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)|25.497|25.789|0.99|
|conv::Conv::(GFLOPS=6.814, K=[3 x 3], IN={1, 512, 38, 38}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|21.966|22.108|0.99|
|conv::Conv::(GFLOPS=8.025, K=[3 x 3], IN={1, 1024, 19, 19}, OCN=1206, P=[1 x 1], BIAS, OCV/CPU)|35.883|33.470|1.07|
|conv::Conv::(GFLOPS=9.986, K=[3 x 3], IN={1, 512, 46, 46}, OCN=512, P=[1 x 1], BIAS, OCV/CPU)|31.041|29.314|1.06|
|conv::Conv::(GFLOPS=9.987, K=[3 x 3], IN={1, 256, 92, 92}, OCN=256, P=[1 x 1], BIAS, OCV/CPU)|29.922|28.145|1.06|
|conv::Conv::(GFLOPS=9.989, K=[3 x 3], IN={1, 128, 184, 184}, OCN=128, P=[1 x 1], BIAS, OCV/CPU)|31.624|31.148|1.02|
|conv::Conv::(GFLOPS=9.993, K=[3 x 3], IN={1, 64, 368, 368}, OCN=64, P=[1 x 1], BIAS, OCV/CPU)|38.564|39.164|0.98|
|conv::Conv::(GFLOPS=10.087, K=[3 x 3], IN={1, 576, 38, 50}, OCN=512, PM=SAME, BIAS, OCV/CPU)|31.502|30.269|1.04|
|conv::Conv::(GFLOPS=10.701, K=[3 x 3], IN={1, 512, 38, 38}, OCN=804, P=[1 x 1], BIAS, OCV/CPU)|34.248|34.589|0.99|
|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 240, 64, 64}, OCN=240, PM=SAME, OCV/CPU)|130.211|134.120|0.97|
|conv::Conv::(GFLOPS=11.797, K=[5 x 5], IN={1, 480, 32, 32}, OCN=480, PM=SAME, OCV/CPU)|127.490|132.874|0.96|
|conv::Conv::(GFLOPS=16.987, K=[5 x 5], IN={1, 1152, 16, 16}, OCN=1152, PM=SAME, OCV/CPU)|199.834|200.081|1.00|
|conv::Conv::(GFLOPS=23.122, K=[5 x 5], IN={1, 672, 32, 32}, OCN=672, PM=SAME, OCV/CPU)|247.346|247.523|1.00|

### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

```
force_builders=Linux AVX2,Custom Win
build_image:Custom Win=msvs2019
CPU_BASELINE:Custom Win=AVX512_SKX
```
	2023-03-10 11:59:49 +03:00
Alexander Alekhin	9eb5e39ff3	dnn(tflite): fix wrong axis normalization	2023-02-21 21:20:37 +00:00
Alexander Alekhin	bdff0949bb	dnn(tflite): add 3rdparty flatbuffers with pre-generated schema	2023-02-21 16:06:19 +00:00
Zihao Mu	20dac7ea48	Merge pull request #23255 from zihaomu:fused_cuda_naryeltwise DNN: fuse conv+naryEltwise on CUDA backend.	2023-02-17 10:18:13 +00:00
Alexander Alekhin	58d8a2702a	Merge pull request #23243 from WanliZhong:accelerate_palm_det	2023-02-14 16:25:02 +00:00
Dmitry Kurtaev	76350cd30f	Merge pull request #23161 from dkurt:dnn_tflite TFLite models importer * initial commit * Refactor TFLiteImporter * Better FlatBuffers detection * Add permute before 4D->3D reshape * Track layers layout * TFLite Convolution2DTransposeBias layer * Skip TFLite tests without FlatBuffers * Fix check of FlatBuffers in tests. Add readNetFromTFLite from buffer * TFLite Max Unpooling test * Add skip for TFLite unpooling test * Revert DW convolution workaround * Fix ObjC bindings * Better errors handling * Regenerate TFLite schema using flatc * dnn(tflite): more checks, better logging * Checks for unimplemented fusion. Fix tests	2023-02-13 14:00:20 +00:00
Yuantao Feng	c2b7c1f13b	Merge pull request #23219 from fengyuentau:add_gelu Add GELU layer for vision transformers * add gelu and gelu approximation * drop setKernelParams	2023-02-10 18:03:29 +00:00
wanli	c8f5e228fc	release MUL and ADD operator on CUDA	2023-02-10 19:33:59 +08:00
Alexander Alekhin	96a45e842e	Merge pull request #23061 from WanliZhong:gemm_cuda DNN: make GEMM can be supported with transA and transB in CUDA	2023-02-09 00:06:32 +03:00
wanli	4718a4bf81	make GEMM can be supported with transA and transB in CUDA	2023-01-31 15:14:17 +08:00
Alexander Alekhin	cd44aa0bb1	Merge pull request #23162 from zihaomu:issue_23151	2023-01-28 13:00:43 +00:00
zihaomu	f45a12439a	fix depth wise issue.	2023-01-28 11:41:00 +08:00
Yuantao Feng	4d918ba40b	Merge pull request #23047 from fengyuentau:layer_norm dnn: add layer normalization for vision transformers * add layer norm onnx parser, impl and tests * add onnx graph simplifier for layer norm expanded * handle the case when constants are of type Initializer * add test case for layer norm expanded with initializers * use CV_Assert & CV_CheckType in place of CV_Assert_N; use forward_fallback for OCL_FP16 * use const ref / ref in parameters of invoker::run; extract inner const if from nested loop; use size_t in place of ull * template hasBias * remove trailing whitespace * use pointer parameter with null check; move normSize division & mean_square division outside of loop; use std::max to ensure positive value before std::sqrt * refactor implementation, optimize parallel_for * disable layer norm expanded * remove the removal of layer norm optional outputs	2023-01-27 16:35:59 +03:00
Alexander Alekhin	8ffc06ff72	Merge pull request #23173 from tomoaki0705:fix_warning_master	2023-01-23 15:33:16 +00:00
Tomoaki Teshima	186c18668c	suppress warning	2023-01-23 22:47:43 +09:00
Alexander Alekhin	18cbfa4a4f	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2023-01-23 00:11:12 +00:00
Alexander Alekhin	3d5e3a910f	Merge pull request #23096 from zihaomu:issue_23074	2023-01-12 00:51:04 +00:00
zihaomu	840b1d5c94	add depthwise add fuse	2023-01-11 08:42:51 +08:00
zihaomu	82616eec41	fix possible segmentation fault error in winograd on x86	2023-01-09 13:40:04 +08:00
Alexander Alekhin	9627ab9462	Merge pull request #23050 from zihaomu:fix_memory	2022-12-28 10:04:25 +00:00
zihaomu	71765858dc	fix invalid memory access	2022-12-28 17:16:11 +08:00
Alexander Alekhin	9a2a34f94e	dnn(openvino): remove undefined status	2022-12-28 06:55:00 +00:00
Alexander Alekhin	fc27a343e9	Merge pull request #22905 from zihaomu:clean_up_conv3d_1d	2022-12-26 17:39:18 +00:00
Alexander Alekhin	b42c11de82	pre: OpenCV 4.7.0 (version++)	2022-12-25 17:00:22 +00:00
Alexander Alekhin	a494c75bfe	pre: OpenCV 3.4.19 (version++)	2022-12-25 16:59:47 +00:00
Dmitry Kurtaev	8681686d8f	Merge pull request #22957 from dkurt:new_openvino_api Switch to new OpenVINO API after 2022.1 release * Pass Layer_Test_Convolution_DLDT.Accuracy/0 test * Pass test Test_Caffe_layers.Softmax * Failed 136 tests * Fix Concat. Failed 120 tests * Custom nGraph ops. 19 failed tests * Set and get properties from Core * Read model from buffer * Change MaxPooling layer output names. Restore reshape * Cosmetic changes * Cosmetic changes * Override getOutputsInfo * Fixes for OpenVINO < 2022.1 * Async inference for 2021.4 and less * Compile model with config * Fix serialize for 2022.1 * Asynchronous inference with 2022.1 * Handle 1d outputs * Work with model with dynamic output shape * Fixes with 1d output for old API * Control outputs by nGraph function for all OpenVINO versions * Refer inputs in PrePostProcessor by indices * Fix cycled dependency between InfEngineNgraphNode and InfEngineNgraphNet. Add InferRequest callback only for async inference. Do not capture InferRequest object. * Fix tests thresholds * Fix HETERO:GPU,CPU plugin issues with unsupported layer	2022-12-23 16:58:41 +00:00
Alexander Smorkalov	9012e6dd9b	Merge pull request #22965 from vrabaud:numpy_fix Remove references to deprecated NumPy type aliases.	2022-12-23 15:34:02 +03:00
Alexander Smorkalov	4930516652	Merge pull request #22898 from fengyuentau:slice_neg_steps dnn: support ONNX Slice with negative steps by adding and using cv::flipND	2022-12-23 14:15:06 +03:00
Vincent Rabaud	ad568edd7f	Remove references to deprecated NumPy type aliases. This change replaces references to a number of deprecated NumPy type aliases (np.bool, np.int, np.float, np.complex, np.object, np.str) with their recommended replacement (bool, int, float, complex, object, str). Those types were deprecated in 1.20 and are removed in 1.24, cf https://github.com/numpy/numpy/pull/22607.	2022-12-23 13:53:49 +03:00
Alexander Alekhin	1f41d06f9a	Merge pull request #23008 from mshabunin:fix-yolov4-tiny-hash	2022-12-23 10:14:25 +00:00
zihaomu	71c6339af0	remove old convolution branch, and optimize conv3d and conv1d.	2022-12-23 16:50:28 +08:00
fengyuentau	34a0897f90	add cv::flipND; support onnx slice with negative steps via cv::flipND	2022-12-23 16:39:53 +08:00
Maksim Shabunin	d35fbe6bfc	dnn: updated YOLOv4-tiny model and tests	2022-12-22 15:49:21 +03:00
Alexander Alekhin	6b4f3e5fab	Merge pull request #22993 from alalek:fixup_21738	2022-12-21 19:50:51 +00:00
Yuantao Feng	a2b3acfc6e	dnn: add the CANN backend (#22634) * cann backend impl v1 * cann backend impl v2: use opencv parsers to build models for cann * adjust fc according to the new transA and transB * put cann net in cann backend node and reuse forwardLayer * use fork() to create a child process and compile cann model * remove legacy code * remove debug code * fall back to CPU backend if there is one layer not supported by CANN backend * fix netInput forward	2022-12-21 09:04:41 +03:00
Alexander Alekhin	cdbb893b27	dnn: disable OpenCL code path in MatMul processing - this mode is not supported by 22828	2022-12-20 09:46:48 +00:00
Alexander Alekhin	1102b7eff8	dnn: fix gather layer implementation - support FP16 data	2022-12-20 06:09:34 +00:00
zoom	4891818114	make MatMul support 3D or 4D with broadcast	2022-12-15 10:36:08 +08:00
Alexander Alekhin	8ba44e7d55	Merge pull request #22882 from zihaomu:gemm_first_const	2022-12-08 14:18:33 +00:00
Zihao Mu	0a650b573b	Merge pull request #22840 from zihaomu:optimze_conv_memory_usage DNN: reduce the memory used in convolution layer * reduce the memory in winograd and disable the test when memory usage is larger than 2gb. * remove VERY_LOG tag	2022-12-08 12:57:13 +00:00
Alexander Alekhin	b16f76eede	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2022-12-03 12:39:41 +00:00
Alexander Alekhin	d16b3b2487	dnn(test): restore openvino tests with 'Cannot get memory' message	2022-12-03 01:34:48 +00:00
Alexander Alekhin	74d0b4cc78	dnn(openvino): fix custom layers BlockingDesc	2022-12-03 01:34:10 +00:00
Alexander Smorkalov	e14ca39fd7	Merge pull request #22857 from fengyuentau:batched_nms dnn: add batched nms	2022-11-30 12:37:49 +03:00
Alexander Smorkalov	421ba8730a	Merge pull request #22809 from fengyuentau:tile dnn: support ONNX Tile	2022-11-29 14:42:28 +03:00
zihaomu	0d56524b72	gemm support transA and transB, and first input is constant.	2022-11-29 17:13:36 +08:00
fengyuentau	9fded9ca53	batched nms impl	2022-11-29 15:32:34 +08:00
fengyuentau	441624a5fb	tile impl	2022-11-29 11:15:38 +08:00
zoom	5044af69d1	let MatMul can work when both two inputs are const	2022-11-27 17:32:41 +08:00
Alexander Smorkalov	6ca205a029	Merge pull request #22478 from WanliZhong:nary_eltwise_cuda DNN: Let part of the operators in nary_eltwise support CUDA	2022-11-22 16:15:50 +03:00
zihaomu	5bf64e7dfe	fix the infinite loop in tf importer of 3.4 branch	2022-11-15 11:42:10 +08:00
zoom	ef2677b0a6	Make MatMul layer support 3d or 4d operation with const input	2022-11-10 11:41:44 +08:00
zoom	11d492b0b9	Let part of the operators in nary_eltwise support cuda	2022-11-02 14:08:21 +08:00
Zihao Mu	17f2b56291	remove never used code in onnximporter	2022-11-02 10:45:16 +08:00
Alexander Alekhin	ee9137f176	Merge pull request #22725 from zihaomu:fix_infinit_loop_in_tf	2022-10-31 17:03:03 +00:00
Zihao Mu	903bf0147e	Merge pull request #22666 from zihaomu:support_onnx_qdq_model DNN: let Quant and Dequant of ONNX_importer support the Constant input. * let Quant and Dequant support the Constant input. * fix negative value of axis.	2022-10-31 16:06:31 +00:00
Zihao Mu	18fbb72f7d	fix the infinite loop in tf importer.	2022-10-31 20:10:25 +08:00
Alexander Smorkalov	22f8fb4d5c	Do not fail tests if Yolo v7 model was not found.	2022-10-24 17:59:18 +03:00
Alexander Smorkalov	23edec83fb	Merge pull request #22667 from zihaomu:bug_fix_in_winograd DNN: bug fixed in Winograd	2022-10-21 17:54:13 +03:00
Alexander Smorkalov	e4cd430710	Merge pull request #22653 from WanliZhong:issue22597 DNN-TF: let StridedSlice layer support const input	2022-10-21 17:51:00 +03:00
Dmitry Kurtaev	35b2cff295	Merge pull request #22656 from dkurt:halide_fixes * Fixes for Halide * Enable some Halide tests	2022-10-21 17:49:49 +03:00
Zihao Mu	cee8c86b6e	fixed bug at winograd of SIMD128 and more robust code.	2022-10-21 19:14:54 +08:00
Alexander Smorkalov	5d292826b2	Merge pull request #22593 from zihaomu:optimize_wino optimize winograd further more	2022-10-19 13:08:32 +03:00
Alexander Smorkalov	f378f02954	Merge pull request #22652 from rogday:cuda_test_fixes Address CUDA-related errors	2022-10-19 09:37:12 +03:00
Zhi-Qiang Zhou	c8561eae2d	Update region_layer.cpp Fix objectness (dstData[index + 4]) is not assigned if new_coords == 1.	2022-10-19 11:17:23 +08:00
Smirnov Egor	dd14cf6a9c	address CUDA-related errors and enable cuda in elementwise ops	2022-10-18 16:54:42 +03:00
Alexander Smorkalov	ec7fc5adca	Merge pull request #22529 from fengyuentau:scatter_scatternd DNN: supports Scatter and ScatterND from ONNX	2022-10-17 14:57:46 +03:00
Alexander Smorkalov	02143cd0e2	Merge pull request #22531 from zihaomu:stop_rely_name Parsing quantized nodes does not rely on names	2022-10-17 11:20:24 +03:00
Alexander Smorkalov	1c5dcbcac8	Merge pull request #22639 from WanliZhong:issue#22625 DNN: Make Unsqueeze layer support negative axes	2022-10-17 09:27:49 +03:00
fengyuentau	d24d8f2abe	implementation of scatter and scatternd with conformance tests enabled	2022-10-17 11:30:32 +08:00
Alexander Alekhin	762481411d	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2022-10-15 16:44:47 +00:00
zoom	d816442e4d	Make Unsqueeze layer support negative axes.	2022-10-14 18:00:19 +08:00
Zihao Mu	0fa43e3aac	Optimize the winograd further more.	2022-10-14 10:15:45 +08:00
zoom	9119692bb8	let StridedSlice layer support const input	2022-10-12 11:50:44 +08:00
Alexander Smorkalov	ec26541771	Merge pull request #22577 from zihaomu:Disable_winograd_branch_in_tryquantize DNN: add enableWinograd API for Net	2022-10-11 09:44:00 +03:00
Zihao Mu	d9eff7daeb	parse quantized nodes does not rely on name.	2022-10-10 17:08:46 +08:00
Alexander Smorkalov	3419e64dcf	Merge pull request #22611 from zihaomu:greaterOrEqual DNN: support GreaterOrEqual and LessOrEqual op in ONNX	2022-10-10 11:43:44 +03:00
Zihao Mu	1e2ceca4df	add enableWinograd API for Net.	2022-10-09 09:33:07 +08:00
Alexander Alekhin	347246901e	Merge pull request #21745 from alalek:dnn_plugin_openvino	2022-10-08 22:32:25 +00:00
Zihao Mu	9821fae59d	add greater_or_equal and less_or_equal ONNX support	2022-10-08 15:51:40 +08:00
Alexander Alekhin	43b2bb2c25	dnn: plugin support for OpenVINO	2022-10-07 16:57:31 +00:00
Alexander Smorkalov	96844b0ca5	Merge pull request #22554 from WanliZhong:slice_axes_no_seq DNN: Let Slice layer support non-sequential and negative axes	2022-10-03 10:15:55 +03:00
zoom	4557971481	enhance slice layer refactor the code for parsing Slice layer add test for Slice layer let 'begin' and 'end' resize to dims add opset message comment	2022-10-01 17:12:07 +08:00
Zihao Mu	15cfafb360	DNN: Remove unused code in onnx_importer.cpp	2022-09-29 10:53:43 +08:00
Voron	cbf43a54fb	added opencv for openvino tutorial	2022-09-28 12:05:28 +02:00
Alexander Smorkalov	a6274647a4	Merge pull request #21738 from rogday:gather add Gather implementation	2022-09-19 16:21:14 +03:00
Egor Smirnov	65f71ce2eb	add Gather implementation	2022-09-19 15:06:44 +03:00
Alexander Smorkalov	6aefb8e86f	Merge pull request #22290 from fengyuentau:naive_yolov7 Support for YOLOv7 ONNX (not simplified)	2022-09-19 14:43:18 +03:00
fengyuentau	4aef9b1c93	dnn: support yolov7 (not simplified)	2022-09-19 18:38:03 +08:00
Alexander Smorkalov	e1e9261450	Merge pull request #22479 from scottchou007:master Fix issues in opencv_test_dnn from conv48 kernels without bias	2022-09-16 09:05:55 +03:00
scottchou007	a3cb2020bc	Fix issues in opencv_test_dnn from conv48 kernels using uninitialized tensors when there is no bias.	2022-09-15 13:41:27 -07:00
Alexander Alekhin	65bdb3a544	dnn: eliminate GCC12 warning in total() call	2022-09-14 11:37:00 +00:00
Alexander Smorkalov	c2c8da2517	Merge pull request #22448 from Ichini24:reshape-permutations-fix changed names of permutations if Reshape is in NHWC	2022-09-13 09:24:56 +03:00
wxsheng	4154bd0667	Add Loongson Advanced SIMD Extension support: -DCPU_BASELINE=LASX * Add Loongson Advanced SIMD Extension support: -DCPU_BASELINE=LASX * Add resize.lasx.cpp for Loongson SIMD acceleration * Add imgwarp.lasx.cpp for Loongson SIMD acceleration * Add LASX acceleration support for dnn/conv * Add CV_PAUSE(v) for Loongarch * Set LASX by default on Loongarch64 * LoongArch: tune test threshold for Core/HAL.mat_decomp/15 Co-authored-by: shengwenxue <shengwenxue@loongson.cn>	2022-09-10 09:39:43 +03:00
Alexander Alekhin	ca7f964104	dnn: use inheritance for OpenVINO net impl	2022-09-06 18:05:00 +00:00
anton	337452b4c0	changed names of permutations if Reshape is in NHWC	2022-09-03 19:02:41 +02:00
Zihao Mu	b69b1eae8f	fix bug 22450	2022-09-02 16:30:06 +08:00
Alexander Smorkalov	70fb1cd603	Merge pull request #22440 from zihaomu:fix_conv_bug	2022-08-30 07:01:05 +00:00
Alexander Smorkalov	d2c48b898c	Merge pull request #22306 from zihaomu:qgemm_and_squeeze_opset13_onnximporter	2022-08-30 06:33:57 +00:00
Zihao Mu	2d837efba7	add qgemm and squeeze op13 supported on ONNXImporter	2022-08-30 09:50:29 +08:00
Alexander Smorkalov	1fd45a1b85	Merge pull request #22362 from fengyuentau:conv_asym_pad_fuse Remove asymmetric padding in Conv layer since it is supported in CPU backend	2022-08-29 17:56:17 +03:00
Zihao Mu	2cd7e17b65	replace v_add with +	2022-08-29 17:15:35 +08:00
Alexander Smorkalov	2619099fe5	Merge pull request #22337 from zihaomu:load_ONNX_fp16_as_fp32 DNN: load fp16 ONNX model as fp32	2022-08-29 09:32:25 +03:00
fengyuentau	2959286eb5	tengine: supports conv with asymmetric padding	2022-08-29 02:51:26 +00:00
Zihao Mu	9638e34ab0	reuse WORDS_BIGENDIAN.	2022-08-27 07:42:38 +08:00
Zihao Mu	bb64db98d8	Further optimization of Conv2D, fused Conv_Add_Activation, bring latest code from ficus OpConv.fx. (#22401)	2022-08-26 12:57:25 +03:00
Zihao Mu	7eaec9dd22	load fp16 as fp32 and align fp16 and double in onnx_graph_simplifie	2022-08-26 10:04:44 +08:00
Zihao Mu	5e92bf8e41	support silu activation in darknet	2022-08-22 10:51:29 +08:00
Alexander Alekhin	2ebdc04787	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2022-08-14 15:50:42 +00:00
fengyuentau	0cdff46725	tune for opencl	2022-08-14 17:47:48 +08:00
Alexander Alekhin	d0d115321d	Merge pull request #22350 from alalek:rework_psabi_warning	2022-08-13 15:05:41 +00:00
Alexander Smorkalov	bb71cb200e	Merge pull request #22199 from zihaomu:bug_fix_22195 DNN: Reduce Layer (add dynamic batch and ReduceSum support)	2022-08-11 12:59:51 +03:00
fengyuentau	e7e814fa8c	remove asymmetric padding checks	2022-08-10 19:52:44 +08:00
Alexander Alekhin	44b2f9637a	Revert "suppress warning on GCC 7 and later" This reverts commit `a630ad73cb`.	2022-08-07 15:43:10 +03:00
Alexander Smorkalov	b2b7193374	Merge pull request #22311 from zihaomu:layer_fused_optmized_mish DNN: add another two Mish activation to onnx_graph_simplifier	2022-08-05 14:22:06 +03:00
Zihao Mu	0614c40b42	add more skip for very long test case in test_dnn.	2022-08-02 14:58:05 +08:00
Zihao Mu	d4640f4647	support ReduceLayer without reshape layer.	2022-08-02 10:32:31 +08:00
Zihao Mu	57545653b1	replace new mish impl with softplus	2022-07-28 13:19:06 +08:00
Zihao Mu	3c5377ca1b	add another Mish graph simplifier.	2022-07-28 11:21:29 +08:00
HAN Liutong	e2bfe0ce76	Use "#if" instead of "#ifdef" for CV_SIMD128.	2022-07-21 03:23:57 +00:00
Zihao Mu	98c33c605d	batchsize dynamic is set to index 0.	2022-07-20 19:02:16 +08:00
rogday	ed69bcae2d	Merge pull request #21865 from rogday:nary_eltwise_layers Reimplementation of Element-wise layers with broadcasting support * init * semi-working initial version * add small_vector * wip * remove smallvec * add nary function * replace auto with Mat in lambda expr used in transform * uncomment asserts * autobuffer shape_buf & step_buf * fix a missing bracket * fixed a missing addLayer in parseElementWise * solve one-dimensional broadcast * remove pre_broadcast_transform for the case of two constants; fix missing constBlobsExtraInfo when addConstant is called * one autobuffer for step & shape * temporary fix for the missing original dimension information * fix parseUnsqueeze when it gets a 1d tensor constant * support sum/mean/min/max with only one input * reuse old code to handle cases of two non-constant inputs * add condition to handle div & mul of two non-constant inputs * use \|\| instead of or * remove trailing spaces * enlarge buf in binary_forward to contain other buffer * use autobuffer in nary_forward * generate data randomly and add more cases for perf * add op and, or & xor * update perf_dnn * remove some comments * remove legacy; add two ONNX conformance tests in filter * move from cpu_denylist to all_denylist * adjust parsing for inputs>=2 Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>	2022-07-19 06:14:05 +03:00
fengyuentau	1c7b71bf9e	define data_layout as unknown for pack	2022-07-14 19:27:20 +08:00
Zihao Mu	1b8fba8e26	support ReduceSum with two input and dynamic shape batch size in ReduceLayer.	2022-07-13 13:46:16 +08:00
Zihao Mu	45fbb67aba	fix scale layer can not handle 1x1 weight correctly.	2022-07-13 11:25:27 +08:00
Zihao Mu	139c443770	Merge pull request #22183 from zihaomu:fastConv_ARMv7_compatible DNN: ARMv7 compatible fastConv * support armv7 on fastConv * remove whitespace.	2022-07-07 13:23:08 +03:00
Tomoaki Teshima	a630ad73cb	suppress warning on GCC 7 and later	2022-07-06 23:31:31 +09:00
Zihao Mu	a80fcacd90	Merge pull request #21372 from zihaomu:dnn_quantize_per_tensor Add per_tensor_quantize to int8 quantize * add per_tensor_quantize to dnn int8 module. * change api flag from perTensor to perChannel, and recognize quantize type and onnx importer. * change the default to hpp	2022-07-05 19:14:42 +03:00
Zihao Mu	59b870a87a	Merge pull request #21910 from zihaomu:fast_conv_ARM DNN: Accelerating convolution * Fast Conv of ARM, X86 and universal intrinsics. * improve code style. * error fixed. * improve the License * optimize memory allocated and Adjust the threshold. * change FasterRCNN_vgg16 to 2GB memory.	2022-07-01 13:03:15 +03:00
Zihao Mu	ef94275eb6	bug fixed of GEMM node in ONNX_importer	2022-06-22 21:08:48 +08:00
Wanli	a6ca48a1c2	Merge pull request #22100 from WanliZhong:issue_22015 Fix issue 22015, let Clip layer support 1-3 inputs * Fix issue 22015. Let layer Clip support 1-3 inputs. * Resolve other problems caused by modifications * Update onnx_importer.cpp added extra checks to min/max handling in Clip * Add assertions to check the size of the input * Add test for clip with min and max initializers * Separate test for "clip_init_min_max". Change the check method for input_size to provide a clearer message in case of problem. * Add tests for clip with min or max initializers * Change the implementation of getting input Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@gmail.com>	2022-06-22 14:21:16 +03:00
Zihao Mu	2411b825b4	bug fixed of GEMM node in ONNX_importer	2022-06-22 15:00:17 +08:00
Alexander Alekhin	583bd1a6e2	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2022-06-04 19:10:35 +00:00
Namgoo Lee	24547f40ff	remove const from functions returning by value	2022-05-26 21:30:41 +09:00
Alexander Alekhin	e9187ae38c	Merge pull request #22026 from alalek:update_version_3.4.18-pre	2022-05-24 20:23:28 +00:00
Alexander Alekhin	978dc76653	Merge pull request #22006 from rogday:21947_fix	2022-05-24 19:26:02 +00:00
rogday	a2ad997e97	fix vector access in TF::sortByExecutionOrder	2022-05-24 00:05:13 +03:00
Alexander Alekhin	e9428726ca	pre: OpenCV 4.6.0 (version++)	2022-05-23 19:25:16 +00:00
Alexander Alekhin	400906b433	pre: OpenCV 3.4.18 (version++)	2022-05-23 19:18:02 +00:00
berak	50d7c61c01	Update darknet_importer.cpp make it more obvious, that this is a '404', not a 'parsing' problem	2022-05-23 19:18:31 +02:00
Alexander Alekhin	d9bf522b27	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2022-05-23 16:06:14 +00:00
rogday	93dc0679ec	Merge pull request #21818 from rogday:revert_renaming * add prefixes to layer names and layer output names * dnn: OPENCV_DNN_ONNX_USE_LEGACY_NAMES runtime parameter Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>	2022-05-23 14:50:42 +00:00
Alexander Alekhin	bb5462e327	Merge pull request #21991 from fengyuentau:qconv_asympad	2022-05-19 17:20:04 +00:00
fengyuentau	ff88132620	support asymmetric paddings for qconv	2022-05-16 19:01:37 +08:00
OpenCV Developers	d9a444ca1a	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2022-05-14 11:23:21 +00:00
Yulv-git	15ac54d5d6	Fix some typos in modules/.	2022-04-30 13:40:07 +08:00
Zihao Mu	64ded50bbf	parsing depth2space and space2depth of ONNX importer	2022-04-29 10:17:02 +08:00
rogday	9cd5a0a1e6	Merge pull request #21884 from rogday:cuda_cleanup Fix CUDA compilation issues and adjust thresholds. * Fix CUDA compilation issues and adjust thresholds. * add conformance tests to denylist	2022-04-19 16:40:25 +00:00
OpenCV Developers	2985739b8c	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2022-04-16 14:41:15 +00:00
rogday	a2b84e9897	add assert to tf graph simplifier to address security concerns	2022-04-13 22:50:27 +03:00
OpenCV Pushbot	66f3c2673c	Merge pull request #21831 from zihaomu:sign_layer_onnx DNN: Add sign, shrink and reciprocal for onnx_importer	2022-04-13 17:08:30 +00:00
OpenCV Pushbot	03c9648f2e	Merge pull request #21854 from opencv-pushbot:dnn_test_update_checks_face_detector_4.x	2022-04-12 17:20:22 +00:00
OpenCV Developers	e3a55af336	dnn(test): update opencv_face_detector checks original commit: `be4a432bea`	2022-04-11 20:27:06 +00:00
OpenCV Developers	be4a432bea	dnn(test): update opencv_face_detector checks	2022-04-11 20:26:25 +00:00
zihaomu	e36948cfbc	add ONNX OP sign, shrink and reciprocal	2022-04-07 15:32:12 +08:00
Alexander Alekhin	08d44f588f	dnn(test): update OpenVINO tests 2022.1.0 (OpenCV 4.x)	2022-04-05 14:13:38 +00:00
Alexander Alekhin	13a995cc1d	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2022-04-02 19:45:44 +00:00
Alexander Alekhin	4d927e73f1	dnn(test): update OpenVINO tests 2022.1.0	2022-04-02 17:42:53 +00:00
Alexander Alekhin	a233982931	Merge pull request #20938 from JulieBar:lstm_cuda2	2022-04-01 22:10:08 +00:00
Zihao Mu	7b582b71ba	Merge pull request #21036 from fengyuentau:timvx_backend_support dnn: TIM-VX NPU backend support * Add TimVX NPU backend for DNN module. * use official branch from tim-vx repo; fix detecting viv sdk Co-authored-by: fytao <yuantao.feng@outlook.com>	2022-03-31 21:42:11 +00:00
Smirnov Egor	abebbf04b1	Add CUDA support for LSTM. Co-authored-by: Julia Bareeva <jbareeva@gmail.com>	2022-03-31 16:38:22 +03:00
Alexander Alekhin	5e434073d4	Merge pull request #21796 from alalek:dnn_reduce_fixup_21601	2022-03-30 22:26:28 +00:00
Alexander Alekhin	6f5cf8c15f	dnn: fix ReduceLayer implementation, update OpenVINO tests	2022-03-30 20:03:41 +00:00
Alexander Alekhin	b687bc807a	dnn(test): update OpenVINO tests 2021.4.2	2022-03-30 18:58:35 +00:00
Alexander Alekhin	1339ebaa84	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2022-03-26 16:00:28 +00:00
Alexander Alekhin	c9b90884da	Merge pull request #21601 from zihaomu:add_reduceLayer	2022-03-26 10:20:10 +00:00
luz paz	8e8e4bbabc	dnn: fix various dnn related typos Fixes source comments and documentation related to dnn code.	2022-03-23 18:12:12 -04:00
Alexander Alekhin	4c79318694	dnn: fix index access	2022-03-19 06:54:07 +00:00
Zihao Mu	b6b5c27cec	Support for some reduce layers for onnx	2022-03-18 10:19:13 +08:00
Alexander Alekhin	685797f403	Merge pull request #21662 from alalek:dnn_split	2022-03-17 16:09:17 +00:00
rogday	93353aea70	Merge pull request #21522 from rogday:lstm Fix LSTM support in ONNX * fix LSTM and add peephole support * disable old tests * turn lambdas into functions * more hacks for c++98 * add assertions * slice fixes * backport of cuda-related fixes * address review comments	2022-03-15 09:14:05 +03:00
Alexander Alekhin	5bf3c1df24	Merge pull request #21715 from ilyachur:change_type_info_creation	2022-03-14 09:18:58 +00:00
Ilya Churaev	419918076e	Changed call of NodeTypeInfo constructor	2022-03-14 10:55:33 +03:00
Alexander Alekhin	a120adde63	dnn: add dnn.cpp file with information about git commits history	2022-03-08 19:22:47 +00:00
Alexander Alekhin	a80af177b6	dnn: split dnn.cpp code base commit: `19926e2979` original dnn.cpp content: `19926e2979/modules/dnn/src/dnn.cpp`	2022-03-08 19:22:46 +00:00
Tsukasa Sugiura	8db7d435b9	Merge pull request #21692 from UnaNancyOwen:add_softmax * add apply softmax option to ClassificationModel * remove default arguments of ClassificationModel::setSoftMax() * fix build for python * fix docs warning for setSoftMax() * add impl for ClassificationModel() * fix failed build for docs by trailing whitespace * move to implement classify() to ClassificationModel_Impl * move to implement softmax() to ClassificationModel_Impl * remove softmax from public method in ClassificationModel	2022-03-07 20:26:15 +00:00

2298 Commits