opencv

mirror of https://github.com/opencv/opencv.git synced 2024-12-14 08:59:11 +08:00

Author	SHA1	Message	Date
Alexander Smorkalov	fc9208cff5	Merge branch 4.x	2024-07-17 10:08:16 +03:00
Yuantao Feng	e3858cc5a3	Merge pull request #25147 from fengyuentau:dnn/elementwise_layers/speedup * added v_erf and implemented gelu acceleration via vectorization * remove anonymous v_erf and use v_erf from intrin_math * enable perf for ov and cuda backend	2024-07-08 14:24:36 +03:00
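For context on the `v_erf`/GELU commit above: exact GELU is commonly defined as 0.5 * x * (1 + erf(x / sqrt(2))). The scalar sketch below only illustrates that formula. It is not the vectorized OpenCV code, and the function name `gelu` is chosen here for illustration.

```python
import math


def gelu(x: float) -> float:
    """Exact (erf-based) GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    # Print a few sample points; a SIMD version would evaluate many lanes at once.
    for v in (-1.0, 0.0, 1.0):
        print(v, gelu(v))
```

A vectorized implementation would apply the same expression to a whole register of values per step, which is why a vector `erf` primitive is the enabling piece.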
Wanli	b637e3a66e	Merge pull request #25463 from WanliZhong:ocvface2YuNet Change opencv_face_detector related tests and samples from caffe to onnx #25463 Part of https://github.com/opencv/opencv/issues/25314 This PR aims to change the tests related to opencv_face_detector from caffe framework to onnx. Tests in `test_int8_layer.cpp` and `test_caffe_importer.cpp` will be removed in https://github.com/opencv/opencv/pull/25323 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2024-05-08 15:49:10 +03:00
alexlyulkov	1d1faaabef	Merge pull request #24411 from alexlyulkov:al/dnn-type-inference Added int32, int64 support and type inference to dnn #24411 Added a type inference to dnn similar to the shape inference, added int32 and int64 support. - Added getTypes method for layers that calculates layer output types and internal types from input types (similar to getMemoryShapes). By default output and internal types = input[0] type - Added type inference pipeline similar to shape inference pipeline. LayersShapes struct (that is used in shape inference pipeline) now contains both shapes and types - All layer output blobs are now allocated using the calculated types from the type inference. - Inputs and constants with int32 and int64 types are not automatically converted into float32 now. - Added int32 and int64 support for all the layers with indexing and for all the layers required in tests. Added int32 and int64 support for CUDA: - Added host<->device data moving for int32 and int64 - Added int32 and int64 support for several layers (just slightly modified CUDA C++ templates) Passed all the accuracy tests on CPU, OCL, OCL_FP16, CUDA, CUDA_FP16. (except RAFT model) CURRENT PROBLEMS: - ONNX parser always converts int64 constants and layer attributes to int32, so some models with int64 constants don't work (e.g. RAFT). The solution is to disable int64->int32 conversion and fix attribute reading in a lot of ONNX layer parsers (https://github.com/opencv/opencv/issues/25102) - I didn't add type inference and int support to Vulkan, so it doesn't work at all now. - Some layers don't support int yet, so some unknown models may not work. 
CURRENT WORKAROUNDS: - CPU arg_layer indices are implemented in int32 followed by an int32->int64 conversion (the master branch has the same workaround with int32->float conversion) - CPU and OCL pooling_layer indices are implemented in float followed by a float->int64 conversion - CPU gather_layer indices are implemented in int32, so int64 indices are converted to int32 (the master branch has the same workaround with float->int32 conversion) DISABLED TESTS: - RAFT model REMOVED TESTS: - Greater_input_dtype_int64 (because it doesn't fit ONNX rules, the whole test is just comparing float tensor with int constant) TODO IN NEXT PULL REQUESTS: - Add int64 support for ONNX parser - Add int support for more layers - Add int support for OCL (currently int layers just run on CPU) - Add int tests - Add int support for other backends	2024-03-01 17:07:38 +03:00
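The default rule described in the #24411 message (output type = input[0] type) can be sketched as a tiny propagation pass. This is a hypothetical Python illustration, not the OpenCV `getTypes` API: `default_rule`, `arg_rule`, and `infer_chain` are names invented here, and types are plain strings.

```python
from typing import Callable, List

# A "type rule" maps a layer's input types to its output types.
TypeRule = Callable[[List[str]], List[str]]


def default_rule(input_types: List[str]) -> List[str]:
    """Default from the commit message: output type = input[0] type."""
    if not input_types:
        raise ValueError("a layer needs at least one input type")
    return [input_types[0]]


def arg_rule(input_types: List[str]) -> List[str]:
    """Hypothetical override for an index-producing layer (ONNX ArgMax/ArgMin emit int64)."""
    return ["int64"]


def infer_chain(net_input_type: str, rules: List[TypeRule]) -> List[str]:
    """Propagate a type through a linear chain of single-output layers."""
    inferred = []
    current = [net_input_type]
    for rule in rules:
        current = rule(current)
        inferred.append(current[0])
    return inferred


if __name__ == "__main__":
    print(infer_chain("float32", [default_rule, arg_rule, default_rule]))
```

Running the pass before allocation mirrors the shape inference idea: each output blob can be created with its final type instead of being converted to float32.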
Alexander Smorkalov	3a55f50133	Merge branch 4.x	2024-02-12 14:20:35 +03:00
Alexander Smorkalov	77af137285	Fix proto and weights mess in dnn performance tests.	2024-02-07 09:16:09 +03:00
Haosonn	87f749277d	Merge pull request #24768 from Haosonn:pre-pr-2 Vulkan backend for NaryEltwiseLayer in DNN module #24768 We improve Vulkan backend for ``NaryEltwiseLayer`` in DNN module by: - add a basic framework for Vulkan backend in ``NaryEltwiseLayer`` - add a compute shader for binary forwarding (an imitation of what has been done in native OpenCV backend including broadcasting and eltwise-operation) - typo fixed: - Wrong info output in ``context.cpp`` Currently, our implementation (or all layers supporting Vulkan backend) runs pretty slow on discrete GPUs basically due to IO cost in function ``copyToHost``, and we are going to fix that by - find out the best ``VkMemoryProperty`` for various discrete GPUs - prevent ``copyToHost`` in middle layers during forwarding, (i.e keep data in GPU memory) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake Co-authored-by: IskXCr <IskXCr@outlook.com>	2024-01-29 18:41:49 +03:00
Alexander Smorkalov	decf6538a2	Merge branch 4.x	2024-01-23 17:06:52 +03:00
Alexander Smorkalov	c739117a7c	Merge branch 4.x	2024-01-19 17:32:22 +03:00
Alexander Smorkalov	ac4c0bffac	Merge pull request #24813 from fengyuentau:speedup_scatter dnn: improve scatter and scatterND speed with multi-threading	2024-01-17 17:16:50 +03:00
jimmylaw21	a7fa1e6f4b	Merge pull request #24610 from jimmylaw21:dnn-onnx-add-group-norm-layer dnn onnx: add group norm layer #24610 dnn onnx: add group norm layer Todo: - [x] speed up by multi-threading - [x] add perf - [x] add backend: OpenVINO - [x] add backend: CUDA - [x] add backend: OpenCL (no fp16) - [ ] add backend: CANN ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>	2024-01-12 15:13:26 +03:00
fengyuentau	13127365e2	better comment	2024-01-08 11:55:06 +08:00
Yuantao Feng	b7d70613e4	fix failed assertion in debug build	2024-01-05 18:33:01 +00:00
fengyuentau	2ed97b9ef3	multi-threaded scatterND and refactor perf	2024-01-05 18:15:59 +08:00
fengyuentau	63cde0b90d	multi-threaded scatter and refactor perf	2024-01-05 17:24:09 +08:00
Alexander Alekhin	f49b26182b	dnn(test): skip very long debug tests, reduce test time	2023-12-25 08:44:06 +00:00
Yuantao Feng	0521a3a384	Merge pull request #24476 from fengyuentau:attention_layer dnn: add attention layer #24476 Resolves #24609 Merge with: https://github.com/opencv/opencv_extra/pull/1128. Attention operator spec from onnxruntime: https://github.com/microsoft/onnxruntime/blob/v1.16.1/docs/ContribOperators.md#com.microsoft.Attention. TODO: - [x] benchmark (before this PR vs. with this PR vs. ORT). - [x] Layer fusion: Take care Slice with end=INT64_MAX. - [x] Layer fusion: match more potential attention (VIT) patterns. - [x] Single-head attention is supported. - [x] Test AttentionSubgraph fusion. - [x] Add acc tests for VIT_B_32 and VitTrack - [x] Add perf tests for VIT_B_32 and VitTrack ## Benchmarks Platform: Macbook Air M1. ### Attention Subgraph Input scale: [1, 197, 768]. \| \| mean (ms) \| median (ms) \| min (ms) \| \| ---------------------- \| --------- \| ----------- \| -------- \| \| w/ Attention (this PR) \| 3.75 \| 3.68 \| 3.22 \| \| w/o Attention \| 9.06 \| 9.01 \| 8.24 \| \| ORT (python) \| 4.32 \| 2.63 \| 2.50 \| ### ViTs All data in millisecond (ms). \| ViTs \| With Attention \| Without Attention \| ORT \| \| -------- \| -------------- \| ----------------- \| ------ \| \| vit_b_16 \| 302.77 \| 365.35 \| 109.70 \| \| vit_b_32 \| 89.92 \| 116.22 \| 30.36 \| \| vit_l_16 \| 1593.32 \| 1730.74 \| 419.92 \| \| vit_l_32 \| 468.11 \| 577.41 \| 134.12 \| \| VitTrack \| 3.80 \| 3.87 \| 2.25 \| ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. 
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-12-20 19:35:07 +03:00
Yuantao Feng	fa5ed62a66	Merge pull request #24694 from fengyuentau:matmul_refactor dnn: refactor ONNX MatMul with fastGemm #24694 Done: - [x] add backends - [x] CUDA - [x] OpenVINO - [x] CANN - [x] OpenCL - [x] Vulkan - [x] add perf tests - [x] const B case ### Benchmark Tests are done on M1. All data is in milliseconds (ms). \| Configuration \| MatMul (Prepacked) \| MatMul \| InnerProduct \| \| - \| - \| - \| - \| \| A=[12, 197, 197], B=[12, 197, 64], trans_a=0, trans_b=0 \| 0.39 \| 0.41 \| 1.33 \| \| A=[12, 197, 64], B=[12, 64, 197], trans_a=0, trans_b=0 \| 0.42 \| 0.42 \| 1.17 \| \| A=[12, 50, 64], B=[12, 64, 50], trans_a=0, trans_b=0 \| 0.13 \| 0.15 \| 0.33 \| \| A=[12, 50, 50], B=[12, 50, 64], trans_a=0, trans_b=0 \| 0.11 \| 0.13 \| 0.22 \| \| A=[16, 197, 197], B=[16, 197, 64], trans_a=0, trans_b=0 \| 0.46 \| 0.54 \| 1.46 \| \| A=[16, 197, 64], B=[16, 64, 197], trans_a=0, trans_b=0 \| 0.46 \| 0.95 \| 1.74 \| \| A=[16, 50, 64], B=[16, 64, 50], trans_a=0, trans_b=0 \| 0.18 \| 0.32 \| 0.43 \| \| A=[16, 50, 50], B=[16, 50, 64], trans_a=0, trans_b=0 \| 0.15 \| 0.25 \| 0.25 \| ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-12-19 19:36:41 +03:00
Wanli	6ee71fee88	Merge pull request #24547 from WanliZhong:refactor_conv_perf_test Classify and extend convolution and depthwise performance tests #24547 This PR aims to: 1. Extend the test cases from models: `YOLOv5`, `YOLOv8`, `EfficientNet`, `YOLOX`, `YuNet`, `SFace`, `MPPalm`, `MPHand`, `MPPose`, `ViTTrack`, `PPOCRv3`, `CRNN`, `PPHumanSeg`. (371 new test cases are added) 2. Classify the existing convolution performance tests into the cases below - CONV_1x1 - CONV_3x3_S1_D1 (winograd) - CONV - DEPTHWISE 3. Reduce unnecessary test cases by following 3 rules (366 test cases are pruned): (i). Among tests whose parameters are identical except for pad- and bias-related ones, only one case is kept. (ii). When the only difference is the channel count of the input shape and all other parameters are the same, only one case is kept in each range `[1, 3], [4, 7], [8, 15], [16, 31], [32, 63], [64, 127], [128, 255], [256, 511], [512, 1023], [1024, 2047], [2048, 4095]` (iii). When the only difference is the width and height of the input shape and all other parameters are the same, only one case is kept in each range `[1, 31], [32, 63], [64, 95]... ` > Reproduced: 1. follow the steps in https://github.com/alalek/opencv/commit/dnn_dump_conv_kernels to dump all convolution cases from new models. (declared flops may not be right and need to be checked manually) 2 and 3. Use the script from python code [classify conv.txt](https://github.com/opencv/opencv/files/13522228/classify.conv.txt) Performance test result on Apple M2 Test result details: [M2.md](https://github.com/opencv/opencv/files/13379189/M2.md) Additional test result details with FP16: [m2_results_with_fp16.zip](https://github.com/opencv/opencv/files/13491070/m2_results_with_fp16.zip) Brief summary for 4.8.1 vs 4.7.0 or 4.6.0: 1. `CONV_1x1_S1_D1` dropped significantly with small or large input shape. 2. `DEPTHWISE_5x5 ` dropped a little compared with 4.7.0. 
--- Performance test result on [Intel Core i7-12700K](https://www.intel.com/content/www/us/en/products/sku/134594/intel-core-i712700k-processor-25m-cache-up-to-5-00-ghz/specifications.html): 8 Performance-cores (3.60 GHz, turbo up to 4.90 GHz), 4 Efficient-cores (2.70 GHz, turbo up to 3.80 GHz), 20 threads. Test result details: [INTEL.md](https://github.com/opencv/opencv/files/13374093/INTEL.md) Brief summary for 4.8.1 vs 4.5.5: 1. `CONV_5x5_S1_D1` dropped significantly. 2. `CONV_1x1_S1_D1`, `CONV_3x3_S1_D1`, `DEPTHWISE_3x3_S1_D1`, `DEPTHWISE_3x3_S2_D1` dropped with small input shape. --- TODO: - [x] Perform tests on arm with each opencv version - [x] Perform tests on x86 with each opencv version - [x] Split each test classification with single test config - [x] test enable fp16	2023-12-11 21:35:33 +03:00
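The range rules (ii) and (iii) in the #24547 message amount to bucketing channel counts by powers of two and spatial sizes by steps of 32, then keeping one case per bucket. The sketch below is an assumption-laden illustration, not the linked classification script: the dictionary keys and the helper names `channel_bucket`, `spatial_bucket`, and `prune_by_channel` are invented here, and buckets beyond the listed ranges simply follow the same pattern.

```python
from typing import Dict, List


def channel_bucket(channels: int) -> int:
    """Bucket index for ranges [1, 3], [4, 7], [8, 15], ... (powers of two)."""
    if channels < 1:
        raise ValueError("channel count must be positive")
    return max(channels.bit_length() - 2, 0)


def spatial_bucket(size: int) -> int:
    """Bucket index for ranges [1, 31], [32, 63], [64, 95], ... (steps of 32)."""
    if size < 1:
        raise ValueError("spatial size must be positive")
    return size // 32


def prune_by_channel(cases: List[Dict]) -> List[Dict]:
    """Rule (ii): keep the first case per channel bucket when all else is equal."""
    seen = set()
    kept = []
    for case in cases:
        key = (channel_bucket(case["channels"]), case["height"], case["width"], case["other"])
        if key not in seen:
            seen.add(key)
            kept.append(case)
    return kept


if __name__ == "__main__":
    cases = [
        {"channels": 16, "height": 56, "width": 56, "other": "k3s1"},
        {"channels": 24, "height": 56, "width": 56, "other": "k3s1"},  # same bucket [16, 31]
        {"channels": 32, "height": 56, "width": 56, "other": "k3s1"},
    ]
    print(len(prune_by_channel(cases)))
```

Rule (iii) would use the same loop with `spatial_bucket` applied to height and width in the key instead of the raw sizes.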
Abduragim Shtanchaev	8c10545d3c	Merge pull request #24509 from Abdurrahheem:ash/dev_einsum_fast_gemm Fast gemm for einsum #24509 ## This PR adds performance tests for Einsum Layer with FastGemm. See below results of performance test on different inputs ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-11-16 16:20:17 +03:00
Alexander Smorkalov	34f34f6227	Merge branch 4.x	2023-11-08 14:39:48 +03:00
Abduragim Shtanchaev	9d0c8a9edb	Merge pull request #24445 from Abdurrahheem:ash/dev_einsum_pref Einsum Layer Performance Test #24445 ## This PR adds performance tests for Einsum Layer. See below results of performance test on different inputs Notation: - WX: windows10_x64 - MX: macos_x64 - MA: macos_arm64 - UX: ubuntu_x64 - UA: ubuntu_arm64 All data in ms (milliseconds). Gemm is backend for matrix multiplication --- Benchmarks: \| Equation \| Inputs Mat Dims \| UX (ms) \| UA (ms) \| MX (ms) \| MA (ms) \| WX (ms) \| \|-------------------------\|-----------------------------------\|----------------\|---------\|---------\|---------\|---------\| \| "ij, jk -> ik" \| [2, 3], [3,2] \| 0.04 ± 0.00 \| - \| - \| - \| - \| \| "ij, jk -> ik" \| [20, 30], [30,20] \| 0.08 ± 0.00 \| - \| - \| - \| - \| \| "ij, jk -> ik" \| [113, 127], [127,113] \| 2.41 ± 0.05 \| - \| - \| - \| - \| \| "imkj, injs -> imnks" \| [1, 4, 7, 9], [1, 5, 9, 8] \| 0.11 ± 0.00 \| - \| - \| - \| - \| \| "imkj, injs -> imnks" \| [1, 4, 70, 90], [1, 5, 90, 80] \| 15.49 ± 0.46 \| - \| - \| - \| - \| \| "imkj, injs -> imnks" \| [1, 4, 73, 91], [1, 5, 91, 57] \| 11.53 ± 0.06 \| - \| - \| - \| - \| \| "ij -> i" \| [30, 40] \| 0.03 ± 0.00 \| - \| - \| - \| - \| \| "ij -> i" \| [113, 374] \| 0.13 ± 0.00 \| - \| - \| - \| - \| \| "...ij -> ...i" \| [30, 40] \| 0.03 ± 0.00 \| - \| - \| - \| - \| \| "...ij -> ...i" \| [113, 374] \| 0.13 ± 0.00 \| - \| - \| - \| - \| \| "...ij, ...jk -> ...ik" \| [40, 50], [50,80] \| 0.37 ± 0.01 \| - \| - \| - \| - \| \| "...ij, ...jk -> ...ik" \| [47, 51], [51, 83] \| 0.43 ± 0.01 \| - \| - \| - \| - \| ----- ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. 
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-11-08 11:56:21 +03:00
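For readers unfamiliar with the equations benchmarked in #24445: `"ij, jk -> ik"` is an ordinary matrix product and `"ij -> i"` is a row sum. The pure-Python sketch below spells out those two cases only. The function names are illustrative, and nothing here reflects how the OpenCV Einsum layer is implemented.

```python
from typing import List

Matrix = List[List[float]]


def einsum_ij_jk_ik(a: Matrix, b: Matrix) -> Matrix:
    """"ij, jk -> ik": sum over the shared index j (matrix multiplication)."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [[sum(row[j] * b[j][k] for j in range(inner)) for k in range(cols)] for row in a]


def einsum_ij_i(a: Matrix) -> List[float]:
    """"ij -> i": sum out the index j (row sums)."""
    return [sum(row) for row in a]


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]    # shape [2, 3], as in the first benchmark row
    b = [[1, 0], [0, 1], [1, 1]]  # shape [3, 2]
    print(einsum_ij_jk_ik(a, b))
    print(einsum_ij_i(a))
```

Because the contraction case reduces to a matrix product, it is the part of Einsum that a GEMM backend can accelerate.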
Yuantao Feng	ee0822dc4d	Merge pull request #24378 from fengyuentau:instance_norm dnn onnx: add instance norm layer #24378 Resolves https://github.com/opencv/opencv/issues/24377 Relates https://github.com/opencv/opencv/pull/24092#discussion_r1349841644 \| Perf \| multi-thread \| single-thread \| \| - \| - \| - \| \| x: [2, 64, 180, 240] \| 3.95ms \| 11.12ms \| Todo: - [x] speed up by multi-threading - [x] add perf - [x] add backend: OpenVINO - [x] add backend: CUDA - [x] add backend: OpenCL (no fp16) - [ ] add backend: CANN (will be done via https://github.com/opencv/opencv/pull/24462) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake ``` force_builders=Linux OpenCL,Win64 OpenCL,Custom buildworker:Custom=linux-4 build_image:Custom=ubuntu:18.04 modules_filter:Custom=none disable_ipp:Custom=ON ```	2023-11-07 12:59:10 +03:00
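As background for the instance norm commit above: ONNX InstanceNormalization normalizes each (sample, channel) plane by its own mean and variance, then applies a per-channel scale and bias. The sketch below assumes the spatial dimensions are already flattened into one list per plane. It is a reference-style illustration with invented names, not the multi-threaded OpenCV layer.

```python
import math
from typing import List


def instance_norm(x: List[List[List[float]]], scale: List[float], bias: List[float],
                  eps: float = 1e-5) -> List[List[List[float]]]:
    """x is indexed [N][C][H*W]; each plane is normalized independently."""
    out = []
    for sample in x:
        out_sample = []
        for c, plane in enumerate(sample):
            mean = sum(plane) / len(plane)
            # Population variance over the spatial positions of this plane.
            var = sum((v - mean) ** 2 for v in plane) / len(plane)
            inv_std = 1.0 / math.sqrt(var + eps)
            out_sample.append([scale[c] * (v - mean) * inv_std + bias[c] for v in plane])
        out.append(out_sample)
    return out


if __name__ == "__main__":
    y = instance_norm([[[1.0, 2.0, 3.0, 4.0]]], scale=[1.0], bias=[0.0])
    print(y)
```

Each plane is independent of the others, which is what makes the per-plane loop straightforward to split across threads.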
Wanli	ed52f7feea	Improve and refactor softmax layer (#24466 ) * improve and refactor softmax layer * fix building error * compatible region layer * fix axisStep when disable SIMD * fix dynamic array * try to fix error * use nlanes from VTraits * move axisBias to srcOffset * fix bug caused by axisBias * remove macro * replace #ifdef with #if for CV_SIMD	2023-11-06 04:48:32 +03:00
alexlyulkov	b71be65f57	Merge pull request #24294 from alexlyulkov:al/remove-torch7-from-dnn Remove torch (old torch7) from dnn in 5.x #24294 Merge with https://github.com/opencv/opencv_extra/pull/1097 Completely removed torch (old torch7) from dnn: - removed modules/dnn/src/torch directory that contained torch7 model parser - removed readNetFromTorch() and readTorchBlob() public functions - removed torch7 references from comments and help texts - replaced links to t7 models by links to similar onnx models in js_style_transfer tutorial (similar to https://github.com/opencv/opencv/pull/24245/files)	2023-10-26 11:27:56 +03:00
Alexander Smorkalov	97620c053f	Merge branch 4.x	2023-10-23 11:53:04 +03:00
Aser Atawya	240b245105	Merge pull request #24092 from Aser-Abdelfatah:GSoC_Support_GatherElements_ONNX GSoC Add ONNX Support for GatherElements #24092 Merge with: https://github.com/opencv/opencv_extra/pull/1082 Adds support to the ONNX operator GatherElements [operator docs](https://github.com/onnx/onnx/blob/main/docs/Operators.md#GatherElements) Added tests to opencv_extra at pull request https://github.com/opencv/opencv_extra/pull/1082 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-10-18 10:41:47 +03:00
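For reference, the ONNX GatherElements semantics that #24092 adds support for can be written out for the 2-D case: with `axis=0`, `out[i][j] = data[index[i][j]][j]`, and with `axis=1`, `out[i][j] = data[i][index[i][j]]`. The sketch below is a plain-Python illustration of that rule (the function name is chosen here), not the OpenCV layer.

```python
from typing import List

Matrix = List[List[int]]


def gather_elements_2d(data: Matrix, indices: Matrix, axis: int) -> Matrix:
    """2-D GatherElements: the output has the same shape as `indices`."""
    if axis not in (0, 1):
        raise ValueError("this sketch only handles axis 0 or 1")
    out = []
    for i, index_row in enumerate(indices):
        if axis == 0:
            out.append([data[idx][j] for j, idx in enumerate(index_row)])
        else:
            out.append([data[i][idx] for idx in index_row])
    return out


if __name__ == "__main__":
    print(gather_elements_2d([[1, 2], [3, 4]], [[0, 0], [1, 0]], axis=1))
```

Negative indices count from the end of the gathered axis in ONNX; Python list indexing happens to behave the same way, so the sketch handles them without extra code.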
Yuantao Feng	d789cb459c	Merge pull request #24231 from fengyuentau:halide_cleanup_5.x dnn: cleanup of halide backend for 5.x #24231 Merge with https://github.com/opencv/opencv_extra/pull/1092. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-10-13 16:53:18 +03:00
Wanli	62b5470b78	Merge pull request #24298 from WanliZhong:extend_perf_net_test Extend performance test models #24298 Merged With https://github.com/opencv/opencv_extra/pull/1095 This PR aims to extend the performance tests. - YOLOv5 for object detection - YOLOv8 for object detection - EfficientNet for classification Models from OpenCV Zoo: - YOLOX for object detection - YuNet for face detection - SFace for face recognition - MPPalm for palm detection - MPHand for hand landmark - MPPose for pose estimation - ViTTrack for object tracking - PPOCRv3 for text detection - CRNN for text recognition - PPHumanSeg for human segmentation If other models should be added, please leave some comments. Thanks! Build opencv with script: ```shell -DBUILD_opencv_python2=OFF -DBUILD_opencv_python3=OFF -DBUILD_opencv_gapi=OFF -DINSTALL_PYTHON_EXAMPLES=OFF -DINSTALL_C_EXAMPLES=OFF -DBUILD_DOCS=OFF -DBUILD_EXAMPLES=OFF -DBUILD_ZLIB=OFF -DWITH_FFMPEG=OFF ``` Performance Test on Apple M2 CPU ```shell MacOS 14.0 8 threads ``` 1 thread: \| Name of Test \| 4.5.5-1th \| 4.6.0-1th \| 4.7.0-1th \| 4.8.0-1th \| 4.8.1-1th \| \|--------------\|:---------:\|:---------:\|:---------:\|:---------:\|:---------:\| \| CRNN \| 76.244 \| 76.611 \| 62.534 \| 57.678 \| 57.238 \| \| EfficientNet \| --- \| --- \| 109.224 \| 130.753 \| 109.076 \| \| MPHand \| --- \| --- \| 19.289 \| 22.727 \| 27.593 \| \| MPPalm \| 47.150 \| 47.061 \| 41.064 \| 65.598 \| 40.109 \| \| MPPose \| --- \| --- \| 26.592 \| 32.022 \| 26.956 \| \| PPHumanSeg \| 41.672 \| 41.790 \| 27.819 \| 27.212 \| 30.461 \| \| PPOCRv3 \| --- \| --- \| 140.371 \| 187.922 \| 170.026 \| \| SFace \| 43.830 \| 43.834 \| 27.575 \| 30.653 \| 26.387 \| \| ViTTrack \| --- \| --- \| --- \| 14.617 \| 15.028 \| \| YOLOX \| 1060.507 \| 1061.361 \| 495.816 \| 533.309 \| 549.713 \| \| YOLOv5 \| --- \| --- \| --- \| 191.350 \| 193.261 \| \| YOLOv8 \| --- \| --- \| 198.893 \| 218.733 \| 223.142 \| \| YuNet \| 27.084 \| 27.095 \| 26.238 \| 30.512 \| 34.439 \| \| 
MobileNet_SSD_Caffe \| 44.742 \| 44.565 \| 33.005 \| 29.421 \| 29.286 \| \| MobileNet_SSD_v1_TensorFlow \| 49.352 \| 49.274 \| 35.163 \| 32.134 \| 31.904 \| \| MobileNet_SSD_v2_TensorFlow \| 83.537 \| 83.379 \| 56.403 \| 42.947 \| 42.148 \| \| ResNet_50 \| 148.872 \| 148.817 \| 77.331 \| 67.682 \| 67.760 \| n threads: \| Name of Test \| 4.5.5-nth \| 4.6.0-nth \| 4.7.0-nth \| 4.8.0-nth \| 4.8.1-nth \| \|--------------\|:---------:\|:---------:\|:---------:\|:---------:\|:---------:\| \| CRNN \| 44.262 \| 44.408 \| 41.540 \| 40.731 \| 41.151 \| \| EfficientNet \| --- \| --- \| 28.683 \| 42.676 \| 38.204 \| \| MPHand \| --- \| --- \| 6.738 \| 13.126 \| 8.155 \| \| MPPalm \| 16.613 \| 16.588 \| 12.477 \| 31.370 \| 17.048 \| \| MPPose \| --- \| --- \| 12.985 \| 19.700 \| 16.537 \| \| PPHumanSeg \| 14.993 \| 15.133 \| 13.438 \| 15.269 \| 15.252 \| \| PPOCRv3 \| --- \| --- \| 63.752 \| 85.469 \| 76.190 \| \| SFace \| 10.685 \| 10.822 \| 8.127 \| 8.318 \| 7.934 \| \| ViTTrack \| --- \| --- \| --- \| 10.079 \| 9.579 \| \| YOLOX \| 417.358 \| 422.977 \| 230.036 \| 234.662 \| 228.555 \| \| YOLOv5 \| --- \| --- \| --- \| 74.249 \| 75.480 \| \| YOLOv8 \| --- \| --- \| 63.762 \| 88.770 \| 70.927 \| \| YuNet \| 8.589 \| 8.731 \| 11.269 \| 16.466 \| 14.513 \| \| MobileNet_SSD_Caffe \| 12.575 \| 12.636 \| 11.529 \| 12.114 \| 12.236 \| \| MobileNet_SSD_v1_TensorFlow \| 13.922 \| 14.160 \| 13.078 \| 12.124 \| 13.298 \| \| MobileNet_SSD_v2_TensorFlow \| 25.096 \| 24.836 \| 22.823 \| 20.238 \| 20.319 \| \| ResNet_50 \| 41.561 \| 41.296 \| 29.092 \| 30.412 \| 29.339 \| Performance Test on [Intel Core i7-12700K](https://www.intel.com/content/www/us/en/products/sku/134594/intel-core-i712700k-processor-25m-cache-up-to-5-00-ghz/specifications.html) ```shell Ubuntu 22.04.2 LTS 8 Performance-cores (3.60 GHz, turbo up to 4.90 GHz) 4 Efficient-cores (2.70 GHz, turbo up to 3.80 GHz) 20 threads ``` 1 thread: \| Name of Test \| 4.5.5-1th \| 4.6.0-1th \| 4.7.0-1th \| 4.8.0-1th \| 4.8.1-1th \| 
\|--------------\|:---------:\|:---------:\|:---------:\|:---------:\|:---------:\| \| CRNN \| 16.752 \| 16.851 \| 16.840 \| 16.625 \| 16.663 \| \| EfficientNet \| --- \| --- \| 61.107 \| 76.037 \| 53.890 \| \| MPHand \| --- \| --- \| 8.906 \| 9.969 \| 8.403 \| \| MPPalm \| 24.243 \| 24.638 \| 18.104 \| 35.140 \| 18.387 \| \| MPPose \| --- \| --- \| 12.322 \| 16.515 \| 12.355 \| \| PPHumanSeg \| 15.249 \| 15.303 \| 10.203 \| 10.298 \| 10.353 \| \| PPOCRv3 \| --- \| --- \| 87.788 \| 144.253 \| 90.648 \| \| SFace \| 15.583 \| 15.884 \| 13.957 \| 13.298 \| 13.284 \| \| ViTTrack \| --- \| --- \| --- \| 11.760 \| 11.710 \| \| YOLOX \| 324.927 \| 325.173 \| 235.986 \| 253.653 \| 254.472 \| \| YOLOv5 \| --- \| --- \| --- \| 102.163 \| 102.621 \| \| YOLOv8 \| --- \| --- \| 87.013 \| 103.182 \| 103.146 \| \| YuNet \| 12.806 \| 12.645 \| 10.515 \| 12.647 \| 12.711 \| \| MobileNet_SSD_Caffe \| 23.556 \| 23.768 \| 24.304 \| 22.569 \| 22.602 \| \| MobileNet_SSD_v1_TensorFlow \| 26.136 \| 26.276 \| 26.854 \| 24.828 \| 24.961 \| \| MobileNet_SSD_v2_TensorFlow \| 43.521 \| 43.614 \| 46.892 \| 44.044 \| 44.682 \| \| ResNet_50 \| 73.588 \| 73.501 \| 75.191 \| 66.893 \| 65.144 \| n thread: \| Name of Test \| 4.5.5-nth \| 4.6.0-nth \| 4.7.0-nth \| 4.8.0-nth \| 4.8.1-nth \| \|--------------\|:---------:\|:---------:\|:---------:\|:---------:\|:---------:\| \| CRNN \| 8.665 \| 8.827 \| 10.643 \| 7.703 \| 7.743 \| \| EfficientNet \| --- \| --- \| 16.591 \| 12.715 \| 9.022 \| \| MPHand \| --- \| --- \| 2.678 \| 2.785 \| 1.680 \| \| MPPalm \| 5.309 \| 5.319 \| 3.822 \| 10.568 \| 4.467 \| \| MPPose \| --- \| --- \| 3.644 \| 6.088 \| 4.608 \| \| PPHumanSeg \| 4.756 \| 4.865 \| 5.084 \| 5.179 \| 5.148 \| \| PPOCRv3 \| --- \| --- \| 32.023 \| 50.591 \| 32.414 \| \| SFace \| 3.838 \| 3.980 \| 4.629 \| 3.145 \| 3.155 \| \| ViTTrack \| --- \| --- \| --- \| 10.335 \| 10.357 \| \| YOLOX \| 68.314 \| 68.081 \| 82.801 \| 74.219 \| 73.970 \| \| YOLOv5 \| --- \| --- \| --- \| 47.150 \| 47.523 \| \| 
YOLOv8 \| --- \| --- \| 32.195 \| 30.359 \| 30.267 \| \| YuNet \| 2.604 \| 2.644 \| 2.622 \| 3.278 \| 3.349 \| \| MobileNet_SSD_Caffe \| 13.005 \| 5.935 \| 8.586 \| 4.629 \| 4.713 \| \| MobileNet_SSD_v1_TensorFlow \| 7.002 \| 7.129 \| 9.314 \| 5.271 \| 5.213 \| \| MobileNet_SSD_v2_TensorFlow \| 11.939 \| 12.111 \| 22.688 \| 12.038 \| 12.086 \| \| ResNet_50 \| 18.227 \| 18.600 \| 26.150 \| 15.584 \| 15.706 \|	2023-10-04 13:05:32 +03:00
Dmitry Kurtaev	c7ec0d599a	Merge pull request #23987 from dkurt:openvino_int8_backend OpenVINO backend for INT8 models #23987 ### Pull Request Readiness Checklist TODO: - [x] DetectionOutput layer (https://github.com/opencv/opencv/pull/24069) - [x] Less FP32 fallbacks (i.e. Sigmoid, eltwise sum) - [x] Accuracy, performance tests (https://github.com/opencv/opencv/pull/24039) - [x] Single layer tests (convolution) - [x] ~~Fixes for OpenVINO 2022.1 (https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/100334)~~ Performance results for object detection model `coco_efficientdet_lite0_v1_1.0_quant_2021_09_06.tflite`: \| backend \| performance (median time) \| \|---\|---\| \| OpenCV \| 77.42ms \| \| OpenVINO 2023.0 \| 10.90ms \| CPU: `11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz` Serialized model per-layer stats (note that Convolution should use `*_I8` primitives if they are quantized correctly): https://gist.github.com/dkurt/7772bbf1907035441bb5454f19f0feef --- See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-09-28 16:24:43 +03:00
Dmitry Kurtaev	2b6d0f36f0	Merge pull request #24309 from dkurt:gemm_ov_hotfix Update OpenVINO init of new GEMM layer #24309 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request CI validation: - [x] 2022.1.0: https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/100368 - [ ] 2021.4.2: https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/100373 Checklist: - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-09-27 10:25:45 +03:00
Yuantao Feng	8a96e34e33	dnn: add gemm_layer in place of fully_connected_layer for onnx models (#23897 ) * first commit * turned C from input to constant; force C constant in impl; better handling 0d/1d cases * integrate with gemm from ficus nn * fix const inputs * adjust threshold for int8 tryQuantize * adjust threshold for int8 quantized 2 * support batched gemm and matmul; tune threshold for rcnn_ilsvrc13; update googlenet * add gemm perf against innerproduct * add perf tests for innerproduct with bias * fix perf * add memset * renamings for next step * add dedicated perf gemm * add innerproduct in perf_gemm * remove gemm and innerproduct perf tests from perf_layer * add perf cases for vit sizes; prepack constants * remove batched gemm; fix wrong trans; optimize KC * remove prepacking for const A; several fixes for const B prepacking * add todos and gemm expression * add optimized branch for avx/avx2 * trigger build * update macros and signature * update signature * fix macro * fix bugs for neon aarch64 & x64 * add backends: cuda, cann, inf_ngraph and vkcom * fix cuda backend * test commit for cuda * test cuda backend * remove debug message from cuda backend * use cpu dispatcher * fix neon macro undef in dispatcher * fix dispatcher * fix inner kernel for neon aarch64 * fix compiling issue on armv7; try fixing accuracy issue on other platforms * broadcast C with beta multiplied; improve func namings * fix bug for avx and avx2 * put all platform-specific kernels in dispatcher * fix typos * attempt to fix compile issues on x64 * run old gemm when neon, avx, avx2 are all not available; add kernel for armv7 neon * fix typo * quick fix: add macros for pack4 * quick fix: use vmlaq_f32 for armv7 * quick fix for missing macro of fast gemm pack f32 4 * disable conformance tests when optimized branches are not supported * disable perf tests when optimized branches are not supported * decouple cv_try_neon and cv_neon_aarch64 * drop googlenet_2023; add fastGemmBatched * fix 
step in fastGemmBatched * cpu: fix initialization ofb; gpu: support batch * quick followup fix for cuda * add default kernels * quick followup fix to avoid macro redef * optimized kernels for lasx * resolve mis-alignment; remove comments * tune performance for x64 platform * tune performance for neon aarch64 * tune for armv7 * comment time consuming tests * quick follow-up fix	2023-09-20 00:53:34 +03:00
Dmitry Kurtaev	d88ad46978	Remove explicit transB attribute from MatMul perf test	2023-08-18 15:10:14 +03:00
Dmitry Kurtaev	8ad5eb521a	Merge pull request #24120 from dkurt:actualize_dnn_links OCL_FP16 MatMul with large batch * Workaround FP16 MatMul with large batch * Fix OCL reinitialization * Higher thresholds for INT8 quantization * Try fix gemm_buffer_NT for half (columns) * Fix GEMM by rows * Add batch dimension to InnerProduct layer test * Fix Test_ONNX_conformance.Layer_Test/test_basic_conv_with_padding * Batch 16 * Replace all vload4 * Version suffix for MobileNetSSD_deploy Caffe model	2023-08-16 15:46:11 +03:00
Dmitry Kurtaev	96f23e3da1	Merge pull request #24080 from dkurt:dnn_cuda_layers Resolve uncovered CUDA dnn layer #24080 ### Pull Request Readiness Checklist * Gelu activation layer on CUDA * Try to relax GEMM from ONNX resolves https://github.com/opencv/opencv/issues/24064 See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-08-03 09:13:42 +03:00
Zihao Mu	1920993525	Merge pull request #23952 from zihaomu:fix_depth_conv_5x5 DNN: optimize the speed of general Depth-wise #23952 Try to solve the issue: https://github.com/opencv/opencv/issues/23941 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-07-14 17:34:39 +03:00
wanli	e4360294c5	make 'abcd op 1b11' broadcast support cuda	2023-04-23 17:46:50 +08:00
wanli	c8f5e228fc	release MUL and ADD operator on CUDA	2023-02-10 19:33:59 +08:00
Yuantao Feng	4d918ba40b	Merge pull request #23047 from fengyuentau:layer_norm dnn: add layer normalization for vision transformers * add layer norm onnx parser, impl and tests * add onnx graph simplifier for layer norm expanded * handle the case when constants are of type Initializer * add test case for layer norm expanded with initializers * use CV_Assert & CV_CheckType in place of CV_Assert_N; use forward_fallback for OCL_FP16 * use const ref / ref in parameters of invoker::run; extract inner const if from nested loop; use size_t in place of ull * template hasBias * remove trailing whitespace * use pointer parameter with null check; move normSize division & mean_square division outside of loop; use std::max to ensure positive value before std::sqrt * refactor implementation, optimize parallel_for * disable layer norm expanded * remove the removal of layer norm optional outputs	2023-01-27 16:35:59 +03:00
Maksim Shabunin	d35fbe6bfc	dnn: updated YOLOv4-tiny model and tests	2022-12-22 15:49:21 +03:00
Zihao Mu	0a650b573b	Merge pull request #22840 from zihaomu:optimze_conv_memory_usage DNN: reduce the memory used in convolution layer * reduce the memory in winograd and disable the test when memory usage is larger than 2 GB. * remove VERY_LOG tag	2022-12-08 12:57:13 +00:00
zoom	11d492b0b9	Let part of the operators in nary_eltwise support cuda	2022-11-02 14:08:21 +08:00
fengyuentau	d24d8f2abe	implementation of scatter and scatternd with conformance tests enabled	2022-10-17 11:30:32 +08:00
rogday	ed69bcae2d	Merge pull request #21865 from rogday:nary_eltwise_layers Reimplementation of Element-wise layers with broadcasting support * init * semi-working initial version * add small_vector * wip * remove smallvec * add nary function * replace auto with Mat in lambda expr used in transform * uncomment asserts * autobuffer shape_buf & step_buf * fix a missing bracket * fixed a missing addLayer in parseElementWise * solve one-dimensional broadcast * remove pre_broadcast_transform for the case of two constants; fix missing constBlobsExtraInfo when addConstant is called * one autobuffer for step & shape * temporal fix for the missing original dimension information * fix parseUnsqueeze when it gets a 1d tensor constant * support sum/mean/min/max with only one input * reuse old code to handle cases of two non-constant inputs * add condition to handle div & mul of two non-constant inputs * use || instead of or * remove trailing spaces * enlarge buf in binary_forward to contain other buffer * use autobuffer in nary_forward * generate data randomly and add more cases for perf * add op and, or & xor * update perf_dnn * remove some comments * remove legacy; add two ONNX conformance tests in filter * move from cpu_denylist to all_denylist * adjust parsing for inputs>=2 Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>	2022-07-19 06:14:05 +03:00
Alexander Alekhin	8b4fa2605e	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2021-12-03 12:32:49 +00:00
Andrew Ryrie	ea7d4be3f8	Merge pull request #20658 from smbz:lstm_optimisation * dnn: LSTM optimisation This uses the AVX-optimised fastGEMM1T for matrix multiplications where available, instead of the standard cv::gemm. fastGEMM1T is already used by the fully-connected layer. This commit involves two minor modifications: - Use unaligned access. I don't believe this involves any performance hit on modern CPUs (Nehalem and Bulldozer onwards) in the case where the address is actually aligned. - Allow for weight matrices where the number of columns is not a multiple of 8. I have not enabled AVX-512 as I don't have an AVX-512 CPU to test on. * Fix warning about initialisation order * Remove C++11 syntax * Fix build when AVX(2) is not available In this case the CV_TRY_X macros are defined to 0, rather than being undefined. * Minor changes as requested: - Don't check hardware support for AVX(2) when dispatch is disabled for these - Add braces * Fix out-of-bounds access in fully connected layer The old tail handling in fastGEMM1T implicitly rounded vecsize up to the next multiple of 8, and the fully connected layer implements padding up to the next multiple of 8 to cope with this. The new tail handling does not round the vecsize upwards like this but it does require that the vecsize is at least 8. To adapt to the new tail handling, the fully connected layer now rounds vecsize itself at the same time as adding the padding (which makes more sense anyway). This also means that the fully connected layer always passes a vecsize of at least 8 to fastGEMM1T, which fixes the out-of-bounds access problems. * Improve tail mask handling - Use static array for generating tail masks (as requested) - Apply tail mask to the weights as well as the input vectors to prevent spurious propagation of NaNs/Infs * Revert whitespace change * Improve readability of conditions for using AVX * dnn(lstm): minor coding style changes, replaced left aligned load	2021-11-29 21:43:00 +00:00
Alexander Alekhin	24fcb7f813	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2021-09-25 17:50:00 +00:00
Alexander Alekhin	1aacb9bb15	dnn(perf): update convolution tests	2021-09-10 13:11:02 +00:00
Alexander Alekhin	624d532000	Merge remote-tracking branch 'upstream/3.4' into merge-3.4	2020-12-17 21:05:34 +00:00
Alexander Alekhin	28aab134db	dnn(test): update tests for OpenVINO 2021.2	2020-12-17 07:53:35 +00:00

137 Commits