opencv

mirror of https://github.com/opencv/opencv.git synced 2025-07-25 22:57:53 +08:00

Author	SHA1	Message	Date
Yuantao Feng	025e7602b9	Merge pull request #25166 from fengyuentau:fix_cann_gemm dnn (CANN): Fix incorrect shape of 1d bias in Gemm #25166 Gemm layer was refactored some time ago. Users found that the mobilenet example in https://github.com/opencv/opencv/wiki/Huawei-CANN-Backend does not work because of incorrect shape set for 1d bias in Gemm. This PR resolves this issue. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-03-25 09:47:28 +03:00
Dmitry Kurtaev	0b6c9a2123	Merge pull request #25181 from dkurt:release_conv_weights Release convolution weightsMat after usage #25181 ### Pull Request Readiness Checklist related (but not resolved): https://github.com/opencv/opencv/issues/24134 Minor memory footprint improvement. Also, adds a test for VmHWM. RAM top memory usage (-230MB) \| YOLOv3 (237MB file) \| 4.x \| PR \| \|---------------------\|---------\|---------\| \| no winograd \| 808 MB \| 581 MB \| \| winograd \| 1985 MB \| 1750 MB \| See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-03-25 09:03:28 +03:00
Oleg Pipikin	6da2ddcf0e	Fix for OpenVINO 2024.0. Remove support for OpenVINO releases older than 2022.1. Remove legacy InferenceEngine wrappers.	2024-03-18 15:05:50 +04:00
Dmitry Kurtaev	6a370ba9e7	Avoid extra memset in convolution initialization	2024-03-08 10:46:07 +03:00
Dmitry Kurtaev	98aed21dd4	Avoid copy of ONNX graph during import	2024-03-05 18:22:46 +03:00
Alexander Smorkalov	daa8f7dfc6	Partially back-port #25075 to 4.x	2024-03-05 12:15:39 +03:00
Laurent Berger	5fe3933346	Merge pull request #25120 from LaurentBerger:I25103 Fixed ReduceMean layer behaviour #25120 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake `a93c31e3c9/onnxruntime/core/providers/cpu/reduction/reduction_ops.cc (L433-L443)`	2024-03-04 09:36:53 +03:00
CSBVision	e8582f2cf8	Update net_impl.cpp See issue #25112	2024-03-01 14:56:00 +01:00
Yuantao Feng	5aa5c39210	Merge pull request #25076 from fengyuentau:improve_attention dnn: try improving performance of Attention layer #25076 Checklist: - [x] Use `Mat` over `Mat::zeros` for temporary buffer in forward - [x] Use layer internal buffer over temporary Mat buffer - [x] Try a single fastGemmBatch on the Q/K/V calculation Performance: Performance test case is `Layer_Attention.VisionTransformer/0`, which has input of shape {1, 197, 768}, weight of shape {768, 2304} and bias {2304}. Data is in millisecond. \| \| macOS 14.2.1, Apple M1 \| Ubuntu 22.04.2, Intel i7 12700K \| \| - \| - \| - \| \| Current \| 10.96 \| 1.58 \| \| w/ Mat \| 6.27 \| 1.41 \| \| w/ Internals \| 5.87 \| 1.38 \| \| w/ fastGemmBatch \| 6.12 \| 2.14 \| ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-02-28 16:47:08 +03:00
Laurent Berger	3c712cf77d	Merge pull request #25100 from LaurentBerger:I25077 Fix issue #25077 #25100 Fixes https://github.com/opencv/opencv/issues/25077 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-02-27 14:15:11 +03:00
Dhanwanth1803	12aa0fe898	Merge pull request #24985 from Dhanwanth1803:hardswish Fixes #24974 support HardSwishInt8 #24985 As given very clearly in the issue #24974 I made the required 2 changes to implement HardSwish Layer in INT8. Requesting comments. resolves https://github.com/opencv/opencv/issues/24974 - [X] I agree to contribute to the project under Apache 2 License. - [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [X] The PR is proposed to the proper branch - [X] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake Co-authored-by: Dhanwanth1803 <dhanwanthvarala@gmail,com>	2024-02-16 18:19:29 +03:00
fengyuentau	fcaa8ce3c2	fix incorrect steps and elemsize when dtype changes	2024-02-06 16:27:25 +08:00
Haosonn	87f749277d	Merge pull request #24768 from Haosonn:pre-pr-2 Vulkan backend for NaryEltwiseLayer in DNN module #24768 We improve the Vulkan backend for ``NaryEltwiseLayer`` in the DNN module by: - adding a basic framework for the Vulkan backend in ``NaryEltwiseLayer`` - adding a compute shader for binary forwarding (an imitation of what the native OpenCV backend does, including broadcasting and element-wise operations) - fixing a typo: wrong info output in ``context.cpp`` Currently, our implementation (like all layers supporting the Vulkan backend) runs rather slowly on discrete GPUs, mainly due to the I/O cost of ``copyToHost``. We are going to fix that by: - finding the best ``VkMemoryProperty`` for various discrete GPUs - avoiding ``copyToHost`` in middle layers during forwarding (i.e., keeping data in GPU memory) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake Co-authored-by: IskXCr <IskXCr@outlook.com>	2024-01-29 18:41:49 +03:00
Alexander Alekhin	efc9837df1	Merge pull request #24892 from opencv-pushbot:gitee/alalek/dnn_avoid_16s_usage DNN: avoid CV_16S usage for FP16 #24892 Merge after: #24918 TODO: - [x] measure performance changes - [x] optimize convertTo for OpenCL: #24918 12700K iGPU: \|Name of Test\|0\|1\|1 vs 0 (x-factor)\| \|---\|:-:\|:-:\|:-:\| \|AlexNet::DNNTestNetwork::OCV/OCL_FP16\|7.441\|7.480\|0.99\| \|CRNN::DNNTestNetwork::OCV/OCL_FP16\|10.776\|10.736\|1.00\| \|DenseNet_121::DNNTestNetwork::OCV/OCL_FP16\|52.762\|52.833\|1.00\| \|EAST_text_detection::DNNTestNetwork::OCV/OCL_FP16\|60.694\|60.721\|1.00\| \|EfficientNet::DNNTestNetwork::OCV/OCL_FP16\|33.373\|33.173\|1.01\| \|FastNeuralStyle_eccv16::DNNTestNetwork::OCV/OCL_FP16\|81.840\|81.724\|1.00\| \|GoogLeNet::DNNTestNetwork::OCV/OCL_FP16\|20.965\|20.927\|1.00\| \|Inception_5h::DNNTestNetwork::OCV/OCL_FP16\|22.204\|22.173\|1.00\| \|Inception_v2_SSD_TensorFlow::DNNTestNetwork::OCV/OCL_FP16\|47.115\|47.460\|0.99\| \|MPHand::DNNTestNetwork::OCV/OCL_FP16\|6.760\|6.670\|1.01\| \|MPPalm::DNNTestNetwork::OCV/OCL_FP16\|10.188\|10.171\|1.00\| \|MPPose::DNNTestNetwork::OCV/OCL_FP16\|12.510\|12.561\|1.00\| \|MobileNet_SSD_Caffe::DNNTestNetwork::OCV/OCL_FP16\|17.290\|17.072\|1.01\| \|MobileNet_SSD_v1_TensorFlow::DNNTestNetwork::OCV/OCL_FP16\|19.473\|19.306\|1.01\| \|MobileNet_SSD_v2_TensorFlow::DNNTestNetwork::OCV/OCL_FP16\|22.874\|23.404\|0.98\| \|OpenFace::DNNTestNetwork::OCV/OCL_FP16\|9.568\|9.517\|1.01\| \|OpenPose_pose_mpi_faster_4_stages::DNNTestNetwork::OCV/OCL_FP16\|539.899\|539.845\|1.00\| \|PPHumanSeg::DNNTestNetwork::OCV/OCL_FP16\|18.015\|18.769\|0.96\| \|PPOCRv3::DNNTestNetwork::OCV/OCL_FP16\|63.122\|63.540\|0.99\| \|ResNet_50::DNNTestNetwork::OCV/OCL_FP16\|34.947\|34.925\|1.00\| \|SFace::DNNTestNetwork::OCV/OCL_FP16\|10.249\|10.206\|1.00\| \|SSD::DNNTestNetwork::OCV/OCL_FP16\|213.068\|213.108\|1.00\| \|SqueezeNet_v1_1::DNNTestNetwork::OCV/OCL_FP16\|4.867\|4.878\|1.00\| 
\|VIT_B_32::DNNTestNetwork::OCV/OCL_FP16\|200.563\|190.788\|1.05\| \|VitTrack::DNNTestNetwork::OCV/OCL_FP16\|7.528\|7.173\|1.05\| \|YOLOX::DNNTestNetwork::OCV/OCL_FP16\|132.858\|132.701\|1.00\| \|YOLOv3::DNNTestNetwork::OCV/OCL_FP16\|209.559\|208.809\|1.00\| \|YOLOv4::DNNTestNetwork::OCV/OCL_FP16\|221.357\|220.924\|1.00\| \|YOLOv4_tiny::DNNTestNetwork::OCV/OCL_FP16\|24.446\|24.382\|1.00\| \|YOLOv5::DNNTestNetwork::OCV/OCL_FP16\|43.922\|44.080\|1.00\| \|YOLOv8::DNNTestNetwork::OCV/OCL_FP16\|64.159\|63.842\|1.00\| \|YuNet::DNNTestNetwork::OCV/OCL_FP16\|10.177\|10.231\|0.99\| \|opencv_face_detector::DNNTestNetwork::OCV/OCL_FP16\|15.121\|15.445\|0.98\| Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>	2024-01-26 16:34:17 +03:00
Yuantao Feng	37156a4719	Merge pull request #24925 from fengyuentau:loongarch_handle_warnings Handle warnings in loongson-related code #24925 See https://github.com/fengyuentau/opencv/actions/runs/7665377694/job/20891162958#step:14:16 Warnings need to be handled before we add the loongson server to our CI. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2024-01-26 13:38:00 +03:00
Sean McBride	e64857c561	Merge pull request #23736 from seanm:c++11-simplifications Removed all pre-C++11 code, workarounds, and branches #23736 This removes a bunch of pre-C++11 workarounds that are no longer necessary as C++11 is now required. It is a nice clean up and simplification. * No longer unconditionally #include <array> in cvdef.h, include explicitly where needed * Removed deprecated CV_NODISCARD, already unused in the codebase * Removed some pre-C++11 workarounds, and simplified some backwards compat defines * Removed CV_CXX_STD_ARRAY * Removed CV_CXX_MOVE_SEMANTICS and CV_CXX_MOVE * Removed all tests of CV_CXX11, now assume it's always true. This allowed removing a lot of dead code. * Updated some documentation accordingly. * Fixed links. --------- Co-authored-by: Maksim Shabunin <maksim.shabunin@gmail.com> Co-authored-by: Alexander Smorkalov <alexander.smorkalov@xperience.ai>	2024-01-19 16:53:08 +03:00
fengyuentau	d269de0a03	initial commit	2024-01-18 11:17:50 +08:00
Alexander Smorkalov	ac4c0bffac	Merge pull request #24813 from fengyuentau:speedup_scatter dnn: improve scatter and scatterND speed with multi-threading	2024-01-17 17:16:50 +03:00
Alexander Smorkalov	84bb1cda4e	Merge pull request #24865 from asmorkalov:as/dnn_concat_assert Normalize axis parameter in DNN Concat to handle negative values	2024-01-16 14:39:28 +03:00
Alexander Smorkalov	26cf82a56c	Normalize axis parameter in DNN Concat to handle negative values.	2024-01-16 12:22:22 +03:00
Alexander Smorkalov	99c86bb40c	Merge pull request #24556 from plctlab:rvp Optimization based on RISC-V P Packed SIMD Extension v0.5.2	2024-01-16 11:36:31 +03:00
Alexander Smorkalov	68dc02e302	Merge pull request #24858 from Dhanwanth1803:avx-fix Use AVX2 overload instead of AVX in AVX2 scope	2024-01-16 09:14:31 +03:00
Dhanwanth1803	a289eba357	Fixes #24677	2024-01-13 09:56:56 +05:30
jimmylaw21	a7fa1e6f4b	Merge pull request #24610 from jimmylaw21:dnn-onnx-add-group-norm-layer dnn onnx: add group norm layer #24610 dnn onnx: add group norm layer Todo: - [x] speed up by multi-threading - [x] add perf - [x] add backend: OpenVINO - [x] add backend: CUDA - [x] add backend: OpenCL (no fp16) - [ ] add backend: CANN ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>	2024-01-12 15:13:26 +03:00
Alexander Smorkalov	97c418ab86	Merge pull request #24840 from fengyuentau:ocl_innerproduct dnn (opencl): integrate bias handling in the inner product opencl kernel	2024-01-12 15:10:16 +03:00
Abduragim Shtanchaev	c923c59833	Merge pull request #24812 from Abdurrahheem:ash/einsum_bachedGemm Replace iterative batched Matrix Multiply. #24812 This PR replaces iterative batch matrix multiplication with `FastGemmBatch` in Einsum layer. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-01-12 14:23:43 +03:00
Yuantao Feng	e7ccff9805	Merge pull request #24834 from fengyuentau:cuda_naryeltwise_broadcast dnn (cuda): support broadcasting if a.rank() != b.rank() #24834 Inspired by https://github.com/opencv/opencv/pull/24786. This PR keeps the fusion of `NaryEltwise` and `Concat` while addressing the missing-data problem by supporting broadcasting if a.rank() != b.rank(). Resolves https://github.com/opencv/opencv/issues/23977 Resolves https://github.com/opencv/opencv/issues/24606 Resolves https://github.com/opencv/opencv/issues/24635 Resolves https://github.com/opencv/opencv/issues/24721 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-01-11 10:04:46 +03:00
fengyuentau	83acb656f1	integrate bias handling in ocl kernel	2024-01-11 11:15:17 +08:00
Yuantao Feng	7fb336322d	Merge pull request #24808 from fengyuentau:fix_layernorm dnn: no layer norm fusion if axes.back() is not the axis of last dimension #24808 Merge with https://github.com/opencv/opencv_extra/pull/1137 Resolves https://github.com/opencv/opencv/issues/24797 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2024-01-10 13:01:00 +03:00
Yuantao Feng	c955564cb3	Merge pull request #24765 from fengyuentau:mod_operator dnn onnx: add mod #24765 Resolves https://github.com/opencv/opencv/issues/23174 TODO: - [x] enable some conformance tests - [x] add backends - [x] CANN - [x] OpenVINO - [x] CUDA ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2024-01-09 19:00:17 +03:00
fengyuentau	2ed97b9ef3	multi-threaded scatterND and refactor perf	2024-01-05 18:15:59 +08:00
fengyuentau	2997b4c5fe	pretty format	2024-01-05 18:15:27 +08:00
fengyuentau	63cde0b90d	multi-threaded scatter and refactor perf	2024-01-05 17:24:09 +08:00
Yuantao Feng	f978c99523	Merge pull request #24753 from fengyuentau:einsum_importer dnn onnx: support constant inputs in einsum importer #24753 Merge with https://github.com/opencv/opencv_extra/pull/1132. Resolves https://github.com/opencv/opencv/issues/24697 Credits to @LaurentBerger. --- This is a workaround. I suggest getting input shapes and calculating the output shapes in `getMemoryShapes` to keep the best compatibility. Getting shapes during the importer stage is not always robust, and we should avoid it as much as possible. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-12-25 14:42:05 +03:00
Dmitry Kurtaev	938bc4d503	[CUDA] Hotfix Scale with 1 parameter	2023-12-22 15:49:27 +03:00
Dhanwanth1803	027aee8ad4	Merge pull request #24384 from Dhanwanth1803:feat-crop Fixes #22747. Support [crop] configuration for DarkNet #24384 Request for comments. This is my first PR. Merge with extra: https://github.com/opencv/opencv_extra/pull/1112 resolves https://github.com/opencv/opencv/issues/22747 - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-12-22 14:55:01 +03:00
Vadim Pisarevsky	853e5dfcdf	Merge pull request #24709 from vpisarev:winograd_mode Try to enable Winograd by default in FP32 mode and disable it by default in FP16 mode #24709 Hopefully, it will resolve regressions since 4.8.1 (see also https://github.com/opencv/opencv/pull/24587) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [ ] I agree to contribute to the project under Apache 2 License. - [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-12-22 09:22:31 +03:00
Alexander Smorkalov	35e2ef8019	Merge pull request #24740 from opencv-pushbot:gitee/alalek/ocl_fix_kernel_compilation ocl: fix kernels compilation	2023-12-22 09:20:47 +03:00
Alexander Alekhin	3340c71a2a	ocl: fix kernels compilation	2023-12-21 14:29:23 +00:00
Alexander Alekhin	99c94d3d83	dnn(ocl): don't try KERNEL_TYPE_GEMM_LIKE with kernel_w > 16 - OpenCL kernel code doesn't support that	2023-12-21 13:30:57 +00:00
llh721113	a30c987f87	feat: RVP052 Optimization for DNN int8layers	2023-12-21 14:51:41 +08:00
Yuantao Feng	0521a3a384	Merge pull request #24476 from fengyuentau:attention_layer dnn: add attention layer #24476 Resolves #24609 Merge with: https://github.com/opencv/opencv_extra/pull/1128. Attention operator spec from onnxruntime: https://github.com/microsoft/onnxruntime/blob/v1.16.1/docs/ContribOperators.md#com.microsoft.Attention. TODO: - [x] benchmark (before this PR vs. with this PR vs. ORT). - [x] Layer fusion: Take care Slice with end=INT64_MAX. - [x] Layer fusion: match more potential attention (VIT) patterns. - [x] Single-head attention is supported. - [x] Test AttentionSubgraph fusion. - [x] Add acc tests for VIT_B_32 and VitTrack - [x] Add perf tests for VIT_B_32 and VitTrack ## Benchmarks Platform: Macbook Air M1. ### Attention Subgraph Input scale: [1, 197, 768]. \| \| mean (ms) \| median (ms) \| min (ms) \| \| ---------------------- \| --------- \| ----------- \| -------- \| \| w/ Attention (this PR) \| 3.75 \| 3.68 \| 3.22 \| \| w/o Attention \| 9.06 \| 9.01 \| 8.24 \| \| ORT (python) \| 4.32 \| 2.63 \| 2.50 \| ### ViTs All data in millisecond (ms). \| ViTs \| With Attention \| Without Attention \| ORT \| \| -------- \| -------------- \| ----------------- \| ------ \| \| vit_b_16 \| 302.77 \| 365.35 \| 109.70 \| \| vit_b_32 \| 89.92 \| 116.22 \| 30.36 \| \| vit_l_16 \| 1593.32 \| 1730.74 \| 419.92 \| \| vit_l_32 \| 468.11 \| 577.41 \| 134.12 \| \| VitTrack \| 3.80 \| 3.87 \| 2.25 \| ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. 
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-12-20 19:35:07 +03:00
Laurent Berger	3e6dcdc0a4	Merge pull request #24539 from LaurentBerger:blobrecttoimage Add blobrecttoimage #24539 ### Pull Request Readiness Checklist resolves https://github.com/opencv/opencv/issues/14659 See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work #14659 - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-12-19 20:00:04 +03:00
Yuantao Feng	fa5ed62a66	Merge pull request #24694 from fengyuentau:matmul_refactor dnn: refactor ONNX MatMul with fastGemm #24694 Done: - [x] add backends - [x] CUDA - [x] OpenVINO - [x] CANN - [x] OpenCL - [x] Vulkan - [x] add perf tests - [x] const B case ### Benchmark Tests are done on M1. All data is in milliseconds (ms). \| Configuration \| MatMul (Prepacked) \| MatMul \| InnerProduct \| \| - \| - \| - \| - \| \| A=[12, 197, 197], B=[12, 197, 64], trans_a=0, trans_b=0 \| 0.39 \| 0.41 \| 1.33 \| \| A=[12, 197, 64], B=[12, 64, 197], trans_a=0, trans_b=0 \| 0.42 \| 0.42 \| 1.17 \| \| A=[12, 50, 64], B=[12, 64, 50], trans_a=0, trans_b=0 \| 0.13 \| 0.15 \| 0.33 \| \| A=[12, 50, 50], B=[12, 50, 64], trans_a=0, trans_b=0 \| 0.11 \| 0.13 \| 0.22 \| \| A=[16, 197, 197], B=[16, 197, 64], trans_a=0, trans_b=0 \| 0.46 \| 0.54 \| 1.46 \| \| A=[16, 197, 64], B=[16, 64, 197], trans_a=0, trans_b=0 \| 0.46 \| 0.95 \| 1.74 \| \| A=[16, 50, 64], B=[16, 64, 50], trans_a=0, trans_b=0 \| 0.18 \| 0.32 \| 0.43 \| \| A=[16, 50, 50], B=[16, 50, 64], trans_a=0, trans_b=0 \| 0.15 \| 0.25 \| 0.25 \| ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-12-19 19:36:41 +03:00
Wanli	6ae1709c6a	Merge pull request #24613 from WanliZhong:softmax_default_axis Make default axis of softmax in onnx "-1" without opset option #24613 Try to solve problem: https://github.com/opencv/opencv/pull/24476#discussion_r1404821158 Default softmax axis by framework: ONNX uses 1 for `opset <= 11`, otherwise -1; TensorFlow uses -1 for `TF version = 2.x`, otherwise 1; Darknet, Caffe, and Torch use 1 by definition.	2023-12-15 10:41:42 +03:00
Wanli	9bbc890d96	Merge pull request #24681 from WanliZhong:err_armv8 Fixed armv8 compilation warnings #24681 Fixes the following warning on armv8: ``` warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] ``` Buildbot: https://pullrequest.opencv.org/buildbot/builders/4_x_ARMv8-lin	2023-12-12 15:38:07 +03:00
Dmitry Kurtaev	ac4b26a561	Replace removal of Slice optional inputs with adjustment	2023-12-08 23:29:52 +03:00
Yuantao Feng	a2edf4d929	Merge pull request #24647 from fengyuentau:cuda_sub dnn cuda: support Sub #24647 Related https://github.com/opencv/opencv/issues/24606#issuecomment-1837390257 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-12-06 13:46:24 +03:00
Yuantao Feng	f5ec92e4ca	Merge pull request #24655 from fengyuentau:graph_simplifier_optional_input dnn onnx graph simplifier: handle optional inputs of Slice #24655 Resolves https://github.com/opencv/opencv/issues/24609 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-12-06 13:43:54 +03:00
Alexander Smorkalov	7b1a5fb3de	Migrate Android Face Detection sample to DNN.	2023-11-29 11:02:44 +03:00
Abduragim Shtanchaev	5278560252	Merge pull request #24569 from Abdurrahheem:ash/padding_value_fix Add support for custom padding in DNN preprocessing #24569 This PR adds functionality for specifying the padding value. It is required in many DNN preprocessing pipelines, such as the one for the YOLOX object detection model. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-11-28 11:54:09 +03:00
Dmitry Kurtaev	332748dd55	Merge pull request #24577 from dkurt:dnn_graph_match_stack Fix graph fusion with commutative ops #24577 ### Pull Request Readiness Checklist resolves https://github.com/opencv/opencv/issues/24568 Merge with extra: https://github.com/opencv/opencv_extra/pull/1125 TODO: - [x] replace recursive function to sequential See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-11-24 10:40:32 +03:00
Yuantao Feng	d05fb709f9	Merge pull request #24552 from fengyuentau:layernorm_backends dnn: add openvino, opencl and cuda backends for layer normalization layer #24552 Merge after https://github.com/opencv/opencv/pull/24544. Todo: - [x] openvino - [x] opencl - [x] cuda ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-11-21 15:33:01 +03:00
zihaomu	b913e73d04	DNN: add the Winograd fp16 support (#23654 ) * add Winograd FP16 implementation * fixed dispatching of FP16 code paths in dnn; use dynamic dispatcher only when NEON_FP16 is enabled in the build and the feature is present in the host CPU at runtime * fixed some warnings * hopefully fixed winograd on x64 (and maybe other platforms) --------- Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@gmail.com>	2023-11-20 13:45:37 +03:00
Yuantao Feng	a478757483	Merge pull request #24544 from fengyuentau:layernorm_conformance dnn test: move layer norm tests into conformance tests #24544 Merge with https://github.com/opencv/opencv_extra/pull/1122 ## Motivation Some ONNX operators, such as `LayerNormalization`, `BatchNormalization` and so on, produce outputs for training (mean, stdev). So they have reference outputs of conformance tests for those training outputs as well. However, when it comes to inference, we neither need nor produce those training outputs in dnn. Hence, the output size does not match if we use dnn to run inference on those conformance models. This becomes a barrier to testing these operators with their conformance tests. <!-- \| Operator \| Inference needed \| Outputs (required - total) \| Optional outputs for training? \| \| ----------------------- \| ----------------------------------- \| -------------------------- \| ------------------------------ \| \| BatchNormalization \| Yes \| 1 - 3 \| Yes \| \| Dropout \| Maybe, can be eliminated via fusion \| 1 - 2 \| Yes \| \| GRU \| Yes \| 0 - 2 \| No \| \| LSTM \| Yes \| 0 - 3 \| No \| \| LayerNormalization \| Yes \| 1 - 3 \| Yes \| \| MaxPool \| Yes \| 1 - 2 \| Yes \| \| RNN \| Yes \| 0 - 2 \| No \| \| SoftmaxCrossEntropyLoss \| No \| 1 - 2 \| -- \| --> I checked all ONNX operators with optional outputs. It turns out that only `BatchNormalization`, `Dropout`, `LayerNormalization` and `MaxPool` have optional outputs for training. All except `LayerNormalization` have models set for training mode and eval mode. Blame ONNX for that. ## Solution In this pull request, we remove graph outputs if the graph looks like the following: ``` [X] [Scale] [Bias] [X] [Scale] [Bias] \ \| / this patch \ \| / LayerNormalization -----------> LayerNormalization / \| \ \| [Y] [Mean] [Stdev] [Y] ``` We can update conformance tests and turn on some cases as well if extending to more layers. Notes: 1. This workaround does not solve expanded function operators if they are fused into a single operator, such as `$onnx/onnx/backend/test/data/node/test_layer_normalization_2d_axis1_expanded`, but they can be run without fusion. Note that neither dnn nor onnxruntime fuses those expanded function operators. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-11-20 11:19:24 +03:00
Abduragim Shtanchaev	8c10545d3c	Merge pull request #24509 from Abdurrahheem:ash/dev_einsum_fast_gemm Fast gemm for einsum #24509 ## This PR adds performance tests for Einsum Layer with FastGemm. See below results of performance test on different inputs ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-11-16 16:20:17 +03:00
Yuantao Feng	024dfd54af	dnn cann backend: add hardswish, layernorm and instance norm for cann and bug fix (#24462 ) * add hardswish for cann * gemm cann bug fix * fix indentation * cann: add layer norm * cann: add instance norm * add supportBackend * cann: layer norm does not support axis=-1 due to 1d mat issue * disable instance norm for now * fix doc * remove tensor desc initialization for 1D tensor	2023-11-15 17:57:52 +03:00
Alexander Smorkalov	960a926055	Merge pull request #24510 from asmorkalov:as/softmax_rvv Enable softmax layer vectorization on RISC-V RVV #24510 Related: https://github.com/opencv/opencv/pull/24466 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-11-11 09:09:14 +03:00
Dmitry Kurtaev	b7ec2ebb55	Merge pull request #24483 from dkurt:dnn_fusion_commutative_ops Commutative rules for DNN subgraphs fusion #24483 ### Pull Request Readiness Checklist related: https://github.com/opencv/opencv/pull/24463#issuecomment-1783033931 See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-11-08 16:26:33 +03:00
Yuantao Feng	ee0822dc4d	Merge pull request #24378 from fengyuentau:instance_norm dnn onnx: add instance norm layer #24378 Resolves https://github.com/opencv/opencv/issues/24377 Relates https://github.com/opencv/opencv/pull/24092#discussion_r1349841644 \| Perf \| multi-thread \| single-thread \| \| - \| - \| - \| \| x: [2, 64, 180, 240] \| 3.95ms \| 11.12ms \| Todo: - [x] speed up by multi-threading - [x] add perf - [x] add backend: OpenVINO - [x] add backend: CUDA - [x] add backend: OpenCL (no fp16) - [ ] add backend: CANN (will be done via https://github.com/opencv/opencv/pull/24462) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake ``` force_builders=Linux OpenCL,Win64 OpenCL,Custom buildworker:Custom=linux-4 build_image:Custom=ubuntu:18.04 modules_filter:Custom=none disable_ipp:Custom=ON ```	2023-11-07 12:59:10 +03:00
Wanli	ed52f7feea	Improve and refactor softmax layer (#24466 ) * improve and refactor softmax layer * fix building error * compatible region layer * fix axisStep when disable SIMD * fix dynamic array * try to fix error * use nlanes from VTraits * move axisBias to srcOffset * fix bug caused by axisBias * remove macro * replace #ifdef with #if for CV_SIMD	2023-11-06 04:48:32 +03:00
Dmitry Kurtaev	fa56623458	Merge pull request #24463 from dkurt:dnn_shared_nodes_fusion DNN graph fusion with shared nodes #24463 ### Pull Request Readiness Checklist For now, nodes from matched pattern are removed during the matching process so if nodes are used in similar subgraph, they cannot be found. required for https://github.com/opencv/opencv/pull/24397 Merge with extra: https://github.com/opencv/opencv_extra/pull/1115 A part from [model_name ](https://github.com/onnx/models/blob/main/vision/object_detection_segmentation/fcn/model/fcn-resnet101-11.onnx) with two Resize subgraphs with shared nodes: ![image](https://github.com/opencv/opencv/assets/25801568/611d89d9-12fb-4add-9218-13b10d2c086a) See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-11-03 12:34:09 +03:00
Yuantao Feng	c91af16fa7	Merge pull request #24409 from fengyuentau:norm_kernel dnn: add shared fastNorm kernel for mvn, instance norm and layer norm #24409 Relates https://github.com/opencv/opencv/pull/24378#issuecomment-1756906570 TODO: - [x] add fastNorm - [x] refactor layer norm with fastNorm - [x] refactor mvn with fastNorm - [ ] add onnx mvn in importer (in a new PR?) - [ ] refactor instance norm with fastNorm (in another PR https://github.com/opencv/opencv/pull/24378, need to merge this one first though) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-11-01 14:33:57 +03:00
Kumataro	1911c63826	fix: suppress GCC13 warnings (#24434 ) * fix: suppress GCC13 warnings * fix for review and compile-warning on MacOS	2023-10-26 09:00:58 +03:00
Abduragim Shtanchaev	a3b3a589f9	Merge pull request #24322 from Abdurrahheem:ash/dev_einsum_ellips Ellipsis support added for Einsum Layer #24322 This PR addresses issues not covered in #24037. The test case for this patch is in PR [#1106](https://github.com/opencv/opencv_extra/pull/1106) in opencv_extra. Added: - [x] Broadcasting reduction "...ii ->...i" - [x] Add lazy shape deduction. "...ij, ...jk->...ik" Features to add: - [ ] Add implicit output computation support. "bij,bjk ->" (output subscripts should be "bik") - [ ] Add support for CUDA backend - [ ] BatchWiseMultiply optimize - [ ] Performance test ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-10-24 16:47:00 +03:00
Amir Hassan	c2f909fc86	Merge pull request #23894 from kallaballa:blobFromImagesWithParams Pertaining Issue: https://github.com/opencv/opencv/issues/5697 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-10-20 14:27:40 +03:00
Alexander Smorkalov	1c0ca41b6e	Merge pull request #24371 from hanliutong:clean-up Clean up the obsolete API of Universal Intrinsic	2023-10-20 12:50:26 +03:00
andrewerf	b44cb33d2f	Merge pull request #21066 from andrewerf:21052-openvino-native-onnx Native ONNX to Inference Engine backend #21066 Resolves #21052 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV - [x] The PR is proposed to proper branch - [x] There is reference to original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable - [ ] The feature is well documented and sample code can be built with the project CMake	2023-10-20 11:49:27 +03:00
fengyuentau	f2ef81a179	fp16 support for gather elements	2023-10-19 14:44:12 +08:00
Aser Atawya	240b245105	Merge pull request #24092 from Aser-Abdelfatah:GSoC_Support_GatherElements_ONNX GSoC Add ONNX Support for GatherElements #24092 Merge with: https://github.com/opencv/opencv_extra/pull/1082 Adds support to the ONNX operator GatherElements [operator docs](https://github.com/onnx/onnx/blob/main/docs/Operators.md#GatherElements) Added tests to opencv_extra at pull request https://github.com/opencv/opencv_extra/pull/1082 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-10-18 10:41:47 +03:00
alexlyulkov	014e8485b5	Merge pull request #24367 from alexlyulkov:al/fixed-cumsum-inplace-flag Fixed CumSum layer inplace flag #24367 When exclusive is false: dst[i] = dst[i-1] + src[i]. When exclusive is true: dst[i] = dst[i-1] + src[i-1]. So the CumSum layer can be inplace only when the exclusive flag is false.	2023-10-18 09:21:40 +03:00
Liutong HAN	a287605c3e	Clean up the Universal Intrinsic API.	2023-10-13 19:23:30 +08:00
Yuantao Feng	0507043a55	Merge pull request #24386 from fengyuentau:fix_dtype_nary_eltwise dnn: fix inconsistent input dtype for nary eltwise layers #24386 Resolves https://github.com/opencv/opencv/issues/24385 Merge with https://github.com/opencv/opencv_extra/pull/1107 Relates https://github.com/opencv/opencv/pull/24092#discussion_r1353964405 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-10-13 11:56:18 +03:00
Yuantao Feng	590f150d5e	dnn: hotfixes for fast gemm (#24315 ) * remove Conformance from test names * integrate neon optimization into default * quick fix: define CV_NEON_AARCH64 0 for non NEON platforms * remove var batch that leads to memory leak * put neon code back to fast_gemm_kernels.simd * reorganize code to reduce duplicate code	2023-10-07 21:48:44 +03:00
Sean McBride	5fb3869775	Merge pull request #23109 from seanm:misc-warnings * Fixed clang -Wnewline-eof warnings * Fixed all trivial clang -Wextra-semi and -Wc++98-compat-extra-semi warnings * Removed trailing semi from various macros * Fixed various -Wunused-macros warnings * Fixed some trivial -Wdocumentation warnings * Fixed some -Wdocumentation-deprecated-sync warnings * Fixed incorrect indentation * Suppressed some clang warnings in 3rd party code * Fixed QRCodeEncoder::Params documentation. --------- Co-authored-by: Alexander Smorkalov <alexander.smorkalov@xperience.ai>	2023-10-06 13:33:21 +03:00
HAN Liutong	07bf9cb013	Merge pull request #24325 from hanliutong:rewrite Rewrite Universal Intrinsic code: float related part #24325 The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro: rewrite them by using the new Universal Intrinsic API. The series of PRs is listed below: #23885 First patch, an example #23980 Core module #24058 ImgProc module, part 1 #24132 ImgProc module, part 2 #24166 ImgProc module, part 3 #24301 Features2d and calib3d module #24324 Gapi module This patch (hopefully) is the last one in the series. This patch mainly involves 3 parts 1. Add some modifications related to float (CV_SIMD_64F) 2. Use `#if (CV_SIMD \|\| CV_SIMD_SCALABLE)` instead of `#if CV_SIMD \|\| CV_SIMD_SCALABLE`, then we can get the `CV_SIMD` module that is not enabled for `CV_SIMD_SCALABLE` by looking for `if CV_SIMD` 3. Summary of `CV_SIMD` blocks that remain unmodified: Updated comments - Some blocks will cause test failures when enabled for RVV, marked as `TODO: enable for CV_SIMD_SCALABLE, ....` - Some blocks cannot be rewritten directly. (Not commented in the source code, just listed here) - ./modules/core/src/mathfuncs_core.simd.hpp (Vector type wrapped in class/struct) - ./modules/imgproc/src/color_lab.cpp (Array of vector type) - ./modules/imgproc/src/color_rgb.simd.hpp (Array of vector type) - ./modules/imgproc/src/sumpixels.simd.hpp (fixed length algorithm, strongly related to `CV_SIMD_WIDTH`) These algorithms will need to be redesigned to accommodate scalable backends. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [ ] I agree to contribute to the project under Apache 2 License. - [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake	2023-10-05 17:57:25 +03:00
alexlyulkov	9bd14d5417	Merge pull request #24353 from alexlyulkov:al/fixed-cumsum-layer Fixed CumSum dnn layer #24353 Fixes #20110 The algorithm had several errors, so I rewrote it. Also the layer didn't work with non constant axis tensor. Fixed it. Enabled CumSum layer tests from ONNX conformance.	2023-10-03 13:58:25 +03:00
Alexander Smorkalov	5caee5cc64	Fixed OpenCL FP16 fallback in Einsum layer.	2023-09-29 15:52:23 +03:00
Dmitry Kurtaev	c7ec0d599a	Merge pull request #23987 from dkurt:openvino_int8_backend OpenVINO backend for INT8 models #23987 ### Pull Request Readiness Checklist TODO: - [x] DetectionOutput layer (https://github.com/opencv/opencv/pull/24069) - [x] Less FP32 fallbacks (i.e. Sigmoid, eltwise sum) - [x] Accuracy, performance tests (https://github.com/opencv/opencv/pull/24039) - [x] Single layer tests (convolution) - [x] ~~Fixes for OpenVINO 2022.1 (https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/100334)~~ Performance results for object detection model `coco_efficientdet_lite0_v1_1.0_quant_2021_09_06.tflite`: \| backend \| performance (median time) \| \|---\|---\| \| OpenCV \| 77.42ms \| \| OpenVINO 2023.0 \| 10.90ms \| CPU: `11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz` Serialized model per-layer stats (note that Convolution should use `*_I8` primitives if they are quantized correctly): https://gist.github.com/dkurt/7772bbf1907035441bb5454f19f0feef --- See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-09-28 16:24:43 +03:00
Alexander Smorkalov	b8d4ac589d	Merge pull request #24334 from fengyuentau:fix_24319 dnn onnx: fix not-found constant indices for Gather if shared	2023-09-28 13:08:26 +03:00
fengyuentau	7fa0493ca0	init commit	2023-09-28 11:50:21 +08:00
Dmitry Kurtaev	2b6d0f36f0	Merge pull request #24309 from dkurt:gemm_ov_hotfix Update OpenVINO init of new GEMM layer #24309 ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request CI validation: - [x] 2022.1.0: https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/100368 - [ ] 2021.4.2: https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/100373 Checklist: - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-09-27 10:25:45 +03:00
Yuantao Feng	bb171a0c05	dnn: expand refactor with cv::broadcast for onnx models (#24295 ) * add expand impl with cv::broadcast * remove expandMid * deduce shape from -1 * add constant folding * handle input constant; handle input constant 1d * add expand conformance tests; add checks to disallow shape of neg values; add early copy for unchanged total elements * fix ExpandSubgraph * dummy commit to trigger build * dummy commit to trigger build 1 * remove conformance from test names	2023-09-27 09:28:52 +03:00
Alexander Smorkalov	9942757bab	Merge pull request #24316 from alexlyulkov:al/fix-caffe-read-segfault Fixed segfault when reading Caffe model	2023-09-25 17:53:54 +03:00
Alexander Lyulkov	72e7672a6c	Fixed segfault when reading Caffe model	2023-09-25 12:55:11 +07:00
Abduragim Shtanchaev	865e7cacca	Merge pull request #24037 from Abdurrahheem:ash/dev_einsum Add Support for Einsum Layer #24037 ### This PR adds support for [Einsum Layer](https://pytorch.org/docs/stable/generated/torch.einsum.html) (in progress). This PR is currently not to be merged but only reviewed. Test cases are located in PR [#1090](https://github.com/opencv/opencv_extra/pull/1090) in OpenCV extra DONE: - [x] 2-5D GMM support added - [x] Matrix transpose support added - [x] Reduction type compute 'ij->j' - [x] 2nd shape computation - during forward Next PRs: - [ ] Broadcasting reduction "...ii ->...i" - [ ] Add lazy shape deduction. "...ij, ...jk->...ik" - [ ] Add implicit output computation support. "bij,bjk ->" (output subscripts should be "bik") - [ ] Add support for CUDA backend - [ ] BatchWiseMultiply optimize Later in 5.x version (requires support for 1D matrices): - [ ] Add 1D vector multiplication support - [ ] Inner product "i, i" (problems with 1D shapes) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-09-22 11:25:02 +03:00
Alexander Smorkalov	799bb0cd18	Merge pull request #24291 from visitorckw:fix-memory-leak Fix memory leak and handle realloc failure	2023-09-20 08:49:56 +03:00
Yuantao Feng	8a96e34e33	dnn: add gemm_layer in place of fully_connected_layer for onnx models (#23897 ) * first commit * turned C from input to constant; force C constant in impl; better handling 0d/1d cases * integrate with gemm from ficus nn * fix const inputs * adjust threshold for int8 tryQuantize * adjust threshold for int8 quantized 2 * support batched gemm and matmul; tune threshold for rcnn_ilsvrc13; update googlenet * add gemm perf against innerproduct * add perf tests for innerproduct with bias * fix perf * add memset * renamings for next step * add dedicated perf gemm * add innerproduct in perf_gemm * remove gemm and innerproduct perf tests from perf_layer * add perf cases for vit sizes; prepack constants * remove batched gemm; fix wrong trans; optimize KC * remove prepacking for const A; several fixes for const B prepacking * add todos and gemm expression * add optimized branch for avx/avx2 * trigger build * update macros and signature * update signature * fix macro * fix bugs for neon aarch64 & x64 * add backends: cuda, cann, inf_ngraph and vkcom * fix cuda backend * test commit for cuda * test cuda backend * remove debug message from cuda backend * use cpu dispatcher * fix neon macro undef in dispatcher * fix dispatcher * fix inner kernel for neon aarch64 * fix compiling issue on armv7; try fixing accuracy issue on other platforms * broadcast C with beta multiplied; improve func namings * fix bug for avx and avx2 * put all platform-specific kernels in dispatcher * fix typos * attempt to fix compile issues on x64 * run old gemm when neon, avx, avx2 are all not available; add kernel for armv7 neon * fix typo * quick fix: add macros for pack4 * quick fix: use vmlaq_f32 for armv7 * quick fix for missing macro of fast gemm pack f32 4 * disable conformance tests when optimized branches are not supported * disable perf tests when optimized branches are not supported * decouple cv_try_neon and cv_neon_aarch64 * drop googlenet_2023; add fastGemmBatched * fix step in fastGemmBatched * cpu: fix initialization ofb; gpu: support batch * quick followup fix for cuda * add default kernels * quick followup fix to avoid macro redef * optimized kernels for lasx * resolve mis-alignment; remove comments * tune performance for x64 platform * tune performance for neon aarch64 * tune for armv7 * comment time consuming tests * quick follow-up fix	2023-09-20 00:53:34 +03:00
Kuan-Wei Chiu	e16ca08b33	Fix memory leak and handle realloc failure In the previous code, there was a memory leak issue where the previously allocated memory was not freed upon a failed realloc operation. This commit addresses the problem by releasing the old memory before setting the pointer to NULL in case of a realloc failure. This ensures that memory is properly managed and avoids potential memory leaks.	2023-09-18 22:43:44 +08:00
Alexander Smorkalov	157b0e7760	Merge pull request #24275 from alexlyulkov:al/fix-tf-graph-simplifier Fixed removePhaseSwitches in tf_graph_simplifier	2023-09-18 11:02:44 +03:00
Alexander Lyulkov	d4cb564ce2	Fixed removePhaseSwitches in tf_graph_simplifier	2023-09-15 14:22:21 +07:00
alexlyulkov	1e54e56579	Merge pull request #24266 from alexlyulkov:al/tf-argmax-default-dim Added default dimension value to tensorflow ArgMax and ArgMin layers #24266 Added default dimension value to tensorflow ArgMax and ArgMin layers. Added exception when accessing layer's input with out of range index. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48452	2023-09-14 10:25:24 +03:00
Alexander Smorkalov	62c0556c58	Merge pull request #24252 from opencv-pushbot:gitee/alalek/refactor_24218 cmake: revise OPENCV_DNN_BACKEND_DEFAULT integration	2023-09-11 08:55:19 +03:00
Alexander Alekhin	02525abd9f	cmake: revise OPENCV_DNN_BACKEND_DEFAULT integration - disable message on default value	2023-09-10 13:11:36 +00:00
Dmitry Kurtaev	5dc5b27858	Enable build with OpenVINO in Debug	2023-09-09 20:38:59 +03:00
Alexander Smorkalov	e60825e75b	Merge pull request #24218 from CSBVision:patch-5 Added CMake configuration OPENCV_DNN_BACKEND_DEFAULT	2023-09-08 14:21:39 +03:00
Alexander Smorkalov	5350fba319	Merge pull request #24128 from CSBVision:CSBVision-patch-1 Fix bug at blobFromImagesWithParams	2023-09-06 16:20:37 +03:00
CSBVision	674c618471	Update dnn_utils.cpp	2023-09-06 10:01:07 +03:00
Dmitry Kurtaev	178fdbbda8	Merge pull request #24196 from dkurt:ov_backend_cleanups Use ngraph::Output in OpenVINO backend wrapper #24196 ### Pull Request Readiness Checklist resolves https://github.com/opencv/opencv/issues/24102 * Use `ngraph::Output<ngraph::Node>>` insead of `std::shared_ptr<ngraph::Node>` as a backend wrapper. It lets access to multi-output nodes: `588ddf1b18/modules/dnn/src/net_openvino.cpp (L501-L504)` * All layers can be customizable with OpenVINO >= 2022.1. nGraph reference code used for default layer implementation does not required CPU plugin also (might be tested by commenting CPU plugin at `/opt/intel/openvino/runtime/lib/intel64/plugins.xml`). * Correct inference if only intermediate blobs requested. See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake	2023-09-05 18:08:28 +03:00
Björn Böken	639836ebf0	Added CMake configuration OPENCV_DNN_BACKEND_DEFAULT	2023-09-05 10:05:12 +02:00

1860 Commits