Commit Graph

2400 Commits

Author SHA1 Message Date
Abduragim Shtanchaev
d188319b82
0D test for Reshape layer (#25206)
* reshape test for 0D

* fix comments according to PR
2024-03-22 03:59:08 +03:00
alexlyulkov
aa9e80b07b
Added native int64 indices support to gather layer (#25211)
Co-authored-by: Alexander Lyulkov <alexander.lyulkov@opencv.ai>
2024-03-22 03:43:20 +03:00
alexlyulkov
f2cf3c8890
Added int support to flatten, permute, reshape, slice layers (#25236)
Co-authored-by: Alexander Lyulkov <alexander.lyulkov@opencv.ai>
2024-03-22 03:39:42 +03:00
Oleg Pipikin
6da2ddcf0e Fix for OpenVINO 2024.0
Remove support OpenVINO lower than 2022.1 release
Remove legacy InferenceEngine wrappers
2024-03-18 15:05:50 +04:00
Alexander Lyulkov
d2d6869a26 Added int64 values support to scatter, scatterND and maxunpool layers 2024-03-13 15:40:07 +03:00
alexlyulkov
85cc02f4de
Allowed int64 constants in ONNX parser (#25148)
* Removed automatic int64 to int32 conversion in ONNX parser

* Fixed wrong rebase code

* added tests, minor fixes

* fixed Cast layer

* Fixed Cast layer for fp16 backend

* Fixed Cast layer for fp16 backend

* Fixed Cast layer for fp16 backend

* Allowed uint32, int64, uint64 types in OpenCL

* Fixed Cast layer for fp16 backend

* Use randu in test_int

---------

Co-authored-by: Alexander Lyulkov <alexander.lyulkov@opencv.ai>
2024-03-13 11:48:23 +03:00
Dmitry Kurtaev
6a370ba9e7 Avoid extra memset in convolution initialization 2024-03-08 10:46:07 +03:00
Dmitry Kurtaev
98aed21dd4 Avoid copy of ONNX graph during import 2024-03-05 18:22:46 +03:00
Alexander Smorkalov
c6776ec136
Merge pull request #25159 from Kumataro:trial_to_fix_cv_check_24411
dnn: fix to iteration variable scope
2024-03-05 16:01:25 +03:00
Kumataro
216c6c3da1 dnn: fix to iteration variable scope 2024-03-05 18:33:56 +09:00
Maksim Shabunin
8cbdd0c833
Merge pull request #25075 from mshabunin:cleanup-imgproc-1
C-API cleanup: apps, imgproc_c and some constants #25075

Merge with https://github.com/opencv/opencv_contrib/pull/3642

* Removed obsolete apps - traincascade and createsamples (please use older OpenCV versions if you need them). These apps relied heavily on C-API
* removed all mentions of imgproc C-API headers (imgproc_c.h, types_c.h) - they were empty, included core C-API headers
* replaced usage of several C constants with C++ ones (error codes, norm modes, RNG modes, PCA modes, ...) - most part of this PR (split into two parts - all modules and calib+3d - for easier backporting)
* removed imgproc C-API headers (as separate commit, so that other changes could be backported to 4.x)

Most of these changes can be backported to 4.x.
2024-03-05 12:18:31 +03:00
Alexander Smorkalov
daa8f7dfc6 Partially back-port #25075 to 4.x 2024-03-05 12:15:39 +03:00
Laurent Berger
5fe3933346
Merge pull request #25120 from LaurentBerger:I25103
Fixed ReduceMean layer behaviour #25120

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

a93c31e3c9/onnxruntime/core/providers/cpu/reduction/reduction_ops.cc (L433-L443)
2024-03-04 09:36:53 +03:00
alexlyulkov
1d1faaabef
Merge pull request #24411 from alexlyulkov:al/dnn-type-inference
Added int32, int64 support and type inference to dnn #24411

**Added a type inference to dnn similar to the shape inference, added int32 and int64 support.**

- Added getTypes method for layers that calculates layer outputs types and internals types from inputs types (Similar to getMemoryShapes). By default outputs and internals types = input[0] type
- Added type inference pipeline similar to shape inference pipeline. LayersShapes struct (that is used in shape inference pipeline) now contains both shapes and types
- All layers output blobs are now allocated using the calculated types from the type inference.
- Inputs and constants with int32 and int64 types are not automatically converted into float32 now.
- Added int32 and int64 support for all the layers with indexing and for all the layers required in tests.

Added  int32 and int64 support for CUDA:
- Added host<->device data moving for int32 and int64
- Added int32 and int64 support for several layers (just slightly modified CUDA C++ templates)

Passed all the accuracy tests on CPU, OCL, OCL_FP16, CUDA, CUDA_FP16. (except RAFT model)

**CURRENT PROBLEMS**:
-  ONNX parser always converts int64 constants and layers attributes to int32, so some models with int64 constants doesn't work (e.g. RAFT). The solution is to disable int64->int32 conversion and fix attributes reading in a lot of ONNX layers parsers (https://github.com/opencv/opencv/issues/25102)
- I didn't add type inference and int support to VULCAN, so it doesn't work at all now.
- Some layers don't support int yet, so some unknown models may not work.

**CURRENT WORKAROUNDS**:
- CPU arg_layer indides are implemented in int32 followed by a int32->int64 conversion (the master branch has the same workaround with int32->float conversion)
- CPU and OCL pooling_layer indices are implemented in float followed by a float->int64 conversion
- CPU gather_layer indices are implemented in int32, so int64 indices are converted to int32 (the master branch has the same workaround with float->int32 conversion)

**DISABLED TESTS**:
- RAFT model

**REMOVED TESTS**:
- Greater_input_dtype_int64 (because it doesn't fit ONNX rules, the whole test is just comparing float tensor with int constant)

**TODO IN NEXT PULL REQUESTS**:
- Add int64 support for ONNX parser
- Add int support for more layers
- Add int support for OCL (currently int layers just run on CPU)
- Add int tests
- Add int support for other backends
2024-03-01 17:07:38 +03:00
CSBVision
e8582f2cf8 Update net_impl.cpp
See issue #25112
2024-03-01 14:56:00 +01:00
Alexander Smorkalov
010772b492 Extracted 1d test cases to reduce conflicts with 4.x. 2024-02-29 12:02:00 +03:00
Alexander Smorkalov
92b940792a
Merge pull request #25117 from Abdurrahheem:ash/scale-layer-1D-test
Scale layer 1d test
2024-02-29 11:32:13 +03:00
Alexander Smorkalov
a22130fbfa Merge branch 4.x 2024-02-28 18:49:05 +03:00
Yuantao Feng
5aa5c39210
Merge pull request #25076 from fengyuentau:improve_attention
dnn: try improving performance of Attention layer #25076

Checklist:

- [x] Use `Mat` over `Mat::zeros` for temporary buffer in forward
- [x] Use layer internal buffer over temporary Mat buffer
- [x] Try a single fastGemmBatch on the Q/K/V calculation

Performance:

Performance test case is `Layer_Attention.VisionTransformer/0`, which has input of shape {1, 197, 768}, weight of shape {768, 2304} and bias {2304}.

Data is in millisecond.

| | macOS 14.2.1, Apple M1 | Ubuntu 22.04.2, Intel i7 12700K |
| - | - | - |
| Current | 10.96 | 1.58 |
| w/ Mat | 6.27 | 1.41 |
| w/ Internals | 5.87 | 1.38 |
| w/ fastGemmBatch | 6.12 | 2.14 |


### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-02-28 16:47:08 +03:00
Abdurrahheem
161c402f02 seperated working scale layer 1d test. 2024-02-28 13:04:48 +04:00
Laurent Berger
3c712cf77d
Merge pull request #25100 from LaurentBerger:I25077
Fix issue #25077 #25100

Fixes https://github.com/opencv/opencv/issues/25077

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-02-27 14:15:11 +03:00
Abduragim Shtanchaev
093ed08892
Merge pull request #24977 from Abdurrahheem:ash/primitive_1d_tests
Primitive 1D Tests #24977

This PR is designed to add tests for 1D inputs for layer, which is required after introducing 1d support in 5.x. Currently tests are written for following layers: 

- [x] `Add`, `Sub`
- [x]  `Product`, `Div`
- [x]  `Min`, `Max`
- [x] `Argmin`, `Argmax`
- [x] `Gather` 

This list is to be extended for more layer such `gemm`, `conv` etc.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-02-21 17:37:49 +03:00
Alexander Smorkalov
f084a229b4 Merge branch 4.x 2024-02-19 09:06:26 +03:00
Yuantao Feng
d4fd5157fa
Merge pull request #24980 from fengyuentau:on-fly-quantization-removal
dnn cleanup: On-fly-quantization removal #2498

On-fly-quantization is first introduced via https://github.com/opencv/opencv/pull/20228.
We decided to remove it but keep int8 layers implementation because on-fly-quantization
is less practical given the fact that there has been so many dedicated tools for model
quantization.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-02-16 18:21:45 +03:00
Dhanwanth1803
12aa0fe898
Merge pull request #24985 from Dhanwanth1803:hardswish
Fixes #24974 support HardSwishInt8 #24985

As given very clearly in the issue #24974 I made the required 2 changes to implement HardSwish Layer in INT8. Requesting comments.

resolves https://github.com/opencv/opencv/issues/24974

- [X] I agree to contribute to the project under Apache 2 License.
- [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [X] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

Co-authored-by: Dhanwanth1803 <dhanwanthvarala@gmail,com>
2024-02-16 18:19:29 +03:00
Alexander Smorkalov
fa3f1822ae
Merge pull request #24993 from asmorkalov:as/FastNeuralStyle_eccv16_CUDA
Relax test requirements for CUDA in DNNTestNetwork.FastNeuraStyle_eccv16
2024-02-12 16:37:57 +03:00
Alexander Smorkalov
5ce0acce40 Relax test requirements for CUDA in DNNTestNetwork.FastNeuraStyle_eccv16 2024-02-12 14:47:37 +03:00
Alexander Smorkalov
3a55f50133 Merge branch 4.x 2024-02-12 14:20:35 +03:00
Alexander Smorkalov
4b35b2f968
Merge pull request #24973 from asmorkalov:as/fix_weigths_proto_mess
Fix proto and weights mess in dnn performance tests
2024-02-07 11:10:32 +03:00
Alexander Smorkalov
77af137285 Fix proto and weights mess in dnn performance tests. 2024-02-07 09:16:09 +03:00
fengyuentau
fcaa8ce3c2 fix incorrect steps and elemsize when dtype changes 2024-02-06 16:27:25 +08:00
Haosonn
87f749277d
Merge pull request #24768 from Haosonn:pre-pr-2
Vulkan backend for NaryEltwiseLayer in DNN module #24768

We improve Vulkan backend for ``NaryEltwiseLayer`` in DNN module by:

- add a basic framework for Vulkan backend in ``NaryEltwiseLayer``
- add a compute shader for binary forwarding (an imitation of what has been done in native OpenCV backend including broadcasting and eltwise-operation)
- typo fixed:
  - Wrong info output in ``context.cpp``

Currently, our implementation (or all layers supporting Vulkan backend) runs pretty slow on discrete GPUs basically due to IO cost in function ``copyToHost``, and we are going to fix that by

- find out the best ``VkMemoryProperty`` for various discrete GPUs

- prevent ``copyToHost`` in middle layers during forwarding, (i.e keep data in GPU memory)
### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

Co-authored-by: IskXCr <IskXCr@outlook.com>
2024-01-29 18:41:49 +03:00
Alexander Alekhin
efc9837df1
Merge pull request #24892 from opencv-pushbot:gitee/alalek/dnn_avoid_16s_usage
DNN: avoid CV_16S usage for FP16 #24892

**Merge after**: #24918

TODO:
- [x] measure performance changes
- [x] optimize convertTo for OpenCL: #24918

12700K iGPU:

|Name of Test|0|1|1 vs 0 (x-factor)|
|---|:-:|:-:|:-:|
|AlexNet::DNNTestNetwork::OCV/OCL_FP16|7.441|7.480|0.99|
|CRNN::DNNTestNetwork::OCV/OCL_FP16|10.776|10.736|1.00|
|DenseNet_121::DNNTestNetwork::OCV/OCL_FP16|52.762|52.833|1.00|
|EAST_text_detection::DNNTestNetwork::OCV/OCL_FP16|60.694|60.721|1.00|
|EfficientNet::DNNTestNetwork::OCV/OCL_FP16|33.373|33.173|1.01|
|FastNeuralStyle_eccv16::DNNTestNetwork::OCV/OCL_FP16|81.840|81.724|1.00|
|GoogLeNet::DNNTestNetwork::OCV/OCL_FP16|20.965|20.927|1.00|
|Inception_5h::DNNTestNetwork::OCV/OCL_FP16|22.204|22.173|1.00|
|Inception_v2_SSD_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|47.115|47.460|0.99|
|MPHand::DNNTestNetwork::OCV/OCL_FP16|6.760|6.670|1.01|
|MPPalm::DNNTestNetwork::OCV/OCL_FP16|10.188|10.171|1.00|
|MPPose::DNNTestNetwork::OCV/OCL_FP16|12.510|12.561|1.00|
|MobileNet_SSD_Caffe::DNNTestNetwork::OCV/OCL_FP16|17.290|17.072|1.01|
|MobileNet_SSD_v1_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|19.473|19.306|1.01|
|MobileNet_SSD_v2_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|22.874|23.404|0.98|
|OpenFace::DNNTestNetwork::OCV/OCL_FP16|9.568|9.517|1.01|
|OpenPose_pose_mpi_faster_4_stages::DNNTestNetwork::OCV/OCL_FP16|539.899|539.845|1.00|
|PPHumanSeg::DNNTestNetwork::OCV/OCL_FP16|18.015|18.769|0.96|
|PPOCRv3::DNNTestNetwork::OCV/OCL_FP16|63.122|63.540|0.99|
|ResNet_50::DNNTestNetwork::OCV/OCL_FP16|34.947|34.925|1.00|
|SFace::DNNTestNetwork::OCV/OCL_FP16|10.249|10.206|1.00|
|SSD::DNNTestNetwork::OCV/OCL_FP16|213.068|213.108|1.00|
|SqueezeNet_v1_1::DNNTestNetwork::OCV/OCL_FP16|4.867|4.878|1.00|
|VIT_B_32::DNNTestNetwork::OCV/OCL_FP16|200.563|190.788|1.05|
|VitTrack::DNNTestNetwork::OCV/OCL_FP16|7.528|7.173|1.05|
|YOLOX::DNNTestNetwork::OCV/OCL_FP16|132.858|132.701|1.00|
|YOLOv3::DNNTestNetwork::OCV/OCL_FP16|209.559|208.809|1.00|
|YOLOv4::DNNTestNetwork::OCV/OCL_FP16|221.357|220.924|1.00|
|YOLOv4_tiny::DNNTestNetwork::OCV/OCL_FP16|24.446|24.382|1.00|
|YOLOv5::DNNTestNetwork::OCV/OCL_FP16|43.922|44.080|1.00|
|YOLOv8::DNNTestNetwork::OCV/OCL_FP16|64.159|63.842|1.00|
|YuNet::DNNTestNetwork::OCV/OCL_FP16|10.177|10.231|0.99|
|opencv_face_detector::DNNTestNetwork::OCV/OCL_FP16|15.121|15.445|0.98|

Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
2024-01-26 16:34:17 +03:00
Yuantao Feng
37156a4719
Merge pull request #24925 from fengyuentau:loongarch_handle_warnings
Handle warnings in loongson-related code #24925

See https://github.com/fengyuentau/opencv/actions/runs/7665377694/job/20891162958#step:14:16

Warnings needs to be handled before we add the loongson server to our CI.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-01-26 13:38:00 +03:00
Alexander Smorkalov
decf6538a2 Merge branch 4.x 2024-01-23 17:06:52 +03:00
Alexander Smorkalov
d6424233f0
Merge pull request #24906 from Abdurrahheem:ash/fix_einsum_inner
Einsum Layer Inner Product Issue Solution
2024-01-23 09:26:22 +03:00
Abduragim
0e6b7f1656 fix 1D handling issue in inner product 2024-01-22 20:10:34 +04:00
Alexander Smorkalov
775210e701 Relax test requirements for OpenCL in test DNNTestNetwork.FastNeuralStyle_eccv16. 2024-01-22 17:11:41 +03:00
Alexander Smorkalov
c739117a7c Merge branch 4.x 2024-01-19 17:32:22 +03:00
Sean McBride
e64857c561
Merge pull request #23736 from seanm:c++11-simplifications
Removed all pre-C++11 code, workarounds, and branches #23736

This removes a bunch of pre-C++11 workrarounds that are no longer necessary as C++11 is now required.
It is a nice clean up and simplification.

* No longer unconditionally #include <array> in cvdef.h, include explicitly where needed
* Removed deprecated CV_NODISCARD, already unused in the codebase
* Removed some pre-C++11 workarounds, and simplified some backwards compat defines
* Removed CV_CXX_STD_ARRAY
* Removed CV_CXX_MOVE_SEMANTICS and CV_CXX_MOVE
* Removed all tests of CV_CXX11, now assume it's always true. This allowed removing a lot of dead code.
* Updated some documentation consequently.
* Removed all tests of CV_CXX11, now assume it's always true
* Fixed links.

---------

Co-authored-by: Maksim Shabunin <maksim.shabunin@gmail.com>
Co-authored-by: Alexander Smorkalov <alexander.smorkalov@xperience.ai>
2024-01-19 16:53:08 +03:00
fengyuentau
d269de0a03 initial commit 2024-01-18 11:17:50 +08:00
Alexander Smorkalov
d1e4bd8543
Merge pull request #24809 from Abdurrahheem:ash/yolo-nas-test
Added test for YOLO NAS
2024-01-17 20:36:15 +03:00
Alexander Smorkalov
ac4c0bffac
Merge pull request #24813 from fengyuentau:speedup_scatter
dnn: improve scatter and scatterND speed with multi-threading
2024-01-17 17:16:50 +03:00
Abduragim
d30bf1bc3c added test for yolo nas 2024-01-17 13:01:43 +03:00
Alexander Smorkalov
84bb1cda4e
Merge pull request #24865 from asmorkalov:as/dnn_concat_assert
Normalize axis parameter in DNN Concat to handle negative values
2024-01-16 14:39:28 +03:00
Alexander Smorkalov
26cf82a56c Normalize axis parameter in DNN Concat to handle negative values. 2024-01-16 12:22:22 +03:00
Alexander Smorkalov
99c86bb40c
Merge pull request #24556 from plctlab:rvp
Optimization based on RISC-V P Packed SIMD Extension v0.5.2
2024-01-16 11:36:31 +03:00
Alexander Smorkalov
68dc02e302
Merge pull request #24858 from Dhanwanth1803:avx-fix
Use AVX2 overload instread on AVX in AVX2 scope
2024-01-16 09:14:31 +03:00
Dhanwanth1803
a289eba357 Fixes #24677 2024-01-13 09:56:56 +05:30
Stefan Dragnev
2791bb7062
Merge pull request #24773 from tailsu:sd/pathlike
python: accept path-like objects wherever file names are expected #24773

Merry Christmas, all 🎄

Implements #15731

Support is enabled for all arguments named `filename` or `filepath` (case-insensitive), or annotated with `CV_WRAP_FILE_PATH`.

Support is based on `PyOS_FSPath`, which is available in Python 3.6+. When running on older Python versions the arguments must have a `str` value as before.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-01-12 16:23:05 +03:00
jimmylaw21
a7fa1e6f4b
Merge pull request #24610 from jimmylaw21:dnn-onnx-add-group-norm-layer
dnn onnx: add group norm layer #24610

dnn onnx: add group norm layer

Todo:

- [x] speed up by multi-threading
- [x] add perf
- [x] add backend: OpenVINO
- [x] add backend: CUDA
- [x] add backend: OpenCL (no fp16)
- [ ] add backend: CANN

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>
2024-01-12 15:13:26 +03:00
Alexander Smorkalov
97c418ab86
Merge pull request #24840 from fengyuentau:ocl_innerproduct
dnn (opencl): integrate bias handling in the inner product opencl kernel
2024-01-12 15:10:16 +03:00
Abduragim Shtanchaev
c923c59833
Merge pull request #24812 from Abdurrahheem:ash/einsum_bachedGemm
Replace interactive batched Matrix Multiply. #24812

This PR replaces iterative batch matrix multiplication which `FastGemmBatch` in Einsum layer.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-01-12 14:23:43 +03:00
Yuantao Feng
e7ccff9805
Merge pull request #24834 from fengyuentau:cuda_naryeltwise_broadcast
dnn (cuda): support broadcasting if a.rank() != b.rank() #24834

Inspired by https://github.com/opencv/opencv/pull/24786. This PR keeps the fusion of `NaryEltwise` and `Concat` while addressed the data missing problem via supporting broadcasting if a.rank() != b.rank().

Resolves https://github.com/opencv/opencv/issues/23977
Resolves https://github.com/opencv/opencv/issues/24606
Resolves https://github.com/opencv/opencv/issues/24635
Resolves https://github.com/opencv/opencv/issues/24721 

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-01-11 10:04:46 +03:00
fengyuentau
83acb656f1 integrate bias handling in ocl kernel 2024-01-11 11:15:17 +08:00
Yuantao Feng
7fb336322d
Merge pull request #24808 from fengyuentau:fix_layernorm
dnn: no layer norm fusion if axes.back() is not the axis of last dimension #24808

Merge with https://github.com/opencv/opencv_extra/pull/1137

Resolves https://github.com/opencv/opencv/issues/24797

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-01-10 13:01:00 +03:00
Yuantao Feng
c955564cb3
Merge pull request #24765 from fengyuentau:mod_operator
dnn onnx: add mod #24765

Resolves https://github.com/opencv/opencv/issues/23174

TODO:

- [x] enable some conformance tests
- [x] add backends
    - [x] CANN
    - [x] OpenVINO
    - [x] CUDA

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-01-09 19:00:17 +03:00
Abduragim
6c28d7140a 1d support for einsum 2024-01-08 21:34:47 +03:00
fengyuentau
13127365e2 better comment 2024-01-08 11:55:06 +08:00
Yuantao Feng
b7d70613e4 fix failed assertion in debug build 2024-01-05 18:33:01 +00:00
fengyuentau
2ed97b9ef3 multi-threaded scatterND and refactor perf 2024-01-05 18:15:59 +08:00
fengyuentau
2997b4c5fe pretty format 2024-01-05 18:15:27 +08:00
fengyuentau
63cde0b90d multi-threaded scatter and refactor perf 2024-01-05 17:24:09 +08:00
Abduragim Shtanchaev
3b26e183cb changed weights of yolov7 2023-12-28 23:03:47 +03:00
cudawarped
7d681cf80d build: first class cuda support 2023-12-26 09:39:18 +03:00
Alexander Smorkalov
62f1a7410d
Merge pull request #24766 from asmorkalov:update_version_4.9.0-pre
pre: OpenCV 4.9.0 (version++)
2023-12-25 16:04:53 +03:00
Alexander Smorkalov
b407c58b96 pre: OpenCV 4.9.0 (version++). 2023-12-25 15:20:10 +03:00
Yuantao Feng
f978c99523
Merge pull request #24753 from fengyuentau:einsum_importer
dnn onnx: support constaint inputs in einsum importer #24753 

Merge with https://github.com/opencv/opencv_extra/pull/1132.

Resolves https://github.com/opencv/opencv/issues/24697

Credits to @LaurentBerger.

---

This is a workaround. I suggest to get input shapes and calculate the output shapes in `getMemoryShapes` so as to keep the best compatibility. It is not always robust getting shapes during the importer stage and we should avoid that as much as possible.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-12-25 14:42:05 +03:00
Alexander Alekhin
f49b26182b dnn(test): skip very long debug tests, reduce test time 2023-12-25 08:44:06 +00:00
Alexander Alekhin
96b894e0e1 Merge pull request #24761 from opencv-pushbot:gitee/alalek/test_skip_update_win32 2023-12-25 08:27:30 +00:00
Alexander Alekhin
f8502d45f9 dnn(test): skip tests on 32-bit Windows 2023-12-25 07:23:45 +00:00
Alexander Smorkalov
953dddd26b
Merge pull request #24747 from asmorkalov:as/tune_vitb_cuda
Increate Vit_b test threshold a bit for CUDA FP16.
2023-12-22 17:04:46 +03:00
Dmitry Kurtaev
938bc4d503 [CUDA] Hotfix Scale with 1 parameter 2023-12-22 15:49:27 +03:00
Dhanwanth1803
027aee8ad4
Merge pull request #24384 from Dhanwanth1803:feat-crop
Fixes #22747. Support [crop] configuration for DarkNet #24384

Request for comments. This is my first PR. 

**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1112

resolves https://github.com/opencv/opencv/issues/22747

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2023-12-22 14:55:01 +03:00
Alexander Smorkalov
53cd921ab4 Increate Vit_b test threshold a bit for CUDA FP16. 2023-12-22 13:37:44 +03:00
Vadim Pisarevsky
853e5dfcdf
Merge pull request #24709 from vpisarev:winograd_mode
Try to enable Winograd by default in FP32 mode and disable it by default in FP16 mode #24709

Hopefully, it will resolve regressions since 4.8.1 (see also https://github.com/opencv/opencv/pull/24587)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2023-12-22 09:22:31 +03:00
Alexander Smorkalov
35e2ef8019
Merge pull request #24740 from opencv-pushbot:gitee/alalek/ocl_fix_kernel_compilation
ocl: fix kernels compilation
2023-12-22 09:20:47 +03:00
Alexander Smorkalov
f5d8245801
Merge pull request #24736 from opencv-pushbot:gitee/alalek/issue_24734
dnn(ocl): don't try KERNEL_TYPE_GEMM_LIKE with kernel_w > 16
2023-12-21 20:01:01 +03:00
Alexander Alekhin
3340c71a2a ocl: fix kernels compilation 2023-12-21 14:29:23 +00:00
Alexander Alekhin
c9bb92d58b dnn(test): tune FP16 test tolerance 2023-12-21 13:39:05 +00:00
Alexander Alekhin
99c94d3d83 dnn(ocl): don't try KERNEL_TYPE_GEMM_LIKE with kernel_w > 16
- OpenCL kernel code doesn't support that
2023-12-21 13:30:57 +00:00
llh721113
a30c987f87 feat: RVP052 Optimization for DNN int8layers 2023-12-21 14:51:41 +08:00
Yuantao Feng
0521a3a384
Merge pull request #24476 from fengyuentau:attention_layer
dnn: add attention layer #24476

Resolves #24609

Merge with: https://github.com/opencv/opencv_extra/pull/1128.

Attention operator spec from onnxruntime: https://github.com/microsoft/onnxruntime/blob/v1.16.1/docs/ContribOperators.md#com.microsoft.Attention.

TODO:
- [x] benchmark (before this PR vs. with this PR vs. ORT).
- [x] Layer fusion: Take care Slice with end=INT64_MAX.
- [x] Layer fusion: match more potential attention (VIT) patterns.
    - [x] Single-head attention is supported.
- [x] Test AttentionSubgraph fusion.
- [x] Add acc tests for VIT_B_32 and VitTrack
- [x] Add perf tests for VIT_B_32 and VitTrack

## Benchmarks

Platform: Macbook Air M1.

### Attention Subgraph

Input scale: [1, 197, 768].

|                        | mean (ms) | median (ms) | min (ms) |
| ---------------------- | --------- | ----------- | -------- |
| w/ Attention (this PR) | 3.75      | 3.68        | 3.22     |
| w/o Attention          | 9.06      | 9.01        | 8.24     |
| ORT (python)           | 4.32      | 2.63        | 2.50     |

### ViTs

All data in millisecond (ms).

| ViTs     | With Attention | Without Attention | ORT    |
| -------- | -------------- | ----------------- | ------ |
| vit_b_16 | 302.77         | 365.35            | 109.70 |
| vit_b_32 | 89.92          | 116.22            | 30.36  |
| vit_l_16 | 1593.32        | 1730.74           | 419.92 |
| vit_l_32 | 468.11         | 577.41            | 134.12 |
| VitTrack | 3.80           | 3.87              | 2.25   |

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-12-20 19:35:07 +03:00
Laurent Berger
3e6dcdc0a4
Merge pull request #24539 from LaurentBerger:blobrecttoimage
Add blobrecttoimage #24539

### Pull Request Readiness Checklist

resolves https://github.com/opencv/opencv/issues/14659

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work #14659
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-12-19 20:00:04 +03:00
Yuantao Feng
fa5ed62a66
Merge pull request #24694 from fengyuentau:matmul_refactor
dnn: refactor ONNX MatMul with fastGemm #24694

Done:
- [x] add backends
    - [x] CUDA
    - [x] OpenVINO
    - [x] CANN
    - [x] OpenCL
    - [x] Vulkan
- [x] add perf tests
- [x] const B case

### Benchmark

Tests are done on M1. All data is in milliseconds (ms).

| Configuration | MatMul (Prepacked) | MatMul | InnerProduct |
| - | - | - | - |
| A=[12, 197, 197], B=[12, 197, 64], trans_a=0, trans_b=0 | **0.39** | 0.41 | 1.33 |
| A=[12, 197, 64], B=[12, 64, 197], trans_a=0, trans_b=0  | **0.42** | 0.42 | 1.17 |
| A=[12, 50, 64], B=[12, 64, 50], trans_a=0, trans_b=0    | **0.13** | 0.15 | 0.33 |
| A=[12, 50, 50], B=[12, 50, 64], trans_a=0, trans_b=0    | **0.11** | 0.13 | 0.22 |
| A=[16, 197, 197], B=[16, 197, 64], trans_a=0, trans_b=0 | **0.46** | 0.54 | 1.46 |
| A=[16, 197, 64], B=[16, 64, 197], trans_a=0, trans_b=0  | **0.46** | 0.95 | 1.74 |
| A=[16, 50, 64], B=[16, 64, 50], trans_a=0, trans_b=0    | **0.18** | 0.32 | 0.43 |
| A=[16, 50, 50], B=[16, 50, 64], trans_a=0, trans_b=0    | **0.15** | 0.25 | 0.25 |

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-12-19 19:36:41 +03:00
Wanli
6ae1709c6a
Merge pull request #24613 from WanliZhong:softmax_default_axis
Make default axis of softmax in onnx "-1" without opset option #24613

Try to solve problem: https://github.com/opencv/opencv/pull/24476#discussion_r1404821158

**ONNX**
`opset <= 11` use 1
`else` use -1

**TensorFlow**
`TF version = 2.x` use -1
`else` use 1

**Darknet, Caffe, Torch**
use 1 by definition
2023-12-15 10:41:42 +03:00
Wanli
9bbc890d96
Merge pull request #24681 from WanliZhong:err_armv8
Fixed armv8 compilation warnings #24681 

Fixes the following warning on  armv8:
```
warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
```
Buildbot: https://pullrequest.opencv.org/buildbot/builders/4_x_ARMv8-lin
2023-12-12 15:38:07 +03:00
Wanli
6ee71fee88
Merge pull request #24547 from WanliZhong:refactor_conv_perf_test
Classify and extend convolution and depthwise performance tests #24547

This PR aims to:
1. Extend the test cases from models: `YOLOv5`, `YOLOv8`, `EfficientNet`, `YOLOX`, `YuNet`, `SFace`, `MPPalm`, `MPHand`, `MPPose`, `ViTTrack`, `PPOCRv3`, `CRNN`, `PPHumanSeg`. (371 new test cases are added)

2. Classify the existing convolution performance test to below cases
    - CONV_1x1
    - CONV_3x3_S1_D1 (winograd)
    - CONV
    - DEPTHWISE

3. Reduce unnecessary test cases by follow 3 rules (366 test cases are pruned):
(i). For all tests, except for pad and bias related parameters, all other parameters are the same. Only one case can be reserved.
(ii). When the only difference is the channel of input shape, and other parameters are the same. Only one case can be reserved in each range `[1, 3], [4, 7], [8, 15], [16, 31], [32, 63], [64, 127], [128, 255], [256, 511], [512, 1023], [1024, 2047], [2048, 4095]`
(iii). When the only difference is the width and height of input shape, and other parameters are the same. Only one case can be reserved in each range `[1, 31], [32, 63], [64, 95]... `

> **Reproduced**: 1. follow step in https://github.com/alalek/opencv/commit/dnn_dump_conv_kernels to dump all convolution cases from new models. (declared flops may not right, need to be checked manually) 2 and 3. Use the script from python code [classify conv.txt](https://github.com/opencv/opencv/files/13522228/classify.conv.txt)


**Performance test result on Apple M2**

**Test result details**:  [M2.md](https://github.com/opencv/opencv/files/13379189/M2.md)

**Additional test result details with FP16**:  [m2_results_with_fp16.zip](https://github.com/opencv/opencv/files/13491070/m2_results_with_fp16.zip)


**Brief summary for 4.8.1 vs 4.7.0 or 4.6.0**: 
1. `CONV_1x1_S1_D1` dropped significant with small or large input shape.
2. `DEPTHWISE_5x5 ` dropped a little compared with 4.7.0. 

---

**Performance test result on [Intel Core i7-12700K](https://www.intel.com/content/www/us/en/products/sku/134594/intel-core-i712700k-processor-25m-cache-up-to-5-00-ghz/specifications.html)**: 8 Performance-cores (3.60 GHz, turbo up to 4.90 GHz), 4 Efficient-cores (2.70 GHz, turbo up to 3.80 GHz), 20 threads.

**Test result details**: [INTEL.md](https://github.com/opencv/opencv/files/13374093/INTEL.md)
**Brief summary for 4.8.1 vs 4.5.5**: 
1. `CONV_5x5_S1_D1` dropped significant. 
2. `CONV_1x1_S1_D1`, `CONV_3x3_S1_D1`, `DEPTHWISE_3x3_S1_D1`, `DEPTHWISW_3x3_S2_D1` dropped with small input shape.

---

TODO:
- [x] Perform tests on arm with each opencv version
- [x] Perform tests on x86 with each opencv version
- [x] Split each test classification with single test config
- [x] test enable fp16
2023-12-11 21:35:33 +03:00
Abduragim Shtanchaev
d3dd2e463c
Merge pull request #24611 from Abdurrahheem:ash/add_yolov6_test
Add test for YoloX Yolo v6 and Yolo v8 #24611

This PR adds test for YOLOv6 model (which was absent before)
The onnx weights for the test are located in this PR [ #1126](https://github.com/opencv/opencv_extra/pull/1126)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-12-11 16:42:51 +03:00
Dmitry Kurtaev
ac4b26a561 Replace Slice optional inputs removal to adjustment 2023-12-08 23:29:52 +03:00
Yuantao Feng
a2edf4d929
Merge pull request #24647 from fengyuentau:cuda_sub
dnn cuda: support Sub #24647

Related https://github.com/opencv/opencv/issues/24606#issuecomment-1837390257

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-12-06 13:46:24 +03:00
Yuantao Feng
f5ec92e4ca
Merge pull request #24655 from fengyuentau:graph_simplifier_optional_input
dnn onnx graph simplifier: handle optional inputs of Slice #24655

Resolves https://github.com/opencv/opencv/issues/24609

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-12-06 13:43:54 +03:00
Alexander Smorkalov
7b1a5fb3de Migrate Android Face Detection sample to DNN. 2023-11-29 11:02:44 +03:00
Abduragim Shtanchaev
5278560252
Merge pull request #24569 from Abdurrahheem:ash/padding_value_fix
Add support for custom padding in DNN preprocessing #24569

This PR add functionality for specifying value in padding.
It is required in many preprocessing pipelines in DNNs such as Yolox object detection model

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-11-28 11:54:09 +03:00
Dmitry Kurtaev
332748dd55
Merge pull request #24577 from dkurt:dnn_graph_match_stack
Fix graph fusion with commutative ops #24577

### Pull Request Readiness Checklist

resolves https://github.com/opencv/opencv/issues/24568

**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1125

TODO:
- [x]  replace recursive function to sequential

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-11-24 10:40:32 +03:00
skycat8
848dd12a1f
Merge pull request #24553 from skycat8:yolov5
Add yolov5n to tests #24553

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ X] I agree to contribute to the project under Apache 2 License.
- [ X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ X] The PR is proposed to the proper branch
- [ X] There is a reference to the original bug report and related work
- [ X] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ X] The feature is well documented and sample code can be built with the project CMake
2023-11-24 10:36:06 +03:00
Yuantao Feng
d05fb709f9
Merge pull request #24552 from fengyuentau:layernorm_backends
dnn: add openvino, opencl and cuda backends for layer normalization layer #24552

Merge after https://github.com/opencv/opencv/pull/24544.

Todo:

- [x] openvino
- [x] opencl
- [x] cuda

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-11-21 15:33:01 +03:00
zihaomu
b913e73d04
DNN: add the Winograd fp16 support (#23654)
* add Winograd FP16 implementation

* fixed dispatching of FP16 code paths in dnn; use dynamic dispatcher only when NEON_FP16 is enabled in the build and the feature is present in the host CPU at runtime

* fixed some warnings

* hopefully fixed winograd on x64 (and maybe other platforms)

---------

Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@gmail.com>
2023-11-20 13:45:37 +03:00
Yuantao Feng
a478757483
Merge pull request #24544 from fengyuentau:layernorm_conformance
dnn test: move layer norm tests into conformance tests #24544

Merge with https://github.com/opencv/opencv_extra/pull/1122

## Motivation

Some ONNX operators, such as `LayerNormalization`, `BatchNormalization` and so on, produce outputs for training (mean, stdev). So they have reference outputs of conformance tests for those training outputs as well. However, when it comes to inference, we do not need and produce those outputs for training here in dnn. Hence, output size does not match if we use dnn to infer those conformance models. This has become the barrier if we want to test these operators using their conformance tests.

<!--
| Operator                | Inference needed                    | Outputs (required - total) | Optional outputs for training? |
| ----------------------- | ----------------------------------- | -------------------------- | ------------------------------ |
| BatchNormalization      | Yes                                 | 1 - 3                      | Yes                            |
| Dropout                 | Maybe, can be eliminated via fusion | 1 - 2                      | Yes                            |
| GRU                     | Yes                                 | 0 - 2                      | No                             |
| LSTM                    | Yes                                 | 0 - 3                      | No                             |
| LayerNormalization      | Yes                                 | 1 - 3                      | Yes                            |
| MaxPool                 | Yes                                 | 1 - 2                      | Yes                            |
| RNN                     | Yes                                 | 0 - 2                      | No                             |
| SoftmaxCrossEntropyLoss | No                                  | 1 - 2                      | --                             |
-->

**I checked all ONNX operators with optional outputs. Turns out there are only `BatchNormalization`, `Dropout`, `LayerNormalization` and `MaxPool` has optional outputs for training. All except `LayerNormalization` have models set for training mode and eval mode. Blame ONNX for that.**

## Solution

In this pull request, we remove graph outputs if the graph looks like the following:

```
    [X]   [Scale]  [Bias]                      [X]   [Scale]  [Bias]
      \      |      /         this patch         \      |      /
     LayerNormalization      ----------->       LayerNormalization
      /      |      \                                   |
    [Y]    [Mean]  [Stdev]                             [Y]
```

We can update conformance tests and turn on some cases as well if extending to more layers.

Notes:
1. This workaround does not solve expanded function operators if they are fused into a single operator, such as `$onnx/onnx/backend/test/data/node/test_layer_normalization_2d_axis1_expanded`, but they can be run without fusion. Note that either dnn or onnxruntime does not fuse those expanded function operators.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-11-20 11:19:24 +03:00
Abduragim Shtanchaev
8c10545d3c
Merge pull request #24509 from Abdurrahheem:ash/dev_einsum_fast_gemm
Fast gemm for einsum #24509

## This PR adds performance tests for Einsum Layer with FastGemm. See below results of performance test on different inputs

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-11-16 16:20:17 +03:00
Yuantao Feng
024dfd54af
dnn cann backend: add hardswish, layernorm and instasnce norm for cann and bug fix (#24462)
* add hardswish for cann

* gemm cann bug fix

* fix indentation

* cann: add layer norm

* cann: add instance norm

* add supportBackend

* cann: layer norm does not support axis=-1 due to 1d mat issue

* disable instance norm for now

* fix doc

* remove tensor desc initialization for 1D tensor
2023-11-15 17:57:52 +03:00
fengyuentau
031846f2e1 remove filter 2023-11-13 14:47:40 +08:00
Alexander Smorkalov
960a926055
Merge pull request #24510 from asmorkalov:as/softmax_rvv
Enable softmax layer vectorization on RISC-V RVV #24510 

Related: https://github.com/opencv/opencv/pull/24466

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2023-11-11 09:09:14 +03:00
Dmitry Kurtaev
b7ec2ebb55
Merge pull request #24483 from dkurt:dnn_fusion_commutative_ops
Commutative rules for DNN subgraphs fusion #24483

### Pull Request Readiness Checklist

related: https://github.com/opencv/opencv/pull/24463#issuecomment-1783033931

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-11-08 16:26:33 +03:00
Alexander Smorkalov
34f34f6227 Merge branch 4.x 2023-11-08 14:39:48 +03:00
Abduragim Shtanchaev
9d0c8a9edb
Merge pull request #24445 from Abdurrahheem:ash/dev_einsum_pref
Einsum Layer Performance Test #24445

## This PR adds performance tests for Einsum Layer. See below results of performance test on different inputs

**Notation:**
- WX: windows10_x64
- MX: macos_x64
- MA: macos_arm64
- UX: ubuntu_x64
- UA: ubuntu_arm64

All data in ms (milliseconds).
Gemm is backend for matrix multiplication

---

Benchmarks:


| Equation                | Inputs Mat Dims                   | UX (ms)        | UA (ms) | MX (ms) | MA (ms) | WX (ms) |
|-------------------------|-----------------------------------|----------------|---------|---------|---------|---------|
| "ij, jk -> ik"          | [2, 3], [3,2]                     | 0.04 ± 0.00    | -       | -       | -       | -       |
| "ij, jk -> ik"          | [20, 30], [30,20]                 | 0.08 ± 0.00    | -       | -       | -       | -       |
| "ij, jk -> ik"          | [113, 127], [127,113]             | 2.41 ± 0.05    | -       | -       | -       | -       |
| "imkj, injs -> imnks"   | [1, 4, 7, 9], [1, 5, 9, 8]        | 0.11 ± 0.00    | -       | -       | -       | -       |
| "imkj, injs -> imnks"   | [1, 4, 70, 90], [1, 5, 90, 80]    | 15.49 ± 0.46   | -       | -       | -       | -       |
| "imkj, injs -> imnks"   | [1, 4, 73, 91], [1, 5, 91, 57]    | 11.53 ± 0.06   | -       | -       | -       | -       |
| "ij -> i"               | [30, 40]                          | 0.03 ± 0.00    | -       | -       | -       | -       |
| "ij -> i"               | [113, 374]                        | 0.13 ± 0.00    | -       | -       | -       | -       |
| "...ij -> ...i"         | [30, 40]                          | 0.03 ± 0.00    | -       | -       | -       | -       |
| "...ij -> ...i"         | [113, 374]                        | 0.13 ± 0.00    | -       | -       | -       | -       |
| "...ij, ...jk -> ...ik" | [40, 50], [50,80]                 | 0.37 ± 0.01    | -       | -       | -       | -       |
| "...ij, ...jk -> ...ik" | [47, 51], [51, 83]                | 0.43 ± 0.01    | -       | -       | -       | -       |

-----

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-11-08 11:56:21 +03:00
Yuantao Feng
6079e22523
Merge pull request #24500 from fengyuentau:test_layer_fusion
dnn (onnx): add subgraph fusion tests #24500

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-11-07 17:40:31 +03:00
Yuantao Feng
ee0822dc4d
Merge pull request #24378 from fengyuentau:instance_norm
dnn onnx: add instance norm layer #24378

Resolves https://github.com/opencv/opencv/issues/24377
Relates https://github.com/opencv/opencv/pull/24092#discussion_r1349841644

| Perf | multi-thread | single-thread |
| - | - | - |
| x: [2, 64, 180, 240] | 3.95ms | 11.12ms |

Todo:

- [x] speed up by multi-threading
- [x] add perf
- [x] add backend: OpenVINO
- [x] add backend: CUDA
- [x] add backend: OpenCL (no fp16)
- [ ] add backend: CANN (will be done via https://github.com/opencv/opencv/pull/24462)


### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

```
force_builders=Linux OpenCL,Win64 OpenCL,Custom
buildworker:Custom=linux-4
build_image:Custom=ubuntu:18.04
modules_filter:Custom=none
disable_ipp:Custom=ON
```
2023-11-07 12:59:10 +03:00
Wanli
ed52f7feea
Improve and refactor softmax layer (#24466)
* improve and refactor softmax layer

* fix building error

* compatible region layer

* fix axisStep when disable SIMD

* fix dynamic array

* try to fix error

* use nlanes from VTraits

* move axisBias to srcOffset

* fix bug caused by axisBias

* remove macro

* replace #ifdef with #if for CV_SIMD
2023-11-06 04:48:32 +03:00
Dmitry Kurtaev
fa56623458
Merge pull request #24463 from dkurt:dnn_shared_nodes_fusion
DNN graph fusion with shared nodes #24463

### Pull Request Readiness Checklist

For now, nodes from matched pattern are removed during the matching process so if nodes are used in similar subgraph, they cannot be found.

required for https://github.com/opencv/opencv/pull/24397

**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1115

A part from [model_name ](https://github.com/onnx/models/blob/main/vision/object_detection_segmentation/fcn/model/fcn-resnet101-11.onnx) with two Resize subgraphs with shared nodes:
![image](https://github.com/opencv/opencv/assets/25801568/611d89d9-12fb-4add-9218-13b10d2c086a)

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-11-03 12:34:09 +03:00
Yuantao Feng
c91af16fa7
Merge pull request #24409 from fengyuentau:norm_kernel
dnn: add shared fastNorm kernel for mvn, instance norm and layer norm #24409

Relates https://github.com/opencv/opencv/pull/24378#issuecomment-1756906570

TODO:

- [x] add fastNorm
- [x] refactor layer norm with fastNorm
- [x] refactor mvn with fastNorm
- [ ] add onnx mvn in importer (in a new PR?)
- [ ] refactor instance norm with fastNorm (in another PR https://github.com/opencv/opencv/pull/24378, need to merge this one first though)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-11-01 14:33:57 +03:00
alexlyulkov
b71be65f57
Merge pull request #24294 from alexlyulkov:al/remove-torch7-from-dnn
Remove torch (old torch7) from dnn in 5.x #24294

Merge with https://github.com/opencv/opencv_extra/pull/1097

Completely removed torch (old torch7) from dnn:
- removed modules/dnn/src/torch directory that contained torch7 model parser
- removed readNetFromTorch() and readTorchBlob() public functions
- removed torch7 references from comments and help texts
- replaced links to t7 models by links to similar onnx models in js_style_transfer turtorial (similar to https://github.com/opencv/opencv/pull/24245/files)
2023-10-26 11:27:56 +03:00
Kumataro
1911c63826
fix: supress GCC13 warnings (#24434)
* fix: supress GCC13 warnings

* fix for review and compile-warning on MacOS
2023-10-26 09:00:58 +03:00
Abduragim Shtanchaev
a3b3a589f9
Merge pull request #24322 from Abdurrahheem:ash/dev_einsum_ellips
Ellipses supported added for Einsum Layer #24322

This PR added addresses issues not covered in #24037. Namely these are: 
Test case for this patch is in this PR [#1106](https://github.com/opencv/opencv_extra/pull/1106) in opencv extra

Added: 
 - [x] Broadcasting reduction "...ii ->...I"
 - [x] Add lazy shape deduction. "...ij, ...jk->...ik"
 
 Features to add: 
- [ ] Add implicit output computation support. "bij,bjk ->" (output subscripts should be "bik")
- [ ] Add support for CUDA backend 
- [ ] BatchWiseMultiply optimize
- [ ] Performance test

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-10-24 16:47:00 +03:00
Alexander Smorkalov
97620c053f Merge branch 4.x 2023-10-23 11:53:04 +03:00
Amir Hassan
c2f909fc86
Merge pull request #23894 from kallaballa:blobFromImagesWithParams
Pertaining Issue: https://github.com/opencv/opencv/issues/5697

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-10-20 14:27:40 +03:00
Yuantao Feng
996b6c37c7
Merge pull request #24425 from fengyuentau:fix_timvx_test
dnn: fix HAVE_TIMVX macro definition in dnn test #24425

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-10-20 14:16:51 +03:00
Alexander Smorkalov
1c0ca41b6e
Merge pull request #24371 from hanliutong:clean-up
Clean up the obsolete API of Universal Intrinsic
2023-10-20 12:50:26 +03:00
andrewerf
b44cb33d2f
Merge pull request #21066 from andrewerf:21052-openvino-native-onnx
Native ONNX to Inference Engine backend #21066

Resolves #21052

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
- [x] The PR is proposed to proper branch
- [x] There is reference to original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
- [ ] The feature is well documented and sample code can be built with the project CMake
2023-10-20 11:49:27 +03:00
fengyuentau
f2ef81a179 fp16 support for gather elements 2023-10-19 14:44:12 +08:00
Kumataro
6e4280ea81
Merge pull request #24372 from Kumataro:fix24369
Supporting protobuf v22 and later(with abseil-cpp/C++17) #24372

fix https://github.com/opencv/opencv/issues/24369
related https://github.com/opencv/opencv/issues/23791

1. This patch supports external protobuf v22 and later, it required abseil-cpp and c++17.
    Even if the built-in protobuf is upgraded to v22 or later, 
    the dependency on abseil-cpp and the requirement for C++17 will continue.
2. Some test for caffe required patched protobuf, so this patch disable them.

This patch is tested by following libraries.
-  Protobuf:                    /usr/local/lib/libprotobuf.so (4.24.4)
-  abseil-cpp:                YES (20230125)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-10-19 08:45:08 +03:00
Aser Atawya
240b245105
Merge pull request #24092 from Aser-Abdelfatah:GSoC_Support_GatherElements_ONNX
GSoC Add ONNX Support for GatherElements #24092

Merge with: https://github.com/opencv/opencv_extra/pull/1082
Adds support to the ONNX operator GatherElements [operator docs](https://github.com/onnx/onnx/blob/main/docs/Operators.md#GatherElements)
Added tests to opencv_extra at pull request https://github.com/opencv/opencv_extra/pull/1082

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-10-18 10:41:47 +03:00
alexlyulkov
014e8485b5
Merge pull request #24367 from alexlyulkov:al/fixed-cumsum-inplace-flag
Fixed CumSum layer inplace flag #24367

When exclusive is false:
dst[i] = dst[i-1] + src[i]
When exclusive is true:
dst[i] = dst[i-1] + src[i-1]
So CumSum layer can be inplace only when exclusive flag is false.
2023-10-18 09:21:40 +03:00
Yuantao Feng
d789cb459c
Merge pull request #24231 from fengyuentau:halide_cleanup_5.x
dnn: cleanup of halide backend for 5.x #24231

Merge with https://github.com/opencv/opencv_extra/pull/1092.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-10-13 16:53:18 +03:00
Liutong HAN
a287605c3e Clean up the Universal Intrinsic API. 2023-10-13 19:23:30 +08:00
Yuantao Feng
0507043a55
Merge pull request #24386 from fengyuentau:fix_dtype_nary_eltwise
dnn: fix inconsistent input dtype for nary eltwise layers #24386

Resolves https://github.com/opencv/opencv/issues/24385
Merge with https://github.com/opencv/opencv_extra/pull/1107
Relates https://github.com/opencv/opencv/pull/24092#discussion_r1353964405

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-10-13 11:56:18 +03:00
Alexander Smorkalov
58285e5468
Merge pull request #24359 from asmorkalov:as/FastNeuralStyle_eccv16_tuning
Tuned threshold for FastNeuralStyle_eccv16 test
2023-10-13 10:29:41 +03:00
Yuantao Feng
590f150d5e
dnn: hotfixes for fast gemm (#24315)
* remove Conformance from test names

* integrate neon optimization into default

* quick fix: define CV_NEON_AARCH64 0 for non NEON platforms

* remove var batch that leads to memory leak

* put neon code back to fast_gemm_kernels.simd

* reorganize code to reduce duplicate code
2023-10-07 21:48:44 +03:00
Sean McBride
5fb3869775
Merge pull request #23109 from seanm:misc-warnings
* Fixed clang -Wnewline-eof warnings
* Fixed all trivial clang -Wextra-semi and -Wc++98-compat-extra-semi warnings
* Removed trailing semi from various macros
* Fixed various -Wunused-macros warnings
* Fixed some trivial -Wdocumentation warnings
* Fixed some -Wdocumentation-deprecated-sync warnings
* Fixed incorrect indentation
* Suppressed some clang warnings in 3rd party code
* Fixed QRCodeEncoder::Params documentation.

---------

Co-authored-by: Alexander Smorkalov <alexander.smorkalov@xperience.ai>
2023-10-06 13:33:21 +03:00
HAN Liutong
07bf9cb013
Merge pull request #24325 from hanliutong:rewrite
Rewrite Universal Intrinsic code: float related part #24325

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro: rewrite them by using the new Universal Intrinsic API.

The series of PRs is listed below:
#23885 First patch, an example
#23980 Core module
#24058 ImgProc module, part 1
#24132 ImgProc module, part 2
#24166 ImgProc module, part 3
#24301 Features2d and calib3d module
#24324 Gapi module

This patch (hopefully) is the last one in the series. 

This patch mainly involves 3 parts
1. Add some modifications related to float (CV_SIMD_64F)
2. Use `#if (CV_SIMD || CV_SIMD_SCALABLE)` instead of `#if CV_SIMD || CV_SIMD_SCALABLE`, 
    then we can get the `CV_SIMD` module that is not enabled for `CV_SIMD_SCALABLE` by looking for `if CV_SIMD`
3. Summary of `CV_SIMD` blocks that remains unmodified: Updated comments
    - Some blocks will cause test fail when enable for RVV, marked as `TODO: enable for CV_SIMD_SCALABLE, ....`
    - Some blocks can not be rewrited directly. (Not commented in the source code, just listed here)
      - ./modules/core/src/mathfuncs_core.simd.hpp (Vector type wrapped in class/struct)
      - ./modules/imgproc/src/color_lab.cpp (Array of vector type)
      - ./modules/imgproc/src/color_rgb.simd.hpp (Array of vector type)
      - ./modules/imgproc/src/sumpixels.simd.hpp (fixed length algorithm, strongly ralated with `CV_SIMD_WIDTH`)
      These algorithms will need to be redesigned to accommodate scalable backends.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2023-10-05 17:57:25 +03:00
Dmitry Kurtaev
2c92eb3175 Enable more tests for OpenVINO 2023.0 2023-10-05 12:51:55 +03:00
Alexander Smorkalov
33d64d0491 Tuned threshold for FastNeuralStyle_eccv16 test for systems without AVX2. 2023-10-04 16:19:13 +03:00
Wanli
62b5470b78
Merge pull request #24298 from WanliZhong:extend_perf_net_test
Extend performance test models #24298

**Merged With https://github.com/opencv/opencv_extra/pull/1095**

This PR aims to extend the performance tests. 

- **YOLOv5** for object detection
- **YOLOv8** for object detection
- **EfficientNet** for classification

Models from OpenCV Zoo:

- **YOLOX** for object detection
- **YuNet** for face detection
- **SFace** for face recognization
- **MPPalm** for palm detection
- **MPHand** for hand landmark
- **MPPose** for pose estimation
- **ViTTrack** for object tracking
- **PPOCRv3** for text detection
- **CRNN** for text recognization
- **PPHumanSeg** for human segmentation

If other models should be added, **please leave some comments**. Thanks!



Build opencv with script:
```shell
-DBUILD_opencv_python2=OFF
-DBUILD_opencv_python3=OFF
-DBUILD_opencv_gapi=OFF
-DINSTALL_PYTHON_EXAMPLES=OFF
-DINSTALL_C_EXAMPLES=OFF
-DBUILD_DOCS=OFF
-DBUILD_EXAMPLES=OFF
-DBUILD_ZLIB=OFF
-DWITH_FFMPEG=OFF
```



Performance Test on **Apple M2 CPU**
```shell
MacOS 14.0
8 threads
```

**1 thread:**
| Name of Test | 4.5.5-1th | 4.6.0-1th | 4.7.0-1th | 4.8.0-1th | 4.8.1-1th |
|--------------|:---------:|:---------:|:---------:|:---------:|:---------:|
| CRNN         |  76.244   |  76.611   |  62.534   |  57.678   |  57.238   |
| EfficientNet |    ---    |    ---    |  109.224  |  130.753  |  109.076  |
| MPHand       |    ---    |    ---    |  19.289   |  22.727   |  27.593   |
| MPPalm       |  47.150   |  47.061   |  41.064   |  65.598   |  40.109   |
| MPPose       |    ---    |    ---    |  26.592   |  32.022   |  26.956   |
| PPHumanSeg   |  41.672   |  41.790   |  27.819   |  27.212   |  30.461   |
| PPOCRv3      |    ---    |    ---    |  140.371  |  187.922  |  170.026  |
| SFace        |  43.830   |  43.834   |  27.575   |  30.653   |  26.387   |
| ViTTrack     |    ---    |    ---    |    ---    |  14.617   |  15.028   |
| YOLOX        | 1060.507  | 1061.361  |  495.816  |  533.309  |  549.713  |
| YOLOv5       |    ---    |    ---    |    ---    |  191.350  |  193.261  |
| YOLOv8       |    ---    |    ---    |  198.893  |  218.733  |  223.142  |
| YuNet        |  27.084   |  27.095   |  26.238   |  30.512   |  34.439   |
| MobileNet_SSD_Caffe         |  44.742   |  44.565   |  33.005   |  29.421   |  29.286   |
| MobileNet_SSD_v1_TensorFlow |  49.352   |  49.274   |  35.163   |  32.134   |  31.904   |
| MobileNet_SSD_v2_TensorFlow |  83.537   |  83.379   |  56.403   |  42.947   |  42.148   |
| ResNet_50                   |  148.872  |  148.817  |  77.331   |  67.682   |  67.760   |


**n threads:**
| Name of Test | 4.5.5-nth | 4.6.0-nth | 4.7.0-nth | 4.8.0-nth | 4.8.1-nth |
|--------------|:---------:|:---------:|:---------:|:---------:|:---------:|
| CRNN         |  44.262   |  44.408   |  41.540   |  40.731   |  41.151   |
| EfficientNet |    ---    |    ---    |  28.683   |  42.676   |  38.204   |
| MPHand       |    ---    |    ---    |   6.738   |  13.126   |   8.155   |
| MPPalm       |  16.613   |  16.588   |  12.477   |  31.370   |  17.048   |
| MPPose       |    ---    |    ---    |  12.985   |  19.700   |  16.537   |
| PPHumanSeg   |  14.993   |  15.133   |  13.438   |  15.269   |  15.252   |
| PPOCRv3      |    ---    |    ---    |  63.752   |  85.469   |  76.190   |
| SFace        |  10.685   |  10.822   |   8.127   |   8.318   |   7.934   |
| ViTTrack     |    ---    |    ---    |    ---    |  10.079   |   9.579   |
| YOLOX        |  417.358  |  422.977  |  230.036  |  234.662  |  228.555  |
| YOLOv5       |    ---    |    ---    |    ---    |  74.249   |  75.480   |
| YOLOv8       |    ---    |    ---    |  63.762   |  88.770   |  70.927   |
| YuNet        |   8.589   |   8.731   |  11.269   |  16.466   |  14.513   |
| MobileNet_SSD_Caffe         |  12.575   |  12.636   |  11.529   |  12.114   |  12.236   |
| MobileNet_SSD_v1_TensorFlow |  13.922   |  14.160   |  13.078   |  12.124   |  13.298   |
| MobileNet_SSD_v2_TensorFlow |  25.096   |  24.836   |  22.823   |  20.238   |  20.319   |
| ResNet_50                   |  41.561   |  41.296   |  29.092   |  30.412   |  29.339   |


Performance Test on [Intel Core i7-12700K](https://www.intel.com/content/www/us/en/products/sku/134594/intel-core-i712700k-processor-25m-cache-up-to-5-00-ghz/specifications.html)
```shell
Ubuntu 22.04.2 LTS
8 Performance-cores (3.60 GHz, turbo up to 4.90 GHz)
4 Efficient-cores (2.70 GHz, turbo up to 3.80 GHz)
20 threads
```


**1 thread:**
| Name of Test | 4.5.5-1th | 4.6.0-1th | 4.7.0-1th | 4.8.0-1th | 4.8.1-1th |
|--------------|:---------:|:---------:|:---------:|:---------:|:---------:|
| CRNN         |  16.752   |  16.851   |  16.840   |  16.625   |  16.663   |
| EfficientNet |    ---    |    ---    |  61.107   |  76.037   |  53.890   |
| MPHand       |    ---    |    ---    |   8.906   |   9.969   |   8.403   |
| MPPalm       |  24.243   |  24.638   |  18.104   |  35.140   |  18.387   |
| MPPose       |    ---    |    ---    |  12.322   |  16.515   |  12.355   |
| PPHumanSeg   |  15.249   |  15.303   |  10.203   |  10.298   |  10.353   |
| PPOCRv3      |    ---    |    ---    |  87.788   |  144.253  |  90.648   |
| SFace        |  15.583   |  15.884   |  13.957   |  13.298   |  13.284   |
| ViTTrack     |    ---    |    ---    |    ---    |  11.760   |  11.710   |
| YOLOX        |  324.927  |  325.173  |  235.986  |  253.653  |  254.472  |
| YOLOv5       |    ---    |    ---    |    ---    |  102.163  |  102.621  |
| YOLOv8       |    ---    |    ---    |  87.013   |  103.182  |  103.146  |
| YuNet        |  12.806   |  12.645   |  10.515   |  12.647   |  12.711   |
| MobileNet_SSD_Caffe         |  23.556   |  23.768   |  24.304   |  22.569   |  22.602   |
| MobileNet_SSD_v1_TensorFlow |  26.136   |  26.276   |  26.854   |  24.828   |  24.961   |
| MobileNet_SSD_v2_TensorFlow |  43.521   |  43.614   |  46.892   |  44.044   |  44.682   |
| ResNet_50                   |  73.588   |  73.501   |  75.191   |  66.893   |  65.144   |


**n thread:**
| Name of Test | 4.5.5-nth | 4.6.0-nth | 4.7.0-nth | 4.8.0-nth | 4.8.1-nth | 
|--------------|:---------:|:---------:|:---------:|:---------:|:---------:|
| CRNN         |   8.665   |   8.827   |  10.643   |   7.703   |   7.743   | 
| EfficientNet |    ---    |    ---    |  16.591   |  12.715   |   9.022   |   
| MPHand       |    ---    |    ---    |   2.678   |   2.785   |   1.680   |           
| MPPalm       |   5.309   |   5.319   |   3.822   |  10.568   |   4.467   |       
| MPPose       |    ---    |    ---    |   3.644   |   6.088   |   4.608   |        
| PPHumanSeg   |   4.756   |   4.865   |   5.084   |   5.179   |   5.148   |        
| PPOCRv3      |    ---    |    ---    |  32.023   |  50.591   |  32.414   |      
| SFace        |   3.838   |   3.980   |   4.629   |   3.145   |   3.155   |       
| ViTTrack     |    ---    |    ---    |    ---    |  10.335   |  10.357   |   
| YOLOX        |  68.314   |  68.081   |  82.801   |  74.219   |  73.970   |      
| YOLOv5       |    ---    |    ---    |    ---    |  47.150   |  47.523   |    
| YOLOv8       |    ---    |    ---    |  32.195   |  30.359   |  30.267   |    
| YuNet        |   2.604   |   2.644   |   2.622   |   3.278   |   3.349   |    
| MobileNet_SSD_Caffe         |  13.005   |   5.935   |   8.586   |   4.629   |   4.713   |
| MobileNet_SSD_v1_TensorFlow |   7.002   |   7.129   |   9.314   |   5.271   |   5.213   |
| MobileNet_SSD_v2_TensorFlow |  11.939   |  12.111   |  22.688   |  12.038   |  12.086   |
| ResNet_50                   |  18.227   |  18.600   |  26.150   |  15.584   |  15.706   |
2023-10-04 13:05:32 +03:00
alexlyulkov
9bd14d5417
Merge pull request #24353 from alexlyulkov:al/fixed-cumsum-layer
Fixed CumSum dnn layer #24353

Fixes #20110

The algorithm had several errors, so I rewrote it.
Also the layer didn't work with non constant axis tensor. Fixed it.
Enabled CumSum layer tests from ONNX conformance.
2023-10-03 13:58:25 +03:00
Alexander Smorkalov
163d544ecf Merge branch 4.x 2023-10-02 10:17:23 +03:00
Alexander Smorkalov
5caee5cc64 Fixed OpenCL PF16 fallback in Einsum layer. 2023-09-29 15:52:23 +03:00
Dmitry Kurtaev
c7ec0d599a
Merge pull request #23987 from dkurt:openvino_int8_backend
OpenVINO backend for INT8 models #23987

### Pull Request Readiness Checklist

TODO:
- [x] DetectionOutput layer (https://github.com/opencv/opencv/pull/24069)
- [x] Less FP32 fallbacks (i.e. Sigmoid, eltwise sum)
- [x] Accuracy, performance tests (https://github.com/opencv/opencv/pull/24039)
- [x] Single layer tests (convolution)
- [x] ~~Fixes for OpenVINO 2022.1 (https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/100334)~~


Performace results for object detection model `coco_efficientdet_lite0_v1_1.0_quant_2021_09_06.tflite`:
| backend | performance (median time) |
|---|---|
| OpenCV | 77.42ms |
| OpenVINO 2023.0 | 10.90ms |

CPU: `11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz`

Serialized model per-layer stats (note that Convolution should use `*_I8` primitives if they are quantized correctly): https://gist.github.com/dkurt/7772bbf1907035441bb5454f19f0feef

---

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-09-28 16:24:43 +03:00
Alexander Smorkalov
b8d4ac589d
Merge pull request #24334 from fengyuentau:fix_24319
dnn onnx: fix not-found constant indices for Gather if shared
2023-09-28 13:08:26 +03:00
fengyuentau
7fa0493ca0 init commit 2023-09-28 11:50:21 +08:00
Yuantao Feng
307324f4ac
Merge pull request #24283 from fengyuentau:halide_tests
dnn: merge tests from test_halide_layers to test_backends #24283

Context: https://github.com/opencv/opencv/pull/24231#pullrequestreview-1628649980

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-09-27 14:09:47 +03:00
Dmitry Kurtaev
2b6d0f36f0
Merge pull request #24309 from dkurt:gemm_ov_hotfix
Update OpenVINO init of new GEMM layer #24309

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

CI validation:

- [x] 2022.1.0: https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/100368
- [ ] 2021.4.2: https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/100373

Checklist:
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-09-27 10:25:45 +03:00
Yuantao Feng
bb171a0c05
dnn: expand refactor with cv::broadcast for onnx models (#24295)
* add expand impl with cv::broadcast

* remove expandMid

* deduce shape from -1

* add constant folding

* handle input constant; handle input constant 1d

* add expand conformance tests; add checks to disallow shape of neg values; add early copy for unchanged total elements

* fix ExpandSubgraph

* dummy commit to trigger build

* dummy commit to trigger build 1

* remove conformance from test names
2023-09-27 09:28:52 +03:00
Alexander Smorkalov
9942757bab
Merge pull request #24316 from alexlyulkov:al/fix-caffe-read-segfault
Fixed segfault when reading Caffe model
2023-09-25 17:53:54 +03:00
Alexander Lyulkov
72e7672a6c Fixed segfault when reading Caffe model 2023-09-25 12:55:11 +07:00
Abduragim Shtanchaev
865e7cacca
Merge pull request #24037 from Abdurrahheem:ash/dev_einsum
Add Support for Einsum Layer #24037

### This PR adding support for [Einsum Layer](https://pytorch.org/docs/stable/generated/torch.einsum.html) (in progress). 

This PR is currently not to be merged but only reviewed. Test cases are located in [#1090](https://github.com/opencv/opencv_extra/pull/1090)RP in OpenCV extra

**DONE**: 
 - [x] 2-5D GMM support added
 - [x] Matrix transpose support added
 - [x] Reduction type comupte  'ij->j'
 - [x] 2nd shape computation - during forward 

**Next PRs**:
- [ ] Broadcasting reduction "...ii ->...i"
- [ ] Add lazy shape deduction. "...ij, ...jk->...ik"
- [ ] Add implicit output computation support. "bij,bjk ->" (output subscripts should be "bik")
- [ ] Add support for CUDA backend 
- [ ] BatchWiseMultiply optimize

**Later in 5.x version (requires support for 1D matrices)**: 
- [ ] Add 1D vector multiplication support 
- [ ] Inter product "i, i" (problems with 1D shapes)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-09-22 11:25:02 +03:00
Vadim Pisarevsky
416bf3253d
attempt to add 0d/1d mat support to OpenCV (#23473)
* attempt to add 0d/1d mat support to OpenCV

* revised the patch; now 1D mat is treated as 1xN 2D mat rather than Nx1.

* a step towards 'green' tests

* another little step towards 'green' tests

* calib test failures seem to be fixed now

* more fixes _core & _dnn

* another step towards green ci; even 0D mat's (a.k.a. scalars) are now partly supported!

* * fixed strange bug in aruco/charuco detector, not sure why it did not work
* also fixed a few remaining failures (hopefully) in dnn & core

* disabled failing GAPI tests - too complex to dig into this compiler pipeline

* hopefully fixed java tests

* trying to fix some more tests

* quick followup fix

* continue to fix test failures and warnings

* quick followup fix

* trying to fix some more tests

* partly fixed support for 0D/scalar UMat's

* use updated parseReduce() from upstream

* trying to fix the remaining test failures

* fixed [ch]aruco tests in Python

* still trying to fix tests

* revert "fix" in dnn's CUDA tensor

* trying to fix dnn+CUDA test failures

* fixed 1D umat creation

* hopefully fixed remaining cuda test failures

* removed training whitespaces
2023-09-21 18:24:38 +03:00
Alexander Smorkalov
799bb0cd18
Merge pull request #24291 from visitorckw:fix-memory-leak
Fix memory leak and handle realloc failure
2023-09-20 08:49:56 +03:00
Yuantao Feng
8a96e34e33
dnn: add gemm_layer in place of fully_connected_layer for onnx models (#23897)
* first commit

* turned C from input to constant; force C constant in impl; better handling 0d/1d cases

* integrate with gemm from ficus nn

* fix const inputs

* adjust threshold for int8 tryQuantize

* adjust threshold for int8 quantized 2

* support batched gemm and matmul; tune threshold for rcnn_ilsvrc13; update googlenet

* add gemm perf against innerproduct

* add perf tests for innerproduct with bias

* fix perf

* add memset

* renamings for next step

* add dedicated perf gemm

* add innerproduct in perf_gemm

* remove gemm and innerproduct perf tests from perf_layer

* add perf cases for vit sizes; prepack constants

* remove batched gemm; fix wrong trans; optimize KC

* remove prepacking for const A; several fixes for const B prepacking

* add todos and gemm expression

* add optimized branch for avx/avx2

* trigger build

* update macros and signature

* update signature

* fix macro

* fix bugs for neon aarch64 & x64

* add backends: cuda, cann, inf_ngraph and vkcom

* fix cuda backend

* test commit for cuda

* test cuda backend

* remove debug message from cuda backend

* use cpu dispatcher

* fix neon macro undef in dispatcher

* fix dispatcher

* fix inner kernel for neon aarch64

* fix compiling issue on armv7; try fixing accuracy issue on other platforms

* broadcast C with beta multiplied; improve func namings

* fix bug for avx and avx2

* put all platform-specific kernels in dispatcher

* fix typos

* attempt to fix compile issues on x64

* run old gemm when neon, avx, avx2 are all not available; add kernel for armv7 neon

* fix typo

* quick fix: add macros for pack4

* quick fix: use vmlaq_f32 for armv7

* quick fix for missing macro of fast gemm pack f32 4

* disable conformance tests when optimized branches are not supported

* disable perf tests when optimized branches are not supported

* decouple cv_try_neon and cv_neon_aarch64

* drop googlenet_2023; add fastGemmBatched

* fix step in fastGemmBatched

* cpu: fix initialization ofb; gpu: support batch

* quick followup fix for cuda

* add default kernels

* quick followup fix to avoid macro redef

* optmized kernels for lasx

* resolve mis-alignment; remove comments

* tune performance for x64 platform

* tune performance for neon aarch64

* tune for armv7

* comment time consuming tests

* quick follow-up fix
2023-09-20 00:53:34 +03:00
Kuan-Wei Chiu
e16ca08b33 Fix memory leak and handle realloc failure
In the previous code, there was a memory leak issue where the
previously allocated memory was not freed upon a failed realloc
operation. This commit addresses the problem by releasing the old
memory before setting the pointer to NULL in case of a realloc failure.
This ensures that memory is properly managed and avoids potential
memory leaks.
2023-09-18 22:43:44 +08:00
Alexander Smorkalov
157b0e7760
Merge pull request #24275 from alexlyulkov:al/fix-tf-graph-simplifier
Fixed removePhaseSwitches in tf_graph_simplifier
2023-09-18 11:02:44 +03:00