Commit Graph

2336 Commits

Author SHA1 Message Date
Yuantao Feng
3f13ce797b
Merge pull request #25779 from fengyuentau:dnn/fix_onnx_depthtospace
dnn: add DepthToSpace and SpaceToDepth #25779

We are working on updating WeChat QRCode module. One of the new models is a fully convolutional model and hence it should be able to run with different input shapes. However,  it has an operator `DepthToSpace`, which is parsed as a subgraph of `Reshape -> Permute -> Reshape` with a fixed shape getting during parsing. The subgraph itself is not a problem, but the true problem is the subgraph with a fixed input and output shape regardless input changes. This does not allow the model to run with different input shapes.

Solution is to add a dedicated layer for DepthtoSpace and SpaceToDepth.

Backend support:

- [x] CPU
- [x] CUDA
- [x] OpenCL
- [x] OpenVINO
- [x] CANN
- [x] TIMVX
-  ~Vulkan~ (missing fundamental tools, like permutation and reshape)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-06-21 19:28:22 +03:00
Dmitry Kurtaev
3700f9e1e9
Merge pull request #25709 from dkurt:wrap_addLayer
* Wrap dnn addLayer
* Add typing stubs
2024-06-07 20:39:44 +03:00
Kumataro
1bd5ca1ebe
Merge pull request #25686 from Kumataro:fix25674
Suppress build warnings for GCC14 #25686

Close #25674

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-06-02 14:14:04 +03:00
CNOCycle
98b8825031
Merge pull request #25613 from CNOCycle:tflite/ops
Support Global_Pool_2D ops in .tflite model #25613

### Pull Request Readiness Checklist

**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1180

This PR adds support for `GlobalAveragePooling2D` and `GlobalMaxPool2D` on the TFlite backend. When the k`eep_dims` option is enabled, the output is a 2D tensor, necessitating the inclusion of an additional flatten layer. Additionally, the names of these layers have been updated to match the output tensor names generated by `generate.py` from the opencv_extra repository.

- [X] I agree to contribute to the project under Apache 2 License.
- [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [X] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [X] The feature is well documented and sample code can be built with the project CMake
2024-05-31 19:31:21 +03:00
Abduragim Shtanchaev
d7f04a9d33
Merge pull request #25660 from Abdurrahheem:ash/fix-slice-empty-input
Slice layer parser fix to support empty input case #25660

This PR fixes Slice Layer's parser to handle empty input cases (cases with initializer)
It fixed the issue rased in #24838

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-05-31 13:13:36 +03:00
Danial Javady
05e48605a0
Merge pull request #25412 from ZelboK:update-cudnn-to-9
Refactor DNN module to build with cudnn 9 #25412

A lot of APIs that are currently being used in the dnn module have been removed in cudnn 9. They were deprecated in 8. 
This PR updates said code accordingly to the newer API.

Some key notes:
1) This is my first PR. I am new to openCV. 
2) `opencv_test_core` tests pass
3) On a 3080, cuda 12.4(should be irrelevant since I didn't build the `opencv_modules`, gcc 11.4, WSL 2. 
4) For brevity I will avoid including macro code that will allow for older versions of cudnn to build.

I was unable to get the tests working for `opencv_test_dnn` and `opencv_perf_dnn`. The errors I get are of the following: 
```
 OpenCV tests: Can't find required data file: dnn/onnx/conformance/node/test_reduce_prod_default_axes_keepdims_example/model.onnx in function 'findData'
" thrown in the test body.
```
So before I spend more time investigating I was hoping to get a maintainer to point me in the right direction here. I would like to run these tests and confirm things are working as intended. I may have missed some details.


### Pull Request Readiness Checklist

relevant issue
(https://github.com/opencv/opencv/issues/24983

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-05-28 09:54:08 +03:00
Alexander Smorkalov
0b39a51be8 pre: OpenCV 4.10.0 (version++). 2024-05-21 11:37:05 +03:00
Alexander Smorkalov
5f509e2ec1 Skip Test_Caffe_layers.Concat with Vulkan due to sporadic failures. 2024-05-17 11:54:25 +03:00
Yuantao Feng
bc0618b688
Merge pull request #25582 from fengyuentau:dnn/dump_pbtxt
Current net exporter `dump` and `dumpToFile` exports the network structure (and its params) to a .dot file which works with `graphviz`. This is hard to use and not friendly to new user. What's worse, the produced picture is not looking pretty.
dnn: better net exporter that works with netron #25582

This PR introduces new exporter `dumpToPbtxt` and uses this new exporter by default with environment variable `OPENCV_DNN_NETWORK_DUMP`. It mimics the string output of a onnx model but modified with dnn-specific changes, see below for an example.

![image](https://github.com/opencv/opencv/assets/17219438/0644bed1-da71-4019-8466-88390698e4df)

## Usage

Call `cv::dnn::Net::dumpToPbtxt`:

```cpp
TEST(DumpNet, dumpToPbtxt) {
    std::string path = "/path/to/model.onnx";
    auto net = readNet(path);

    Mat input(std::vector<int>{1, 3, 640, 480}, CV_32F);
    net.setInput(input);

    net.dumpToPbtxt("yunet.pbtxt");
}
```

Set `export OPENCV_DNN_NETWORK_DUMP=1`

```cpp
TEST(DumpNet, env) {
    std::string path = "/path/to/model.onnx";
    auto net = readNet(path);

    Mat input(std::vector<int>{1, 3, 640, 480}, CV_32F);
    net.setInput(input);

    net.forward();
}
```

---

Note:
- `pbtxt` is registered as one of the ONNX model suffix in netron. So you can see `module: ai.onnx` and such in the model.
- We can get the string output of an ONNX model with the following script

```python
import onnx
net = onnx.load("/path/to/model.onnx")
net_str = str(net)
file = open("/path/to/model.pbtxt", "w")
file.write(net_str)
file.close()
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-05-17 11:07:05 +03:00
Alexander Smorkalov
78ed6de518
Merge pull request #25594 from LaurentBerger:I25587
typo
2024-05-16 08:46:56 +03:00
CNOCycle
7713c84465
Merge pull request #25297 from CNOCycle:tflite/transpose
Support Transpose op in TFlite #25297

**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1168

The purpose of this PR is to introduce support for the Transpose op in TFlite format and to add a shape comparison between the output tensors and the references. In some occasional cases, the shape of the output tensor is `[1,4,1,1]`, while the shape of the reference tensor is `[1,4]`. Consequently, the norm check incorrectly reports that the test has passed, as the residual is zero.

Below is a Python script for generating testing data. The generated data can be integrated into the repo `opencv_extra`.

```python
import numpy as np
import tensorflow as tf

PREFIX_TFL = '/path/to/opencv_extra/testdata/dnn/tflite/'

def generator(input_tensor, model, saved_name):

    # convert keras model to .tflite format
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    #converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.optimizations = [None]
    tflite_model = converter.convert()
    with open(f'{PREFIX_TFL}/{saved_name}.tflite', 'wb') as f:
        f.write(tflite_model)

    # save the input tensor to .npy
    if input_tensor.ndim == 4:
        opencv_tensor = np.transpose(input_tensor, (0,3,1,2))
    else:
        opencv_tensor = input_tensor
    opencv_tensor = np.copy(opencv_tensor, order='C').astype(np.float32)
    np.save(f'{PREFIX_TFL}/{saved_name}_inp.npy', opencv_tensor)

    # generate output tenosr and save it to .npy
    mat_out = model(input_tensor).numpy()
    mat_out = np.copy(mat_out, order='C').astype(np.float32)
    if mat_out.ndim == 4:
        mat_out = np.transpose(mat_out, (0,3,1,2))
    interpreter = tf.lite.Interpreter(model_content=tflite_model)
    out_name = interpreter.get_output_details()[0]['name']
    np.save(f'{PREFIX_TFL}/{saved_name}_out_{out_name}.npy', mat_out)

def build_transpose():

    model_name = "keras_permute"
    mat_in = np.array([[[1,2,3], [4,5,6]]], dtype=np.float32)

    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(2,3)))
    model.add(tf.keras.layers.Permute((2,1)))
    model.summary()

    generator(mat_in, model, model_name)

if __name__ == '__main__':
    build_transpose()
```

### Pull Request Readiness Checklist

- [x] I agree to contribute to the project under Apache 2 License.
- [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [X] The feature is well documented and sample code can be built with the project CMake
2024-05-15 20:07:25 +03:00
unknown
5009109167 typo 2024-05-15 16:16:07 +02:00
Laurent Berger
76d9f7aaeb
Merge pull request #25591 from LaurentBerger:I25589
Remove dnn::layer::allocate in doc #25591

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work #25589 
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-05-15 17:08:52 +03:00
alexlyulkov
03507e06b4
Merge pull request #25518 from alexlyulkov:al/fixed-gemm-openvino
Fixed OpenVINO gemm layer #25518

Fixed OpenVINO gemm layer
The problem was that our layer didn't properly handle all the possible gemm options in OpenVINO mode
Fixes #25472

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-05-14 17:41:19 +03:00
Alexander Smorkalov
d8e18f4576 Made fcn-resnet50-12.onnx model optional. 2024-05-03 16:14:22 +03:00
Alexander Smorkalov
ac9a858377
Merge pull request #25524 from alexlyulkov:al/openvino-layers
Added more OpenVINO layers to dnn
2024-05-03 13:16:56 +03:00
Wanli
ed47cce1c5 change fcn8s-heavy-pascal tests from caffe to onnx 2024-05-03 00:15:09 +08:00
Alexander Lyulkov
f3f29fa62c Added more OpenVINO layers to dnn 2024-05-02 14:37:40 +03:00
alexlyulkov
f9dd20eb07
Merge pull request #25414 from alexlyulkov:al/range-fixed
Fixed ONNX range layer #25414

Partially address https://github.com/opencv/opencv/issues/25363
Fixed ONNX range layer. It should support any input type.
Added tests (extra [PR](https://github.com/opencv/opencv_extra/pull/1170))

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-04-17 09:38:21 +03:00
Alexander Smorkalov
ecbfc1bfd8
Merge pull request #25395 from susumu-iino:fix-dnn-plugin-build-win32
Fix dnn plugin build win32
2024-04-12 11:05:34 +03:00
Yuantao Feng
197626a5bf
Merge pull request #25387 from fengyuentau:complete-float16_t-renaming
Rename remaining float16_t for future proof #25387

Resolves comment: https://github.com/opencv/opencv/pull/25217#discussion_r1547733187.

`std::float16_t` and `std::bfloat16_t` are introduced since c++23: https://en.cppreference.com/w/cpp/types/floating-point.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-04-11 14:02:44 +03:00
Alexander Smorkalov
e4677fbf64
Merge pull request #25361 from hanliutong:rvv-f32
Further optimize fastDepthwiseConv for RISC-V Vector.
2024-04-09 16:04:02 +03:00
ecchen
e63690a2d9 Add a shape checker for tflite models 2024-04-08 13:28:05 +00:00
Susumu IINO
a0b28f8b06 Add Definition "_USE_MATH_DEFINES" for dnn plugin on Win32 build 2024-04-07 21:08:09 +09:00
Liutong HAN
5be158a2b6 Further optimize fastDepthwiseConv for RVV. 2024-04-07 11:34:41 +08:00
Yuantao Feng
55d7e3f8cc
Merge pull request #1165 from fengyuentau:gold_yolo
[BugFix] dnn (ONNX): Foce dropping constant inputs in parseClip if they are shared #25319

Resolves https://github.com/opencv/opencv/issues/25278
Merge with https://github.com/opencv/opencv_extra/pull/1165

In Gold-YOLO ,`Div` has a constant input `B=6` which is then parsed into a `Const` layer in the ONNX importer, but `Clip` also has the shared constant input `max=6` which is already a `Const` layer and then connected to `Elementwise` layer. This should not happen because in the `forward()` of `Elementwise` layer, the legacy code goes through and apply activation to each input. More details on https://github.com/opencv/opencv/issues/25278#issuecomment-2032199630.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-04-03 15:56:59 +03:00
Dmitry Kurtaev
13c95efa74
Merge pull request #25312 from dkurt:dnn_hotfix_tflite
Ownership check in TFLite importer #25312

### Pull Request Readiness Checklist

resolves https://github.com/opencv/opencv/issues/25310

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-04-03 09:41:40 +03:00
HAN Liutong
eba158fb0c
Merge pull request #25230 from hanliutong/rvv-conv
Optimize int8 layers in DNN modules by using RISC-V Vector intrinsic. #25230

This patch optimize 3 functions in the int8 layer by using RVV Native Intrinsic.

This patch was tested on QEMU using VLEN=128 and VLEN=256 on `./bin/opencv_test_dnn --gtest_filter="*Int8*"`;
On the real device (k230, VLEN=128), `EfficientDet_int8` in `opencv_perf_dnn` showed a performance improvement of 1.46x.

| Name of Test                               |  Original | optimized | Speed-up |
| ------------------------------------------ | -------- | ---------- | -------- |
| EfficientDet_int8::DNNTestNetwork::OCV/CPU | 2843.467 | 1947.013   | 1.46     |


### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-03-31 16:47:06 +03:00
Yuantao Feng
b758897c29
Merge pull request #25271 from fengyuentau:matmul_bias
Merge with https://github.com/opencv/opencv_extra/pull/1158

Todo:

- [x] Fix Attention pattern recognition.
- [x] Handle other backends.

Benchmark:

"VIT_B_32 OCV/CPU", M1, results in milliseconds.

| Model | 4.x | This PR |
| - | - | - |
| VIT_B_32 OCV/CPU | 87.66 | **83.83** |


### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-03-29 17:35:23 +03:00
Alexander Smorkalov
9fc4b61074
Merge pull request #25291 from dkurt:einsum_openvino
Einsum OpenVINO backend
2024-03-29 15:54:26 +03:00
Dmitry Kurtaev
cfa42e4338 Einsum OpenVINO backend 2024-03-29 14:29:45 +03:00
Dmitry Kurtaev
01dc010436
Merge pull request #25273 from dkurt:tflite_new_layers
TFLite new layers #25273

### Pull Request Readiness Checklist

resolves https://github.com/opencv/opencv/issues/25272, https://github.com/opencv/opencv/issues/24965

**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1160

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-03-29 11:21:13 +03:00
Yuantao Feng
accf200408
Merge pull request #25238 from fengyuentau:optimized_const
dnn: avoid const layer forwarding in layer norm layer and attention layer #25238

While profiling ViTs with dnn, I found `ConstLayer` can take a proportion of the inference time, which is weird. This comes from the data copy during the inference of `ConstLayer`. There is a chance that we can improve the efficiency of data copying but the easiest and most convenient way is to avoid `ConstLayer`. This PR change the way how we handle constants in layer normalization layer and attention layer, which is storing in the layer blobs instead of making constant layers for them.

Checklists:

- [x] Backend compatibility in layer normalization layer.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-03-26 15:09:51 +03:00
Alexander Smorkalov
fc34554475
Merge pull request #25184 from dkurt:avoid_extra_memset
Avoid extra memset
2024-03-25 13:07:49 +03:00
Yuantao Feng
025e7602b9
Merge pull request #25166 from fengyuentau:fix_cann_gemm
dnn (CANN): Fix incorrect shape of 1d bias in Gemm #25166

Gemm layer was refactored some time ago. Users found that the mobilenet example in https://github.com/opencv/opencv/wiki/Huawei-CANN-Backend does not work because of incorrect shape set for 1d bias in Gemm. This PR resolves this issue.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-03-25 09:47:28 +03:00
Dmitry Kurtaev
0b6c9a2123
Merge pull request #25181 from dkurt:release_conv_weights
Release convolution weightsMat after usage #25181

### Pull Request Readiness Checklist

related (but not resolved): https://github.com/opencv/opencv/issues/24134

Minor memory footprint improvement. Also, adds a test for VmHWM.

RAM top memory usage (-230MB)

| YOLOv3 (237MB file) |   4.x   |    PR   |
|---------------------|---------|---------|
| no winograd         | 808 MB  | 581 MB  |
| winograd            | 1985 MB | 1750 MB |

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-03-25 09:03:28 +03:00
Oleg Pipikin
6da2ddcf0e Fix for OpenVINO 2024.0
Remove support OpenVINO lower than 2022.1 release
Remove legacy InferenceEngine wrappers
2024-03-18 15:05:50 +04:00
Dmitry Kurtaev
6a370ba9e7 Avoid extra memset in convolution initialization 2024-03-08 10:46:07 +03:00
Dmitry Kurtaev
98aed21dd4 Avoid copy of ONNX graph during import 2024-03-05 18:22:46 +03:00
Alexander Smorkalov
daa8f7dfc6 Partially back-port #25075 to 4.x 2024-03-05 12:15:39 +03:00
Laurent Berger
5fe3933346
Merge pull request #25120 from LaurentBerger:I25103
Fixed ReduceMean layer behaviour #25120

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

a93c31e3c9/onnxruntime/core/providers/cpu/reduction/reduction_ops.cc (L433-L443)
2024-03-04 09:36:53 +03:00
CSBVision
e8582f2cf8 Update net_impl.cpp
See issue #25112
2024-03-01 14:56:00 +01:00
Yuantao Feng
5aa5c39210
Merge pull request #25076 from fengyuentau:improve_attention
dnn: try improving performance of Attention layer #25076

Checklist:

- [x] Use `Mat` over `Mat::zeros` for temporary buffer in forward
- [x] Use layer internal buffer over temporary Mat buffer
- [x] Try a single fastGemmBatch on the Q/K/V calculation

Performance:

Performance test case is `Layer_Attention.VisionTransformer/0`, which has input of shape {1, 197, 768}, weight of shape {768, 2304} and bias {2304}.

Data is in millisecond.

| | macOS 14.2.1, Apple M1 | Ubuntu 22.04.2, Intel i7 12700K |
| - | - | - |
| Current | 10.96 | 1.58 |
| w/ Mat | 6.27 | 1.41 |
| w/ Internals | 5.87 | 1.38 |
| w/ fastGemmBatch | 6.12 | 2.14 |


### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-02-28 16:47:08 +03:00
Laurent Berger
3c712cf77d
Merge pull request #25100 from LaurentBerger:I25077
Fix issue #25077 #25100

Fixes https://github.com/opencv/opencv/issues/25077

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-02-27 14:15:11 +03:00
Dhanwanth1803
12aa0fe898
Merge pull request #24985 from Dhanwanth1803:hardswish
Fixes #24974 support HardSwishInt8 #24985

As given very clearly in the issue #24974 I made the required 2 changes to implement HardSwish Layer in INT8. Requesting comments.

resolves https://github.com/opencv/opencv/issues/24974

- [X] I agree to contribute to the project under Apache 2 License.
- [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [X] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

Co-authored-by: Dhanwanth1803 <dhanwanthvarala@gmail,com>
2024-02-16 18:19:29 +03:00
Alexander Smorkalov
4b35b2f968
Merge pull request #24973 from asmorkalov:as/fix_weigths_proto_mess
Fix proto and weights mess in dnn performance tests
2024-02-07 11:10:32 +03:00
Alexander Smorkalov
77af137285 Fix proto and weights mess in dnn performance tests. 2024-02-07 09:16:09 +03:00
fengyuentau
fcaa8ce3c2 fix incorrect steps and elemsize when dtype changes 2024-02-06 16:27:25 +08:00
Haosonn
87f749277d
Merge pull request #24768 from Haosonn:pre-pr-2
Vulkan backend for NaryEltwiseLayer in DNN module #24768

We improve Vulkan backend for ``NaryEltwiseLayer`` in DNN module by:

- add a basic framework for Vulkan backend in ``NaryEltwiseLayer``
- add a compute shader for binary forwarding (an imitation of what has been done in native OpenCV backend including broadcasting and eltwise-operation)
- typo fixed:
  - Wrong info output in ``context.cpp``

Currently, our implementation (or all layers supporting Vulkan backend) runs pretty slow on discrete GPUs basically due to IO cost in function ``copyToHost``, and we are going to fix that by

- find out the best ``VkMemoryProperty`` for various discrete GPUs

- prevent ``copyToHost`` in middle layers during forwarding, (i.e keep data in GPU memory)
### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

Co-authored-by: IskXCr <IskXCr@outlook.com>
2024-01-29 18:41:49 +03:00
Alexander Alekhin
efc9837df1
Merge pull request #24892 from opencv-pushbot:gitee/alalek/dnn_avoid_16s_usage
DNN: avoid CV_16S usage for FP16 #24892

**Merge after**: #24918

TODO:
- [x] measure performance changes
- [x] optimize convertTo for OpenCL: #24918

12700K iGPU:

|Name of Test|0|1|1 vs 0 (x-factor)|
|---|:-:|:-:|:-:|
|AlexNet::DNNTestNetwork::OCV/OCL_FP16|7.441|7.480|0.99|
|CRNN::DNNTestNetwork::OCV/OCL_FP16|10.776|10.736|1.00|
|DenseNet_121::DNNTestNetwork::OCV/OCL_FP16|52.762|52.833|1.00|
|EAST_text_detection::DNNTestNetwork::OCV/OCL_FP16|60.694|60.721|1.00|
|EfficientNet::DNNTestNetwork::OCV/OCL_FP16|33.373|33.173|1.01|
|FastNeuralStyle_eccv16::DNNTestNetwork::OCV/OCL_FP16|81.840|81.724|1.00|
|GoogLeNet::DNNTestNetwork::OCV/OCL_FP16|20.965|20.927|1.00|
|Inception_5h::DNNTestNetwork::OCV/OCL_FP16|22.204|22.173|1.00|
|Inception_v2_SSD_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|47.115|47.460|0.99|
|MPHand::DNNTestNetwork::OCV/OCL_FP16|6.760|6.670|1.01|
|MPPalm::DNNTestNetwork::OCV/OCL_FP16|10.188|10.171|1.00|
|MPPose::DNNTestNetwork::OCV/OCL_FP16|12.510|12.561|1.00|
|MobileNet_SSD_Caffe::DNNTestNetwork::OCV/OCL_FP16|17.290|17.072|1.01|
|MobileNet_SSD_v1_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|19.473|19.306|1.01|
|MobileNet_SSD_v2_TensorFlow::DNNTestNetwork::OCV/OCL_FP16|22.874|23.404|0.98|
|OpenFace::DNNTestNetwork::OCV/OCL_FP16|9.568|9.517|1.01|
|OpenPose_pose_mpi_faster_4_stages::DNNTestNetwork::OCV/OCL_FP16|539.899|539.845|1.00|
|PPHumanSeg::DNNTestNetwork::OCV/OCL_FP16|18.015|18.769|0.96|
|PPOCRv3::DNNTestNetwork::OCV/OCL_FP16|63.122|63.540|0.99|
|ResNet_50::DNNTestNetwork::OCV/OCL_FP16|34.947|34.925|1.00|
|SFace::DNNTestNetwork::OCV/OCL_FP16|10.249|10.206|1.00|
|SSD::DNNTestNetwork::OCV/OCL_FP16|213.068|213.108|1.00|
|SqueezeNet_v1_1::DNNTestNetwork::OCV/OCL_FP16|4.867|4.878|1.00|
|VIT_B_32::DNNTestNetwork::OCV/OCL_FP16|200.563|190.788|1.05|
|VitTrack::DNNTestNetwork::OCV/OCL_FP16|7.528|7.173|1.05|
|YOLOX::DNNTestNetwork::OCV/OCL_FP16|132.858|132.701|1.00|
|YOLOv3::DNNTestNetwork::OCV/OCL_FP16|209.559|208.809|1.00|
|YOLOv4::DNNTestNetwork::OCV/OCL_FP16|221.357|220.924|1.00|
|YOLOv4_tiny::DNNTestNetwork::OCV/OCL_FP16|24.446|24.382|1.00|
|YOLOv5::DNNTestNetwork::OCV/OCL_FP16|43.922|44.080|1.00|
|YOLOv8::DNNTestNetwork::OCV/OCL_FP16|64.159|63.842|1.00|
|YuNet::DNNTestNetwork::OCV/OCL_FP16|10.177|10.231|0.99|
|opencv_face_detector::DNNTestNetwork::OCV/OCL_FP16|15.121|15.445|0.98|

Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
2024-01-26 16:34:17 +03:00
Yuantao Feng
37156a4719
Merge pull request #24925 from fengyuentau:loongarch_handle_warnings
Handle warnings in loongson-related code #24925

See https://github.com/fengyuentau/opencv/actions/runs/7665377694/job/20891162958#step:14:16

Warnings needs to be handled before we add the loongson server to our CI.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-01-26 13:38:00 +03:00
Sean McBride
e64857c561
Merge pull request #23736 from seanm:c++11-simplifications
Removed all pre-C++11 code, workarounds, and branches #23736

This removes a bunch of pre-C++11 workrarounds that are no longer necessary as C++11 is now required.
It is a nice clean up and simplification.

* No longer unconditionally #include <array> in cvdef.h, include explicitly where needed
* Removed deprecated CV_NODISCARD, already unused in the codebase
* Removed some pre-C++11 workarounds, and simplified some backwards compat defines
* Removed CV_CXX_STD_ARRAY
* Removed CV_CXX_MOVE_SEMANTICS and CV_CXX_MOVE
* Removed all tests of CV_CXX11, now assume it's always true. This allowed removing a lot of dead code.
* Updated some documentation consequently.
* Removed all tests of CV_CXX11, now assume it's always true
* Fixed links.

---------

Co-authored-by: Maksim Shabunin <maksim.shabunin@gmail.com>
Co-authored-by: Alexander Smorkalov <alexander.smorkalov@xperience.ai>
2024-01-19 16:53:08 +03:00
fengyuentau
d269de0a03 initial commit 2024-01-18 11:17:50 +08:00
Alexander Smorkalov
d1e4bd8543
Merge pull request #24809 from Abdurrahheem:ash/yolo-nas-test
Added test for YOLO NAS
2024-01-17 20:36:15 +03:00
Alexander Smorkalov
ac4c0bffac
Merge pull request #24813 from fengyuentau:speedup_scatter
dnn: improve scatter and scatterND speed with multi-threading
2024-01-17 17:16:50 +03:00
Abduragim
d30bf1bc3c added test for yolo nas 2024-01-17 13:01:43 +03:00
Alexander Smorkalov
84bb1cda4e
Merge pull request #24865 from asmorkalov:as/dnn_concat_assert
Normalize axis parameter in DNN Concat to handle negative values
2024-01-16 14:39:28 +03:00
Alexander Smorkalov
26cf82a56c Normalize axis parameter in DNN Concat to handle negative values. 2024-01-16 12:22:22 +03:00
Alexander Smorkalov
99c86bb40c
Merge pull request #24556 from plctlab:rvp
Optimization based on RISC-V P Packed SIMD Extension v0.5.2
2024-01-16 11:36:31 +03:00
Alexander Smorkalov
68dc02e302
Merge pull request #24858 from Dhanwanth1803:avx-fix
Use AVX2 overload instread on AVX in AVX2 scope
2024-01-16 09:14:31 +03:00
Dhanwanth1803
a289eba357 Fixes #24677 2024-01-13 09:56:56 +05:30
Stefan Dragnev
2791bb7062
Merge pull request #24773 from tailsu:sd/pathlike
python: accept path-like objects wherever file names are expected #24773

Merry Christmas, all 🎄

Implements #15731

Support is enabled for all arguments named `filename` or `filepath` (case-insensitive), or annotated with `CV_WRAP_FILE_PATH`.

Support is based on `PyOS_FSPath`, which is available in Python 3.6+. When running on older Python versions the arguments must have a `str` value as before.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-01-12 16:23:05 +03:00
jimmylaw21
a7fa1e6f4b
Merge pull request #24610 from jimmylaw21:dnn-onnx-add-group-norm-layer
dnn onnx: add group norm layer #24610

dnn onnx: add group norm layer

Todo:

- [x] speed up by multi-threading
- [x] add perf
- [x] add backend: OpenVINO
- [x] add backend: CUDA
- [x] add backend: OpenCL (no fp16)
- [ ] add backend: CANN

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>
2024-01-12 15:13:26 +03:00
Alexander Smorkalov
97c418ab86
Merge pull request #24840 from fengyuentau:ocl_innerproduct
dnn (opencl): integrate bias handling in the inner product opencl kernel
2024-01-12 15:10:16 +03:00
Abduragim Shtanchaev
c923c59833
Merge pull request #24812 from Abdurrahheem:ash/einsum_bachedGemm
Replace interactive batched Matrix Multiply. #24812

This PR replaces iterative batch matrix multiplication which `FastGemmBatch` in Einsum layer.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-01-12 14:23:43 +03:00
Yuantao Feng
e7ccff9805
Merge pull request #24834 from fengyuentau:cuda_naryeltwise_broadcast
dnn (cuda): support broadcasting if a.rank() != b.rank() #24834

Inspired by https://github.com/opencv/opencv/pull/24786. This PR keeps the fusion of `NaryEltwise` and `Concat` while addressed the data missing problem via supporting broadcasting if a.rank() != b.rank().

Resolves https://github.com/opencv/opencv/issues/23977
Resolves https://github.com/opencv/opencv/issues/24606
Resolves https://github.com/opencv/opencv/issues/24635
Resolves https://github.com/opencv/opencv/issues/24721 

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-01-11 10:04:46 +03:00
fengyuentau
83acb656f1 integrate bias handling in ocl kernel 2024-01-11 11:15:17 +08:00
Yuantao Feng
7fb336322d
Merge pull request #24808 from fengyuentau:fix_layernorm
dnn: no layer norm fusion if axes.back() is not the axis of last dimension #24808

Merge with https://github.com/opencv/opencv_extra/pull/1137

Resolves https://github.com/opencv/opencv/issues/24797

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-01-10 13:01:00 +03:00
Yuantao Feng
c955564cb3
Merge pull request #24765 from fengyuentau:mod_operator
dnn onnx: add mod #24765

Resolves https://github.com/opencv/opencv/issues/23174

TODO:

- [x] enable some conformance tests
- [x] add backends
    - [x] CANN
    - [x] OpenVINO
    - [x] CUDA

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-01-09 19:00:17 +03:00
fengyuentau
13127365e2 better comment 2024-01-08 11:55:06 +08:00
Yuantao Feng
b7d70613e4 fix failed assertion in debug build 2024-01-05 18:33:01 +00:00
fengyuentau
2ed97b9ef3 multi-threaded scatterND and refactor perf 2024-01-05 18:15:59 +08:00
fengyuentau
2997b4c5fe pretty format 2024-01-05 18:15:27 +08:00
fengyuentau
63cde0b90d multi-threaded scatter and refactor perf 2024-01-05 17:24:09 +08:00
Abduragim Shtanchaev
3b26e183cb changed weights of yolov7 2023-12-28 23:03:47 +03:00
cudawarped
7d681cf80d build: first class cuda support 2023-12-26 09:39:18 +03:00
Alexander Smorkalov
62f1a7410d
Merge pull request #24766 from asmorkalov:update_version_4.9.0-pre
pre: OpenCV 4.9.0 (version++)
2023-12-25 16:04:53 +03:00
Alexander Smorkalov
b407c58b96 pre: OpenCV 4.9.0 (version++). 2023-12-25 15:20:10 +03:00
Yuantao Feng
f978c99523
Merge pull request #24753 from fengyuentau:einsum_importer
dnn onnx: support constaint inputs in einsum importer #24753 

Merge with https://github.com/opencv/opencv_extra/pull/1132.

Resolves https://github.com/opencv/opencv/issues/24697

Credits to @LaurentBerger.

---

This is a workaround. I suggest to get input shapes and calculate the output shapes in `getMemoryShapes` so as to keep the best compatibility. It is not always robust getting shapes during the importer stage and we should avoid that as much as possible.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-12-25 14:42:05 +03:00
Alexander Alekhin
f49b26182b dnn(test): skip very long debug tests, reduce test time 2023-12-25 08:44:06 +00:00
Alexander Alekhin
96b894e0e1 Merge pull request #24761 from opencv-pushbot:gitee/alalek/test_skip_update_win32 2023-12-25 08:27:30 +00:00
Alexander Alekhin
f8502d45f9 dnn(test): skip tests on 32-bit Windows 2023-12-25 07:23:45 +00:00
Alexander Smorkalov
953dddd26b
Merge pull request #24747 from asmorkalov:as/tune_vitb_cuda
Increate Vit_b test threshold a bit for CUDA FP16.
2023-12-22 17:04:46 +03:00
Dmitry Kurtaev
938bc4d503 [CUDA] Hotfix Scale with 1 parameter 2023-12-22 15:49:27 +03:00
Dhanwanth1803
027aee8ad4
Merge pull request #24384 from Dhanwanth1803:feat-crop
Fixes #22747. Support [crop] configuration for DarkNet #24384

Request for comments. This is my first PR. 

**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1112

resolves https://github.com/opencv/opencv/issues/22747

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2023-12-22 14:55:01 +03:00
Alexander Smorkalov
53cd921ab4 Increate Vit_b test threshold a bit for CUDA FP16. 2023-12-22 13:37:44 +03:00
Vadim Pisarevsky
853e5dfcdf
Merge pull request #24709 from vpisarev:winograd_mode
Try to enable Winograd by default in FP32 mode and disable it by default in FP16 mode #24709

Hopefully, it will resolve regressions since 4.8.1 (see also https://github.com/opencv/opencv/pull/24587)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2023-12-22 09:22:31 +03:00
Alexander Smorkalov
35e2ef8019
Merge pull request #24740 from opencv-pushbot:gitee/alalek/ocl_fix_kernel_compilation
ocl: fix kernels compilation
2023-12-22 09:20:47 +03:00
Alexander Smorkalov
f5d8245801
Merge pull request #24736 from opencv-pushbot:gitee/alalek/issue_24734
dnn(ocl): don't try KERNEL_TYPE_GEMM_LIKE with kernel_w > 16
2023-12-21 20:01:01 +03:00
Alexander Alekhin
3340c71a2a ocl: fix kernels compilation 2023-12-21 14:29:23 +00:00
Alexander Alekhin
c9bb92d58b dnn(test): tune FP16 test tolerance 2023-12-21 13:39:05 +00:00
Alexander Alekhin
99c94d3d83 dnn(ocl): don't try KERNEL_TYPE_GEMM_LIKE with kernel_w > 16
- OpenCL kernel code doesn't support that
2023-12-21 13:30:57 +00:00
llh721113
a30c987f87 feat: RVP052 Optimization for DNN int8layers 2023-12-21 14:51:41 +08:00
Yuantao Feng
0521a3a384
Merge pull request #24476 from fengyuentau:attention_layer
dnn: add attention layer #24476

Resolves #24609

Merge with: https://github.com/opencv/opencv_extra/pull/1128.

Attention operator spec from onnxruntime: https://github.com/microsoft/onnxruntime/blob/v1.16.1/docs/ContribOperators.md#com.microsoft.Attention.

TODO:
- [x] benchmark (before this PR vs. with this PR vs. ORT).
- [x] Layer fusion: Take care Slice with end=INT64_MAX.
- [x] Layer fusion: match more potential attention (VIT) patterns.
    - [x] Single-head attention is supported.
- [x] Test AttentionSubgraph fusion.
- [x] Add acc tests for VIT_B_32 and VitTrack
- [x] Add perf tests for VIT_B_32 and VitTrack

## Benchmarks

Platform: Macbook Air M1.

### Attention Subgraph

Input scale: [1, 197, 768].

|                        | mean (ms) | median (ms) | min (ms) |
| ---------------------- | --------- | ----------- | -------- |
| w/ Attention (this PR) | 3.75      | 3.68        | 3.22     |
| w/o Attention          | 9.06      | 9.01        | 8.24     |
| ORT (python)           | 4.32      | 2.63        | 2.50     |

### ViTs

All data in millisecond (ms).

| ViTs     | With Attention | Without Attention | ORT    |
| -------- | -------------- | ----------------- | ------ |
| vit_b_16 | 302.77         | 365.35            | 109.70 |
| vit_b_32 | 89.92          | 116.22            | 30.36  |
| vit_l_16 | 1593.32        | 1730.74           | 419.92 |
| vit_l_32 | 468.11         | 577.41            | 134.12 |
| VitTrack | 3.80           | 3.87              | 2.25   |

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-12-20 19:35:07 +03:00
Laurent Berger
3e6dcdc0a4
Merge pull request #24539 from LaurentBerger:blobrecttoimage
Add blobrecttoimage #24539

### Pull Request Readiness Checklist

resolves https://github.com/opencv/opencv/issues/14659

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work #14659
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-12-19 20:00:04 +03:00
Yuantao Feng
fa5ed62a66
Merge pull request #24694 from fengyuentau:matmul_refactor
dnn: refactor ONNX MatMul with fastGemm #24694

Done:
- [x] add backends
    - [x] CUDA
    - [x] OpenVINO
    - [x] CANN
    - [x] OpenCL
    - [x] Vulkan
- [x] add perf tests
- [x] const B case

### Benchmark

Tests are done on M1. All data is in milliseconds (ms).

| Configuration | MatMul (Prepacked) | MatMul | InnerProduct |
| - | - | - | - |
| A=[12, 197, 197], B=[12, 197, 64], trans_a=0, trans_b=0 | **0.39** | 0.41 | 1.33 |
| A=[12, 197, 64], B=[12, 64, 197], trans_a=0, trans_b=0  | **0.42** | 0.42 | 1.17 |
| A=[12, 50, 64], B=[12, 64, 50], trans_a=0, trans_b=0    | **0.13** | 0.15 | 0.33 |
| A=[12, 50, 50], B=[12, 50, 64], trans_a=0, trans_b=0    | **0.11** | 0.13 | 0.22 |
| A=[16, 197, 197], B=[16, 197, 64], trans_a=0, trans_b=0 | **0.46** | 0.54 | 1.46 |
| A=[16, 197, 64], B=[16, 64, 197], trans_a=0, trans_b=0  | **0.46** | 0.95 | 1.74 |
| A=[16, 50, 64], B=[16, 64, 50], trans_a=0, trans_b=0    | **0.18** | 0.32 | 0.43 |
| A=[16, 50, 50], B=[16, 50, 64], trans_a=0, trans_b=0    | **0.15** | 0.25 | 0.25 |

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-12-19 19:36:41 +03:00
Wanli
6ae1709c6a
Merge pull request #24613 from WanliZhong:softmax_default_axis
Make default axis of softmax in onnx "-1" without opset option #24613

Try to solve problem: https://github.com/opencv/opencv/pull/24476#discussion_r1404821158

**ONNX**
`opset <= 11` use 1
`else` use -1

**TensorFlow**
`TF version = 2.x` use -1
`else` use 1

**Darknet, Caffe, Torch**
use 1 by definition
2023-12-15 10:41:42 +03:00
Wanli
9bbc890d96
Merge pull request #24681 from WanliZhong:err_armv8
Fixed armv8 compilation warnings #24681 

Fixes the following warning on  armv8:
```
warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
```
Buildbot: https://pullrequest.opencv.org/buildbot/builders/4_x_ARMv8-lin
2023-12-12 15:38:07 +03:00
Wanli
6ee71fee88
Merge pull request #24547 from WanliZhong:refactor_conv_perf_test
Classify and extend convolution and depthwise performance tests #24547

This PR aims to:
1. Extend the test cases from models: `YOLOv5`, `YOLOv8`, `EfficientNet`, `YOLOX`, `YuNet`, `SFace`, `MPPalm`, `MPHand`, `MPPose`, `ViTTrack`, `PPOCRv3`, `CRNN`, `PPHumanSeg`. (371 new test cases are added)

2. Classify the existing convolution performance test to below cases
    - CONV_1x1
    - CONV_3x3_S1_D1 (winograd)
    - CONV
    - DEPTHWISE

3. Reduce unnecessary test cases by follow 3 rules (366 test cases are pruned):
(i). For all tests, except for pad and bias related parameters, all other parameters are the same. Only one case can be reserved.
(ii). When the only difference is the channel of input shape, and other parameters are the same. Only one case can be reserved in each range `[1, 3], [4, 7], [8, 15], [16, 31], [32, 63], [64, 127], [128, 255], [256, 511], [512, 1023], [1024, 2047], [2048, 4095]`
(iii). When the only difference is the width and height of input shape, and other parameters are the same. Only one case can be reserved in each range `[1, 31], [32, 63], [64, 95]... `

> **Reproduced**: 1. follow step in https://github.com/alalek/opencv/commit/dnn_dump_conv_kernels to dump all convolution cases from new models. (declared flops may not right, need to be checked manually) 2 and 3. Use the script from python code [classify conv.txt](https://github.com/opencv/opencv/files/13522228/classify.conv.txt)


**Performance test result on Apple M2**

**Test result details**:  [M2.md](https://github.com/opencv/opencv/files/13379189/M2.md)

**Additional test result details with FP16**:  [m2_results_with_fp16.zip](https://github.com/opencv/opencv/files/13491070/m2_results_with_fp16.zip)


**Brief summary for 4.8.1 vs 4.7.0 or 4.6.0**: 
1. `CONV_1x1_S1_D1` dropped significant with small or large input shape.
2. `DEPTHWISE_5x5 ` dropped a little compared with 4.7.0. 

---

**Performance test result on [Intel Core i7-12700K](https://www.intel.com/content/www/us/en/products/sku/134594/intel-core-i712700k-processor-25m-cache-up-to-5-00-ghz/specifications.html)**: 8 Performance-cores (3.60 GHz, turbo up to 4.90 GHz), 4 Efficient-cores (2.70 GHz, turbo up to 3.80 GHz), 20 threads.

**Test result details**: [INTEL.md](https://github.com/opencv/opencv/files/13374093/INTEL.md)
**Brief summary for 4.8.1 vs 4.5.5**: 
1. `CONV_5x5_S1_D1` dropped significant. 
2. `CONV_1x1_S1_D1`, `CONV_3x3_S1_D1`, `DEPTHWISE_3x3_S1_D1`, `DEPTHWISW_3x3_S2_D1` dropped with small input shape.

---

TODO:
- [x] Perform tests on arm with each opencv version
- [x] Perform tests on x86 with each opencv version
- [x] Split each test classification with single test config
- [x] test enable fp16
2023-12-11 21:35:33 +03:00
Abduragim Shtanchaev
d3dd2e463c
Merge pull request #24611 from Abdurrahheem:ash/add_yolov6_test
Add test for YoloX Yolo v6 and Yolo v8 #24611

This PR adds test for YOLOv6 model (which was absent before)
The onnx weights for the test are located in this PR [ #1126](https://github.com/opencv/opencv_extra/pull/1126)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-12-11 16:42:51 +03:00