Commit Graph

84 Commits

Author SHA1 Message Date
Yuantao Feng
93e0c7e53f fix matmul crash 2024-08-15 16:10:40 +08:00
Aliaksei Urbanski
35ca2f78d6
Merge pull request #25880 from Jamim:fix/cuda-no-fp16
Fix CUDA for old GPUs without FP16 support #25880

Fixes #21461

~This is a build-time solution that reflects https://github.com/opencv/opencv/blob/4.10.0/modules/dnn/src/cuda4dnn/init.hpp#L68-L82.~
~We shouldn't add an invalid target while building with `CUDA_ARCH_BIN` < 53.~
_(please see [this discussion](https://github.com/opencv/opencv/pull/25880#discussion_r1668074505))_

This is a run-time solution that basically reverts [these lines](d0fe6ad109 (diff-757c5ab6ddf2f99cdd09f851e3cf17abff203aff4107d908c7ad3d0466f39604L245-R245)).

I've debugged these changes, [coupled with other fixes](https://github.com/gentoo/gentoo/pull/37479), on [Gentoo Linux](https://www.gentoo.org/) and [related tests passed](https://github.com/user-attachments/files/16135391/opencv-4.10.0.20240708-224733.log.gz) on my laptop with `GeForce GTX 960M`.

Alternative solution:
  - #21462

_Best regards!_

### Pull Request Readiness Checklist

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] `n/a` There is accuracy test, performance test and test data in opencv_extra repository, if applicable
- [ ] `n/a` The feature is well documented and sample code can be built with the project CMake
2024-07-10 12:39:30 +03:00
Yuantao Feng
a7fd9446cf
Merge pull request #25630 from fengyuentau:nary-multi-thread
dnn: parallelize nary elementwise forward implementation & enable related conformance tests #25630

This PR introduces the following changes:

- [x] Parallelize binary forward impl
- [x] Parallelize ternary forward impl (Where)
- [x] Parallelize nary (Operator that can take >=1 operands)
- [x] Enable conformance tests if workable

## Performance

### i7-12700K, RAM 64GB, Ubuntu 22.04

```
Geometric mean (ms)

                Name of Test                     opencv        opencv        opencv
                                                  perf          perf          perf
                                              core.x64.0606 core.x64.0606 core.x64.0606
                                                                               vs
                                                                             opencv
                                                                              perf
                                                                          core.x64.0606
                                                                           (x-factor)
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU           16.116        11.161         1.44
NCHW_NCHW_add::Layer_NaryEltwise::OCV/CPU        17.469        11.446         1.53
NCHW_NCHW_div::Layer_NaryEltwise::OCV/CPU        17.531        11.469         1.53
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU      28.653        13.682         2.09
NCHW_NCHW_greater::Layer_NaryEltwise::OCV/CPU    21.899        13.422         1.63
NCHW_NCHW_less::Layer_NaryEltwise::OCV/CPU       21.738        13.185         1.65
NCHW_NCHW_max::Layer_NaryEltwise::OCV/CPU        16.172        11.473         1.41
NCHW_NCHW_mean::Layer_NaryEltwise::OCV/CPU       16.309        11.565         1.41
NCHW_NCHW_min::Layer_NaryEltwise::OCV/CPU        16.166        11.454         1.41
NCHW_NCHW_mul::Layer_NaryEltwise::OCV/CPU        16.157        11.443         1.41
NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU        163.459       15.234         10.73
NCHW_NCHW_ref_div::Layer_NaryEltwise::OCV/CPU    10.880        10.868         1.00
NCHW_NCHW_ref_max::Layer_NaryEltwise::OCV/CPU    10.947        11.058         0.99
NCHW_NCHW_ref_min::Layer_NaryEltwise::OCV/CPU    10.948        10.910         1.00
NCHW_NCHW_ref_mul::Layer_NaryEltwise::OCV/CPU    10.874        10.871         1.00
NCHW_NCHW_ref_sum::Layer_NaryEltwise::OCV/CPU    10.971        10.920         1.00
NCHW_NCHW_sub::Layer_NaryEltwise::OCV/CPU        17.546        11.462         1.53
NCHW_NCHW_sum::Layer_NaryEltwise::OCV/CPU        16.175        11.475         1.41
NHWC_C::Layer_NaryEltwise::OCV/CPU               11.339        11.333         1.00
NHWC_H::Layer_NaryEltwise::OCV/CPU               16.154        11.102         1.46
```

### Apple M1, RAM 16GB, macOS 14.4.1

```
Geometric mean (ms)

                Name of Test                     opencv          opencv             opencv      
                                                  perf            perf               perf       
                                              core.m1.0606 core.m1.0606.patch core.m1.0606.patch
                                                                                      vs        
                                                                                    opencv      
                                                                                     perf       
                                                                                 core.m1.0606   
                                                                                  (x-factor)    
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU           28.418          3.768               7.54       
NCHW_NCHW_add::Layer_NaryEltwise::OCV/CPU        6.942           5.679               1.22       
NCHW_NCHW_div::Layer_NaryEltwise::OCV/CPU        5.822           5.653               1.03       
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU      5.751           5.628               1.02       
NCHW_NCHW_greater::Layer_NaryEltwise::OCV/CPU    5.797           5.599               1.04       
NCHW_NCHW_less::Layer_NaryEltwise::OCV/CPU       7.272           5.578               1.30       
NCHW_NCHW_max::Layer_NaryEltwise::OCV/CPU        5.777           5.562               1.04       
NCHW_NCHW_mean::Layer_NaryEltwise::OCV/CPU       5.819           5.559               1.05       
NCHW_NCHW_min::Layer_NaryEltwise::OCV/CPU        5.830           5.574               1.05       
NCHW_NCHW_mul::Layer_NaryEltwise::OCV/CPU        5.759           5.567               1.03       
NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU       342.260          74.655              4.58       
NCHW_NCHW_ref_div::Layer_NaryEltwise::OCV/CPU    8.338           8.280               1.01       
NCHW_NCHW_ref_max::Layer_NaryEltwise::OCV/CPU    8.359           8.309               1.01       
NCHW_NCHW_ref_min::Layer_NaryEltwise::OCV/CPU    8.412           8.295               1.01       
NCHW_NCHW_ref_mul::Layer_NaryEltwise::OCV/CPU    8.380           8.297               1.01       
NCHW_NCHW_ref_sum::Layer_NaryEltwise::OCV/CPU    8.356           8.323               1.00       
NCHW_NCHW_sub::Layer_NaryEltwise::OCV/CPU        6.818           5.561               1.23       
NCHW_NCHW_sum::Layer_NaryEltwise::OCV/CPU        5.805           5.570               1.04       
NHWC_C::Layer_NaryEltwise::OCV/CPU               3.834           4.817               0.80       
NHWC_H::Layer_NaryEltwise::OCV/CPU               28.402          3.771               7.53
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-07-03 10:09:05 +03:00
Alexander Smorkalov
3d74d646d8 Fixed CuDNN runtime version check for CuDNN 9+. 2024-07-01 17:33:24 +03:00
Yuantao Feng
3f13ce797b
Merge pull request #25779 from fengyuentau:dnn/fix_onnx_depthtospace
dnn: add DepthToSpace and SpaceToDepth #25779

We are working on updating WeChat QRCode module. One of the new models is a fully convolutional model and hence it should be able to run with different input shapes. However,  it has an operator `DepthToSpace`, which is parsed as a subgraph of `Reshape -> Permute -> Reshape` with a fixed shape getting during parsing. The subgraph itself is not a problem, but the true problem is the subgraph with a fixed input and output shape regardless input changes. This does not allow the model to run with different input shapes.

Solution is to add a dedicated layer for DepthtoSpace and SpaceToDepth.

Backend support:

- [x] CPU
- [x] CUDA
- [x] OpenCL
- [x] OpenVINO
- [x] CANN
- [x] TIMVX
-  ~Vulkan~ (missing fundamental tools, like permutation and reshape)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-06-21 19:28:22 +03:00
Danial Javady
05e48605a0
Merge pull request #25412 from ZelboK:update-cudnn-to-9
Refactor DNN module to build with cudnn 9 #25412

A lot of APIs that are currently being used in the dnn module have been removed in cudnn 9. They were deprecated in 8. 
This PR updates said code accordingly to the newer API.

Some key notes:
1) This is my first PR. I am new to openCV. 
2) `opencv_test_core` tests pass
3) On a 3080, cuda 12.4(should be irrelevant since I didn't build the `opencv_modules`, gcc 11.4, WSL 2. 
4) For brevity I will avoid including macro code that will allow for older versions of cudnn to build.

I was unable to get the tests working for `opencv_test_dnn` and `opencv_perf_dnn`. The errors I get are of the following: 
```
 OpenCV tests: Can't find required data file: dnn/onnx/conformance/node/test_reduce_prod_default_axes_keepdims_example/model.onnx in function 'findData'
" thrown in the test body.
```
So before I spend more time investigating I was hoping to get a maintainer to point me in the right direction here. I would like to run these tests and confirm things are working as intended. I may have missed some details.


### Pull Request Readiness Checklist

relevant issue
(https://github.com/opencv/opencv/issues/24983

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-05-28 09:54:08 +03:00
Yuantao Feng
b758897c29
Merge pull request #25271 from fengyuentau:matmul_bias
Merge with https://github.com/opencv/opencv_extra/pull/1158

Todo:

- [x] Fix Attention pattern recognition.
- [x] Handle other backends.

Benchmark:

"VIT_B_32 OCV/CPU", M1, results in milliseconds.

| Model | 4.x | This PR |
| - | - | - |
| VIT_B_32 OCV/CPU | 87.66 | **83.83** |


### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-03-29 17:35:23 +03:00
Yuantao Feng
accf200408
Merge pull request #25238 from fengyuentau:optimized_const
dnn: avoid const layer forwarding in layer norm layer and attention layer #25238

While profiling ViTs with dnn, I found `ConstLayer` can take a proportion of the inference time, which is weird. This comes from the data copy during the inference of `ConstLayer`. There is a chance that we can improve the efficiency of data copying but the easiest and most convenient way is to avoid `ConstLayer`. This PR change the way how we handle constants in layer normalization layer and attention layer, which is storing in the layer blobs instead of making constant layers for them.

Checklists:

- [x] Backend compatibility in layer normalization layer.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-03-26 15:09:51 +03:00
jimmylaw21
a7fa1e6f4b
Merge pull request #24610 from jimmylaw21:dnn-onnx-add-group-norm-layer
dnn onnx: add group norm layer #24610

dnn onnx: add group norm layer

Todo:

- [x] speed up by multi-threading
- [x] add perf
- [x] add backend: OpenVINO
- [x] add backend: CUDA
- [x] add backend: OpenCL (no fp16)
- [ ] add backend: CANN

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>
2024-01-12 15:13:26 +03:00
Yuantao Feng
e7ccff9805
Merge pull request #24834 from fengyuentau:cuda_naryeltwise_broadcast
dnn (cuda): support broadcasting if a.rank() != b.rank() #24834

Inspired by https://github.com/opencv/opencv/pull/24786. This PR keeps the fusion of `NaryEltwise` and `Concat` while addressed the data missing problem via supporting broadcasting if a.rank() != b.rank().

Resolves https://github.com/opencv/opencv/issues/23977
Resolves https://github.com/opencv/opencv/issues/24606
Resolves https://github.com/opencv/opencv/issues/24635
Resolves https://github.com/opencv/opencv/issues/24721 

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-01-11 10:04:46 +03:00
Yuantao Feng
c955564cb3
Merge pull request #24765 from fengyuentau:mod_operator
dnn onnx: add mod #24765

Resolves https://github.com/opencv/opencv/issues/23174

TODO:

- [x] enable some conformance tests
- [x] add backends
    - [x] CANN
    - [x] OpenVINO
    - [x] CUDA

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-01-09 19:00:17 +03:00
Dmitry Kurtaev
938bc4d503 [CUDA] Hotfix Scale with 1 parameter 2023-12-22 15:49:27 +03:00
Yuantao Feng
fa5ed62a66
Merge pull request #24694 from fengyuentau:matmul_refactor
dnn: refactor ONNX MatMul with fastGemm #24694

Done:
- [x] add backends
    - [x] CUDA
    - [x] OpenVINO
    - [x] CANN
    - [x] OpenCL
    - [x] Vulkan
- [x] add perf tests
- [x] const B case

### Benchmark

Tests are done on M1. All data is in milliseconds (ms).

| Configuration | MatMul (Prepacked) | MatMul | InnerProduct |
| - | - | - | - |
| A=[12, 197, 197], B=[12, 197, 64], trans_a=0, trans_b=0 | **0.39** | 0.41 | 1.33 |
| A=[12, 197, 64], B=[12, 64, 197], trans_a=0, trans_b=0  | **0.42** | 0.42 | 1.17 |
| A=[12, 50, 64], B=[12, 64, 50], trans_a=0, trans_b=0    | **0.13** | 0.15 | 0.33 |
| A=[12, 50, 50], B=[12, 50, 64], trans_a=0, trans_b=0    | **0.11** | 0.13 | 0.22 |
| A=[16, 197, 197], B=[16, 197, 64], trans_a=0, trans_b=0 | **0.46** | 0.54 | 1.46 |
| A=[16, 197, 64], B=[16, 64, 197], trans_a=0, trans_b=0  | **0.46** | 0.95 | 1.74 |
| A=[16, 50, 64], B=[16, 64, 50], trans_a=0, trans_b=0    | **0.18** | 0.32 | 0.43 |
| A=[16, 50, 50], B=[16, 50, 64], trans_a=0, trans_b=0    | **0.15** | 0.25 | 0.25 |

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-12-19 19:36:41 +03:00
Yuantao Feng
a2edf4d929
Merge pull request #24647 from fengyuentau:cuda_sub
dnn cuda: support Sub #24647

Related https://github.com/opencv/opencv/issues/24606#issuecomment-1837390257

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-12-06 13:46:24 +03:00
Yuantao Feng
d05fb709f9
Merge pull request #24552 from fengyuentau:layernorm_backends
dnn: add openvino, opencl and cuda backends for layer normalization layer #24552

Merge after https://github.com/opencv/opencv/pull/24544.

Todo:

- [x] openvino
- [x] opencl
- [x] cuda

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-11-21 15:33:01 +03:00
Yuantao Feng
ee0822dc4d
Merge pull request #24378 from fengyuentau:instance_norm
dnn onnx: add instance norm layer #24378

Resolves https://github.com/opencv/opencv/issues/24377
Relates https://github.com/opencv/opencv/pull/24092#discussion_r1349841644

| Perf | multi-thread | single-thread |
| - | - | - |
| x: [2, 64, 180, 240] | 3.95ms | 11.12ms |

Todo:

- [x] speed up by multi-threading
- [x] add perf
- [x] add backend: OpenVINO
- [x] add backend: CUDA
- [x] add backend: OpenCL (no fp16)
- [ ] add backend: CANN (will be done via https://github.com/opencv/opencv/pull/24462)


### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

```
force_builders=Linux OpenCL,Win64 OpenCL,Custom
buildworker:Custom=linux-4
build_image:Custom=ubuntu:18.04
modules_filter:Custom=none
disable_ipp:Custom=ON
```
2023-11-07 12:59:10 +03:00
Alexander Smorkalov
5466fd2606
Merge pull request #24104 from cudawarped:cuda_fix_cuda_toolkit_12_2
`cuda`: fix for compatibility with CUDA Toolkit >= 12.2.0
2023-08-04 12:11:15 +03:00
Dmitry Kurtaev
96f23e3da1
Merge pull request #24080 from dkurt:dnn_cuda_layers
Resolve uncovered CUDA dnn layer #24080

### Pull Request Readiness Checklist

* Gelu activation layer on CUDA
* Try to relax GEMM from ONNX

resolves https://github.com/opencv/opencv/issues/24064

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2023-08-03 09:13:42 +03:00
cudawarped
ab8cb6f8a9 cuda: fix for compatibility with CUDA Toolkit >= 12.2.0 2023-08-01 13:02:42 +03:00
Alexander Alekhin
96a45e842e
Merge pull request #23061 from WanliZhong:gemm_cuda
DNN: make GEMM can be supported with transA and transB in CUDA
2023-02-09 00:06:32 +03:00
wanli
4718a4bf81 make GEMM can be supported with transA and transB in CUDA 2023-01-31 15:14:17 +08:00
Tomoaki Teshima
186c18668c suppress warning 2023-01-23 22:47:43 +09:00
zoom
4891818114 make MatMul support 3D or 4D with broadcast 2022-12-15 10:36:08 +08:00
zihaomu
e36948cfbc add ONNX OP sign, shrink and reciprocal 2022-04-07 15:32:12 +08:00
Smirnov Egor
abebbf04b1 Add CUDA support for LSTM.
Co-authored-by: Julia Bareeva <jbareeva@gmail.com>
2022-03-31 16:38:22 +03:00
luz paz
8e8e4bbabc dnn: fix various dnn related typos
Fixes source comments and documentation related to dnn code.
2022-03-23 18:12:12 -04:00
Smirnov Egor
71a22e45b0 add celu, hardsigmoid, selu, thresholdedrelu layers 2021-12-18 03:19:54 +03:00
Smirnov Egor
1bd382c1d0 Add acos, acosh, asin, asinh, atan, atanh, cos, cosh, erf, hardswish, sin, sinh, softplus, softsign, tan layers 2021-12-17 18:19:40 +03:00
Smirnov Egor
4995aecd62 add alpha parameter to ELU 2021-11-30 14:43:18 +03:00
Smirnov Egor
1feb3838b5 add Ceil, Floor, Log, Round, Sqrt, Not, Equal, Less, Greater 2021-10-15 16:02:46 +03:00
Smirnov Egor
9c84749e2c backport YOLOv4x-mish new_coords CUDA implementation 2021-10-08 14:14:49 +03:00
YashasSamaga
505dde09de support broadcasting in eltwise ops 2021-10-04 12:38:45 +05:30
rogday
38b9ec7a18
Merge pull request #20682 from rogday:min
* Add Min layer to CPU, OpenCL, Halide, Inference Engine, NGraph and CUDA

* fix indentation

* add min to fusion and halide tests; fix doc
2021-09-22 15:17:37 +03:00
YashasSamaga
50462dcdc6 fix effrank assert to allow input effrank <= output effrank 2021-09-13 20:44:33 +05:30
YashasSamaga
32df5faa25 add MatMulOp 2021-05-22 01:01:29 +05:30
Alexander Alekhin
c89084e6b7 Merge pull request #19223 from YashasSamaga:cuda4dnn-halfpix-linear-resize 2021-03-30 13:19:41 +00:00
YashasSamaga
d0fe6ad109 fix checkVersions() 2021-03-06 19:03:03 +05:30
SamFC10
6111935835 Added exp layer 2021-02-20 22:16:00 +05:30
Sergei Slashchinin
ea41f89b40
Merge pull request #19058 from sl-sergei:cuda_1d
Conv1D and Pool1D for CUDA backend

* CUDA-independent changes

* Add Conv1D and Pool1D for CUDA backend

* CUDA-independent changes

* Fix typo

* fix comment

* Update fix

* make changes more correct for pooling layer

* Minor fixes for review

* Split skip blocks
2021-01-21 22:16:56 +00:00
YashasSamaga
8c74d7e4fa add half pixel centers and align corners param 2020-12-27 15:05:39 +05:30
Julien
48ddb53332
Merge pull request #18386 from JulienMaille:patch-1
* Make sure there is a cuda device before getting it

* Update init.hpp
2020-09-23 09:15:02 +00:00
YashasSamaga
a3106d424b add MVNOp 2020-08-02 12:44:35 +05:30
Yashas Samaga B L
f53f491cd2
Merge pull request #17939 from YashasSamaga:cuda4dnn-fix-eltwise-fusion
* fix eltwise fusion segfault, more eltwise fusions, fix power fusion

* add assertion
2020-08-01 15:03:07 +03:00
YashasSamaga
ae293f27cf add DetectionOutputOp 2020-07-29 12:28:00 +05:30
YashasSamaga
1949056423 improved diagnostics for build issues 2020-07-13 21:09:38 +05:30
Yashas Samaga B L
d0e6d2438c
Merge pull request #17363 from YashasSamaga:cuda4dnn-eltwise-fusion2
cuda4dnn(conv): fuse eltwise with convolutions

* fuse eltwise with convolutions

* manually rebase to avoid bad git merge
2020-07-09 16:02:21 +03:00
Alexander Alekhin
988bc804bf Merge pull request #17748 from YashasSamaga:cuda4dnn-data-parallel 2020-07-08 20:20:19 +00:00
Alexander Alekhin
6781ca7d55 Merge pull request #17685 from YashasSamaga:cuda4dnn-cudnn8-support 2020-07-06 22:48:07 +00:00
YashasSamaga
cbdaa93e54 reduce slice, concat to copy; enable more concat fusions 2020-07-05 20:52:35 +05:30
YashasSamaga
4988e131fd transfer output blobs in background 2020-07-04 12:55:12 +05:30