61 Commits

Author SHA1 Message Date
zihaomu
e36948cfbc add ONNX OP sign, shrink and reciprocal 2022-04-07 15:32:12 +08:00
Smirnov Egor
abebbf04b1 Add CUDA support for LSTM.
Co-authored-by: Julia Bareeva <jbareeva@gmail.com>
2022-03-31 16:38:22 +03:00
luz paz
8e8e4bbabc dnn: fix various dnn related typos
Fixes source comments and documentation related to dnn code.
2022-03-23 18:12:12 -04:00
Smirnov Egor
71a22e45b0 add celu, hardsigmoid, selu, thresholdedrelu layers 2021-12-18 03:19:54 +03:00
Smirnov Egor
1bd382c1d0 Add acos, acosh, asin, asinh, atan, atanh, cos, cosh, erf, hardswish, sin, sinh, softplus, softsign, tan layers 2021-12-17 18:19:40 +03:00
Smirnov Egor
4995aecd62 add alpha parameter to ELU 2021-11-30 14:43:18 +03:00
Smirnov Egor
1feb3838b5 add Ceil, Floor, Log, Round, Sqrt, Not, Equal, Less, Greater 2021-10-15 16:02:46 +03:00
Smirnov Egor
9c84749e2c backport YOLOv4x-mish new_coords CUDA implementation 2021-10-08 14:14:49 +03:00
YashasSamaga
505dde09de support broadcasting in eltwise ops 2021-10-04 12:38:45 +05:30
rogday
38b9ec7a18
Merge pull request #20682 from rogday:min
* Add Min layer to CPU, OpenCL, Halide, Inference Engine, NGraph and CUDA

* fix indentation

* add min to fusion and halide tests; fix doc
2021-09-22 15:17:37 +03:00
YashasSamaga
50462dcdc6 fix effrank assert to allow input effrank <= output effrank 2021-09-13 20:44:33 +05:30
YashasSamaga
32df5faa25 add MatMulOp 2021-05-22 01:01:29 +05:30
Alexander Alekhin
c89084e6b7 Merge pull request #19223 from YashasSamaga:cuda4dnn-halfpix-linear-resize 2021-03-30 13:19:41 +00:00
YashasSamaga
d0fe6ad109 fix checkVersions() 2021-03-06 19:03:03 +05:30
SamFC10
6111935835 Added exp layer 2021-02-20 22:16:00 +05:30
Sergei Slashchinin
ea41f89b40
Merge pull request #19058 from sl-sergei:cuda_1d
Conv1D and Pool1D for CUDA backend

* CUDA-independent changes

* Add Conv1D and Pool1D for CUDA backend

* CUDA-independent changes

* Fix typo

* fix comment

* Update fix

* make changes more correct for pooling layer

* Minor fixes for review

* Split skip blocks
2021-01-21 22:16:56 +00:00
YashasSamaga
8c74d7e4fa add half pixel centers and align corners param 2020-12-27 15:05:39 +05:30
Julien
48ddb53332
Merge pull request #18386 from JulienMaille:patch-1
* Make sure there is a cuda device before getting it

* Update init.hpp
2020-09-23 09:15:02 +00:00
YashasSamaga
a3106d424b add MVNOp 2020-08-02 12:44:35 +05:30
Yashas Samaga B L
f53f491cd2
Merge pull request #17939 from YashasSamaga:cuda4dnn-fix-eltwise-fusion
* fix eltwise fusion segfault, more eltwise fusions, fix power fusion

* add assertion
2020-08-01 15:03:07 +03:00
YashasSamaga
ae293f27cf add DetectionOutputOp 2020-07-29 12:28:00 +05:30
YashasSamaga
1949056423 improved diagnostics for build issues 2020-07-13 21:09:38 +05:30
Yashas Samaga B L
d0e6d2438c
Merge pull request #17363 from YashasSamaga:cuda4dnn-eltwise-fusion2
cuda4dnn(conv): fuse eltwise with convolutions

* fuse eltwise with convolutions

* manually rebase to avoid bad git merge
2020-07-09 16:02:21 +03:00
Alexander Alekhin
988bc804bf Merge pull request #17748 from YashasSamaga:cuda4dnn-data-parallel 2020-07-08 20:20:19 +00:00
Alexander Alekhin
6781ca7d55 Merge pull request #17685 from YashasSamaga:cuda4dnn-cudnn8-support 2020-07-06 22:48:07 +00:00
YashasSamaga
cbdaa93e54 reduce slice, concat to copy; enable more concat fusions 2020-07-05 20:52:35 +05:30
YashasSamaga
4988e131fd transfer output blobs in background 2020-07-04 12:55:12 +05:30
YashasSamaga
62a63021c7 add cuDNN 8 support 2020-06-30 21:51:23 +05:30
Yashas Samaga B L
9ba5581d17
Merge pull request #17534 from YashasSamaga:cuda4dnn-remove-unused-funcs
cuda4dnn: reduce CUDA version requirements to at least CUDA 9.2

* remove half2 specializations

* do not remove atomicAdd for half in CUDA 10 and below

* remove fp16.hpp
2020-06-17 09:07:52 +00:00
YashasSamaga
265acccd56 allow multiple inputs to resize, fix tests 2020-06-11 19:31:48 +05:30
YashasSamaga
57ca10636c do not create redundant handles 2020-05-22 19:52:20 +05:30
YashasSamaga
3c35b563d7 add scale_x_y parameter to region 2020-05-10 16:53:28 +05:30
YashasSamaga
aff2c7c43c handle redundant slice in SliceOp 2020-04-24 12:54:17 +05:30
YashasSamaga
4e8cd4629c fix CUDNN_STATUS_NOT_SUPPORTED, remove redundant fusion checks 2020-03-23 19:47:00 +05:30
YashasSamaga
2aeb32d2d1 fix segfaults, support bias in untrainable mode, support batches in untrainable mode 2020-03-22 22:18:52 +05:30
YashasSamaga
034a43e7f7 release and relock on wrapper resize 2020-03-17 16:08:04 +05:30
Yashas Samaga B L
8808aaccff
Merge pull request #16658 from YashasSamaga:cuda4dnn-refactor-activations
cuda4dnn(activations, eltwise, scale_shift): refactor to reduce code duplication

* refactor activations

* refactor eltwise kernels

* move all functors to functors.hpp

* remove bias1 and scale1 kernels
2020-02-29 11:46:14 +03:00
YashasSamaga
c23ab37355 fix weights rank assertion in InnerProductOp 2020-02-22 16:59:09 +05:30
Alexander Alekhin
2ced568d34 Merge pull request #16220 from YashasSamaga:cuda4dnn-roi-pooling-test_fix-optim 2020-01-29 20:57:15 +00:00
Yashas Samaga B L
d85e67d3ec Merge pull request #16063 from YashasSamaga:cuda4dnn-shortcut-unequal
support eltwise sum with different number of input channels in CUDA backend

* add shortcut primitive

* add offsets in shortcut kernel

* skip tests involving more than two inputs

* remove redundant modulus operation

* support multiple inputs

* remove whole file indentation

* skip acc in0 trunc test if weighted

* use shortcut iff channels are unequal
2020-01-16 21:54:00 +03:00
YashasSamaga
fd369a5004 fix and optimize ROIPooling 2020-01-15 22:53:48 +05:30
Julien
4e2ef8c8f5 Merge pull request #16218 from JulienMaille:cuda-dnn-for-older-gpus
Enable cuda4dnn on hardware without support for __half

* Enable cuda4dnn on hardware without support for half (ie. compute capability < 5.3)

Update CMakeLists.txt

Lowered minimum CC to 3.0

* UPD: added ifdef on new copy kernel

* added fp16 support detection at runtime

* Clarified #if condition on atomicAdd definition

* More explicit CMake error message
2020-01-15 18:28:37 +03:00
Alexander Alekhin
1f2b2c5242 Merge pull request #16230 from YashasSamaga:cuda4dnn-fp-conversion 2020-01-05 11:59:33 +00:00
YashasSamaga
48eecafc89 simplify code to help MSVC 19.10 and lower 2019-12-30 23:02:17 +05:30
YashasSamaga
01f97f150c perform fp conversions on GPU 2019-12-30 00:05:39 +05:30
YashasSamaga
17a35587e1 use optimized cuDNN path for conv + bias + relu 2019-12-29 13:08:38 +05:30
Alexander Alekhin
9ec3d76b21 Merge pull request #16241 from bwignall:typo 2019-12-27 16:18:57 +00:00
Brian Wignall
659ffaddb4 Fix spelling typos 2019-12-26 06:45:03 -05:00
YashasSamaga
16bc505d26 improve reduction logic and add fast transpose kernel 2019-12-24 00:23:45 +05:30
Alexander Alekhin
b8e0898c7c Merge pull request #16082 from YashasSamaga:cuda4dnn-roi-pooling 2019-12-18 14:41:58 +00:00