Commit Graph

4 Commits

Author SHA1 Message Date
Yashas Samaga B L
d981d04c76
Merge pull request #17200 from YashasSamaga:cuda4dnn-general-opt1
cuda4dnn: optimizations for swish, mish, sigmoid, region, resize based ops, transpose, identity-conv fusion

* bunch of optimizations

* more accurate implementation for mish
2020-05-09 17:20:30 +00:00
YashasSamaga
fd369a5004 fix and optimize ROIPooling 2020-01-15 22:53:48 +05:30
Julien
4e2ef8c8f5 Merge pull request #16218 from JulienMaille:cuda-dnn-for-older-gpus
Enable cuda4dnn on hardware without support for __half

* Enable cuda4dnn on hardware without support for half (ie. compute capability < 5.3)

Update CMakeLists.txt

Lowered minimum CC to 3.0

* UPD: added ifdef on new copy kernel

* added fp16 support detection at runtime

* Clarified #if condition on atomicAdd definition

* More explicit CMake error message
2020-01-15 18:28:37 +03:00
YashasSamaga
9b8ddba4d1 add ROIPoolingOp 2019-12-06 18:19:37 +05:30