opencv/cmake
Yashas Samaga B L 613c12e590 Merge pull request #14827 from YashasSamaga:cuda4dnn-csl-low
CUDA backend for the DNN module

* stub cuda4dnn design

* minor fixes for tests and doxygen

* add csl public api directory to module headers

* add low-level CSL components

* add high-level CSL components

* integrate csl::Tensor into backbone code

* switch to CPU iff unsupported; otherwise, fail on error

* add fully connected layer

* add softmax layer

* add activation layers

* support arbitrary rank TensorDescriptor

* pass input wrappers to `initCUDA()`

* add 1d/2d/3d-convolution

* add pooling layer

* reorganize and refactor code

* fixes for gcc, clang and doxygen; remove cxx14/17 code

* add blank_layer

* add LRN layer

* add rounding modes for pooling layer

* split tensor.hpp into tensor.hpp and tensor_ops.hpp

* add concat layer

* add scale layer

* add batch normalization layer

* split math.cu into activations.cu and math.hpp

* add eltwise layer

* add flatten layer

* add tensor transform api

* add asymmetric padding support for convolution layer

* add reshape layer

* fix rebase issues

* add permute layer

* add padding support for concat layer

* refactor and reorganize code

* add normalize layer

* optimize bias addition in scale layer

* add prior box layer

* fix and optimize normalize layer

* add asymmetric padding support for pooling layer

* add event API

* improve pooling performance for some padding scenarios

* avoid over-allocation of compute resources to kernels

* improve prior box performance

* enable layer fusion

* add const layer

* add resize layer

* add slice layer

* add padding layer

* add deconvolution layer

* fix channelwise ReLU initialization

* add vector traits

* add vectorized versions of relu, clipped_relu, power

* add vectorized concat kernels

* improve concat_with_offsets performance

* vectorize scale and bias kernels

* add support for multi-billion element tensors

* vectorize prior box kernels

* fix address alignment check

* improve bias addition performance of conv/deconv/fc layers

* restructure code for supporting multiple targets

* add DNN_TARGET_CUDA_FP64

* add DNN_TARGET_FP16

* improve vectorization

* add region layer

* improve tensor API, add dynamic ranks

1. use ManagedPtr instead of a Tensor in backend wrapper
2. add new methods to tensor classes
  - size_range: computes the combined size for a given axis range
  - tensor span/view can be constructed from a raw pointer and shape
3. the tensor classes can change their rank at runtime (previously rank was fixed at compile-time)
4. remove device code from tensor classes (as they are unused)
5. enforce strict conditions on tensor class APIs to improve debugging ability

* fix parametric relu activation

* add squeeze/unsqueeze tensor API

* add reorg layer

* optimize permute and enable 2d permute

* enable 1d and 2d slice

* add split layer

* add shuffle channel layer

* allow tensors of different ranks in reshape primitive

* patch SliceOp to allow Crop Layer

* allow extra shape inputs in reshape layer

* use `std::move_backward` instead of `std::move` for insert in resizable_static_array

* improve workspace management

* add spatial LRN

* add nms (cpu) to region layer

* add max pooling with argmax (and a fix to limits.hpp)

* add max unpooling layer

* rename DNN_TARGET_CUDA_FP32 to DNN_TARGET_CUDA

* update supportBackend to be more rigorous

* remove stray include that was preventing non-CUDA build

* include op_cuda.hpp outside condition #if

* refactoring, fixes and many optimizations

* drop DNN_TARGET_CUDA_FP64

* fix gcc errors

* increase max. tensor rank limit to six

* add Interp layer

* drop custom layers; use BackendNode

* vectorize activation kernels

* fixes for gcc

* remove wrong assertion

* fix broken assertion in unpooling primitive

* fix build errors in non-CUDA build

* completely remove workspace from public API

* fix permute layer

* enable accuracy and perf. tests for DNN_TARGET_CUDA

* add asynchronous forward

* vectorize eltwise ops

* vectorize fill kernel

* fixes for gcc

* remove CSL headers from public API

* remove csl header source group from cmake

* update min. cudnn version in cmake

* add numerically stable FP32 log1pexp

* refactor code

* add FP16 specialization to cudnn based tensor addition

* vectorize scale1 and bias1 + minor refactoring

* fix doxygen build

* fix invalid alignment assertion

* clear backend wrappers before allocateLayers

* ignore memory lock failures

* do not allocate internal blobs

* integrate NVTX

* add numerically stable half precision log1pexp

* fix indentation, follow coding style, improve docs

* remove accidental modification of IE code

* Revert "add asynchronous forward"

This reverts commit 1154b9da9da07e9b52f8a81bdcea48cf31c56f70.

* [cmake] throw error for unsupported CC versions

* fix rebase issues

* add more docs, refactor code, fix bugs

* minor refactoring and fixes

* resolve warnings/errors from clang

* remove haveCUDA() checks from supportBackend()

* remove NVTX integration

* changes based on review comments

* avoid exception when no CUDA device is present

* add color code for CUDA in Net::dump
2019-10-21 14:28:00 +03:00
..
android Fix misc. source and comment typos 2019-08-15 13:09:52 +03:00
checks Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-10-09 19:46:18 +00:00
FindCUDA Fix misc. source and comment typos 2019-08-15 13:09:52 +03:00
platforms cmake: update initialization 2019-08-08 15:23:16 +03:00
templates Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-08-05 18:11:43 +00:00
cl2cpp.cmake cmake: don't add include <module>/src directory to avoid conflicts 2018-03-19 11:14:15 +03:00
copy_files.cmake cmake: fix android examples dependencies 2018-03-15 14:17:02 +03:00
FindCUDA.cmake Misc. typos 2018-07-31 18:44:23 +03:00
FindCUDNN.cmake Merge pull request #14660 from YashasSamaga:dnn-cuda-build 2019-06-02 14:47:15 +03:00
FindFlake8.cmake cmake: added check_flake8 target 2018-05-11 17:32:22 +03:00
FindOpenVX.cmake Updated OpenVX detector and wrappers to handle Reference attribute names change 2017-03-22 16:50:38 +03:00
FindPylint.cmake cmake: fix Pylint version detection 2017-08-28 19:03:44 +03:00
FindVulkan.cmake Merge pull request #12703 from wzw-intel:vkcom 2018-10-29 17:51:26 +03:00
OpenCVCompilerDefenses.cmake Added support for Clang build hardening (including Apple) 2019-05-07 13:00:43 +03:00
OpenCVCompilerOptimizations.cmake fix avx512 detection 2019-10-05 11:03:57 +00:00
OpenCVCompilerOptions.cmake Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-08-16 18:48:08 +03:00
OpenCVCRTLinkage.cmake cmake: update initialization 2019-08-08 15:23:16 +03:00
OpenCVDetectApacheAnt.cmake Fix misc. source and comment typos 2019-08-15 13:09:52 +03:00
OpenCVDetectCUDA.cmake Removing static linking of cuda library 2019-08-05 21:42:10 +05:30
OpenCVDetectCXXCompiler.cmake Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-09-20 21:11:49 +00:00
OpenCVDetectDirectX.cmake Merge pull request #14294 from alalek:issue_14286 2019-04-11 17:50:15 +03:00
OpenCVDetectHalide.cmake cmake: add Halide support (#8794) 2017-06-21 14:33:47 +03:00
OpenCVDetectInferenceEngine.cmake Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-10-05 15:45:31 +00:00
OpenCVDetectOpenCL.cmake Merge pull request #14294 from alalek:issue_14286 2019-04-11 17:50:15 +03:00
OpenCVDetectPython.cmake Ported install layout refactoring from master branch 2019-08-29 17:01:49 +03:00
OpenCVDetectTBB.cmake Fix TBB debug 2019-05-27 10:26:06 +03:00
OpenCVDetectTrace.cmake trace: initial support for code trace 2017-06-26 17:07:13 +03:00
OpenCVDetectVTK.cmake Merge pull request #12887 from alalek:fix_cmake_conditions 2018-10-24 13:17:54 +00:00
OpenCVDetectVulkan.cmake Merge pull request #12703 from wzw-intel:vkcom 2018-10-29 17:51:26 +03:00
OpenCVDownload.cmake cmake: add directory creation to download helper scripts 2019-09-04 17:09:13 +03:00
OpenCVExtraTargets.cmake Added group targets for samples (opencv_samples, opencv_samples_<group>), install samples/data in separate component 'samples_data' 2018-02-12 18:42:36 +03:00
OpenCVFindAtlas.cmake moved BLAS/LAPACK detection scripts from opencv_contrib/dnn to the main repository (#7918) 2016-12-22 22:57:44 +03:00
OpenCVFindFrameworks.cmake Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2018-11-02 05:33:35 +00:00
OpenCVFindIPP.cmake cmake: fix licenses install rules 2019-04-02 19:24:14 +03:00
OpenCVFindIPPIW.cmake cmake: add check for IPP IW license files 2019-10-01 18:24:03 +03:00
OpenCVFindLAPACK.cmake LAPACK: add support for complex numbers for MSVC 2019-05-22 18:43:45 +03:00
OpenCVFindLATEX.cmake Started top-level CMakeLists.txt file reorganization: cmake scripts are moved to separate folder; refactored BUILD_*, INSTALL_*, ENABLE_*, USE_*, WITH_* options. 2012-01-03 13:48:12 +00:00
OpenCVFindLibsGrfmt.cmake Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-06-10 19:05:28 +00:00
OpenCVFindLibsGUI.cmake Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-05-23 19:50:20 +03:00
OpenCVFindLibsPerf.cmake Merge pull request #13337 from sergiud:eigen-cross-compile 2019-05-20 21:05:42 +03:00
OpenCVFindLibsVideo.cmake cmake: use absolute library paths from 'pkgconfig' 2019-04-15 22:11:49 +00:00
OpenCVFindMKL.cmake cmake: fix variable expand in CMake conditions 2018-10-21 15:02:40 +00:00
OpenCVFindOpenBLAS.cmake cmake: update OpenBLAS support 2017-10-28 10:17:37 +03:00
OpenCVFindOpenEXR.cmake Normalize line endings and whitespace 2012-10-17 15:57:49 +04:00
OpenCVFindProtobuf.cmake Fix install with external protobuf 2018-10-04 13:48:59 +03:00
OpenCVFindVA_INTEL.cmake Removed unnecessary build-time MediaSDK detection 2018-09-13 13:43:11 +03:00
OpenCVFindVA.cmake cmake: allow to specify own libva paths 2018-08-10 16:03:10 +03:00
OpenCVFindWebP.cmake update CMakeList.txt 2018-02-05 16:23:52 +03:00
OpenCVFindXimea.cmake Merge pull request #13422 from mshabunin:split-videoio-cmake 2018-12-26 15:50:20 +03:00
OpenCVGenABI.cmake opencv4: fix abi-checker (to enable API/source checks only) 2018-11-28 18:42:19 +03:00
OpenCVGenAndroidMK.cmake next(android): java3 -> java4 2018-04-10 18:09:54 +03:00
OpenCVGenConfig.cmake Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-09-20 21:11:49 +00:00
OpenCVGenHeaders.cmake cmake: support multiple CPU targets 2017-02-13 19:52:59 +03:00
OpenCVGenInfoPlist.cmake Merge pull request #8009 from Legoless:master 2017-01-20 19:16:01 +03:00
OpenCVGenPkgconfig.cmake cmake: update install paths (Linux) 2018-09-19 15:43:52 +03:00
OpenCVGenSetupVars.cmake Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2018-12-18 19:07:43 +00:00
OpenCVInstallLayout.cmake Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-09-20 21:11:49 +00:00
OpenCVMinDepVersions.cmake Merge pull request #14827 from YashasSamaga:cuda4dnn-csl-low 2019-10-21 14:28:00 +03:00
OpenCVModule.cmake Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-08-16 18:48:08 +03:00
OpenCVPackaging.cmake Merge pull request #14660 from YashasSamaga:dnn-cuda-build 2019-06-02 14:47:15 +03:00
OpenCVPCHSupport.cmake Merge pull request #15433 from huihut:master 2019-09-04 18:36:56 +03:00
OpenCVPylint.cmake cmake: fix Ninja generator warning about pylintrc 2018-04-10 12:23:10 +03:00
OpenCVUtils.cmake Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2019-08-23 19:24:37 +03:00
OpenCVVersion.cmake cmake: hide 'junk' dir from the root of build directory 2018-12-07 05:14:08 +00:00