Alexander Alekhin
8b4fa2605e
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2021-12-03 12:32:49 +00:00
Andrew Ryrie
ea7d4be3f8
Merge pull request #20658 from smbz:lstm_optimisation
...
* dnn: LSTM optimisation
This uses the AVX-optimised fastGEMM1T for matrix multiplications where available, instead of the standard cv::gemm.
fastGEMM1T is already used by the fully-connected layer. This commit involves two minor modifications:
- Use unaligned access. I don't believe this involves any performance hit on modern CPUs (Nehalem and Bulldozer onwards) in the case where the address is actually aligned.
- Allow for weight matrices where the number of columns is not a multiple of 8.
I have not enabled AVX-512 as I don't have an AVX-512 CPU to test on.
* Fix warning about initialisation order
* Remove C++11 syntax
* Fix build when AVX(2) is not available
In this case the CV_TRY_X macros are defined to 0, rather than being undefined.
* Minor changes as requested:
- Don't check hardware support for AVX(2) when dispatch is disabled for these
- Add braces
* Fix out-of-bounds access in fully connected layer
The old tail handling in fastGEMM1T implicitly rounded vecsize up to the next multiple of 8, and the fully connected layer implements padding up to the next multiple of 8 to cope with this. The new tail handling does not round the vecsize upwards like this, but it does require that the vecsize is at least 8. To adapt to the new tail handling, the fully connected layer now rounds vecsize itself at the same time as adding the padding (which makes more sense anyway).
This also means that the fully connected layer always passes a vecsize of at least 8 to fastGEMM1T, which fixes the out-of-bounds access problems.
* Improve tail mask handling
- Use static array for generating tail masks (as requested)
- Apply tail mask to the weights as well as the input vectors to prevent spurious propagation of NaNs/Infs
* Revert whitespace change
* Improve readability of conditions for using AVX
* dnn(lstm): minor coding style changes, replaced left aligned load
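Note on the tail handling described above: the following is a minimal scalar sketch, not the fastGEMM1T code from this pull request. It assumes the caller pads both buffers to a multiple of 8 (as the fully connected layer does), and it models the "static array" tail mask by sliding an 8-wide window over a fixed table. All names (`kLanes`, `kTailMask`, `roundUpToLanes`, `padWithNaN`, `maskedDot`) are hypothetical. The mask is applied to the weights as well as the inputs, because zeroing only one operand would still give NaN when the other holds NaN or Inf.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical block width, standing in for the 8 floats of one AVX register.
static const std::size_t kLanes = 8;

// Static table: an 8-wide window starting at (kLanes - tail) keeps the
// first `tail` lanes and clears the rest.
static const float kTailMask[2 * kLanes] = {1, 1, 1, 1, 1, 1, 1, 1,
                                            0, 0, 0, 0, 0, 0, 0, 0};

// Round n up to the next multiple of kLanes (done by the caller when padding).
inline std::size_t roundUpToLanes(std::size_t n)
{
    return (n + kLanes - 1) / kLanes * kLanes;
}

// Copy `values` into a buffer padded to a multiple of kLanes. The padding is
// deliberately NaN to model uninitialised memory past the valid data.
inline std::vector<float> padWithNaN(const std::vector<float>& values)
{
    std::vector<float> out(roundUpToLanes(values.size()),
                           std::numeric_limits<float>::quiet_NaN());
    for (std::size_t i = 0; i < values.size(); ++i)
        out[i] = values[i];
    return out;
}

// Dot product over `vecsize` valid elements. Both buffers must hold at least
// roundUpToLanes(vecsize) floats, because the tail block reads a full 8 lanes.
inline float maskedDot(const float* weights, const float* input, std::size_t vecsize)
{
    float sum = 0.f;
    std::size_t i = 0;

    // Full blocks of kLanes elements need no masking.
    for (; i + kLanes <= vecsize; i += kLanes)
        for (std::size_t k = 0; k < kLanes; ++k)
            sum += weights[i + k] * input[i + k];

    // Tail block: mask both operands so padding never reaches the sum.
    const std::size_t tail = vecsize - i;
    if (tail > 0)
    {
        const float* mask = kTailMask + (kLanes - tail);
        for (std::size_t k = 0; k < kLanes; ++k)
        {
            const float w = mask[k] != 0.f ? weights[i + k] : 0.f;
            const float x = mask[k] != 0.f ? input[i + k] : 0.f;
            sum += w * x;
        }
    }
    return sum;
}
```

For example, with 10 valid weights of 1 and 10 valid inputs of 2, both padded to 16 floats by `padWithNaN`, `maskedDot(w.data(), x.data(), 10)` sums only the 10 valid products even though the padding lanes hold NaN.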
2021-11-29 21:43:00 +00:00
Alexander Alekhin
24fcb7f813
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2021-09-25 17:50:00 +00:00
Alexander Alekhin
1aacb9bb15
dnn(perf): update convolution tests
2021-09-10 13:11:02 +00:00
Alexander Alekhin
624d532000
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2020-12-17 21:05:34 +00:00
Alexander Alekhin
28aab134db
dnn(test): update tests for OpenVINO 2021.2
2020-12-17 07:53:35 +00:00
Omar Alzaibaq
a316b11aaa
Merge pull request #18220 from Omar-AE:hddl-supported
...
* added HDDL VPU support
* changed to return True in one line if any device is connected
* dnn: use releaseHDDLPlugin()
* dnn(hddl): fix conditions
2020-11-17 19:47:24 +00:00
Alexander Alekhin
a7c150ec66
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2020-11-13 22:29:14 +00:00
Sergei Slashchinin
61144f935e
Merge pull request #18783 from sl-sergei:fix_conv1d
...
Add support for Conv1D on OpenCV backend
* Add support for Conv1D on OpenCV backend
* disable tests on other targets/backends
* Fix formatting
* Restore comment
* Remove unnecessary flag and fix test logic
* Fix perf test
* fix braces
* Fix indentation, assert check and remove unnecessary condition
* Remove unnecessary changes
* Add test cases for variable weights and bias
* dnn(conv): fallback on OpenCV+CPU instead of failures
* coding style
2020-11-13 22:22:10 +00:00
Alexander Alekhin
1b443219ed
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2020-10-09 20:09:26 +00:00
Alexander Alekhin
6da05f7086
dnn(test): update tests for OpenVINO 2021.1
2020-10-08 10:22:31 +00:00
Alexander Alekhin
9b7b22ee0e
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2020-07-16 20:13:27 +00:00
Alexander Alekhin
b2ebd37ee2
Merge pull request #17856 from alalek:dnn_openvino_2020.4.0
2020-07-16 20:08:00 +00:00
Alexander Alekhin
81e027eef7
dnn: fix OpenCL implementation of Slice layer
2020-07-16 04:33:52 +00:00
Alexander Alekhin
1c371d07b5
dnn(test): adjust tests for OpenVINO 2020.4
2020-07-15 23:47:40 +00:00
Alexander Alekhin
524a2fffe9
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2020-07-06 23:05:04 +00:00
Alexander Alekhin
99c4b76a6d
dnn(test): add YOLOv4-tiny tests
2020-07-06 21:36:19 +00:00
Alexander Alekhin
c3e8a82c9c
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2020-05-28 23:53:54 +00:00
Alexander Alekhin
e58e545584
Merge pull request #17392 from alalek:dnn_test_yolov4
2020-05-28 22:52:21 +00:00
Dmitry Kurtaev
d9bada9867
dnn: EfficientDet
2020-05-28 17:23:42 +03:00
Alexander Alekhin
6b89154afd
dnn(test): add YOLOv4 tests
2020-05-28 13:27:40 +00:00
Alexander Alekhin
8108fb0575
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2019-12-05 18:27:45 +03:00
Dmitry Kurtaev
d8e10f3a8d
Enable MaxPooling with indices in Inference Engine
2019-12-04 19:14:55 +03:00
Alexander Alekhin
4b0132ed7a
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2019-12-02 16:26:52 +03:00
Lubov Batanina
7523c777c5
Merge pull request #15537 from l-bat:ngraph
...
* Support nGraph
* Fix resize
2019-12-02 16:16:06 +03:00
Yashas Samaga B L
613c12e590
Merge pull request #14827 from YashasSamaga:cuda4dnn-csl-low
...
CUDA backend for the DNN module
* stub cuda4dnn design
* minor fixes for tests and doxygen
* add csl public api directory to module headers
* add low-level CSL components
* add high-level CSL components
* integrate csl::Tensor into backbone code
* switch to CPU iff unsupported; otherwise, fail on error
* add fully connected layer
* add softmax layer
* add activation layers
* support arbitrary rank TensorDescriptor
* pass input wrappers to `initCUDA()`
* add 1d/2d/3d-convolution
* add pooling layer
* reorganize and refactor code
* fixes for gcc, clang and doxygen; remove cxx14/17 code
* add blank_layer
* add LRN layer
* add rounding modes for pooling layer
* split tensor.hpp into tensor.hpp and tensor_ops.hpp
* add concat layer
* add scale layer
* add batch normalization layer
* split math.cu into activations.cu and math.hpp
* add eltwise layer
* add flatten layer
* add tensor transform api
* add asymmetric padding support for convolution layer
* add reshape layer
* fix rebase issues
* add permute layer
* add padding support for concat layer
* refactor and reorganize code
* add normalize layer
* optimize bias addition in scale layer
* add prior box layer
* fix and optimize normalize layer
* add asymmetric padding support for pooling layer
* add event API
* improve pooling performance for some padding scenarios
* avoid over-allocation of compute resources to kernels
* improve prior box performance
* enable layer fusion
* add const layer
* add resize layer
* add slice layer
* add padding layer
* add deconvolution layer
* fix channelwise ReLU initialization
* add vector traits
* add vectorized versions of relu, clipped_relu, power
* add vectorized concat kernels
* improve concat_with_offsets performance
* vectorize scale and bias kernels
* add support for multi-billion element tensors
* vectorize prior box kernels
* fix address alignment check
* improve bias addition performance of conv/deconv/fc layers
* restructure code for supporting multiple targets
* add DNN_TARGET_CUDA_FP64
* add DNN_TARGET_FP16
* improve vectorization
* add region layer
* improve tensor API, add dynamic ranks
1. use ManagedPtr instead of a Tensor in backend wrapper
2. add new methods to tensor classes
- size_range: computes the combined size for a given axis range
- tensor span/view can be constructed from a raw pointer and shape
3. the tensor classes can change their rank at runtime (previously rank was fixed at compile-time)
4. remove device code from tensor classes (as they are unused)
5. enforce strict conditions on tensor class APIs to improve debugging ability
* fix parametric relu activation
* add squeeze/unsqueeze tensor API
* add reorg layer
* optimize permute and enable 2d permute
* enable 1d and 2d slice
* add split layer
* add shuffle channel layer
* allow tensors of different ranks in reshape primitive
* patch SliceOp to allow Crop Layer
* allow extra shape inputs in reshape layer
* use `std::move_backward` instead of `std::move` for insert in resizable_static_array
* improve workspace management
* add spatial LRN
* add nms (cpu) to region layer
* add max pooling with argmax (and a fix to limits.hpp)
* add max unpooling layer
* rename DNN_TARGET_CUDA_FP32 to DNN_TARGET_CUDA
* update supportBackend to be more rigorous
* remove stray include that prevented non-CUDA build
* include op_cuda.hpp outside condition #if
* refactoring, fixes and many optimizations
* drop DNN_TARGET_CUDA_FP64
* fix gcc errors
* increase max. tensor rank limit to six
* add Interp layer
* drop custom layers; use BackendNode
* vectorize activation kernels
* fixes for gcc
* remove wrong assertion
* fix broken assertion in unpooling primitive
* fix build errors in non-CUDA build
* completely remove workspace from public API
* fix permute layer
* enable accuracy and perf. tests for DNN_TARGET_CUDA
* add asynchronous forward
* vectorize eltwise ops
* vectorize fill kernel
* fixes for gcc
* remove CSL headers from public API
* remove csl header source group from cmake
* update min. cudnn version in cmake
* add numerically stable FP32 log1pexp
* refactor code
* add FP16 specialization to cudnn based tensor addition
* vectorize scale1 and bias1 + minor refactoring
* fix doxygen build
* fix invalid alignment assertion
* clear backend wrappers before allocateLayers
* ignore memory lock failures
* do not allocate internal blobs
* integrate NVTX
* add numerically stable half precision log1pexp
* fix indentation, following coding style, improve docs
* remove accidental modification of IE code
* Revert "add asynchronous forward"
This reverts commit 1154b9da9da07e9b52f8a81bdcea48cf31c56f70.
* [cmake] throw error for unsupported CC versions
* fix rebase issues
* add more docs, refactor code, fix bugs
* minor refactoring and fixes
* resolve warnings/errors from clang
* remove haveCUDA() checks from supportBackend()
* remove NVTX integration
* changes based on review comments
* avoid exception when no CUDA device is present
* add color code for CUDA in Net::dump
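Note on the "numerically stable log1pexp" items above: the sketch below shows one common way to evaluate log(1 + exp(x)) without overflow. It is an illustrative host-side C++ helper, not the CUDA kernel added by this pull request, and the function name and branch point are assumptions. The idea is the identity log(1 + e^x) = x + log(1 + e^-x), which keeps the argument of `exp` non-positive.

```cpp
#include <cmath>

// Hypothetical helper: overflow-safe evaluation of log(1 + exp(x)).
inline float log1pexp(float x)
{
    // For positive x, rewrite as x + log(1 + exp(-x)) so that exp() never
    // receives a large positive argument and cannot overflow to Inf.
    if (x > 0.f)
        return x + std::log1p(std::exp(-x));

    // For non-positive x, exp(x) is at most 1, and log1p keeps precision
    // when exp(x) is tiny.
    return std::log1p(std::exp(x));
}
```

A naive `std::log(1.f + std::exp(x))` returns Inf for large positive `x` in single precision, whereas this form returns a finite value close to `x`.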
2019-10-21 14:28:00 +03:00
Alexander Alekhin
2ad0487cec
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2019-08-13 18:32:29 +00:00
Dmitry Kurtaev
6193e403e7
Enable some tests for 2019R2
2019-08-07 09:07:53 +03:00
Alexander Alekhin
174b4ce29d
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2019-08-05 18:11:43 +00:00
Dmitry Kurtaev
a0c3bb70a9
Modify SSD from TensorFlow graph generation script to enable MyriadX
2019-07-26 13:57:08 +03:00
Alexander Alekhin
0cf479dd5c
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2019-07-25 19:21:47 +00:00
Alexander Alekhin
416c693b3f
dnn(test): OpenVINO 2019R2
2019-07-25 19:01:16 +03:00
Alexander Alekhin
f6c573880e
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2019-07-12 18:45:06 +00:00
Lubov Batanina
8bcd7e122a
Merge pull request #14842 from l-bat:ocv_conv3d
...
* Support Conv3D on OCV backend
* Add header
* Add perf tests
* Support pool3d
* Enable Resnet34_kinetics on OCV backend
* Add test
* Fix conv
* Optimize Conv2D
2019-07-11 20:13:52 +03:00
Alexander Alekhin
b95e93c20a
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2019-06-26 20:19:04 +00:00
Alexander Alekhin
13a782c039
test: fix usage of findDataFile()
...
misused 'optional' mode
2019-06-20 18:20:14 +03:00
Alexander Alekhin
f3de2b4be7
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2019-06-05 19:11:52 +03:00
Dmitry Kurtaev
9c0af1f675
Enable more deconvolution layer configurations with IE backend
2019-06-03 08:15:52 +03:00
Alexander Alekhin
43467a2ac7
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2019-05-28 18:29:48 +00:00
Dmitry Kurtaev
44d21e5a79
Enable Slice layer on Inference Engine backend
2019-05-27 16:28:01 +03:00
Alexander Alekhin
4001346a30
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2019-04-03 19:33:52 +00:00
Alexander Alekhin
cafa010389
dnn(test): skip tests
2019-04-03 17:49:05 +03:00
Alexander Alekhin
33dde339fe
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2019-04-01 18:11:55 +03:00
Alexander Alekhin
fcb07c64f3
cmake: fix build of dnn tests with shared common code
...
- don't share .cpp files (PCH support is broken)
2019-03-31 08:52:25 +00:00
Alexander Alekhin
7442100caa
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2019-03-29 19:29:36 +00:00
Lubov Batanina
7d3d6bc4e2
Merge pull request #13932 from l-bat:MyriadX_master_dldt
...
* Fix precision in tests for MyriadX
* Fix ONNX tests
* Add output range in ONNX tests
* Skip tests on Myriad OpenVINO 2018R5
* Add MyriadX detection
* Add MyriadX detection on OpenVINO R5
* Skip tests on Myriad next version of OpenVINO
* dnn(ie): VPU type from environment variable
* dnn(test): validate VPU type
* dnn(test): update DLIE test skip conditions
2019-03-29 16:42:58 +03:00
Alexander Alekhin
c3cf35ab63
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2019-02-26 17:34:42 +03:00
Dmitry Kurtaev
ed710eaa1c
Make Inference Engine R3 the minimum supported version
2019-02-21 09:32:26 +03:00
Alexander Alekhin
8bde6aea4b
Merge remote-tracking branch 'upstream/3.4' into merge-3.4
2019-02-19 19:49:13 +00:00
Liubov Batanina
183c0fcab1
Changed condition for resize and lrn layers
2019-02-14 13:11:14 +03:00