Reimplementation of Element-wise layers with broadcasting support
* init
* semi-working initial version
* add small_vector
* wip
* remove smallvec
* add nary function
* replace auto with Mat in lambda expr used in transform
* uncomment asserts
* autobuffer shape_buf & step_buf
* fix a missing bracket
* fixed a missing addLayer in parseElementWise
* solve one-dimensional broadcast
* remove pre_broadcast_transform for the case of two constants; fix missing constBlobsExtraInfo when addConstant is called
* one autobuffer for step & shape
* temporal fix for the missing original dimension information
* fix parseUnsqueeze when it gets a 1d tensor constant
* support sum/mean/min/max with only one input
* reuse old code to handle cases of two non-constant inputs
* add condition to handle div & mul of two non-constant inputs
* use || instead of or
* remove trainling spaces
* enlarge buf in binary_forward to contain other buffer
* use autobuffer in nary_forward
* generate data randomly and add more cases for perf
* add op and, or & xor
* update perf_dnn
* remove some comments
* remove legacy; add two ONNX conformance tests in filter
* move from cpu_denylist to all_denylist
* adjust parsing for inputs>=2
Co-authored-by: fengyuentau <yuantao.feng@opencv.org.cn>
* dnn: LSTM optimisation
This uses the AVX-optimised fastGEMM1T for matrix multiplications where available, instead of the standard cv::gemm.
fastGEMM1T is already used by the fully-connected layer. This commit involves two minor modifications:
- Use unaligned access. I don't believe this involves any performance hit in on modern CPUs (Nehalem and Bulldozer onwards) in the case where the address is actually aligned.
- Allow for weight matrices where the number of columns is not a multiple of 8.
I have not enabled AVX-512 as I don't have an AVX-512 CPU to test on.
* Fix warning about initialisation order
* Remove C++11 syntax
* Fix build when AVX(2) is not available
In this case the CV_TRY_X macros are defined to 0, rather than being undefined.
* Minor changes as requested:
- Don't check hardware support for AVX(2) when dispatch is disabled for these
- Add braces
* Fix out-of-bounds access in fully connected layer
The old tail handling in fastGEMM1T implicitly rounded vecsize up to the next multiple of 8, and the fully connected layer implements padding up to the next multiple of 8 to cope with this. The new tail handling does not round the vecsize upwards like this but it does require that the vecsize is at least 8. To adapt to the new tail handling, the fully connected layer now rounds vecsize itself at the same time as adding the padding(which makes more sense anyway).
This also means that the fully connected layer always passes a vecsize of at least 8 to fastGEMM1T, which fixes the out-of-bounds access problems.
* Improve tail mask handling
- Use static array for generating tail masks (as requested)
- Apply tail mask to the weights as well as the input vectors to prevent spurious propagation of NaNs/Infs
* Revert whitespace change
* Improve readability of conditions for using AVX
* dnn(lstm): minor coding style changes, replaced left aligned load
Add support for Conv1D on OpenCV backend
* Add support for Conv1D on OpenCV backend
* disable tests on other targets/backends
* Fix formatting
* Restore comment
* Remove unnecessary flag and fix test logic
* Fix perf test
* fix braces
* Fix indentation, assert check and remove unnecessary condition
* Remove unnecessary changes
* Add test cases for variable weights and bias
* dnn(conv): fallback on OpenCV+CPU instead of failures
* coding style
* Fix precision in tests for MyriadX
* Fix ONNX tests
* Add output range in ONNX tests
* Skip tests on Myriad OpenVINO 2018R5
* Add detect MyriadX
* Add detect MyriadX on OpenVINO R5
* Skip tests on Myriad next version of OpenVINO
* dnn(ie): VPU type from environment variable
* dnn(test): validate VPU type
* dnn(test): update DLIE test skip conditions