DNN: Accelerating convolution
* Fast Conv of ARM, X86 and universal intrinsics.
* improve code style.
* error fixed.
* improve the License
* optimize memory allocated and Adjust the threshold.
* change FasterRCNN_vgg16 to 2GB memory.
Fix issue 22015, let Clip layer support 1-3 inputs
* Fix issue 22015.
Let layer Clip support 1-3 inputs.
* Resolve other problems caused by modifications
* Update onnx_importer.cpp
added extra checks to min/max handling in Clip
* Add assertions to check the size of the input
* Add test for clip with min and max initializers
* Separate test for "clip_init_min_max". Change the check method for input_size to provide a clearer message in case of problem.
* Add tests for clip with min or max initializers
* Change the implementation of getting input
Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@gmail.com>
Fix LSTM support in ONNX
* fix LSTM and add peephole support
* disable old tests
* turn lambdas into functions
* more hacks for c++98
* add assertions
* slice fixes
* backport of cuda-related fixes
* address review comments
* add apply softmax option to ClassificationModel
* remove default arguments of ClassificationModel::setSoftMax()
* fix build for python
* fix docs warning for setSoftMax()
* add impl for ClassficationModel()
* fix failed build for docs by trailing whitespace
* move to implement classify() to ClassificationModel_Impl
* move to implement softmax() to ClassificationModel_Impl
* remove softmax from public method in ClassificationModel
Use YuNet of fixed input shape to fix not-supported-dynamic-zero-shape for FaceDetectorYN
* use yunet with input of fixed shape
* update yunet used in face recognition regression
Further optimize DNN for RISC-V Vector.
* Optimize DNN on RVV by using vsetvl.
* Rename vl.
* Update fastConv by using setvl instead of mask.
* Fix fastDepthwiseConv
Update RVV backend for using Clang.
* Update cmake file of clang.
* Modify the RVV optimization on DNN to adapt to clang.
* Modify intrin_rvv: Disable some existing types.
* Modify intrin_rvv: Reinterpret instead of load&cast.
* Modify intrin_rvv: Update load&store without cast.
* Modify intrin_rvv: Rename vfredsum to fredosum.
* Modify intrin_rvv: Rewrite Check all/any by using vpopc.
* Modify intrin_rvv: Use reinterpret instead of c-style casting.
* Remove all macros which is not used in v_reinterpret
* Rename vpopc to vcpop according to spec.
* dnn: LSTM optimisation
This uses the AVX-optimised fastGEMM1T for matrix multiplications where available, instead of the standard cv::gemm.
fastGEMM1T is already used by the fully-connected layer. This commit involves two minor modifications:
- Use unaligned access. I don't believe this involves any performance hit in on modern CPUs (Nehalem and Bulldozer onwards) in the case where the address is actually aligned.
- Allow for weight matrices where the number of columns is not a multiple of 8.
I have not enabled AVX-512 as I don't have an AVX-512 CPU to test on.
* Fix warning about initialisation order
* Remove C++11 syntax
* Fix build when AVX(2) is not available
In this case the CV_TRY_X macros are defined to 0, rather than being undefined.
* Minor changes as requested:
- Don't check hardware support for AVX(2) when dispatch is disabled for these
- Add braces
* Fix out-of-bounds access in fully connected layer
The old tail handling in fastGEMM1T implicitly rounded vecsize up to the next multiple of 8, and the fully connected layer implements padding up to the next multiple of 8 to cope with this. The new tail handling does not round the vecsize upwards like this but it does require that the vecsize is at least 8. To adapt to the new tail handling, the fully connected layer now rounds vecsize itself at the same time as adding the padding(which makes more sense anyway).
This also means that the fully connected layer always passes a vecsize of at least 8 to fastGEMM1T, which fixes the out-of-bounds access problems.
* Improve tail mask handling
- Use static array for generating tail masks (as requested)
- Apply tail mask to the weights as well as the input vectors to prevent spurious propagation of NaNs/Infs
* Revert whitespace change
* Improve readability of conditions for using AVX
* dnn(lstm): minor coding style changes, replaced left aligned load
[GSoC] OpenCV.js: Accelerate OpenCV.js DNN via WebNN
* Add WebNN backend for OpenCV DNN Module
Update dnn.cpp
Update dnn.cpp
Update dnn.cpp
Update dnn.cpp
Add WebNN head files into OpenCV 3rd partiy files
Create webnn.hpp
update cmake
Complete README and add OpenCVDetectWebNN.cmake file
add webnn.cpp
Modify webnn.cpp
Can successfully compile the codes for creating a MLContext
Update webnn.cpp
Update README.md
Update README.md
Update README.md
Update README.md
Update cmake files and
update README.md
Update OpenCVDetectWebNN.cmake and README.md
Update OpenCVDetectWebNN.cmake
Fix OpenCVDetectWebNN.cmake and update README.md
Add source webnn_cpp.cpp and libary libwebnn_proc.so
Update dnn.cpp
Update dnn.cpp
Update dnn.cpp
Update dnn.cpp
update dnn.cpp
update op_webnn
update op_webnn
Update op_webnn.hpp
update op_webnn.cpp & hpp
Update op_webnn.hpp
Update op_webnn
update the skeleton
Update op_webnn.cpp
Update op_webnn
Update op_webnn.cpp
Update op_webnn.cpp
Update op_webnn.hpp
update op_webnn
update op_webnn
Solved the problems of released variables.
Fixed the bugs in op_webnn.cpp
Implement op_webnn
Implement Relu by WebNN API
Update dnn.cpp for better test
Update elementwise_layers.cpp
Implement ReLU6
Update elementwise_layers.cpp
Implement SoftMax using WebNN API
Implement Reshape by WebNN API
Implement PermuteLayer by WebNN API
Implement PoolingLayer using WebNN API
Update pooling_layer.cpp
Update pooling_layer.cpp
Update pooling_layer.cpp
Update pooling_layer.cpp
Update pooling_layer.cpp
Update pooling_layer.cpp
Implement poolingLayer by WebNN API and add more detailed logs
Update dnn.cpp
Update dnn.cpp
Remove redundant codes and add more logs for poolingLayer
Add more logs in the pooling layer implementation
Fix the indent issue and resolve the compiling issue
Fix the build problems
Fix the build issue
FIx the build issue
Update dnn.cpp
Update dnn.cpp
* Fix the build issue
* Implement BatchNorm Layer by WebNN API
* Update convolution_layer.cpp
This is a temporary file for Conv2d layer implementation
* Integrate some general functions into op_webnn.cpp&hpp
* Update const_layer.cpp
* Update convolution_layer.cpp
Still have some bugs that should be fixed.
* Update conv2d layer and fc layer
still have some problems to be fixed.
* update constLayer, conv layer, fc layer
There are still some bugs to be fixed.
* Fix the build issue
* Update concat_layer.cpp
Still have some bugs to be fixed.
* Update conv2d layer, fully connected layer and const layer
* Update convolution_layer.cpp
* Add OpenCV.js DNN module WebNN Backend (both using webnn-polyfill and electron)
* Delete bib19450.aux
* Add WebNN backend for OpenCV DNN Module
Update dnn.cpp
Update dnn.cpp
Update dnn.cpp
Update dnn.cpp
Add WebNN head files into OpenCV 3rd partiy files
Create webnn.hpp
update cmake
Complete README and add OpenCVDetectWebNN.cmake file
add webnn.cpp
Modify webnn.cpp
Can successfully compile the codes for creating a MLContext
Update webnn.cpp
Update README.md
Update README.md
Update README.md
Update README.md
Update cmake files and
update README.md
Update OpenCVDetectWebNN.cmake and README.md
Update OpenCVDetectWebNN.cmake
Fix OpenCVDetectWebNN.cmake and update README.md
Add source webnn_cpp.cpp and libary libwebnn_proc.so
Update dnn.cpp
Update dnn.cpp
Update dnn.cpp
Update dnn.cpp
update dnn.cpp
update op_webnn
update op_webnn
Update op_webnn.hpp
update op_webnn.cpp & hpp
Update op_webnn.hpp
Update op_webnn
update the skeleton
Update op_webnn.cpp
Update op_webnn
Update op_webnn.cpp
Update op_webnn.cpp
Update op_webnn.hpp
update op_webnn
update op_webnn
Solved the problems of released variables.
Fixed the bugs in op_webnn.cpp
Implement op_webnn
Implement Relu by WebNN API
Update dnn.cpp for better test
Update elementwise_layers.cpp
Implement ReLU6
Update elementwise_layers.cpp
Implement SoftMax using WebNN API
Implement Reshape by WebNN API
Implement PermuteLayer by WebNN API
Implement PoolingLayer using WebNN API
Update pooling_layer.cpp
Update pooling_layer.cpp
Update pooling_layer.cpp
Update pooling_layer.cpp
Update pooling_layer.cpp
Update pooling_layer.cpp
Implement poolingLayer by WebNN API and add more detailed logs
Update dnn.cpp
Update dnn.cpp
Remove redundant codes and add more logs for poolingLayer
Add more logs in the pooling layer implementation
Fix the indent issue and resolve the compiling issue
Fix the build problems
Fix the build issue
FIx the build issue
Update dnn.cpp
Update dnn.cpp
* Fix the build issue
* Implement BatchNorm Layer by WebNN API
* Update convolution_layer.cpp
This is a temporary file for Conv2d layer implementation
* Integrate some general functions into op_webnn.cpp&hpp
* Update const_layer.cpp
* Update convolution_layer.cpp
Still have some bugs that should be fixed.
* Update conv2d layer and fc layer
still have some problems to be fixed.
* update constLayer, conv layer, fc layer
There are still some bugs to be fixed.
* Update conv2d layer, fully connected layer and const layer
* Update convolution_layer.cpp
* Add OpenCV.js DNN module WebNN Backend (both using webnn-polyfill and electron)
* Update dnn.cpp
* Fix Error in dnn.cpp
* Resolve duplication in conditions in convolution_layer.cpp
* Fixed the issues in the comments
* Fix building issue
* Update tutorial
* Fixed comments
* Address the comments
* Update CMakeLists.txt
* Offer more accurate perf test on native
* Add better perf tests for both native and web
* Modify per tests for better results
* Use more latest version of Electron
* Support latest WebNN Clamp op
* Add definition of HAVE_WEBNN macro
* Support group convolution
* Implement Scale_layer using WebNN
* Add Softmax option for native classification example
* Fix comments
* Fix comments
* dnn(ocl4dnn): fix LRN layer accuracy problems
- FP16 intermediate computation is not accurate and may provide NaN values
* dnn(test): update tolerance for FP16
fix bug: wrong output dimension when "keep_dims" is false in pooling layer.
* fix bug in max layer
* code align
* delete permute layer and add test case
* add name assert
* check other cases
* remove c++11 features
* style:add "const" remove assert
* style:sanitize file names
* dnn: fix unaligned memory access crash on armv7
The getTensorContent function would return a Mat pointing to some
member of a Protobuf-encoded message. Protobuf does not make any
alignment guarantees, which results in a crash on armv7 when loading
models while bit 2 is set in /proc/cpu/alignment (or the relevant
kernel feature for alignment compatibility is disabled). Any read
attempt from the previously unaligned data member would send SIGBUS.
As workaround, this commit makes an aligned copy via existing clone
functionality in getTensorContent. The unsafe copy=false option is
removed. Unfortunately, a rather crude hack in PReLUSubgraph in fact
writes(!) to the Protobuf message. We limit ourselves to fixing the
alignment issues in this commit, and add getTensorContentRefUnaligned
to cover the write case with a safe memcpy. A FIXME marks the issue.
* dnn: reduce amount of .clone() calls
* dnn: update FIXME comment
Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
Make the implementation of optimization in DNN adjustable to different vector sizes with RVV intrinsics.
* Update fastGEMM for multi VLEN.
* Update fastGEMM1T for multi VLEN.
* Update fastDepthwiseConv for multi VLEN.
* Update fastConv for multi VLEN.
* Replace malloc with cv::AutoBuffer.
dnn : int8 quantized layers support in onnx importer
* added quantized layers support in onnx importer
* added more cases in eltwise node, some more checks
* added tests for quantized nodes
* relax thresholds for failed tests, address review comments
* refactoring based on review comments
* added support for unsupported cases and pre-quantized resnet50 test
* relax thresholds due to int8 resize layer
Add ExpandDims layer of tf_importer.cpp
* Add ExpandDims to tf_importer.
* add -1 expand test case.
* Support different dimensions of input.
* Compatible with 5-dimensional NDHWC data
* Code align
* support 3-dim input.
* 3-dim bug fixed.
* fixing error of code format.
Add support for YOLOv4x-mish
* backport to 3.4 for supporting yolov4x-mish
* add YOLOv4x-mish test
* address review comments
Co-authored-by: Guo Xu <guoxu@1school.com.cn>
Add Normalize subgraph, fix Slice, Mul and Expand
* Add Normalize subgraph, support for starts<0 and axis<0 in Slice, Mul broadcasting in the middle and fix Expand's unsqueeze
* remove todos
* remove range-based for loop
* address review comments
* change >> to > > in template
* fix indexation
* fix expand that does nothing
* support PPSeg model for dnn module
* fixed README for CI
* add test case
* fixed bug
* deal with comments
* rm dnn_model_runner
* update test case
* fixed bug for testcase
* update testcase
Optimization of DNN using native RISC-V vector intrinsics.
* Use RVV to optimize fastGEMM (FP32) in DNN.
* Use RVV to optimize fastGEMM1T in DNN.
* Use RVV to optimize fastConv in DNN.
* Use RVV to optimize fastDepthwiseConv in DNN.
* Vectorize tails using vl.
* Use "vl" instead of scalar to handle small block in fastConv.
* Fix memory access out of bound in "fastGEMM1T".
* Remove setvl.
* Remove useless initialization.
* Use loop unrolling to handle tail part instead of switch.
Add Python's test for LSTM layer
* Add Python's test for LSTM layer
* Set different test threshold for FP16 target
* rename test to test_input_3d
Co-authored-by: Julie Bareeva <julia.bareeva@xperience.ai>
Support non-zero hidden state for LSTM
* fully support non-zero hidden state for LSTM
* check dims of hidden state for LSTM
* fix failed test Test_Model.TextRecognition
* add new tests for LSTM w/ non-zero hidden params
Co-authored-by: Julie Bareeva <julia.bareeva@xperience.ai>
ONNX diagnostic tool
* Final
* Add forgotten Normalize layer to the set of supported types
* ONNX diagnostic tool corrections
* Fixed CI test warnings
* Added code minor corrections
Co-authored-by: Sergey Slashchinin <sergei.slashchinin@xperience.ai>