Commit Graph

11 Commits

Author SHA1 Message Date
HAN Liutong
e5fb50476c
Merge pull request #20521 from hanliutong:dev-rvv-multiVLEN
Make the DNN optimizations implemented with RVV intrinsics adjustable to different vector lengths (VLEN).

* Update fastGEMM for multi VLEN.

* Update fastGEMM1T for multi VLEN.

* Update fastDepthwiseConv for multi VLEN.

* Update fastConv for multi VLEN.

* Replace malloc with cv::AutoBuffer (see the sketch after this entry).
2021-10-05 15:35:00 +00:00
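
The AutoBuffer change above swaps raw malloc/free for cv::AutoBuffer, OpenCV's RAII scratch buffer: small sizes live on the stack, larger ones fall back to the heap, and the memory is released automatically. A minimal sketch of the pattern, with an illustrative buffer use rather than the actual fastGEMM code:

    #include <opencv2/core.hpp>

    static void gemmScratchDemo(int blockSize)
    {
        // Before: float* scratch = (float*)malloc(blockSize * sizeof(float));
        // After: RAII buffer; stack storage for small sizes, heap otherwise.
        cv::AutoBuffer<float> scratch(blockSize);
        float* p = scratch.data();          // raw pointer for the hot loops

        for (int i = 0; i < blockSize; i++)
            p[i] = 0.f;                     // e.g. zero an accumulator block

        // ... inner loops write through p ...
        // No free(): storage is released when scratch goes out of scope.
    }
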
HAN Liutong
aaca4987c9
Merge pull request #20287 from hanliutong:dev-rvv-0.10
Optimization of DNN using native RISC-V vector intrinsics.

* Use RVV to optimize fastGEMM (FP32) in DNN.

* Use RVV to optimize fastGEMM1T in DNN.

* Use RVV to optimize fastConv in DNN.

* Use RVV to optimize fastDepthwiseConv in DNN.

* Vectorize tails using vl.

* Use "vl" instead of scalar code to handle small blocks in fastConv.

* Fix an out-of-bounds memory access in "fastGEMM1T".

* Remove setvl.

* Remove useless initialization.

* Use loop unrolling instead of a switch to handle the tail part (see the vl sketch after this entry).
2021-08-11 01:16:03 +03:00
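
Several of the bullets above (vectorizing tails with vl, dropping the switch) come down to RVV strip-mining: vsetvl returns how many elements the hardware will process on this pass, so one loop body covers both full blocks and the tail. A minimal sketch using the v0.10-era intrinsic names this PR targeted (newer toolchains prefix them with __riscv_); an axpy loop stands in for the actual kernels:

    #include <riscv_vector.h>

    // y[i] += a * x[i] for n floats; vl shrinks automatically on the tail.
    void saxpy_rvv(int n, float a, const float* x, float* y)
    {
        for (int i = 0; i < n; )
        {
            size_t vl = vsetvl_e32m4(n - i);           // elements this pass
            vfloat32m4_t vx = vle32_v_f32m4(x + i, vl);
            vfloat32m4_t vy = vle32_v_f32m4(y + i, vl);
            vy = vfmacc_vf_f32m4(vy, a, vx, vl);       // vy += a * vx
            vse32_v_f32m4(y + i, vy, vl);
            i += (int)vl;
        }
    }

Because vl is computed from the remaining count, the same code runs on any VLEN, which is also what the multi-VLEN follow-up above builds on.
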
Vadim Pisarevsky
77b01deb80
Merge pull request #17858 from vpisarev:dnn_depthwise_conv
* added depth-wise convolution; gives a ~20-30% performance improvement in MobileSSD networks (see the sketch after this entry)

* hopefully eliminated the compile warnings and errors, as well as a failure in one test

* fixed a few typos
* decreased buffer size in some cases
* added more optimal im2row branch in the case of 1x1 convolutions
* tuned fastConv to reduce the number of passes over arrays
2020-08-01 15:05:05 +03:00
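
For context on the entry above: a depth-wise convolution convolves each input channel with its own single-channel filter instead of accumulating across all channels, so it does a fraction of the work of a regular convolution and benefits from a dedicated kernel. A naive scalar sketch, assuming unit stride and no padding (not the OpenCV implementation):

    // Depth-wise 2D convolution: channel c uses only its own K x K filter.
    // in: C x H x W (contiguous), w: C x K x K, out: C x (H-K+1) x (W-K+1)
    void depthwiseConvNaive(const float* in, const float* w, float* out,
                            int C, int H, int W, int K)
    {
        const int Ho = H - K + 1, Wo = W - K + 1;
        for (int c = 0; c < C; c++)
            for (int y = 0; y < Ho; y++)
                for (int x = 0; x < Wo; x++)
                {
                    float s = 0.f;
                    for (int dy = 0; dy < K; dy++)
                        for (int dx = 0; dx < K; dx++)
                            s += in[(c*H + y + dy)*W + (x + dx)]
                               * w[(c*K + dy)*K + dx];
                    out[(c*Ho + y)*Wo + x] = s;
                }
    }
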
rockzhan
1187a7fa34 Merge pull request #11649 from rockzhan:dnn_dw_prelu
dnn: Fix output mismatch when forwarding a dnn model containing [depthwise conv(group=1) + bn + prelu]  (#11649)

* this makes sure the [depthwise conv(group=1) + bn + prelu] output does not shift (see the epilogue sketch after this entry)

* add a TEST to show the output mismatch in [DWconv+Prelu]

* fix typo

* initialize the cv::Mat directly instead of loading an image

* build the model at runtime instead of loading an external model

* remove whitespace

* change the way the cv::Mat is created

* add bias_term, add target output

* fix the [dwconv + prelu] value mismatch when optimizations are disabled

* fix the test error when changing the number of output channels

* add a parametric test

* change num_output to the group value

* change the conv code and change the test back
2018-06-07 13:45:54 +00:00
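
For the fusion this fix targets, batch norm folds into the convolution's scale and bias, and PReLU then runs as a per-channel epilogue; the reported shift is what you get when that epilogue is dropped or applied with the wrong slope. A hedged sketch of the per-element math (names are illustrative, not the actual OpenCV code):

    #include <cmath>

    // Fold BN (mean m, variance v, scale g, shift b, epsilon eps) into an
    // affine transform after the conv, then apply per-channel PReLU.
    inline float fusedEpilogue(float convOut, float m, float v, float g,
                               float b, float eps, float preluSlope)
    {
        float scale = g / std::sqrt(v + eps);   // BN as a scale...
        float x = scale * (convOut - m) + b;    // ...and shift
        return x >= 0.f ? x : preluSlope * x;   // PReLU
    }
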
Arjan van de Ven
a75840d19c Merge pull request #10468 from fenrus75:avx512-2
* Add a 512-bit code path to the AVX512 fastConv function

this patch adds a 512-bit-wide code path to the fastConv() function for
AVX512 use.
The basic idea is to process the first N * 16 elements of the vector
with AVX512, and then run the rest of the vector through the traditional
AVX2 code path (see the sketch after this entry).

* dnn: use unaligned AVX512 load (OpenCV aligns data on 32-byte boundary)

* dnn: change "vecsize" condition for AVX512

* dnn: fix indentation
2018-01-31 16:34:12 +03:00
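
A minimal sketch of the split described above, including the unaligned 512-bit loads the follow-up commit switched to (OpenCV only guarantees 32-byte alignment, while an aligned AVX512 load needs 64 bytes). A dot product stands in for the actual fastConv body:

    #include <immintrin.h>

    // AVX512 handles the first (n/16)*16 elements; AVX2 and scalar code
    // mop up the remainder, mirroring the fastConv structure.
    float dotAvx512(const float* a, const float* b, int n)
    {
        int i = 0;
        __m512 acc512 = _mm512_setzero_ps();
        for (; i <= n - 16; i += 16)            // 512-bit main loop
            acc512 = _mm512_fmadd_ps(_mm512_loadu_ps(a + i),  // unaligned:
                                     _mm512_loadu_ps(b + i),  // data is only
                                     acc512);                 // 32-byte aligned
        float s = _mm512_reduce_add_ps(acc512);

        __m256 acc256 = _mm256_setzero_ps();
        for (; i <= n - 8; i += 8)              // AVX2 tail
            acc256 = _mm256_fmadd_ps(_mm256_loadu_ps(a + i),
                                     _mm256_loadu_ps(b + i), acc256);
        float tmp[8];
        _mm256_storeu_ps(tmp, acc256);
        for (int k = 0; k < 8; k++) s += tmp[k];

        for (; i < n; i++) s += a[i] * b[i];    // scalar remainder
        return s;
    }
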
Alexander Alekhin
7d67d60fb1 cmake(opt): AVX512_SKX 2017-12-29 07:18:11 +00:00
Alexander Alekhin
898ca38257 cmake: AVX512 -> AVX_512F 2017-12-28 15:20:27 +00:00
Arjan van de Ven
2938860b3f Provide a few AVX512 optimized functions for the DNN module
This patch adds an AVX512-optimized fastConv as well as the hookups
needed to get it called from the convolution_layer (see the dispatch
sketch after this entry).

AVX512 fastConv is code-identical at the C level to the AVX2 one,
but is measurably faster because AVX512 has more registers available
for caching results.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
2017-12-26 16:00:17 +00:00
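
The hookups mentioned above amount to runtime dispatch: the convolution layer checks CPU features and calls the widest kernel available. A hedged sketch of the pattern using OpenCV's feature check; the kernel names are hypothetical stand-ins:

    #include <opencv2/core.hpp>
    #include <opencv2/core/utility.hpp>

    // Hypothetical kernel entry points for each ISA level.
    static void fastConv_avx512() { /* AVX512 kernel body */ }
    static void fastConv_avx2()   { /* AVX2 kernel body */ }
    static void fastConv_base()   { /* portable fallback */ }

    void runFastConv()
    {
        if (cv::checkHardwareSupport(CV_CPU_AVX_512F))
            fastConv_avx512();
        else if (cv::checkHardwareSupport(CV_CPU_AVX2))
            fastConv_avx2();
        else
            fastConv_base();
    }

The CV_CPU_AVX_512F constant matches the naming the two cmake entries above settle on.
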
Alexander Alekhin
3dee92ec50 fix usage of CV_FMA3 macro 2017-09-26 17:23:54 +03:00
Alexander Alekhin
4784c7be5f dnn: cleanup dispatched code, fix SIMD128 types 2017-07-13 19:00:34 +03:00
Vadim Pisarevsky
ed9564106c reuse AVX2-optimized kernels for AVX1 CPUs (like IvyBridge) 2017-07-06 21:36:59 +03:00
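
The last two entries are related: IvyBridge-class CPUs implement AVX but not FMA3, so the AVX2-tuned float kernels can be reused there only if fused multiply-add is compiled out. OpenCV gates that with the CV_FMA3 macro (see the "fix usage of CV_FMA3 macro" entry above). A minimal sketch of the idea, assuming CV_FMA3 is set by the build as in OpenCV's CPU-dispatch headers:

    #include <immintrin.h>

    // d = a * b + c: single FMA instruction where the target supports it,
    // separate multiply and add on AVX1-only CPUs such as IvyBridge.
    static inline __m256 mulAdd256(__m256 a, __m256 b, __m256 c)
    {
    #if CV_FMA3
        return _mm256_fmadd_ps(a, b, c);
    #else
        return _mm256_add_ps(_mm256_mul_ps(a, b), c);
    #endif
    }
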