Commit Graph

59 Commits

Author SHA1 Message Date
thebhatman
8a18d132fc Port Swish and Mish layers 2019-12-01 11:55:39 +03:00
Alexander Alekhin
24790e4061
Merge pull request #14899 from alalek:dnn_fix_bnll_layer
* dnn: fix BNLLLayer implementation

details: https://github.com/BVLC/caffe/blame/1.0/src/caffe/layers/bnll_layer.cpp#L17

* dnn: enable OCV/OpenCL BNLL layer
2019-06-26 23:04:26 +03:00
Dmitry Kurtaev
dfdc91f8c9 dnn: fix MVN layer (issue 14683) 2019-06-14 18:38:05 +03:00
Alexander Alekhin
eab6744ac7 dnn(ocl): use compile-time LOCAL_SIZE parameter
instead of get_local_size(0) and dynamic local memory allocation
2019-02-05 15:51:16 +03:00
Alexander Alekhin
eec468fa13 dnn(ocl4dnn): calculate activation expression once
- to avoid multiple conditional calls via sub_group() functions
2018-10-02 21:23:41 +00:00
Alexander Alekhin
0f031b6680 dnn(ocl4dnn): drop weights_buf
- avoid memory access violation during "prefetch" stage
2018-09-30 20:35:41 +00:00
Dmitry Kurtaev
24ab751547 Merge pull request #12565 from dkurt:dnn_non_intel_gpu
* Remove isIntel check from deep learning layers

* Remove fp16->fp32 fallbacks where it's not necessary

* Fix Kernel::run to prevent localsize > globalsize
2018-09-26 16:27:00 +03:00
Lubov Batanina
43f889ae1f Merge pull request #12519 from l-bat:l-bat/onnx_parser
Support asymmetric padding in pooling layer (#12519)

* Add Inception_V1 support in ONNX

* Add asymmetric padding in OpenCL and Inference engine

* Refactoring
2018-09-17 20:26:17 +03:00
Dmitry Kurtaev
09fa758725 Replace Darknet's Reorg to permute layer 2018-09-12 18:13:39 +03:00
Marat K
38f8fc6c82 Merge pull request #12249 from kopytjuk:feature/region-layer-batch-mode
Feature/region layer batch mode (#12249)

* Add batch mode for Darknet networks.

Swap variables in test_darknet.

Adapt reorg layer to batch mode.

Adapt region layer.

Add OpenCL implementation.

Remove trailing whitespace.

Bugifx reorg opencl implementation.

Fix bug in OpenCL reorg.

Fix modulo bug.

Fix bug.

Reorg openCL.

Restore reorg layer opencl code.

OpenCl fix.

Work on openCL reorg.

Remove whitespace.

Fix openCL region layer implementation.

Fix bug.

Fix softmax region opencl bug.

Fix opencl bug.

Fix openCL bug.

Update aff_trans.cpp

When the fullAffine parameter is set to false, the estimateRigidTransform function maybe return empty, then the _localAffineEstimate function will be called, but the bug in it will result in incorrect results.

core(libva): support YV12 too

Added to CPU path only.
OpenCL code path still expects NV12 only (according to Intel OpenCL extension)

cmake: allow to specify own libva paths

via CMake:
- `-DVA_LIBRARIES=/opt/intel/mediasdk/lib64/libva.so.2\;/opt/intel/mediasdk/lib64/libva-drm.so.2`

android: NDK17 support

tested with NDK 17b (17.1.4828580)

Enable more deep learning tests using Intel's Inference Engine backend

ts: don't pass NULL for std::string() constructor

openvino: use 2018R3 defines

experimental version++

OpenCV version++

OpenCV 3.4.3

OpenCV version '-openvino'

openvino: use 2018R3 defines

Fixed windows build with InferenceEngine

dnn: fix variance setting bug for PriorBoxLayer

- The size of second channel should be size[2] of output tensor,
- The Scalar should be {variance[0], variance[0], variance[0], variance[0]}
  for _variance.size() == 1 case.

Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>

Fix lifetime of networks which are loaded from Model Optimizer IRs

Adds a small note describing BUILD_opencv_world (#12332)

* Added a mall note describing BUILD_opencv_world cmake option to the Installation in Windows tutorial.

* Made slight changes in BUILD_opencv_world documentation.

* Update windows_install.markdown

improved grammar

Update opengl_interop.cpp

resolves #12307

java: fix LIST_GET macro

fix typo

Added option to fail on missing testdata

Fixed that object_detection.py does not work in python3.

cleanup: IPP Async (IPP_A)

except header file with conversion routines (will be removed in OpenCV 4.0)

imgcodecs: add null pointer check

Include preprocessing nodes to object detection TensorFlow networks (#12211)

* Include preprocessing nodes to object detection TensorFlow networks

* Enable more fusion

* faster_rcnn_resnet50_coco_2018_01_28 test

countNonZero function reworked to use wide universal intrinsics instead of SSE2 intrinsics

resolve #5788

imgcodecs(webp): multiple fixes

- don't reallocate passed 'img' (test fixed - must use IMREAD_UNCHANGED / IMREAD_ANYCOLOR)
- avoid memory DDOS
- avoid reading of whole file during header processing
- avoid data access after allocated buffer during header processing (missing checks)
- use WebPFree() to free allocated buffers (libwebp >= 0.5.0)
- drop unused & undefined `.close()` method
- added checks for channels >= 5 in encoder

ml: fix adjusting K in KNearest (#12358)

dnn(perf): fix and merge Convolution tests

- OpenCL tests didn't run any OpenCL kernels
- use real configuration from existed models (the first 100 cases)
- batch size = 1

dnn(test): use dnnBackendsAndTargets() param generator

Bit-exact resize reworked to use wide intrinsics (#12038)

* Bit-exact resize reworked to use wide intrinsics

* Reworked bit-exact resize row data loading

* Added bit-exact resize row data loaders for SIMD256 and SIMD512

* Fixed type punned pointer dereferencing warning

* Reworked loading of source data for SIMD256 and SIMD512 bit-exact resize

Bit-exact GaussianBlur reworked to use wide intrinsics (#12073)

* Bit-exact GaussianBlur reworked to use wide intrinsics

* Added v_mul_hi universal intrinsic

* Removed custom SSE2 branch from bit-exact GaussianBlur

* Removed loop unrolling for gaussianBlur horizontal smoothing

doc: fix English gramma in tutorial out-of-focus-deblur filter (#12214)

* doc: fix English gramma in tutorial out-of-focus-deblur filter

* Update out_of_focus_deblur_filter.markdown

slightly modified one sentence

doc: add new tutorial motion deblur filter (#12215)

* doc: add new tutorial motion deblur filter

* Update motion_deblur_filter.markdown

a few minor changes

Replace Slice layer to Crop in Faster-RCNN networks from Caffe

js: use generated list of OpenCV headers

- replaces hand-written list

imgcodecs(webp): use safe cast to size_t on Win32

* Put Version status back to -dev.

follow the common codestyle

Exclude some target engines.

Refactor formulas.

Refactor code.

* Remove unused variable.

* Remove inference engine check for yolov2.

* Alter darknet batch tests to test with two different images.

* Add yolov3 second image GT.

* Fix bug.

* Fix bug.

* Add second test.

* Remove comment.

* Add NMS on network level.

* Add helper files to dev.

* syntax fix.

* Fix OD sample.

Fix sample dnn object detection.

Fix NMS boxes bug.

remove trailing whitespace.

Remove debug function.

Change thresholds for opencl tests.

* Adapt score diff and iou diff.

* Alter iouDiffs.

* Add debug messages.

* Adapt iouDiff.

* Fix tests
2018-09-12 13:29:43 +03:00
Alexander Alekhin
b597c87bed dnn(ocl): avoid memory access violation 2018-07-27 15:35:11 +03:00
Dmitry Kurtaev
faa6c4e1e1 Faster-RCNN anf RFCN models on CPU using Intel's Inference Engine backend.
Enable Torch layers tests with Intel's Inference Engine backend.
2018-07-25 19:04:55 +03:00
Alexander Alekhin
78d07e841d Merge pull request #11959 from pengli:3.4 2018-07-17 11:20:02 +00:00
Li Peng
f0cadaa6e3 enable concat layer fuse for OCL target
Signed-off-by: Li Peng <peng.li@intel.com>
2018-07-17 12:46:16 +08:00
Dmitry Kurtaev
dcc1beb1f8 Clip kernel for OpenCL PriorBox layer 2018-07-13 14:49:13 +03:00
Li Peng
4c5a86828a Fix gemmlike convolution input reading
use vload3 for half3 or float3 input vector reading,
also check read position to see if it exceed input width

Signed-off-by: Li Peng <peng.li@intel.com>
2018-07-11 15:25:21 +08:00
Li Peng
145eae321e pooling ocl kernel optimization
set global size with real output size, also optimize

max pooling index computation if necessary.

Signed-off-by: Li Peng <peng.li@intel.com>
2018-06-29 15:22:49 +08:00
Li, Peng
ab8022f74e update convolution opencl kernels in dnn module (#11762)
* optimize ocl kernel enqueue in fc layer

Signed-off-by: Li Peng <peng.li@intel.com>

* use CV_LOG_INFO in convolution auto tuning

Signed-off-by: Li Peng <peng.li@intel.com>

* update convolution IDLF kernel

extend parameter tuning range, also cleanup
ocl kernel implementation

Signed-off-by: Li Peng <peng.li@intel.com>

* update in-memory convolution cache config

fp16 and fp32 cache config are stored separately

Signed-off-by: Li Peng <peng.li@intel.com>
2018-06-25 17:06:18 +03:00
Tomoaki Teshima
2e9e71ab9e make ocl4dnn available to run on other platform than Intel GPU 2018-05-29 19:18:10 +09:00
Li Peng
ba5e8befa9 fp16 ocl support for more layers
Signed-off-by: Li Peng <peng.li@intel.com>
2018-05-16 22:45:04 +08:00
Li Peng
3dd916882a fp16 ocl support for googlenet
Signed-off-by: Li Peng <peng.li@intel.com>
2018-05-16 22:45:02 +08:00
Wu Zhiwen
ef937dd676 ocl4dnn: Fix SAME padding mode for convolve
Signed-off-by: Wu, Zhiwen <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
2018-02-28 21:02:41 +08:00
Li, Peng
5caf6244a3 Merge pull request #10922 from pengli:dnn
* ave pooling ocl fix

support the padded area control in ave pooling

Signed-off-by: Li Peng <peng.li@intel.com>

* warning fix: ununitialized field
2018-02-22 21:01:12 +03:00
Alexander Alekhin
53305d4a7e Merge pull request #10891 from pengli:dnn 2018-02-20 08:59:07 +00:00
Li Peng
2863f950d6 ReLU6 layer ocl support
include relu6 ocl kernel and layer fusion support

Signed-off-by: Li Peng <peng.li@intel.com>
2018-02-20 15:11:09 +08:00
Dmitry Kurtaev
8b4871a28d Use only absolute prior boxes explicit sizes. Remove scales attributes. (#10874)
* Use only absolute prior boxes explicit sizes. Remove scales attributes.

* Simplified PriorBox layer forward pass
2018-02-19 17:25:18 +03:00
Li Peng
00d2f34888 ocl fix for detection_output and prior_box layer
Signed-off-by: Li Peng <peng.li@intel.com>
2018-02-13 23:09:14 +08:00
luz.paz
5718d09e39 Misc. modules/ typos
Found via `codespell`
2018-02-12 07:09:43 -05:00
Li Peng
389fa5d38e slice layer ocl update
Signed-off-by: Li Peng <peng.li@intel.com>
2018-02-06 22:59:47 +08:00
Li Peng
6aec71d7ee mvn layer ocl update
it fuse ocl kernels to reduce kernel enqueue

Signed-off-by: Li Peng <peng.li@intel.com>
2018-02-01 17:48:12 +08:00
Li Peng
54c81cbde4 eltwise layer SUM op update
Signed-off-by: Li Peng <peng.li@intel.com>
2018-02-01 17:46:06 +08:00
Li Peng
7a4c5e9421 slice layer ocl support
Signed-off-by: Li Peng <peng.li@intel.com>
2018-01-29 22:34:32 +08:00
Li Peng
2493083935 mvn, batch_norm and relu layer fusion
Signed-off-by: Li Peng <peng.li@intel.com>
2018-01-25 18:57:05 +08:00
Li Peng
e15928b49e convolution and tanh layer fusion
Signed-off-by: Li Peng <peng.li@intel.com>
2018-01-25 17:45:33 +08:00
Li Peng
fe494297e4 more update on MVN layer ocl implementation
cut one ocl kernel if normVariance is disabled,
also use native_powr for performance reason.

Signed-off-by: Li Peng <peng.li@intel.com>
2018-01-19 22:54:04 +08:00
Li Peng
2124361ff7 ocl support for Deconvolution layer
Signed-off-by: Li Peng <peng.li@intel.com>
2018-01-18 23:40:22 +08:00
Li Peng
e77af4ae33 MVN layer ocl implementation
Signed-off-by: Li Peng <peng.li@intel.com>
2018-01-17 17:11:32 +08:00
Li Peng
7bc017601f Power, Tanh and Channels ReLU layer ocl support
Signed-off-by: Li Peng <peng.li@intel.com>
2018-01-17 17:11:27 +08:00
Li Peng
4189214d04 batch_norm layer ocl update
use a batch_norm ocl kernel to do the work

Signed-off-by: Li Peng <peng.li@intel.com>
2018-01-16 19:01:58 +08:00
Li Peng
181b448c4d add one more convolution kernel tuning candidate
Signed-off-by: Li Peng <peng.li@intel.com>
2017-12-22 21:37:00 +08:00
Li Peng
c5fc8e03ff cleanup unnecessary macros in convolution ocl kernel
Signed-off-by: Li Peng <peng.li@intel.com>
2017-12-21 20:32:36 +08:00
Li Peng
436d7e4eaf add depthwise convolution kernel
Signed-off-by: Li Peng <peng.li@intel.com>
2017-12-19 17:59:13 +08:00
Li Peng
910d7dab1f prior box layer ocl implementation
Signed-off-by: Li Peng <peng.li@intel.com>
2017-12-19 17:44:10 +08:00
Li Peng
59cbaca4d3 detection_output layer ocl implementation
Signed-off-by: Li Peng <peng.li@intel.com>
2017-12-06 22:35:59 +08:00
Li Peng
66feea6cac region layer ocl implementation
Signed-off-by: Li Peng <peng.li@intel.com>
2017-12-07 02:26:46 +08:00
Li Peng
7707c9bfba reorg layer ocl implementation
Signed-off-by: Li Peng <peng.li@intel.com>
2017-12-07 02:26:46 +08:00
Li Peng
7b7033ac60 permute layer ocl implementation
Signed-off-by: Li Peng <peng.li@intel.com>
2017-12-05 22:10:05 +08:00
Alexander Alekhin
13f374660f dnn(ocl4dnn): drop unused batch_size_ in pooling 2017-11-23 20:46:56 +00:00
Alexander Alekhin
e34b64c979 dnn(ocl4dnn): refactor pooling OpenCL calls 2017-11-23 20:46:44 +00:00
Alexander Alekhin
f071a48ec7 Merge pull request #10143 from pengli:ocl4dnn 2017-11-23 18:47:14 +00:00