Commit Graph

188 Commits

Author SHA1 Message Date
Alexander Alekhin
7fa7fa0226 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2018-11-21 08:33:39 +00:00
Dmitry Kurtaev
0d117312c9 DNN_TARGET_FPGA using Intel's Inference Engine 2018-11-19 11:41:43 +03:00
Alexander Alekhin
22dbcf98c5 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2018-11-17 14:17:35 +00:00
Alexander Alekhin
dd3398416b experimental version++ 2018-11-17 10:22:17 +00:00
Alexander Alekhin
96c71dd3d2 dnn: reduce set of ignored warnings 2018-11-15 13:15:59 +03:00
WuZhiwen
6e3ea8b49d Merge pull request #12703 from wzw-intel:vkcom
* dnn: Add a Vulkan based backend

This commit adds a new backend "DNN_BACKEND_VKCOM" and a
new target "DNN_TARGET_VULKAN". VKCOM means vulkan based
computation library.

This backend uses Vulkan API and SPIR-V shaders to do
the inference computation for layers. The layer types
that are implemented in DNN_BACKEND_VKCOM include:
Conv, Concat, ReLU, LRN, PriorBox, Softmax, MaxPooling,
AvePooling, Permute

This is only the initial work for Vulkan in OpenCV DNN;
more layer types will be supported and performance
tuning is on the way.

Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>

* dnn/vulkan: Add FindVulkan.cmake to detect Vulkan SDK

To build dnn with Vulkan support, install the Vulkan SDK, set the
environment variable "VULKAN_SDK", and add "-DWITH_VULKAN=ON" to
the cmake command.

You can download Vulkan SDK from:
https://vulkan.lunarg.com/sdk/home#linux

For how to install, see
https://vulkan.lunarg.com/doc/sdk/latest/linux/getting_started.html
https://vulkan.lunarg.com/doc/sdk/latest/windows/getting_started.html
https://vulkan.lunarg.com/doc/sdk/latest/mac/getting_started.html
respectively for linux, windows and mac.

To run the Vulkan backend, the mesa driver must also be installed.
On Ubuntu, use this command 'sudo apt-get install mesa-vulkan-drivers'

To test, use command '$BUILD_DIR/bin/opencv_test_dnn --gtest_filter=*VkCom*'

Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>

* dnn/Vulkan: dynamically load Vulkan runtime

No compile-time dependency on Vulkan library.
If Vulkan runtime is unavailable, fallback to CPU path.

Use environment variable "OPENCV_VULKAN_RUNTIME" to specify the path to your
own vulkan runtime library.

Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>

* dnn/Vulkan: Add a python script to compile GLSL shaders to SPIR-V shaders

The SPIR-V shaders are in format of text-based 32-bit hexadecimal
numbers, and inserted into .cpp files as unsigned int32 array.

* dnn/Vulkan: Put Vulkan headers into 3rdparty directory and some other fixes

Vulkan header files are copied from
https://github.com/KhronosGroup/Vulkan-Docs/tree/master/include/vulkan
to 3rdparty/include

Fix the Copyright declaration issue.

Refine OpenCVDetectVulkan.cmake

* dnn/Vulkan: Add vulkan backend tests into existing ones.

Also fixed some test failures.

- Don't use bool variable as uniform for shader
- Fix dispatched group number exceeding the maximum
- Bypass "group > 1" convolution. This should be supported in the future.

* dnn/Vulkan: Fix multiple initialization in one thread.
2018-10-29 17:51:26 +03:00
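Note on the shader embedding described in the commit above: the following is a minimal sketch of how a compiled SPIR-V binary might be rendered as text-based 32-bit hexadecimal words inside an unsigned int array. The function name `spirv_to_cpp_array`, the array name `shader_spv`, and the little-endian input are assumptions made for illustration; this is not the script added by the commit.

```python
import struct

# First word of every SPIR-V module, per the SPIR-V specification.
SPIRV_MAGIC = 0x07230203


def spirv_to_cpp_array(blob, name="shader_spv"):
    """Render a SPIR-V binary as C++ source text (hypothetical helper).

    Assumes the binary is stored little-endian; the array name is illustrative.
    """
    if len(blob) % 4 != 0:
        raise ValueError("SPIR-V binaries are a stream of 32-bit words")
    words = struct.unpack("<%dI" % (len(blob) // 4), blob)
    if not words or words[0] != SPIRV_MAGIC:
        raise ValueError("missing SPIR-V magic number")
    lines = []
    for i in range(0, len(words), 4):  # four words per source line
        chunk = ", ".join("0x%08x" % w for w in words[i:i + 4])
        lines.append("    " + chunk + ",")
    return "const unsigned int %s[%d] = {\n%s\n};\n" % (
        name, len(words), "\n".join(lines))
```

The generated text can then be pasted or written into a .cpp file, which avoids loading shader files at run time.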
Alexander Alekhin
a8b0db4e5d Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2018-09-28 14:14:47 +03:00
Dmitry Kurtaev
f8398d80bc add Net::getUnconnectedOutLayersNames method 2018-09-25 18:10:45 +03:00
Alexander Alekhin
861415133e Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2018-09-19 10:58:43 +03:00
Dmitry Kurtaev
8ac7b21716 Enable Myriad device for OpenVINO models test 2018-09-18 13:49:24 +03:00
Alexander Alekhin
e6171d17f8 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2018-09-18 12:49:52 +03:00
Alexander Alekhin
808ba552c5 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2018-09-14 23:44:35 +00:00
Dmitry Kurtaev
58ac3e09da Change default value of crop argument of blobFromImage from true to false 2018-09-12 19:02:58 +03:00
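Note on the crop argument: as documented for blobFromImage, crop=true resizes the image so that it covers the target size while preserving aspect ratio and then crops from the center, whereas crop=false resizes directly to the target size without preserving aspect ratio. The sketch below only computes the geometry of the two modes; `blob_geometry` is a hypothetical helper, not an OpenCV function, and its rounding may differ from the library's.

```python
def blob_geometry(img_w, img_h, out_w, out_h, crop):
    """Return ((resized_w, resized_h), (x, y, w, h)) for the two crop modes.

    Hypothetical illustration only; rounding details are an assumption.
    """
    if not crop:
        # Direct resize: aspect ratio is not preserved, nothing is cut off.
        return (out_w, out_h), (0, 0, out_w, out_h)
    # Scale so that both sides are at least as large as the target.
    scale = max(out_w / img_w, out_h / img_h)
    resized_w = max(out_w, round(img_w * scale))
    resized_h = max(out_h, round(img_h * scale))
    # Center crop of the target size.
    x0 = (resized_w - out_w) // 2
    y0 = (resized_h - out_h) // 2
    return (resized_w, resized_h), (x0, y0, out_w, out_h)
```

For a 640x480 image and a 300x300 target, crop=true resizes to 400x300 and cuts 50 pixels from each side, while crop=false squeezes the whole image into 300x300.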
Marat K
38f8fc6c82 Merge pull request #12249 from kopytjuk:feature/region-layer-batch-mode
Feature/region layer batch mode (#12249)

* Add batch mode for Darknet networks.

Swap variables in test_darknet.

Adapt reorg layer to batch mode.

Adapt region layer.

Add OpenCL implementation.

Remove trailing whitespace.

Bugfix reorg opencl implementation.

Fix bug in OpenCL reorg.

Fix modulo bug.

Fix bug.

Reorg openCL.

Restore reorg layer opencl code.

OpenCl fix.

Work on openCL reorg.

Remove whitespace.

Fix openCL region layer implementation.

Fix bug.

Fix softmax region opencl bug.

Fix opencl bug.

Fix openCL bug.

Update aff_trans.cpp

When the fullAffine parameter is set to false, the estimateRigidTransform function may return an empty result; the _localAffineEstimate function is then called, but a bug in it leads to incorrect results.

core(libva): support YV12 too

Added to CPU path only.
OpenCL code path still expects NV12 only (according to Intel OpenCL extension)

cmake: allow to specify own libva paths

via CMake:
- `-DVA_LIBRARIES=/opt/intel/mediasdk/lib64/libva.so.2\;/opt/intel/mediasdk/lib64/libva-drm.so.2`

android: NDK17 support

tested with NDK 17b (17.1.4828580)

Enable more deep learning tests using Intel's Inference Engine backend

ts: don't pass NULL for std::string() constructor

openvino: use 2018R3 defines

experimental version++

OpenCV version++

OpenCV 3.4.3

OpenCV version '-openvino'

openvino: use 2018R3 defines

Fixed windows build with InferenceEngine

dnn: fix variance setting bug for PriorBoxLayer

- The size of second channel should be size[2] of output tensor,
- The Scalar should be {variance[0], variance[0], variance[0], variance[0]}
  for _variance.size() == 1 case.

Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>

Fix lifetime of networks which are loaded from Model Optimizer IRs

Adds a small note describing BUILD_opencv_world (#12332)

* Added a small note describing BUILD_opencv_world cmake option to the Installation in Windows tutorial.

* Made slight changes in BUILD_opencv_world documentation.

* Update windows_install.markdown

improved grammar

Update opengl_interop.cpp

resolves #12307

java: fix LIST_GET macro

fix typo

Added option to fail on missing testdata

Fixed that object_detection.py does not work in python3.

cleanup: IPP Async (IPP_A)

except header file with conversion routines (will be removed in OpenCV 4.0)

imgcodecs: add null pointer check

Include preprocessing nodes to object detection TensorFlow networks (#12211)

* Include preprocessing nodes to object detection TensorFlow networks

* Enable more fusion

* faster_rcnn_resnet50_coco_2018_01_28 test

countNonZero function reworked to use wide universal intrinsics instead of SSE2 intrinsics

resolve #5788

imgcodecs(webp): multiple fixes

- don't reallocate passed 'img' (test fixed - must use IMREAD_UNCHANGED / IMREAD_ANYCOLOR)
- avoid memory DDOS
- avoid reading of whole file during header processing
- avoid data access after allocated buffer during header processing (missing checks)
- use WebPFree() to free allocated buffers (libwebp >= 0.5.0)
- drop unused & undefined `.close()` method
- added checks for channels >= 5 in encoder

ml: fix adjusting K in KNearest (#12358)

dnn(perf): fix and merge Convolution tests

- OpenCL tests didn't run any OpenCL kernels
- use real configuration from existing models (the first 100 cases)
- batch size = 1

dnn(test): use dnnBackendsAndTargets() param generator

Bit-exact resize reworked to use wide intrinsics (#12038)

* Bit-exact resize reworked to use wide intrinsics

* Reworked bit-exact resize row data loading

* Added bit-exact resize row data loaders for SIMD256 and SIMD512

* Fixed type punned pointer dereferencing warning

* Reworked loading of source data for SIMD256 and SIMD512 bit-exact resize

Bit-exact GaussianBlur reworked to use wide intrinsics (#12073)

* Bit-exact GaussianBlur reworked to use wide intrinsics

* Added v_mul_hi universal intrinsic

* Removed custom SSE2 branch from bit-exact GaussianBlur

* Removed loop unrolling for gaussianBlur horizontal smoothing

doc: fix English grammar in tutorial out-of-focus-deblur filter (#12214)

* doc: fix English grammar in tutorial out-of-focus-deblur filter

* Update out_of_focus_deblur_filter.markdown

slightly modified one sentence

doc: add new tutorial motion deblur filter (#12215)

* doc: add new tutorial motion deblur filter

* Update motion_deblur_filter.markdown

a few minor changes

Replace Slice layer to Crop in Faster-RCNN networks from Caffe

js: use generated list of OpenCV headers

- replaces hand-written list

imgcodecs(webp): use safe cast to size_t on Win32

* Put Version status back to -dev.

follow the common codestyle

Exclude some target engines.

Refactor formulas.

Refactor code.

* Remove unused variable.

* Remove inference engine check for yolov2.

* Alter darknet batch tests to test with two different images.

* Add yolov3 second image GT.

* Fix bug.

* Fix bug.

* Add second test.

* Remove comment.

* Add NMS on network level.

* Add helper files to dev.

* syntax fix.

* Fix OD sample.

Fix sample dnn object detection.

Fix NMS boxes bug.

remove trailing whitespace.

Remove debug function.

Change thresholds for opencl tests.

* Adapt score diff and iou diff.

* Alter iouDiffs.

* Add debug messages.

* Adapt iouDiff.

* Fix tests
2018-09-12 13:29:43 +03:00
Lubov Batanina
0c8590027f Merge pull request #12071 from l-bat/l-bat:onnx_parser
* Add Squeezenet support in ONNX

* Add AlexNet support in ONNX

* Add Googlenet support in ONNX

* Add CaffeNet and RCNN support in ONNX

* Add VGG16 and VGG16 with batch normalization support in ONNX

* Add RCNN, ZFNet, ResNet18v1 and ResNet50v1 support in ONNX

* Add ResNet101_DUC_HDC

* Add Tiny Yolov2

* Add CNN_MNIST, MobileNetv2 and LResNet100 support in ONNX

* Add ONNX models for emotion recognition

* Add DenseNet121 support in ONNX

* Add Inception v1 support in ONNX

* Refactoring

* Fix tests

* Fix tests

* Skip unstable test

* Modify Reshape operation
2018-09-10 21:07:51 +03:00
Alexander Alekhin
dca657a2fd Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2018-09-10 00:10:21 +03:00
Hamdi Sahloul
a39e0daacf Utilize CV_UNUSED macro 2018-09-07 20:33:52 +09:00
Alexander Alekhin
73bfe68821 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2018-09-07 12:40:27 +03:00
Dmitry Kurtaev
d486204a0d Merge pull request #12264 from dkurt:dnn_remove_forward_method
* Remove a forward method in dnn::Layer

* Add a test

* Fix tests

* Mark multiple dnn::Layer::finalize methods as deprecated

* Replace back dnn's inputBlobs to vector of pointers

* Remove Layer::forward_fallback from CV_OCL_RUN scopes
2018-09-06 13:26:47 +03:00
Alexander Alekhin
d74b98c3d9 Merge remote-tracking branch 'upstream/3.4' into merge-3.4 2018-09-04 18:39:03 +00:00
Alexander Alekhin
f10fd64630 dnn: update "guard" inline namespace
- differ from 3.4 branch
2018-09-03 20:46:57 +00:00
Dmitry Kurtaev
c7cf8fb35c Import SSDs from TensorFlow by training config (#12188)
* Remove TensorFlow and protobuf dependencies from object detection scripts

* Create text graphs for TensorFlow object detection networks from sample
2018-09-03 17:08:40 +03:00
Alexander Alekhin
b38c50b3d0 OpenCV 3.4.3 2018-08-28 15:58:21 +03:00
berak
21f3987d53 python: add support for NMSBoxes 2018-08-25 08:44:45 +02:00
Alexander Alekhin
781721ca50 experimental version++ 2018-08-14 14:10:37 +03:00
Dmitry Kurtaev
faa6c4e1e1 Faster-RCNN and RFCN models on CPU using Intel's Inference Engine backend.
Enable Torch layers tests with Intel's Inference Engine backend.
2018-07-25 19:04:55 +03:00
Dmitry Kurtaev
070393dfda uint8 inputs for deep learning networks 2018-07-19 14:37:33 +03:00
Dmitry Kurtaev
8b5f061dae Replace std::vector<char> to std::vector<uchar> for Java bindings of dnn importers 2018-07-11 18:58:56 +03:00
Dmitry Kurtaev
d57e5406f0 Add readNet* functions which parse models from byte arrays 2018-07-10 11:12:01 +03:00
asciian
61d8719b8d Reading net from std::ifstream
Remove some assertions

Replace std::ifstream to std::istream

Add test for new importer

Remove constructor to load file

Rename cfgStream and darknetModelStream to ifile

Add error notification to inform pathname to user

Use FileStorage instead of std::istream

Use FileNode instead of FileStorage

Fix typo
2018-07-09 10:02:05 +03:00
Alexander Alekhin
ab11b17d4b experimental version++ 2018-06-10 10:20:38 +03:00
Vadim Pisarevsky
3cbd2e2764 Merge pull request #11650 from dkurt:dnn_default_backend 2018-06-06 09:30:39 +00:00
Dmitry Kurtaev
b781ac7346 Make Intel's Inference Engine backend the default if no preferable backend is specified. 2018-06-04 18:31:46 +03:00
Kuang Fangjun
9ae28415ec fix doc. 2018-06-03 17:44:24 +08:00
Dmitry Kurtaev
f96f934426 Update Intel's Inference Engine deep learning backend (#11587)
* Update Intel's Inference Engine deep learning backend

* Remove cpu_extension dependency

* Update Darknet accuracy tests
2018-05-31 14:05:21 +03:00
Dmitry Kurtaev
8488f2e265 EAST: An Efficient and Accurate Scene Text Detector (https://arxiv.org/abs/1704.03155v2) 2018-05-11 14:55:42 +03:00
Dmitry Kurtaev
709cf5d038 OpenCL GPU target for Inference Engine deep learning backend
Enable FP16 GPU target for DL Inference Engine backend.
2018-04-09 17:21:35 +03:00
Dmitry Kurtaev
7972f47ed4 Load networks from intermediate representation of Intel's Deep learning deployment toolkit. 2018-03-26 07:24:21 +03:00
Dmitry Kurtaev
538fd42363 Add test for Scalar arguments at CommandLineParser 2018-03-13 11:01:07 +03:00
Dmitry Kurtaev
f2440ceae6 Update tutorials. A new cv::dnn::readNet function 2018-03-04 20:30:22 +03:00
Dmitry Kurtaev
e8d94ea87c Unite deep learning object detection samples 2018-03-03 14:47:13 +03:00
Alexander Alekhin
4a74408eee experimental version++ 2018-02-23 11:38:33 +03:00
Dmitry Kurtaev
514e6df460 Refactored deep learning layers fusion 2018-02-13 14:35:58 +03:00
luz.paz
5718d09e39 Misc. modules/ typos
Found via `codespell`
2018-02-12 07:09:43 -05:00
Rémi Ratajczak
b67523550f dnn : Added an imagesFromBlob method to the dnn module (#10607)
* Added the imagesFromBlob method to the dnn module.

* Rewritten imagesFromBlob based on first dkurt comments

* Updated code with getPlane()

* Modify comment of imagesFromBlob() in dnn module

* modified comments, removed useless assertions & added OutputArrayOfArray

* replaced tabs with whitespaces & put vectorOfChannels instantiation outside the loop

* Changed pre-commit.sample to pre-commit in .git/hooks/

* Added a test for imagesFromBlob in test_misc.cpp (dnn)

* Changed nbOfImages, robustified test with cv::randu, modified assertion
2018-02-12 14:51:07 +03:00
Dmitry Kurtaev
10e1de74d2 Intel Inference Engine deep learning backend (#10608)
* Intel Inference Engine deep learning backend.

* OpenFace network using Inference Engine backend
2018-02-06 11:57:35 +03:00
Dmitry Kurtaev
6a395d88ff dnn::blobFromImage with OutputArray 2018-01-13 18:20:24 +03:00
Vadim Pisarevsky
eecb64a973 Merge pull request #10331 from arrybn:python_dnn_net 2017-12-20 14:30:27 +00:00
Dmitry Kurtaev
6aabd6cc7a Remove cv::dnn::Importer 2017-12-18 18:08:28 +03:00
Alexander Rybnikov
19c914db51 Changed wrapping mode for cv::dnn::Net::forward 2017-12-18 15:56:09 +03:00
Alexander Alekhin
3fddce67c6 experimental version++ 2017-12-16 01:30:36 +03:00
Dmitry Kurtaev
f503515082 JavaScript bindings for dnn module 2017-12-08 18:33:48 +03:00
Alexander Alekhin
f071a48ec7 Merge pull request #10143 from pengli:ocl4dnn 2017-11-23 18:47:14 +00:00
Li Peng
636d6368ee use OutputArrayOfArrays in net forward interface
It allows umat buffers used in net forward interface

Signed-off-by: Li Peng <peng.li@intel.com>
2017-11-24 02:19:10 +08:00
Alexander Alekhin
f37f4cf3b4 Merge pull request #9994 from r2d3:dnn_memory_load 2017-11-22 18:15:00 +00:00
David Geldreich
f723cede2e add loading TensorFlow/Caffe net from memory buffer
add a corresponding test
2017-11-20 16:28:22 +01:00
Li Peng
8f99083726 Add new layer forward interface
Add layer forward interface with InputArrayOfArrays and
OutputArrayOfArrays parameters, it allows UMat buffer to be
processed and transferred in the layers.

Signed-off-by: Li Peng <peng.li@intel.com>
2017-11-09 15:59:39 +08:00
Dmitry Kurtaev
e1ebc4e991 Specify layer types for Caffe FP32->FP16 weights converter 2017-10-31 12:31:40 +03:00
Vadim Pisarevsky
bc93775385 Merge pull request #9862 from sovrasov:dnn_nms 2017-10-27 11:19:57 +00:00
Vladislav Sovrasov
5bf39ceb5d dnn: handle 4-channel images in blobFromImage (#9944) 2017-10-27 14:06:53 +03:00
Vladislav Sovrasov
c704942b8a dnn: add documentation for NMS, fix missing experimental namespace 2017-10-25 13:35:49 +03:00
Vladislav Sovrasov
acedb4a579 dnn: make NMS function public 2017-10-25 13:35:49 +03:00
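Note on NMS (non-maximum suppression): a minimal sketch of the greedy technique in pure Python, assuming boxes are given as (x, y, width, height) tuples. The names `iou` and `nms_boxes` and the parameter order are illustrative; this is not the code made public by the commit.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def nms_boxes(boxes, scores, score_threshold, nms_threshold):
    """Greedy NMS sketch: return indices of kept boxes, best score first."""
    # Drop low-confidence boxes, then visit the rest by descending score.
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in keep):
            keep.append(i)
    return keep
```

With boxes (0, 0, 10, 10), (1, 1, 10, 10) and (50, 50, 10, 10), the second box overlaps the first with IoU of about 0.68, so an overlap threshold of 0.4 suppresses it and keeps the other two.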
Alexander Alekhin
a871f9e4f7 Merge branch 'update_version' into release 2017-10-23 18:41:12 +03:00
Vladislav Sovrasov
47e1133e71 dnn: add crop flag to blobFromImage 2017-10-11 15:46:20 +03:00
Vadim Pisarevsky
b7ff9ddcdd Merge pull request #9705 from AlexeyAB:dnn_darknet_yolo_v2 2017-10-10 12:02:03 +00:00
Alexander Alekhin
949ec486c5 experimental version++ 2017-10-10 12:29:57 +03:00
AlexeyAB
ecc34dc521 Added DNN Darknet Yolo v2 for object detection 2017-10-09 21:08:44 +03:00
Dmitry Kurtaev
e4aa39f9e5 Text TensorFlow graphs parsing. MobileNet-SSD for 90 classes. 2017-10-08 22:25:29 +03:00
pengli
e340ff9c3a Merge pull request #9114 from pengli:dnn_rebase
add libdnn acceleration to dnn module  (#9114)

* import libdnn code

Signed-off-by: Li Peng <peng.li@intel.com>

* add convolution layer ocl acceleration

Signed-off-by: Li Peng <peng.li@intel.com>

* add pooling layer ocl acceleration

Signed-off-by: Li Peng <peng.li@intel.com>

* add softmax layer ocl acceleration

Signed-off-by: Li Peng <peng.li@intel.com>

* add lrn layer ocl acceleration

Signed-off-by: Li Peng <peng.li@intel.com>

* add innerproduct layer ocl acceleration

Signed-off-by: Li Peng <peng.li@intel.com>

* add HAVE_OPENCL macro

Signed-off-by: Li Peng <peng.li@intel.com>

* fix for convolution ocl

Signed-off-by: Li Peng <peng.li@intel.com>

* enable getUMat() for multi-dimension Mat

Signed-off-by: Li Peng <peng.li@intel.com>

* use getUMat for ocl acceleration

Signed-off-by: Li Peng <peng.li@intel.com>

* use CV_OCL_RUN macro

Signed-off-by: Li Peng <peng.li@intel.com>

* set OPENCL target when it is available

and disable fuseLayer for OCL target for the time being

Signed-off-by: Li Peng <peng.li@intel.com>

* fix innerproduct accuracy test

Signed-off-by: Li Peng <peng.li@intel.com>

* remove trailing space

Signed-off-by: Li Peng <peng.li@intel.com>

* Fixed tensorflow demo bug.

Root cause is that TensorFlow uses a different algorithm than libdnn
to calculate the convolution output dimension.

libdnn doesn't calculate the output dimension anymore and just uses the one
passed in by config.

* split gemm ocl file

split it into gemm_buffer.cl and gemm_image.cl

Signed-off-by: Li Peng <peng.li@intel.com>

* Fix compile failure

Signed-off-by: Li Peng <peng.li@intel.com>

* check env flag for auto tuning

Signed-off-by: Li Peng <peng.li@intel.com>

* switch to new ocl kernels for softmax layer

Signed-off-by: Li Peng <peng.li@intel.com>

* update softmax layer

on some platform subgroup extension may not work well,
fallback to non subgroup ocl acceleration.

Signed-off-by: Li Peng <peng.li@intel.com>

* fallback to cpu path for fc layer with multi output

Signed-off-by: Li Peng <peng.li@intel.com>

* update output message

Signed-off-by: Li Peng <peng.li@intel.com>

* update fully connected layer

fallback to gemm API if libdnn return false

Signed-off-by: Li Peng <peng.li@intel.com>

* Add ReLU OCL implementation

* disable layer fusion for now

Signed-off-by: Li Peng <peng.li@intel.com>

* Add OCL implementation for concat layer

Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>

* libdnn: update license and copyrights

Also refine libdnn coding style

Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>

* DNN: Don't link OpenCL library explicitly

* DNN: Make default preferableTarget to DNN_TARGET_CPU

Users should set it to DNN_TARGET_OPENCL explicitly if they want to
use OpenCL acceleration.

Also don't fuse layers when using DNN_TARGET_OPENCL

* DNN: refine coding style

* Add getOpenCLErrorString

* DNN: Use int32_t/uint32_t instead of alias

* Use namespace ocl4dnn to include libdnn things

* remove extra copyTo in softmax ocl path

Signed-off-by: Li Peng <peng.li@intel.com>

* update ReLU layer ocl path

Signed-off-by: Li Peng <peng.li@intel.com>

* Add prefer target property for layer class

It is used to indicate the target for layer forwarding,
either the default CPU target or OCL target.

Signed-off-by: Li Peng <peng.li@intel.com>

* Add cl_event based timer for cv::ocl

* Rename libdnn to ocl4dnn

Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>

* use UMat for ocl4dnn internal buffer

Remove allocateMemory which use clCreateBuffer directly

Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>

* enable buffer gemm in ocl4dnn innerproduct

Signed-off-by: Li Peng <peng.li@intel.com>

* replace int_tp globally for ocl4dnn kernels.

Signed-off-by: wzw <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>

* create UMat for layer params

Signed-off-by: Li Peng <peng.li@intel.com>

* update sign ocl kernel

Signed-off-by: Li Peng <peng.li@intel.com>

* update image based gemm of inner product layer

Signed-off-by: Li Peng <peng.li@intel.com>

* remove buffer gemm of inner product layer

call cv::gemm API instead

Signed-off-by: Li Peng <peng.li@intel.com>

* change ocl4dnn forward parameter to UMat

Signed-off-by: Li Peng <peng.li@intel.com>

* Refine auto-tuning mechanism.

- Use OPENCV_OCL4DNN_KERNEL_CONFIG_PATH to set cache directory
  for fine-tuned kernel configuration.
  e.g. export OPENCV_OCL4DNN_KERNEL_CONFIG_PATH=/home/tmp,
  the cache directory will be /home/tmp/spatialkernels/ on Linux.

- Define environment OPENCV_OCL4DNN_ENABLE_AUTO_TUNING to enable
  auto-tuning.

- OPENCV_OPENCL_ENABLE_PROFILING is only used to enable profiling
  for the OpenCL command queue. This fixes the basic kernel reporting
  a wrong running time, i.e. 0ms.

- If creating cache directory failed, disable auto-tuning.

* Detect and create cache dir on windows

Signed-off-by: Li Peng <peng.li@intel.com>

* Refine gemm like convolution kernel.

Signed-off-by: Li Peng <peng.li@intel.com>

* Fix redundant swizzleWeights calling when use cached kernel config.

* Fix "out of resource" bug when auto-tuning too many kernels.

* replace cl_mem with UMat in ocl4dnnConvSpatial class

* OCL4DNN: reduce the tuning kernel candidate.

This patch could reduce 75% of the tuning candidates with less
than 2% performance impact for the final result.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>

* replace cl_mem with umat in ocl4dnn convolution

Signed-off-by: Li Peng <peng.li@intel.com>

* remove weight_image_ of ocl4dnn inner product

Actually it is unused in the computation

Signed-off-by: Li Peng <peng.li@intel.com>

* Various fixes for ocl4dnn

1. OCL_PERFORMANCE_CHECK(ocl::Device::getDefault().isIntel())
2. Ptr<OCL4DNNInnerProduct<float> > innerProductOp
3. Code comments cleanup
4. ignore check on OCL cpu device

Signed-off-by: Li Peng <peng.li@intel.com>

* add build option for log softmax

Signed-off-by: Li Peng <peng.li@intel.com>

* remove unused ocl kernels in ocl4dnn

Signed-off-by: Li Peng <peng.li@intel.com>

* replace ocl4dnnSet with opencv setTo

Signed-off-by: Li Peng <peng.li@intel.com>

* replace ALIGN with cv::alignSize

Signed-off-by: Li Peng <peng.li@intel.com>

* check kernel build options

Signed-off-by: Li Peng <peng.li@intel.com>

* Handle program compilation fail properly.

* Use std::numeric_limits<float>::infinity() for large float number

* check ocl4dnn kernel compilation result

Signed-off-by: Li Peng <peng.li@intel.com>

* remove unused ctx_id

Signed-off-by: Li Peng <peng.li@intel.com>

* change clEnqueueNDRangeKernel to kernel.run()

Signed-off-by: Li Peng <peng.li@intel.com>

* change cl_mem to UMat in image based gemm

Signed-off-by: Li Peng <peng.li@intel.com>

* check intel subgroup support for lrn and pooling layer

Signed-off-by: Li Peng <peng.li@intel.com>

* Fix convolution bug if group is greater than 1

Signed-off-by: Li Peng <peng.li@intel.com>

* Set default layer preferableTarget to be DNN_TARGET_CPU

Signed-off-by: Li Peng <peng.li@intel.com>

* Add ocl perf test for convolution

Signed-off-by: Li Peng <peng.li@intel.com>

* Add more ocl accuracy test

Signed-off-by: Li Peng <peng.li@intel.com>

* replace cl_image with ocl::Image2D

Signed-off-by: Li Peng <peng.li@intel.com>

* Fix build failure in elementwise layer

Signed-off-by: Li Peng <peng.li@intel.com>

* use getUMat() to get blob data

Signed-off-by: Li Peng <peng.li@intel.com>

* replace cl_mem handle with ocl::KernelArg

Signed-off-by: Li Peng <peng.li@intel.com>

* dnn(build): don't use C++11, OPENCL_LIBRARIES fix

* dnn(ocl4dnn): remove unused OpenCL kernels

* dnn(ocl4dnn): extract OpenCL code into .cl files

* dnn(ocl4dnn): refine auto-tuning

Auto-tuning is disabled by default; set the OPENCV_OCL4DNN_ENABLE_AUTO_TUNING
environment variable to enable it.

Use a set of pre-tuned configs as default config if auto-tuning is disabled.
These configs are tuned for Intel GPU with 48/72 EUs, and for googlenet,
AlexNet, ResNet-50

If default config is not suitable, use the first available kernel config
from the candidates. Candidate priority from high to low is gemm like kernel,
IDLF kernel, basic kernel.

* dnn(ocl4dnn): pooling doesn't use OpenCL subgroups

* dnn(ocl4dnn): fix perf test

OpenCV has default 3sec time limit for each performance test.
Warmup OpenCL backend outside of perf measurement loop.

* use ocl::KernelArg as much as possible

Signed-off-by: Li Peng <peng.li@intel.com>

* dnn(ocl4dnn): fix bias bug for gemm like kernel

* dnn(ocl4dnn): wrap cl_mem into UMat

Signed-off-by: Li Peng <peng.li@intel.com>

* dnn(ocl4dnn): Refine signature of kernel config

- Use more readable string as signature of kernel config
- Don't count device name and vendor in signature string
- Default kernel configurations are tuned for Intel GPU with
  24/48/72 EUs, and for googlenet, AlexNet, ResNet-50 net model.

* dnn(ocl4dnn): swap width/height in configuration

* dnn(ocl4dnn): enable configs for Intel OpenCL runtime only

* core: make configuration helper functions accessible from non-core modules

* dnn(ocl4dnn): update kernel auto-tuning behavior

Avoid unwanted creation of directories

* dnn(ocl4dnn): simplify kernel to workaround OpenCL compiler crash

* dnn(ocl4dnn): remove redundant code

* dnn(ocl4dnn): Add a clearer message for SIMD size mismatch.

* dnn(ocl4dnn): add const to const argument

Signed-off-by: Li Peng <peng.li@intel.com>

* dnn(ocl4dnn): force compiler use a specific SIMD size for IDLF kernel

* dnn(ocl4dnn): drop unused tuneLocalSize()

* dnn(ocl4dnn): specify OpenCL queue for Timer and convolve() method

* dnn(ocl4dnn): sanitize file names used for cache

* dnn(perf): enable Network tests with OpenCL

* dnn(ocl4dnn/conv): drop computeGlobalSize()

* dnn(ocl4dnn/conv): drop unused fields

* dnn(ocl4dnn/conv): simplify ctor

* dnn(ocl4dnn/conv): refactor kernelConfig localSize=NULL

* dnn(ocl4dnn/conv): drop unsupported double / untested half types

* dnn(ocl4dnn/conv): drop unused variable

* dnn(ocl4dnn/conv): alignSize/divUp

* dnn(ocl4dnn/conv): use enum values

* dnn(ocl4dnn): drop unused innerproduct variable

Signed-off-by: Li Peng <peng.li@intel.com>

* dnn(ocl4dnn): add a generic function to check cl option support

* dnn(ocl4dnn): run softmax subgroup version kernel first

Signed-off-by: Li Peng <peng.li@intel.com>
2017-10-02 15:38:00 +03:00
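Note on the auto-tuning variables named in the commit above: a minimal sketch of preparing an environment for a child process with ocl4dnn auto-tuning enabled. The helper names and the value "1" are assumptions (the commit message only says the variable must be defined); the variable names and the spatialkernels subdirectory come from the message itself.

```python
import os


def ocl4dnn_tuning_env(cache_dir, base_env=None):
    """Return a copy of the environment with ocl4dnn auto-tuning enabled.

    Hypothetical helper; the value "1" is an assumption.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OPENCV_OCL4DNN_ENABLE_AUTO_TUNING"] = "1"
    env["OPENCV_OCL4DNN_KERNEL_CONFIG_PATH"] = cache_dir
    return env


def expected_cache_dir(cache_dir):
    """Directory where tuned kernel configs are cached, per the commit text."""
    return os.path.join(cache_dir, "spatialkernels")


# Example (not executed here): pass the environment to a test run, e.g.
#   subprocess.run(["./opencv_test_dnn"], env=ocl4dnn_tuning_env("/home/tmp"))
```

Building a separate environment dictionary keeps the setting local to the launched process instead of changing the current one.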
Vadim Pisarevsky
f7df5dd32c Merge pull request #9305 from dkurt:public_dnn_importer_is_deprecated 2017-09-18 09:35:35 +00:00
Vadim Pisarevsky
3358b8910b Merge pull request #9591 from dkurt:feature_dnn_caffe_importer_fp16 2017-09-18 09:26:23 +00:00
Dmitry Kurtaev
bd8e6b7e14 Mark external cv::dnn::Importer usage as deprecated 2017-09-18 08:52:36 +03:00
Vadim Pisarevsky
4196543cd5 Merge pull request #9313 from dkurt:dnn_perf_test 2017-09-16 19:39:23 +00:00
Dmitry Kurtaev
8646d5fb84 FP16 Caffe models import and export 2017-09-15 18:06:34 +03:00
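Note on FP16 storage: a minimal sketch of what converting FP32 weights to half precision involves, using only Python's struct module (format code "e", IEEE 754 binary16). The function names are hypothetical and this does not reproduce the Caffe model converter from the commit; it only shows the halved storage size and the rounding error.

```python
import struct


def fp32_to_fp16_bytes(weights):
    """Pack floats as little-endian half-precision values (2 bytes each).

    Values outside the half-precision range raise OverflowError.
    """
    return struct.pack("<%de" % len(weights), *weights)


def fp16_bytes_to_fp32(blob):
    """Unpack half-precision values back into Python floats."""
    return list(struct.unpack("<%de" % (len(blob) // 2), blob))
```

Values such as 0.5 survive the round trip exactly, while 0.1 comes back slightly changed because half precision keeps only 10 fraction bits.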
Vadim Pisarevsky
93c3f20deb Merge pull request #9569 from dkurt:test_dnn_ssd_halide 2017-09-13 13:29:50 +00:00
Dmitry Kurtaev
cad7c4d51d MobileNet-SSD and VGG-SSD topologies in Halide 2017-09-08 09:55:53 +03:00
Alexander Alekhin
01519313d7 dnn: invalid bindings 2017-08-31 19:35:48 +03:00
Dmitry Kurtaev
5c43a394c5 Added performance test for Caffe framework 2017-08-27 19:40:58 +03:00
Aleksandr Rybnikov
8b1146deb2 Added function to get timings for layers 2017-08-23 13:40:05 +03:00
Vadim Pisarevsky
0488d9bdb2 optimize out scaleLayer & concatLayer whenever possible
fixed problem in concat layer by disabling memory re-use in layers with multiple inputs

trying to fix the tests when Halide is used to run deep nets

another attempt to fix Halide tests

see if the Halide tests will pass with concat layer fusion turned off

trying to fix failures in halide tests; another try

one more experiment to make halide_concat & halide_enet tests pass

continue attempts to fix halide tests

moving on

uncomment parallel concat layer

seemingly fixed failures in Halide tests and re-enabled concat layer fusion; thanks to dkurt for the patch
2017-07-14 18:30:53 +03:00
Alexander Alekhin
544908d06c dnn: some minor fixes in docs, indentation, unused code 2017-07-13 15:33:49 +03:00
abratchik
8f7181429f add java wrappers to dnn module 2017-07-02 11:46:20 +04:00
Alexander Alekhin
da0960321b dnn: added "hidden" experimental namespace
The main purpose of this namespace is to avoid the use of incompatible
binaries, which would cause application crashes.

This additional namespace will not impact the "Source code API".
This change allows ABI checks to be maintained (with easy filtering out).
2017-06-28 20:36:57 +00:00
Vadim Pisarevsky
2ae849091c Merge pull request #9009 from alalek:fix_dnn_initialization 2017-06-28 08:26:29 +00:00
Vadim Pisarevsky
8b3d6603d5 another round of dnn optimization (#9011)
* another round of dnn optimization:
* increased malloc alignment across OpenCV from 16 to 64 bytes to make it AVX2 and even AVX-512 friendly
* improved SIMD optimization of pooling layer, optimized average pooling
* cleaned up convolution layer implementation
* made activation layer "attachable" to all other layers, including fully connected and addition layer.
* fixed bug in the fusion algorithm: "LayerData::consumers" should not be cleared, because it describes the topology.
* greatly optimized permutation layer, which improved SSD performance
* parallelized element-wise binary/ternary/... ops (sum, prod, max)

* also, added missing copyrights to many of the layer implementation files

* temporarily disabled (again) the check for intermediate blobs consistency; fixed warnings from various builders
2017-06-28 11:15:22 +03:00
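Note on the alignment change in the commit above: a 64-byte aligned address is also 32-byte aligned, so it suits both AVX2 (32-byte vectors) and AVX-512 (64-byte vectors). The sketch below shows the usual round-up arithmetic for power-of-two alignments; `align_up` is a hypothetical name, not the OpenCV allocator.

```python
def align_up(addr, alignment):
    """Round addr up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Adding alignment-1 and clearing the low bits rounds upward.
    return (addr + alignment - 1) & ~(alignment - 1)
```

For example, address 100 rounds up to 128 with 64-byte alignment, whereas 16-byte alignment would give 112, which is not a multiple of 32.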
Alexander Alekhin
00dd433368 dnn: fix LayerFactory initialization 2017-06-27 23:19:53 +03:00
Alexander Alekhin
7f12836df9 dnn: fix public headers guards 2017-06-26 14:21:33 +03:00
Alexander Alekhin
93729784bb dnn: move module from opencv_contrib
e6f63c7a38/modules/dnn
2017-06-26 13:41:51 +03:00