add libdnn acceleration to dnn module (#9114)
* import libdnn code
Signed-off-by: Li Peng <peng.li@intel.com>
* add convolution layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add pooling layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add softmax layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add lrn layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add innerproduct layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add HAVE_OPENCL macro
Signed-off-by: Li Peng <peng.li@intel.com>
* fix for convolution ocl
Signed-off-by: Li Peng <peng.li@intel.com>
* enable getUMat() for multi-dimension Mat
Signed-off-by: Li Peng <peng.li@intel.com>
* use getUMat for ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* use CV_OCL_RUN macro
Signed-off-by: Li Peng <peng.li@intel.com>
* set OPENCL target when it is available
and disable fuseLayer for OCL target for the time being
Signed-off-by: Li Peng <peng.li@intel.com>
* fix innerproduct accuracy test
Signed-off-by: Li Peng <peng.li@intel.com>
* remove trailing space
Signed-off-by: Li Peng <peng.li@intel.com>
* Fixed tensorflow demo bug.
The root cause is that TensorFlow uses a different algorithm from libdnn
to calculate the convolution output dimension.
libdnn no longer calculates the output dimension and just uses the one
passed in by config.
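A minimal sketch of why the two calculations can disagree. It assumes libdnn used the Caffe-style formula (explicit symmetric padding, floor division) and compares it with TensorFlow's 'SAME' padding rule; the function names are hypothetical and are not part of either code base.

```python
import math


def caffe_style_output_dim(in_size, kernel, stride, pad, dilation=1):
    """Caffe-style output size: explicit symmetric padding, floor division.

    Assumption: this is the formula libdnn used before the fix.
    """
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + 2 * pad - effective_kernel) // stride + 1


def tf_same_output_dim(in_size, stride):
    """TensorFlow 'SAME' padding: output size depends only on input and stride."""
    return math.ceil(in_size / stride)


# Same layer parameters, different answers: input 5, kernel 2, stride 2.
caffe_dim = caffe_style_output_dim(5, kernel=2, stride=2, pad=0)
tf_dim = tf_same_output_dim(5, stride=2)
print(caffe_dim, tf_dim)
```

Taking the output dimension from the layer config, as the fix describes, avoids having to reproduce each framework's rule inside the kernel code.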
* split gemm ocl file
split it into gemm_buffer.cl and gemm_image.cl
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix compile failure
Signed-off-by: Li Peng <peng.li@intel.com>
* check env flag for auto tuning
Signed-off-by: Li Peng <peng.li@intel.com>
* switch to new ocl kernels for softmax layer
Signed-off-by: Li Peng <peng.li@intel.com>
* update softmax layer
On some platforms the subgroup extension may not work well;
fall back to non-subgroup ocl acceleration.
Signed-off-by: Li Peng <peng.li@intel.com>
* fallback to cpu path for fc layer with multi output
Signed-off-by: Li Peng <peng.li@intel.com>
* update output message
Signed-off-by: Li Peng <peng.li@intel.com>
* update fully connected layer
fall back to gemm API if libdnn returns false
Signed-off-by: Li Peng <peng.li@intel.com>
* Add ReLU OCL implementation
* disable layer fusion for now
Signed-off-by: Li Peng <peng.li@intel.com>
* Add OCL implementation for concat layer
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
* libdnn: update license and copyrights
Also refine libdnn coding style
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
* DNN: Don't link OpenCL library explicitly
* DNN: Make default preferableTarget to DNN_TARGET_CPU
Users should set it to DNN_TARGET_OPENCL explicitly if they want to
use OpenCL acceleration.
Also don't fuse layers when using DNN_TARGET_OPENCL
* DNN: refine coding style
* Add getOpenCLErrorString
* DNN: Use int32_t/uint32_t instead of alias
* Use namespace ocl4dnn to include libdnn things
* remove extra copyTo in softmax ocl path
Signed-off-by: Li Peng <peng.li@intel.com>
* update ReLU layer ocl path
Signed-off-by: Li Peng <peng.li@intel.com>
* Add prefer target property for layer class
It is used to indicate the target for layer forwarding,
either the default CPU target or OCL target.
Signed-off-by: Li Peng <peng.li@intel.com>
* Add cl_event based timer for cv::ocl
* Rename libdnn to ocl4dnn
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>
* use UMat for ocl4dnn internal buffer
Remove allocateMemory, which uses clCreateBuffer directly
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>
* enable buffer gemm in ocl4dnn innerproduct
Signed-off-by: Li Peng <peng.li@intel.com>
* replace int_tp globally for ocl4dnn kernels.
Signed-off-by: wzw <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
* create UMat for layer params
Signed-off-by: Li Peng <peng.li@intel.com>
* update sign ocl kernel
Signed-off-by: Li Peng <peng.li@intel.com>
* update image based gemm of inner product layer
Signed-off-by: Li Peng <peng.li@intel.com>
* remove buffer gemm of inner product layer
call cv::gemm API instead
Signed-off-by: Li Peng <peng.li@intel.com>
* change ocl4dnn forward parameter to UMat
Signed-off-by: Li Peng <peng.li@intel.com>
* Refine auto-tuning mechanism.
- Use OPENCV_OCL4DNN_KERNEL_CONFIG_PATH to set cache directory
for fine-tuned kernel configuration.
e.g. export OPENCV_OCL4DNN_KERNEL_CONFIG_PATH=/home/tmp,
the cache directory will be /home/tmp/spatialkernels/ on Linux.
- Define environment OPENCV_OCL4DNN_ENABLE_AUTO_TUNING to enable
auto-tuning.
- OPENCV_OPENCL_ENABLE_PROFILING is only used to enable profiling
for OpenCL command queue. This fixes the basic kernel reporting a wrong
running time, i.e. 0ms.
- If creating the cache directory fails, disable auto-tuning.
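The environment-variable handling described above can be sketched as follows. This is an illustration only, not the ocl4dnn implementation: the function name is hypothetical, and returning None when no config path is set is an assumption the commit message does not spell out.

```python
import os
import tempfile


def resolve_tuning_cache_dir(env=None):
    """Return the auto-tuning cache directory, or None if tuning is disabled.

    Hypothetical helper; the variable names come from the commit message.
    """
    env = os.environ if env is None else env
    # Auto-tuning is opt-in: the variable only has to be defined.
    if "OPENCV_OCL4DNN_ENABLE_AUTO_TUNING" not in env:
        return None
    base = env.get("OPENCV_OCL4DNN_KERNEL_CONFIG_PATH")
    if not base:
        return None  # assumption: no config path means no cache, so no tuning
    cache_dir = os.path.join(base, "spatialkernels")
    try:
        os.makedirs(cache_dir, exist_ok=True)
    except OSError:
        return None  # creating the cache directory failed: disable auto-tuning
    return cache_dir


# Demo against a throwaway directory instead of the real environment.
with tempfile.TemporaryDirectory() as tmp:
    demo_env = {
        "OPENCV_OCL4DNN_ENABLE_AUTO_TUNING": "1",
        "OPENCV_OCL4DNN_KERNEL_CONFIG_PATH": tmp,
    }
    demo_dir = resolve_tuning_cache_dir(demo_env)
    demo_exists = os.path.isdir(demo_dir)
```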
* Detect and create cache dir on windows
Signed-off-by: Li Peng <peng.li@intel.com>
* Refine gemm like convolution kernel.
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix redundant swizzleWeights call when using cached kernel config.
* Fix "out of resource" bug when auto-tuning too many kernels.
* replace cl_mem with UMat in ocl4dnnConvSpatial class
* OCL4DNN: reduce the tuning kernel candidates.
This patch could reduce 75% of the tuning candidates with less
than 2% performance impact for the final result.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* replace cl_mem with umat in ocl4dnn convolution
Signed-off-by: Li Peng <peng.li@intel.com>
* remove weight_image_ of ocl4dnn inner product
Actually it is unused in the computation
Signed-off-by: Li Peng <peng.li@intel.com>
* Various fixes for ocl4dnn
1. OCL_PERFORMANCE_CHECK(ocl::Device::getDefault().isIntel())
2. Ptr<OCL4DNNInnerProduct<float> > innerProductOp
3. Code comments cleanup
4. ignore check on OCL cpu device
Signed-off-by: Li Peng <peng.li@intel.com>
* add build option for log softmax
Signed-off-by: Li Peng <peng.li@intel.com>
* remove unused ocl kernels in ocl4dnn
Signed-off-by: Li Peng <peng.li@intel.com>
* replace ocl4dnnSet with opencv setTo
Signed-off-by: Li Peng <peng.li@intel.com>
* replace ALIGN with cv::alignSize
Signed-off-by: Li Peng <peng.li@intel.com>
* check kernel build options
Signed-off-by: Li Peng <peng.li@intel.com>
* Handle program compilation fail properly.
* Use std::numeric_limits<float>::infinity() for large float number
* check ocl4dnn kernel compilation result
Signed-off-by: Li Peng <peng.li@intel.com>
* remove unused ctx_id
Signed-off-by: Li Peng <peng.li@intel.com>
* change clEnqueueNDRangeKernel to kernel.run()
Signed-off-by: Li Peng <peng.li@intel.com>
* change cl_mem to UMat in image based gemm
Signed-off-by: Li Peng <peng.li@intel.com>
* check intel subgroup support for lrn and pooling layer
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix convolution bug if group is greater than 1
Signed-off-by: Li Peng <peng.li@intel.com>
* Set default layer preferableTarget to be DNN_TARGET_CPU
Signed-off-by: Li Peng <peng.li@intel.com>
* Add ocl perf test for convolution
Signed-off-by: Li Peng <peng.li@intel.com>
* Add more ocl accuracy test
Signed-off-by: Li Peng <peng.li@intel.com>
* replace cl_image with ocl::Image2D
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix build failure in elementwise layer
Signed-off-by: Li Peng <peng.li@intel.com>
* use getUMat() to get blob data
Signed-off-by: Li Peng <peng.li@intel.com>
* replace cl_mem handle with ocl::KernelArg
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(build): don't use C++11, OPENCL_LIBRARIES fix
* dnn(ocl4dnn): remove unused OpenCL kernels
* dnn(ocl4dnn): extract OpenCL code into .cl files
* dnn(ocl4dnn): refine auto-tuning
Disable auto-tuning by default; set the OPENCV_OCL4DNN_ENABLE_AUTO_TUNING
environment variable to enable it.
Use a set of pre-tuned configs as the default config if auto-tuning is disabled.
These configs are tuned for Intel GPU with 48/72 EUs, and for googlenet,
AlexNet, ResNet-50.
If the default config is not suitable, use the first available kernel config
from the candidates. Candidate priority from high to low is gemm like kernel,
IDLF kernel, basic kernel.
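The fallback order described above could look roughly like this. It is a sketch under stated assumptions: every name here (KERNEL_PRIORITY, select_kernel_config, the dictionary layout, the is_usable callback) is hypothetical and chosen only to illustrate the priority rule.

```python
# Priority from high to low, as stated in the commit message.
KERNEL_PRIORITY = ("gemm_like", "idlf", "basic")


def select_kernel_config(signature, pretuned, candidates, is_usable):
    """Pick a kernel config when auto-tuning is disabled.

    signature:  key identifying the convolution (hypothetical format)
    pretuned:   dict mapping signature -> pre-tuned default config
    candidates: dict mapping kernel kind -> list of candidate configs
    is_usable:  callback reporting whether a config works on this device
    """
    default = pretuned.get(signature)
    if default is not None and is_usable(default):
        return default
    # Default missing or unsuitable: take the first usable candidate by priority.
    for kind in KERNEL_PRIORITY:
        for config in candidates.get(kind, []):
            if is_usable(config):
                return config
    return None


candidates = {"basic": ["basic_1"], "idlf": ["idlf_8", "idlf_16"], "gemm_like": ["gemm_32"]}
usable = {"idlf_16", "basic_1"}
chosen = select_kernel_config("conv3x3", {}, candidates, lambda c: c in usable)
print(chosen)
```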
* dnn(ocl4dnn): pooling doesn't use OpenCL subgroups
* dnn(ocl4dnn): fix perf test
OpenCV has default 3sec time limit for each performance test.
Warmup OpenCL backend outside of perf measurement loop.
* use ocl::KernelArg as much as possible
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): fix bias bug for gemm like kernel
* dnn(ocl4dnn): wrap cl_mem into UMat
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): Refine signature of kernel config
- Use more readable string as signature of kernel config
- Don't count device name and vendor in signature string
- Default kernel configurations are tuned for Intel GPU with
24/48/72 EUs, and for googlenet, AlexNet, ResNet-50 net model.
* dnn(ocl4dnn): swap width/height in configuration
* dnn(ocl4dnn): enable configs for Intel OpenCL runtime only
* core: make configuration helper functions accessible from non-core modules
* dnn(ocl4dnn): update kernel auto-tuning behavior
Avoid unwanted creation of directories
* dnn(ocl4dnn): simplify kernel to workaround OpenCL compiler crash
* dnn(ocl4dnn): remove redundant code
* dnn(ocl4dnn): Add clearer message for simd size mismatch.
* dnn(ocl4dnn): add const to const argument
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): force compiler to use a specific SIMD size for IDLF kernel
* dnn(ocl4dnn): drop unused tuneLocalSize()
* dnn(ocl4dnn): specify OpenCL queue for Timer and convolve() method
* dnn(ocl4dnn): sanitize file names used for cache
* dnn(perf): enable Network tests with OpenCL
* dnn(ocl4dnn/conv): drop computeGlobalSize()
* dnn(ocl4dnn/conv): drop unused fields
* dnn(ocl4dnn/conv): simplify ctor
* dnn(ocl4dnn/conv): refactor kernelConfig localSize=NULL
* dnn(ocl4dnn/conv): drop unsupported double / untested half types
* dnn(ocl4dnn/conv): drop unused variable
* dnn(ocl4dnn/conv): alignSize/divUp
* dnn(ocl4dnn/conv): use enum values
* dnn(ocl4dnn): drop unused innerproduct variable
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): add a generic function to check cl option support
* dnn(ocl4dnn): run softmax subgroup version kernel first
Signed-off-by: Li Peng <peng.li@intel.com>
GSoC 2017: Improve and Extend the JavaScript Bindings for OpenCV (#9466)
* Initial support for build with emscripten
mkdir build_js
cd build_js
cmake -D CMAKE_TOOLCHAIN_FILE=/path/to/emsdk/emsdk-portable/emscripten/master/cmake/Modules/Platform/Emscripten.cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local ..
* Add js module
The output is build/bin/opencv_js.js
* Fix opencv2/calib3d.hpp not found issue
* Add module name
Usage:
var cv = cv();
* Add total memory as 128MB and allow growth
* Add compilation flags for emscripten
* Use EMSCRIPTEN build target
* Disable js module for non emscripten build
* Bind the preload file path to root
Usage:
face_cascade.load('haarcascade_frontalface_default.xml');
* add test folder
* fix test files
* Copy js module test to bin
* Support to run tests on Node.js
Fix tests to import cv Module when runtime is node.
Add tests.js to use qunit to auto run tests.
Modify umd wrapper to support the case where Module is not defined.
Usage:
node tests.js
* Support UMD and file system
Wrap the opencv_js.js to opencv.js by UMD wrapper
Use emscripten file system API to load files instead of generating data file or
embedding them. It supports both browser and node.js usages.
* Fix incorrect module name in tests
* Add package.json to add dependency on qunit
* Add js_tutorials folder and an intro page of opencv.js
Enable BUILD_DOCS in CMakeLists.txt.
Add new folder of js_tutorials in folder opencv/doc.
Imitate the tutorials of OpenCV-Python to create an intro page of opencv.js and a setup guide
* Import and use binding gen from opencvjs project
* Modify the embindgen.py to pass the build and test
* Add classes and functions white list
* Consolidate hdr_parser.py (#31)
Use hdr_parser.py of python module
Add js flag to support js binding generator.
* Use emscripten::vecFromJSArray for input vector param
Fix part of #23
* Fix test cases after #34
Fix#39
* Expose groupRectangles and CascadeClassifier.empty
* Add js highgui tutorials
add tutorials of imread&imshow and createTrackbar in doc/js_tutorials/js_gui folder
add interactive tutorial webpage for imread&imshow and createTrackbar in doc/js_tutorials/js_interactive_tutorials folder, and some images needed.
change doc/CMakeLists.txt to copy the interactive tutorial webpage and opencv.js to the tutorials' destination folder
* rm useless annotation in doc/CMakeLists.txt
* fix some nonstandard indentation and space
* add check if canvas is valid
* Expose BackgroundSubtractorMOG2
Fix#43
* Fix build of js doc
Limit copy_js_interactive_tutorials for doxygen build
Add dep to opencv.js
Fix#53
* Implement cv.imread & cv.imshow and insert interactive pages in tutorials (#55)
* add helper.js
* delete ALL in add target copy_js_interactive_tutorials to avoid dependence error
* Insert interactive pages in tutorials
insert the old interactive pages in markdown by using \htmlonly and \endhtmlonly command.
delete the useless interactive page
rename js_interactive_tutorials to js_assets, which holds the images the tutorials need
* fix the depends of the target doxygen
add opencv.js to depends and delete the useless target of copy_js_assets
* change filename helper.js to helpers.js
* disable button or trackbar before opencv.js is ready
* Expose CV_64F
Fix#65
* improve cv.imshow to display different types as native imshow
* add utils.js to reuse functions and update tutorials
* Make doxygen depend on bin/opencv.js
* Fix memory issue of matFromArray
Fix#37
* Merge pull request from ganwenyao/tutorial_18
* Add notes for ganwenyao/tutorial_18
* Modifying for ganwenyao/tutorial_18
* Change Mat constructor with data to 5 parameters
* Mat supports constructor with Scalar
Fix#60
* update cv.imread because the memory issue of matFromArray has been fixed
* fix canvas name and default input image
* Expose cv::Moments
Fix#85
* Add -Wno-missing-prototypes for emscripten build
* fix canvas name
* add tutorial of video input and output
* Expose enums as emscripten consts
Fix#72
* update the tutorial to use Mat constructor with Scalar and change lena.jpg
* Exclude cv::Mat for vecFromJSArray
Fix#82
* Add unit tests for cv.moments
* Fix the unit tests.
* add checkbox and stop button
* add adapter.js to ensure compatibility of video tutorials
* Support default parameters with function overloading
* modify enums to constants
* Use https URL for MathJax.js
Fix#109
* Comment out the debug print in embindgen.py
* Expose RotatedRect
Fix#96
* replace enum with constants and improve onload function
* delete some useless params because #105 fixed this
* Modify const name
* Modify Contour Properties
* tutorials for imgprc2 and objdec
* Expose more functions for img proc tutorials
Fix#76
* Expose polylines for video analysis tutorial
Fix#121
* Expose constants for default parameters of img proc tutorials
Fix#122
* Fix wrong parameter types of Mat.copyTo
Fix#87
* Support default parameters of mat.convertTo
Fix#123
* Support default parameters for external constructors
Fix#131
* Revert "Expose polylines for video analysis tutorial"
This reverts commit 3ce7615652e510d30e3c0014706ac38c98883189.
Fix#121
* Support cv.minMaxLoc
Fix#127
* Expose cv.minEnclosingCircle
Fix#126
* Add video analysis tutorials
add three video tutorials: Meanshift and Camshift, Optical Flow, and Background Subtraction
add cup.mp4 and box.mp4 for demo in tutorials
* improve image processing tutorials
* replace console.warn with throw to throw exception
* add try-catch to throw exception in code demo
* Change mat.size() return value to JS Array object
Fix#140
* add a note about different channels order between canvas and native opencv
* add a note about how to capture video from video files
* Binding cv.Scalar to JS array
Fix#147
* Add JS cv.Scalar object into helpers.js
* Update Install OpenCV-JavaScript tutorial page
Fix#44
* Update the OpenCV-JavaScript introduction page
Fix#44
* add cv.VideoCapture and read() function
* set the size of the hidden canvas same as the video
* Add Using OpenCV-JavaScript tutorial page
Fix#44
* fix some bad code style
* Update tutorials after 8/2 sync meeting
Changes include:
- Use OpenCV.js name instead of OpenCV-JavaScript
- Put using OpenCV.js ahead of build OpenCV.js
- Refine usage and introduction page
- Muted the video in tutorials
* Fix a typo in introduction page
* use cv.VideoCapture and its read() function to read video
* replace OpenCV-JavaScript with OpenCV.js
* Use onload of async script in js_usage tutorial
* add more info about mat.data
* Change Size to value_object
* Integrate Moh and Sajjad's editing into introduction page
* Change Point to value_object
* Change Rect to value_object with helper object
* Add helper objects for Point and Size
* Change RotatedRect to value_object with helpers
* Change MinMaxLoc and Circle to value_object
* Change TermCriteria to value_object
* Fix core_bindings.cpp for MinMaxLoc and Circle
* Remove unused types
* Change meanShift and CamShift to return Rect
* Change methods of RotatedRect to static
* Change mat.data from methods to property
Fix#75 and #77
* support img id and element in cv.imread
* Change mat.size to property and add mat.step
Fix#163
* Add matFromArray and matFromImageData as JS helpers
Fix#79, #78
* Lower camel case for Mat element getters
Fix#81
* Mat.getRoiRect and tests
Fix#86
* Support type for Mat.ptr
Fix#83
* Name changing of Mat element getters
'getUcharAt' -> 'ucharAt'
* fix code style and args names
* Fix helpers.js due to cv.Mat API update
* Fix opencv.js usage tutorial
* Fix a typo of js_setup
* Change Moments to value_object
* Add Range as value_object
Fix#171
* Support Mat.diag and Mat.isContinuous
Fix#84 and #89
* Support Mat.setTo
Fix#88
* Apply edits to js_intro
* Apply edits to js_usage
* Apply edits to js_setup
* update tutorials to apply data type change
* Modify tutorials
* add core tutorials
* delete MatVector elements and delete useless set operation
* add tutorials_objdec_camera
* Add instructions for WebAssembly
* apply tech writer's feedback into tutorials
* Organize white list by modules
* Change size to method and bind to MatExpr.size()
Fix#177
* improve tutorials
* Modify core tutorials
* add params list and explanations for OpenCV.js functions
* remove face_profile from Face Detection in Video Capture
* Add demos link
* Change Gui to GUI
* Update js_intro based on Moh and Sajjad's edits
* Fixup for 3.3.0 rebase
* Update js_intro per Moh's suggestion
* Update contributors list per Moh's idea
* add adapter.js in video_display tutorial
* Change Mat.getRoiRect to Mat.roi
Fix#194
* Remove unnecessary files for test
Fix#192
* Licenses updated to UC BSD 3-Clause
* Apply OpenCV coding style for C++ files
* Add OpenCV license for python and js files
* Fix coding style issue in helpers.js
* Remove unused test_commons.js
* Fix coding style of test_imgproc.js
* Fix coding style of test_mat.js
* Fix space before semicolon
* Fix coding style of test_objdetect.js
* Fix coding style of tests.js
* Fix coding style of test_utils.js
* Fix coding style of test_video.js
* Fix failures of node.js tests
* Add eslint rule config and fix eslint errors
* Add eslint config for js/src and fix eslint errors
* Clean up the opencv.js dependencies
Fix#186
* Fix build issue for python generator
* Fix doxygen buildbot failure
* delete trailing whitespace, blank line at EOF and replace tab with space
* Fix tutorial_js_root reference issue for non opencv.js build
* replace the file with small size
* Initial commit of build_js.py
* Move the js build configurations to build script
* Add wasm build support
* Update OpenCV.js build tutorial by using script
* Fix global var issue in tests
* Add a README.md for build_js.py
* Copy the haar cascade files from data dir for tutorials
* Do not use memory init file
* Disable debug print for modules/js/CMakeLists.txt
* Check files when build done
* Fix image name in js_gradients tutorial
* Fix image load issue in js_trackbar tutorial
* Find the opencv source directory via relative path by default
* Make the cmake args based on build_doc option
* Fix a typo in js_setup.markdown
* Fix make failure issue on config generated by build_js.py
* Eliminate js branch of hdr_parser.py
* Extract examples from js_basic_ops tutorial
* Fix coding style of utils.js
* Improve examples error handling
Handle:
1. opencv.js loading errors
2. script errors (Error)
3. cv::Exception
Fix#217
* Add enable_exception option into build_js.py
* Support print exception for exception catching disabled build
* Extract example from js_usage tutorial
* Avoid copying .eslintrc.json when building doc
Fix#223
* Revert to use onload as opencv.js ready event
* Use 4-space indentation for js examples
* embed html in tutorials with iframe tag
* Revert to use onload as opencv.js ready event
* Extract examples from js_video_display tutorial
* Implement Utils object
* modify core imgprc and face_detection tutorials
* Fix examples of js_gui tutorials
* Fix coding style of utils.js
* Modify tutorials
* Extract example from js_face_detection_camera tutorial
* Disable new-cap check in eslint
* Extract examples from js_meanshift tutorial
* Extract examples from video tutorials
* Remove new-cap declaration and update grammar in comments
* Change textarea width to 100 to align with eslint config
* Fix printError issue when opencv.js loading fails
* Remove BUILD_opencv_js dependency for doc build
Fix#213
* Expose cv::getBuildInformation
* Dump opencv build info when opencv.js loaded for live examples
* Make the button to stand out in js live examples
Fix#235
* Style for disabled button
* Add js_imgproc_camera.html example
* Fix coding style of imgproc_camera example
* Add js_imgproc_camera tutorial
* Remove link to opencv.js demos
* doc: copy opencv.js on build, use absolute paths for assets
* doc: reuse existing file box.mp4
Added forkfour LaTeX command to MathJax support.
Split cv::norm documentation between the cv::norm and its overload, to make things clearer
Corrected some typos and cleaned up grammar.
Result is clearer documentation for the norms.
Work pending...
This adds the possibility to use multi-channel masks for the functions
cv::mean, cv::meanStdDev and the method Mat::setTo. The tests now have a
probability of using multi-channel masks for operations that support them.
This also includes Mat::copyTo, which supported multi-channel masks
before, but there was no test confirming this.
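To illustrate what a multi-channel mask means for a mean computation, here is a pure-Python sketch. It assumes that each mask channel gates the matching image channel and that a single-channel mask applies to all channels; the helper name and data layout are hypothetical and do not reflect the cv::mean implementation.

```python
def masked_mean(pixels, mask):
    """Per-channel mean over the elements selected by the mask.

    pixels: list of tuples, one value per channel
    mask:   list of tuples; either one value per pixel (single-channel mask)
            or one value per channel (multi-channel mask)
    """
    channels = len(pixels[0])
    sums = [0.0] * channels
    counts = [0] * channels
    for px, m in zip(pixels, mask):
        for c in range(channels):
            # A single-channel mask gates every channel of the pixel.
            gate = m[c] if len(m) > 1 else m[0]
            if gate:
                sums[c] += px[c]
                counts[c] += 1
    return [s / n if n else 0.0 for s, n in zip(sums, counts)]


pixels = [(10, 100), (20, 200), (30, 300)]
multi = masked_mean(pixels, [(1, 0), (1, 1), (0, 1)])
single = masked_mean(pixels, [(1,), (0,), (1,)])
print(multi, single)
```

With a multi-channel mask each channel can average over a different set of pixels, which is the case the new tests are meant to exercise.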
CUDA implementation wants to convert std::vector<KeyPoint> <-> GpuMat.
There is no direct mapping from KeyPoint (mix of int/float fields)
into cv::Mat element type, so this conversion must be avoided.
Legacy mode is turned back on for CUDA builds.