The same code was repeated several times for different data types, so
it was extracted into a templated function to improve maintainability
and make the code clearer.
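
Illustrative sketch only, not the code touched by this commit: several per-type copies of a routine can collapse into one function template. The name `sumElements` and its signature are hypothetical.

```cpp
#include <cstddef>

// Hypothetical example: one template replaces per-type copies that
// differed only in the element type and the accumulator type.
template <typename T, typename AccT>
AccT sumElements(const T* data, std::size_t count)
{
    AccT acc = AccT(); // value-initialized accumulator (zero for arithmetic types)
    for (std::size_t i = 0; i < count; ++i)
        acc += static_cast<AccT>(data[i]);
    return acc;
}
```

A call such as `sumElements<unsigned char, int>(ptr, n)` then replaces the former 8-bit copy, and `sumElements<float, double>(ptr, n)` replaces the floating-point one.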
An exception may be raised inside the body of a copy constructor after
the refcount has been increased. Because the destructor is never called
for an object whose constructor throws, this causes a memory leak. This
commit adds a workaround that calls the release() function before the
exception propagates out of the constructor.
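
A minimal sketch of the pattern, assuming a hypothetical reference-counted `Handle` class (the names `SharedBlock`, `Handle`, `finishCopy` and the `failOnCopy` hook are invented for illustration and are not the classes changed by this commit):

```cpp
#include <stdexcept>

struct SharedBlock
{
    int refcount = 1;         // one reference held by the creator
    bool failOnCopy = false;  // illustration hook: makes the copy step throw
};

class Handle
{
public:
    explicit Handle(SharedBlock* b) : block_(b) {} // adopts the initial reference

    Handle(const Handle& other) : block_(other.block_)
    {
        if (block_)
            ++block_->refcount;   // reference is taken here
        try
        {
            finishCopy();         // may throw after the increment
        }
        catch (...)
        {
            // The destructor does not run for a partially constructed object,
            // so the reference is dropped by hand before rethrowing.
            release();
            throw;
        }
    }

    Handle& operator=(const Handle&) = delete; // omitted to keep the sketch short

    ~Handle() { release(); }

    void release()
    {
        if (block_ && --block_->refcount == 0)
            delete block_;
        block_ = nullptr;
    }

private:
    void finishCopy() const
    {
        if (block_ && block_->failOnCopy)
            throw std::runtime_error("copy failed");
    }

    SharedBlock* block_;
};
```

Without the `catch` block, a throwing copy would leave the refcount one too high and the shared block would never be freed.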
Adds fitEllipseDirect to imgproc: The Direct least square (Direct) method by Fitzgibbon1999.
New Tests are included for the methods.
fitEllipseAMS Tests
fitEllipseDirect Tests
Comparative examples are added to fitEllipse.cpp in Samples.
add libdnn acceleration to dnn module (#9114)
* import libdnn code
Signed-off-by: Li Peng <peng.li@intel.com>
* add convolution layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add pooling layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add softmax layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add lrn layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add innerproduct layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add HAVE_OPENCL macro
Signed-off-by: Li Peng <peng.li@intel.com>
* fix for convolution ocl
Signed-off-by: Li Peng <peng.li@intel.com>
* enable getUMat() for multi-dimension Mat
Signed-off-by: Li Peng <peng.li@intel.com>
* use getUMat for ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* use CV_OCL_RUN macro
Signed-off-by: Li Peng <peng.li@intel.com>
* set OPENCL target when it is available
and disable fuseLayer for OCL target for the time being
Signed-off-by: Li Peng <peng.li@intel.com>
* fix innerproduct accuracy test
Signed-off-by: Li Peng <peng.li@intel.com>
* remove trailing space
Signed-off-by: Li Peng <peng.li@intel.com>
* Fixed tensorflow demo bug.
Root cause is that tensorflow uses a different algorithm from libdnn
to calculate the convolution output dimension.
libdnn doesn't calculate the output dimension anymore and just uses the one
passed in by config.
* split gemm ocl file
split it into gemm_buffer.cl and gemm_image.cl
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix compile failure
Signed-off-by: Li Peng <peng.li@intel.com>
* check env flag for auto tuning
Signed-off-by: Li Peng <peng.li@intel.com>
* switch to new ocl kernels for softmax layer
Signed-off-by: Li Peng <peng.li@intel.com>
* update softmax layer
on some platforms the subgroup extension may not work well,
fall back to non-subgroup ocl acceleration.
Signed-off-by: Li Peng <peng.li@intel.com>
* fallback to cpu path for fc layer with multi output
Signed-off-by: Li Peng <peng.li@intel.com>
* update output message
Signed-off-by: Li Peng <peng.li@intel.com>
* update fully connected layer
fallback to gemm API if libdnn returns false
Signed-off-by: Li Peng <peng.li@intel.com>
* Add ReLU OCL implementation
* disable layer fusion for now
Signed-off-by: Li Peng <peng.li@intel.com>
* Add OCL implementation for concat layer
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
* libdnn: update license and copyrights
Also refine libdnn coding style
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
* DNN: Don't link OpenCL library explicitly
* DNN: Make default preferableTarget to DNN_TARGET_CPU
Users should set it to DNN_TARGET_OPENCL explicitly if they want to
use OpenCL acceleration.
Also don't fuse layers when using DNN_TARGET_OPENCL
* DNN: refine coding style
* Add getOpenCLErrorString
* DNN: Use int32_t/uint32_t instead of alias
* Use namespace ocl4dnn to include libdnn things
* remove extra copyTo in softmax ocl path
Signed-off-by: Li Peng <peng.li@intel.com>
* update ReLU layer ocl path
Signed-off-by: Li Peng <peng.li@intel.com>
* Add prefer target property for layer class
It is used to indicate the target for layer forwarding,
either the default CPU target or OCL target.
Signed-off-by: Li Peng <peng.li@intel.com>
* Add cl_event based timer for cv::ocl
* Rename libdnn to ocl4dnn
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>
* use UMat for ocl4dnn internal buffer
Remove allocateMemory which uses clCreateBuffer directly
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>
* enable buffer gemm in ocl4dnn innerproduct
Signed-off-by: Li Peng <peng.li@intel.com>
* replace int_tp globally for ocl4dnn kernels.
Signed-off-by: wzw <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
* create UMat for layer params
Signed-off-by: Li Peng <peng.li@intel.com>
* update sign ocl kernel
Signed-off-by: Li Peng <peng.li@intel.com>
* update image based gemm of inner product layer
Signed-off-by: Li Peng <peng.li@intel.com>
* remove buffer gemm of inner product layer
call cv::gemm API instead
Signed-off-by: Li Peng <peng.li@intel.com>
* change ocl4dnn forward parameter to UMat
Signed-off-by: Li Peng <peng.li@intel.com>
* Refine auto-tuning mechanism.
- Use OPENCV_OCL4DNN_KERNEL_CONFIG_PATH to set cache directory
for fine-tuned kernel configuration.
e.g. export OPENCV_OCL4DNN_KERNEL_CONFIG_PATH=/home/tmp,
the cache directory will be /home/tmp/spatialkernels/ on Linux.
- Define environment OPENCV_OCL4DNN_ENABLE_AUTO_TUNING to enable
auto-tuning.
- OPENCV_OPENCL_ENABLE_PROFILING is only used to enable profiling
for OpenCL command queue. This fixes the basic kernel getting a wrong
running time, i.e. 0ms.
- If creating cache directory failed, disable auto-tuning.
* Detect and create cache dir on windows
Signed-off-by: Li Peng <peng.li@intel.com>
* Refine gemm like convolution kernel.
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix redundant swizzleWeights calling when using cached kernel config.
* Fix "out of resource" bug when auto-tuning too many kernels.
* replace cl_mem with UMat in ocl4dnnConvSpatial class
* OCL4DNN: reduce the tuning kernel candidate.
This patch could reduce 75% of the tuning candidates with less
than 2% performance impact for the final result.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* replace cl_mem with umat in ocl4dnn convolution
Signed-off-by: Li Peng <peng.li@intel.com>
* remove weight_image_ of ocl4dnn inner product
Actually it is unused in the computation
Signed-off-by: Li Peng <peng.li@intel.com>
* Various fixes for ocl4dnn
1. OCL_PERFORMANCE_CHECK(ocl::Device::getDefault().isIntel())
2. Ptr<OCL4DNNInnerProduct<float> > innerProductOp
3. Code comments cleanup
4. ignore check on OCL cpu device
Signed-off-by: Li Peng <peng.li@intel.com>
* add build option for log softmax
Signed-off-by: Li Peng <peng.li@intel.com>
* remove unused ocl kernels in ocl4dnn
Signed-off-by: Li Peng <peng.li@intel.com>
* replace ocl4dnnSet with opencv setTo
Signed-off-by: Li Peng <peng.li@intel.com>
* replace ALIGN with cv::alignSize
Signed-off-by: Li Peng <peng.li@intel.com>
* check kernel build options
Signed-off-by: Li Peng <peng.li@intel.com>
* Handle program compilation fail properly.
* Use std::numeric_limits<float>::infinity() for large float number
* check ocl4dnn kernel compilation result
Signed-off-by: Li Peng <peng.li@intel.com>
* remove unused ctx_id
Signed-off-by: Li Peng <peng.li@intel.com>
* change clEnqueueNDRangeKernel to kernel.run()
Signed-off-by: Li Peng <peng.li@intel.com>
* change cl_mem to UMat in image based gemm
Signed-off-by: Li Peng <peng.li@intel.com>
* check intel subgroup support for lrn and pooling layer
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix convolution bug if group is greater than 1
Signed-off-by: Li Peng <peng.li@intel.com>
* Set default layer preferableTarget to be DNN_TARGET_CPU
Signed-off-by: Li Peng <peng.li@intel.com>
* Add ocl perf test for convolution
Signed-off-by: Li Peng <peng.li@intel.com>
* Add more ocl accuracy test
Signed-off-by: Li Peng <peng.li@intel.com>
* replace cl_image with ocl::Image2D
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix build failure in elementwise layer
Signed-off-by: Li Peng <peng.li@intel.com>
* use getUMat() to get blob data
Signed-off-by: Li Peng <peng.li@intel.com>
* replace cl_mem handle with ocl::KernelArg
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(build): don't use C++11, OPENCL_LIBRARIES fix
* dnn(ocl4dnn): remove unused OpenCL kernels
* dnn(ocl4dnn): extract OpenCL code into .cl files
* dnn(ocl4dnn): refine auto-tuning
Disable auto-tuning by default, set OPENCV_OCL4DNN_ENABLE_AUTO_TUNING
environment variable to enable it.
Use a set of pre-tuned configs as default config if auto-tuning is disabled.
These configs are tuned for Intel GPU with 48/72 EUs, and for googlenet,
AlexNet, ResNet-50
If default config is not suitable, use the first available kernel config
from the candidates. Candidate priority from high to low is gemm like kernel,
IDLF kernel, basic kernel.
* dnn(ocl4dnn): pooling doesn't use OpenCL subgroups
* dnn(ocl4dnn): fix perf test
OpenCV has default 3sec time limit for each performance test.
Warmup OpenCL backend outside of perf measurement loop.
* use ocl::KernelArg as much as possible
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): fix bias bug for gemm like kernel
* dnn(ocl4dnn): wrap cl_mem into UMat
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): Refine signature of kernel config
- Use more readable string as signature of kernel config
- Don't count device name and vendor in signature string
- Default kernel configurations are tuned for Intel GPU with
24/48/72 EUs, and for googlenet, AlexNet, ResNet-50 net model.
* dnn(ocl4dnn): swap width/height in configuration
* dnn(ocl4dnn): enable configs for Intel OpenCL runtime only
* core: make configuration helper functions accessible from non-core modules
* dnn(ocl4dnn): update kernel auto-tuning behavior
Avoid unwanted creation of directories
* dnn(ocl4dnn): simplify kernel to workaround OpenCL compiler crash
* dnn(ocl4dnn): remove redundant code
* dnn(ocl4dnn): Add clearer message for simd size mismatch.
* dnn(ocl4dnn): add const to const argument
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): force compiler to use a specific SIMD size for IDLF kernel
* dnn(ocl4dnn): drop unused tuneLocalSize()
* dnn(ocl4dnn): specify OpenCL queue for Timer and convolve() method
* dnn(ocl4dnn): sanitize file names used for cache
* dnn(perf): enable Network tests with OpenCL
* dnn(ocl4dnn/conv): drop computeGlobalSize()
* dnn(ocl4dnn/conv): drop unused fields
* dnn(ocl4dnn/conv): simplify ctor
* dnn(ocl4dnn/conv): refactor kernelConfig localSize=NULL
* dnn(ocl4dnn/conv): drop unsupported double / untested half types
* dnn(ocl4dnn/conv): drop unused variable
* dnn(ocl4dnn/conv): alignSize/divUp
* dnn(ocl4dnn/conv): use enum values
* dnn(ocl4dnn): drop unused innerproduct variable
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): add a generic function to check cl option support
* dnn(ocl4dnn): run softmax subgroup version kernel first
Signed-off-by: Li Peng <peng.li@intel.com>
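
Illustrative sketch of the auto-tuning notes in the ocl4dnn commit above: tuning is opt-in through the OPENCV_OCL4DNN_ENABLE_AUTO_TUNING environment variable, and when no default config fits, the first available candidate is taken in the order gemm like kernel, IDLF kernel, basic kernel. The types and function names below (`KernelCandidate`, `autoTuningEnabled`, `pickFallbackKernel`) are hypothetical and only mirror that description; the real implementation may check the variable differently.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Kernel families in the priority order given above (highest first).
enum KernelType { KERNEL_GEMM_LIKE = 0, KERNEL_IDLF = 1, KERNEL_BASIC = 2 };

struct KernelCandidate   // hypothetical type for this sketch
{
    KernelType type;
    bool available;      // e.g. whether the kernel compiled for this device
};

// Assumption: auto-tuning is enabled whenever the variable is defined.
inline bool autoTuningEnabled()
{
    return std::getenv("OPENCV_OCL4DNN_ENABLE_AUTO_TUNING") != NULL;
}

// Returns the index of the highest-priority available candidate, or -1.
inline int pickFallbackKernel(const std::vector<KernelCandidate>& candidates)
{
    int best = -1;
    for (std::size_t i = 0; i < candidates.size(); ++i)
    {
        if (!candidates[i].available)
            continue;
        if (best < 0 || candidates[i].type < candidates[best].type)
            best = static_cast<int>(i);
    }
    return best;
}
```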
GSoC 2017: Improve and Extend the JavaScript Bindings for OpenCV (#9466)
* Initial support for build with emscripten
mkdir build_js
cd build_js
cmake -D CMAKE_TOOLCHAIN_FILE=/path/to/emsdk/emsdk-portable/emscripten/master/cmake/Modules/Platform/Emscripten.cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local ..
* Add js module
The output is build/bin/opencv_js.js
* Fix opencv2/calib3d.hpp not found issue
* Add module name
Usage:
var cv = cv();
* Add total memory as 128MB and allow growth
* Add compilation flags for emscripten
* Use EMSCRIPTEN build target
* Disable js module for non emscripten build
* Bind the preload file path to root
Usage:
face_cascade.load('haarcascade_frontalface_default.xml');
* add test folder
* fix test files
* Copy js module test to bin
* Support to run tests on Node.js
Fix tests to import cv Module when runtime is node.
Add tests.js to use qunit to auto run tests.
Modify umd wrapper to support Module is not defined.
Usage:
node tests.js
* Support UMD and file system
Wrap the opencv_js.js to opencv.js by UMD wrapper
Use emscripten file system API to load files instead of generating data file or
embedding them. It supports both browser and node.js usages.
* Fix incorrect module name in tests
* Add package.json to add dependence of qunit
* Add js_tutorials folder and an intro page of opencv.js
Enable BUILD_DOCS in CMakeLists.txt.
Add new folder of js_tutorials in folder opencv/doc.
Imitate the tutorials of OpenCV-Python to create an intro page of opencv.js and a setup guide
* Import and use binding gen from opencvjs project
* Modify the embindgen.py to pass the build and test
* Add classes and functions white list
* Consolidate hdr_parser.py (#31)
Use hdr_parser.py of python module
Add js flag to support js binding generator.
* Use emscripten::vecFromJSArray for input vector param
Fix part of #23
* Fix test cases after #34
Fix#39
* Expose groupRectangles and CascadeClassifier.empty
* Add js highgui tutorials
add tutorials of imread&imshow and createTrackbar in doc/js_tutorials/js_gui folder
add interactive tutorial webpage for imread&imshow and createTrackbar in doc/js_tutorials/js_interactive_tutorials folder, and some images needed.
change doc/CMakeLists.txt to copy the interactive tutorial webpage and opencv.js to the tutorials' destination folder
* rm useless annotation in doc/CMakeLists.txt
* fix some nonstandard indentation and space
* add check if canvas is valid
* Expose BackgroundSubtractorMOG2
Fix#43
* Fix build of js doc
Limit copy_js_interactive_tutorials for doxygen build
Add dep to opencv.js
Fix#53
* Implement cv.imread & cv.imshow and insert interactive pages in tutorials (#55)
* add helper.js
* delete ALL in add target copy_js_interactive_tutorials to avoid dependence error
* Insert interactive pages in tutorials
insert the old interactive pages in markdown by using \htmlonly and \endhtmlonly command.
delete the useless interactive page
rename js_interactive_tutorials to js_assets to put some images needed in
* fix the depends of the target doxygen
add opencv.js to depends and delete the useless target of copy_js_assets
* change filename helper.js to helpers.js
* disable button or trackbar before opencv.js is ready
* Expose CV_64F
Fix#65
* improve cv.imshow to display different types as native imshow
* add utils.js to reuse functions and update tutorials
* Make doxygen depend on bin/opencv.js
* Fix memory issue of matFromArray
Fix#37
* Merge pull request from ganwenyao/tutorial_18
* Add notes for ganwenyao/tutorial_18
* Modifying for ganwenyao/tutorial_18
* Change Mat constructor with data to 5 parameters
* Mat supports constructor with Scalar
Fix#60
* update cv.imread cause the memory issue of matFromArray has been fixed
* fix canvas name and default input image
* Expose cv::Moments
Fix#85
* Add -Wno-missing-prototypes for emscripten build
* fix canvas name
* add tutorial of video input and output
* Expose enums as emscripten consts
Fix#72
* update the tutorial to use Mat constructor with Scalar and change lena.jpg
* Exclude cv::Mat for vecFromJSArray
Fix#82
* Add unit tests for cv.moments
* Fix the unit tests.
* add checkbox and stop button
* add adapter.js to make sure compatibility of video tutorials
* Support default parameters with function overloading
* modify enums to constants
* Use https URL for MathJax.js
Fix#109
* Comment out the debug print in embindgen.py
* Expose RotatedRect
Fix#96
* replace enum with constants and improve onload function
* delete some useless paras cause #105 fixed this
* Modify const name
* Modify Contour Properties
* tutorials for imgprc2 and objdec
* Expose more functions for img proc tutorials
Fix#76
* Expose polylines for video analysis tutorial
Fix#121
* Expose constants for default parameters of img proc tutorials
Fix#122
* Fix wrong parameter types of Mat.copyTo
Fix#87
* Support default parameters of mat.convertTo
Fix#123
* Support default parameters for external constructors
Fix#131
* Revert "Expose polylines for video analysis tutorial"
This reverts commit 3ce7615652e510d30e3c0014706ac38c98883189.
Fix#121
* Support cv.minMaxLoc
Fix#127
* Expose cv.minEnclosingCircle
Fix#126
* Add video analysis tutorials
add three video tutorials: Meanshift and Camshift, Optical Flow, and Background Subtraction
add cup.mp4 and box.mp4 for demo in tutorials
* improve image processing tutorials
* replace console.warn with throw to throw exception
* add try-catch to throw exception in code demo
* Change mat.size() return value to JS Array object
Fix#140
* add a note about different channels order between canvas and native opencv
* add a note about how to capture video from video files
* Binding cv.Scalar to JS array
Fix#147
* Add JS cv.Scalar object into helpers.js
* Update Install OpenCV-JavaScript tutorial page
Fix#44
* Update the OpenCV-JavaScript introduction page
Fix#44
* add cv.VideoCapture and read() function
* set the size of the hidden canvas same as the video
* Add Using OpenCV-JavaScript tutorial page
Fix#44
* fix some bad code style
* Update tutorials after 8/2 sync meeting
Changes include:
- Use OpenCV.js name instead of OpenCV-JavaScript
- Put using OpenCV.js ahead of build OpenCV.js
- Refine usage and introduction page
- Muted the video in tutorials
* Fix a typo in introduction page
* use cv.VideoCapture and its read() function to read video
* replace OpenCV-JavaScript with OpenCV.js
* Use onload of async script in js_usage tutorial
* add more info about mat.data
* Change Size to value_object
* Integrate Moh and Sajjad's editing into introduction page
* Change Point to value_object
* Change Rect to value_object with helper object
* Add helper objects for Point and Size
* Change RotatedRect to value_object with helpers
* Change MinMaxLoc and Circle to value_object
* Change TermCriteria to value_object
* Fix core_bindings.cpp for MinMaxLoc and Circle
* Remove unused types
* Change meanShift and CamShift to return Rect
* Change methods of RotatedRect to static
* Change mat.data from methods to property
Fix#75 and #77
* support img id and element in cv.imread
* Change mat.size to property and add mat.step
Fix#163
* Add matFromArray and matFromImageData as JS helpers
Fix#79, #78
* Lower camel case for Mat element getters
Fix#81
* Mat.getRoiRect and tests
Fix#86
* Support type for Mat.ptr
Fix#83
* Name changing of Mat element getters
'getUcharAt` -> 'ucharAt'
* fix code style and args names
* Fix helpers.js due to cv.Mat API update
* Fix opencv.js usage tutorial
* Fix a typo of js_setup
* Change Moments to value_object
* Add Range as value_object
Fix#171
* Support Mat.diag and Mat.isContinuous
Fix#84 and #89
* Support Mat.setTo
Fix#88
* Apply edits to js_intro
* Apply edits to js_usage
* Apply edits to js_setup
* update tutorials to apply data type change
* Modify tutorials
* add core tutorials
* delete MatVector elements and delete useless set operation
* add tutorials_objdec_camera
* Add instructions for WebAssembly
* apply tech writer's feedbacks into tutorials
* Organize white list by modules
* Change size to method and bind to MatExpr.size()
Fix#177
* improve tutorials
* Modify core tutorials
* add params list and explanations for OpenCV.js functions
* remove face_profile from Face Detection in Video Capture
* Add demos link
* Change Gui to GUI
* Update js_intro based on Moh and Sajjad's edits
* Fixup for 3.3.0 rebase
* Update js_intro per Moh's suggestion
* Update contributors list per Moh's idea
* add adapter.js in video_display tutorial
* Change Mat.getRoiRect to Mat.roi
Fix#194
* Remove unnecessary files for test
Fix#192
* Licenses updated to UC BSD 3-Clause
* Apply OpenCV coding style for C++ files
* Add OpenCV license for python and js files
* Fix coding style issue in helpers.js
* Remove unused test_commons.js
* Fix coding style of test_imgproc.js
* Fix coding style of test_mat.js
* Fix space before semicolon
* Fix coding style of test_objdetect.js
* Fix coding style of tests.js
* Fix coding style of test_utils.js
* Fix coding style of test_video.js
* Fix failures of node.js tests
* Add eslint rule config and fix eslint errors
* Add eslint config for js/src and fix eslint errors
* Clean up the opencv.js dependencies
Fix#186
* Fix build issue for python generator
* Fix doxygen buildbot failure
* delete trailing whitespace, blank line at EOF and replace tab with space
* Fix tutorial_js_root reference issue for non opencv.js build
* replace the file with small size
* Initial commit of build_js.py
* Move the js build configurations to build script
* Add wasm build support
* Update OpenCV.js build tutorial by using script
* Fix global var issue in tests
* Add a README.md for build_js.py
* Copy the haar cascade files from data dir for tutorials
* Not use memory init file
* Disable debug print for modules/js/CMakeLists.txt
* Check files when build done
* Fix image name in js_gradients tutorial
* Fix image load issue in js_trackbar tutorial
* Find the opencv source directory via relative path by default
* Make the cmake args based on build_doc option
* Fix a typo in js_setup.markdown
* Fix make failure issue on config generated by build_js.py
* Eliminate js branch of hdr_parser.py
* Extract examples from js_basic_ops tutorial
* Fix coding style of utils.js
* Improve examples error handling
Handle:
1. opencv.js loading errors
2. script errors (Error)
3. cv::Exception
Fix#217
* Add enable_exception option into build_js.py
* Support print exception for exception catching disabled build
* Extract example from js_usage tutorial
* Avoid copying .eslintrc.json when building doc
Fix#223
* Revert to use onload as opencv.js ready event
* Use 4 spaces indentation for js examples
* embed html in tutorials with iframe tag
* Revert to use onload as opencv.js ready event
* Extract examples from js_video_display tutorial
* Implement Utils object
* modify core imgprc and face_detection tutorials
* Fix examples of js_gui tutorials
* Fix coding style of utils.js
* Modify tutorials
* Extract example from js_face_detection_camera tutorial
* Disable new-cap check in eslint
* Extract examples from js_meanshift tutorial
* Extract examples from video tutorials
* Remove new-cap declaration and update grammar in comments
* Change textarea width to 100 to align with eslint config
* Fix printError issue when opencv.js loading fails
* Remove BUILD_opencv_js dependency for doc build
Fix#213
* Expose cv::getBuildInformation
* Dump opencv build info when opencv.js loaded for live examples
* Make the button to stand out in js live examples
Fix#235
* Style for disabled button
* Add js_imgproc_camera.html example
* Fix coding style of imgproc_camera example
* Add js_imgproc_camera tutorial
* Remove link to opencv.js demos
* doc: copy opencv.js on build, use absolute paths for assets
* doc: reuse existed file box.mp4
Added forkfour Latex command to math js support.
Split cv::norm documentation between the cv::norm and its overload, to make things clearer
Corrected some typos and cleaned up grammar.
Result is clearer documentation for the norms.
Work pending...
This adds the possibility to use multi-channel masks for the functions
cv::mean, cv::meanStdDev and the method Mat::setTo. The tests now randomly
use multi-channel masks for operations that support them.
This also includes Mat::copyTo, which supported multi-channel masks
before, but there was no test confirming this.
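
A plain-array sketch of what a multi-channel mask means for a per-channel mean. It assumes that a non-zero mask entry selects the matching channel of the matching pixel; that reading, the interleaved layout and the name `maskedChannelMean` are assumptions for illustration and are not taken from the OpenCV sources.

```cpp
#include <cstddef>
#include <vector>

// Per-channel mean over interleaved data (c0 c1 ... c0 c1 ...), where the mask
// has one entry per channel element instead of one entry per pixel.
inline std::vector<double> maskedChannelMean(const std::vector<float>& data,
                                             const std::vector<unsigned char>& mask,
                                             std::size_t channels)
{
    std::vector<double> sum(channels, 0.0);
    std::vector<std::size_t> count(channels, 0);
    if (channels == 0)
        return sum;
    for (std::size_t i = 0; i < data.size() && i < mask.size(); ++i)
    {
        if (mask[i] == 0)
            continue;                     // this channel of this pixel is masked out
        const std::size_t c = i % channels;
        sum[c] += data[i];
        ++count[c];
    }
    for (std::size_t c = 0; c < channels; ++c)
        if (count[c] > 0)
            sum[c] /= static_cast<double>(count[c]);
    return sum;                           // channels with no selected element stay 0
}
```

With a single-channel mask every channel of a pixel is selected or skipped together; with a multi-channel mask each channel can end up averaging over a different set of pixels.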
CUDA implementation wants to convert std::vector<KeyPoint> <-> GpuMat.
There is no direct mapping from KeyPoint (mix of int/float fields)
into cv::Mat element type, so this conversion must be avoided.
Legacy mode is turned back for CUDA builds.
This function is the counterpart of "Context::getProg".
With this function, users can unload a program
from the global run-time program cache and save resources.
OpenCL runtime does not require OpenCL development file (libOpenCL.so),
just the "run" library (so.1).
This patch searches for the run library (so.1) if the dev library (.so)
is not found.
Web search shows that this error has been present since at least 2015
http://answers.opencv.org/question/80532/haveopencl-return-false/
Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
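
The lookup order from the libOpenCL patch above can be sketched as follows. The loader is passed in as a callback so the example stays free of platform headers; `findOpenCLLibrary` and `tryLoad` are hypothetical names, and a real implementation would call something like dlopen() instead.

```cpp
#include <functional>
#include <string>

// Tries the development name first, then the runtime-only name, and returns
// the first one the loader accepts (or an empty string if neither loads).
inline std::string findOpenCLLibrary(
    const std::function<bool(const std::string&)>& tryLoad)
{
    const char* candidates[] = { "libOpenCL.so", "libOpenCL.so.1" };
    for (const char* name : candidates)
        if (tryLoad(name))
            return name;
    return std::string();
}
```

On a machine with only the runtime package installed, the first attempt fails and the second succeeds, which is the case the patch addresses.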
- Optimizations set change. Now IPP integrations will provide code for SSE42, AVX2 and AVX512 (SKX) CPUs only. For HW below SSE42 IPP code is disabled.
- Performance regressions fixes for IPP code paths;
- cv::boxFilter integration improvement;
- cv::filter2D integration improvement;
RGB2Lab_f added, bugs fixed, moved to float
several bugs fixed
LUT fixed, no switch in tetraInterpolate()
temporary code; to be removed and rewritten
before refactoring
extra interpolations removed, some things to do left
added Lab2RGB_b +XYZ version, etc.
basic version is done, to be sped up
tetra refactored
interpolations: LUT for weights, refactor., etc.
address arithm optimized
initial version of vectorized code added (not compiling now)
compilation fixed, now segfaults
a lot of fixes, vectorization temp. disabled
fixed trilinear shift size, max error dropped from 19 to 10
fixed several bugs (255 vs 256, signed vs unsigned, bIdx)
minor changes
packed: address arithmetics fixed
shorter code
experiments with pure integer calculations
Lab2RGB max error decreased to 2; need to clean the code
ready for vectorization; need cleaning
vectorized, to be debugged
precision fixed, max error is 2
Lab->XYZ shortened
minor fixes
Lab2RGB_f version fixed, to be completely rewritten using _b code
RGB2Lab_f vectorized
minors
moved to separate file
refactored Lab2RGB to float and int versions
minor fix
Lab2RGB_f vectorized
minor refactoring
Lab2RGBint refactored: process methods, vectorize by 4 pix
Lab2RGB_f int version is done
cleanup extra code
code copied to color.cpp
fixed blue idx bug
optimizations enabled when testing; mulFracConst introduced
divConst -> mulFracConst
calc min time in perf instead of avg
minors
process() slightly sped up
Lab2RGB_f: disabled int version
reinterpret added, minor fixes in names
some warnings fixed
changes transferred to color.cpp
RGB2Lab_f code (and trilinear interpolation code) moved to rgb2lab_faster
whitespace
shift negative fixed
more warnings fixed
"constant condition" warnings fixed, little speed up
minor changes
test_photo decolor fixed
changes copied to test_lab.cpp
idx bounds checking in LUT init
several fixes
WIP: softfloat almost integrated
test_lab partially rewritten to SoftFloat
color.cpp rewritten to SoftFloat
test_lab.cpp: accuracy code added
several fixes
RGB2Lab_b testing fixed
splineBuild() rewritten to SoftFloat
accuracy control improved
rounding fixed
Luv <=> RGB: rewritten to SoftFloat
OCL cvtColor Lab and Lut rewritten to SoftFloat
minor fixes
refactored to new SoftFloat interface
round() -> cvRound, etc.
fixed OCL tests
softfloat.cpp: internal functions made static, unused ones removed
meaningful constants
extra lines removed
unused function removed
unfinished work
it works, need to fix TODOs
refactoring; more calls rewritten
mulFracConst removed
constants made bit exact; minors
changes moved to color.cpp
fixed 1 bug and 4 warnings
OCL: fixed constants
pow(x, _1_3f) replaced by cubeRoot(x)
fixed compilation on MSVC32
magic constants explained
file with internal accuracy&speed tests moved to lab_tetra branch
Add constructors taking initializer_list for some of OpenCV data types (#9034)
* Add a constructor taking initializer_list for Matx
* Add a constructor taking initializer list for Mat and Mat_
* Add one more method to initialize Mat to the corresponding tutorial
* Add a note how to initialize Matx
* CV_CXX_11->CV_CXX11
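
For illustration, this is roughly what an initializer_list constructor on a small fixed-size matrix looks like. `SmallMat` is a hypothetical stand-in written for this note, not the Matx implementation, and it requires C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>

template <typename T, int M, int N>
struct SmallMat
{
    T val[M * N];

    // Row-major fill from a braced list; the list is expected to hold M*N values.
    SmallMat(std::initializer_list<T> list)
    {
        assert(list.size() == static_cast<std::size_t>(M * N));
        std::size_t i = 0;
        for (auto it = list.begin();
             it != list.end() && i < static_cast<std::size_t>(M * N); ++it)
            val[i++] = *it;
        for (; i < static_cast<std::size_t>(M * N); ++i)
            val[i] = T();   // zero-fill if the list was short (release builds)
    }

    const T& operator()(int row, int col) const { return val[row * N + col]; }
};
```

Usage then reads `SmallMat<int, 2, 3> m = {1, 2, 3, 4, 5, 6};`, after which `m(1, 2)` is the last element.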
BufferPoolController has a non virtual protected destructor (which is legitimate)
However, Visual Studio sees this as a bug, if you enable more warnings, like below
```
add_compile_options(/W3) # level 3 warnings
add_compile_options(/we4265) # warning about missing virtual destructors
```
This is a proposal to silence this warning.
See https://github.com/ivsgroup/boost_warnings_minimal_demo for a demo of the same problem
with boost/exception.hpp
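
The pattern in question looks roughly like the sketch below. `Controller` and `FixedController` are hypothetical names; the point is only that a protected destructor prevents deletion through a base pointer, so it does not need to be virtual.

```cpp
#include <type_traits>

// Interface that is never deleted through a base pointer: `delete basePtr`
// does not compile outside the hierarchy because the destructor is protected.
class Controller
{
public:
    virtual int budget() const = 0;
protected:
    ~Controller() {}   // non-virtual on purpose
};

class FixedController : public Controller
{
public:
    explicit FixedController(int b) : budget_(b) {}
    int budget() const override { return budget_; }
private:
    int budget_;
};
```

A warning that fires on every class with virtual functions and a non-virtual destructor flags `Controller` even though the design is safe.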
Remove unnecessary Non-ASCII characters from source code (#9075)
* Remove unnecessary Non-ASCII characters from source code
Remove unnecessary Non-ASCII characters and replace them with ASCII
characters
* Remove dashes in the @param statement
Remove dashes and place single space in the @param statement to keep
coding style
* misc: more fixes for non-ASCII symbols
* misc: fix non-ASCII symbol in CMake file
* another round of dnn optimization:
* increased malloc alignment across OpenCV from 16 to 64 bytes to make it AVX2 and even AVX-512 friendly
* improved SIMD optimization of pooling layer, optimized average pooling
* cleaned up convolution layer implementation
* made activation layer "attachable" to all other layers, including fully connected and addition layer.
* fixed bug in the fusion algorithm: "LayerData::consumers" should not be cleared, because it describes the topology.
* greatly optimized permutation layer, which improved SSD performance
* parallelized element-wise binary/ternary/... ops (sum, prod, max)
* also, added missing copyrights to many of the layer implementation files
* temporarily disabled (again) the check for intermediate blobs consistency; fixed warnings from various builders
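
A minimal sketch of 64-byte aligned allocation on top of plain malloc, to illustrate the alignment change listed above. The helper names are hypothetical and this is one common technique (over-allocate, round the pointer up, remember the raw pointer), not necessarily how OpenCV implements it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Rounds sz up to the next multiple of n; n must be a power of two.
inline std::size_t alignUp(std::size_t sz, std::size_t n)
{
    return (sz + n - 1) & ~(n - 1);
}

// Over-allocates, aligns the returned pointer, and stores the raw pointer
// in the slot just before the aligned block so it can be freed later.
inline void* alignedMalloc(std::size_t size, std::size_t alignment = 64)
{
    void* raw = std::malloc(size + sizeof(void*) + alignment);
    if (!raw)
        return NULL;
    std::size_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    void** aligned = reinterpret_cast<void**>(alignUp(start, alignment));
    aligned[-1] = raw;
    return aligned;
}

inline void alignedFree(void* ptr)
{
    if (ptr)
        std::free(static_cast<void**>(ptr)[-1]);
}
```

A 64-byte boundary matches the width of a 512-bit vector register, so aligned loads and stores stay valid for both AVX2 and AVX-512 code paths.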
Fixed snprintf for VS 2013 (#8816)
* Fixed snprintf for VS 2013
* snprintf: removed declaration from header, changed implementation
* cv_snprintf corrected according to comments
* update snprintf patch
* avoid link error (move the implementation of software version to header)
* make getConvertFuncFp16 local (move from precomp.hpp to convert.hpp)
* fix error on 32bit x86
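
As a rough illustration of a portable snprintf wrapper (the name `safeSnprintf` is hypothetical and this is not the cv_snprintf code from the commit), the sketch below forwards to vsnprintf and then forces null-termination, which covers runtimes that leave a truncated buffer unterminated:

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Formats into buf (capacity len) and always leaves a terminated string.
// Returns whatever vsnprintf reports; on conforming runtimes that is the
// length the full output would need.
inline int safeSnprintf(char* buf, std::size_t len, const char* fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int needed = std::vsnprintf(buf, len, fmt, args);
    va_end(args);
    if (len > 0 && (needed < 0 || static_cast<std::size_t>(needed) >= len))
        buf[len - 1] = '\0';   // truncated or failed: terminate by hand
    return needed;
}
```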
Parallelize Canny with custom gradient (#8694)
* New Canny implementation. Restructuring code in parallelCanny class. Align mag buffer and map.
* Fix warnings.
* Missing SIMD check added.
* Replaced local trailingZeros in contours.cpp. Use alignSize in canny.cpp
* Fix warnings in alignSize and allocate just minimum extra columns.
* Fix another warning in map.create.
* Replace for loop with do loop to avoid double check at the beginning.
Define extra SIMD CANNY_CHECK to avoid unnecessary continue.
* Correct the existing documented T-API functions to match the doxygen format.
* docs: fix comments style
* T-API documentation: minor formatting changes
- persistence.cpp code expects special sizeof value for passed structures
- this assumption leads to memory corruption problems
- fixed/worked around test to prevent memory corruption on Linux 32-bit systems
Updated integrations for:
cv::split
cv::merge
cv::insertChannel
cv::extractChannel
cv::Mat::convertTo - now with scaled conversions support
cv::LUT - disabled due to performance issues
Mat::copyTo
Mat::setTo
cv::flip
cv::copyMakeBorder - currently disabled
cv::polarToCart
cv::pow - ipp pow function was removed due to performance issues
cv::hal::magnitude32f/64f - disabled for <= SSE42, poor performance
cv::countNonZero
cv::minMaxIdx
cv::norm
cv::canny - new integration. Disabled for threaded;
cv::cornerHarris
cv::boxFilter
cv::bilateralFilter
cv::integral
Add support for std::array<T, N> (#8535)
* Add support for std::array<T, N>
* Add std::array<Mat, N> support
* Remove UMat constructor with std::array parameter
Gemm kernels for Intel GPU (#8104)
* Fix an issue with Kernel object reset release when consecutive Kernel::run calls
Kernel::run launches OCL gpu kernels and sets an event callback function
to decrease the ref count of UMat or remove UMat when the launched workloads
are completed. However, some OCL kernels require multiple calls of the
Kernel::run function with some kernel parameter changes (e.g., input
and output buffer offset) to get the final computation result.
In that case, the current implementation requires unnecessary
synchronization and cleanupMat.
This fix requires the user to specify whether there will be more work or not.
If there is no remaining computation, Kernel::run will reset the
kernel object
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* GEMM kernel optimization for Intel GEN
The optimized kernels use the cl_intel_subgroups extension for better
performance.
Note: These optimized kernels will be part of ISAAC in a code generation
way under MIT license.
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* Fix API compatibility error
This patch fixes an OCV API compatibility error. The error was reported
due to the interface changes of Kernel::run. To resolve the issue,
an overloaded function of Kernel::run is added. It takes a flag indicating
whether there is more work to be done with the kernel object without
releasing resources related to it.
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* Renaming intel_gpu_gemm.cpp to intel_gpu_gemm.inl.hpp
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* Revert "Fix API compatibility error"
This reverts commit 2ef427db91.
Conflicts:
modules/core/src/intel_gpu_gemm.inl.hpp
* Revert "Fix an issue with Kernel object reset release when consecutive Kernel::run calls"
This reverts commit cc7f9f5469.
* Fix the case of uninitialized D
When C is null and beta is non-zero, D is used without initialization.
This resolves the issue
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* fix potential output error due to 0 * nan
Signed-off-by: Woo, Insoo <insoo.woo@intel.com>
* whitespace fix, eliminate non-ASCII symbols
* fix build warning
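The two GEMM fixes above (uninitialized D and the 0 * nan output error) can be illustrated with a scalar reference routine. This is only a sketch under assumptions: `gemmRef` is a hypothetical helper working on row-major buffers, not the OpenCL kernel from this commit, and skipping C when beta is zero is one common way to avoid 0 * NaN, not necessarily what the kernel does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference GEMM on row-major buffers: D = alpha * A * B + beta * C.
// A is M x K, B is K x N, C (optional) and D are M x N.
// Hypothetical helper for illustration only.
std::vector<float> gemmRef(const std::vector<float>& A,
                           const std::vector<float>& B,
                           const float* C,  // may be nullptr
                           std::size_t M, std::size_t K, std::size_t N,
                           float alpha, float beta)
{
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> D(M * N, 0.0f);  // D always starts initialized
    for (std::size_t i = 0; i < M; ++i)
    {
        for (std::size_t j = 0; j < N; ++j)
        {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            float d = alpha * acc;
            // Read C only when it exists and actually contributes.
            // Evaluating beta * C with beta == 0 would turn a NaN in C
            // into a NaN in D, because 0 * NaN is NaN.
            if (C != nullptr && beta != 0.0f)
                d += beta * C[i * N + j];
            D[i * N + j] = d;
        }
    }
    return D;
}
```

With this structure a NaN stored in C cannot leak into D when beta is zero, and a null C never leaves D unset.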
This test case uses a matrix with more dimensions than columns. Without
the fix in
b45e784beb
this crashes with a segmentation fault, hangs or simply fails with wrong
values.
- use suffixes like '.avx.cpp'
- added CMake-generated files for '.simd.hpp' optimization approach
- wrap HAL intrinsic headers into separate namespaces for different build flags
- automatic vzeroupper insertion (via CV_INSTRUMENT_REGION macro)
`template<typename _Tp> inline const _Tp* Mat_<_Tp>::operator [](int y) const` does not support 3d matrix since it checks rows.
This operator[] shall check size.p[0] instead.
* Fix the documentation for Mat::diag(int).
Fix issue #8181
* Fix the documentation for Mat::diag(int).
Fix issue #8181.
* Add support for printing out cv::Complex.
* Remove extra spaces.
* cv::Complex is submitted as a new pull request.
Add support for printing out cv::Complex. (#8208)
* Add support for printing out cv::Complex.
* Conform to the format of std::complex.
* Remove extra spaces.
* Remove extra spaces.
Although both `cl_platform_info` and `cl_device_info` are defined as macro `cl_uint`, it needs to use `cl_platform_info` to get
the platform information.
- don't use undefined flag=0. It should be CONSTANT instead.
- don't allow 'UMat* m=NULL' argument (except LOCAL/CONSTANT flags).
This case is not handled well to provide NULL __global pointers.
It is better to use '-D' macro defines instead (at least for performance)
Fix typos in the documentation for AutoBuffer. (#8197)
* Allocate 1000 floats to match the documentation
Fix the documentation of `AutoBuffer`. By default, the following code
```.cpp
cv::AutoBuffer<float> m;
```
allocates only 264 floats. But the comment in the demonstration code says it allocates 1000 floats, which is
not correct.
* fix typo in the comment.
Append zero to trailing decimal place for FileStorage JSON write of a float or double value (#7952)
* Fix for FileStorage JSON write of a float or double value that has no fractional part; appends a zero character after the trailing decimal place to meet JSON standard.
* keep the strlen return value as size_t rather than unnecessarily casting it to int
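The idea behind the JSON fix can be sketched as a small post-processing step. The helper name `closeTrailingPoint` is hypothetical, and the sketch assumes the number formatter emits text such as `1.` for values without a fractional part, as the commit message describes; it is not the actual persistence code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: if the formatted number ends with a bare decimal
// point (e.g. "1."), append '0' so the text is a valid JSON number ("1.0").
// Any other input is returned unchanged.
std::string closeTrailingPoint(std::string text)
{
    if (!text.empty() && text.back() == '.')
        text.push_back('0');
    return text;
}
```

Strings that already carry fractional digits, or that have no decimal point at all, pass through untouched.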
* moved BLAS/LAPACK detection scripts from opencv_contrib/dnn to the main repository.
* trying to fix the bug with undefined symbols sgesdd_ and dgesdd_
* removed extra whitespaces; disabled LAPACK on IOS
Currently, to select a submatrix of a N-dimensional matrix, it requires
two lines of code while only one line of code is required if using a 2D
array.
I added functionality to be able to select an N-dim submatrix using a
vector list instead of a Range pointer. This allows initializer lists to
be used for a one-line selection.
This allows an N-dimensional array to be set up in one line instead of two when using C++11 initializer lists. cv::Mat(3, {zDim, yDim, xDim}, ...) can be used instead of having to create an int pointer to hold the size array.
Maximum depth limit var was added to the instrumentation structure;
Trace names console output fix: improper tree formatting could happen;
Output in case of error was added;
Custom regions improvements;
Improved timing and weight calculation for parallel regions; New TC (threads counter) value to indicate how many different threads accessed particular node;
parallel_for, warnings fixes and ReturnAddress code from Alexander Alekhin;
* use hasSIMD128 rather than calling checkHardwareSupport
* add SIMD check in spatialgradient.cpp
* add SIMD check in stereosgbm.cpp
* add SIMD check in canny.cpp
A bug in ICC improperly identified the first parameter as "void*"
rather than the proper "volatile long*". This is scheduled to be
fixed in ICC in a future release.
This patch casts only to a "long*" to preserve backwards compatibility
with the ICC 16 and ICC 17 releases.
In YAML 1.0 the colon is mandatory. See http://yaml.org/spec/1.0/#id2558635.
This also allows prior releases to read YAML files created with the current version.
[GSOC] New camera model for stitching pipeline
* implement estimateAffine2D
estimates affine transformation using robust RANSAC method.
* uses RANSAC framework in calib3d
* includes accuracy test
* uses SVD decomposition for solving 3 point equation
* implement estimateAffinePartial2D
estimates limited affine transformation
* includes accuracy test
* stitching: add affine matcher
initial version of matcher that estimates affine transformation
* stitching: added affine transform estimator
initial version of estimator that simply chain transformations in homogeneous coordinates
* calib3d: rename estimateAffine3D test
test Calib3d_EstimateAffineTransform rename to Calib3d_EstimateAffine3D. This is more descriptive and prevents confusion with estimateAffine2D tests.
* added perf test for estimateAffine functions
tests both estimateAffine2D and estimateAffinePartial2D
* calib3d: compare error in square in estimateAffine2D
* incorporates fix from #6768
* rerun affine estimation on inliers
* stitching: new API for parallel feature finding
due to ABI breakage new functionality is added to `FeaturesFinder2`, `SurfFeaturesFinder2` and `OrbFeaturesFinder2`
* stitching: add tests for parallel feature find API
* perf test (about linear speed up)
* accuracy test compares results with serial version
* stitching: use dynamic_cast to overcome ABI issues
adding parallel API to FeaturesFinder breaks ABI. This commit uses dynamic_cast and hardcodes thread-safe finders to avoid breaking ABI.
This should be replaced by proper method similar to FeaturesMatcher on next ABI break.
* use estimateAffinePartial2D in AffineBestOf2NearestMatcher
* add constructor to AffineBestOf2NearestMatcher
* allows choosing between full affine transform and partial affine transform. Other params are the same as for BestOf2NearestMatcher
* added protected field
* samples: stitching_detailed support affine estimator and matcher
* added new flags to choose matcher and estimator
* stitching: rework affine matcher
represent transformation in homogeneous coordinates
affine matcher: remove duplicate code
rework flow to get rid of duplicate code
affine matcher: do not center points to (0, 0)
it is not needed for affine model. it should not affect estimation in any way.
affine matcher: remove unneeded cv namespacing
* stitching: add stub bundle adjuster
* adds stub bundle adjuster that does nothing
* can be used in place of standard bundle adjusters to omit bundle adjusting step
* samples: stitching detailed, support no bundle adjust
* uses new NoBundleAdjuster
* added affine warper
* uses R to get whole affine transformation and propagates rotation and translation to plane warper
* add affine warper factory class
* affine warper: compensate transformation
* samples: stitching_detailed add support for affine warper
* add Stitcher::create method
this method follows similar constructor methods and returns smart pointer. This allows constructing Stitcher according to OpenCV guidelines.
* supports multiple stitcher configurations (PANORAMA and SCANS) for convenient setup
* returns cv::Ptr
* stitcher: dynamically determine correct estimator
we need to use affine estimator for affine matcher
* preserves ABI (but add hints for ABI 4)
* uses dynamic_cast hack to inject correct estimator
* sample stitching: add support for multiple modes
shows how to use different configurations of stitcher easily (panorama stitching and scans affine model)
* stitcher: find features in parallel
use new FeatureFinder API to find features in parallel. Parallelized using TBB.
* stitching: disable parallel feature finding for OCL
it does not bring much speedup to run features finder in parallel when OpenCL is enabled, because finder needs to wait for OCL device.
Also, currently ORB is not thread-safe when OCL is enabled.
* stitching: move matcher tests
move matchers tests perf_stich.cpp -> perf_matchers.cpp
* stitching: add affine stitching integration test
test basic affine stitching (SCANS mode of stitcher) with images that have only translation between them
* enable surf for stitching tests
stitching.b12 test was failing with surf
investigated the issue, surf is producing good result. Transformation is only slightly different from ORB, so that resulting pano does not exactly match ORB's result. That caused sanity check to fail.
* added size checks similar to other tests
* sanity check will be applied only for ORB
* stitching: fix wrong estimator choice
the if condition was inverted, so the wrong estimators were chosen
added logging for estimated transformation
* enable surf for matchers stitching tests
* enable SURF
* rework sanity checking. Check estimated transform instead of matches. Est. transform should be more stable and comparable between SURF and ORB.
* remove regression checking for VectorFeatures tests. It has a lot of data and the test is the same as the previous one, except that it tests a different vector size for performance, so sanity checking does not add any value here. Added basic sanity asserts instead.
* stitching tests: allow relative error for transform
* allows .01 relative error for estimated homography sanity check in stitching matchers tests
* fix VS warning
stitching tests: increase relative error
increase relative error to make it pass on all platforms (results are still good).
stitching test: allow bigger relative error
transformation can differ in small values (with small absolute difference, but large relative difference). transformation output still looks usable for all platforms. This difference affects only mac and windows, linux passes fine with small difference.
* stitching: add tests for affine matcher
uses s1, s2 images. added also new sanity data.
* stitching tests: use different data for matchers tests
this data should yield a more stable transformation (it has many more matches, especially for surf). Sanity data regenerated.
* stitching test: rework tests for matchers
* separated rotation and translations as they are different by scale.
* use appropriate absolute error for them separately. (relative error does not work for values near zero.)
* stitching: fix affine warper compensation
calculation of rotation and translation extracted for plane warper was wrong
* stitching test: enable surf for opencl integration tests
* enable SURF with correct guard (HAVE_OPENCV_XFEATURES2D)
* add OPENCL guard and correct namespace as usual for opencl tests
* stitching: add ocl accuracy test for affine warper
test consistent results with ocl on and off
* stitching: add affine warper ocl perf test
add affine warper to existing warper perf tests. Added new sanity data.
* stitching: do not overwrite inliers in affine matcher
* estimation is run a second time on inliers only, so inliers produced in the second run would not be correct for all matches
* calib3d: add Levenberg–Marquardt refining to estimateAffine2D* functions
this adds affine Levenberg–Marquardt refining to estimateAffine2D functions similar to what is done in findHomography.
implements Levenberg–Marquardt refining for both full affine and partial affine transformations.
* stitching: remove reestimation step in affine matcher
reestimation step is not needed. estimateAffine2D* functions are running their own reestimation on inliers using the Levenberg-Marquardt algorithm, which is better than simply rerunning RANSAC on inliers.
* implement partial affine bundle adjuster
bundle adjuster that expects affine transform with 4DOF. Refines parameters for all cameras together.
stitching: fix bug in BundleAdjusterAffinePartial
* use the inverse properly
* use static buffer for the inverse to speed it up
* samples: add affine bundle adjuster option to stitching_detailed
* add support for using affine bundle adjuster with 4DOF
* improve logging of initial intrinsics
* stitching: add affine bundle adjuster test
* fix build warnings
* stitching: increase limit on sanity check
prevents spurious test failures on mac. values are still pretty fine.
* stitching: set affine bundle adjuster for SCANS mode
* fix bug with AffineBestOf2NearestMatcher (we want to select affine partial mode)
* select right bundle adjuster
* stitching: increase error bound for matcher tests
* this prevents failure on mac. transformation is still ok.
* stitching: implement affine bundle adjuster
* implements affine bundle adjuster that is using full affine transform
* existing test case modified to test both affinePartial and full affine bundle adjuster
* add stitching tutorial
* show basic usage of stitching api (Stitcher class)
* stitching: add more integration test for affine stitching
* added new datasets to existing testcase
* removed unused include
* calib3d: move `haveCollinearPoints` to common header
* added comment to make clear that this also checks for points that are too close
* calib3d: redone checkSubset for estimateAffine* callback
* use common function to check collinearity
* this also ensures that point will not be too close to each other
* calib3d: change estimateAffine* functions API
* more similar to `findHomography`, `findFundamentalMat`, `findEssentialMat` and similar
* follows standard recommended semantic INPUTS, OUTPUTS, FLAGS
* allows to disable refining
* supported LMEDS robust method (tests yet to come) along with RANSAC
* extended docs with some tips
* calib3d: rewrite estimateAffine2D test
* rewrite in googletest style
* parametrize to test both robust methods (RANSAC and LMEDS)
* get rid of boilerplate
* calib3d: rework estimateAffinePartial2D test
* rework in googletest style
* add testing for LMEDS
* calib3d: rework estimateAffine*2D perf test
* test for LMEDS speed
* test with/without Levenberg-Marquardt
* remove sanity checking (this is covered by accuracy tests)
* calib3d: improve estimateAffine*2D tests
* test transformations in loop
* improves test by testing more potential transformations
* calib3d: rewrite kernels for estimateAffine*2D functions
* use analytical solution instead of SVD
* this version is faster especially for smaller amount of points
* calib3d: tune up perf of estimateAffine*2D functions
* avoid copying inliers
* avoid converting input points if not necessary
* check only `from` point for collinearity, as `to` does not affect stability of transform
* tutorials: add commands examples to stitching tutorials
* add some examples how to run stitcher sample code
* mention stitching_detailed.cpp
* calib3d: change computeError for estimateAffine*2D
* do error computing in floats instead of doubles
this has the required precision, and we were storing the result in float anyway. This makes the code faster and allows auto-vectorization by smart compilers.
* documentation: mention estimateAffine*2D function
* refer to new functions on appropriate places
* prefer estimateAffine*2D over estimateRigidTransform
* stitching: add camera models documentations
* mention camera models in module documentation to give user a better overview and reduce confusion
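The analytical kernel mentioned above for estimateAffine*2D (a closed-form solve instead of SVD, with a collinearity check on the `from` points only) can be sketched for the minimal case of three correspondences. This is an illustrative sketch, not the calib3d code: `Pt`, `Affine`, `solveAffine3` and the `eps` threshold are hypothetical names and values chosen for the example.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Pt { double x, y; };

// Row-major 2x3 affine matrix [a b c; d e f], mapping
// (x, y) to (a*x + b*y + c, d*x + e*y + f).
using Affine = std::array<double, 6>;

// Closed-form solve from exactly three point pairs using Cramer's rule on
// coordinate differences. Returns false when the `from` points are
// (nearly) collinear or coincident, since the system is then singular.
bool solveAffine3(const std::array<Pt, 3>& from,
                  const std::array<Pt, 3>& to,
                  Affine& M, double eps = 1e-9)
{
    const double dx2 = from[1].x - from[0].x, dy2 = from[1].y - from[0].y;
    const double dx3 = from[2].x - from[0].x, dy3 = from[2].y - from[0].y;
    // Twice the signed area of the `from` triangle.
    const double det = dx2 * dy3 - dx3 * dy2;
    if (std::fabs(det) < eps)
        return false;

    const double du2 = to[1].x - to[0].x, du3 = to[2].x - to[0].x;
    const double dv2 = to[1].y - to[0].y, dv3 = to[2].y - to[0].y;

    M[0] = (du2 * dy3 - du3 * dy2) / det;                   // a
    M[1] = (dx2 * du3 - dx3 * du2) / det;                   // b
    M[2] = to[0].x - M[0] * from[0].x - M[1] * from[0].y;   // c
    M[3] = (dv2 * dy3 - dv3 * dy2) / det;                   // d
    M[4] = (dx2 * dv3 - dx3 * dv2) / det;                   // e
    M[5] = to[0].y - M[3] * from[0].x - M[4] * from[0].y;   // f
    return true;
}
```

Only the `from` triangle enters the determinant, which matches the observation in the commit that the `to` points do not affect the stability of the solve.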
* use __GNUC_MINOR__ in correct place to check the version of GCC
* check processor support of FP16 at run time
* check compiler support of FP16 and pass correct compiler option
* rely on ENABLE_AVX on gcc since AVX is generated when -mf16c is passed
* guard correctly using ifdef in case of various configuration
* use v_float16x4 correctly by including the right header file
- calculate ticksTotal instead of ticksMean
- local / global width is based on ticksTotal value
- added instrumentation for OpenCL program compilation
- added instrumentation for OpenCL kernel execution
Minor fix in MatAllocator::upload
Minor fix in MatAllocator::copy
Minor fix in setSize function
Minor fix in Mat::Mat
Minor fix in cvMatNDToMat function
Minor fix in _InputArray::getMatVector
Minor fix in _InputArray::getUMatVector
Minor fix in cv::hconcat
Minor fix in cv::vconcat
Minor fix in cv::setIdentity
Minor fix in cv::trace
Minor fix in transposeI_ template function
Minor fix in reduceC_ template function
Minor fix in sort_ template function
Minor fix in sortIdx_ template function
Minor fix in cvRange function
Minor fix in MatConstIterator::seek
Minor fix in SparseMat::create
Minor fix in SparseMat::copyTo
Minor fix in SparseMat::convertTo
Minor fix in SparseMat::convertTo
Minor fix in SparseMat::ptr
Minor fix in SparseMat::resizeHashTab
Fixes indentation
* use v_float16x4 (universal intrinsic) instead of raw SSE/NEON implementation
* define v_load_f16/v_store_f16 since v_load can't be distinguished when short pointer passed
* brush up implementation on old compiler (guard correctly)
* add test for v_load_f16 and round trip conversion of v_float16x4
* fix conversion error
* use universal intrinsic for accumulate series using float/double
* accumulate, accumulateSquare, accumulateProduct and accumulateWeighted
* add v_cvt_f64_high in both SSE/NEON
* add test for conversion v_cvt_f64_high in test_intrin.cpp
* improve some existing universal intrinsic by using new instructions in Aarch64
* add workaround for Android build in intrin_neon.hpp
* Added 2-channel ops to match existing 3-channel and 4-channel ops
* v_load_deinterleave() and v_store_interleave()
* Implements float32x4 only on SSE (but all types on NEON and CPP)
* Includes tests
* Will be used to vectorize 2D functions, such as estimateAffine2D()
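What the 2-channel operations compute can be shown with a scalar reference. The functions `deinterleave2` and `interleave2` below are hypothetical stand-ins written for this example; the real v_load_deinterleave() and v_store_interleave() work on SIMD registers rather than on std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Split interleaved pairs [x0, y0, x1, y1, ...] into two planes.
void deinterleave2(const std::vector<float>& src,
                   std::vector<float>& a, std::vector<float>& b)
{
    assert(src.size() % 2 == 0);
    const std::size_t n = src.size() / 2;
    a.resize(n);
    b.resize(n);
    for (std::size_t i = 0; i < n; ++i)
    {
        a[i] = src[2 * i];
        b[i] = src[2 * i + 1];
    }
}

// Merge two planes back into interleaved pairs [a0, b0, a1, b1, ...].
std::vector<float> interleave2(const std::vector<float>& a,
                               const std::vector<float>& b)
{
    assert(a.size() == b.size());
    std::vector<float> dst(2 * a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        dst[2 * i] = a[i];
        dst[2 * i + 1] = b[i];
    }
    return dst;
}
```

For 2D point data this separates x and y coordinates so each can be processed as a contiguous lane, which is the layout a vectorized 2D routine would want.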