/*M///////////////////////////////////////////////////////////////////////////////////////
//
// IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING.
//
// By downloading, copying, installing or using the software you agree to this license.
// If you do not agree to this license, do not download, install,
// copy or use the software.
//
//
// License Agreement
// For Open Source Computer Vision Library
//
// Copyright (C) 2013, OpenCV Foundation, all rights reserved.
// Third party copyrights are property of their respective owners.
//
// Redistribution and use in source and binary forms, with or without modification,
// are permitted provided that the following conditions are met:
//
// * Redistribution's of source code must retain the above copyright notice,
// this list of conditions and the following disclaimer.
//
// * Redistribution's in binary form must reproduce the above copyright notice,
// this list of conditions and the following disclaimer in the documentation
// and/or other materials provided with the distribution.
//
// * The name of the copyright holders may not be used to endorse or promote products
// derived from this software without specific prior written permission.
//
// This software is provided by the copyright holders and contributors "as is" and
// any express or implied warranties, including, but not limited to, the implied
// warranties of merchantability and fitness for a particular purpose are disclaimed.
// In no event shall the OpenCV Foundation or contributors be liable for any direct,
// indirect, incidental, special, exemplary, or consequential damages
// (including, but not limited to, procurement of substitute goods or services;
// loss of use, data, or profits; or business interruption) however caused
// and on any theory of liability, whether in contract, strict liability,
// or tort (including negligence or otherwise) arising in any way out of
// the use of this software, even if advised of the possibility of such damage.
//
//M*/
#include "precomp.hpp"

#ifndef HAVE_OPENCL
#include "ocl_disabled.impl.hpp"
#else // HAVE_OPENCL

#include <list>
#include <map>
#include <deque>
#include <set>
#include <string>
#include <sstream>
#include <fstream>
#if !(defined _MSC_VER) || (defined _MSC_VER && _MSC_VER > 1700)
#include <inttypes.h>
#endif

#include <opencv2/core/utils/configuration.private.hpp>

#include <opencv2/core/utils/logger.defines.hpp>
#undef CV_LOG_STRIP_LEVEL
#define CV_LOG_STRIP_LEVEL CV_LOG_LEVEL_DEBUG + 1
#include <opencv2/core/utils/logger.hpp>

#include "opencv2/core/ocl_genbase.hpp"
#include "opencl_kernels_core.hpp"

#include "opencv2/core/utils/lock.private.hpp"
#include "opencv2/core/utils/filesystem.hpp"
#include "opencv2/core/utils/filesystem.private.hpp"

#define CV__ALLOCATOR_STATS_LOG(...) CV_LOG_VERBOSE(NULL, 0, "OpenCL allocator: " << __VA_ARGS__)
#include "opencv2/core/utils/allocator_stats.impl.hpp"
#undef CV__ALLOCATOR_STATS_LOG

#define CV_OPENCL_ALWAYS_SHOW_BUILD_LOG 0
#define CV_OPENCL_SHOW_BUILD_OPTIONS 0
#define CV_OPENCL_SHOW_BUILD_KERNELS 0

#define CV_OPENCL_SHOW_RUN_KERNELS 0
#define CV_OPENCL_SYNC_RUN_KERNELS 0
#define CV_OPENCL_TRACE_CHECK 0

#define CV_OPENCL_VALIDATE_BINARY_PROGRAMS 1

#define CV_OPENCL_SHOW_SVM_ERROR_LOG 1
#define CV_OPENCL_SHOW_SVM_LOG 0

#include "opencv2/core/bufferpool.hpp"
#ifndef LOG_BUFFER_POOL
# if 0
#   define LOG_BUFFER_POOL printf
# else
#   define LOG_BUFFER_POOL(...)
# endif
#endif

#if CV_OPENCL_SHOW_SVM_LOG
// TODO add timestamp logging
#define CV_OPENCL_SVM_TRACE_P printf("line %d (ocl.cpp): ", __LINE__); printf
#else
#define CV_OPENCL_SVM_TRACE_P(...)
#endif

#if CV_OPENCL_SHOW_SVM_ERROR_LOG
// TODO add timestamp logging
#define CV_OPENCL_SVM_TRACE_ERROR_P printf("Error on line %d (ocl.cpp): ", __LINE__); printf
#else
#define CV_OPENCL_SVM_TRACE_ERROR_P(...)
#endif

#include "opencv2/core/opencl/runtime/opencl_clblas.hpp"
#include "opencv2/core/opencl/runtime/opencl_clfft.hpp"

#include "opencv2/core/opencl/runtime/opencl_core.hpp"

#ifdef HAVE_OPENCL_SVM
#include "opencv2/core/opencl/runtime/opencl_svm_20.hpp"
#include "opencv2/core/opencl/runtime/opencl_svm_hsa_extension.hpp"
#include "opencv2/core/opencl/opencl_svm.hpp"
#endif

#include "umatrix.hpp"

namespace cv { namespace ocl {

#define IMPLEMENT_REFCOUNTABLE() \
    void addref() { CV_XADD(&refcount, 1); } \
    void release() { if( CV_XADD(&refcount, -1) == 1 && !cv::__termination) delete this; } \
    int refcount
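
/* Illustrative sketch, not part of OpenCV: IMPLEMENT_REFCOUNTABLE() gives a class an
intrusive reference count, where addref() atomically increments the counter and release()
deletes the object once the last reference is dropped. The standalone C++11 version below
uses std::atomic instead of CV_XADD. The RefCounted type and its live counter are
hypothetical, the initial count of 1 is an assumption, and the cv::__termination guard of
the real macro is omitted.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical example type illustrating an intrusive, atomic reference count.
struct RefCounted
{
    std::atomic<int> refcount{1};   // assumption: the creator holds the first reference
    static int live;                // number of live objects, for illustration only

    RefCounted() { ++live; }
    ~RefCounted() { --live; }

    void addref() { refcount.fetch_add(1); }

    void release()
    {
        // fetch_sub returns the previous value: 1 means this was the last reference.
        if (refcount.fetch_sub(1) == 1)
            delete this;
    }
};

int RefCounted::live = 0;
```

Because fetch_sub() returns the value before the decrement, only the caller that observes 1
performs the delete, so concurrent release() calls cannot free the object twice. */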

static cv::utils::AllocatorStatistics opencl_allocator_stats;

CV_EXPORTS cv::utils::AllocatorStatisticsInterface& getOpenCLAllocatorStatistics();
cv::utils::AllocatorStatisticsInterface& getOpenCLAllocatorStatistics()
{
    return opencl_allocator_stats;
}

#ifndef _DEBUG
static bool isRaiseError()
{
    static bool initialized = false;
    static bool value = false;
    if (!initialized)
    {
        value = cv::utils::getConfigurationParameterBool("OPENCV_OPENCL_RAISE_ERROR", false);
        initialized = true;
    }
    return value;
}
#endif
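
/* Illustrative sketch, not part of OpenCV: isRaiseError() reads OPENCV_OPENCL_RAISE_ERROR
once and caches the result. The version below does the same with std::getenv and a
function-local static. parseBoolValue(), getEnvBool() and isRaiseErrorSketch() are
hypothetical helpers, and the spellings they accept are an assumption; they do not
reproduce the parsing rules of cv::utils::getConfigurationParameterBool.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical parser: the accepted spellings are an assumption for this sketch.
static bool parseBoolValue(const char* raw, bool defaultValue)
{
    if (raw == nullptr)
        return defaultValue;              // variable not set
    const std::string v(raw);
    if (v == "1" || v == "true" || v == "TRUE" || v == "on" || v == "ON")
        return true;
    if (v == "0" || v == "false" || v == "FALSE" || v == "off" || v == "OFF")
        return false;
    return defaultValue;                  // unrecognized value: keep the default
}

// Hypothetical wrapper around std::getenv.
static bool getEnvBool(const char* name, bool defaultValue)
{
    return parseBoolValue(std::getenv(name), defaultValue);
}

// Reads the environment only on the first call; later calls return the cached value.
static bool isRaiseErrorSketch()
{
    static const bool value = getEnvBool("OPENCV_OPENCL_RAISE_ERROR", false);
    return value;
}
```

A function-local static is initialized exactly once, and since C++11 that initialization
is thread-safe, whereas a separate initialized flag can race if two threads make the first
call at the same time. */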

static void onOpenCLKernelBuildError()
{
    // NB: no need to cache this value
    bool value = cv::utils::getConfigurationParameterBool("OPENCV_OPENCL_ABORT_ON_BUILD_ERROR", false);
    if (value)
    {
        fprintf(stderr, "Abort on OpenCL kernel build failure!\n");
        abort();
    }
}

#if CV_OPENCL_TRACE_CHECK
static inline
void traceOpenCLCheck(cl_int status, const char* message)
{
    std::cout << "OpenCV(OpenCL:" << status << "): " << message << std::endl << std::flush;
}
#define CV_OCL_TRACE_CHECK_RESULT(status, message) traceOpenCLCheck(status, message)
#else
#define CV_OCL_TRACE_CHECK_RESULT(status, message) /* nothing */
#endif

#define CV_OCL_API_ERROR_MSG(check_result, msg) \
    cv::format("OpenCL error %s (%d) during call: %s", getOpenCLErrorString(check_result), check_result, msg)

#define CV_OCL_CHECK_RESULT(check_result, msg) \
    do { \
        CV_OCL_TRACE_CHECK_RESULT(check_result, msg); \
        if (check_result != CL_SUCCESS) \
        { \
            static_assert(std::is_convertible<decltype(msg), const char*>::value, "msg of CV_OCL_CHECK_RESULT must be const char*"); \
            cv::String error_msg = CV_OCL_API_ERROR_MSG(check_result, msg); \
            CV_Error(Error::OpenCLApiCallError, error_msg); \
        } \
    } while (0)

#define CV_OCL_CHECK_(expr, check_result) do { expr; CV_OCL_CHECK_RESULT(check_result, #expr); } while (0)

#define CV_OCL_CHECK(expr) do { cl_int __cl_result = (expr); CV_OCL_CHECK_RESULT(__cl_result, #expr); } while (0)

#ifdef _DEBUG
#define CV_OCL_DBG_CHECK_RESULT(check_result, msg) CV_OCL_CHECK_RESULT(check_result, msg)
#define CV_OCL_DBG_CHECK(expr) CV_OCL_CHECK(expr)
#define CV_OCL_DBG_CHECK_(expr, check_result) CV_OCL_CHECK_(expr, check_result)
#else
#define CV_OCL_DBG_CHECK_RESULT(check_result, msg) \
    do { \
        CV_OCL_TRACE_CHECK_RESULT(check_result, msg); \
        if (check_result != CL_SUCCESS && isRaiseError()) \
        { \
            static_assert(std::is_convertible<decltype(msg), const char*>::value, "msg of CV_OCL_DBG_CHECK_RESULT must be const char*"); \
            cv::String error_msg = CV_OCL_API_ERROR_MSG(check_result, msg); \
            CV_Error(Error::OpenCLApiCallError, error_msg); \
        } \
    } while (0)
#define CV_OCL_DBG_CHECK_(expr, check_result) do { expr; CV_OCL_DBG_CHECK_RESULT(check_result, #expr); } while (0)
#define CV_OCL_DBG_CHECK(expr) do { cl_int __cl_result = (expr); CV_OCL_DBG_CHECK_RESULT(__cl_result, #expr); } while (0)
#endif
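
/* Illustrative sketch, not part of OpenCV: CV_OCL_CHECK(expr) evaluates an OpenCL call once,
stores its status, and on failure raises an error whose message includes the stringified
call (#expr). The minimal analogue below keeps that shape; status_t, STATUS_OK,
CHECK_STATUS and fakeCall are hypothetical names, and std::runtime_error stands in for
CV_Error.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical status type standing in for cl_int, with 0 playing the role of CL_SUCCESS.
typedef int status_t;
static const status_t STATUS_OK = 0;

// Evaluate the call once, then report the failing expression text (#expr) and its status.
#define CHECK_STATUS(expr) \
    do { \
        status_t check_result_ = (expr); \
        if (check_result_ != STATUS_OK) \
            throw std::runtime_error(std::string("error ") + std::to_string(check_result_) + \
                                     " during call: " + #expr); \
    } while (0)

// Hypothetical API call that simply returns the status it is given.
static status_t fakeCall(status_t result) { return result; }
```

The do { ... } while (0) wrapper makes the macro expand to a single statement, so it stays
safe inside an unbraced if/else. The CV_OCL_DBG_* variants differ only in non-debug builds,
where a failure raises an error only if isRaiseError() returns true. */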

static const bool CV_OPENCL_CACHE_ENABLE = utils::getConfigurationParameterBool("OPENCV_OPENCL_CACHE_ENABLE", true);
static const bool CV_OPENCL_CACHE_WRITE = utils::getConfigurationParameterBool("OPENCV_OPENCL_CACHE_WRITE", true);
static const bool CV_OPENCL_CACHE_LOCK_ENABLE = utils::getConfigurationParameterBool("OPENCV_OPENCL_CACHE_LOCK_ENABLE", true);
static const bool CV_OPENCL_CACHE_CLEANUP = utils::getConfigurationParameterBool("OPENCV_OPENCL_CACHE_CLEANUP", true);

#if CV_OPENCL_VALIDATE_BINARY_PROGRAMS
static const bool CV_OPENCL_VALIDATE_BINARY_PROGRAMS_VALUE = utils::getConfigurationParameterBool("OPENCV_OPENCL_VALIDATE_BINARY_PROGRAMS", false);
#endif

// Option to disable calls to clEnqueueReadBufferRect / clEnqueueWriteBufferRect / clEnqueueCopyBufferRect
// (disabled by default on Apple platforms)
static const bool CV_OPENCL_DISABLE_BUFFER_RECT_OPERATIONS = utils::getConfigurationParameterBool("OPENCV_OPENCL_DISABLE_BUFFER_RECT_OPERATIONS",
#ifdef __APPLE__
        true
#else
        false
#endif
);

static String getBuildExtraOptions()
{
    static String param_buildExtraOptions;
    static bool initialized = false;
    if (!initialized)
    {
        param_buildExtraOptions = utils::getConfigurationParameterString("OPENCV_OPENCL_BUILD_EXTRA_OPTIONS", "");
        initialized = true;
        if (!param_buildExtraOptions.empty())
            CV_LOG_WARNING(NULL, "OpenCL: using extra build options: '" << param_buildExtraOptions << "'");
    }
    return param_buildExtraOptions;
}

static const bool CV_OPENCL_ENABLE_MEM_USE_HOST_PTR = utils::getConfigurationParameterBool("OPENCV_OPENCL_ENABLE_MEM_USE_HOST_PTR", true);
static const size_t CV_OPENCL_ALIGNMENT_MEM_USE_HOST_PTR = utils::getConfigurationParameterSizeT("OPENCV_OPENCL_ALIGNMENT_MEM_USE_HOST_PTR", 4);
2013-10-25 20:46:03 +08:00
struct UMat2D
{
    UMat2D(const UMat& m)
    {
        offset = (int)m.offset;
        step = (int)m.step;
        rows = m.rows;
        cols = m.cols;
    }
    int offset;
    int step;
    int rows;
    int cols;
};

struct UMat3D
{
    UMat3D(const UMat& m)
    {
        offset = (int)m.offset;
        step = (int)m.step.p[1];
        slicestep = (int)m.step.p[0];
        slices = (int)m.size.p[0];
        rows = m.size.p[1];
        cols = m.size.p[2];
    }
    int offset;
    int slicestep;
    int step;
    int slices;
    int rows;
    int cols;
};
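/*
Sketch (not part of OpenCV): UMat2D and UMat3D flatten a UMat's geometry into plain ints that can be passed as kernel arguments. `offset` is the byte offset of the first element, `step` is the row stride in bytes, and for 3D data `slicestep` (`step.p[0]`) is the stride between slices. The sketch below uses a hypothetical `Geom3D` struct instead of cv::UMat to compute the byte offset of element (z, y, x) for a given element size. It shows the usual way such a buffer is indexed; the actual OpenCL kernels may compute addresses differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for UMat3D's fields (no OpenCV dependency).
struct Geom3D
{
    int offset;     // byte offset of element (0, 0, 0) in the buffer
    int slicestep;  // bytes between consecutive slices
    int step;       // bytes between consecutive rows within a slice
    int slices, rows, cols;
};

// Byte offset of element (z, y, x); each element occupies elemSize bytes.
inline size_t byteOffset(const Geom3D& g, int z, int y, int x, size_t elemSize)
{
    assert(z >= 0 && z < g.slices && y >= 0 && y < g.rows && x >= 0 && x < g.cols);
    return (size_t)g.offset + (size_t)z * g.slicestep + (size_t)y * g.step + (size_t)x * elemSize;
}
```

For a continuous 2x3x4 float volume, step is 4 * 4 = 16 bytes and slicestep is 3 * 16 = 48 bytes. Element (1, 2, 3) therefore starts at byte 48 + 32 + 12 = 92, which is flat index 23 times 4 bytes.
*/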

// Computes 64-bit "cyclic redundancy check" sum, as specified in ECMA-182
static uint64 crc64(const uchar* data, size_t size, uint64 crc0 = 0)
{
    static uint64 table[256];
    static bool initialized = false;

    if (!initialized)
    {
        for (int i = 0; i < 256; i++)
        {
            uint64 c = i;
            for (int j = 0; j < 8; j++)
                c = ((c & 1) ? CV_BIG_UINT(0xc96c5795d7870f42) : 0) ^ (c >> 1);
            table[i] = c;
        }
        initialized = true;
    }

    uint64 crc = ~crc0;
    for (size_t idx = 0; idx < size; idx++)
        crc = table[(uchar)crc ^ data[idx]] ^ (crc >> 8);

    return ~crc;
}
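/*
Sketch (not part of OpenCV): crc64() is the reflected, table-driven form of the ECMA-182 CRC-64. The constant 0xc96c5795d7870f42 is the bit-reversed ECMA-182 polynomial 0x42f0e1eba9ea3693. Inverting the state on entry and on exit gives the variant commonly catalogued as CRC-64/XZ. Because the returned value is inverted again when it is passed back in as crc0, a checksum can be computed incrementally over several chunks. The standalone copy below uses standard types (std::uint64_t in place of OpenCV's uint64). It also shows the bucket selection used later by BinaryProgramFile::getHash(), which keeps the low bits of the CRC to index a 64-entry table. The helpers `crc64Sketch`, `crc64Of` and `bucketOf` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Reflected CRC-64 (ECMA-182 polynomial, init/xorout all ones), mirroring crc64() above.
std::uint64_t crc64Sketch(const unsigned char* data, std::size_t size, std::uint64_t crc0 = 0)
{
    static std::uint64_t table[256];
    static bool initialized = false;  // not thread-safe, same as the original
    if (!initialized)
    {
        for (int i = 0; i < 256; i++)
        {
            std::uint64_t c = (std::uint64_t)i;
            for (int j = 0; j < 8; j++)
                c = ((c & 1) ? UINT64_C(0xc96c5795d7870f42) : 0) ^ (c >> 1);
            table[i] = c;
        }
        initialized = true;
    }
    std::uint64_t crc = ~crc0;
    for (std::size_t idx = 0; idx < size; idx++)
        crc = table[(unsigned char)crc ^ data[idx]] ^ (crc >> 8);
    return ~crc;
}

std::uint64_t crc64Of(const std::string& s, std::uint64_t crc0 = 0)
{
    return crc64Sketch((const unsigned char*)s.data(), s.size(), crc0);
}

// Bucket index as in BinaryProgramFile::getHash(); maxEntries must be a power of two.
unsigned bucketOf(const std::string& key, unsigned maxEntries = 64)
{
    return (unsigned)(crc64Of(key) & (maxEntries - 1));
}
```

The standard CRC-64/XZ check value for the input "123456789" is 0x995dc9bbdf1939fa. Feeding "1234" and then "56789" through the chained form yields the same result.
*/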

#if OPENCV_HAVE_FILESYSTEM_SUPPORT
struct OpenCLBinaryCacheConfigurator
{
    cv::String cache_path_;
    cv::String cache_lock_filename_;
    cv::Ptr<utils::fs::FileLock> cache_lock_;

    typedef std::map<std::string, std::string> ContextCacheType;
    ContextCacheType prepared_contexts_;
    Mutex mutex_prepared_contexts_;

    OpenCLBinaryCacheConfigurator()
    {
        CV_LOG_DEBUG(NULL, "Initializing OpenCL cache configuration...");
        if (!CV_OPENCL_CACHE_ENABLE)
        {
            CV_LOG_INFO(NULL, "OpenCL cache is disabled");
            return;
        }
        cache_path_ = utils::fs::getCacheDirectory("opencl_cache", "OPENCV_OPENCL_CACHE_DIR");
        if (cache_path_.empty())
        {
            CV_LOG_INFO(NULL, "Specify OPENCV_OPENCL_CACHE_DIR configuration parameter to enable OpenCL cache");
        }
        do
        {
            try
            {
                if (cache_path_.empty())
                    break;
                if (cache_path_ == "disabled")
                    break;
                if (!utils::fs::createDirectories(cache_path_))
                {
                    CV_LOG_DEBUG(NULL, "Can't use OpenCL cache directory: " << cache_path_);
                    clear();
                    break;
                }

                if (CV_OPENCL_CACHE_LOCK_ENABLE)
                {
                    cache_lock_filename_ = cache_path_ + ".lock";
                    if (!utils::fs::exists(cache_lock_filename_))
                    {
                        CV_LOG_DEBUG(NULL, "Creating lock file... (" << cache_lock_filename_ << ")");
                        std::ofstream lock_filename(cache_lock_filename_.c_str(), std::ios::out);
                        if (!lock_filename.is_open())
                        {
                            CV_LOG_WARNING(NULL, "Can't create lock file for OpenCL program cache: " << cache_lock_filename_);
                            break;
                        }
                    }

                    try
                    {
                        cache_lock_ = makePtr<utils::fs::FileLock>(cache_lock_filename_.c_str());
                        CV_LOG_VERBOSE(NULL, 0, "Checking cache lock... (" << cache_lock_filename_ << ")");
                        {
                            utils::shared_lock_guard<utils::fs::FileLock> lock(*cache_lock_);
                        }
                        CV_LOG_VERBOSE(NULL, 0, "Checking cache lock... Done!");
                    }
                    catch (const cv::Exception& e)
                    {
                        CV_LOG_WARNING(NULL, "Can't create OpenCL program cache lock: " << cache_lock_filename_ << std::endl << e.what());
                    }
                    catch (...)
                    {
                        CV_LOG_WARNING(NULL, "Can't create OpenCL program cache lock: " << cache_lock_filename_);
                    }
                }
                else
                {
                    if (CV_OPENCL_CACHE_WRITE)
                    {
                        CV_LOG_WARNING(NULL, "OpenCL cache lock is disabled while cache write is allowed "
                                "(not safe for multiprocess environment)");
                    }
                    else
                    {
                        CV_LOG_INFO(NULL, "OpenCL cache lock is disabled");
                    }
                }
            }
            catch (const cv::Exception& e)
            {
                CV_LOG_WARNING(NULL, "Can't prepare OpenCL program cache: " << cache_path_ << std::endl << e.what());
                clear();
            }
        } while (0);
        if (!cache_path_.empty())
        {
            if (cache_lock_.empty() && CV_OPENCL_CACHE_LOCK_ENABLE)
            {
                CV_LOG_WARNING(NULL, "Initialized OpenCL cache directory, but interprocess synchronization lock is not available. "
                        "Consider disabling the OpenCL cache: OPENCV_OPENCL_CACHE_DIR=disabled");
            }
            else
            {
                CV_LOG_INFO(NULL, "Successfully initialized OpenCL cache directory: " << cache_path_);
            }
        }
    }

    void clear()
    {
        cache_path_.clear();
        cache_lock_filename_.clear();
        cache_lock_.release();
    }

    std::string prepareCacheDirectoryForContext(const std::string& ctx_prefix,
            const std::string& cleanup_prefix)
    {
        if (cache_path_.empty())
            return std::string();

        AutoLock lock(mutex_prepared_contexts_);

        ContextCacheType::iterator found_it = prepared_contexts_.find(ctx_prefix);
        if (found_it != prepared_contexts_.end())
            return found_it->second;

        CV_LOG_INFO(NULL, "Preparing OpenCL cache configuration for context: " << ctx_prefix);

        std::string target_directory = cache_path_ + ctx_prefix + "/";
        bool result = utils::fs::isDirectory(target_directory);
        if (!result)
        {
            try
            {
                CV_LOG_VERBOSE(NULL, 0, "Creating directory: " << target_directory);
                if (utils::fs::createDirectories(target_directory))
                {
                    result = true;
                }
                else
                {
                    CV_LOG_WARNING(NULL, "Can't create directory: " << target_directory);
                }
            }
            catch (const cv::Exception& e)
            {
                CV_LOG_ERROR(NULL, "Can't create OpenCL program cache directory for context: " << target_directory << std::endl << e.what());
            }
        }
        target_directory = result ? target_directory : std::string();
        prepared_contexts_.insert(std::pair<std::string, std::string>(ctx_prefix, target_directory));

        if (result && CV_OPENCL_CACHE_CLEANUP && CV_OPENCL_CACHE_WRITE && !cleanup_prefix.empty())
        {
            try
            {
                std::vector<String> entries;
                utils::fs::glob_relative(cache_path_, cleanup_prefix + "*", entries, false, true);
                std::vector<String> remove_entries;
                for (size_t i = 0; i < entries.size(); i++)
                {
                    const String& name = entries[i];
                    if (0 == name.find(cleanup_prefix))
                    {
                        if (0 == name.find(ctx_prefix))
                            continue; // skip current
                        remove_entries.push_back(name);
                    }
                }
                if (!remove_entries.empty())
                {
                    CV_LOG_WARNING(NULL, (remove_entries.size() == 1
                            ? "Detected OpenCL cache directory for other version of OpenCL device."
                            : "Detected OpenCL cache directories for other versions of OpenCL device.")
                            << " We assume that these directories are obsolete after OpenCL runtime/drivers upgrade.");
                    CV_LOG_WARNING(NULL, "Trying to remove these directories...");
                    for (size_t i = 0; i < remove_entries.size(); i++)
                    {
                        CV_LOG_WARNING(NULL, "    - " << remove_entries[i]);
                    }
                    CV_LOG_WARNING(NULL, "Note: You can disable this behavior via this option: OPENCV_OPENCL_CACHE_CLEANUP=0");

                    for (size_t i = 0; i < remove_entries.size(); i++)
                    {
                        const String& name = remove_entries[i];
                        cv::String path = utils::fs::join(cache_path_, name);
                        try
                        {
                            utils::fs::remove_all(path);
                            CV_LOG_WARNING(NULL, "Removed: " << path);
                        }
                        catch (const cv::Exception& e)
                        {
                            CV_LOG_ERROR(NULL, "Exception during removal of obsolete OpenCL cache directory: " << path << std::endl << e.what());
                        }
                    }
                }
            }
            catch (...)
            {
                CV_LOG_WARNING(NULL, "Can't check for obsolete OpenCL cache directories");
            }
        }

        CV_LOG_VERBOSE(NULL, 1, "  Result: " << (target_directory.empty() ? std::string("Failed") : target_directory));
        return target_directory;
    }

    static OpenCLBinaryCacheConfigurator& getSingletonInstance()
    {
        CV_SINGLETON_LAZY_INIT_REF(OpenCLBinaryCacheConfigurator, new OpenCLBinaryCacheConfigurator());
    }
};
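/*
Sketch (not part of OpenCV): after it prepares a context-specific cache directory, prepareCacheDirectoryForContext() lists the entries of cache_path_ that start with cleanup_prefix. It keeps the entry that starts with the current ctx_prefix and removes the others, treating them as leftovers from an older driver or runtime version. The selection step is shown below on its own, using plain std::string vectors. `selectObsoleteEntries` is a hypothetical name, and the directory names in the test are invented. The real prefixes are built elsewhere in this file.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the entries that the cleanup pass would delete:
// those that match cleanupPrefix but do not belong to the current context.
std::vector<std::string> selectObsoleteEntries(const std::vector<std::string>& entries,
                                               const std::string& cleanupPrefix,
                                               const std::string& ctxPrefix)
{
    std::vector<std::string> toRemove;
    for (const std::string& name : entries)
    {
        if (name.compare(0, cleanupPrefix.size(), cleanupPrefix) != 0)
            continue;  // unrelated entry, never touched
        if (name.compare(0, ctxPrefix.size(), ctxPrefix) == 0)
            continue;  // current context: keep it
        toRemove.push_back(name);
    }
    return toRemove;
}
```

`compare(0, n, prefix) == 0` is a prefix test, equivalent to the `0 == name.find(prefix)` checks above. Unlike `find`, it does not scan the rest of the string when the prefix does not match at position 0.
*/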

class BinaryProgramFile
{
    enum { MAX_ENTRIES = 64 };

    typedef unsigned int uint32_t;

    struct CV_DECL_ALIGNED(4) FileHeader
    {
        uint32_t sourceSignatureSize;
        //char sourceSignature[];
    };

    struct CV_DECL_ALIGNED(4) FileTable
    {
        uint32_t numberOfEntries;
        //uint32_t firstEntryOffset[];
    };

    struct CV_DECL_ALIGNED(4) FileEntry
    {
        uint32_t nextEntryFileOffset; // 0 for the last entry in chain
        uint32_t keySize;
        uint32_t dataSize;
        //char key[];
        //char data[];
    };

    const std::string fileName_;
    const char* const sourceSignature_;
    const size_t sourceSignatureSize_;

    std::fstream f;

    uint32_t entryOffsets[MAX_ENTRIES];

    uint32_t getHash(const std::string& options)
    {
        uint64 hash = crc64((const uchar*)options.c_str(), options.size(), 0);
        return hash & (MAX_ENTRIES - 1);
    }

    inline size_t getFileSize()
    {
        size_t pos = (size_t)f.tellg();
        f.seekg(0, std::fstream::end);
        size_t fileSize = (size_t)f.tellg();
        f.seekg(pos, std::fstream::beg);
        return fileSize;
    }
    inline uint32_t readUInt32()
    {
        uint32_t res = 0;
        f.read((char*)&res, sizeof(uint32_t));
        CV_Assert(!f.fail());
        return res;
    }
    inline void writeUInt32(const uint32_t value)
    {
        uint32_t v = value;
        f.write((char*)&v, sizeof(uint32_t));
        CV_Assert(!f.fail());
    }

    inline void seekReadAbsolute(size_t pos)
    {
        f.seekg(pos, std::fstream::beg);
        CV_Assert(!f.fail());
    }
    inline void seekReadRelative(size_t pos)
    {
        f.seekg(pos, std::fstream::cur);
        CV_Assert(!f.fail());
    }

    inline void seekWriteAbsolute(size_t pos)
    {
        f.seekp(pos, std::fstream::beg);
        CV_Assert(!f.fail());
    }

    void clearFile()
    {
        f.close();
        if (0 != remove(fileName_.c_str()))
            CV_LOG_ERROR(NULL, "Can't remove: " << fileName_);
        return;
    }

public:
    BinaryProgramFile(const std::string& fileName, const char* sourceSignature)
        : fileName_(fileName), sourceSignature_(sourceSignature), sourceSignatureSize_(sourceSignature_ ? strlen(sourceSignature_) : 0)
    {
        CV_StaticAssert(sizeof(uint32_t) == 4, "");
        CV_Assert(sourceSignature_ != NULL);
        CV_Assert(sourceSignatureSize_ > 0);
        memset(entryOffsets, 0, sizeof(entryOffsets));

        f.rdbuf()->pubsetbuf(0, 0); // disable buffering
        f.open(fileName_.c_str(), std::ios::in | std::ios::out | std::ios::binary);
        if (f.is_open() && getFileSize() > 0)
        {
            bool isValid = false;
            try
            {
                uint32_t fileSourceSignatureSize = readUInt32();
                if (fileSourceSignatureSize == sourceSignatureSize_)
                {
                    cv::AutoBuffer<char> fileSourceSignature(fileSourceSignatureSize + 1);
                    f.read(fileSourceSignature.data(), fileSourceSignatureSize);
                    if (f.eof())
                    {
                        CV_LOG_ERROR(NULL, "Unexpected EOF");
                    }
                    else if (memcmp(sourceSignature, fileSourceSignature.data(), fileSourceSignatureSize) == 0)
                    {
                        isValid = true;
                    }
                }
                if (!isValid)
                {
                    CV_LOG_ERROR(NULL, "Source code signature/hash mismatch (program source code has been changed/updated)");
                }
            }
            catch (const cv::Exception& e)
            {
                CV_LOG_ERROR(NULL, "Can't open binary program file: " << fileName << ": " << e.what());
            }
            catch (...)
            {
                CV_LOG_ERROR(NULL, "Can't open binary program file: " << fileName << ": Unknown error");
            }
            if (!isValid)
            {
                clearFile();
            }
            else
            {
                seekReadAbsolute(0);
            }
        }
    }

    bool read(const std::string& key, std::vector<char>& buf)
    {
        if (!f.is_open())
            return false;

        size_t fileSize = getFileSize();
        if (fileSize == 0)
        {
            CV_LOG_ERROR(NULL, "Invalid file (empty): " << fileName_);
            clearFile();
            return false;
        }
        seekReadAbsolute(0);

        // bypass FileHeader
        uint32_t fileSourceSignatureSize = readUInt32();
        CV_Assert(fileSourceSignatureSize > 0);
        seekReadRelative(fileSourceSignatureSize);

        uint32_t numberOfEntries = readUInt32();
        CV_Assert(numberOfEntries > 0);
        if (numberOfEntries != MAX_ENTRIES)
        {
            CV_LOG_ERROR(NULL, "Invalid file: " << fileName_);
            clearFile();
            return false;
        }
        f.read((char*)&entryOffsets[0], sizeof(entryOffsets));
        CV_Assert(!f.fail());

        uint32_t entryNum = getHash(key);

        uint32_t entryOffset = entryOffsets[entryNum];
        FileEntry entry;
        while (entryOffset > 0)
        {
            seekReadAbsolute(entryOffset);
            //CV_StaticAssert(sizeof(entry) == sizeof(uint32_t) * 3, "");
            f.read((char*)&entry, sizeof(entry));
            CV_Assert(!f.fail());
            cv::AutoBuffer<char> fileKey(entry.keySize + 1);
            if (key.size() == entry.keySize)
            {
                if (entry.keySize > 0)
                {
                    f.read(fileKey.data(), entry.keySize);
                    CV_Assert(!f.fail());
                }
                if (memcmp(fileKey.data(), key.c_str(), entry.keySize) == 0)
                {
                    buf.resize(entry.dataSize);
                    f.read(&buf[0], entry.dataSize);
                    CV_Assert(!f.fail());
                    seekReadAbsolute(0);
                    CV_LOG_VERBOSE(NULL, 0, "Read...");
                    return true;
                }
            }
            if (entry.nextEntryFileOffset == 0)
                break;
            entryOffset = entry.nextEntryFileOffset;
        }
        return false;
    }

    bool write(const std::string& key, std::vector<char>& buf)
    {
        if (!f.is_open())
        {
            f.open(fileName_.c_str(), std::ios::in | std::ios::out | std::ios::binary);
            if (!f.is_open())
            {
                f.open(fileName_.c_str(), std::ios::out | std::ios::binary);
                if (!f.is_open())
                {
                    CV_LOG_ERROR(NULL, "Can't create file: " << fileName_);
                    return false;
                }
            }
        }

        size_t fileSize = getFileSize();
        if (fileSize == 0)
        {
            // Write header
            seekWriteAbsolute(0);
            writeUInt32((uint32_t)sourceSignatureSize_);
            f.write(sourceSignature_, sourceSignatureSize_);
            CV_Assert(!f.fail());

            writeUInt32(MAX_ENTRIES);
            memset(entryOffsets, 0, sizeof(entryOffsets));
            f.write((char*)entryOffsets, sizeof(entryOffsets));
            CV_Assert(!f.fail());
            f.flush();
            CV_Assert(!f.fail());
            f.close();
            f.open(fileName_.c_str(), std::ios::in | std::ios::out | std::ios::binary);
            CV_Assert(f.is_open());
            fileSize = getFileSize();
        }
        seekReadAbsolute(0);

        // bypass FileHeader
        uint32_t fileSourceSignatureSize = readUInt32();
        CV_Assert(fileSourceSignatureSize == sourceSignatureSize_);
        seekReadRelative(fileSourceSignatureSize);

        uint32_t numberOfEntries = readUInt32();
        CV_Assert(numberOfEntries > 0);
        if (numberOfEntries != MAX_ENTRIES)
        {
            CV_LOG_ERROR(NULL, "Invalid file: " << fileName_);
            clearFile();
            return false;
        }
        size_t tableEntriesOffset = (size_t)f.tellg();
        f.read((char*)&entryOffsets[0], sizeof(entryOffsets));
        CV_Assert(!f.fail());

        uint32_t entryNum = getHash(key);

        uint32_t entryOffset = entryOffsets[entryNum];
        FileEntry entry;
        while (entryOffset > 0)
        {
            seekReadAbsolute(entryOffset);
            //CV_StaticAssert(sizeof(entry) == sizeof(uint32_t) * 3, "");
            f.read((char*)&entry, sizeof(entry));
            CV_Assert(!f.fail());
            cv::AutoBuffer<char> fileKey(entry.keySize + 1);
            if (key.size() == entry.keySize)
            {
                if (entry.keySize > 0)
                {
                    f.read(fileKey.data(), entry.keySize);
                    CV_Assert(!f.fail());
                }
                if (0 == memcmp(fileKey.data(), key.c_str(), entry.keySize))
                {
                    // duplicate
                    CV_LOG_VERBOSE(NULL, 0, "Duplicate key ignored: " << fileName_);
                    return false;
                }
            }
            if (entry.nextEntryFileOffset == 0)
                break;
            entryOffset = entry.nextEntryFileOffset;
        }
        seekReadAbsolute(0);
        if (entryOffset > 0)
        {
            seekWriteAbsolute(entryOffset);
            entry.nextEntryFileOffset = (uint32_t)fileSize;
            f.write((char*)&entry, sizeof(entry));
            CV_Assert(!f.fail());
        }
        else
        {
            entryOffsets[entryNum] = (uint32_t)fileSize;
            seekWriteAbsolute(tableEntriesOffset);
            f.write((char*)entryOffsets, sizeof(entryOffsets));
            CV_Assert(!f.fail());
        }
        seekWriteAbsolute(fileSize);
        entry.nextEntryFileOffset = 0;
        entry.dataSize = (uint32_t)buf.size();
        entry.keySize = (uint32_t)key.size();
        f.write((char*)&entry, sizeof(entry));
        CV_Assert(!f.fail());
        f.write(key.c_str(), entry.keySize);
        CV_Assert(!f.fail());
        f.write(&buf[0], entry.dataSize);
        CV_Assert(!f.fail());
        f.flush();
        CV_Assert(!f.fail());
        CV_LOG_VERBOSE(NULL, 0, "Write... (" << buf.size() << " bytes)");
        return true;
    }
};
#endif // OPENCV_HAVE_FILESYSTEM_SUPPORT
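/*
Sketch (not part of OpenCV): BinaryProgramFile keeps every binary built from one program source in a single file, laid out as follows:

- a header: a uint32 signature length followed by the signature bytes;
- a table: a uint32 entry count (always MAX_ENTRIES = 64) followed by 64 uint32 offsets;
- appended entries: each is a FileEntry record (next offset, key size, data size) followed by the key bytes and the data bytes.

The table slot selected by the key's hash points to the first entry of a chain. A new key whose slot is already occupied is linked from the last entry of that chain, and a duplicate key is rejected.

The sketch below reproduces that layout in a std::vector<char> instead of a std::fstream, so it can be exercised without touching the filesystem. `MemProgramStore` is a hypothetical name. The bucket function is a placeholder (std::hash) rather than crc64(). Values are stored in host byte order, as in the original.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>
#include <vector>

// Hypothetical in-memory model of the BinaryProgramFile layout (host byte order).
// Entry = { nextEntryFileOffset, keySize, dataSize }, then key bytes, then data bytes.
class MemProgramStore
{
    enum { MAX_ENTRIES = 64, ENTRY_HEADER = 12 };
    std::vector<char> bytes_;
    std::size_t tableOffset_ = 0;  // position of the first of the 64 slot offsets

    std::uint32_t u32At(std::size_t pos) const
    {
        std::uint32_t v = 0;
        std::memcpy(&v, bytes_.data() + pos, sizeof(v));
        return v;
    }
    void setU32At(std::size_t pos, std::uint32_t v) { std::memcpy(bytes_.data() + pos, &v, sizeof(v)); }
    void appendU32(std::uint32_t v)
    {
        const char* raw = reinterpret_cast<const char*>(&v);
        bytes_.insert(bytes_.end(), raw, raw + sizeof(v));
    }
    bool keyMatches(std::uint32_t off, const std::string& key) const
    {
        std::uint32_t keySize = u32At(off + 4);
        return keySize == key.size() &&
               std::memcmp(bytes_.data() + off + ENTRY_HEADER, key.data(), keySize) == 0;
    }
    std::size_t slotPos(const std::string& key) const
    {
        // Placeholder bucket function; BinaryProgramFile uses crc64() instead.
        std::size_t bucket = std::hash<std::string>()(key) & (MAX_ENTRIES - 1);
        return tableOffset_ + bucket * sizeof(std::uint32_t);
    }

public:
    explicit MemProgramStore(const std::string& signature)
    {
        appendU32((std::uint32_t)signature.size());            // header: signature length
        bytes_.insert(bytes_.end(), signature.begin(), signature.end());
        appendU32(MAX_ENTRIES);                                 // table: entry count
        tableOffset_ = bytes_.size();
        for (int i = 0; i < MAX_ENTRIES; i++)
            appendU32(0);                                       // 0 marks an empty slot
    }

    // Appends an entry and links it into its chain; returns false for a duplicate key.
    bool write(const std::string& key, const std::vector<char>& data)
    {
        std::size_t linkPos = slotPos(key);  // where the new entry's offset gets stored
        for (std::uint32_t off = u32At(linkPos); off != 0; off = u32At(off))
        {
            if (keyMatches(off, key))
                return false;
            linkPos = off;  // nextEntryFileOffset is the first field of an entry
        }
        std::uint32_t newOffset = (std::uint32_t)bytes_.size();
        appendU32(0);                                           // nextEntryFileOffset
        appendU32((std::uint32_t)key.size());                   // keySize
        appendU32((std::uint32_t)data.size());                  // dataSize
        bytes_.insert(bytes_.end(), key.begin(), key.end());
        bytes_.insert(bytes_.end(), data.begin(), data.end());
        setU32At(linkPos, newOffset);
        return true;
    }

    bool read(const std::string& key, std::vector<char>& out) const
    {
        for (std::uint32_t off = u32At(slotPos(key)); off != 0; off = u32At(off))
        {
            if (keyMatches(off, key))
            {
                std::uint32_t keySize = u32At(off + 4), dataSize = u32At(off + 8);
                const char* begin = bytes_.data() + off + ENTRY_HEADER + keySize;
                out.assign(begin, begin + dataSize);
                return true;
            }
        }
        return false;
    }
};
```

Offset 0 can serve as the "empty" sentinel because the header always occupies the start of the buffer, so no entry can begin there. The real class relies on the same property.
*/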

struct OpenCLExecutionContext::Impl
{
    ocl::Context context_;
    int device_;  // device index in context
    ocl::Queue queue_;
    int useOpenCL_;

protected:
    Impl() = delete;

    void _init_device(cl_device_id deviceID)
    {
        CV_Assert(deviceID);
        int ndevices = (int)context_.ndevices();
        CV_Assert(ndevices > 0);
        bool found = false;
        for (int i = 0; i < ndevices; i++)
        {
            ocl::Device d = context_.device(i);
            cl_device_id dhandle = (cl_device_id)d.ptr();
            if (dhandle == deviceID)
            {
                device_ = i;
                found = true;
                break;
            }
        }
        CV_Assert(found && "OpenCL device can't work with passed OpenCL context");
    }

    void _init_device(const ocl::Device& device)
    {
        CV_Assert(device.ptr());
        int ndevices = (int)context_.ndevices();
        CV_Assert(ndevices > 0);
        bool found = false;
        for (int i = 0; i < ndevices; i++)
        {
            ocl::Device d = context_.device(i);
            if (d.getImpl() == device.getImpl())
            {
                device_ = i;
                found = true;
                break;
            }
        }
        CV_Assert(found && "OpenCL device can't work with passed OpenCL context");
    }

public:
    Impl(cl_platform_id platformID, cl_context context, cl_device_id deviceID)
        : device_(0), useOpenCL_(-1)
    {
        CV_UNUSED(platformID);
        CV_Assert(context);
        CV_Assert(deviceID);

        context_ = Context::fromHandle(context);
        _init_device(deviceID);
        queue_ = Queue(context_, context_.device(device_));
    }

    Impl(const ocl::Context& context, const ocl::Device& device, const ocl::Queue& queue)
        : device_(0), useOpenCL_(-1)
    {
        CV_Assert(context.ptr());
        CV_Assert(device.ptr());

        context_ = context;
        _init_device(device);
        queue_ = queue;
    }

    Impl(const ocl::Context& context, const ocl::Device& device)
        : device_(0), useOpenCL_(-1)
    {
        CV_Assert(context.ptr());
        CV_Assert(device.ptr());

        context_ = context;
        _init_device(device);
        queue_ = Queue(context_, context_.device(device_));
    }

    Impl(const ocl::Context& context, const int device, const ocl::Queue& queue)
        : context_(context)
        , device_(device)
        , queue_(queue)
        , useOpenCL_(-1)
    {
        // nothing
    }
    Impl(const Impl& other)
        : context_(other.context_)
        , device_(other.device_)
        , queue_(other.queue_)
        , useOpenCL_(-1)
    {
        // nothing
    }

    inline bool useOpenCL() const { return const_cast<Impl*>(this)->useOpenCL(); }
    bool useOpenCL()
    {
        if (useOpenCL_ < 0)
        {
            try
            {
                useOpenCL_ = 0;
                if (!context_.empty() && context_.ndevices() > 0)
                {
                    const Device& d = context_.device(device_);
                    useOpenCL_ = d.available();
                }
            }
            catch (const cv::Exception&)
            {
                // nothing
            }
            if (!useOpenCL_)
                CV_LOG_INFO(NULL, "OpenCL: can't use OpenCL execution context");
        }
        return useOpenCL_ > 0;
    }
    void setUseOpenCL(bool flag)
    {
        if (!flag)
            useOpenCL_ = 0;
        else
            useOpenCL_ = -1;
    }

    static const std::shared_ptr<Impl>& getInitializedExecutionContext()
    {
        CV_TRACE_FUNCTION();

        CV_LOG_INFO(NULL, "OpenCL: initializing thread execution context");

        static bool initialized = false;
        static std::shared_ptr<Impl> g_primaryExecutionContext;

        if (!initialized)
        {
            cv::AutoLock lock(getInitializationMutex());
            if (!initialized)
            {
                CV_LOG_INFO(NULL, "OpenCL: creating new execution context...");
                try
                {
                    Context c = ocl::Context::create(std::string());
                    if (c.ndevices())
                    {
                        int deviceId = 0;
                        auto& d = c.device(deviceId);
                        if (d.available())
                        {
                            auto q = ocl::Queue(c, d);
                            if (!q.ptr())
                            {
                                CV_LOG_ERROR(NULL, "OpenCL: Can't create default OpenCL queue");
                            }
                            else
                            {
                                g_primaryExecutionContext = std::make_shared<Impl>(c, deviceId, q);
                                CV_LOG_INFO(NULL, "OpenCL: device=" << d.name());
                            }
                        }
                        else
                        {
                            CV_LOG_ERROR(NULL, "OpenCL: OpenCL device is not available (CL_DEVICE_AVAILABLE returns false)");
                        }
                    }
                    else
                    {
                        CV_LOG_INFO(NULL, "OpenCL: context is not available/disabled");
                    }
                }
                catch (const std::exception& e)
                {
                    CV_LOG_INFO(NULL, "OpenCL: Can't initialize OpenCL context/device/queue: " << e.what());
                }
                catch (...)
                {
                    CV_LOG_WARNING(NULL, "OpenCL: Can't initialize OpenCL context/device/queue: unknown C++ exception");
                }
                initialized = true;
            }
        }
        return g_primaryExecutionContext;
    }
};
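/*
Sketch (not part of OpenCV): Impl::useOpenCL_ is a tri-state cache. A value of -1 means "not decided yet", 0 means disabled, and a positive value means the device reported itself available. setUseOpenCL(true) does not force the value to 1. Instead, it resets the value to -1 so that the next query checks the device again. The sketch below isolates that pattern. `TriStateFlag` is hypothetical, and its probe callback stands in for Device::available().

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical model of the useOpenCL_ caching in OpenCLExecutionContext::Impl.
class TriStateFlag
{
    int state_ = -1;                // -1: unknown, 0: off, >0: on
    std::function<bool()> probe_;   // stands in for Device::available()
public:
    explicit TriStateFlag(std::function<bool()> probe) : probe_(std::move(probe)) {}

    bool get()
    {
        if (state_ < 0)
            state_ = probe_() ? 1 : 0;  // evaluated lazily, at most once per reset
        return state_ > 0;
    }

    void set(bool flag)
    {
        state_ = flag ? -1 : 0;         // "enable" means "re-check on the next get()"
    }
};
```

With this design, disabling is immediate and cheap, while re-enabling cannot turn OpenCL on for a device that is not available.
*/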

Context& OpenCLExecutionContext::getContext() const
{
    CV_Assert(p);
    return p->context_;
}
Device& OpenCLExecutionContext::getDevice() const
{
    CV_Assert(p);
    return p->context_.device(p->device_);
}
Queue& OpenCLExecutionContext::getQueue() const
{
    CV_Assert(p);
    return p->queue_;
}

bool OpenCLExecutionContext::useOpenCL() const
{
    if (p)
        return p->useOpenCL();
    return false;
}
void OpenCLExecutionContext::setUseOpenCL(bool flag)
{
    CV_Assert(p);
    p->setUseOpenCL(flag);
}

/* static */
OpenCLExecutionContext& OpenCLExecutionContext::getCurrent()
{
    CV_TRACE_FUNCTION();
    CoreTLSData& data = getCoreTlsData();
    OpenCLExecutionContext& c = data.oclExecutionContext;
    if (!data.oclExecutionContextInitialized)
    {
        data.oclExecutionContextInitialized = true;
        if (c.empty() && haveOpenCL())
            c.p = Impl::getInitializedExecutionContext();
    }
    return c;
}

/* static */
OpenCLExecutionContext& OpenCLExecutionContext::getCurrentRef()
{
    CV_TRACE_FUNCTION();
    CoreTLSData& data = getCoreTlsData();
    OpenCLExecutionContext& c = data.oclExecutionContext;
    return c;
}

void OpenCLExecutionContext::bind() const
{
    CV_TRACE_FUNCTION();
    CV_Assert(p);
    CoreTLSData& data = getCoreTlsData();
    data.oclExecutionContext = *this;
    data.oclExecutionContextInitialized = true;
    data.useOpenCL = p->useOpenCL_;  // propagate "-1" to avoid calling useOpenCL()
}

OpenCLExecutionContext OpenCLExecutionContext::cloneWithNewQueue() const
{
    CV_TRACE_FUNCTION();
    CV_Assert(p);
    const Queue q(getContext(), getDevice());
    return cloneWithNewQueue(q);
}

OpenCLExecutionContext OpenCLExecutionContext::cloneWithNewQueue(const ocl::Queue& q) const
{
    CV_TRACE_FUNCTION();
    CV_Assert(p);
    CV_Assert(q.ptr() != NULL);
    OpenCLExecutionContext c;
    c.p = std::make_shared<Impl>(p->context_, p->device_, q);
    return c;
}

/* static */
OpenCLExecutionContext OpenCLExecutionContext::create(const Context& context, const Device& device, const ocl::Queue& queue)
{
    CV_TRACE_FUNCTION();

    if (!haveOpenCL())
        CV_Error(cv::Error::OpenCLApiCallError, "OpenCL runtime is not available!");

    CV_Assert(!context.empty());
    CV_Assert(context.ptr());
    CV_Assert(!device.empty());
    CV_Assert(device.ptr());
    OpenCLExecutionContext ctx;
    ctx.p = std::make_shared<OpenCLExecutionContext::Impl>(context, device, queue);
    return ctx;
}

/* static */
OpenCLExecutionContext OpenCLExecutionContext::create(const Context& context, const Device& device)
{
    CV_TRACE_FUNCTION();

    if (!haveOpenCL())
        CV_Error(cv::Error::OpenCLApiCallError, "OpenCL runtime is not available!");

    CV_Assert(!context.empty());
    CV_Assert(context.ptr());
    CV_Assert(!device.empty());
    CV_Assert(device.ptr());
    OpenCLExecutionContext ctx;
    ctx.p = std::make_shared<OpenCLExecutionContext::Impl>(context, device);
    return ctx;
}

void OpenCLExecutionContext::release()
{
    CV_TRACE_FUNCTION();
    p.reset();
}
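/*
Sketch (not part of OpenCV): getCurrent() keeps one OpenCLExecutionContext per thread in CoreTLSData. The first call on a thread fills an empty slot with the shared primary context, and bind() overwrites that slot. As a result, each thread can use its own queue while all threads share one lazily created default context. The simplified model below uses thread_local and std::shared_ptr. `ExecCtx`, `primaryContext()`, `current()`, `bindContext()` and the `queueId` field are hypothetical. They mirror only the control flow, not the OpenCL objects.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for OpenCLExecutionContext::Impl.
struct ExecCtx
{
    int queueId;
};

// Shared default context, created once (C++11 guarantees thread-safe static init).
std::shared_ptr<ExecCtx> primaryContext()
{
    static std::shared_ptr<ExecCtx> primary = std::make_shared<ExecCtx>(ExecCtx{0});
    return primary;
}

struct ThreadSlot
{
    std::shared_ptr<ExecCtx> ctx;
    bool initialized = false;
};

ThreadSlot& threadSlot()
{
    thread_local ThreadSlot slot;  // plays the role of CoreTLSData::oclExecutionContext
    return slot;
}

// Mirrors getCurrent(): falls back to the primary context once per thread.
std::shared_ptr<ExecCtx> current()
{
    ThreadSlot& s = threadSlot();
    if (!s.initialized)
    {
        s.initialized = true;
        if (!s.ctx)
            s.ctx = primaryContext();
    }
    return s.ctx;
}

// Mirrors bind(): replaces this thread's context only.
void bindContext(const std::shared_ptr<ExecCtx>& c)
{
    ThreadSlot& s = threadSlot();
    s.ctx = c;
    s.initialized = true;
}
```

Binding on one thread leaves the primary context and the slots of all other threads untouched, which is why cloneWithNewQueue() followed by bind() is a natural way to give a worker thread its own queue.
*/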

// g_isOpenCLInitialized: the OpenCL runtime has already been probed (see haveOpenCL())
// g_isOpenCLAvailable: the probe found at least one OpenCL platform
static bool g_isOpenCLInitialized = false;
static bool g_isOpenCLAvailable = false;

bool haveOpenCL()
{
    CV_TRACE_FUNCTION();

    if (!g_isOpenCLInitialized)
    {
        CV_TRACE_REGION("Init_OpenCL_Runtime");
        const char* envPath = getenv("OPENCV_OPENCL_RUNTIME");
        if (envPath)
        {
            if (cv::String(envPath) == "disabled")
            {
                g_isOpenCLAvailable = false;
                g_isOpenCLInitialized = true;
                return false;
            }
        }

        cv::AutoLock lock(getInitializationMutex());
        CV_LOG_INFO(NULL, "Initialize OpenCL runtime...");
        try
        {
            cl_uint n = 0;
            g_isOpenCLAvailable = ::clGetPlatformIDs(0, NULL, &n) == CL_SUCCESS;
            g_isOpenCLAvailable &= n > 0;
            CV_LOG_INFO(NULL, "OpenCL: found " << n << " platforms");
        }
        catch (...)
        {
            g_isOpenCLAvailable = false;
        }
        g_isOpenCLInitialized = true;
    }
    return g_isOpenCLAvailable;
}

bool useOpenCL()
{
    CoreTLSData& data = getCoreTlsData();
    if (data.useOpenCL < 0)
    {
        try
        {
            data.useOpenCL = 0;
            if (haveOpenCL())
            {
                auto c = OpenCLExecutionContext::getCurrent();
                data.useOpenCL = c.useOpenCL();
            }
        }
        catch (...)
        {
            CV_LOG_INFO(NULL, "OpenCL: can't initialize thread OpenCL execution context");
        }
    }
    return data.useOpenCL > 0;
}

bool isOpenCLActivated()
{
    if (!g_isOpenCLAvailable)
        return false; // prevent unnecessary OpenCL activation via useOpenCL()->haveOpenCL() calls
    return useOpenCL();
}

void setUseOpenCL(bool flag)
{
    CV_TRACE_FUNCTION();

    CoreTLSData& data = getCoreTlsData();
    auto& c = OpenCLExecutionContext::getCurrentRef();
    if (!c.empty())
    {
        c.setUseOpenCL(flag);
        data.useOpenCL = c.useOpenCL();
    }
    else
    {
        if (!flag)
            data.useOpenCL = 0;
        else
            data.useOpenCL = -1; // enabled by default (if context is not initialized)
    }
}

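/*
Sketch (not part of OpenCV): haveOpenCL() probes the runtime only once. If OPENCV_OPENCL_RUNTIME is set to "disabled", the probe is skipped. Otherwise, clGetPlatformIDs() is asked for the platform count, and the result is cached in g_isOpenCLAvailable. The cached flags are plain bools that are read before getInitializationMutex() is taken.

The alternative below (not what this file does) keeps the same "probe once, cache the answer" contract but performs every check under a std::mutex. This avoids the unsynchronized reads, at the cost of locking on each call. `RuntimeProbe` is hypothetical, and `probePlatformCount` is a callback standing in for clGetPlatformIDs(). The environment value is passed in explicitly so that the logic can be tested without modifying the real environment.

```cpp
#include <cassert>
#include <cstring>
#include <functional>
#include <mutex>

// Hypothetical one-time runtime probe modeled on haveOpenCL().
class RuntimeProbe
{
    std::mutex mutex_;
    bool initialized_ = false;
    bool available_ = false;
public:
    // envValue: contents of OPENCV_OPENCL_RUNTIME, or nullptr if it is unset.
    // probePlatformCount: returns the number of platforms; may throw on failure.
    bool have(const char* envValue, const std::function<int()>& probePlatformCount)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!initialized_)
        {
            initialized_ = true;
            if (envValue && std::strcmp(envValue, "disabled") == 0)
            {
                available_ = false;  // explicitly disabled: no probe at all
            }
            else
            {
                try
                {
                    available_ = probePlatformCount() > 0;
                }
                catch (...)
                {
                    available_ = false;  // treat probe failures as "no OpenCL"
                }
            }
        }
        return available_;
    }
};
```

Later calls return the cached answer, even if a different callback or environment value is supplied. The same holds for haveOpenCL(), which does not re-read OPENCV_OPENCL_RUNTIME after the first probe.
*/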

#ifdef HAVE_CLAMDBLAS

class AmdBlasHelper
{
public:
    static AmdBlasHelper& getInstance()
    {
        CV_SINGLETON_LAZY_INIT_REF(AmdBlasHelper, new AmdBlasHelper())
    }

    bool isAvailable() const
    {
        return g_isAmdBlasAvailable;
    }

    ~AmdBlasHelper()
    {
        // Do not tear down clBLAS.
        // The user application may still use clBLAS even after OpenCV is unloaded.
        /*try
        {
            clblasTeardown();
        }
        catch (...) { }*/
    }

protected:
    AmdBlasHelper()
    {
        if (!g_isAmdBlasInitialized)
        {
            AutoLock lock(getInitializationMutex());

            if (!g_isAmdBlasInitialized)
            {
                if (haveOpenCL())
                {
                    try
                    {
                        g_isAmdBlasAvailable = clblasSetup() == clblasSuccess;
                    }
                    catch (...)
                    {
                        g_isAmdBlasAvailable = false;
                    }
                }
                else
                    g_isAmdBlasAvailable = false;

                g_isAmdBlasInitialized = true;
            }
        }
    }

private:
    static bool g_isAmdBlasInitialized;
    static bool g_isAmdBlasAvailable;
};

bool AmdBlasHelper::g_isAmdBlasAvailable = false;
bool AmdBlasHelper::g_isAmdBlasInitialized = false;

bool haveAmdBlas()
{
    return AmdBlasHelper::getInstance().isAvailable();
}

#else

bool haveAmdBlas()
{
    return false;
}

#endif

#ifdef HAVE_CLAMDFFT

class AmdFftHelper
{
public:
    static AmdFftHelper& getInstance()
    {
        CV_SINGLETON_LAZY_INIT_REF(AmdFftHelper, new AmdFftHelper())
    }

    bool isAvailable() const
    {
        return g_isAmdFftAvailable;
    }

    ~AmdFftHelper()
    {
        // Do not tear down clFFT.
        // The user application may still use clFFT even after OpenCV is unloaded.
        /*try
        {
            clfftTeardown();
        }
        catch (...) { }*/
    }

protected:
    AmdFftHelper()
    {
        if (!g_isAmdFftInitialized)
        {
            AutoLock lock(getInitializationMutex());

            if (!g_isAmdFftInitialized)
            {
                if (haveOpenCL())
                {
                    try
                    {
                        cl_uint major, minor, patch;
                        CV_Assert(clfftInitSetupData(&setupData) == CLFFT_SUCCESS);

                        // throws an exception if the AmdFft binaries are not found
                        CV_Assert(clfftGetVersion(&major, &minor, &patch) == CLFFT_SUCCESS);
                        g_isAmdFftAvailable = true;
                    }
                    catch (const Exception&)
                    {
                        g_isAmdFftAvailable = false;
                    }
                }
                else
                    g_isAmdFftAvailable = false;

                g_isAmdFftInitialized = true;
            }
        }
    }

private:
    static clfftSetupData setupData;
    static bool g_isAmdFftInitialized;
    static bool g_isAmdFftAvailable;
};

clfftSetupData AmdFftHelper::setupData;
bool AmdFftHelper::g_isAmdFftAvailable = false;
bool AmdFftHelper::g_isAmdFftInitialized = false;

bool haveAmdFft()
{
    return AmdFftHelper::getInstance().isAvailable();
}

#else

bool haveAmdFft()
{
    return false;
}

#endif

bool haveSVM()
{
#ifdef HAVE_OPENCL_SVM
    return true;
#else
    return false;
#endif
}

void finish()
{
    Queue::getDefault().finish();
}

/////////////////////////////////////////// Platform /////////////////////////////////////////////
2013-10-22 18:04:49 +08:00
struct Platform : : Impl
{
Impl ( )
{
refcount = 1 ;
handle = 0 ;
initialized = false ;
}
~ Impl ( ) { }
void init ( )
{
if ( ! initialized )
{
//cl_uint num_entries
cl_uint n = 0 ;
2014-02-01 19:07:03 +08:00
if ( clGetPlatformIDs ( 1 , & handle , & n ) ! = CL_SUCCESS | | n = = 0 )
2013-10-22 18:04:49 +08:00
handle = 0 ;
if ( handle ! = 0 )
{
char buf [ 1000 ] ;
size_t len = 0 ;
2017-11-01 23:18:54 +08:00
CV_OCL_DBG_CHECK ( clGetPlatformInfo ( handle , CL_PLATFORM_VENDOR , sizeof ( buf ) , buf , & len ) ) ;
2013-10-22 18:04:49 +08:00
                buf[std::min(len, sizeof(buf) - 1)] = '\0';  // len counts the NUL; clamp to stay inside buf
                vendor = String(buf);
            }

            initialized = true;
        }
    }

    IMPLEMENT_REFCOUNTABLE();

    cl_platform_id handle;
    String vendor;
    bool initialized;
};

Platform::Platform() CV_NOEXCEPT
{
    p = 0;
}

Platform::~Platform()
{
    if (p)
        p->release();
}

Platform::Platform(const Platform& pl)
{
    p = (Impl*)pl.p;
    if (p)
        p->addref();
}

Platform& Platform::operator=(const Platform& pl)
{
    Impl* newp = (Impl*)pl.p;
    if (newp)
        newp->addref();
    if (p)
        p->release();
    p = newp;
    return *this;
}

Platform::Platform(Platform&& pl) CV_NOEXCEPT
{
    p = pl.p;
    pl.p = nullptr;
}

Platform& Platform::operator=(Platform&& pl) CV_NOEXCEPT
{
    if (this != &pl) {
        if (p)
            p->release();
        p = pl.p;
        pl.p = nullptr;
    }
    return *this;
}
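
/*
Illustrative sketch (not part of OpenCV): Platform (and Device below) is a thin handle
around a reference-counted Impl. Copying adds a reference; copy assignment adds the new
reference before releasing the old one, which keeps self-assignment safe; the move
operations transfer the pointer and leave the source empty. The sketch reproduces that
pattern with a hypothetical SharedImpl whose std::atomic<int> counter stands in for
IMPLEMENT_REFCOUNTABLE(); the counter type is an assumption, not a copy of that macro.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Hypothetical shared state; 'alive' counts instances so leaks are observable.
struct SharedImpl
{
    static int alive;
    std::atomic<int> refcount{1};
    SharedImpl() { ++alive; }
    ~SharedImpl() { --alive; }
    void addref() { refcount.fetch_add(1); }
    void release() { if (refcount.fetch_sub(1) == 1) delete this; }
};
int SharedImpl::alive = 0;

class Handle
{
public:
    Handle() noexcept : p(nullptr) {}
    explicit Handle(SharedImpl* impl) noexcept : p(impl) {}  // adopts the initial reference
    ~Handle() { if (p) p->release(); }

    Handle(const Handle& h) : p(h.p) { if (p) p->addref(); }
    Handle& operator=(const Handle& h)
    {
        SharedImpl* newp = h.p;
        if (newp)
            newp->addref();  // add first, so self-assignment never drops the count to 0
        if (p)
            p->release();
        p = newp;
        return *this;
    }

    Handle(Handle&& h) noexcept : p(h.p) { h.p = nullptr; }
    Handle& operator=(Handle&& h) noexcept
    {
        if (this != &h)
        {
            if (p)
                p->release();
            p = h.p;
            h.p = nullptr;
        }
        return *this;
    }

    int refs() const { return p ? p->refcount.load() : 0; }

private:
    SharedImpl* p;
};
```

Releasing first would be wrong for `a = a` when the count is 1: release() would delete the
Impl before addref() touches it.
*/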

void* Platform::ptr() const
{
    return p ? p->handle : 0;
}

Platform& Platform::getDefault()
{
    CV_LOG_ONCE_WARNING(NULL, "OpenCL: Platform::getDefault() is deprecated and will be removed. Use cv::ocl::getPlatfomsInfo() for enumeration of available platforms");

    static Platform p;
    if (!p.p)
    {
        p.p = new Impl;
        p.p->init();
    }
    return p;
}

/////////////////////////////////////// Device ////////////////////////////////////////////

// By specification, the version string has the format:
//   OpenCL<space><major_version.minor_version><space><vendor-specific information>
// See:
//   http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/clGetDeviceInfo.html
//   http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clGetDeviceInfo.html
//   https://www.khronos.org/registry/OpenCL/sdk/1.1/docs/man/xhtml/clGetPlatformInfo.html
//   https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clGetPlatformInfo.html
static void parseOpenCLVersion(const String &version, int &major, int &minor)
{
    major = minor = 0;
    if (10 >= version.length())
        return;
    const char *pstr = version.c_str();
    if (0 != strncmp(pstr, "OpenCL ", 7))
        return;
    size_t ppos = version.find('.', 7);
    if (String::npos == ppos)
        return;
    String temp = version.substr(7, ppos - 7);
    major = atoi(temp.c_str());
    temp = version.substr(ppos + 1);
    minor = atoi(temp.c_str());
}
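
/*
Illustrative sketch (not part of OpenCV): parseOpenCLVersion() skips the 7-character
"OpenCL " prefix, reads the major number up to the first '.', and relies on atoi()
stopping at the space that precedes the vendor-specific suffix. The copy below uses
std::string instead of cv::String so it can be checked in isolation. Note that the length
guard requires at least 11 characters, so a bare "OpenCL 1.2" without the trailing space
required by the format above is rejected and yields 0.0.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Same steps as parseOpenCLVersion(), using std::string instead of cv::String.
static void parseVersionSketch(const std::string& version, int& vmajor, int& vminor)
{
    vmajor = vminor = 0;
    if (10 >= version.length())      // needs at least "OpenCL x.y " (11 chars)
        return;
    if (0 != std::strncmp(version.c_str(), "OpenCL ", 7))
        return;
    size_t ppos = version.find('.', 7);
    if (std::string::npos == ppos)
        return;
    vmajor = std::atoi(version.substr(7, ppos - 7).c_str());
    // atoi() stops at the first non-digit, so the vendor suffix is ignored.
    vminor = std::atoi(version.substr(ppos + 1).c_str());
}
```
*/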

struct Device::Impl
{
    Impl(void* d)
        : refcount(1)
        , handle(0)
    {
        try
        {
            cl_device_id device = (cl_device_id)d;
            _init(device);
            CV_OCL_CHECK(clRetainDevice(device));  // increment reference counter on success only
        }
        catch (...)
        {
            throw;
        }
    }

    void _init(cl_device_id d)
    {
        handle = (cl_device_id)d;

        name_ = getStrProp(CL_DEVICE_NAME);
        version_ = getStrProp(CL_DEVICE_VERSION);
        extensions_ = getStrProp(CL_DEVICE_EXTENSIONS);
        doubleFPConfig_ = getProp<cl_device_fp_config, int>(CL_DEVICE_DOUBLE_FP_CONFIG);
        halfFPConfig_ = getProp<cl_device_fp_config, int>(CL_DEVICE_HALF_FP_CONFIG);
        hostUnifiedMemory_ = getBoolProp(CL_DEVICE_HOST_UNIFIED_MEMORY);
        maxComputeUnits_ = getProp<cl_uint, int>(CL_DEVICE_MAX_COMPUTE_UNITS);
        maxWorkGroupSize_ = getProp<size_t, size_t>(CL_DEVICE_MAX_WORK_GROUP_SIZE);
        type_ = getProp<cl_device_type, int>(CL_DEVICE_TYPE);
        driverVersion_ = getStrProp(CL_DRIVER_VERSION);
        addressBits_ = getProp<cl_uint, int>(CL_DEVICE_ADDRESS_BITS);

        String deviceVersion_ = getStrProp(CL_DEVICE_VERSION);
        parseOpenCLVersion(deviceVersion_, deviceVersionMajor_, deviceVersionMinor_);

        size_t pos = 0;
        while (pos < extensions_.size())
        {
            size_t pos2 = extensions_.find(' ', pos);
            if (pos2 == String::npos)
                pos2 = extensions_.size();
            if (pos2 > pos)
            {
                std::string extensionName = extensions_.substr(pos, pos2 - pos);
                extensions_set_.insert(extensionName);
            }
            pos = pos2 + 1;
        }

        intelSubgroupsSupport_ = isExtensionSupported("cl_intel_subgroups");

        vendorName_ = getStrProp(CL_DEVICE_VENDOR);
        if (vendorName_ == "Advanced Micro Devices, Inc." ||
            vendorName_ == "AMD")
            vendorID_ = VENDOR_AMD;
        else if (vendorName_ == "Intel(R) Corporation" || vendorName_ == "Intel" || vendorName_ == "Intel Inc." || strstr(name_.c_str(), "Iris") != 0)
            vendorID_ = VENDOR_INTEL;
        else if (vendorName_ == "NVIDIA Corporation")
            vendorID_ = VENDOR_NVIDIA;
        else
            vendorID_ = UNKNOWN_VENDOR;

        const size_t CV_OPENCL_DEVICE_MAX_WORK_GROUP_SIZE = utils::getConfigurationParameterSizeT("OPENCV_OPENCL_DEVICE_MAX_WORK_GROUP_SIZE", 0);
        if (CV_OPENCL_DEVICE_MAX_WORK_GROUP_SIZE > 0)
        {
            const size_t new_maxWorkGroupSize = std::min(maxWorkGroupSize_, CV_OPENCL_DEVICE_MAX_WORK_GROUP_SIZE);
            if (new_maxWorkGroupSize != maxWorkGroupSize_)
                CV_LOG_WARNING(NULL, "OpenCL: using workgroup size: " << new_maxWorkGroupSize << " (was " << maxWorkGroupSize_ << ")");
            maxWorkGroupSize_ = new_maxWorkGroupSize;
        }
#if 0
        if (isExtensionSupported("cl_khr_spir"))
        {
#ifndef CL_DEVICE_SPIR_VERSIONS
#define CL_DEVICE_SPIR_VERSIONS 0x40E0
#endif
            cv::String spir_versions = getStrProp(CL_DEVICE_SPIR_VERSIONS);
            std::cout << spir_versions << std::endl;
        }
#endif
    }
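
/*
Illustrative sketch (not part of OpenCV): the OPENCV_OPENCL_DEVICE_MAX_WORK_GROUP_SIZE
setting read in _init() can only lower CL_DEVICE_MAX_WORK_GROUP_SIZE, never raise it,
and its default of 0 means "no override". The pure function below restates that rule;
reading and parsing the setting itself (utils::getConfigurationParameterSizeT() in the
original) is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// overrideValue == 0 means "not configured": keep the device limit.
// A nonzero override can only lower the limit, never raise it.
static size_t effectiveMaxWorkGroupSize(size_t deviceMax, size_t overrideValue)
{
    if (overrideValue == 0)
        return deviceMax;
    return std::min(deviceMax, overrideValue);
}
```

For example, with the setting at 256 on a device that reports 1024, Device::maxWorkGroupSize()
would return 256 and _init() would log "OpenCL: using workgroup size: 256 (was 1024)".
*/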

    ~Impl()
    {
#ifdef _WIN32
        if (!cv::__termination)
#endif
        {
            if (handle)
            {
                CV_OCL_CHECK(clReleaseDevice(handle));
                handle = 0;
            }
        }
    }

    template<typename _TpCL, typename _TpOut>
    _TpOut getProp(cl_device_info prop) const
    {
        _TpCL temp = _TpCL();
        size_t sz = 0;

        return clGetDeviceInfo(handle, prop, sizeof(temp), &temp, &sz) == CL_SUCCESS &&
            sz == sizeof(temp) ? _TpOut(temp) : _TpOut();
    }
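
/*
Illustrative sketch (not part of OpenCV): getProp<_TpCL, _TpOut>() accepts a queried value
only when the call succeeds and the driver reports exactly sizeof(_TpCL) bytes; otherwise it
returns a value-initialized _TpOut (0 or false), so querying with the wrong type degrades to
a default instead of yielding a partially written value. fakeGetInfo() is a hypothetical
stand-in that follows the clGetDeviceInfo() convention (buffer size in, bytes written out);
-30 mimics CL_INVALID_VALUE and 0 mimics CL_SUCCESS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Writes v into the caller's buffer following the clGetDeviceInfo() convention.
template <typename T>
static int writeValue(const T& v, size_t size, void* value, size_t* sizeRet)
{
    if (value && size < sizeof(T))
        return -30;                          // buffer too small
    if (value)
        std::memcpy(value, &v, sizeof(T));
    if (sizeRet)
        *sizeRet = sizeof(T);
    return 0;
}

// Hypothetical property table: id 1 is a 32-bit value, id 2 a 64-bit value.
static int fakeGetInfo(int prop, size_t size, void* value, size_t* sizeRet)
{
    switch (prop)
    {
    case 1: return writeValue<uint32_t>(64u, size, value, sizeRet);
    case 2: return writeValue<uint64_t>(uint64_t(1) << 33, size, value, sizeRet);
    default: return -30;                     // unknown property
    }
}

// Same acceptance rule as Device::Impl::getProp().
template <typename TpQuery, typename TpOut>
static TpOut getPropSketch(int prop)
{
    TpQuery temp = TpQuery();
    size_t sz = 0;
    return fakeGetInfo(prop, sizeof(temp), &temp, &sz) == 0 && sz == sizeof(temp)
        ? TpOut(temp) : TpOut();
}
```
*/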

    bool getBoolProp(cl_device_info prop) const
    {
        cl_bool temp = CL_FALSE;
        size_t sz = 0;

        return clGetDeviceInfo(handle, prop, sizeof(temp), &temp, &sz) == CL_SUCCESS &&
            sz == sizeof(temp) ? temp != 0 : false;
    }

    String getStrProp(cl_device_info prop) const
    {
        char buf[4096];
        size_t sz = 0;
        return clGetDeviceInfo(handle, prop, sizeof(buf) - 16, buf, &sz) == CL_SUCCESS &&
            sz < sizeof(buf) ? String(buf) : String();
    }

    bool isExtensionSupported(const std::string& extensionName) const
    {
        return extensions_set_.count(extensionName) > 0;
    }
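
/*
Illustrative sketch (not part of OpenCV): _init() splits the space-separated
CL_DEVICE_EXTENSIONS string into extensions_set_ once, skipping the empty tokens produced
by repeated or trailing spaces, so isExtensionSupported() is an exact-name lookup rather
than a substring search. The copy below uses std::string and returns the set instead of
filling a member.

```cpp
#include <cassert>
#include <set>
#include <string>

// Splits a space-separated extension list into a set, skipping empty tokens.
static std::set<std::string> parseExtensions(const std::string& extensions)
{
    std::set<std::string> result;
    size_t pos = 0;
    while (pos < extensions.size())
    {
        size_t pos2 = extensions.find(' ', pos);
        if (pos2 == std::string::npos)
            pos2 = extensions.size();
        if (pos2 > pos)                      // ignore empty tokens from repeated spaces
            result.insert(extensions.substr(pos, pos2 - pos));
        pos = pos2 + 1;
    }
    return result;
}
```
*/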

    IMPLEMENT_REFCOUNTABLE();

    cl_device_id handle;

    String name_;
    String version_;
    std::string extensions_;
    int doubleFPConfig_;
    int halfFPConfig_;
    bool hostUnifiedMemory_;
    int maxComputeUnits_;
    size_t maxWorkGroupSize_;
    int type_;
    int addressBits_;
    int deviceVersionMajor_;
    int deviceVersionMinor_;
    String driverVersion_;
    String vendorName_;
    int vendorID_;
    bool intelSubgroupsSupport_;

    std::set<std::string> extensions_set_;
};

Device::Device() CV_NOEXCEPT
{
    p = 0;
}

Device::Device(void* d)
{
    p = 0;
    set(d);
}

Device::Device(const Device& d)
{
    p = d.p;
    if (p)
        p->addref();
}

Device& Device::operator=(const Device& d)
{
    Impl* newp = (Impl*)d.p;
    if (newp)
        newp->addref();
    if (p)
        p->release();
    p = newp;
    return *this;
}

Device::Device(Device&& d) CV_NOEXCEPT
{
    p = d.p;
    d.p = nullptr;
}

Device& Device::operator=(Device&& d) CV_NOEXCEPT
{
    if (this != &d) {
        if (p)
            p->release();
        p = d.p;
        d.p = nullptr;
    }
    return *this;
}

Device::~Device()
{
    if (p)
        p->release();
}

void Device::set(void* d)
{
    // Construct the new Impl first: if it throws, p still holds a valid reference.
    Impl* newp = new Impl(d);
    if (p)
        p->release();
    p = newp;
    if (p->handle)
    {
        CV_OCL_CHECK(clReleaseDevice((cl_device_id)d));
    }
}

Device Device::fromHandle(void* d)
{
    Device device(d);
    return device;
}
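
/*
Illustrative sketch (not part of OpenCV): Device::Impl's constructor calls clRetainDevice()
only after _init() succeeds, and Device::set() then calls clReleaseDevice() on the same
handle. The net effect is that the reference count is unchanged while the Device is alive
and the Device adopts the reference the caller passed in; ~Impl() later releases it, so a
caller that wants to keep using the cl_device_id afterwards needs its own extra retain.
The mock counter below (hypothetical names, no OpenCL involved) traces those steps. Per the
OpenCL 1.2 specification, retain/release only change the count for sub-devices; root
devices returned by clGetDeviceIDs() are unaffected.

```cpp
#include <cassert>

// Hypothetical driver-side reference count for a single device handle.
// It starts at 1: the reference the caller owns before calling set().
static int g_deviceRefs = 1;
static void mockRetain()  { ++g_deviceRefs; }
static void mockRelease() { --g_deviceRefs; }

struct DeviceSketch
{
    bool ownsHandle = false;

    void set()
    {
        // Impl constructor: query properties first, then retain on success only.
        mockRetain();
        ownsHandle = true;
        // Device::set(): release once, so the object adopts the caller's reference.
        mockRelease();
    }

    // ~Impl(): drop the adopted reference.
    ~DeviceSketch() { if (ownsHandle) mockRelease(); }
};
```
*/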

void* Device::ptr() const
{
    return p ? p->handle : 0;
}

String Device::name() const
{ return p ? p->name_ : String(); }

String Device::extensions() const
{ return p ? String(p->extensions_) : String(); }

bool Device::isExtensionSupported(const String& extensionName) const
{ return p ? p->isExtensionSupported(extensionName) : false; }

String Device::version() const
{ return p ? p->version_ : String(); }

String Device::vendorName() const
{ return p ? p->vendorName_ : String(); }

int Device::vendorID() const
{ return p ? p->vendorID_ : 0; }
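
/*
Illustrative sketch (not part of OpenCV): _init() derives vendorID_ from exact
CL_DEVICE_VENDOR string comparisons, with one heuristic: a device whose CL_DEVICE_NAME
contains "Iris" is treated as Intel whatever its vendor string says. The Vendor enum below
is hypothetical; OpenCV's VENDOR_* constants and their values are not reproduced.

```cpp
#include <cassert>
#include <string>

// Hypothetical vendor enum for this sketch.
enum class Vendor { Unknown, AMD, Intel, NVIDIA };

static Vendor detectVendor(const std::string& vendorName, const std::string& deviceName)
{
    if (vendorName == "Advanced Micro Devices, Inc." || vendorName == "AMD")
        return Vendor::AMD;
    // Besides exact vendor strings, any device whose name contains "Iris" counts as Intel.
    if (vendorName == "Intel(R) Corporation" || vendorName == "Intel" ||
        vendorName == "Intel Inc." || deviceName.find("Iris") != std::string::npos)
        return Vendor::Intel;
    if (vendorName == "NVIDIA Corporation")
        return Vendor::NVIDIA;
    return Vendor::Unknown;
}
```

Because the comparisons are exact, a vendor string such as "NVIDIA" (without " Corporation")
falls through to the unknown case.
*/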

String Device::OpenCL_C_Version() const
{ return p ? p->getStrProp(CL_DEVICE_OPENCL_C_VERSION) : String(); }

String Device::OpenCLVersion() const
{ return p ? p->getStrProp(CL_DEVICE_VERSION) : String(); }

int Device::deviceVersionMajor() const
{ return p ? p->deviceVersionMajor_ : 0; }

int Device::deviceVersionMinor() const
{ return p ? p->deviceVersionMinor_ : 0; }

String Device::driverVersion() const
{ return p ? p->driverVersion_ : String(); }

int Device::type() const
{ return p ? p->type_ : 0; }

int Device::addressBits() const
{ return p ? p->addressBits_ : 0; }

bool Device::available() const
{ return p ? p->getBoolProp(CL_DEVICE_AVAILABLE) : false; }

bool Device::compilerAvailable() const
{ return p ? p->getBoolProp(CL_DEVICE_COMPILER_AVAILABLE) : false; }

bool Device::linkerAvailable() const
#ifdef CL_VERSION_1_2
{ return p ? p->getBoolProp(CL_DEVICE_LINKER_AVAILABLE) : false; }
#else
{ CV_REQUIRE_OPENCL_1_2_ERROR; }
#endif

int Device::doubleFPConfig() const
{ return p ? p->doubleFPConfig_ : 0; }

int Device::singleFPConfig() const
{ return p ? p->getProp<cl_device_fp_config, int>(CL_DEVICE_SINGLE_FP_CONFIG) : 0; }

int Device::halfFPConfig() const
{ return p ? p->halfFPConfig_ : 0; }

bool Device::endianLittle() const
{ return p ? p->getBoolProp(CL_DEVICE_ENDIAN_LITTLE) : false; }

bool Device::errorCorrectionSupport() const
{ return p ? p->getBoolProp(CL_DEVICE_ERROR_CORRECTION_SUPPORT) : false; }

int Device::executionCapabilities() const
{ return p ? p->getProp<cl_device_exec_capabilities, int>(CL_DEVICE_EXECUTION_CAPABILITIES) : 0; }

size_t Device::globalMemCacheSize() const
{ return p ? p->getProp<cl_ulong, size_t>(CL_DEVICE_GLOBAL_MEM_CACHE_SIZE) : 0; }

int Device::globalMemCacheType() const
{ return p ? p->getProp<cl_device_mem_cache_type, int>(CL_DEVICE_GLOBAL_MEM_CACHE_TYPE) : 0; }

int Device::globalMemCacheLineSize() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE) : 0; }

size_t Device::globalMemSize() const
{ return p ? p->getProp<cl_ulong, size_t>(CL_DEVICE_GLOBAL_MEM_SIZE) : 0; }

size_t Device::localMemSize() const
{ return p ? p->getProp<cl_ulong, size_t>(CL_DEVICE_LOCAL_MEM_SIZE) : 0; }

int Device::localMemType() const
{ return p ? p->getProp<cl_device_local_mem_type, int>(CL_DEVICE_LOCAL_MEM_TYPE) : 0; }

bool Device::hostUnifiedMemory() const
{ return p ? p->hostUnifiedMemory_ : false; }

bool Device::imageSupport() const
{ return p ? p->getBoolProp(CL_DEVICE_IMAGE_SUPPORT) : false; }

bool Device::imageFromBufferSupport() const
{
    return p ? p->isExtensionSupported("cl_khr_image2d_from_buffer") : false;
}

uint Device::imagePitchAlignment() const
{
#ifdef CL_DEVICE_IMAGE_PITCH_ALIGNMENT
    return p ? p->getProp<cl_uint, uint>(CL_DEVICE_IMAGE_PITCH_ALIGNMENT) : 0;
#else
    return 0;
#endif
}

uint Device::imageBaseAddressAlignment() const
{
#ifdef CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT
    return p ? p->getProp<cl_uint, uint>(CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT) : 0;
#else
    return 0;
#endif
}

size_t Device::image2DMaxWidth() const
{ return p ? p->getProp<size_t, size_t>(CL_DEVICE_IMAGE2D_MAX_WIDTH) : 0; }

size_t Device::image2DMaxHeight() const
{ return p ? p->getProp<size_t, size_t>(CL_DEVICE_IMAGE2D_MAX_HEIGHT) : 0; }

size_t Device::image3DMaxWidth() const
{ return p ? p->getProp<size_t, size_t>(CL_DEVICE_IMAGE3D_MAX_WIDTH) : 0; }

size_t Device::image3DMaxHeight() const
{ return p ? p->getProp<size_t, size_t>(CL_DEVICE_IMAGE3D_MAX_HEIGHT) : 0; }

size_t Device::image3DMaxDepth() const
{ return p ? p->getProp<size_t, size_t>(CL_DEVICE_IMAGE3D_MAX_DEPTH) : 0; }

size_t Device::imageMaxBufferSize() const
#ifdef CL_VERSION_1_2
{ return p ? p->getProp<size_t, size_t>(CL_DEVICE_IMAGE_MAX_BUFFER_SIZE) : 0; }
#else
{ CV_REQUIRE_OPENCL_1_2_ERROR; }
#endif

size_t Device::imageMaxArraySize() const
#ifdef CL_VERSION_1_2
{ return p ? p->getProp<size_t, size_t>(CL_DEVICE_IMAGE_MAX_ARRAY_SIZE) : 0; }
#else
{ CV_REQUIRE_OPENCL_1_2_ERROR; }
#endif

bool Device::intelSubgroupsSupport() const
{ return p ? p->intelSubgroupsSupport_ : false; }

int Device::maxClockFrequency() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_MAX_CLOCK_FREQUENCY) : 0; }

int Device::maxComputeUnits() const
{ return p ? p->maxComputeUnits_ : 0; }

int Device::maxConstantArgs() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_MAX_CONSTANT_ARGS) : 0; }

size_t Device::maxConstantBufferSize() const
{ return p ? p->getProp<cl_ulong, size_t>(CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE) : 0; }

size_t Device::maxMemAllocSize() const
{ return p ? p->getProp<cl_ulong, size_t>(CL_DEVICE_MAX_MEM_ALLOC_SIZE) : 0; }
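
// CL_DEVICE_MAX_PARAMETER_SIZE is a size_t property, so query it as size_t:
// a cl_ulong query would fail the size check on 32-bit targets and return 0.
size_t Device::maxParameterSize() const
{ return p ? p->getProp<size_t, size_t>(CL_DEVICE_MAX_PARAMETER_SIZE) : 0; }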

int Device::maxReadImageArgs() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_MAX_READ_IMAGE_ARGS) : 0; }

int Device::maxWriteImageArgs() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_MAX_WRITE_IMAGE_ARGS) : 0; }

int Device::maxSamplers() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_MAX_SAMPLERS) : 0; }

size_t Device::maxWorkGroupSize() const
{ return p ? p->maxWorkGroupSize_ : 0; }

int Device::maxWorkItemDims() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS) : 0; }

void Device::maxWorkItemSizes(size_t* sizes) const
{
    if (p)
    {
        const int MAX_DIMS = 32;
        size_t retsz = 0;
        CV_OCL_DBG_CHECK(clGetDeviceInfo(p->handle, CL_DEVICE_MAX_WORK_ITEM_SIZES,
                MAX_DIMS * sizeof(sizes[0]), &sizes[0], &retsz));
    }
}

int Device::memBaseAddrAlign() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_MEM_BASE_ADDR_ALIGN) : 0; }

int Device::nativeVectorWidthChar() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR) : 0; }

int Device::nativeVectorWidthShort() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT) : 0; }

int Device::nativeVectorWidthInt() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_NATIVE_VECTOR_WIDTH_INT) : 0; }

int Device::nativeVectorWidthLong() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG) : 0; }

int Device::nativeVectorWidthFloat() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT) : 0; }

int Device::nativeVectorWidthDouble() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE) : 0; }

int Device::nativeVectorWidthHalf() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF) : 0; }

int Device::preferredVectorWidthChar() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR) : 0; }

int Device::preferredVectorWidthShort() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT) : 0; }

int Device::preferredVectorWidthInt() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT) : 0; }

int Device::preferredVectorWidthLong() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG) : 0; }

int Device::preferredVectorWidthFloat() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT) : 0; }

int Device::preferredVectorWidthDouble() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE) : 0; }

int Device::preferredVectorWidthHalf() const
{ return p ? p->getProp<cl_uint, int>(CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF) : 0; }

size_t Device::printfBufferSize() const
#ifdef CL_VERSION_1_2
{ return p ? p->getProp<size_t, size_t>(CL_DEVICE_PRINTF_BUFFER_SIZE) : 0; }
#else
{ CV_REQUIRE_OPENCL_1_2_ERROR; }
#endif

size_t Device::profilingTimerResolution() const
{ return p ? p->getProp<size_t, size_t>(CL_DEVICE_PROFILING_TIMER_RESOLUTION) : 0; }

const Device& Device::getDefault()
{
    auto& c = OpenCLExecutionContext::getCurrent();
    if (!c.empty())
    {
        return c.getDevice();
    }

    static Device dummy;
    return dummy;
}

////////////////////////////////////// Context ///////////////////////////////////////////////////

template <typename Functor, typename ObjectType>
inline cl_int getStringInfo(Functor f, ObjectType obj, cl_uint name, std::string& param)
{
    ::size_t required;
    cl_int err = f(obj, name, 0, NULL, &required);
    if (err != CL_SUCCESS)
        return err;

    param.clear();
    if (required > 0)
    {
        AutoBuffer<char> buf(required + 1);
        char* ptr = buf.data(); // cleanup is not needed
        err = f(obj, name, required, ptr, NULL);
        if (err != CL_SUCCESS)
            return err;
        param = ptr;
    }

    return CL_SUCCESS;
}
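
// Illustrative sketch, not part of OpenCV and not called by it. It shows the
// two-call pattern that getStringInfo() wraps: query the size, then query the
// value. fakeGetInfo() is a hypothetical stand-in that follows the
// clGet*Info() calling convention, where the reported size includes the
// terminating '\0'. kSuccess and kInvalidValue are hypothetical stand-ins for
// CL_SUCCESS and CL_INVALID_VALUE. The code assumes <string>, <vector> and
// <algorithm>, which this file already includes.
namespace example_string_info {

const int kSuccess = 0;
const int kInvalidValue = -30;

// Hypothetical info source: object 1 reports "Sketch Device", others fail.
inline int fakeGetInfo(int obj, unsigned /*name*/, std::size_t size, void* value, std::size_t* sizeRet)
{
    static const char text[] = "Sketch Device";
    if (obj != 1)
        return kInvalidValue;
    if (sizeRet)
        *sizeRet = sizeof(text); // includes the '\0'
    if (value)
    {
        if (size < sizeof(text))
            return kInvalidValue;
        std::copy(text, text + sizeof(text), static_cast<char*>(value));
    }
    return kSuccess;
}

template <typename Functor, typename ObjectType>
inline int getStringInfoSketch(Functor f, ObjectType obj, unsigned name, std::string& param)
{
    std::size_t required = 0;
    int err = f(obj, name, 0, nullptr, &required); // 1st call: size only
    if (err != kSuccess)
        return err;
    param.clear();
    if (required > 0)
    {
        std::vector<char> buf(required + 1, '\0'); // extra byte keeps it terminated
        err = f(obj, name, required, buf.data(), nullptr); // 2nd call: the value
        if (err != kSuccess)
            return err;
        param = buf.data(); // copies up to the first '\0'
    }
    return kSuccess;
}

} // namespace example_string_info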

static void split(const std::string& s, char delim, std::vector<std::string>& elems)
{
    elems.clear();
    if (s.size() == 0)
        return;
    std::istringstream ss(s);
    std::string item;
    while (!ss.eof())
    {
        std::getline(ss, item, delim);
        elems.push_back(item);
    }
}

// Layout: <Platform>:<CPU|GPU|ACCELERATOR|nothing=GPU/CPU>:<deviceName>
// Sample: AMD:GPU:
// Sample: AMD:GPU:Tahiti
// Sample: :GPU|CPU: = '' = ':' = '::'
static bool parseOpenCLDeviceConfiguration(const std::string& configurationStr,
        std::string& platform, std::vector<std::string>& deviceTypes, std::string& deviceNameOrID)
{
    std::vector<std::string> parts;
    split(configurationStr, ':', parts);
    if (parts.size() > 3)
    {
        CV_LOG_ERROR(NULL, "OpenCL: Invalid configuration string for OpenCL device: " << configurationStr);
        return false;
    }
    if (parts.size() > 2)
        deviceNameOrID = parts[2];
    if (parts.size() > 1)
    {
        split(parts[1], '|', deviceTypes);
    }
    if (parts.size() > 0)
    {
        platform = parts[0];
    }
    return true;
}
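
// Illustrative sketch, not part of OpenCV and not called by it. It shows how
// a "<Platform>:<Types>:<deviceName>" string splits into its three fields,
// using the same getline-based splitting rule as split() above. With that
// rule a trailing ':' yields a trailing empty field. DeviceConfig,
// splitParts() and parseDeviceConfig() are hypothetical names. The code
// assumes <string>, <vector> and <sstream>, which this file already includes.
namespace example_device_config {

struct DeviceConfig
{
    std::string platform;
    std::vector<std::string> deviceTypes;
    std::string deviceName;
};

inline std::vector<std::string> splitParts(const std::string& s, char delim)
{
    std::vector<std::string> parts;
    if (s.empty())
        return parts;
    std::istringstream ss(s);
    std::string item;
    while (!ss.eof())
    {
        std::getline(ss, item, delim);
        parts.push_back(item);
    }
    return parts;
}

// Rejects strings with more than three ':'-separated fields.
inline bool parseDeviceConfig(const std::string& str, DeviceConfig& out)
{
    std::vector<std::string> parts = splitParts(str, ':');
    if (parts.size() > 3)
        return false;
    out = DeviceConfig();
    if (parts.size() > 0)
        out.platform = parts[0];
    if (parts.size() > 1)
        out.deviceTypes = splitParts(parts[1], '|');
    if (parts.size() > 2)
        out.deviceName = parts[2];
    return true;
}

} // namespace example_device_config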

#if defined WINRT || defined _WIN32_WCE
static cl_device_id selectOpenCLDevice(const char* configuration = NULL)
{
    CV_UNUSED(configuration);
    return NULL;
}
#else
static cl_device_id selectOpenCLDevice(const char* configuration = NULL)
{
    std::string platform, deviceName;
    std::vector<std::string> deviceTypes;

    if (!configuration)
        configuration = getenv("OPENCV_OPENCL_DEVICE");

    if (configuration &&
            (strcmp(configuration, "disabled") == 0 ||
             !parseOpenCLDeviceConfiguration(std::string(configuration), platform, deviceTypes, deviceName)
            ))
        return NULL;

    bool isID = false;
    int deviceID = -1;
    if (deviceName.length() == 1)
    // We limit ID range to 0..9, because we want to write:
    // - '2500' to mean i5-2500
    // - '8350' to mean AMD FX-8350
    // - '650' to mean GeForce 650
    // To extend ID range change condition to '> 0'
    {
        isID = true;
        for (size_t i = 0; i < deviceName.length(); i++)
        {
            if (!isdigit((unsigned char)deviceName[i]))
            {
                isID = false;
                break;
            }
        }
        if (isID)
        {
            deviceID = atoi(deviceName.c_str());
            if (deviceID < 0)
                return NULL;
        }
    }

    std::vector<cl_platform_id> platforms;
    {
        cl_uint numPlatforms = 0;
        CV_OCL_DBG_CHECK(clGetPlatformIDs(0, NULL, &numPlatforms));

        if (numPlatforms == 0)
            return NULL;
        platforms.resize((size_t)numPlatforms);
        CV_OCL_DBG_CHECK(clGetPlatformIDs(numPlatforms, &platforms[0], &numPlatforms));
        platforms.resize(numPlatforms);
    }

    if (platform.length() > 0)
    {
        for (std::vector<cl_platform_id>::iterator currentPlatform = platforms.begin(); currentPlatform != platforms.end();)
        {
            std::string name;
            CV_OCL_DBG_CHECK(getStringInfo(clGetPlatformInfo, *currentPlatform, CL_PLATFORM_NAME, name));
            if (name.find(platform) != std::string::npos)
            {
                ++currentPlatform;
            }
            else
            {
                currentPlatform = platforms.erase(currentPlatform);
            }
        }
        if (platforms.size() == 0)
        {
            CV_LOG_ERROR(NULL, "OpenCL: Can't find OpenCL platform by name: " << platform);
            goto not_found;
        }
    }
    if (deviceTypes.size() == 0)
    {
        if (!isID)
        {
            deviceTypes.push_back("GPU");
            if (configuration)
                deviceTypes.push_back("CPU");
        }
        else
            deviceTypes.push_back("ALL");
    }
    for (size_t t = 0; t < deviceTypes.size(); t++)
    {
        int deviceType = 0;
        std::string tempStrDeviceType = deviceTypes[t];
        std::transform(tempStrDeviceType.begin(), tempStrDeviceType.end(), tempStrDeviceType.begin(), details::char_tolower);

        if (tempStrDeviceType == "gpu" || tempStrDeviceType == "dgpu" || tempStrDeviceType == "igpu")
            deviceType = Device::TYPE_GPU;
        else if (tempStrDeviceType == "cpu")
            deviceType = Device::TYPE_CPU;
        else if (tempStrDeviceType == "accelerator")
            deviceType = Device::TYPE_ACCELERATOR;
        else if (tempStrDeviceType == "all")
            deviceType = Device::TYPE_ALL;
        else
        {
            CV_LOG_ERROR(NULL, "OpenCL: Unsupported device type for OpenCL device (GPU, CPU, ACCELERATOR): " << deviceTypes[t]);
            goto not_found;
        }
        std::vector<cl_device_id> devices;
        for (std::vector<cl_platform_id>::iterator currentPlatform = platforms.begin(); currentPlatform != platforms.end(); ++currentPlatform)
        {
            cl_uint count = 0;
            cl_int status = clGetDeviceIDs(*currentPlatform, deviceType, 0, NULL, &count);
            if (!(status == CL_SUCCESS || status == CL_DEVICE_NOT_FOUND))
            {
                CV_OCL_DBG_CHECK_RESULT(status, "clGetDeviceIDs get count");
            }
            if (count == 0)
                continue;
            size_t base = devices.size();
            devices.resize(base + count);
            status = clGetDeviceIDs(*currentPlatform, deviceType, count, &devices[base], &count);
            if (!(status == CL_SUCCESS || status == CL_DEVICE_NOT_FOUND))
            {
                CV_OCL_DBG_CHECK_RESULT(status, "clGetDeviceIDs get IDs");
            }
        }

        for (size_t i = (isID ? deviceID : 0);
             (isID ? (i == (size_t)deviceID) : true) && (i < devices.size());
             i++)
        {
            std::string name;
            CV_OCL_DBG_CHECK(getStringInfo(clGetDeviceInfo, devices[i], CL_DEVICE_NAME, name));
            cl_bool useGPU = true;
            if (tempStrDeviceType == "dgpu" || tempStrDeviceType == "igpu")
            {
                cl_bool isIGPU = CL_FALSE;
                CV_OCL_DBG_CHECK(clGetDeviceInfo(devices[i], CL_DEVICE_HOST_UNIFIED_MEMORY, sizeof(isIGPU), &isIGPU, NULL));
                useGPU = tempStrDeviceType == "dgpu" ? !isIGPU : isIGPU;
            }
            if ((isID || name.find(deviceName) != std::string::npos) && useGPU)
            {
                // TODO check for OpenCL 1.1
                return devices[i];
            }
        }
    }

not_found:
    if (!configuration)
        return NULL; // suppress messages on stderr

    std::ostringstream msg;
    msg << "ERROR: Requested OpenCL device not found, check configuration: '" << configuration << "'" << std::endl
        << "    Platform: " << (platform.length() == 0 ? "any" : platform) << std::endl
        << "    Device types:";
    for (size_t t = 0; t < deviceTypes.size(); t++)
        msg << ' ' << deviceTypes[t];

    msg << std::endl << "    Device name: " << (deviceName.length() == 0 ? "any" : deviceName);

    CV_LOG_ERROR(NULL, msg.str());
    return NULL;
}
#endif
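
// Illustrative sketch, not part of OpenCV and not called by it. It restates
// the token rules that selectOpenCLDevice() applies:
// - Type tokens are matched case-insensitively; gpu, dgpu and igpu all select
//   GPUs.
// - A one-character, all-digit device name is treated as an index.
// - Any other device name is a substring of CL_DEVICE_NAME.
// - "dgpu" keeps devices without host-unified memory; "igpu" keeps devices
//   with it.
// Kind, parseKind(), isIndexToken() and passesGpuFilter() are hypothetical
// names. The code assumes <string> and <cctype>, which this file already
// includes.
namespace example_device_match {

enum class Kind { Unknown, Gpu, Cpu, Accelerator, All };

inline std::string toLower(std::string s)
{
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

inline Kind parseKind(const std::string& token)
{
    const std::string t = toLower(token);
    if (t == "gpu" || t == "dgpu" || t == "igpu") return Kind::Gpu;
    if (t == "cpu") return Kind::Cpu;
    if (t == "accelerator") return Kind::Accelerator;
    if (t == "all") return Kind::All;
    return Kind::Unknown;
}

// "2" selects device #2; "650" is matched as a substring of the device name.
inline bool isIndexToken(const std::string& name)
{
    return name.length() == 1 && std::isdigit(static_cast<unsigned char>(name[0])) != 0;
}

// hostUnifiedMemory plays the role of CL_DEVICE_HOST_UNIFIED_MEMORY.
inline bool passesGpuFilter(const std::string& token, bool hostUnifiedMemory)
{
    const std::string t = toLower(token);
    if (t == "dgpu") return !hostUnifiedMemory;
    if (t == "igpu") return hostUnifiedMemory;
    return true;
}

} // namespace example_device_match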

#ifdef HAVE_OPENCL_SVM
namespace svm {

enum AllocatorFlags { // don't use first 16 bits
        OPENCL_SVM_COARSE_GRAIN_BUFFER = 1 << 16, // clSVMAlloc + SVM map/unmap
        OPENCL_SVM_FINE_GRAIN_BUFFER = 2 << 16, // clSVMAlloc
        OPENCL_SVM_FINE_GRAIN_SYSTEM = 3 << 16, // direct access
        OPENCL_SVM_BUFFER_MASK = 3 << 16,
        OPENCL_SVM_BUFFER_MAP = 4 << 16
};

static bool checkForceSVMUmatUsage()
{
    static bool initialized = false;
    static bool force = false;
    if (!initialized)
    {
        force = utils::getConfigurationParameterBool("OPENCV_OPENCL_SVM_FORCE_UMAT_USAGE", false);
        initialized = true;
    }
    return force;
}
static bool checkDisableSVMUMatUsage()
{
    static bool initialized = false;
    static bool force = false;
    if (!initialized)
    {
        force = utils::getConfigurationParameterBool("OPENCV_OPENCL_SVM_DISABLE_UMAT_USAGE", false);
        initialized = true;
    }
    return force;
}
static bool checkDisableSVM()
{
    static bool initialized = false;
    static bool force = false;
    if (!initialized)
    {
        force = utils::getConfigurationParameterBool("OPENCV_OPENCL_SVM_DISABLE", false);
        initialized = true;
    }
    return force;
}
// see SVMCapabilities
static unsigned int getSVMCapabilitiesMask()
{
    static bool initialized = false;
    static unsigned int mask = 0;
    if (!initialized)
    {
        const char* envValue = getenv("OPENCV_OPENCL_SVM_CAPABILITIES_MASK");
        if (envValue == NULL)
        {
            return ~0U; // all bits 1
        }
        mask = atoi(envValue);
        initialized = true;
    }
    return mask;
}
} // namespace
#endif
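
// Illustrative sketch, not part of OpenCV and not called by it. It decodes
// the svm::AllocatorFlags layout. The buffer kind is a 2-bit field at bits
// 16..17, so it has to be compared after masking with the buffer mask.
// Testing single bits is wrong because FINE_GRAIN_SYSTEM (3 << 16) overlaps
// both other kinds. The map flag is a separate bit, and the low 16 bits stay
// free for other flags. The k* constants copy the enum values above; the
// helper names are hypothetical.
namespace example_svm_flags {

const unsigned kCoarseGrainBuffer = 1u << 16;
const unsigned kFineGrainBuffer   = 2u << 16;
const unsigned kFineGrainSystem   = 3u << 16;
const unsigned kBufferMask        = 3u << 16;
const unsigned kBufferMap         = 4u << 16;

inline unsigned bufferKind(unsigned flags) { return flags & kBufferMask; }
inline bool isMapped(unsigned flags) { return (flags & kBufferMap) != 0; }
inline unsigned lowFlags(unsigned flags) { return flags & 0xFFFFu; }

} // namespace example_svm_flags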

static size_t getProgramCountLimit()
{
    static bool initialized = false;
    static size_t count = 0;
    if (!initialized)
    {
        count = utils::getConfigurationParameterSizeT("OPENCV_OPENCL_PROGRAM_CACHE", 0);
        initialized = true;
    }
    return count;
}
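
// Illustrative sketch, not part of OpenCV and not called by it. It shows an
// alternative to the "static bool initialized" pattern above. That pattern
// lets concurrent first calls race on the two statics. A function-local
// static is initialized exactly once, and C++11 makes that initialization
// thread-safe. parseSizeOr() is hypothetical: it accepts plain decimal only,
// while OpenCV's utils::getConfigurationParameterSizeT() may accept other
// formats. The code assumes <cstddef> and <cstdlib>, which this file already
// includes.
namespace example_lazy_config {

inline std::size_t parseSizeOr(const char* text, std::size_t defaultValue)
{
    if (text == nullptr || *text == '\0')
        return defaultValue;
    char* end = nullptr;
    unsigned long long v = std::strtoull(text, &end, 10);
    return (*end == '\0') ? static_cast<std::size_t>(v) : defaultValue;
}

inline std::size_t programCacheLimit()
{
    // Read once, on first call, in a thread-safe way.
    static const std::size_t limit = parseSizeOr(std::getenv("OPENCV_OPENCL_PROGRAM_CACHE"), 0);
    return limit;
}

} // namespace example_lazy_config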

static int g_contextId = 0;
class OpenCLBufferPoolImpl;
class OpenCLSVMBufferPoolImpl;

struct Context::Impl
{
    static Context::Impl* get(Context& context) { return context.p; }

    typedef std::deque<Context::Impl*> container_t;
    static container_t& getGlobalContainer()
    {
        // never delete this container (Impl lifetime is greater due to TLS storage)
        static container_t* g_contexts = new container_t();
        return *g_contexts;
    }

protected:
    Impl(const std::string& configuration_)
        : refcount(1)
        , contextId(CV_XADD(&g_contextId, 1))
        , configuration(configuration_)
        , handle(0)
#ifdef HAVE_OPENCL_SVM
        , svmInitialized(false)
#endif
    {
        if (!haveOpenCL())
            CV_Error(cv::Error::OpenCLApiCallError, "OpenCL runtime is not available!");

        cv::AutoLock lock(cv::getInitializationMutex());
        auto& container = getGlobalContainer();
        container.resize(std::max(container.size(), (size_t)contextId + 1));
        container[contextId] = this;
    }

    ~Impl()
    {
#ifdef _WIN32
        if (!cv::__termination)
#endif
        {
            if (handle)
            {
                CV_OCL_DBG_CHECK(clReleaseContext(handle));
                handle = NULL;
            }
            devices.clear();
        }

        userContextStorage.clear();

        {
            cv::AutoLock lock(cv::getInitializationMutex());
            auto& container = getGlobalContainer();
            CV_CheckLT((size_t)contextId, container.size(), "");
            container[contextId] = NULL;
        }
    }

    void init_device_list()
    {
        CV_Assert(handle);

        cl_uint ndevices = 0;
        CV_OCL_CHECK(clGetContextInfo(handle, CL_CONTEXT_NUM_DEVICES, sizeof(ndevices), &ndevices, NULL));
        CV_Assert(ndevices > 0);

        cv::AutoBuffer<cl_device_id> cl_devices(ndevices);
        size_t devices_ret_size = 0;
        CV_OCL_CHECK(clGetContextInfo(handle, CL_CONTEXT_DEVICES, cl_devices.size() * sizeof(cl_device_id), &cl_devices[0], &devices_ret_size));
        CV_CheckEQ(devices_ret_size, cl_devices.size() * sizeof(cl_device_id), "");

        devices.clear();
        for (unsigned i = 0; i < ndevices; i++)
        {
            devices.emplace_back(Device::fromHandle(cl_devices[i]));
        }
    }

    void __init_buffer_pools(); // w/o synchronization
    void _init_buffer_pools() const
    {
        if (!bufferPool_)
        {
            cv::AutoLock lock(cv::getInitializationMutex());
            if (!bufferPool_)
            {
                const_cast<Impl*>(this)->__init_buffer_pools();
            }
        }
    }
public:
    static Impl* findContext(const std::string& configuration)
    {
        CV_TRACE_FUNCTION();
        cv::AutoLock lock(cv::getInitializationMutex());
        auto& container = getGlobalContainer();
        if (configuration.empty() && !container.empty())
            return container[0];
        for (auto it = container.begin(); it != container.end(); ++it)
        {
            Impl* i = *it;
            if (i && i->configuration == configuration)
            {
                return i;
            }
        }
        return NULL;
    }

    static Impl* findOrCreateContext(const std::string& configuration_)
    {
        CV_TRACE_FUNCTION();
        std::string configuration = configuration_;
        if (configuration_.empty())
        {
            const char* c = getenv("OPENCV_OPENCL_DEVICE");
            if (c)
                configuration = c;
        }
        Impl* impl = findContext(configuration);
        if (impl)
        {
            CV_LOG_INFO(NULL, "OpenCL: reuse context@" << impl->contextId << " for configuration: " << configuration);
            impl->addref();
            return impl;
        }

        cl_device_id d = selectOpenCLDevice(configuration.empty() ? NULL : configuration.c_str());
        if (d == NULL)
            return NULL;

        impl = new Impl(configuration);
        try
        {
            impl->createFromDevice(d);
            if (impl->handle)
                return impl;
            delete impl;
            return NULL;
        }
        catch (...)
        {
            delete impl;
            throw;
        }
    }

    static Impl* findOrCreateContext(cl_context h)
    {
        CV_TRACE_FUNCTION();

        CV_Assert(h);

        std::string configuration = cv::format("@ctx-%p", (void*)h);
        Impl* impl = findContext(configuration);
        if (impl)
        {
            CV_LOG_INFO(NULL, "OpenCL: reuse context@" << impl->contextId << " for configuration: " << configuration);
            impl->addref();
            return impl;
        }

        impl = new Impl(configuration);
        try
        {
            CV_OCL_CHECK(clRetainContext(h));
            impl->handle = h;
            impl->init_device_list();
            return impl;
        }
        catch (...)
        {
            delete impl;
            throw;
        }
    }

    static Impl* findOrCreateContext(const ocl::Device& device)
    {
        CV_TRACE_FUNCTION();

        CV_Assert(!device.empty());
        cl_device_id d = (cl_device_id)device.ptr();
        CV_Assert(d);

        std::string configuration = cv::format("@dev-%p", (void*)d);
        Impl* impl = findContext(configuration);
        if (impl)
        {
            CV_LOG_INFO(NULL, "OpenCL: reuse context@" << impl->contextId << " for configuration: " << configuration);
            impl->addref();
            return impl;
        }

        impl = new Impl(configuration);
        try
        {
            impl->createFromDevice(d);
            CV_Assert(impl->handle);
            return impl;
        }
        catch (...)
        {
            delete impl;
            throw;
        }
    }

    void setDefault()
    {
        CV_TRACE_FUNCTION();
        cl_device_id d = selectOpenCLDevice();

        if (d == NULL)
            return;

        createFromDevice(d);
    }

    void createFromDevice(cl_device_id d)
    {
        CV_TRACE_FUNCTION();
        CV_Assert(handle == NULL);

        cl_platform_id pl = NULL;
        CV_OCL_DBG_CHECK(clGetDeviceInfo(d, CL_DEVICE_PLATFORM, sizeof(cl_platform_id), &pl, NULL));

        cl_context_properties prop[] =
        {
            CL_CONTEXT_PLATFORM, (cl_context_properties)pl,
            0
        };

        // !!! in the current implementation force the number of devices to 1 !!!
        cl_uint nd = 1;
        cl_int status;

        handle = clCreateContext(prop, nd, &d, 0, 0, &status);
        CV_OCL_DBG_CHECK_RESULT(status, "clCreateContext");

        bool ok = handle != 0 && status == CL_SUCCESS;
        if (ok)
        {
            devices.resize(nd);
            devices[0].set(d);
        }
        else
            handle = NULL;
    }

    Program getProg(const ProgramSource& src, const String& buildflags, String& errmsg);

    void unloadProg(Program& prog)
    {
        cv::AutoLock lock(program_cache_mutex);
        for (CacheList::iterator i = cacheList.begin(); i != cacheList.end(); ++i)
        {
            phash_t::iterator it = phash.find(*i);
            if (it != phash.end())
            {
                if (it->second.ptr() == prog.ptr())
                {
                    phash.erase(*i);
                    cacheList.erase(i);
                    return;
                }
            }
        }
    }

    std::string& getPrefixString()
    {
        if (prefix.empty())
        {
            cv::AutoLock lock(program_cache_mutex);
            if (prefix.empty())
            {
                CV_Assert(!devices.empty());
                const Device& d = devices[0];
                int bits = d.addressBits();
                if (bits > 0 && bits != 64)
                    prefix = cv::format("%d-bit--", bits);
                prefix += d.vendorName() + "--" + d.name() + "--" + d.driverVersion();
                // sanitize chars
                for (size_t i = 0; i < prefix.size(); i++)
                {
                    char c = prefix[i];
                    if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == '-'))
                    {
                        prefix[i] = '_';
                    }
                }
            }
        }
        return prefix;
    }

    std::string& getPrefixBase()
    {
        if (prefix_base.empty())
        {
            cv::AutoLock lock(program_cache_mutex);
            if (prefix_base.empty())
            {
                CV_Assert(!devices.empty());
                const Device& d = devices[0];
                int bits = d.addressBits();
                if (bits > 0 && bits != 64)
                    prefix_base = cv::format("%d-bit--", bits);
                prefix_base += d.vendorName() + "--" + d.name() + "--";
                // sanitize chars
                for (size_t i = 0; i < prefix_base.size(); i++)
                {
                    char c = prefix_base[i];
                    if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == '-'))
                    {
                        prefix_base[i] = '_';
                    }
                }
            }
        }
        return prefix_base;
    }

    IMPLEMENT_REFCOUNTABLE();

    const int contextId;  // global unique ID
    const std::string configuration;

    cl_context handle;
    std::vector<Device> devices;

    std::string prefix;
    std::string prefix_base;

    cv::Mutex program_cache_mutex;
    typedef std::map<std::string, Program> phash_t;
    phash_t phash;
    typedef std::list<cv::String> CacheList;
    CacheList cacheList;

    std::shared_ptr<OpenCLBufferPoolImpl> bufferPool_;
    std::shared_ptr<OpenCLBufferPoolImpl> bufferPoolHostPtr_;
    OpenCLBufferPoolImpl& getBufferPool() const
    {
        _init_buffer_pools();
        CV_DbgAssert(bufferPool_);
        return *bufferPool_.get();
    }
    OpenCLBufferPoolImpl& getBufferPoolHostPtr() const
    {
        _init_buffer_pools();
        CV_DbgAssert(bufferPoolHostPtr_);
        return *bufferPoolHostPtr_.get();
    }

    std::map<std::type_index, std::shared_ptr<UserContext>> userContextStorage;
    cv::Mutex userContextMutex;
    void setUserContext(std::type_index typeId, const std::shared_ptr<UserContext>& userContext) {
        cv::AutoLock lock(userContextMutex);
        userContextStorage[typeId] = userContext;
    }
    std::shared_ptr<UserContext> getUserContext(std::type_index typeId) {
        cv::AutoLock lock(userContextMutex);
        auto it = userContextStorage.find(typeId);
        if (it != userContextStorage.end())
            return it->second;
        else
            return nullptr;
    }

#ifdef HAVE_OPENCL_SVM
    bool svmInitialized;
    bool svmAvailable;
    bool svmEnabled;
    svm::SVMCapabilities svmCapabilities;
    svm::SVMFunctions svmFunctions;

    void svmInit()
    {
        CV_Assert(handle != NULL);
        const Device& device = devices[0];
        cl_device_svm_capabilities deviceCaps = 0;
        CV_Assert(((void)0, CL_DEVICE_SVM_CAPABILITIES == CL_DEVICE_SVM_CAPABILITIES_AMD)); // Check assumption
        cl_int status = clGetDeviceInfo((cl_device_id)device.ptr(), CL_DEVICE_SVM_CAPABILITIES, sizeof(deviceCaps), &deviceCaps, NULL);
        if (status != CL_SUCCESS)
        {
            CV_OPENCL_SVM_TRACE_ERROR_P("CL_DEVICE_SVM_CAPABILITIES via clGetDeviceInfo failed: %d\n", status);
            goto noSVM;
        }
        CV_OPENCL_SVM_TRACE_P("CL_DEVICE_SVM_CAPABILITIES returned: 0x%x\n", (int)deviceCaps);
        CV_Assert(((void)0, CL_DEVICE_SVM_COARSE_GRAIN_BUFFER == CL_DEVICE_SVM_COARSE_GRAIN_BUFFER_AMD)); // Check assumption
        svmCapabilities.value_ =
                ((deviceCaps & CL_DEVICE_SVM_COARSE_GRAIN_BUFFER) ? svm::SVMCapabilities::SVM_COARSE_GRAIN_BUFFER : 0) |
                ((deviceCaps & CL_DEVICE_SVM_FINE_GRAIN_BUFFER) ? svm::SVMCapabilities::SVM_FINE_GRAIN_BUFFER : 0) |
                ((deviceCaps & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM) ? svm::SVMCapabilities::SVM_FINE_GRAIN_SYSTEM : 0) |
                ((deviceCaps & CL_DEVICE_SVM_ATOMICS) ? svm::SVMCapabilities::SVM_ATOMICS : 0);
        svmCapabilities.value_ &= svm::getSVMCapabilitiesMask();
        if (svmCapabilities.value_ == 0)
        {
            CV_OPENCL_SVM_TRACE_ERROR_P("svmCapabilities is empty\n");
            goto noSVM;
        }
        try
        {
            // Try OpenCL 2.0
            CV_OPENCL_SVM_TRACE_P("Try SVM from OpenCL 2.0 ...\n");
            void* ptr = clSVMAlloc(handle, CL_MEM_READ_WRITE, 100, 0);
            if (!ptr)
            {
                CV_OPENCL_SVM_TRACE_ERROR_P("clSVMAlloc returned NULL...\n");
                CV_Error(Error::StsBadArg, "clSVMAlloc returned NULL");
            }
            try
            {
                bool error = false;
                cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();
                if (CL_SUCCESS != clEnqueueSVMMap(q, CL_TRUE, CL_MAP_WRITE, ptr, 100, 0, NULL, NULL))
                {
                    CV_OPENCL_SVM_TRACE_ERROR_P("clEnqueueSVMMap failed...\n");
                    CV_Error(Error::StsBadArg, "clEnqueueSVMMap FAILED");
                }
                clFinish(q);
                try
                {
                    ((int*)ptr)[0] = 100;
                }
                catch (...)
                {
                    CV_OPENCL_SVM_TRACE_ERROR_P("SVM buffer access test FAILED\n");
                    error = true;
                }
                if (CL_SUCCESS != clEnqueueSVMUnmap(q, ptr, 0, NULL, NULL))
                {
                    CV_OPENCL_SVM_TRACE_ERROR_P("clEnqueueSVMUnmap failed...\n");
                    CV_Error(Error::StsBadArg, "clEnqueueSVMUnmap FAILED");
                }
                clFinish(q);
                if (error)
                {
                    CV_Error(Error::StsBadArg, "OpenCL SVM buffer access test was FAILED");
                }
            }
            catch (...)
            {
                CV_OPENCL_SVM_TRACE_ERROR_P("OpenCL SVM buffer access test was FAILED\n");
                clSVMFree(handle, ptr);
                throw;
            }
            clSVMFree(handle, ptr);
            svmFunctions.fn_clSVMAlloc = clSVMAlloc;
            svmFunctions.fn_clSVMFree = clSVMFree;
            svmFunctions.fn_clSetKernelArgSVMPointer = clSetKernelArgSVMPointer;
            //svmFunctions.fn_clSetKernelExecInfo = clSetKernelExecInfo;
            //svmFunctions.fn_clEnqueueSVMFree = clEnqueueSVMFree;
            svmFunctions.fn_clEnqueueSVMMemcpy = clEnqueueSVMMemcpy;
            svmFunctions.fn_clEnqueueSVMMemFill = clEnqueueSVMMemFill;
            svmFunctions.fn_clEnqueueSVMMap = clEnqueueSVMMap;
            svmFunctions.fn_clEnqueueSVMUnmap = clEnqueueSVMUnmap;
        }
        catch (...)
        {
            CV_OPENCL_SVM_TRACE_P("clSVMAlloc failed, trying HSA extension...\n");
            try
            {
                // Try HSA extension
                String extensions = device.extensions();
                if (extensions.find("cl_amd_svm") == String::npos)
                {
                    CV_OPENCL_SVM_TRACE_P("Device extension doesn't have cl_amd_svm: %s\n", extensions.c_str());
                    goto noSVM;
                }
                cl_platform_id p = NULL;
                CV_OCL_CHECK(status = clGetDeviceInfo((cl_device_id)device.ptr(), CL_DEVICE_PLATFORM, sizeof(cl_platform_id), &p, NULL));
                svmFunctions.fn_clSVMAlloc = (clSVMAllocAMD_fn)clGetExtensionFunctionAddressForPlatform(p, "clSVMAllocAMD");
                svmFunctions.fn_clSVMFree = (clSVMFreeAMD_fn)clGetExtensionFunctionAddressForPlatform(p, "clSVMFreeAMD");
                svmFunctions.fn_clSetKernelArgSVMPointer = (clSetKernelArgSVMPointerAMD_fn)clGetExtensionFunctionAddressForPlatform(p, "clSetKernelArgSVMPointerAMD");
                //svmFunctions.fn_clSetKernelExecInfo = (clSetKernelExecInfoAMD_fn)clGetExtensionFunctionAddressForPlatform(p, "clSetKernelExecInfoAMD");
                //svmFunctions.fn_clEnqueueSVMFree = (clEnqueueSVMFreeAMD_fn)clGetExtensionFunctionAddressForPlatform(p, "clEnqueueSVMFreeAMD");
                svmFunctions.fn_clEnqueueSVMMemcpy = (clEnqueueSVMMemcpyAMD_fn)clGetExtensionFunctionAddressForPlatform(p, "clEnqueueSVMMemcpyAMD");
                svmFunctions.fn_clEnqueueSVMMemFill = (clEnqueueSVMMemFillAMD_fn)clGetExtensionFunctionAddressForPlatform(p, "clEnqueueSVMMemFillAMD");
                svmFunctions.fn_clEnqueueSVMMap = (clEnqueueSVMMapAMD_fn)clGetExtensionFunctionAddressForPlatform(p, "clEnqueueSVMMapAMD");
                svmFunctions.fn_clEnqueueSVMUnmap = (clEnqueueSVMUnmapAMD_fn)clGetExtensionFunctionAddressForPlatform(p, "clEnqueueSVMUnmapAMD");
                CV_Assert(svmFunctions.isValid());
            }
            catch (...)
            {
                CV_OPENCL_SVM_TRACE_P("Something is totally wrong\n");
                goto noSVM;
            }
        }

        svmAvailable = true;
        svmEnabled = !svm::checkDisableSVM();
        svmInitialized = true;
        CV_OPENCL_SVM_TRACE_P("OpenCV OpenCL SVM support initialized\n");
        return;
    noSVM:
        CV_OPENCL_SVM_TRACE_P("OpenCL SVM is not detected\n");
        svmAvailable = false;
        svmEnabled = false;
        svmCapabilities.value_ = 0;
        svmInitialized = true;
        svmFunctions.fn_clSVMAlloc = NULL;
        return;
    }

    std::shared_ptr<OpenCLSVMBufferPoolImpl> bufferPoolSVM_;
    OpenCLSVMBufferPoolImpl& getBufferPoolSVM() const
    {
        _init_buffer_pools();
        CV_DbgAssert(bufferPoolSVM_);
        return *bufferPoolSVM_.get();
    }
#endif

    friend class Program;
};
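
// Illustrative sketch, not part of OpenCV and not called by it. It restates
// how getPrefixString() builds its cache-key prefix:
// - An optional "<bits>-bit--" tag, added when the address space is not
//   64-bit.
// - Then vendor--name--driver.
// - Then every character outside [0-9A-Za-z_-] replaced with '_'.
// makePrefix() is a hypothetical name. It uses std::to_string where OpenCV
// uses cv::format; the result is the same for an int. The code assumes
// <string>, which this file already includes.
namespace example_cache_prefix {

inline std::string makePrefix(int addressBits, const std::string& vendor,
                              const std::string& name, const std::string& driver)
{
    std::string prefix;
    if (addressBits > 0 && addressBits != 64)
        prefix = std::to_string(addressBits) + "-bit--";
    prefix += vendor + "--" + name + "--" + driver;
    for (char& c : prefix)
    {
        bool keep = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') ||
                    (c >= 'A' && c <= 'Z') || c == '_' || c == '-';
        if (!keep)
            c = '_';
    }
    return prefix;
}

} // namespace example_cache_prefix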

Context::Context() CV_NOEXCEPT
{
    p = 0;
}

Context::~Context()
{
    release();
}

// deprecated
Context::Context(int dtype)
{
    p = 0;
    create(dtype);
}

void Context::release()
{
    if (p)
    {
        p->release();
        p = NULL;
    }
}

bool Context::create()
{
    release();
    if (!haveOpenCL())
        return false;
    p = Impl::findOrCreateContext(std::string());
    if (p && p->handle)
        return true;
    release();
    return false;
}
2020-08-12 02:13:52 +08:00
// deprecated
bool Context : : create ( int dtype )
2013-10-22 18:04:49 +08:00
{
2020-08-12 02:13:52 +08:00
if ( ! haveOpenCL ( ) )
return false ;
release ( ) ;
if ( dtype = = CL_DEVICE_TYPE_DEFAULT | | ( unsigned ) dtype = = ( unsigned ) CL_DEVICE_TYPE_ALL )
2013-12-27 18:02:03 +08:00
{
2020-08-12 02:13:52 +08:00
p = Impl : : findOrCreateContext ( " " ) ;
}
else if ( dtype = = CL_DEVICE_TYPE_GPU )
{
p = Impl : : findOrCreateContext ( " :GPU: " ) ;
2013-12-27 18:02:03 +08:00
}
2020-08-12 02:13:52 +08:00
else if ( dtype = = CL_DEVICE_TYPE_CPU )
{
p = Impl : : findOrCreateContext ( " :CPU: " ) ;
}
else
{
CV_LOG_ERROR ( NULL , " OpenCL: Can't recognize OpenCV device type= " < < dtype ) ;
}
if ( p & & ! p - > handle )
{
release ( ) ;
}
return p ! = 0 ;
2013-10-22 18:04:49 +08:00
}
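The deprecated `Context::create(int dtype)` above turns an OpenCL device-type value into the configuration string it passes to `Impl::findOrCreateContext`. Below is a minimal standalone sketch of that mapping. It assumes the standard OpenCL bit values for `CL_DEVICE_TYPE_*` and redefines them locally, so no OpenCL headers are needed. The helper `configForDeviceType` is hypothetical and is not part of OpenCV.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Local stand-ins for the OpenCL CL_DEVICE_TYPE_* values (assumption: they
// mirror cl.h so the sketch compiles without OpenCL headers).
constexpr std::uint32_t kDeviceTypeDefault = 1u << 0;
constexpr std::uint32_t kDeviceTypeCpu     = 1u << 1;
constexpr std::uint32_t kDeviceTypeGpu     = 1u << 2;
constexpr std::uint32_t kDeviceTypeAll     = 0xFFFFFFFFu;

// Hypothetical helper: returns the configuration string chosen by the branches
// of Context::create(int), or nothing for an unrecognized type (where create()
// logs an error and ends up returning false).
std::optional<std::string> configForDeviceType(int dtype)
{
    // Compare as unsigned, like the original, so that ALL (0xFFFFFFFF) matches
    // even though it arrives as a negative int.
    const std::uint32_t t = static_cast<std::uint32_t>(dtype);
    if (t == kDeviceTypeDefault || t == kDeviceTypeAll)
        return std::string();            // empty string: default device selection
    if (t == kDeviceTypeGpu)
        return std::string(":GPU:");
    if (t == kDeviceTypeCpu)
        return std::string(":CPU:");
    return std::nullopt;
}
```

Unrecognized types such as an accelerator (`1 << 3`) produce no configuration. This matches the original, which leaves `p` null and reports failure.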
2014-02-01 00:23:01 +08:00
Context::Context(const Context& c)
2013-10-22 18:04:49 +08:00
{
    p = (Impl*)c.p;
    if (p)
        p->addref();
}
2014-02-01 00:23:01 +08:00
Context& Context::operator=(const Context& c)
2013-10-22 18:04:49 +08:00
{
    Impl* newp = (Impl*)c.p;
    if (newp)
        newp->addref();
    if (p)
        p->release();
    p = newp;
    return *this;
}
2021-02-21 01:56:04 +08:00
Context::Context(Context&& c) CV_NOEXCEPT
{
    p = c.p;
    c.p = nullptr;
}
Context& Context::operator=(Context&& c) CV_NOEXCEPT
{
    if (this != &c) {
        if (p)
            p->release();
        p = c.p;
        c.p = nullptr;
    }
    return *this;
}
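`Context` is a thin handle over a reference-counted `Impl`. Copies call `addref()`, and destruction or reassignment calls `release()`. Copy assignment takes the new reference before dropping the old one, which keeps self-assignment safe. The move operations simply transfer the pointer.

The sketch below reproduces that pattern so the reference counts can be observed. It uses a hypothetical `Counted` payload and `Handle` class, which are illustrations and not OpenCV types.

```cpp
#include <cassert>
#include <utility>

// Hypothetical intrusive ref-counted payload standing in for Context::Impl.
struct Counted
{
    int refcount = 1;              // the creator owns the first reference
    static int destroyed;          // counts deletions for demonstration
    void addref() { ++refcount; }
    void release()
    {
        if (--refcount == 0)
        {
            ++destroyed;
            delete this;
        }
    }
};
int Counted::destroyed = 0;

// Hypothetical handle mirroring Context's copy/move members.
class Handle
{
public:
    Handle() noexcept : p(nullptr) {}
    explicit Handle(Counted* impl) noexcept : p(impl) {}  // adopts the creator's reference
    ~Handle() { if (p) p->release(); }

    Handle(const Handle& h) : p(h.p) { if (p) p->addref(); }
    Handle& operator=(const Handle& h)
    {
        Counted* newp = h.p;
        if (newp) newp->addref();  // take the new reference first...
        if (p) p->release();       // ...so self-assignment cannot free the object
        p = newp;
        return *this;
    }

    Handle(Handle&& h) noexcept : p(h.p) { h.p = nullptr; }
    Handle& operator=(Handle&& h) noexcept
    {
        if (this != &h)
        {
            if (p) p->release();
            p = h.p;
            h.p = nullptr;
        }
        return *this;
    }

    Counted* get() const { return p; }

private:
    Counted* p;
};
```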
2014-02-01 00:23:01 +08:00
void* Context::ptr() const
2013-10-22 18:04:49 +08:00
{
2013-12-27 18:02:03 +08:00
    return p == NULL ? NULL : p->handle;
2013-10-22 18:04:49 +08:00
}
2014-02-01 00:23:01 +08:00
size_t Context::ndevices() const
2013-10-22 18:04:49 +08:00
{
    return p ? p->devices.size() : 0;
}
2020-08-12 02:13:52 +08:00
Device& Context::device(size_t idx) const
2013-10-22 18:04:49 +08:00
{
    static Device dummy;
    return !p || idx >= p->devices.size() ? dummy : p->devices[idx];
}
2014-02-01 00:23:01 +08:00
Context& Context::getDefault(bool initialize)
2013-10-22 18:04:49 +08:00
{
2020-08-12 02:13:52 +08:00
    auto& c = OpenCLExecutionContext::getCurrent();
    if (!c.empty())
2013-10-22 18:04:49 +08:00
    {
2020-08-12 02:13:52 +08:00
        auto& ctx = c.getContext();
        return ctx;
2013-10-22 18:04:49 +08:00
    }
2020-08-12 02:13:52 +08:00
    CV_UNUSED(initialize);
    static Context dummy;
    return dummy;
2013-10-22 18:04:49 +08:00
}
2014-02-01 00:23:01 +08:00
Program Context::getProg(const ProgramSource& prog,
2013-10-22 18:04:49 +08:00
                         const String& buildopts, String& errmsg)
{
    return p ? p->getProg(prog, buildopts, errmsg) : Program();
}
2017-08-25 08:42:11 +08:00
void Context::unloadProg(Program& prog)
{
    if (p)
        p->unloadProg(prog);
}
2015-01-02 08:33:40 +08:00
2020-08-12 02:13:52 +08:00
/* static */
Context Context::fromHandle(void* context)
{
    Context ctx;
    ctx.p = Impl::findOrCreateContext((cl_context)context);
    return ctx;
}
/* static */
Context Context::fromDevice(const ocl::Device& device)
{
    Context ctx;
    ctx.p = Impl::findOrCreateContext(device);
    return ctx;
}
/* static */
Context Context::create(const std::string& configuration)
{
    Context ctx;
    ctx.p = Impl::findOrCreateContext(configuration);
    return ctx;
}
2021-05-15 00:48:50 +08:00
void* Context::getOpenCLContextProperty(int propertyId) const
{
    if (p == NULL)
        return nullptr;
    ::size_t size = 0;
    CV_OCL_CHECK(clGetContextInfo(p->handle, CL_CONTEXT_PROPERTIES, 0, NULL, &size));
    std::vector<cl_context_properties> prop(size / sizeof(cl_context_properties), (cl_context_properties)0);
    CV_OCL_CHECK(clGetContextInfo(p->handle, CL_CONTEXT_PROPERTIES, size, prop.data(), NULL));
    for (size_t i = 0; i < prop.size(); i += 2)
    {
        if (prop[i] == (cl_context_properties)propertyId)
        {
            CV_LOG_DEBUG(NULL, "OpenCL: found context property=" << propertyId << ") => " << (void*)prop[i + 1]);
            return (void*)prop[i + 1];
        }
    }
    return nullptr;
}
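`getOpenCLContextProperty` reads `CL_CONTEXT_PROPERTIES`, which OpenCL returns as a flat, zero-terminated list of (name, value) pairs, and scans it two entries at a time. Below is a self-contained sketch of the same scan. It uses `std::intptr_t` as a stand-in for `cl_context_properties` (an assumption, so no OpenCL headers are needed), and `findProperty` is a hypothetical name. Unlike the loop above, this version stops explicitly at the terminating zero and only reads `props[i + 1]` when that element exists.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Property = std::intptr_t;   // stand-in for cl_context_properties (assumption)

// Hypothetical helper: scan a zero-terminated list of (name, value) pairs and
// return the value stored for `name`, or 0 if the name is absent.
Property findProperty(const std::vector<Property>& props, Property name)
{
    for (std::size_t i = 0; i + 1 < props.size(); i += 2)
    {
        if (props[i] == 0)        // terminator: no more pairs
            break;
        if (props[i] == name)
            return props[i + 1];
    }
    return 0;
}
```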
2015-01-02 08:33:40 +08:00
#ifdef HAVE_OPENCL_SVM
bool Context::useSVM() const
{
    Context::Impl* i = p;
    CV_Assert(i);
    if (!i->svmInitialized)
        i->svmInit();
    return i->svmEnabled;
}
void Context::setUseSVM(bool enabled)
{
    Context::Impl* i = p;
    CV_Assert(i);
    if (!i->svmInitialized)
        i->svmInit();
    if (enabled && !i->svmAvailable)
    {
2018-04-24 00:02:39 +08:00
        CV_Error(Error::StsError, "OpenCL Shared Virtual Memory (SVM) is not supported by OpenCL device");
2015-01-02 08:33:40 +08:00
    }
    i->svmEnabled = enabled;
}
#else
bool Context::useSVM() const { return false; }
void Context::setUseSVM(bool enabled) { CV_Assert(!enabled); }
#endif
#ifdef HAVE_OPENCL_SVM
namespace svm {
const SVMCapabilities getSVMCapabilitites(const ocl::Context& context)
{
    Context::Impl* i = context.p;
    CV_Assert(i);
    if (!i->svmInitialized)
        i->svmInit();
    return i->svmCapabilities;
}
CV_EXPORTS const SVMFunctions* getSVMFunctions(const ocl::Context& context)
{
    Context::Impl* i = context.p;
    CV_Assert(i);
    CV_Assert(i->svmInitialized); // getSVMCapabilitites() must be called first
    CV_Assert(i->svmFunctions.fn_clSVMAlloc != NULL);
    return &i->svmFunctions;
}
CV_EXPORTS bool useSVM(UMatUsageFlags usageFlags)
{
    if (checkForceSVMUmatUsage())
        return true;
    if (checkDisableSVMUMatUsage())
        return false;
    if ((usageFlags & USAGE_ALLOCATE_SHARED_MEMORY) != 0)
        return true;
    return false; // don't use SVM by default
}
} // namespace cv::ocl::svm
#endif // HAVE_OPENCL_SVM
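`svm::useSVM(UMatUsageFlags)` resolves in a fixed priority order:

1. A forced-on setting wins.
2. Otherwise, a forced-off setting wins.
3. Otherwise, the caller's `USAGE_ALLOCATE_SHARED_MEMORY` flag decides.
4. If none of these applies, SVM stays off.

Below is a sketch of that precedence as a pure function. The two booleans stand in for `checkForceSVMUmatUsage()` and `checkDisableSVMUMatUsage()`, and `kAllocateSharedMemory` is a placeholder bit value. Both are assumptions for illustration, not OpenCV's definitions.

```cpp
#include <cassert>

// Placeholder bit for USAGE_ALLOCATE_SHARED_MEMORY (assumed value, illustration only).
constexpr int kAllocateSharedMemory = 1 << 2;

// Hypothetical pure version of svm::useSVM(): the override switches are passed
// in as arguments instead of being read from the environment.
bool decideUseSVM(bool forceSVM, bool disableSVM, int usageFlags)
{
    if (forceSVM)
        return true;                                   // forced on beats everything
    if (disableSVM)
        return false;                                  // forced off beats the caller's request
    return (usageFlags & kAllocateSharedMemory) != 0;  // honor the flag; default is off
}
```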
2021-05-15 00:48:50 +08:00
Context::UserContext::~UserContext()
{
}
void Context::setUserContext(std::type_index typeId, const std::shared_ptr<Context::UserContext>& userContext)
{
    CV_Assert(p);
    p->setUserContext(typeId, userContext);
}
std::shared_ptr<Context::UserContext> Context::getUserContext(std::type_index typeId)
{
    CV_Assert(p);
    return p->getUserContext(typeId);
}
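`setUserContext` and `getUserContext` let callers attach arbitrary per-context state, keyed by a `std::type_index`. The storage inside `Context::Impl` is not shown in this section. The sketch below assumes a simple `std::map` from type to a `std::shared_ptr` of a polymorphic base, and shows how a typed lookup can be layered on top. The names `UserState`, `Registry`, and `MyCache` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <typeindex>
#include <typeinfo>

// Hypothetical polymorphic base, playing the role of Context::UserContext.
struct UserState
{
    virtual ~UserState() {}
};

// Hypothetical registry: at most one shared object per concrete type.
class Registry
{
public:
    void set(std::type_index id, const std::shared_ptr<UserState>& s) { items_[id] = s; }

    std::shared_ptr<UserState> get(std::type_index id) const
    {
        auto it = items_.find(id);
        if (it == items_.end())
            return std::shared_ptr<UserState>();
        return it->second;
    }

    // Typed convenience wrappers keyed by typeid(T).
    template <typename T> void setTyped(const std::shared_ptr<T>& s) { set(typeid(T), s); }
    template <typename T> std::shared_ptr<T> getTyped() const
    {
        return std::dynamic_pointer_cast<T>(get(typeid(T)));
    }

private:
    std::map<std::type_index, std::shared_ptr<UserState>> items_;
};

// Example payload type.
struct MyCache : UserState
{
    int hits = 0;
};
```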
2015-01-02 08:33:40 +08:00
OpenCV-OpenCL interop (PR #4072):
Commits:
added new function, cv::ocl::attachContext(String& platformName, void* platformID, void* context, void* deviceID) which allow to attach externally created OpenCL context to OpenCV.
add definitions of clRetainDevice, clRetainContext funcs
removed definitions for clRetainContext, clRetainDevice
fixed build issue under Linux
fixed uninitialized vars, replace dbgassert in error handling
remove function which is not ready yet
add new function, cv::ocl::convertFromBuffer(int rows, int cols, int type, void* cl_mem_obj, UMat& dst, UMatUsageFlags usageFlags = cv::USAGE_DEFAULT) which attaches user allocated OpenCL clBuffer to UMat
uncommented clGetMemObjectInfo definition (otherwise prevent opencv build)
fixed build issue on linux and android
add step parameter to cv::ocl::convertFromBuffer func
suppress compile-time warning
added sample opencl-opencv interoperability (showcase for cv::ocl::convertFromBuffer func)
CMakeLists.txt modified to not create sample build script if OpenCL SDK not found in system
fixed build issue (apple opencl include dir and spaces in CMake file)
added call to clRetainContext for attachContext func and call to clRetainMemObject for convertFromBuffer func
uncommented clRetainMemObject definition
added comments and cleanup
add local path to cmake modules search dirs (instead of replacing)
remove REQUIRED for find_package call (sample build together with opencv). need to try standalone sample build
opencl-interop sample moved to standalone build
set minimum version requirement for sample's cmake to 3.1
put cmake_minimum_required under condition, so do not check if samples not builded
remove code dups for setSize, updateContinuityFlag, and finalizeHdr
commented out cmake_minimum_required(VERSION 3.1)
add safety check for cmake version
add convertFromImage func and update opencl-interop sample
uncommented clGetImageInfo defs
uncommented clEnqueueCopyImageToBuffer defs
fixed clEnqueueCopyImageToBuffer defs
add doxygen comments
remove doxygen @fn tag
try to restart buildbot
add doxygen comments to directx interop funcs
remove internal header, use fwd declarations in affected compile units instead
2015-05-28 04:22:33 +08:00
static void get_platform_name(cl_platform_id id, String& name)
{
    // get platform name string length
    size_t sz = 0;
2017-11-01 23:18:54 +08:00
    CV_OCL_CHECK(clGetPlatformInfo(id, CL_PLATFORM_NAME, 0, 0, &sz));
2015-05-28 04:22:33 +08:00
    // get platform name string
    AutoBuffer<char> buf(sz + 1);
2018-06-11 06:42:00 +08:00
    CV_OCL_CHECK(clGetPlatformInfo(id, CL_PLATFORM_NAME, sz, buf.data(), 0));
2015-05-28 04:22:33 +08:00
    // just in case, ensure trailing zero for ASCIIZ string
    buf[sz] = 0;
2018-06-11 06:42:00 +08:00
    name = buf.data();
2015-05-28 04:22:33 +08:00
}
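`get_platform_name` uses the usual OpenCL two-call idiom:

1. Query the size with a null buffer.
2. Allocate one extra byte.
3. Fetch the data.
4. Force a terminating NUL.

Below is a standalone sketch of the idiom. `fakeGetInfo` is a hypothetical stand-in for `clGetPlatformInfo`. It follows the same size/pointer contract, so no OpenCL runtime is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for clGetPlatformInfo(CL_PLATFORM_NAME): reports the
// required size (including the NUL) and copies the value when a large enough
// buffer is supplied. Returns 0 on success, -1 if the buffer is too small.
int fakeGetInfo(std::size_t bufSize, void* buf, std::size_t* sizeRet)
{
    static const char kName[] = "Example Platform";
    if (sizeRet)
        *sizeRet = sizeof(kName);
    if (buf)
    {
        if (bufSize < sizeof(kName))
            return -1;
        std::memcpy(buf, kName, sizeof(kName));
    }
    return 0;
}

// Two-call query: size first, then data, then force NUL termination.
std::string queryName()
{
    std::size_t sz = 0;
    if (fakeGetInfo(0, nullptr, &sz) != 0)
        return std::string();
    std::vector<char> buf(sz + 1);      // one spare byte, as in get_platform_name
    if (fakeGetInfo(sz, buf.data(), nullptr) != 0)
        return std::string();
    buf[sz] = '\0';                     // guarantee termination even if the data lacks it
    return std::string(buf.data());
}
```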
// Attaches an externally created OpenCL context to OpenCV
void attachContext(const String& platformName, void* platformID, void* context, void* deviceID)
{
2020-08-12 02:13:52 +08:00
    auto ctx = OpenCLExecutionContext::create(platformName, platformID, context, deviceID);
    ctx.bind();
}
/* static */
OpenCLExecutionContext OpenCLExecutionContext::create(
        const std::string& platformName, void* platformID, void* context, void* deviceID
)
{
    if (!haveOpenCL())
        CV_Error(cv::Error::OpenCLApiCallError, "OpenCL runtime is not available!");
2015-05-28 04:22:33 +08:00
2020-08-12 02:13:52 +08:00
    cl_uint cnt = 0;
2017-11-01 23:18:54 +08:00
    CV_OCL_CHECK(clGetPlatformIDs(0, 0, &cnt));
2015-05-28 04:22:33 +08:00
    if (cnt == 0)
2020-08-12 02:13:52 +08:00
        CV_Error(cv::Error::OpenCLApiCallError, "No OpenCL platform available!");
2015-05-28 04:22:33 +08:00
    std::vector<cl_platform_id> platforms(cnt);
2017-11-01 23:18:54 +08:00
    CV_OCL_CHECK(clGetPlatformIDs(cnt, &platforms[0], 0));
2015-05-28 04:22:33 +08:00
    bool platformAvailable = false;
    // check whether the external platformName is in the list of platforms available to OpenCV
    for (unsigned int i = 0; i < cnt; i++)
    {
        String availablePlatformName;
        get_platform_name(platforms[i], availablePlatformName);
        // external platform is found in the list of available platforms
        if (platformName == availablePlatformName)
        {
            platformAvailable = true;
            break;
        }
    }
    if (!platformAvailable)
2018-04-24 00:02:39 +08:00
        CV_Error(cv::Error::OpenCLApiCallError, "No matched platforms available!");
2015-05-28 04:22:33 +08:00
    // check if platformID corresponds to platformName
    String actualPlatformName;
    get_platform_name((cl_platform_id)platformID, actualPlatformName);
    if (platformName != actualPlatformName)
2018-04-24 00:02:39 +08:00
        CV_Error(cv::Error::OpenCLApiCallError, "No matched platforms available!");
2015-05-28 04:22:33 +08:00
2020-08-12 02:13:52 +08:00
    OpenCLExecutionContext ctx;
    ctx.p = std::make_shared<OpenCLExecutionContext::Impl>((cl_platform_id)platformID, (cl_context)context, (cl_device_id)deviceID);
    CV_OCL_CHECK(clReleaseContext((cl_context)context));
    CV_OCL_CHECK(clReleaseDevice((cl_device_id)deviceID));
    return ctx;
}
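Before wrapping the handles, `OpenCLExecutionContext::create` performs two checks:

1. The requested `platformName` must appear among the enumerated platforms.
2. The name reported for `platformID` itself must equal `platformName`.

Below is a sketch of that validation logic with platforms modeled as plain (id, name) records. `Platform` and `validatePlatform` are hypothetical, and `std::runtime_error` stands in for `CV_Error`. One simplification: the real code queries the runtime for the name of `platformID`, whereas this sketch looks it up in the same list.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of a platform: an opaque id plus the name the runtime reports.
struct Platform
{
    int id;
    std::string name;
};

// Hypothetical helper mirroring the checks in OpenCLExecutionContext::create;
// throws std::runtime_error where OpenCV raises CV_Error.
void validatePlatform(const std::vector<Platform>& available,
                      const std::string& platformName, int platformID)
{
    if (available.empty())
        throw std::runtime_error("No OpenCL platform available!");

    // 1. The requested name must be one of the enumerated platforms.
    auto byName = std::find_if(available.begin(), available.end(),
                               [&](const Platform& p) { return p.name == platformName; });
    if (byName == available.end())
        throw std::runtime_error("No matched platforms available!");

    // 2. The caller's platform id must report that same name
    //    (looked up in the list here; the real code asks the runtime).
    auto byId = std::find_if(available.begin(), available.end(),
                             [&](const Platform& p) { return p.id == platformID; });
    if (byId == available.end() || byId->name != platformName)
        throw std::runtime_error("No matched platforms available!");
}
```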
2015-01-02 08:33:40 +08:00
2020-08-12 02:13:52 +08:00
void initializeContextFromHandle(Context& ctx, void* _platform, void* _context, void* _device)
2013-12-02 05:50:24 +08:00
{
2020-08-12 02:13:52 +08:00
    // internal call, fewer checks
    cl_platform_id platformID = (cl_platform_id)_platform;
2013-12-02 05:50:24 +08:00
    cl_context context = (cl_context)_context;
2020-08-12 02:13:52 +08:00
    cl_device_id deviceID = (cl_device_id)_device;
2013-12-02 05:50:24 +08:00
2020-09-26 03:22:12 +08:00
    std::string platformName = PlatformInfo(&platformID).name();
2013-12-02 05:50:24 +08:00
2020-08-12 02:13:52 +08:00
    auto clExecCtx = OpenCLExecutionContext::create(platformName, platformID, context, deviceID);
    CV_Assert(!clExecCtx.empty());
    ctx = clExecCtx.getContext();
2013-12-02 05:50:24 +08:00
}

/////////////////////////////////////////// Queue /////////////////////////////////////////////

struct Queue::Impl
{
    inline void __init()
    {
        refcount = 1;
        handle = 0;
        isProfilingQueue_ = false;
    }

    Impl(cl_command_queue q)
    {
        __init();
        handle = q;

        cl_command_queue_properties props = 0;
        CV_OCL_CHECK(clGetCommandQueueInfo(handle, CL_QUEUE_PROPERTIES, sizeof(cl_command_queue_properties), &props, NULL));
        isProfilingQueue_ = !!(props & CL_QUEUE_PROFILING_ENABLE);
    }

    Impl(cl_command_queue q, bool isProfilingQueue)
    {
        __init();
        handle = q;
        isProfilingQueue_ = isProfilingQueue;
    }

    Impl(const Context& c, const Device& d, bool withProfiling = false)
    {
        __init();

        const Context* pc = &c;
        cl_context ch = (cl_context)pc->ptr();
        if( !ch )
        {
            pc = &Context::getDefault();
            ch = (cl_context)pc->ptr();
        }
        cl_device_id dh = (cl_device_id)d.ptr();
        if( !dh )
            dh = (cl_device_id)pc->device(0).ptr();
        cl_int retval = 0;
        cl_command_queue_properties props = withProfiling ? CL_QUEUE_PROFILING_ENABLE : 0;
        CV_OCL_DBG_CHECK_(handle = clCreateCommandQueue(ch, dh, props, &retval), retval);
        isProfilingQueue_ = withProfiling;
    }

    ~Impl()
    {
#ifdef _WIN32
        if (!cv::__termination)
#endif
        {
            if(handle)
            {
                CV_OCL_DBG_CHECK(clFinish(handle));
                CV_OCL_DBG_CHECK(clReleaseCommandQueue(handle));
                handle = NULL;
            }
        }
    }

    const cv::ocl::Queue& getProfilingQueue(const cv::ocl::Queue& self)
    {
        if (isProfilingQueue_)
            return self;

        if (profiling_queue_.ptr())
            return profiling_queue_;

        cl_context ctx = 0;
        CV_OCL_CHECK(clGetCommandQueueInfo(handle, CL_QUEUE_CONTEXT, sizeof(cl_context), &ctx, NULL));

        cl_device_id device = 0;
        CV_OCL_CHECK(clGetCommandQueueInfo(handle, CL_QUEUE_DEVICE, sizeof(cl_device_id), &device, NULL));

        cl_int result = CL_SUCCESS;
        cl_command_queue_properties props = CL_QUEUE_PROFILING_ENABLE;
        cl_command_queue q = clCreateCommandQueue(ctx, device, props, &result);
        CV_OCL_DBG_CHECK_RESULT(result, "clCreateCommandQueue(with CL_QUEUE_PROFILING_ENABLE)");

        Queue queue;
        queue.p = new Impl(q, true);
        profiling_queue_ = queue;

        return profiling_queue_;
    }

    IMPLEMENT_REFCOUNTABLE();

    cl_command_queue handle;
    bool isProfilingQueue_;
    cv::ocl::Queue profiling_queue_;
};

Queue::Queue() CV_NOEXCEPT
{
    p = 0;
}

Queue::Queue(const Context& c, const Device& d)
{
    p = 0;
    create(c, d);
}

Queue::Queue(const Queue& q)
{
    p = q.p;
    if(p)
        p->addref();
}

Queue& Queue::operator = (const Queue& q)
{
    Impl* newp = (Impl*)q.p;
    if(newp)
        newp->addref();
    if(p)
        p->release();
    p = newp;
    return *this;
}

Queue::Queue(Queue&& q) CV_NOEXCEPT
{
    p = q.p;
    q.p = nullptr;
}

Queue& Queue::operator = (Queue&& q) CV_NOEXCEPT
{
    if (this != &q) {
        if(p)
            p->release();
        p = q.p;
        q.p = nullptr;
    }
    return *this;
}

Queue::~Queue()
{
    if(p)
        p->release();
}
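
// Sketch (illustration only, not part of OpenCV): the copy/move members above
// implement intrusive reference counting over Impl. The self-contained model
// below uses the hypothetical names Impl and Handle and models only the
// addref()/release() bookkeeping, with no OpenCL objects involved. Copy
// assignment takes the new reference before dropping the old one, so `q = q`
// cannot destroy the shared Impl, and a moved-from handle is left empty.
/*
```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for Queue::Impl: only the refcount logic is modeled.
struct Impl
{
    int refcount = 1;
    static int liveCount;   // number of Impl objects currently alive
    Impl() { ++liveCount; }
    ~Impl() { --liveCount; }
    void addref() { ++refcount; }
    void release() { if (--refcount == 0) delete this; }
};
int Impl::liveCount = 0;

// Hypothetical handle following the same copy/move rules as cv::ocl::Queue.
class Handle
{
public:
    Handle() noexcept : p(nullptr) {}
    explicit Handle(Impl* impl) noexcept : p(impl) {}   // adopts the initial reference
    Handle(const Handle& h) : p(h.p) { if (p) p->addref(); }
    Handle& operator=(const Handle& h)
    {
        Impl* newp = h.p;
        if (newp) newp->addref();   // take the new reference first (self-assignment safe)
        if (p) p->release();
        p = newp;
        return *this;
    }
    Handle(Handle&& h) noexcept : p(h.p) { h.p = nullptr; }
    Handle& operator=(Handle&& h) noexcept
    {
        if (this != &h) {
            if (p) p->release();
            p = h.p;
            h.p = nullptr;
        }
        return *this;
    }
    ~Handle() { if (p) p->release(); }
    Impl* get() const { return p; }
private:
    Impl* p;
};
```
Usage: copying a Handle raises refcount to 2, and the Impl is deleted only
when the last handle is destroyed.
*/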

bool Queue::create(const Context& c, const Device& d)
{
    if(p)
        p->release();
    p = new Impl(c, d);
    return p->handle != 0;
}

void Queue::finish()
{
    if(p && p->handle)
    {
        CV_OCL_DBG_CHECK(clFinish(p->handle));
    }
}

const Queue& Queue::getProfilingQueue() const
{
    CV_Assert(p);
    return p->getProfilingQueue(*this);
}

void* Queue::ptr() const
{
    return p ? p->handle : 0;
}

Queue& Queue::getDefault()
{
    auto& c = OpenCLExecutionContext::getCurrent();
    if (!c.empty())
    {
        auto& q = c.getQueue();
        return q;
    }
    static Queue dummy;
    return dummy;
}

static cl_command_queue getQueue(const Queue& q)
{
    cl_command_queue qq = (cl_command_queue)q.ptr();
    if(!qq)
        qq = (cl_command_queue)Queue::getDefault().ptr();
    return qq;
}

/////////////////////////////////////////// KernelArg /////////////////////////////////////////////

KernelArg::KernelArg() CV_NOEXCEPT
    : flags(0), m(0), obj(0), sz(0), wscale(1), iwscale(1)
{
}

KernelArg::KernelArg(int _flags, UMat* _m, int _wscale, int _iwscale, const void* _obj, size_t _sz)
    : flags(_flags), m(_m), obj(_obj), sz(_sz), wscale(_wscale), iwscale(_iwscale)
{
    CV_Assert(_flags == LOCAL || _flags == CONSTANT || _m != NULL);
}

KernelArg KernelArg::Constant(const Mat& m)
{
    CV_Assert(m.isContinuous());
    return KernelArg(CONSTANT, 0, 0, 0, m.ptr(), m.total()*m.elemSize());
}

/////////////////////////////////////////// Kernel /////////////////////////////////////////////

struct Kernel::Impl
{
    Impl(const char* kname, const Program& prog) :
        refcount(1), handle(NULL), isInProgress(false), isAsyncRun(false), nu(0)
    {
        cl_program ph = (cl_program)prog.ptr();
        cl_int retval = 0;
        name = kname;
        if (ph)
        {
            handle = clCreateKernel(ph, kname, &retval);
            CV_OCL_DBG_CHECK_RESULT(retval, cv::format("clCreateKernel('%s')", kname).c_str());
        }
        for( int i = 0; i < MAX_ARRS; i++ )
            u[i] = 0;
        haveTempDstUMats = false;
        haveTempSrcUMats = false;
    }

    void cleanupUMats()
    {
        bool exceptionOccurred = false;
        for( int i = 0; i < MAX_ARRS; i++ )
        {
            if( u[i] )
            {
                if( CV_XADD(&u[i]->urefcount, -1) == 1 )
                {
                    u[i]->flags |= UMatData::ASYNC_CLEANUP;
                    try
                    {
                        u[i]->currAllocator->deallocate(u[i]);
                    }
                    catch(const std::exception& exc)
                    {
                        // limited by legacy before C++11, therefore log and
                        // remember some exception occurred to throw below
                        CV_LOG_ERROR(NULL, "OCL: Unexpected C++ exception in OpenCL Kernel::Impl::cleanupUMats(): " << exc.what());
                        exceptionOccurred = true;
                    }
                }
                u[i] = 0;
            }
        }
        nu = 0;
        haveTempDstUMats = false;
        haveTempSrcUMats = false;
        CV_Assert(!exceptionOccurred);
    }

    void addUMat(const UMat& m, bool dst)
    {
        CV_Assert(nu < MAX_ARRS && m.u && m.u->urefcount > 0);
        u[nu] = m.u;
        CV_XADD(&m.u->urefcount, 1);
        nu++;
        if(dst && m.u->tempUMat())
            haveTempDstUMats = true;
        if (m.u->originalUMatData == NULL && m.u->tempUMat())
            haveTempSrcUMats = true;  // UMat is created on RAW memory (without proper lifetime management, even from Mat)
    }

    /// Preserve image lifetime (while it is specified as Kernel argument)
    void registerImageArgument(int arg, const Image2D& image)
    {
        CV_CheckGE(arg, 0, "");
        if (arg < (int)shadow_images.size() && shadow_images[arg].ptr() != image.ptr())  // TODO future: replace ptr => impl (stronger check)
        {
            CV_Check(arg, !isInProgress, "ocl::Kernel: clearing of pending Image2D arguments is not allowed");
        }
        shadow_images.reserve(MAX_ARRS);
        shadow_images.resize(std::max(shadow_images.size(), (size_t)arg + 1));
        shadow_images[arg] = image;
    }

    void finit(cl_event e)
    {
        CV_UNUSED(e);
        isInProgress = false;
        try
        {
            cleanupUMats();
        }
        catch(...)
        {
            release();
            throw;
        }
        release();
    }

    bool run(int dims, size_t _globalsize[], size_t _localsize[],
            bool sync, int64* timeNS, const Queue& q);

    ~Impl()
    {
        if(handle)
        {
            CV_OCL_DBG_CHECK(clReleaseKernel(handle));
        }
    }

    IMPLEMENT_REFCOUNTABLE();

    cv::String name;
    cl_kernel handle;
    enum { MAX_ARRS = 16 };
    UMatData* u[MAX_ARRS];
    bool isInProgress;
    bool isAsyncRun;  // true if kernel was scheduled in async mode
    int nu;
    std::vector<Image2D> shadow_images;
    bool haveTempDstUMats;
    bool haveTempSrcUMats;
};
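
// Sketch (illustration only): cleanupUMats() keeps releasing the remaining
// buffers when one deallocate() call throws, and reports the failure only
// after the loop. The model below shows that "defer the error" pattern with a
// hypothetical Resource type and releaseAll() function; unlike the code above,
// which logs and then fails through CV_Assert, it throws std::runtime_error.
/*
```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical resource whose release may fail.
struct Resource
{
    bool failOnRelease;
    bool released;
};

// Release every resource even if some releases throw; report once at the end
// so that no slot is skipped (same idea as cleanupUMats()).
void releaseAll(std::vector<Resource*>& items)
{
    bool exceptionOccurred = false;
    for (Resource*& r : items)
    {
        if (!r)
            continue;
        try
        {
            if (r->failOnRelease)
                throw std::runtime_error("release failed");
            r->released = true;
        }
        catch (const std::exception&)
        {
            exceptionOccurred = true;   // remember the failure, keep going
        }
        r = nullptr;                    // the slot is cleared either way
    }
    if (exceptionOccurred)
        throw std::runtime_error("at least one release failed");
}
```
*/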

} } // namespace cv::ocl

extern "C" {

static void CL_CALLBACK oclCleanupCallback(cl_event e, cl_int, void *p)
{
    try
    {
        ((cv::ocl::Kernel::Impl*)p)->finit(e);
    }
    catch (const cv::Exception& exc)
    {
        CV_LOG_ERROR(NULL, "OCL: Unexpected OpenCV exception in OpenCL callback: " << exc.what());
    }
    catch (const std::exception& exc)
    {
        CV_LOG_ERROR(NULL, "OCL: Unexpected C++ exception in OpenCL callback: " << exc.what());
    }
    catch (...)
    {
        CV_LOG_ERROR(NULL, "OCL: Unexpected unknown C++ exception in OpenCL callback");
    }
}

}
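
// Sketch (illustration only): for an asynchronous launch, run() and runTask()
// take an extra reference (addref()) and set isInProgress before registering
// oclCleanupCallback, and finit() drops that reference when the event
// completes. This keeps Kernel::Impl alive even if the caller's Kernel object
// is destroyed first. Job, FakeEvent and launchAsync are hypothetical
// stand-ins; FakeEvent replaces clSetEventCallback with a plain callback list.
/*
```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for Kernel::Impl: refcount plus in-progress flag.
struct Job
{
    int refcount = 1;
    bool inProgress = false;
    static int destroyed;
    void addref() { ++refcount; }
    void release() { if (--refcount == 0) { ++destroyed; delete this; } }
    void finit() { inProgress = false; release(); }   // completion handler
};
int Job::destroyed = 0;

// Hypothetical event: callbacks fire when complete() is called.
struct FakeEvent
{
    std::vector<std::function<void()>> callbacks;
    void complete()
    {
        for (auto& cb : callbacks)
            cb();
        callbacks.clear();
    }
};

// Asynchronous launch: pin the job with an extra reference until completion.
void launchAsync(Job* job, FakeEvent& ev)
{
    job->addref();
    job->inProgress = true;
    ev.callbacks.push_back([job] { job->finit(); });
}
```
Usage: the caller may release its own reference right after launchAsync();
the job is destroyed only when the event completes.
*/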
namespace cv { namespace ocl {

Kernel::Kernel() CV_NOEXCEPT
{
    p = 0;
}

Kernel::Kernel(const char* kname, const Program& prog)
{
    p = 0;
    create(kname, prog);
}

Kernel::Kernel(const char* kname, const ProgramSource& src,
               const String& buildopts, String* errmsg)
{
    p = 0;
    create(kname, src, buildopts, errmsg);
}

Kernel::Kernel(const Kernel& k)
{
    p = k.p;
    if(p)
        p->addref();
}

Kernel& Kernel::operator = (const Kernel& k)
{
    Impl* newp = (Impl*)k.p;
    if(newp)
        newp->addref();
    if(p)
        p->release();
    p = newp;
    return *this;
}

Kernel::Kernel(Kernel&& k) CV_NOEXCEPT
{
    p = k.p;
    k.p = nullptr;
}

Kernel& Kernel::operator = (Kernel&& k) CV_NOEXCEPT
{
    if (this != &k) {
        if(p)
            p->release();
        p = k.p;
        k.p = nullptr;
    }
    return *this;
}

Kernel::~Kernel()
{
    if(p)
        p->release();
}

bool Kernel::create(const char* kname, const Program& prog)
{
    if(p)
        p->release();
    p = new Impl(kname, prog);
    if(p->handle == 0)
    {
        p->release();
        p = 0;
    }
#ifdef CV_OPENCL_RUN_ASSERT // check kernel compilation fails
    CV_Assert(p);
#endif
    return p != 0;
}

bool Kernel::create(const char* kname, const ProgramSource& src,
                    const String& buildopts, String* errmsg)
{
    if(p)
    {
        p->release();
        p = 0;
    }
    String tempmsg;
    if( !errmsg ) errmsg = &tempmsg;
    const Program prog = Context::getDefault().getProg(src, buildopts, *errmsg);
    return create(kname, prog);
}

void* Kernel::ptr() const
{
    return p ? p->handle : 0;
}

bool Kernel::empty() const
{
    return ptr() == 0;
}

static cv::String dumpValue(size_t sz, const void* p)
{
    if (!p)
        return "NULL";
    if (sz == 2)
        return cv::format("%d / %uu / 0x%04x", *(short*)p, *(unsigned short*)p, *(short*)p);
    if (sz == 4)
        return cv::format("%d / %uu / 0x%08x / %g", *(int*)p, *(int*)p, *(int*)p, *(float*)p);
    if (sz == 8)
        return cv::format("%lld / %lluu / 0x%16llx / %g", *(long long*)p, *(long long*)p, *(long long*)p, *(double*)p);
    return cv::format("%p", p);
}
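
// Sketch (illustration only, not the OpenCV code): dumpValue() reinterprets
// the argument bytes through pointer casts such as *(float*)p. A variant for
// 4-byte values that copies the bytes with memcpy avoids strict-aliasing and
// alignment concerns while producing the same format as the sz == 4 branch.
// dumpValue4 is a hypothetical name.
/*
```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Format a 4-byte value as signed / unsigned / hex / float without type punning.
std::string dumpValue4(const void* p)
{
    if (!p)
        return "NULL";
    int32_t i;
    uint32_t u;
    float f;
    std::memcpy(&i, p, 4);
    std::memcpy(&u, p, 4);
    std::memcpy(&f, p, 4);
    char buf[96];
    std::snprintf(buf, sizeof(buf), "%d / %uu / 0x%08x / %g",
                  (int)i, (unsigned)u, (unsigned)u, (double)f);
    return std::string(buf);
}
```
Usage: for `float one = 1.0f;`, dumpValue4(&one) returns
"1065353216 / 1065353216u / 0x3f800000 / 1".
*/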

int Kernel::set(int i, const void* value, size_t sz)
{
    if (!p || !p->handle)
        return -1;
    if (i < 0)
        return i;
    if( i == 0 )
        p->cleanupUMats();

    cl_int retval = clSetKernelArg(p->handle, (cl_uint)i, sz, value);
    CV_OCL_DBG_CHECK_RESULT(retval, cv::format("clSetKernelArg('%s', arg_index=%d, size=%d, value=%s)", p->name.c_str(), (int)i, (int)sz, dumpValue(sz, value).c_str()).c_str());
    if (retval != CL_SUCCESS)
        return -1;
    return i+1;
}

int Kernel::set(int i, const Image2D& image2D)
{
    cl_mem h = (cl_mem)image2D.ptr();
    int res = set(i, &h, sizeof(h));
    if (res >= 0)
        p->registerImageArgument(i, image2D);
    return res;
}

int Kernel::set(int i, const UMat& m)
{
    return set(i, KernelArg(KernelArg::READ_WRITE, (UMat*)&m));
}

int Kernel::set(int i, const KernelArg& arg)
{
    if( !p || !p->handle )
        return -1;
    if (i < 0)
    {
        CV_LOG_ERROR(NULL, cv::format("OpenCL: Kernel(%s)::set(arg_index=%d): negative arg_index",
                p->name.c_str(), (int)i));
        return i;
    }
    if( i == 0 )
        p->cleanupUMats();
    cl_int status = 0;
    if( arg.m )
    {
        AccessFlag accessFlags = ((arg.flags & KernelArg::READ_ONLY) ? ACCESS_READ : static_cast<AccessFlag>(0)) |
                                 ((arg.flags & KernelArg::WRITE_ONLY) ? ACCESS_WRITE : static_cast<AccessFlag>(0));
        bool ptronly = (arg.flags & KernelArg::PTR_ONLY) != 0;
        if (ptronly && arg.m->empty())
        {
            cl_mem h_null = (cl_mem)NULL;
            status = clSetKernelArg(p->handle, (cl_uint)i, sizeof(h_null), &h_null);
            CV_OCL_DBG_CHECK_RESULT(status, cv::format("clSetKernelArg('%s', arg_index=%d, cl_mem=NULL)", p->name.c_str(), (int)i).c_str());
            return i + 1;
        }
        cl_mem h = (cl_mem)arg.m->handle(accessFlags);

        if (!h)
        {
            CV_LOG_ERROR(NULL, cv::format("OpenCL: Kernel(%s)::set(arg_index=%d, flags=%d): can't create cl_mem handle for passed UMat buffer (addr=%p)",
                    p->name.c_str(), (int)i, (int)arg.flags, arg.m));
            p->release();
            p = 0;
            return -1;
        }

#ifdef HAVE_OPENCL_SVM
        if ((arg.m->u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) != 0)
        {
            const Context& ctx = Context::getDefault();
            const svm::SVMFunctions* svmFns = svm::getSVMFunctions(ctx);
            uchar*& svmDataPtr = (uchar*&)arg.m->u->handle;
            CV_OPENCL_SVM_TRACE_P("clSetKernelArgSVMPointer: %p\n", svmDataPtr);
#if 1 // TODO
            status = svmFns->fn_clSetKernelArgSVMPointer(p->handle, (cl_uint)i, svmDataPtr);
#else
            status = svmFns->fn_clSetKernelArgSVMPointer(p->handle, (cl_uint)i, &svmDataPtr);
#endif
            CV_OCL_DBG_CHECK_RESULT(status, cv::format("clSetKernelArgSVMPointer('%s', arg_index=%d, ptr=%p)", p->name.c_str(), (int)i, (void*)svmDataPtr).c_str());
        }
        else
#endif
        {
            status = clSetKernelArg(p->handle, (cl_uint)i, sizeof(h), &h);
            CV_OCL_DBG_CHECK_RESULT(status, cv::format("clSetKernelArg('%s', arg_index=%d, cl_mem=%p)", p->name.c_str(), (int)i, (void*)h).c_str());
        }

        if (ptronly)
        {
            i++;
        }
        else if( arg.m->dims <= 2 )
        {
            UMat2D u2d(*arg.m);
            status = clSetKernelArg(p->handle, (cl_uint)(i+1), sizeof(u2d.step), &u2d.step);
            CV_OCL_DBG_CHECK_RESULT(status, cv::format("clSetKernelArg('%s', arg_index=%d, step_value=%d)", p->name.c_str(), (int)(i+1), (int)u2d.step).c_str());
            status = clSetKernelArg(p->handle, (cl_uint)(i+2), sizeof(u2d.offset), &u2d.offset);
            CV_OCL_DBG_CHECK_RESULT(status, cv::format("clSetKernelArg('%s', arg_index=%d, offset_value=%d)", p->name.c_str(), (int)(i+2), (int)u2d.offset).c_str());
            i += 3;

            if( !(arg.flags & KernelArg::NO_SIZE) )
            {
                int cols = u2d.cols*arg.wscale/arg.iwscale;
                status = clSetKernelArg(p->handle, (cl_uint)i, sizeof(u2d.rows), &u2d.rows);
                CV_OCL_DBG_CHECK_RESULT(status, cv::format("clSetKernelArg('%s', arg_index=%d, rows_value=%d)", p->name.c_str(), (int)i, (int)u2d.rows).c_str());
                status = clSetKernelArg(p->handle, (cl_uint)(i+1), sizeof(cols), &cols);
                CV_OCL_DBG_CHECK_RESULT(status, cv::format("clSetKernelArg('%s', arg_index=%d, cols_value=%d)", p->name.c_str(), (int)(i+1), (int)cols).c_str());
                i += 2;
            }
        }
        else
        {
            UMat3D u3d(*arg.m);
            status = clSetKernelArg(p->handle, (cl_uint)(i+1), sizeof(u3d.slicestep), &u3d.slicestep);
            CV_OCL_DBG_CHECK_RESULT(status, cv::format("clSetKernelArg('%s', arg_index=%d, slicestep_value=%d)", p->name.c_str(), (int)(i+1), (int)u3d.slicestep).c_str());
            status = clSetKernelArg(p->handle, (cl_uint)(i+2), sizeof(u3d.step), &u3d.step);
            CV_OCL_DBG_CHECK_RESULT(status, cv::format("clSetKernelArg('%s', arg_index=%d, step_value=%d)", p->name.c_str(), (int)(i+2), (int)u3d.step).c_str());
            status = clSetKernelArg(p->handle, (cl_uint)(i+3), sizeof(u3d.offset), &u3d.offset);
            CV_OCL_DBG_CHECK_RESULT(status, cv::format("clSetKernelArg('%s', arg_index=%d, offset_value=%d)", p->name.c_str(), (int)(i+3), (int)u3d.offset).c_str());
            i += 4;
            if( !(arg.flags & KernelArg::NO_SIZE) )
            {
                int cols = u3d.cols*arg.wscale/arg.iwscale;
                status = clSetKernelArg(p->handle, (cl_uint)i, sizeof(u3d.slices), &u3d.slices);
                CV_OCL_DBG_CHECK_RESULT(status, cv::format("clSetKernelArg('%s', arg_index=%d, slices_value=%d)", p->name.c_str(), (int)i, (int)u3d.slices).c_str());
                status = clSetKernelArg(p->handle, (cl_uint)(i+1), sizeof(u3d.rows), &u3d.rows);
                CV_OCL_DBG_CHECK_RESULT(status, cv::format("clSetKernelArg('%s', arg_index=%d, rows_value=%d)", p->name.c_str(), (int)(i+1), (int)u3d.rows).c_str());
                status = clSetKernelArg(p->handle, (cl_uint)(i+2), sizeof(u3d.cols), &cols);
                CV_OCL_DBG_CHECK_RESULT(status, cv::format("clSetKernelArg('%s', arg_index=%d, cols_value=%d)", p->name.c_str(), (int)(i+2), (int)cols).c_str());
                i += 3;
            }
        }
        p->addUMat(*arg.m, !!(accessFlags & ACCESS_WRITE));
        return i;
    }
    status = clSetKernelArg(p->handle, (cl_uint)i, arg.sz, arg.obj);
    CV_OCL_DBG_CHECK_RESULT(status, cv::format("clSetKernelArg('%s', arg_index=%d, size=%d, obj=%p)", p->name.c_str(), (int)i, (int)arg.sz, (void*)arg.obj).c_str());
    return i+1;
}
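
// Sketch (illustration only): Kernel::set(int, const KernelArg&) returns the
// next free argument index, and a UMat argument occupies several consecutive
// kernel parameters depending on its flags and dimensionality. The helpers
// below (hypothetical names slotsForUMatArg and scaledCols, not OpenCV APIs)
// count those slots and reproduce the cols rescaling, following the branches
// above.
/*
```cpp
#include <cassert>

// Number of kernel-argument slots used by one UMat KernelArg.
// ptrOnly ~ KernelArg::PTR_ONLY, noSize ~ KernelArg::NO_SIZE.
int slotsForUMatArg(int dims, bool ptrOnly, bool noSize)
{
    if (ptrOnly)
        return 1;                   // cl_mem only
    if (dims <= 2)
        return noSize ? 3 : 5;      // cl_mem, step, offset [, rows, cols]
    return noSize ? 4 : 7;          // cl_mem, slicestep, step, offset [, slices, rows, cols]
}

// The cols value passed to the kernel is rescaled as cols * wscale / iwscale.
int scaledCols(int cols, int wscale, int iwscale)
{
    return cols * wscale / iwscale;
}
```
Usage: a 2D UMat passed with default flags occupies five parameters (buffer,
step, offset, rows, cols), so the next argument starts at index i + 5.
*/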

bool Kernel::run(int dims, size_t _globalsize[], size_t _localsize[],
                 bool sync, const Queue& q)
{
    if (!p)
        return false;

    size_t globalsize[CV_MAX_DIM] = {1,1,1};
    size_t total = 1;
    CV_Assert(_globalsize != NULL);
    for (int i = 0; i < dims; i++)
    {
        size_t val = _localsize ? _localsize[i] :
            dims == 1 ? 64 : dims == 2 ? (i == 0 ? 256 : 8) : dims == 3 ? (8>>(int)(i>0)) : 1;
        CV_Assert( val > 0 );
        total *= _globalsize[i];
        if (_globalsize[i] == 1 && !_localsize)
            val = 1;
        globalsize[i] = divUp(_globalsize[i], (unsigned int)val) * val;
    }
    CV_Assert(total > 0);

    return p->run(dims, globalsize, _localsize, sync, NULL, q);
}
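
// Sketch (illustration only): when no local size is given, Kernel::run()
// picks a heuristic work-group shape (64 for 1D, 256x8 for 2D, 8x4x4 for 3D)
// only to round each global dimension up to a multiple of it. NULL is still
// passed as the local size, so the OpenCL runtime chooses the actual
// work-group size. Because of the rounding, a kernel can see work-items past
// the requested range and typically has to bounds-check its global IDs.
// defaultLocalSize and roundGlobalSize are hypothetical names.
/*
```cpp
#include <cassert>
#include <cstddef>

static size_t divUp(size_t a, size_t b) { return (a + b - 1) / b; }

// Default local size used for rounding when localsize is NULL.
size_t defaultLocalSize(int dims, int i)
{
    return dims == 1 ? 64 : dims == 2 ? (i == 0 ? 256 : 8)
         : dims == 3 ? (size_t)(8 >> (int)(i > 0)) : 1;
}

// Round each global dimension up to a multiple of the (given or default) local size.
void roundGlobalSize(int dims, const size_t in[], const size_t* local, size_t out[])
{
    for (int i = 0; i < dims; i++)
    {
        size_t val = local ? local[i] : defaultLocalSize(dims, i);
        if (in[i] == 1 && !local)
            val = 1;                // do not inflate degenerate dimensions
        out[i] = divUp(in[i], val) * val;
    }
}
```
Usage: a 1000x3 global size with no local size is rounded to 1024x8.
*/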

bool Kernel::run_(int dims, size_t _globalsize[], size_t _localsize[],
                  bool sync, const Queue& q)
{
    CV_Assert(p);
    return p->run(dims, _globalsize, _localsize, sync, NULL, q);
}

static bool isRaiseErrorOnReuseAsyncKernel()
{
    static bool initialized = false;
    static bool value = false;
    if (!initialized)
    {
        value = cv::utils::getConfigurationParameterBool("OPENCV_OPENCL_RAISE_ERROR_REUSE_ASYNC_KERNEL", false);
        initialized = true;
    }
    return value;
}
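
// Sketch (illustration only): isRaiseErrorOnReuseAsyncKernel() caches the
// configuration flag in two plain statics, which are not synchronized if two
// threads make the first call concurrently. Since C++11, initialization of a
// function-local static is thread-safe, so the cache can be a single
// `static const bool`. The environment variable name and the spellings
// accepted by parseBoolEnv are hypothetical; they are not necessarily the rules
// used by cv::utils::getConfigurationParameterBool.
/*
```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical parser: only "1", "true", "TRUE", "ON" and "on" count as true.
static bool parseBoolEnv(const char* name, bool defaultValue)
{
    const char* v = std::getenv(name);
    if (!v)
        return defaultValue;
    return std::strcmp(v, "1") == 0 || std::strcmp(v, "true") == 0 ||
           std::strcmp(v, "TRUE") == 0 || std::strcmp(v, "ON") == 0 ||
           std::strcmp(v, "on") == 0;
}

// Read once, on first use; the initializer runs exactly once even under concurrency.
bool raiseErrorOnReuse()
{
    static const bool value = parseBoolEnv("MY_APP_RAISE_ERROR_REUSE_ASYNC_KERNEL", false);
    return value;
}
```
Usage: later changes to the environment variable are ignored once the value
has been cached.
*/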

bool Kernel::Impl::run(int dims, size_t globalsize[], size_t localsize[],
        bool sync, int64* timeNS, const Queue& q)
{
    CV_INSTRUMENT_REGION_OPENCL_RUN(name.c_str());

    if (!handle)
    {
        CV_LOG_ERROR(NULL, "OpenCL kernel has zero handle: " << name);
        return false;
    }

    if (isAsyncRun)
    {
        CV_LOG_ERROR(NULL, "OpenCL kernel can't be reused in async mode: " << name);
        if (isRaiseErrorOnReuseAsyncKernel())
            CV_Assert(0);
        return false;  // OpenCV 5.0: raise error
    }
    isAsyncRun = !sync;

    if (isInProgress)
    {
        CV_LOG_ERROR(NULL, "Previous OpenCL kernel launch is not finished: " << name);
        if (isRaiseErrorOnReuseAsyncKernel())
            CV_Assert(0);
        return false;  // OpenCV 5.0: raise error
    }

#if CV_OPENCL_SYNC_RUN_KERNELS
    sync = true;
#endif

    cl_command_queue qq = getQueue(q);
    if (haveTempDstUMats)
        sync = true;
    if (haveTempSrcUMats)
        sync = true;
    if (timeNS)
        sync = true;
    cl_event asyncEvent = 0;
    cl_int retval = clEnqueueNDRangeKernel(qq, handle, (cl_uint)dims,
                                           NULL, globalsize, localsize, 0, 0,
                                           (sync && !timeNS) ? 0 : &asyncEvent);
#if !CV_OPENCL_SHOW_RUN_KERNELS
    if (retval != CL_SUCCESS)
#endif
    {
        cv::String msg = cv::format("clEnqueueNDRangeKernel('%s', dims=%d, globalsize=%zux%zux%zu, localsize=%s) sync=%s", name.c_str(), (int)dims,
                        globalsize[0], (dims > 1 ? globalsize[1] : 1), (dims > 2 ? globalsize[2] : 1),
                        (localsize ? cv::format("%zux%zux%zu", localsize[0], (dims > 1 ? localsize[1] : 1), (dims > 2 ? localsize[2] : 1)) : cv::String("NULL")).c_str(),
                        sync ? "true" : "false"
                        );
        if (retval != CL_SUCCESS)
        {
            msg = CV_OCL_API_ERROR_MSG(retval, msg.c_str());
        }
#if CV_OPENCL_TRACE_CHECK
        CV_OCL_TRACE_CHECK_RESULT(retval, msg.c_str());
#else
        printf("%s\n", msg.c_str());
        fflush(stdout);
#endif
    }
    if (sync || retval != CL_SUCCESS)
    {
        CV_OCL_DBG_CHECK(clFinish(qq));
        if (timeNS)
        {
            if (retval == CL_SUCCESS)
            {
                CV_OCL_DBG_CHECK(clWaitForEvents(1, &asyncEvent));
                cl_ulong startTime, stopTime;
                CV_OCL_CHECK(clGetEventProfilingInfo(asyncEvent, CL_PROFILING_COMMAND_START, sizeof(startTime), &startTime, NULL));
                CV_OCL_CHECK(clGetEventProfilingInfo(asyncEvent, CL_PROFILING_COMMAND_END, sizeof(stopTime), &stopTime, NULL));
                *timeNS = (int64)(stopTime - startTime);
            }
            else
            {
                *timeNS = -1;
            }
        }
        cleanupUMats();
    }
    else
    {
        addref();
        isInProgress = true;
        CV_OCL_CHECK(clSetEventCallback(asyncEvent, CL_COMPLETE, oclCleanupCallback, this));
    }
    if (asyncEvent)
        CV_OCL_DBG_CHECK(clReleaseEvent(asyncEvent));
    return retval == CL_SUCCESS;
}
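
// Sketch (illustration only): Kernel::Impl::run() turns the caller's sync
// request into an effective launch mode. It forces a synchronous launch
// (followed by clFinish) when CV_OPENCL_SYNC_RUN_KERNELS is enabled, when a
// temporary destination UMat or a UMat wrapping raw memory is bound, or when
// profiling (timeNS != NULL). It requests an event unless the launch is
// synchronous without profiling. isAsyncRun is recorded from the caller's
// sync flag before these overrides apply. LaunchState, effectiveSync and
// needsEvent are hypothetical names; a failed enqueue, which also takes the
// synchronous cleanup path, is not modeled.
/*
```cpp
#include <cassert>

// Inputs that influence the launch mode in Kernel::Impl::run().
struct LaunchState
{
    bool requestedSync;
    bool forceSyncBuild;      // stands in for CV_OPENCL_SYNC_RUN_KERNELS
    bool haveTempDstUMats;
    bool haveTempSrcUMats;
    bool profiling;           // timeNS != NULL
};

bool effectiveSync(const LaunchState& s)
{
    return s.requestedSync || s.forceSyncBuild || s.haveTempDstUMats ||
           s.haveTempSrcUMats || s.profiling;
}

// An event is requested unless the launch is synchronous without profiling.
bool needsEvent(const LaunchState& s)
{
    return !(effectiveSync(s) && !s.profiling);
}
```
*/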
2013-11-19 00:48:00 +08:00
bool Kernel : : runTask ( bool sync , const Queue & q )
2013-10-22 18:04:49 +08:00
{
2017-07-06 18:25:32 +08:00
if ( ! p | | ! p - > handle | | p - > isInProgress )
2013-11-19 00:48:00 +08:00
return false ;
2013-10-22 18:04:49 +08:00
cl_command_queue qq = getQueue ( q ) ;
2017-07-06 18:25:32 +08:00
cl_event asyncEvent = 0 ;
cl_int retval = clEnqueueTask ( qq , p - > handle , 0 , 0 , sync ? 0 : & asyncEvent ) ;
2017-11-01 23:18:54 +08:00
CV_OCL_DBG_CHECK_RESULT ( retval , cv : : format ( " clEnqueueTask('%s') sync=%s " , p - > name . c_str ( ) , sync ? " true " : " false " ) . c_str ( ) ) ;
if ( sync | | retval ! = CL_SUCCESS )
2013-10-22 18:04:49 +08:00
{
2017-11-01 23:18:54 +08:00
CV_OCL_DBG_CHECK ( clFinish ( qq ) ) ;
2013-10-25 20:46:03 +08:00
p - > cleanupUMats ( ) ;
2013-10-22 18:04:49 +08:00
}
else
{
p - > addref ( ) ;
2017-07-06 18:25:32 +08:00
p - > isInProgress = true ;
2017-11-01 23:18:54 +08:00
CV_OCL_CHECK ( clSetEventCallback ( asyncEvent , CL_COMPLETE , oclCleanupCallback , p ) ) ;
2013-10-22 18:04:49 +08:00
}
2017-07-06 18:25:32 +08:00
if ( asyncEvent )
2017-11-01 23:18:54 +08:00
CV_OCL_DBG_CHECK ( clReleaseEvent ( asyncEvent ) ) ;
2014-02-01 19:07:03 +08:00
return retval = = CL_SUCCESS ;
2013-10-22 18:04:49 +08:00
}

int64 Kernel::runProfiling(int dims, size_t globalsize[], size_t localsize[], const Queue& q_)
{
    CV_Assert(p && p->handle && !p->isInProgress);
    Queue q = q_.ptr() ? q_ : Queue::getDefault();
    CV_Assert(q.ptr());
    q.finish(); // call clFinish() on base queue
    Queue profilingQueue = q.getProfilingQueue();
    int64 timeNs = -1;
    bool res = p->run(dims, globalsize, localsize, true, &timeNs, profilingQueue);
    return res ? timeNs : -1;
}

size_t Kernel::workGroupSize() const
{
    if (!p || !p->handle)
        return 0;
    size_t val = 0, retsz = 0;
    cl_device_id dev = (cl_device_id)Device::getDefault().ptr();
    cl_int status = clGetKernelWorkGroupInfo(p->handle, dev, CL_KERNEL_WORK_GROUP_SIZE, sizeof(val), &val, &retsz);
    CV_OCL_CHECK_RESULT(status, "clGetKernelWorkGroupInfo(CL_KERNEL_WORK_GROUP_SIZE)");
    return status == CL_SUCCESS ? val : 0;
}

size_t Kernel::preferedWorkGroupSizeMultiple() const
{
    if (!p || !p->handle)
        return 0;
    size_t val = 0, retsz = 0;
    cl_device_id dev = (cl_device_id)Device::getDefault().ptr();
    cl_int status = clGetKernelWorkGroupInfo(p->handle, dev, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, sizeof(val), &val, &retsz);
    CV_OCL_CHECK_RESULT(status, "clGetKernelWorkGroupInfo(CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE)");
    return status == CL_SUCCESS ? val : 0;
}

bool Kernel::compileWorkGroupSize(size_t wsz[]) const
{
    if (!p || !p->handle || !wsz)
        return false;
    size_t retsz = 0;
    cl_device_id dev = (cl_device_id)Device::getDefault().ptr();
    cl_int status = clGetKernelWorkGroupInfo(p->handle, dev, CL_KERNEL_COMPILE_WORK_GROUP_SIZE, sizeof(wsz[0]) * 3, wsz, &retsz);
    CV_OCL_CHECK_RESULT(status, "clGetKernelWorkGroupInfo(CL_KERNEL_COMPILE_WORK_GROUP_SIZE)");
    return status == CL_SUCCESS;
}

size_t Kernel::localMemSize() const
{
    if (!p || !p->handle)
        return 0;
    size_t retsz = 0;
    cl_ulong val = 0;
    cl_device_id dev = (cl_device_id)Device::getDefault().ptr();
    cl_int status = clGetKernelWorkGroupInfo(p->handle, dev, CL_KERNEL_LOCAL_MEM_SIZE, sizeof(val), &val, &retsz);
    CV_OCL_CHECK_RESULT(status, "clGetKernelWorkGroupInfo(CL_KERNEL_LOCAL_MEM_SIZE)");
    return status == CL_SUCCESS ? (size_t)val : 0;
}

///////////////////////////////////////// ProgramSource ///////////////////////////////////////////////

struct ProgramSource::Impl
{
    IMPLEMENT_REFCOUNTABLE();

    enum KIND {
        PROGRAM_SOURCE_CODE = 0,
        PROGRAM_BINARIES,
        PROGRAM_SPIR,
        PROGRAM_SPIRV
    } kind_;

    Impl(const String& src)
    {
        init(PROGRAM_SOURCE_CODE, cv::String(), cv::String());
        initFromSource(src, cv::String());
    }
    Impl(const String& module, const String& name, const String& codeStr, const String& codeHash)
    {
        init(PROGRAM_SOURCE_CODE, module, name);
        initFromSource(codeStr, codeHash);
    }

    /// reset fields
    void init(enum KIND kind, const String& module, const String& name)
    {
        refcount = 1;
        kind_ = kind;
        module_ = module;
        name_ = name;

        sourceAddr_ = NULL;
        sourceSize_ = 0;
        isHashUpdated = false;
    }

    void initFromSource(const String& codeStr, const String& codeHash)
    {
        codeStr_ = codeStr;
        sourceHash_ = codeHash;
        if (sourceHash_.empty())
        {
            updateHash();
        }
        else
        {
            isHashUpdated = true;
        }
    }

    void updateHash(const char* hashStr = NULL)
    {
        if (hashStr)
        {
            sourceHash_ = cv::String(hashStr);
            isHashUpdated = true;
            return;
        }
        uint64 hash = 0;
        switch (kind_)
        {
        case PROGRAM_SOURCE_CODE:
            if (sourceAddr_)
            {
                CV_Assert(codeStr_.empty());
                hash = crc64(sourceAddr_, sourceSize_); // static storage
            }
            else
            {
                CV_Assert(!codeStr_.empty());
                hash = crc64((uchar*)codeStr_.c_str(), codeStr_.size());
            }
            break;
        case PROGRAM_BINARIES:
        case PROGRAM_SPIR:
        case PROGRAM_SPIRV:
            hash = crc64(sourceAddr_, sourceSize_);
            break;
        default:
            CV_Error(Error::StsInternal, "Internal error");
        }
        sourceHash_ = cv::format("%08jx", (uintmax_t)hash);
        isHashUpdated = true;
    }

    Impl(enum KIND kind,
            const String& module, const String& name,
            const unsigned char* binary, const size_t size,
            const cv::String& buildOptions = cv::String())
    {
        init(kind, module, name);

        sourceAddr_ = binary;
        sourceSize_ = size;

        buildOptions_ = buildOptions;
    }

    static ProgramSource fromSourceWithStaticLifetime(const String& module, const String& name,
            const char* sourceCodeStaticStr, const char* hashStaticStr,
            const cv::String& buildOptions)
    {
        ProgramSource result;
        result.p = new Impl(PROGRAM_SOURCE_CODE, module, name,
                (const unsigned char*)sourceCodeStaticStr, strlen(sourceCodeStaticStr), buildOptions);
        result.p->updateHash(hashStaticStr);
        return result;
    }

    static ProgramSource fromBinary(const String& module, const String& name,
            const unsigned char* binary, const size_t size,
            const cv::String& buildOptions)
    {
        ProgramSource result;
        result.p = new Impl(PROGRAM_BINARIES, module, name, binary, size, buildOptions);
        return result;
    }

    static ProgramSource fromSPIR(const String& module, const String& name,
            const unsigned char* binary, const size_t size,
            const cv::String& buildOptions)
    {
        ProgramSource result;
        result.p = new Impl(PROGRAM_SPIR, module, name, binary, size, buildOptions);
        return result;
    }

    String module_;
    String name_;

    // TODO std::vector<ProgramSource> includes_;
    String codeStr_; // PROGRAM_SOURCE_CODE only

    const unsigned char* sourceAddr_;
    size_t sourceSize_;

    cv::String buildOptions_;

    String sourceHash_;
    bool isHashUpdated;

    friend struct Program::Impl;
    friend struct internal::ProgramEntry;
    friend struct Context::Impl;
};

ProgramSource::ProgramSource() CV_NOEXCEPT
{
    p = 0;
}

ProgramSource::ProgramSource(const String& module, const String& name, const String& codeStr, const String& codeHash)
{
    p = new Impl(module, name, codeStr, codeHash);
}

ProgramSource::ProgramSource(const char* prog)
{
    p = new Impl(prog);
}

ProgramSource::ProgramSource(const String& prog)
{
    p = new Impl(prog);
}

ProgramSource::~ProgramSource()
{
    if (p)
        p->release();
}

ProgramSource::ProgramSource(const ProgramSource& prog)
{
    p = prog.p;
    if (p)
        p->addref();
}

ProgramSource& ProgramSource::operator=(const ProgramSource& prog)
{
    Impl* newp = (Impl*)prog.p;
    if (newp)
        newp->addref();
    if (p)
        p->release();
    p = newp;
    return *this;
}

ProgramSource::ProgramSource(ProgramSource&& prog) CV_NOEXCEPT
{
    p = prog.p;
    prog.p = nullptr;
}

ProgramSource& ProgramSource::operator=(ProgramSource&& prog) CV_NOEXCEPT
{
    if (this != &prog) {
        if (p)
            p->release();
        p = prog.p;
        prog.p = nullptr;
    }
    return *this;
}

const String& ProgramSource::source() const
{
    CV_Assert(p);
    CV_Assert(p->kind_ == Impl::PROGRAM_SOURCE_CODE);
    CV_Assert(p->sourceAddr_ == NULL); // method returns reference - can't construct temporary object
    return p->codeStr_;
}

ProgramSource::hash_t ProgramSource::hash() const
{
    CV_Error(Error::StsNotImplemented, "Removed method: ProgramSource::hash()");
}

ProgramSource ProgramSource::fromBinary(const String& module, const String& name,
        const unsigned char* binary, const size_t size,
        const cv::String& buildOptions)
{
    CV_Assert(binary);
    CV_Assert(size > 0);
    return Impl::fromBinary(module, name, binary, size, buildOptions);
}

ProgramSource ProgramSource::fromSPIR(const String& module, const String& name,
        const unsigned char* binary, const size_t size,
        const cv::String& buildOptions)
{
    CV_Assert(binary);
    CV_Assert(size > 0);
    // Create a PROGRAM_SPIR source so that Program::Impl applies the SPIR build options ("-x spir", "-spir-std=...")
    return Impl::fromSPIR(module, name, binary, size, buildOptions);
}

internal::ProgramEntry::operator ProgramSource&() const
{
    if (this->pProgramSource == NULL)
    {
        cv::AutoLock lock(cv::getInitializationMutex());
        if (this->pProgramSource == NULL)
        {
            ProgramSource ps = ProgramSource::Impl::fromSourceWithStaticLifetime(this->module, this->name, this->programCode, this->programHash, cv::String());
            ProgramSource* ptr = new ProgramSource(ps);
            const_cast<ProgramEntry*>(this)->pProgramSource = ptr;
        }
    }
    return *this->pProgramSource;
}

/////////////////////////////////////////// Program /////////////////////////////////////////////

// Concatenate two build-option strings with a single space separator
// (no extra space is inserted when `b` already starts with one).
static
cv::String joinBuildOptions(const cv::String& a, const cv::String& b)
{
    if (b.empty())
        return a;
    if (a.empty())
        return b;
    if (b[0] == ' ')
        return a + b;
    return a + (cv::String(" ") + b);
}

struct Program::Impl
{
    IMPLEMENT_REFCOUNTABLE();

    Impl(const ProgramSource& src,
         const String& _buildflags, String& errmsg) :
         refcount(1),
         handle(NULL),
         buildflags(_buildflags)
    {
        const ProgramSource::Impl* src_ = src.getImpl();
        CV_Assert(src_);
        sourceModule_ = src_->module_;
        sourceName_ = src_->name_;
        const Context ctx = Context::getDefault();
        Device device = ctx.device(0);
        if (ctx.ptr() == NULL || device.ptr() == NULL)
            return;
        buildflags = joinBuildOptions(buildflags, src_->buildOptions_);
        if (src.getImpl()->kind_ == ProgramSource::Impl::PROGRAM_SOURCE_CODE)
        {
            if (device.isAMD())
                buildflags = joinBuildOptions(buildflags, " -D AMD_DEVICE");
            else if (device.isIntel())
                buildflags = joinBuildOptions(buildflags, " -D INTEL_DEVICE");
            const String param_buildExtraOptions = getBuildExtraOptions();
            if (!param_buildExtraOptions.empty())
                buildflags = joinBuildOptions(buildflags, param_buildExtraOptions);
        }
#if CV_OPENCL_SHOW_BUILD_OPTIONS
        CV_LOG_INFO(NULL, "OpenCL program '" << sourceModule_ << "/" << sourceName_ << "' options: " << buildflags);
#endif
        compile(ctx, src_, errmsg);
#if CV_OPENCL_SHOW_BUILD_KERNELS
        if (handle)
        {
            size_t retsz = 0;
            char kernels_buffer[4096] = {0};
            cl_int result = clGetProgramInfo(handle, CL_PROGRAM_KERNEL_NAMES, sizeof(kernels_buffer), &kernels_buffer[0], &retsz);
            CV_OCL_DBG_CHECK_RESULT(result, cv::format("clGetProgramInfo(CL_PROGRAM_KERNEL_NAMES: %s/%s)", sourceModule_.c_str(), sourceName_.c_str()).c_str());
            if (result == CL_SUCCESS && retsz < sizeof(kernels_buffer))
            {
                kernels_buffer[retsz] = 0;
                CV_LOG_INFO(NULL, "OpenCL program '" << sourceModule_ << "/" << sourceName_ << "' kernels: '" << kernels_buffer << "'");
            }
            else
            {
                CV_LOG_ERROR(NULL, "OpenCL program '" << sourceModule_ << "/" << sourceName_ << "' can't retrieve kernel names!");
            }
        }
#endif
    }

    bool compile(const Context& ctx, const ProgramSource::Impl* src_, String& errmsg)
    {
        CV_Assert(ctx.getImpl());
        CV_Assert(src_);

        // We don't cache OpenCL binaries
        if (src_->kind_ == ProgramSource::Impl::PROGRAM_BINARIES)
        {
            CV_LOG_VERBOSE(NULL, 0, "Load program binary... " << src_->module_.c_str() << "/" << src_->name_.c_str());
            bool isLoaded = createFromBinary(ctx, src_->sourceAddr_, src_->sourceSize_, errmsg);
            return isLoaded;
        }
        return compileWithCache(ctx, src_, errmsg);
    }

    bool compileWithCache(const Context& ctx, const ProgramSource::Impl* src_, String& errmsg)
    {
        CV_Assert(ctx.getImpl());
        CV_Assert(src_);
        CV_Assert(src_->kind_ != ProgramSource::Impl::PROGRAM_BINARIES);
#if OPENCV_HAVE_FILESYSTEM_SUPPORT
        OpenCLBinaryCacheConfigurator& config = OpenCLBinaryCacheConfigurator::getSingletonInstance();
        const std::string base_dir = config.prepareCacheDirectoryForContext(
                ctx.getImpl()->getPrefixString(),
                ctx.getImpl()->getPrefixBase()
        );
        const String& hash_str = src_->sourceHash_;
        cv::String fname;
        if (!base_dir.empty() && !src_->module_.empty() && !src_->name_.empty())
        {
            CV_Assert(!hash_str.empty());
            fname = src_->module_ + "--" + src_->name_ + "_" + hash_str + ".bin";
            fname = utils::fs::join(base_dir, fname);
        }
        const cv::Ptr<utils::fs::FileLock> fileLock = config.cache_lock_; // can be empty
        if (!fname.empty() && CV_OPENCL_CACHE_ENABLE)
        {
            try
            {
                std::vector<char> binaryBuf;
                bool res = false;
                {
                    cv::utils::optional_shared_lock_guard<cv::utils::fs::FileLock> lock_fs(fileLock.get());
                    BinaryProgramFile file(fname, hash_str.c_str());
                    res = file.read(buildflags, binaryBuf);
                }
                if (res)
                {
                    CV_Assert(!binaryBuf.empty());
                    CV_LOG_VERBOSE(NULL, 0, "Load program binary from cache: " << src_->module_.c_str() << "/" << src_->name_.c_str());
                    bool isLoaded = createFromBinary(ctx, binaryBuf, errmsg);
                    if (isLoaded)
                        return true;
                }
            }
            catch (const cv::Exception& e)
            {
                CV_UNUSED(e);
                CV_LOG_VERBOSE(NULL, 0, "Can't load OpenCL binary: " + fname << std::endl << e.what());
            }
            catch (...)
            {
                CV_LOG_VERBOSE(NULL, 0, "Can't load OpenCL binary: " + fname);
            }
        }
#endif // OPENCV_HAVE_FILESYSTEM_SUPPORT
        CV_Assert(handle == NULL);
        if (src_->kind_ == ProgramSource::Impl::PROGRAM_SOURCE_CODE)
        {
            if (!buildFromSources(ctx, src_, errmsg))
            {
                return false;
            }
        }
        else if (src_->kind_ == ProgramSource::Impl::PROGRAM_SPIR)
        {
            buildflags = joinBuildOptions(buildflags, " -x spir");
            if ((cv::String(" ") + buildflags).find(" -spir-std=") == cv::String::npos)
            {
                buildflags = joinBuildOptions(buildflags, " -spir-std=1.2");
            }
            CV_LOG_VERBOSE(NULL, 0, "Load program SPIR binary... " << src_->module_.c_str() << "/" << src_->name_.c_str());
            bool isLoaded = createFromBinary(ctx, src_->sourceAddr_, src_->sourceSize_, errmsg);
            if (!isLoaded)
                return false;
        }
        else if (src_->kind_ == ProgramSource::Impl::PROGRAM_SPIRV)
        {
            CV_Error(Error::StsNotImplemented, "OpenCL: SPIR-V is not supported");
        }
        else
        {
            CV_Error(Error::StsInternal, "Internal error");
        }
        CV_Assert(handle != NULL);
#if OPENCV_HAVE_FILESYSTEM_SUPPORT
        if (!fname.empty() && CV_OPENCL_CACHE_WRITE)
        {
            try
            {
                std::vector<char> binaryBuf;
                getProgramBinary(binaryBuf);
                {
                    cv::utils::optional_lock_guard<cv::utils::fs::FileLock> lock_fs(fileLock.get());
                    BinaryProgramFile file(fname, hash_str.c_str());
                    file.write(buildflags, binaryBuf);
                }
            }
            catch (const cv::Exception& e)
            {
                CV_LOG_WARNING(NULL, "Can't save OpenCL binary into cache: " + fname << std::endl << e.what());
            }
            catch (...)
            {
                CV_LOG_WARNING(NULL, "Can't save OpenCL binary into cache: " + fname);
            }
        }
#endif // OPENCV_HAVE_FILESYSTEM_SUPPORT
#if CV_OPENCL_VALIDATE_BINARY_PROGRAMS
        if (CV_OPENCL_VALIDATE_BINARY_PROGRAMS_VALUE)
        {
            std::vector<char> binaryBuf;
            getProgramBinary(binaryBuf);
            if (!binaryBuf.empty())
            {
                CV_OCL_DBG_CHECK(clReleaseProgram(handle));
                handle = NULL;
                createFromBinary(ctx, binaryBuf, errmsg);
            }
        }
#endif
        return handle != NULL;
    }

    void dumpBuildLog_(cl_int result, const cl_device_id* deviceList, String& errmsg)
    {
        AutoBuffer<char, 4096> buffer; buffer[0] = 0;

        size_t retsz = 0;
        cl_int log_retval = clGetProgramBuildInfo(handle, deviceList[0],
                                                  CL_PROGRAM_BUILD_LOG, 0, 0, &retsz);
        if (log_retval == CL_SUCCESS && retsz > 1)
        {
            buffer.resize(retsz + 16);
            log_retval = clGetProgramBuildInfo(handle, deviceList[0],
                                               CL_PROGRAM_BUILD_LOG, retsz + 1, buffer.data(), &retsz);
            if (log_retval == CL_SUCCESS)
            {
                if (retsz < buffer.size())
                    buffer[retsz] = 0;
                else
                    buffer[buffer.size() - 1] = 0;
            }
            else
            {
                buffer[0] = 0;
            }
        }

        errmsg = String(buffer.data());
        printf("OpenCL program build log: %s/%s\nStatus %d: %s\n%s\n%s\n",
                sourceModule_.c_str(), sourceName_.c_str(),
                result, getOpenCLErrorString(result),
                buildflags.c_str(), errmsg.c_str());
        fflush(stdout);
    }

    bool buildFromSources(const Context& ctx, const ProgramSource::Impl* src_, String& errmsg)
    {
        CV_Assert(src_);
        CV_Assert(src_->kind_ == ProgramSource::Impl::PROGRAM_SOURCE_CODE);
        CV_Assert(handle == NULL);
        CV_INSTRUMENT_REGION_OPENCL_COMPILE(cv::format("Build OpenCL program: %s/%s %s options: %s",
                sourceModule_.c_str(), sourceName_.c_str(),
                src_->sourceHash_.c_str(), buildflags.c_str()).c_str());

        CV_LOG_VERBOSE(NULL, 0, "Compile... " << sourceModule_.c_str() << "/" << sourceName_.c_str());

        const char* srcptr = src_->sourceAddr_ ? ((const char*)src_->sourceAddr_) : src_->codeStr_.c_str();
        size_t srclen = src_->sourceAddr_ ? src_->sourceSize_ : src_->codeStr_.size();
        CV_Assert(srcptr != NULL);
        CV_Assert(srclen > 0);

        cl_int retval = 0;

        handle = clCreateProgramWithSource((cl_context)ctx.ptr(), 1, &srcptr, &srclen, &retval);
        CV_OCL_DBG_CHECK_RESULT(retval, "clCreateProgramWithSource");
        CV_Assert(handle || retval != CL_SUCCESS);
        if (handle && retval == CL_SUCCESS)
        {
            size_t n = ctx.ndevices();
            AutoBuffer<cl_device_id, 4> deviceListBuf(n + 1);
            cl_device_id* deviceList = deviceListBuf.data();
            for (size_t i = 0; i < n; i++)
            {
                deviceList[i] = (cl_device_id)(ctx.device(i).ptr());
            }

            retval = clBuildProgram(handle, (cl_uint)n, deviceList, buildflags.c_str(), 0, 0);
            CV_OCL_TRACE_CHECK_RESULT(/*don't throw: retval*/CL_SUCCESS, cv::format("clBuildProgram(source: %s)", buildflags.c_str()).c_str());
#if !CV_OPENCL_ALWAYS_SHOW_BUILD_LOG
            if (retval != CL_SUCCESS)
#endif
            {
                dumpBuildLog_(retval, deviceList, errmsg);

                // don't remove "retval != CL_SUCCESS" condition here:
                // it would break CV_OPENCL_ALWAYS_SHOW_BUILD_LOG mode
                if (retval != CL_SUCCESS && handle)
                {
                    CV_OCL_DBG_CHECK(clReleaseProgram(handle));
                    handle = NULL;
                }
                if (retval != CL_SUCCESS &&
                    sourceName_ != "dummy"  // used for testing of compilation flags
                )
                {
                    onOpenCLKernelBuildError();
                }
            }
#if CV_OPENCL_VALIDATE_BINARY_PROGRAMS
            if (handle && CV_OPENCL_VALIDATE_BINARY_PROGRAMS_VALUE)
            {
                CV_LOG_INFO(NULL, "OpenCL: query kernel names (build from sources)...");
                size_t retsz = 0;
                char kernels_buffer[4096] = {0};
                cl_int result = clGetProgramInfo(handle, CL_PROGRAM_KERNEL_NAMES, sizeof(kernels_buffer), &kernels_buffer[0], &retsz);
                if (retsz < sizeof(kernels_buffer))
                    kernels_buffer[retsz] = 0;
                else
                    kernels_buffer[0] = 0;
                CV_LOG_INFO(NULL, result << ": Kernels='" << kernels_buffer << "'");
            }
#endif
        }
        return handle != NULL;
    }

    void getProgramBinary(std::vector<char>& buf)
    {
        CV_Assert(handle);
        size_t sz = 0;
        CV_OCL_CHECK(clGetProgramInfo(handle, CL_PROGRAM_BINARY_SIZES, sizeof(sz), &sz, NULL));
        buf.resize(sz);
        uchar* ptr = (uchar*)&buf[0];
        CV_OCL_CHECK(clGetProgramInfo(handle, CL_PROGRAM_BINARIES, sizeof(ptr), &ptr, NULL));
    }

    bool createFromBinary(const Context& ctx, const std::vector<char>& buf, String& errmsg)
    {
        return createFromBinary(ctx, (const unsigned char*)&buf[0], buf.size(), errmsg);
    }

    bool createFromBinary(const Context& ctx, const unsigned char* binaryAddr, const size_t binarySize, String& errmsg)
    {
        CV_Assert(handle == NULL);
        CV_INSTRUMENT_REGION_OPENCL_COMPILE("Load OpenCL program");
        CV_LOG_VERBOSE(NULL, 0, "Load from binary... (" << binarySize << " bytes)");

        CV_Assert(binarySize > 0);

        size_t ndevices = (int)ctx.ndevices();
        AutoBuffer<cl_device_id> devices_(ndevices);
        AutoBuffer<const uchar*> binaryPtrs_(ndevices);
        AutoBuffer<size_t> binarySizes_(ndevices);

        cl_device_id* devices = devices_.data();
        const uchar** binaryPtrs = binaryPtrs_.data();
        size_t* binarySizes = binarySizes_.data();
        for (size_t i = 0; i < ndevices; i++)
        {
            devices[i] = (cl_device_id)ctx.device(i).ptr();
            binaryPtrs[i] = binaryAddr;
            binarySizes[i] = binarySize;
        }

        cl_int result = 0;
        handle = clCreateProgramWithBinary((cl_context)ctx.ptr(), (cl_uint)ndevices, devices_.data(),
                                           binarySizes, binaryPtrs, NULL, &result);
        if (result != CL_SUCCESS)
        {
            CV_LOG_ERROR(NULL, CV_OCL_API_ERROR_MSG(result, "clCreateProgramWithBinary"));
            if (handle)
            {
                CV_OCL_DBG_CHECK(clReleaseProgram(handle));
                handle = NULL;
            }
        }
        if (!handle)
        {
            return false;
        }
        // call clBuildProgram()
        {
            result = clBuildProgram(handle, (cl_uint)ndevices, devices_.data(), buildflags.c_str(), 0, 0);
            CV_OCL_DBG_CHECK_RESULT(result, cv::format("clBuildProgram(binary: %s/%s)", sourceModule_.c_str(), sourceName_.c_str()).c_str());
            if (result != CL_SUCCESS)
            {
                dumpBuildLog_(result, devices, errmsg);
                if (handle)
                {
                    CV_OCL_DBG_CHECK(clReleaseProgram(handle));
                    handle = NULL;
                }
                return false;
            }
        }
        // check build status
        {
            cl_build_status build_status = CL_BUILD_NONE;
            size_t retsz = 0;
            CV_OCL_DBG_CHECK(result = clGetProgramBuildInfo(handle, devices[0], CL_PROGRAM_BUILD_STATUS,
                    sizeof(build_status), &build_status, &retsz));
            if (result == CL_SUCCESS)
            {
                if (build_status == CL_BUILD_SUCCESS)
                {
                    return true;
                }
                else
                {
                    CV_LOG_WARNING(NULL, "clGetProgramBuildInfo() returns " << build_status);
                    // release the unusable program: callers expect handle == NULL after a failed load
                    CV_OCL_DBG_CHECK(clReleaseProgram(handle));
                    handle = NULL;
                    return false;
                }
            }
            else
            {
                CV_LOG_ERROR(NULL, CV_OCL_API_ERROR_MSG(result, "clGetProgramBuildInfo()"));
                if (handle)
                {
                    CV_OCL_DBG_CHECK(clReleaseProgram(handle));
                    handle = NULL;
                }
            }
        }
#if CV_OPENCL_VALIDATE_BINARY_PROGRAMS
        if (handle && CV_OPENCL_VALIDATE_BINARY_PROGRAMS_VALUE)
        {
            CV_LOG_INFO(NULL, "OpenCL: query kernel names (binary)...");
            size_t retsz = 0;
            char kernels_buffer[4096] = {0};
            result = clGetProgramInfo(handle, CL_PROGRAM_KERNEL_NAMES, sizeof(kernels_buffer), &kernels_buffer[0], &retsz);
            if (retsz < sizeof(kernels_buffer))
                kernels_buffer[retsz] = 0;
            else
                kernels_buffer[0] = 0;
            CV_LOG_INFO(NULL, result << ": Kernels='" << kernels_buffer << "'");
        }
#endif
        return handle != NULL;
    }

    ~Impl()
    {
        if (handle)
        {
#ifdef _WIN32
            if (!cv::__termination)
#endif
            {
                clReleaseProgram(handle);
            }
            handle = NULL;
        }
    }

    cl_program handle;

    String buildflags;
    String sourceModule_;
    String sourceName_;
};

Program::Program() CV_NOEXCEPT
{
    p = 0;
}

Program::Program(const ProgramSource& src,
                 const String& buildflags, String& errmsg)
{
    p = 0;
    create(src, buildflags, errmsg);
}

Program::Program(const Program& prog)
{
    p = prog.p;
    if (p)
        p->addref();
}

Program& Program::operator=(const Program& prog)
{
    Impl* newp = (Impl*)prog.p;
    if (newp)
        newp->addref();
    if (p)
        p->release();
    p = newp;
    return *this;
}

Program::Program(Program&& prog) CV_NOEXCEPT
{
    p = prog.p;
    prog.p = nullptr;
}

Program& Program::operator=(Program&& prog) CV_NOEXCEPT
{
    if (this != &prog) {
        if (p)
            p->release();
        p = prog.p;
        prog.p = nullptr;
    }
    return *this;
}

Program::~Program()
{
    if (p)
        p->release();
}

bool Program::create(const ProgramSource& src,
                     const String& buildflags, String& errmsg)
{
    if (p)
    {
        p->release();
        p = NULL;
    }
    p = new Impl(src, buildflags, errmsg);
    if (!p->handle)
    {
        p->release();
        p = 0;
    }
    return p != 0;
}

void* Program::ptr() const
{
    return p ? p->handle : 0;
}

#ifndef OPENCV_REMOVE_DEPRECATED_API
const ProgramSource& Program::source() const
{
    CV_Error(Error::StsNotImplemented, "Removed API");
}

bool Program::read(const String& bin, const String& buildflags)
{
    CV_UNUSED(bin); CV_UNUSED(buildflags);
    CV_Error(Error::StsNotImplemented, "Removed API");
}

bool Program::write(String& bin) const
{
    CV_UNUSED(bin);
    CV_Error(Error::StsNotImplemented, "Removed API");
}

String Program::getPrefix() const
{
    if (!p)
        return String();
    Context::Impl* ctx_ = Context::getDefault().getImpl();
    CV_Assert(ctx_);
    return cv::format("opencl=%s\nbuildflags=%s", ctx_->getPrefixString().c_str(), p->buildflags.c_str());
}

String Program::getPrefix(const String& buildflags)
{
    Context::Impl* ctx_ = Context::getDefault().getImpl();
    CV_Assert(ctx_);
    return cv::format("opencl=%s\nbuildflags=%s", ctx_->getPrefixString().c_str(), buildflags.c_str());
}
#endif // OPENCV_REMOVE_DEPRECATED_API
2013-10-22 18:04:49 +08:00
2017-12-03 01:48:30 +08:00
void Program : : getBinary ( std : : vector < char > & binary ) const
{
2017-12-05 18:26:38 +08:00
CV_Assert ( p & & " Empty program " ) ;
2017-12-03 01:48:30 +08:00
p - > getProgramBinary ( binary ) ;
}
Program Context::Impl::getProg(const ProgramSource& src,
                               const String& buildflags, String& errmsg)
{
    size_t limit = getProgramCountLimit();
    const ProgramSource::Impl* src_ = src.getImpl();
    CV_Assert(src_);
    String key = cv::format("module=%s name=%s codehash=%s\nopencl=%s\nbuildflags=%s",
                            src_->module_.c_str(), src_->name_.c_str(), src_->sourceHash_.c_str(),
                            getPrefixString().c_str(),
                            buildflags.c_str());
    {
        cv::AutoLock lock(program_cache_mutex);
        phash_t::iterator it = phash.find(key);
        if (it != phash.end())
        {
            // TODO LRU cache
            CacheList::iterator i = std::find(cacheList.begin(), cacheList.end(), key);
            if (i != cacheList.end() && i != cacheList.begin())
            {
                cacheList.erase(i);
                cacheList.push_front(key);
            }
            return it->second;
        }
        { // cleanup program cache
            size_t sz = phash.size();
            if (limit > 0 && sz >= limit)
            {
                static bool warningFlag = false;
                if (!warningFlag)
                {
                    printf("\nWARNING: OpenCV-OpenCL:\n"
                        "In-memory cache for OpenCL programs is full, older programs will be unloaded.\n"
                        "You can change cache size via OPENCV_OPENCL_PROGRAM_CACHE environment variable\n\n");
                    warningFlag = true;
                }
                while (!cacheList.empty())
                {
                    size_t c = phash.erase(cacheList.back());
                    cacheList.pop_back();
                    if (c != 0)
                        break;
                }
            }
        }
    }
    Program prog(src, buildflags, errmsg);
    // Cache result of build failures too (to prevent unnecessary compiler invocations)
    {
        cv::AutoLock lock(program_cache_mutex);
        phash.insert(std::pair<std::string, Program>(key, prog));
        cacheList.push_front(key);
    }
    return prog;
}
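
// Illustrative trace of the program cache above (hypothetical keys A, B, C; assumes
// getProgramCountLimit() returns 2). cacheList keeps the most recently used key at
// the front:
//   getProg(A), getProg(B) -> both built and cached, cacheList = [B, A]
//   getProg(A)             -> cache hit; A moves to the front, cacheList = [A, B]
//   getProg(C)             -> miss with phash.size() == limit, so the back entry (B) is
//                             evicted before C is built, leaving cacheList = [C, A]
// Failed builds are cached as well, so repeating a failing request does not re-invoke
// the OpenCL compiler until that entry is evicted.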

//////////////////////////////////////////// OpenCLAllocator //////////////////////////////////////////////////

template <typename T>
class OpenCLBufferPool
{
protected:
    ~OpenCLBufferPool() { }
public:
    virtual T allocate(size_t size) = 0;
    virtual void release(T buffer) = 0;
};

template <typename Derived, typename BufferEntry, typename T>
class OpenCLBufferPoolBaseImpl : public BufferPoolController, public OpenCLBufferPool<T>
{
private:
    inline Derived& derived() { return *static_cast<Derived*>(this); }
protected:
    Mutex mutex_;

    size_t currentReservedSize;
    size_t maxReservedSize;

    std::list<BufferEntry> allocatedEntries_; // Allocated and used entries
    std::list<BufferEntry> reservedEntries_; // LRU order. Allocated, but not used entries

    // synchronized
    bool _findAndRemoveEntryFromAllocatedList(CV_OUT BufferEntry& entry, T buffer)
    {
        typename std::list<BufferEntry>::iterator i = allocatedEntries_.begin();
        for (; i != allocatedEntries_.end(); ++i)
        {
            BufferEntry& e = *i;
            if (e.clBuffer_ == buffer)
            {
                entry = e;
                allocatedEntries_.erase(i);
                return true;
            }
        }
        return false;
    }

    // synchronized
    bool _findAndRemoveEntryFromReservedList(CV_OUT BufferEntry& entry, const size_t size)
    {
        if (reservedEntries_.empty())
            return false;
        typename std::list<BufferEntry>::iterator i = reservedEntries_.begin();
        typename std::list<BufferEntry>::iterator result_pos = reservedEntries_.end();
        BufferEntry result;
        size_t minDiff = (size_t)(-1);
        for (; i != reservedEntries_.end(); ++i)
        {
            BufferEntry& e = *i;
            if (e.capacity_ >= size)
            {
                size_t diff = e.capacity_ - size;
                if (diff < std::max((size_t)4096, size / 8) && (result_pos == reservedEntries_.end() || diff < minDiff))
                {
                    minDiff = diff;
                    result_pos = i;
                    result = e;
                    if (diff == 0)
                        break;
                }
            }
        }
        if (result_pos != reservedEntries_.end())
        {
            //CV_DbgAssert(result == *result_pos);
            reservedEntries_.erase(result_pos);
            entry = result;
            currentReservedSize -= entry.capacity_;
            allocatedEntries_.push_back(entry);
            return true;
        }
        return false;
    }
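
    // Illustrative example of the best-fit tolerance above (hypothetical sizes): for a
    // request of size = 100000 bytes the accepted slack is max(4096, 100000 / 8) = 12500
    // bytes. A reserved entry with capacity_ = 110000 (diff 10000) can be reused, while
    // one with capacity_ = 131072 (diff 31072) is skipped and stays reserved. Among the
    // acceptable entries the smallest diff wins, and an exact fit stops the scan early.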

    // synchronized
    void _checkSizeOfReservedEntries()
    {
        while (currentReservedSize > maxReservedSize)
        {
            CV_DbgAssert(!reservedEntries_.empty());
            const BufferEntry& entry = reservedEntries_.back();
            CV_DbgAssert(currentReservedSize >= entry.capacity_);
            currentReservedSize -= entry.capacity_;
            derived()._releaseBufferEntry(entry);
            reservedEntries_.pop_back();
        }
    }

    inline size_t _allocationGranularity(size_t size)
    {
        // heuristic values
        if (size < 1024*1024)
            return 4096;  // don't work with buffers smaller than 4 KB (hidden allocation overhead issue)
        else if (size < 16*1024*1024)
            return 64*1024;
        else
            return 1024*1024;
    }
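
    // Illustrative values for the rounding done by alignSize(size, _allocationGranularity(size))
    // in the derived _allocateBufferEntry() implementations (hypothetical request sizes):
    //   size = 1000 bytes       -> granularity 4096  -> capacity_ = 4096
    //   size = 1500000 bytes    -> granularity 65536 -> capacity_ = 23 * 65536 = 1507328
    //   size = 20 MiB + 1 byte  -> granularity 1 MiB -> capacity_ = 21 MiB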

public:
    OpenCLBufferPoolBaseImpl()
        : currentReservedSize(0),
          maxReservedSize(0)
    {
        // nothing
    }
    virtual ~OpenCLBufferPoolBaseImpl()
    {
        freeAllReservedBuffers();
        CV_Assert(reservedEntries_.empty());
    }
public:
    virtual T allocate(size_t size) CV_OVERRIDE
    {
        AutoLock locker(mutex_);
        BufferEntry entry;
        if (maxReservedSize > 0 && _findAndRemoveEntryFromReservedList(entry, size))
        {
            CV_DbgAssert(size <= entry.capacity_);
            LOG_BUFFER_POOL("Reuse reserved buffer: %p\n", entry.clBuffer_);
        }
        else
        {
            derived()._allocateBufferEntry(entry, size);
        }
        return entry.clBuffer_;
    }
    virtual void release(T buffer) CV_OVERRIDE
    {
        AutoLock locker(mutex_);
        BufferEntry entry;
        CV_Assert(_findAndRemoveEntryFromAllocatedList(entry, buffer));
        if (maxReservedSize == 0 || entry.capacity_ > maxReservedSize / 8)
        {
            derived()._releaseBufferEntry(entry);
        }
        else
        {
            reservedEntries_.push_front(entry);
            currentReservedSize += entry.capacity_;
            _checkSizeOfReservedEntries();
        }
    }
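
    // Illustrative example of the retention rule in release(): with the Intel default of
    // maxReservedSize = 1 << 27 (128 MiB, set in __init_buffer_pools()), only buffers with
    // capacity_ <= 16 MiB (maxReservedSize / 8) are kept in reservedEntries_; larger ones
    // are released immediately. Once the reserve grows past 128 MiB,
    // _checkSizeOfReservedEntries() frees entries from the back (least recently released).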

    virtual size_t getReservedSize() const CV_OVERRIDE { return currentReservedSize; }
    virtual size_t getMaxReservedSize() const CV_OVERRIDE { return maxReservedSize; }
    virtual void setMaxReservedSize(size_t size) CV_OVERRIDE
    {
        AutoLock locker(mutex_);
        size_t oldMaxReservedSize = maxReservedSize;
        maxReservedSize = size;
        if (maxReservedSize < oldMaxReservedSize)
        {
            typename std::list<BufferEntry>::iterator i = reservedEntries_.begin();
            for (; i != reservedEntries_.end();)
            {
                const BufferEntry& entry = *i;
                if (entry.capacity_ > maxReservedSize / 8)
                {
                    CV_DbgAssert(currentReservedSize >= entry.capacity_);
                    currentReservedSize -= entry.capacity_;
                    derived()._releaseBufferEntry(entry);
                    i = reservedEntries_.erase(i);
                    continue;
                }
                ++i;
            }
            _checkSizeOfReservedEntries();
        }
    }
    virtual void freeAllReservedBuffers() CV_OVERRIDE
    {
        AutoLock locker(mutex_);
        typename std::list<BufferEntry>::const_iterator i = reservedEntries_.begin();
        for (; i != reservedEntries_.end(); ++i)
        {
            const BufferEntry& entry = *i;
            derived()._releaseBufferEntry(entry);
        }
        reservedEntries_.clear();
        currentReservedSize = 0;
    }
};

struct CLBufferEntry
{
    cl_mem clBuffer_;
    size_t capacity_;
    CLBufferEntry() : clBuffer_((cl_mem)NULL), capacity_(0) { }
};

class OpenCLBufferPoolImpl CV_FINAL : public OpenCLBufferPoolBaseImpl<OpenCLBufferPoolImpl, CLBufferEntry, cl_mem>
{
public:
    typedef struct CLBufferEntry BufferEntry;
protected:
    int createFlags_;
public:
    OpenCLBufferPoolImpl(int createFlags = 0)
        : createFlags_(createFlags)
    {
    }

    void _allocateBufferEntry(BufferEntry& entry, size_t size)
    {
        CV_DbgAssert(entry.clBuffer_ == NULL);
        entry.capacity_ = alignSize(size, (int)_allocationGranularity(size));
        Context& ctx = Context::getDefault();
        cl_int retval = CL_SUCCESS;
        entry.clBuffer_ = clCreateBuffer((cl_context)ctx.ptr(), CL_MEM_READ_WRITE|createFlags_, entry.capacity_, 0, &retval);
        CV_OCL_CHECK_RESULT(retval, cv::format("clCreateBuffer(capacity=%lld) => %p", (long long int)entry.capacity_, (void*)entry.clBuffer_).c_str());
        CV_Assert(entry.clBuffer_ != NULL);
        if (retval == CL_SUCCESS)
        {
            CV_IMPL_ADD(CV_IMPL_OCL);
        }
        LOG_BUFFER_POOL("OpenCL allocate %lld (0x%llx) bytes: %p\n",
                (long long)entry.capacity_, (long long)entry.capacity_, entry.clBuffer_);
        allocatedEntries_.push_back(entry);
    }

    void _releaseBufferEntry(const BufferEntry& entry)
    {
        CV_Assert(entry.capacity_ != 0);
        CV_Assert(entry.clBuffer_ != NULL);
        LOG_BUFFER_POOL("OpenCL release buffer: %p, %lld (0x%llx) bytes\n",
                entry.clBuffer_, (long long)entry.capacity_, (long long)entry.capacity_);
        CV_OCL_DBG_CHECK(clReleaseMemObject(entry.clBuffer_));
    }
};

#ifdef HAVE_OPENCL_SVM
struct CLSVMBufferEntry
{
    void* clBuffer_;
    size_t capacity_;
    CLSVMBufferEntry() : clBuffer_(NULL), capacity_(0) { }
};
class OpenCLSVMBufferPoolImpl CV_FINAL : public OpenCLBufferPoolBaseImpl<OpenCLSVMBufferPoolImpl, CLSVMBufferEntry, void*>
{
public:
    typedef struct CLSVMBufferEntry BufferEntry;
public:
    OpenCLSVMBufferPoolImpl()
    {
    }

    void _allocateBufferEntry(BufferEntry& entry, size_t size)
    {
        CV_DbgAssert(entry.clBuffer_ == NULL);
        entry.capacity_ = alignSize(size, (int)_allocationGranularity(size));

        Context& ctx = Context::getDefault();
        const svm::SVMCapabilities svmCaps = svm::getSVMCapabilitites(ctx);
        bool isFineGrainBuffer = svmCaps.isSupportFineGrainBuffer();
        cl_svm_mem_flags memFlags = CL_MEM_READ_WRITE |
                (isFineGrainBuffer ? CL_MEM_SVM_FINE_GRAIN_BUFFER : 0);

        const svm::SVMFunctions* svmFns = svm::getSVMFunctions(ctx);
        CV_DbgAssert(svmFns->isValid());

        CV_OPENCL_SVM_TRACE_P("clSVMAlloc: %d\n", (int)entry.capacity_);
        void* buf = svmFns->fn_clSVMAlloc((cl_context)ctx.ptr(), memFlags, entry.capacity_, 0);
        CV_Assert(buf);

        entry.clBuffer_ = buf;
        {
            CV_IMPL_ADD(CV_IMPL_OCL);
        }
        LOG_BUFFER_POOL("OpenCL SVM allocate %lld (0x%llx) bytes: %p\n",
                (long long)entry.capacity_, (long long)entry.capacity_, entry.clBuffer_);
        allocatedEntries_.push_back(entry);
    }

    void _releaseBufferEntry(const BufferEntry& entry)
    {
        CV_Assert(entry.capacity_ != 0);
        CV_Assert(entry.clBuffer_ != NULL);
        LOG_BUFFER_POOL("OpenCL release SVM buffer: %p, %lld (0x%llx) bytes\n",
                entry.clBuffer_, (long long)entry.capacity_, (long long)entry.capacity_);
        Context& ctx = Context::getDefault();
        const svm::SVMFunctions* svmFns = svm::getSVMFunctions(ctx);
        CV_DbgAssert(svmFns->isValid());
        CV_OPENCL_SVM_TRACE_P("clSVMFree: %p\n", entry.clBuffer_);
        svmFns->fn_clSVMFree((cl_context)ctx.ptr(), entry.clBuffer_);
    }
};
#endif

template <bool readAccess, bool writeAccess>
class AlignedDataPtr
{
protected:
    const size_t size_;
    uchar* const originPtr_;
    const size_t alignment_;
    uchar* ptr_;
    uchar* allocatedPtr_;

public:
    AlignedDataPtr(uchar* ptr, size_t size, size_t alignment)
        : size_(size), originPtr_(ptr), alignment_(alignment), ptr_(ptr), allocatedPtr_(NULL)
    {
        CV_DbgAssert((alignment & (alignment - 1)) == 0); // check for 2^n
        CV_DbgAssert(!readAccess || ptr);
        if (((size_t)ptr_ & (alignment - 1)) != 0)
        {
            allocatedPtr_ = new uchar[size_ + alignment - 1];
            ptr_ = (uchar*)(((uintptr_t)allocatedPtr_ + (alignment - 1)) & ~(alignment - 1));
            if (readAccess)
            {
                memcpy(ptr_, originPtr_, size_);
            }
        }
    }

    uchar* getAlignedPtr() const
    {
        CV_DbgAssert(((size_t)ptr_ & (alignment_ - 1)) == 0);
        return ptr_;
    }

    ~AlignedDataPtr()
    {
        if (allocatedPtr_)
        {
            if (writeAccess)
            {
                memcpy(originPtr_, ptr_, size_);
            }
            delete[] allocatedPtr_;
            allocatedPtr_ = NULL;
        }
        ptr_ = NULL;
    }
private:
    AlignedDataPtr(const AlignedDataPtr&); // disabled
    AlignedDataPtr& operator=(const AlignedDataPtr&); // disabled
};
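
// Illustrative arithmetic for the realignment in AlignedDataPtr (hypothetical address):
// with alignment = 16 and allocatedPtr_ = 0x1003,
//   ptr_ = (0x1003 + 15) & ~15 = 0x1012 & ~0xF = 0x1010,
// an offset of 13 bytes. The offset is never more than alignment - 1, which is why
// allocating size_ + alignment - 1 bytes always leaves room for size_ bytes at ptr_.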

template <bool readAccess, bool writeAccess>
class AlignedDataPtr2D
{
protected:
    const size_t size_;
    uchar* const originPtr_;
    const size_t alignment_;
    uchar* ptr_;
    uchar* allocatedPtr_;
    size_t rows_;
    size_t cols_;
    size_t step_;

public:
    AlignedDataPtr2D(uchar* ptr, size_t rows, size_t cols, size_t step, size_t alignment, size_t extrabytes = 0)
        : size_(rows*step), originPtr_(ptr), alignment_(alignment), ptr_(ptr), allocatedPtr_(NULL), rows_(rows), cols_(cols), step_(step)
    {
        CV_DbgAssert((alignment & (alignment - 1)) == 0); // check for 2^n
        CV_DbgAssert(!readAccess || ptr != NULL);
        if (ptr == 0 || ((size_t)ptr_ & (alignment - 1)) != 0)
        {
            allocatedPtr_ = new uchar[size_ + extrabytes + alignment - 1];
            ptr_ = (uchar*)(((uintptr_t)allocatedPtr_ + (alignment - 1)) & ~(alignment - 1));
            if (readAccess)
            {
                for (size_t i = 0; i < rows_; i++)
                    memcpy(ptr_ + i*step_, originPtr_ + i*step_, cols_);
            }
        }
    }

    uchar* getAlignedPtr() const
    {
        CV_DbgAssert(((size_t)ptr_ & (alignment_ - 1)) == 0);
        return ptr_;
    }

    ~AlignedDataPtr2D()
    {
        if (allocatedPtr_)
        {
            if (writeAccess)
            {
                for (size_t i = 0; i < rows_; i++)
                    memcpy(originPtr_ + i*step_, ptr_ + i*step_, cols_);
            }
            delete[] allocatedPtr_;
            allocatedPtr_ = NULL;
        }
        ptr_ = NULL;
    }
private:
    AlignedDataPtr2D(const AlignedDataPtr2D&); // disabled
    AlignedDataPtr2D& operator=(const AlignedDataPtr2D&); // disabled
};

#ifndef CV_OPENCL_DATA_PTR_ALIGNMENT
#define CV_OPENCL_DATA_PTR_ALIGNMENT 16
#endif

void Context::Impl::__init_buffer_pools()
{
    bufferPool_ = std::make_shared<OpenCLBufferPoolImpl>(0);
    OpenCLBufferPoolImpl& bufferPool = *bufferPool_.get();
    bufferPoolHostPtr_ = std::make_shared<OpenCLBufferPoolImpl>(CL_MEM_ALLOC_HOST_PTR);
    OpenCLBufferPoolImpl& bufferPoolHostPtr = *bufferPoolHostPtr_.get();

    size_t defaultPoolSize = ocl::Device::getDefault().isIntel() ? 1 << 27 : 0;
    size_t poolSize = utils::getConfigurationParameterSizeT("OPENCV_OPENCL_BUFFERPOOL_LIMIT", defaultPoolSize);
    bufferPool.setMaxReservedSize(poolSize);
    size_t poolSizeHostPtr = utils::getConfigurationParameterSizeT("OPENCV_OPENCL_HOST_PTR_BUFFERPOOL_LIMIT", defaultPoolSize);
    bufferPoolHostPtr.setMaxReservedSize(poolSizeHostPtr);
#ifdef HAVE_OPENCL_SVM
    bufferPoolSVM_ = std::make_shared<OpenCLSVMBufferPoolImpl>();
    OpenCLSVMBufferPoolImpl& bufferPoolSVM = *bufferPoolSVM_.get();
    size_t poolSizeSVM = utils::getConfigurationParameterSizeT("OPENCV_OPENCL_SVM_BUFFERPOOL_LIMIT", defaultPoolSize);
    bufferPoolSVM.setMaxReservedSize(poolSizeSVM);
#endif

    CV_LOG_INFO(NULL, "OpenCL: Initializing buffer pool for context@" << contextId << " with max capacity: poolSize=" << poolSize << " poolSizeHostPtr=" << poolSizeHostPtr);
}
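
// Illustrative settings for the pool limits read above (values in bytes, hypothetical):
//   OPENCV_OPENCL_BUFFERPOOL_LIMIT=0            -> no reuse: release() frees every buffer,
//                                                  because maxReservedSize == 0
//   OPENCV_OPENCL_BUFFERPOOL_LIMIT=268435456    -> keep up to 256 MiB of idle buffers, each
//                                                  at most 32 MiB (limit / 8)
//   OPENCV_OPENCL_HOST_PTR_BUFFERPOOL_LIMIT=0   -> same rule for the CL_MEM_ALLOC_HOST_PTR pool
// Without these variables the limit is 1 << 27 (128 MiB) on Intel devices and 0 elsewhere.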

class OpenCLAllocator CV_FINAL : public MatAllocator
{
public:
    enum AllocatorFlags
    {
        ALLOCATOR_FLAGS_BUFFER_POOL_USED = 1 << 0,
        ALLOCATOR_FLAGS_BUFFER_POOL_HOST_PTR_USED = 1 << 1,
#ifdef HAVE_OPENCL_SVM
        ALLOCATOR_FLAGS_BUFFER_POOL_SVM_USED = 1 << 2,
#endif
        ALLOCATOR_FLAGS_EXTERNAL_BUFFER = 1 << 3  // convertFromBuffer()
    };

    OpenCLAllocator()
    {
        matStdAllocator = Mat::getDefaultAllocator();
    }
    ~OpenCLAllocator()
    {
        flushCleanupQueue();
    }

    UMatData* defaultAllocate(int dims, const int* sizes, int type, void* data, size_t* step,
            AccessFlag flags, UMatUsageFlags usageFlags) const
    {
        UMatData* u = matStdAllocator->allocate(dims, sizes, type, data, step, flags, usageFlags);
        return u;
    }

    static bool isOpenCLMapForced()  // force clEnqueueMapBuffer / clEnqueueUnmapMemObject OpenCL API
    {
        static bool value = cv::utils::getConfigurationParameterBool("OPENCV_OPENCL_BUFFER_FORCE_MAPPING", false);
        return value;
    }
    static bool isOpenCLCopyingForced()  // force clEnqueueReadBuffer[Rect] / clEnqueueWriteBuffer[Rect] OpenCL API
    {
        static bool value = cv::utils::getConfigurationParameterBool("OPENCV_OPENCL_BUFFER_FORCE_COPYING", false);
        return value;
    }

    void getBestFlags(const Context& ctx, AccessFlag /*flags*/, UMatUsageFlags usageFlags, int& createFlags, UMatData::MemoryFlag& flags0) const
    {
        const Device& dev = ctx.device(0);
        createFlags = 0;
        if ((usageFlags & USAGE_ALLOCATE_HOST_MEMORY) != 0)
            createFlags |= CL_MEM_ALLOC_HOST_PTR;

        if (!isOpenCLCopyingForced() &&
            (isOpenCLMapForced() ||
                (dev.hostUnifiedMemory()
#ifndef __APPLE__
                || dev.isIntel()
#endif
                )
            )
        )
            flags0 = static_cast<UMatData::MemoryFlag>(0);
        else
            flags0 = UMatData::COPY_ON_MAP;
    }
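
    // Summary of the flags0 decision made by getBestFlags() above, in priority order:
    //   OPENCV_OPENCL_BUFFER_FORCE_COPYING enabled                -> COPY_ON_MAP
    //   else OPENCV_OPENCL_BUFFER_FORCE_MAPPING enabled           -> 0 (map the buffer directly)
    //   else host-unified-memory device (or Intel, except Apple)  -> 0
    //   otherwise (typically a discrete GPU)                      -> COPY_ON_MAP
    // createFlags gains CL_MEM_ALLOC_HOST_PTR only when USAGE_ALLOCATE_HOST_MEMORY is requested.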

    UMatData* allocate(int dims, const int* sizes, int type,
                       void* data, size_t* step, AccessFlag flags, UMatUsageFlags usageFlags) const CV_OVERRIDE
    {
        if (!useOpenCL())
            return defaultAllocate(dims, sizes, type, data, step, flags, usageFlags);

        flushCleanupQueue();

        CV_Assert(data == 0);
        size_t total = CV_ELEM_SIZE(type);
        for (int i = dims-1; i >= 0; i--)
        {
            if (step)
                step[i] = total;
            total *= sizes[i];
        }
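        // Illustrative trace of the loop above (hypothetical input): for dims = 2,
        // sizes = {3, 4} and type = CV_8UC3 (CV_ELEM_SIZE = 3), it sets step[1] = 3 and
        // step[0] = 12, and leaves total = 36 bytes for the whole dense buffer.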

        Context& ctx = Context::getDefault();
        if (!ctx.getImpl())
            return defaultAllocate(dims, sizes, type, data, step, flags, usageFlags);
        Context::Impl& ctxImpl = *ctx.getImpl();

        int createFlags = 0;
        UMatData::MemoryFlag flags0 = static_cast<UMatData::MemoryFlag>(0);
        getBestFlags(ctx, flags, usageFlags, createFlags, flags0);

        void* handle = NULL;
        int allocatorFlags = 0;

#ifdef HAVE_OPENCL_SVM
        const svm::SVMCapabilities svmCaps = svm::getSVMCapabilitites(ctx);
        if (ctx.useSVM() && svm::useSVM(usageFlags) && !svmCaps.isNoSVMSupport())
        {
            allocatorFlags = ALLOCATOR_FLAGS_BUFFER_POOL_SVM_USED;
            handle = ctxImpl.getBufferPoolSVM().allocate(total);

            // this property is constant, so single buffer pool can be used here
            bool isFineGrainBuffer = svmCaps.isSupportFineGrainBuffer();
            allocatorFlags |= isFineGrainBuffer ? svm::OPENCL_SVM_FINE_GRAIN_BUFFER : svm::OPENCL_SVM_COARSE_GRAIN_BUFFER;
        }
        else
#endif
        if (createFlags == 0)
        {
            allocatorFlags = ALLOCATOR_FLAGS_BUFFER_POOL_USED;
            handle = ctxImpl.getBufferPool().allocate(total);
        }
        else if (createFlags == CL_MEM_ALLOC_HOST_PTR)
        {
            allocatorFlags = ALLOCATOR_FLAGS_BUFFER_POOL_HOST_PTR_USED;
            handle = ctxImpl.getBufferPoolHostPtr().allocate(total);
        }
        else
        {
            CV_Assert(handle != NULL); // Unsupported, throw
        }

        if (!handle)
            return defaultAllocate(dims, sizes, type, data, step, flags, usageFlags);

        UMatData* u = new UMatData(this);
        u->data = 0;
        u->size = total;
        u->handle = handle;
        u->flags = flags0;
        u->allocatorFlags_ = allocatorFlags;
        u->allocatorContext = std::static_pointer_cast<void>(std::make_shared<ocl::Context>(ctx));
        CV_DbgAssert(!u->tempUMat()); // for bufferPool.release() consistency in deallocate()
        u->markHostCopyObsolete(true);
        opencl_allocator_stats.onAllocate(u->size);
        return u;
    }

    bool allocate(UMatData* u, AccessFlag accessFlags, UMatUsageFlags usageFlags) const CV_OVERRIDE
    {
        if (!u)
            return false;

        flushCleanupQueue();

        UMatDataAutoLock lock(u);

        if (u->handle == 0)
        {
            CV_Assert(u->origdata != 0);
            Context& ctx = Context::getDefault();
            int createFlags = 0;
            UMatData::MemoryFlag flags0 = static_cast<UMatData::MemoryFlag>(0);
            getBestFlags(ctx, accessFlags, usageFlags, createFlags, flags0);

            bool copyOnMap = (flags0 & UMatData::COPY_ON_MAP) != 0;

            cl_context ctx_handle = (cl_context)ctx.ptr();
            int allocatorFlags = 0;
            UMatData::MemoryFlag tempUMatFlags = static_cast<UMatData::MemoryFlag>(0);
            void* handle = NULL;
            cl_int retval = CL_SUCCESS;

#ifdef HAVE_OPENCL_SVM
            svm::SVMCapabilities svmCaps = svm::getSVMCapabilitites(ctx);
            bool useSVM = ctx.useSVM() && svm::useSVM(usageFlags);
            if (useSVM && svmCaps.isSupportFineGrainSystem())
            {
                allocatorFlags = svm::OPENCL_SVM_FINE_GRAIN_SYSTEM;
                tempUMatFlags = UMatData::TEMP_UMAT;
                handle = u->origdata;
                CV_OPENCL_SVM_TRACE_P("Use fine grain system: %d (%p)\n", (int)u->size, handle);
            }
            else if (useSVM && (svmCaps.isSupportFineGrainBuffer() || svmCaps.isSupportCoarseGrainBuffer()))
            {
                if (!(accessFlags & ACCESS_FAST)) // memcpy used
                {
                    bool isFineGrainBuffer = svmCaps.isSupportFineGrainBuffer();

                    cl_svm_mem_flags memFlags = createFlags |
                            (isFineGrainBuffer ? CL_MEM_SVM_FINE_GRAIN_BUFFER : 0);

                    const svm::SVMFunctions* svmFns = svm::getSVMFunctions(ctx);
                    CV_DbgAssert(svmFns->isValid());

                    CV_OPENCL_SVM_TRACE_P("clSVMAlloc + copy: %d\n", (int)u->size);
                    handle = svmFns->fn_clSVMAlloc((cl_context)ctx.ptr(), memFlags, u->size, 0);
                    CV_Assert(handle);

                    cl_command_queue q = NULL;
                    if (!isFineGrainBuffer)
                    {
                        q = (cl_command_queue)Queue::getDefault().ptr();
                        CV_OPENCL_SVM_TRACE_P("clEnqueueSVMMap: %p (%d)\n", handle, (int)u->size);
                        cl_int status = svmFns->fn_clEnqueueSVMMap(q, CL_TRUE, CL_MAP_WRITE,
                                handle, u->size,
                                0, NULL, NULL);
                        CV_OCL_CHECK_RESULT(status, "clEnqueueSVMMap()");
                    }
                    memcpy(handle, u->origdata, u->size);
                    if (!isFineGrainBuffer)
                    {
                        CV_OPENCL_SVM_TRACE_P("clEnqueueSVMUnmap: %p\n", handle);
                        cl_int status = svmFns->fn_clEnqueueSVMUnmap(q, handle, 0, NULL, NULL);
                        CV_OCL_CHECK_RESULT(status, "clEnqueueSVMUnmap()");
                    }

                    tempUMatFlags = UMatData::TEMP_UMAT | UMatData::TEMP_COPIED_UMAT;
                    allocatorFlags |= isFineGrainBuffer ? svm::OPENCL_SVM_FINE_GRAIN_BUFFER
                                                        : svm::OPENCL_SVM_COARSE_GRAIN_BUFFER;
                }
            }
            else
#endif
            {
                if (copyOnMap)
                    accessFlags &= ~ACCESS_FAST;

                tempUMatFlags = UMatData::TEMP_UMAT;
                if (
#ifdef __APPLE__
                    !copyOnMap &&
#endif
                    CV_OPENCL_ENABLE_MEM_USE_HOST_PTR
                    // There are OpenCL runtime issues for less aligned data
                    && (CV_OPENCL_ALIGNMENT_MEM_USE_HOST_PTR != 0
                        && u->origdata == cv::alignPtr(u->origdata, (int)CV_OPENCL_ALIGNMENT_MEM_USE_HOST_PTR))
                    // Avoid sharing of host memory between OpenCL buffers
                    && !(u->originalUMatData && u->originalUMatData->handle)
                )
                {
                    // Change the host-side origdata[size] to "pinned memory" that enables fast
                    // DMA transfers over PCIe to the device. Often used with clEnqueueMapBuffer/clEnqueueUnmapMemObject
                    handle = clCreateBuffer(ctx_handle, CL_MEM_USE_HOST_PTR|(createFlags & ~CL_MEM_ALLOC_HOST_PTR),
                                            u->size, u->origdata, &retval);
                    CV_OCL_DBG_CHECK_RESULT(retval, cv::format("clCreateBuffer(CL_MEM_USE_HOST_PTR|(createFlags & ~CL_MEM_ALLOC_HOST_PTR), sz=%lld, origdata=%p) => %p",
                            (long long int)u->size, u->origdata, (void*)handle).c_str());
                }
                if ((!handle || retval < 0) && !(accessFlags & ACCESS_FAST))
                {
                    // Allocate device-side memory and immediately copy data from the host-side pointer origdata[size].
                    // If createFlags=CL_MEM_ALLOC_HOST_PTR (aka cv::USAGE_ALLOCATE_HOST_MEMORY), then
                    // additionally allocate a host-side "pinned" duplicate of the origdata that is
                    // managed by OpenCL. This is potentially faster in unaligned/unmanaged scenarios.
                    handle = clCreateBuffer(ctx_handle, CL_MEM_COPY_HOST_PTR|CL_MEM_READ_WRITE|createFlags,
                                            u->size, u->origdata, &retval);
                    CV_OCL_DBG_CHECK_RESULT(retval, cv::format("clCreateBuffer(CL_MEM_COPY_HOST_PTR|CL_MEM_READ_WRITE|createFlags, sz=%lld, origdata=%p) => %p",
                            (long long int)u->size, u->origdata, (void*)handle).c_str());
                    tempUMatFlags |= UMatData::TEMP_COPIED_UMAT;
                }
            }
            CV_OCL_DBG_CHECK_RESULT(retval, cv::format("clCreateBuffer() => %p", (void*)handle).c_str());
            if (!handle || retval != CL_SUCCESS)
                return false;
            u->handle = handle;
            u->prevAllocator = u->currAllocator;
            u->currAllocator = this;
            u->flags |= tempUMatFlags | flags0;
            u->allocatorFlags_ = allocatorFlags;
        }
        if (!!(accessFlags & ACCESS_WRITE))
            u->markHostCopyObsolete(true);
        opencl_allocator_stats.onAllocate(u->size);
        return true;
    }

    /*void sync(UMatData* u) const
    {
        cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();
        UMatDataAutoLock lock(u);

        if (u->hostCopyObsolete() && u->handle && u->refcount > 0 && u->origdata)
        {
            if (u->tempCopiedUMat())
            {
                clEnqueueReadBuffer(q, (cl_mem)u->handle, CL_TRUE, 0,
                                    u->size, u->origdata, 0, 0, 0);
            }
            else
            {
                cl_int retval = 0;
                void* data = clEnqueueMapBuffer(q, (cl_mem)u->handle, CL_TRUE,
                                                (CL_MAP_READ | CL_MAP_WRITE),
                                                0, u->size, 0, 0, 0, &retval);
                clEnqueueUnmapMemObject(q, (cl_mem)u->handle, data, 0, 0, 0);
                clFinish(q);
            }
            u->markHostCopyObsolete(false);
        }
        else if (u->copyOnMap() && u->deviceCopyObsolete() && u->data)
        {
            clEnqueueWriteBuffer(q, (cl_mem)u->handle, CL_TRUE, 0,
                                 u->size, u->data, 0, 0, 0);
        }
    }*/

    void deallocate(UMatData* u) const CV_OVERRIDE
    {
        if (!u)
            return;

        CV_Assert(u->urefcount == 0);
        CV_Assert(u->refcount == 0 && "UMat deallocation error: some derived Mat is still alive");

        CV_Assert(u->handle != 0);
        CV_Assert(u->mapcount == 0);

        if (!!(u->flags & UMatData::ASYNC_CLEANUP))
            addToCleanupQueue(u);
        else
            deallocate_(u);
    }
void deallocate_ ( UMatData * u ) const
{
2018-10-01 21:28:17 +08:00
CV_Assert ( u ) ;
CV_Assert ( u - > handle ) ;
if ( ( u - > allocatorFlags_ & ALLOCATOR_FLAGS_EXTERNAL_BUFFER ) = = 0 )
{
opencl_allocator_stats . onFree ( u - > size ) ;
}
2018-10-11 02:04:39 +08:00
# ifdef _WIN32
if ( cv : : __termination ) // process is not in consistent state (after ExitProcess call) and terminating
return ; // avoid any OpenCL calls
# endif
        if (u->tempUMat())
        {
            CV_Assert(u->origdata);
            //UMatDataAutoLock lock(u);

            if (u->hostCopyObsolete())
            {
#ifdef HAVE_OPENCL_SVM
                if ((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) != 0)
                {
                    Context& ctx = Context::getDefault();
                    const svm::SVMFunctions* svmFns = svm::getSVMFunctions(ctx);
                    CV_DbgAssert(svmFns->isValid());

                    if (u->tempCopiedUMat())
                    {
                        CV_DbgAssert((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == svm::OPENCL_SVM_FINE_GRAIN_BUFFER ||
                                (u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == svm::OPENCL_SVM_COARSE_GRAIN_BUFFER);
                        bool isFineGrainBuffer = (u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == svm::OPENCL_SVM_FINE_GRAIN_BUFFER;
                        cl_command_queue q = NULL;
                        if (!isFineGrainBuffer)
                        {
                            CV_DbgAssert(((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MAP) == 0));
                            q = (cl_command_queue)Queue::getDefault().ptr();
                            CV_OPENCL_SVM_TRACE_P("clEnqueueSVMMap: %p (%d)\n", u->handle, (int)u->size);
                            cl_int status = svmFns->fn_clEnqueueSVMMap(q, CL_FALSE, CL_MAP_READ,
                                    u->handle, u->size,
                                    0, NULL, NULL);
                            CV_OCL_CHECK_RESULT(status, "clEnqueueSVMMap()");
                        }
                        clFinish(q);
                        memcpy(u->origdata, u->handle, u->size);
                        if (!isFineGrainBuffer)
                        {
                            CV_OPENCL_SVM_TRACE_P("clEnqueueSVMUnmap: %p\n", u->handle);
                            cl_int status = svmFns->fn_clEnqueueSVMUnmap(q, u->handle, 0, NULL, NULL);
                            CV_OCL_CHECK_RESULT(status, "clEnqueueSVMUnmap()");
                        }
                    }
                    else
                    {
                        CV_DbgAssert((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == svm::OPENCL_SVM_FINE_GRAIN_SYSTEM);
                        // nothing
                    }
                }
                else
#endif
                {
                    cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();
                    if (u->tempCopiedUMat())
                    {
                        AlignedDataPtr<false, true> alignedPtr(u->origdata, u->size, CV_OPENCL_DATA_PTR_ALIGNMENT);
                        CV_OCL_CHECK(clEnqueueReadBuffer(q, (cl_mem)u->handle, CL_TRUE, 0,
                                u->size, alignedPtr.getAlignedPtr(), 0, 0, 0));
                    }
                    else
                    {
                        cl_int retval = 0;
                        if (u->tempUMat())
                        {
                            CV_Assert(u->mapcount == 0);
                            flushCleanupQueue(); // workaround for CL_OUT_OF_RESOURCES problem (#9960)
                            void* data = clEnqueueMapBuffer(q, (cl_mem)u->handle, CL_TRUE,
                                    (CL_MAP_READ | CL_MAP_WRITE),
                                    0, u->size, 0, 0, 0, &retval);
                            CV_OCL_CHECK_RESULT(retval, cv::format("clEnqueueMapBuffer(handle=%p, sz=%lld) => %p", (void*)u->handle, (long long int)u->size, data).c_str());
                            CV_Assert(u->origdata == data && "Details: https://github.com/opencv/opencv/issues/6293");
                            if (u->originalUMatData)
                            {
                                CV_Assert(u->originalUMatData->data == data);
                            }
                            retval = clEnqueueUnmapMemObject(q, (cl_mem)u->handle, data, 0, 0, 0);
                            CV_OCL_CHECK_RESULT(retval, cv::format("clEnqueueUnmapMemObject(handle=%p, data=%p, [sz=%lld])", (void*)u->handle, data, (long long int)u->size).c_str());
                            CV_OCL_DBG_CHECK(clFinish(q));
                        }
                    }
                }
                u->markHostCopyObsolete(false);
            }
            else
            {
                // nothing
            }
#ifdef HAVE_OPENCL_SVM
            if ((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) != 0)
            {
                if (u->tempCopiedUMat())
                {
                    Context& ctx = Context::getDefault();
                    const svm::SVMFunctions* svmFns = svm::getSVMFunctions(ctx);
                    CV_DbgAssert(svmFns->isValid());

                    CV_OPENCL_SVM_TRACE_P("clSVMFree: %p\n", u->handle);
                    svmFns->fn_clSVMFree((cl_context)ctx.ptr(), u->handle);
                }
            }
            else
#endif
            {
                cl_int retval = clReleaseMemObject((cl_mem)u->handle);
                CV_OCL_DBG_CHECK_RESULT(retval, cv::format("clReleaseMemObject(ptr=%p)", (void*)u->handle).c_str());
            }
            u->handle = 0;
            u->markDeviceCopyObsolete(true);
            u->currAllocator = u->prevAllocator;
            u->prevAllocator = NULL;
            if (u->data && u->copyOnMap() && u->data != u->origdata)
                fastFree(u->data);
            u->data = u->origdata;
            u->currAllocator->deallocate(u);
            u = NULL;
        }
        else
        {
            CV_Assert(u->origdata == NULL);
            if (u->data && u->copyOnMap() && u->data != u->origdata)
            {
                fastFree(u->data);
                u->data = 0;
                u->markHostCopyObsolete(true);
            }
            if (u->allocatorFlags_ & ALLOCATOR_FLAGS_BUFFER_POOL_USED)
            {
                std::shared_ptr<ocl::Context> pCtx = std::static_pointer_cast<ocl::Context>(u->allocatorContext);
                CV_Assert(pCtx);
                ocl::Context& ctx = *pCtx.get();
                CV_Assert(ctx.getImpl());
                ctx.getImpl()->getBufferPool().release((cl_mem)u->handle);
            }
            else if (u->allocatorFlags_ & ALLOCATOR_FLAGS_BUFFER_POOL_HOST_PTR_USED)
            {
                std::shared_ptr<ocl::Context> pCtx = std::static_pointer_cast<ocl::Context>(u->allocatorContext);
                CV_Assert(pCtx);
                ocl::Context& ctx = *pCtx.get();
                CV_Assert(ctx.getImpl());
                ctx.getImpl()->getBufferPoolHostPtr().release((cl_mem)u->handle);
            }
#ifdef HAVE_OPENCL_SVM
            else if (u->allocatorFlags_ & ALLOCATOR_FLAGS_BUFFER_POOL_SVM_USED)
            {
                std::shared_ptr<ocl::Context> pCtx = std::static_pointer_cast<ocl::Context>(u->allocatorContext);
                CV_Assert(pCtx);
                ocl::Context& ctx = *pCtx.get();
                if ((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == svm::OPENCL_SVM_FINE_GRAIN_SYSTEM)
                {
                    //nothing
                }
                else if ((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == svm::OPENCL_SVM_FINE_GRAIN_BUFFER ||
                        (u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == svm::OPENCL_SVM_COARSE_GRAIN_BUFFER)
                {
                    const svm::SVMFunctions* svmFns = svm::getSVMFunctions(ctx);
                    CV_DbgAssert(svmFns->isValid());
                    cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();
                    if ((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MAP) != 0)
                    {
                        CV_OPENCL_SVM_TRACE_P("clEnqueueSVMUnmap: %p\n", u->handle);
                        cl_int status = svmFns->fn_clEnqueueSVMUnmap(q, u->handle, 0, NULL, NULL);
                        CV_OCL_CHECK_RESULT(status, "clEnqueueSVMUnmap()");
                    }
                }
                CV_Assert(ctx.getImpl());
                ctx.getImpl()->getBufferPoolSVM().release((void*)u->handle);
            }
#endif
            else
            {
                CV_OCL_DBG_CHECK(clReleaseMemObject((cl_mem)u->handle));
            }
            u->handle = 0;
            u->markDeviceCopyObsolete(true);
            delete u;
            u = NULL;
        }
        CV_Assert(u == NULL);
    }
    // synchronized call (external UMatDataAutoLock, see UMat::getMat)
    void map(UMatData* u, AccessFlag accessFlags) const CV_OVERRIDE
    {
        CV_Assert(u && u->handle);

        if (!!(accessFlags & ACCESS_WRITE))
            u->markDeviceCopyObsolete(true);

        cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();

        {
            if (!u->copyOnMap())
            {
                // TODO
                // because there can be other map requests for the same UMat with different access flags,
                // we use the universal (read-write) access mode.
#ifdef HAVE_OPENCL_SVM
                if ((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) != 0)
                {
                    if ((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == svm::OPENCL_SVM_COARSE_GRAIN_BUFFER)
                    {
                        Context& ctx = Context::getDefault();
                        const svm::SVMFunctions* svmFns = svm::getSVMFunctions(ctx);
                        CV_DbgAssert(svmFns->isValid());

                        if ((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MAP) == 0)
                        {
                            CV_OPENCL_SVM_TRACE_P("clEnqueueSVMMap: %p (%d)\n", u->handle, (int)u->size);
                            cl_int status = svmFns->fn_clEnqueueSVMMap(q, CL_FALSE, CL_MAP_READ | CL_MAP_WRITE,
                                    u->handle, u->size,
                                    0, NULL, NULL);
                            CV_OCL_CHECK_RESULT(status, "clEnqueueSVMMap()");
                            u->allocatorFlags_ |= svm::OPENCL_SVM_BUFFER_MAP;
                        }
                    }
                    clFinish(q);
                    u->data = (uchar*)u->handle;
                    u->markHostCopyObsolete(false);
                    u->markDeviceMemMapped(true);
                    return;
                }
#endif
                cl_int retval = CL_SUCCESS;
                if (!u->deviceMemMapped())
                {
                    CV_Assert(u->refcount == 1);
                    CV_Assert(u->mapcount++ == 0);
                    u->data = (uchar*)clEnqueueMapBuffer(q, (cl_mem)u->handle, CL_TRUE,
                                                         (CL_MAP_READ | CL_MAP_WRITE),
                                                         0, u->size, 0, 0, 0, &retval);
                    CV_OCL_DBG_CHECK_RESULT(retval, cv::format("clEnqueueMapBuffer(handle=%p, sz=%lld) => %p", (void*)u->handle, (long long int)u->size, u->data).c_str());
                }
                if (u->data && retval == CL_SUCCESS)
                {
                    u->markHostCopyObsolete(false);
                    u->markDeviceMemMapped(true);
                    return;
                }

                // TODO Is it really a good idea and was it tested well?
                // if map failed, switch to copy-on-map mode for the particular buffer
                u->flags |= UMatData::COPY_ON_MAP;
            }

            if (!u->data)
            {
                u->data = (uchar*)fastMalloc(u->size);
                u->markHostCopyObsolete(true);
            }
        }

        if (!!(accessFlags & ACCESS_READ) && u->hostCopyObsolete())
        {
            AlignedDataPtr<false, true> alignedPtr(u->data, u->size, CV_OPENCL_DATA_PTR_ALIGNMENT);
#ifdef HAVE_OPENCL_SVM
            CV_DbgAssert((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == 0);
#endif
            cl_int retval = clEnqueueReadBuffer(q, (cl_mem)u->handle, CL_TRUE,
                    0, u->size, alignedPtr.getAlignedPtr(), 0, 0, 0);
            CV_OCL_CHECK_RESULT(retval, cv::format("clEnqueueReadBuffer(q, handle=%p, CL_TRUE, 0, sz=%lld, data=%p, 0, 0, 0)",
                    (void*)u->handle, (long long int)u->size, alignedPtr.getAlignedPtr()).c_str());
            u->markHostCopyObsolete(false);
        }
    }
    void unmap(UMatData* u) const CV_OVERRIDE
    {
        if (!u)
            return;

        CV_Assert(u->handle != 0);

        UMatDataAutoLock autolock(u);

        cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();
        cl_int retval = 0;
        if (!u->copyOnMap() && u->deviceMemMapped())
        {
            CV_Assert(u->data != NULL);
#ifdef HAVE_OPENCL_SVM
            if ((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) != 0)
            {
                if ((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == svm::OPENCL_SVM_COARSE_GRAIN_BUFFER)
                {
                    Context& ctx = Context::getDefault();
                    const svm::SVMFunctions* svmFns = svm::getSVMFunctions(ctx);
                    CV_DbgAssert(svmFns->isValid());

                    CV_DbgAssert((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MAP) != 0);
                    {
                        CV_OPENCL_SVM_TRACE_P("clEnqueueSVMUnmap: %p\n", u->handle);
                        cl_int status = svmFns->fn_clEnqueueSVMUnmap(q, u->handle,
                                0, NULL, NULL);
                        CV_OCL_CHECK_RESULT(status, "clEnqueueSVMUnmap()");
                        clFinish(q);
                        u->allocatorFlags_ &= ~svm::OPENCL_SVM_BUFFER_MAP;
                    }
                }
                if (u->refcount == 0)
                    u->data = 0;
                u->markDeviceCopyObsolete(false);
                u->markHostCopyObsolete(true);
                return;
            }
#endif
            if (u->refcount == 0)
            {
                CV_Assert(u->mapcount-- == 1);
                retval = clEnqueueUnmapMemObject(q, (cl_mem)u->handle, u->data, 0, 0, 0);
                CV_OCL_CHECK_RESULT(retval, cv::format("clEnqueueUnmapMemObject(handle=%p, data=%p, [sz=%lld])", (void*)u->handle, u->data, (long long int)u->size).c_str());
                if (Device::getDefault().isAMD())
                {
                    // required for multithreaded applications (see stitching test)
                    CV_OCL_DBG_CHECK(clFinish(q));
                }
                u->markDeviceMemMapped(false);
                u->data = 0;
                u->markDeviceCopyObsolete(false);
                u->markHostCopyObsolete(true);
            }
        }
        else if (u->copyOnMap() && u->deviceCopyObsolete())
        {
            AlignedDataPtr<true, false> alignedPtr(u->data, u->size, CV_OPENCL_DATA_PTR_ALIGNMENT);
#ifdef HAVE_OPENCL_SVM
            CV_DbgAssert((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == 0);
#endif
            retval = clEnqueueWriteBuffer(q, (cl_mem)u->handle, CL_TRUE,
                    0, u->size, alignedPtr.getAlignedPtr(), 0, 0, 0);
            CV_OCL_CHECK_RESULT(retval, cv::format("clEnqueueWriteBuffer(q, handle=%p, CL_TRUE, 0, sz=%lld, data=%p, 0, 0, 0)",
                    (void*)u->handle, (long long int)u->size, alignedPtr.getAlignedPtr()).c_str());
            u->markDeviceCopyObsolete(false);
            u->markHostCopyObsolete(true);
        }
    }
    bool checkContinuous(int dims, const size_t sz[],
                         const size_t srcofs[], const size_t srcstep[],
                         const size_t dstofs[], const size_t dststep[],
                         size_t& total, size_t new_sz[],
                         size_t& srcrawofs, size_t new_srcofs[], size_t new_srcstep[],
                         size_t& dstrawofs, size_t new_dstofs[], size_t new_dststep[]) const
    {
        bool iscontinuous = true;
        srcrawofs = srcofs ? srcofs[dims-1] : 0;
        dstrawofs = dstofs ? dstofs[dims-1] : 0;
        total = sz[dims-1];
        for (int i = dims-2; i >= 0; i--)
        {
            if (i >= 0 && (total != srcstep[i] || total != dststep[i]))
                iscontinuous = false;
            total *= sz[i];
            if (srcofs)
                srcrawofs += srcofs[i]*srcstep[i];
            if (dstofs)
                dstrawofs += dstofs[i]*dststep[i];
        }

        if (!iscontinuous)
        {
            // OpenCL uses {x, y, z} order while OpenCV uses {z, y, x} order.
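            // Worked example (illustrative values chosen for this comment, not taken
            // from a real call site; assumes the innermost size and offsets are
            // already expressed in bytes):
            //   copy a 3-row x 8-byte region starting at row 1, byte 4 of a source
            //   whose rows are 16 bytes apart into a dense destination (8-byte rows):
            //     sz = {3, 8}, srcofs = {1, 4}, srcstep = {16}, dstofs = NULL, dststep = {8}
            //   The loop above gives total = 24, srcrawofs = 1*16 + 4 = 20, dstrawofs = 0
            //   and iscontinuous = false (8 != 16), so the dims == 2 branch below sets
            //     new_sz = {8, 3, 1}, new_srcofs = {4, 1, 0},
            //     new_srcstep = {16, 0}, new_dststep = {8, 0}
            //   and leaves new_dstofs = {0, 0, 0}, i.e. {x in bytes, y in rows, z in slices},
            //   the origin/region layout used by clEnqueueReadBufferRect and related calls.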
            if (dims == 2)
            {
                new_sz[0] = sz[1]; new_sz[1] = sz[0]; new_sz[2] = 1;
                // we assume that the new_... arrays are zero-initialized by the caller,
                // so there is no else branch
                if (srcofs)
                {
                    new_srcofs[0] = srcofs[1];
                    new_srcofs[1] = srcofs[0];
                    new_srcofs[2] = 0;
                }

                if (dstofs)
                {
                    new_dstofs[0] = dstofs[1];
                    new_dstofs[1] = dstofs[0];
                    new_dstofs[2] = 0;
                }

                new_srcstep[0] = srcstep[0]; new_srcstep[1] = 0;
                new_dststep[0] = dststep[0]; new_dststep[1] = 0;
            }
            else
            {
                // we could check for dims == 3 here,
                // but from the user's perspective this check is more informative
                CV_Assert(dims <= 3);
                new_sz[0] = sz[2]; new_sz[1] = sz[1]; new_sz[2] = sz[0];
                if (srcofs)
                {
                    new_srcofs[0] = srcofs[2];
                    new_srcofs[1] = srcofs[1];
                    new_srcofs[2] = srcofs[0];
                }

                if (dstofs)
                {
                    new_dstofs[0] = dstofs[2];
                    new_dstofs[1] = dstofs[1];
                    new_dstofs[2] = dstofs[0];
                }

                new_srcstep[0] = srcstep[1]; new_srcstep[1] = srcstep[0];
                new_dststep[0] = dststep[1]; new_dststep[1] = dststep[0];
            }
        }
        return iscontinuous;
    }
    void download(UMatData* u, void* dstptr, int dims, const size_t sz[],
                  const size_t srcofs[], const size_t srcstep[],
                  const size_t dststep[]) const CV_OVERRIDE
    {
        if (!u)
            return;
        UMatDataAutoLock autolock(u);

        if (u->data && !u->hostCopyObsolete())
        {
            Mat::getDefaultAllocator()->download(u, dstptr, dims, sz, srcofs, srcstep, dststep);
            return;
        }
        CV_Assert(u->handle != 0);

        cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();

        size_t total = 0, new_sz[] = {0, 0, 0};
        size_t srcrawofs = 0, new_srcofs[] = {0, 0, 0}, new_srcstep[] = {0, 0, 0};
        size_t dstrawofs = 0, new_dstofs[] = {0, 0, 0}, new_dststep[] = {0, 0, 0};

        bool iscontinuous = checkContinuous(dims, sz, srcofs, srcstep, 0, dststep,
                                            total, new_sz,
                                            srcrawofs, new_srcofs, new_srcstep,
                                            dstrawofs, new_dstofs, new_dststep);
#ifdef HAVE_OPENCL_SVM
        if ((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) != 0)
        {
            CV_DbgAssert(u->data == NULL || u->data == u->handle);
            Context& ctx = Context::getDefault();
            const svm::SVMFunctions* svmFns = svm::getSVMFunctions(ctx);
            CV_DbgAssert(svmFns->isValid());

            CV_DbgAssert((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MAP) == 0);
            if ((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == svm::OPENCL_SVM_COARSE_GRAIN_BUFFER)
            {
                CV_OPENCL_SVM_TRACE_P("clEnqueueSVMMap: %p (%d)\n", u->handle, (int)u->size);
                cl_int status = svmFns->fn_clEnqueueSVMMap(q, CL_FALSE, CL_MAP_READ,
                        u->handle, u->size,
                        0, NULL, NULL);
                CV_OCL_CHECK_RESULT(status, "clEnqueueSVMMap()");
            }
            clFinish(q);
            if (iscontinuous)
            {
                memcpy(dstptr, (uchar*)u->handle + srcrawofs, total);
            }
            else
            {
                // This code is from MatAllocator::download()
                int isz[CV_MAX_DIM];
                uchar* srcptr = (uchar*)u->handle;
                for (int i = 0; i < dims; i++)
                {
                    CV_Assert(sz[i] <= (size_t)INT_MAX);
                    if (sz[i] == 0)
                        return;
                    if (srcofs)
                        srcptr += srcofs[i]*(i <= dims-2 ? srcstep[i] : 1);
                    isz[i] = (int)sz[i];
                }

                Mat src(dims, isz, CV_8U, srcptr, srcstep);
                Mat dst(dims, isz, CV_8U, dstptr, dststep);

                const Mat* arrays[] = { &src, &dst };
                uchar* ptrs[2];
                NAryMatIterator it(arrays, ptrs, 2);
                size_t j, planesz = it.size;

                for (j = 0; j < it.nplanes; j++, ++it)
                    memcpy(ptrs[1], ptrs[0], planesz);
            }
            if ((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == svm::OPENCL_SVM_COARSE_GRAIN_BUFFER)
            {
                CV_OPENCL_SVM_TRACE_P("clEnqueueSVMUnmap: %p\n", u->handle);
                cl_int status = svmFns->fn_clEnqueueSVMUnmap(q, u->handle,
                        0, NULL, NULL);
                CV_OCL_CHECK_RESULT(status, "clEnqueueSVMUnmap()");
                clFinish(q);
            }
        }
        else
#endif
        {
            if (iscontinuous)
            {
                AlignedDataPtr<false, true> alignedPtr((uchar*)dstptr, total, CV_OPENCL_DATA_PTR_ALIGNMENT);
                CV_OCL_CHECK(clEnqueueReadBuffer(q, (cl_mem)u->handle, CL_TRUE,
                        srcrawofs, total, alignedPtr.getAlignedPtr(), 0, 0, 0));
            }
            else if (CV_OPENCL_DISABLE_BUFFER_RECT_OPERATIONS)
            {
                const size_t padding = CV_OPENCL_DATA_PTR_ALIGNMENT;
                size_t new_srcrawofs = srcrawofs & ~(padding-1);
                size_t membuf_ofs = srcrawofs - new_srcrawofs;
                AlignedDataPtr2D<false, false> alignedPtr(0, new_sz[1], new_srcstep[0], new_srcstep[0],
                                                          CV_OPENCL_DATA_PTR_ALIGNMENT, padding*2);
                uchar* ptr = alignedPtr.getAlignedPtr();

                CV_Assert(new_srcstep[0] >= new_sz[0]);
                total = alignSize(new_srcstep[0]*new_sz[1] + membuf_ofs, padding);
                total = std::min(total, u->size - new_srcrawofs);
                CV_OCL_CHECK(clEnqueueReadBuffer(q, (cl_mem)u->handle, CL_TRUE,
                                                 new_srcrawofs, total, ptr, 0, 0, 0));
                for (size_t i = 0; i < new_sz[1]; i++)
                    memcpy((uchar*)dstptr + i*new_dststep[0], ptr + i*new_srcstep[0] + membuf_ofs, new_sz[0]);
            }
            else
            {
                AlignedDataPtr2D<false, true> alignedPtr((uchar*)dstptr, new_sz[1], new_sz[0], new_dststep[0], CV_OPENCL_DATA_PTR_ALIGNMENT);
                uchar* ptr = alignedPtr.getAlignedPtr();

                CV_OCL_CHECK(clEnqueueReadBufferRect(q, (cl_mem)u->handle, CL_TRUE,
                        new_srcofs, new_dstofs, new_sz,
                        new_srcstep[0], 0,
                        new_dststep[0], 0,
                        ptr, 0, 0, 0));
            }
        }
    }
    void upload(UMatData* u, const void* srcptr, int dims, const size_t sz[],
                const size_t dstofs[], const size_t dststep[],
                const size_t srcstep[]) const CV_OVERRIDE
    {
        if (!u)
            return;

        // there should be no user-visible CPU copies of the UMat which we are going to copy to
        CV_Assert(u->refcount == 0 || u->tempUMat());

        size_t total = 0, new_sz[] = {0, 0, 0};
        size_t srcrawofs = 0, new_srcofs[] = {0, 0, 0}, new_srcstep[] = {0, 0, 0};
        size_t dstrawofs = 0, new_dstofs[] = {0, 0, 0}, new_dststep[] = {0, 0, 0};

        bool iscontinuous = checkContinuous(dims, sz, 0, srcstep, dstofs, dststep,
                                            total, new_sz,
                                            srcrawofs, new_srcofs, new_srcstep,
                                            dstrawofs, new_dstofs, new_dststep);

        UMatDataAutoLock autolock(u);

        // If there is a cached CPU copy of the GPU matrix,
        // we can use it as the destination.
        // This is possible in 2 cases:
        //    1. we overwrite the whole content
        //    2. we overwrite part of the matrix, but the GPU copy is out-of-date
        if (u->data && (u->hostCopyObsolete() < u->deviceCopyObsolete() || total == u->size))
        {
            Mat::getDefaultAllocator()->upload(u, srcptr, dims, sz, dstofs, dststep, srcstep);
            u->markHostCopyObsolete(false);
            u->markDeviceCopyObsolete(true);
            return;
        }

        CV_Assert(u->handle != 0);
        cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();
#ifdef HAVE_OPENCL_SVM
        if ((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) != 0)
        {
            CV_DbgAssert(u->data == NULL || u->data == u->handle);
            Context& ctx = Context::getDefault();
            const svm::SVMFunctions* svmFns = svm::getSVMFunctions(ctx);
            CV_DbgAssert(svmFns->isValid());

            CV_DbgAssert((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MAP) == 0);
            if ((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == svm::OPENCL_SVM_COARSE_GRAIN_BUFFER)
            {
                CV_OPENCL_SVM_TRACE_P("clEnqueueSVMMap: %p (%d)\n", u->handle, (int)u->size);
                cl_int status = svmFns->fn_clEnqueueSVMMap(q, CL_FALSE, CL_MAP_WRITE,
                        u->handle, u->size,
                        0, NULL, NULL);
                CV_OCL_CHECK_RESULT(status, "clEnqueueSVMMap()");
            }
            clFinish(q);
            if (iscontinuous)
            {
                memcpy((uchar*)u->handle + dstrawofs, srcptr, total);
            }
            else
            {
                // This code is from MatAllocator::upload()
                int isz[CV_MAX_DIM];
                uchar* dstptr = (uchar*)u->handle;
                for (int i = 0; i < dims; i++)
                {
                    CV_Assert(sz[i] <= (size_t)INT_MAX);
                    if (sz[i] == 0)
                        return;
                    if (dstofs)
                        dstptr += dstofs[i]*(i <= dims-2 ? dststep[i] : 1);
                    isz[i] = (int)sz[i];
                }

                Mat src(dims, isz, CV_8U, (void*)srcptr, srcstep);
                Mat dst(dims, isz, CV_8U, dstptr, dststep);

                const Mat* arrays[] = { &src, &dst };
                uchar* ptrs[2];
                NAryMatIterator it(arrays, ptrs, 2);
                size_t j, planesz = it.size;

                for (j = 0; j < it.nplanes; j++, ++it)
                    memcpy(ptrs[1], ptrs[0], planesz);
            }
            if ((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == svm::OPENCL_SVM_COARSE_GRAIN_BUFFER)
            {
                CV_OPENCL_SVM_TRACE_P("clEnqueueSVMUnmap: %p\n", u->handle);
                cl_int status = svmFns->fn_clEnqueueSVMUnmap(q, u->handle,
                        0, NULL, NULL);
                CV_OCL_CHECK_RESULT(status, "clEnqueueSVMUnmap()");
                clFinish(q);
            }
        }
        else
#endif
        {
            if (iscontinuous)
            {
                AlignedDataPtr<true, false> alignedPtr((uchar*)srcptr, total, CV_OPENCL_DATA_PTR_ALIGNMENT);
                cl_int retval = clEnqueueWriteBuffer(q, (cl_mem)u->handle, CL_TRUE,
                        dstrawofs, total, alignedPtr.getAlignedPtr(), 0, 0, 0);
                // report the number of bytes actually written (total), not the whole buffer size
                CV_OCL_CHECK_RESULT(retval, cv::format("clEnqueueWriteBuffer(q, handle=%p, CL_TRUE, offset=%lld, sz=%lld, data=%p, 0, 0, 0)",
                        (void*)u->handle, (long long int)dstrawofs, (long long int)total, alignedPtr.getAlignedPtr()).c_str());
            }
            else if (CV_OPENCL_DISABLE_BUFFER_RECT_OPERATIONS)
            {
                const size_t padding = CV_OPENCL_DATA_PTR_ALIGNMENT;
                size_t new_dstrawofs = dstrawofs & ~(padding-1);
                size_t membuf_ofs = dstrawofs - new_dstrawofs;
                AlignedDataPtr2D<false, false> alignedPtr(0, new_sz[1], new_dststep[0], new_dststep[0],
                                                          CV_OPENCL_DATA_PTR_ALIGNMENT, padding*2);
                uchar* ptr = alignedPtr.getAlignedPtr();

                CV_Assert(new_dststep[0] >= new_sz[0] && new_srcstep[0] >= new_sz[0]);
                total = alignSize(new_dststep[0]*new_sz[1] + membuf_ofs, padding);
                total = std::min(total, u->size - new_dstrawofs);
                /*printf("new_sz0=%d, new_sz1=%d, membuf_ofs=%d, total=%d (%08x), new_dstrawofs=%d (%08x)\n",
                        (int)new_sz[0], (int)new_sz[1], (int)membuf_ofs,
                        (int)total, (int)total, (int)new_dstrawofs, (int)new_dstrawofs);*/
                CV_OCL_CHECK(clEnqueueReadBuffer(q, (cl_mem)u->handle, CL_TRUE,
                                                 new_dstrawofs, total, ptr, 0, 0, 0));
                for (size_t i = 0; i < new_sz[1]; i++)
                    memcpy(ptr + i*new_dststep[0] + membuf_ofs, (uchar*)srcptr + i*new_srcstep[0], new_sz[0]);
                CV_OCL_CHECK(clEnqueueWriteBuffer(q, (cl_mem)u->handle, CL_TRUE,
                                                  new_dstrawofs, total, ptr, 0, 0, 0));
            }
            else
            {
                AlignedDataPtr2D<true, false> alignedPtr((uchar*)srcptr, new_sz[1], new_sz[0], new_srcstep[0], CV_OPENCL_DATA_PTR_ALIGNMENT);
                uchar* ptr = alignedPtr.getAlignedPtr();

                CV_OCL_CHECK(clEnqueueWriteBufferRect(q, (cl_mem)u->handle, CL_TRUE,
                        new_dstofs, new_srcofs, new_sz,
                        new_dststep[0], 0,
                        new_srcstep[0], 0,
                        ptr, 0, 0, 0));
            }
        }
        u->markHostCopyObsolete(true);
#ifdef HAVE_OPENCL_SVM
        if ((u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == svm::OPENCL_SVM_FINE_GRAIN_BUFFER ||
                (u->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == svm::OPENCL_SVM_FINE_GRAIN_SYSTEM)
        {
            // nothing
        }
        else
#endif
        {
            u->markHostCopyObsolete(true);
        }
        u->markDeviceCopyObsolete(false);
    }
    void copy(UMatData* src, UMatData* dst, int dims, const size_t sz[],
              const size_t srcofs[], const size_t srcstep[],
              const size_t dstofs[], const size_t dststep[], bool _sync) const CV_OVERRIDE
    {
        if (!src || !dst)
            return;

        size_t total = 0, new_sz[] = {0, 0, 0};
        size_t srcrawofs = 0, new_srcofs[] = {0, 0, 0}, new_srcstep[] = {0, 0, 0};
        size_t dstrawofs = 0, new_dstofs[] = {0, 0, 0}, new_dststep[] = {0, 0, 0};

        bool iscontinuous = checkContinuous(dims, sz, srcofs, srcstep, dstofs, dststep,
                                            total, new_sz,
                                            srcrawofs, new_srcofs, new_srcstep,
                                            dstrawofs, new_dstofs, new_dststep);

        UMatDataAutoLock src_autolock(src, dst);

        if (!src->handle || (src->data && src->hostCopyObsolete() < src->deviceCopyObsolete()))
        {
            upload(dst, src->data + srcrawofs, dims, sz, dstofs, dststep, srcstep);
            return;
        }
        if (!dst->handle || (dst->data && dst->hostCopyObsolete() < dst->deviceCopyObsolete()))
        {
            download(src, dst->data + dstrawofs, dims, sz, srcofs, srcstep, dststep);
            dst->markHostCopyObsolete(false);
#ifdef HAVE_OPENCL_SVM
            if ((dst->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == svm::OPENCL_SVM_FINE_GRAIN_BUFFER ||
                    (dst->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == svm::OPENCL_SVM_FINE_GRAIN_SYSTEM)
            {
                // nothing
            }
            else
#endif
            {
                dst->markDeviceCopyObsolete(true);
            }
            return;
        }

        // there should be no user-visible CPU copies of the UMat which we are going to copy to
        CV_Assert(dst->refcount == 0);
        cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();
2015-01-02 08:33:40 +08:00
cl_int retval = CL_SUCCESS ;
# ifdef HAVE_OPENCL_SVM
if ( ( src - > allocatorFlags_ & svm : : OPENCL_SVM_BUFFER_MASK ) ! = 0 | |
( dst - > allocatorFlags_ & svm : : OPENCL_SVM_BUFFER_MASK ) ! = 0 )
2013-10-22 18:04:49 +08:00
{
2015-01-02 08:33:40 +08:00
if ( ( src - > allocatorFlags_ & svm : : OPENCL_SVM_BUFFER_MASK ) ! = 0 & &
( dst - > allocatorFlags_ & svm : : OPENCL_SVM_BUFFER_MASK ) ! = 0 )
{
Context & ctx = Context : : getDefault ( ) ;
const svm : : SVMFunctions * svmFns = svm : : getSVMFunctions ( ctx ) ;
CV_DbgAssert ( svmFns - > isValid ( ) ) ;
if ( iscontinuous )
{
CV_OPENCL_SVM_TRACE_P ( " clEnqueueSVMMemcpy: %p <-- %p (%d) \n " ,
( uchar * ) dst - > handle + dstrawofs , ( uchar * ) src - > handle + srcrawofs , ( int ) total ) ;
cl_int status = svmFns - > fn_clEnqueueSVMMemcpy ( q , CL_TRUE ,
( uchar * ) dst - > handle + dstrawofs , ( uchar * ) src - > handle + srcrawofs ,
total , 0 , NULL , NULL ) ;
2017-11-01 23:18:54 +08:00
CV_OCL_CHECK_RESULT ( status , " clEnqueueSVMMemcpy() " ) ;
2015-01-02 08:33:40 +08:00
                }
                else
                {
                    clFinish(q);
                    // This code is from MatAllocator::download()/upload()
                    int isz[CV_MAX_DIM];
                    uchar* srcptr = (uchar*)src->handle;
                    for (int i = 0; i < dims; i++)
                    {
                        CV_Assert(sz[i] <= (size_t)INT_MAX);
                        if (sz[i] == 0)
                            return;
                        if (srcofs)
                            srcptr += srcofs[i] * (i <= dims - 2 ? srcstep[i] : 1);
                        isz[i] = (int)sz[i];
                    }
                    Mat m_src(dims, isz, CV_8U, srcptr, srcstep);

                    uchar* dstptr = (uchar*)dst->handle;
                    for (int i = 0; i < dims; i++)
                    {
                        if (dstofs)
                            dstptr += dstofs[i] * (i <= dims - 2 ? dststep[i] : 1);
                    }
                    Mat m_dst(dims, isz, CV_8U, dstptr, dststep);

                    const Mat* arrays[] = { &m_src, &m_dst };
                    uchar* ptrs[2];
                    NAryMatIterator it(arrays, ptrs, 2);
                    size_t j, planesz = it.size;

                    for (j = 0; j < it.nplanes; j++, ++it)
                        memcpy(ptrs[1], ptrs[0], planesz);
                }
            }
            else
            {
                if ((src->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) != 0)
                {
                    map(src, ACCESS_READ);
                    upload(dst, src->data + srcrawofs, dims, sz, dstofs, dststep, srcstep);
                    unmap(src);
                }
                else
                {
                    map(dst, ACCESS_WRITE);
                    download(src, dst->data + dstrawofs, dims, sz, srcofs, srcstep, dststep);
                    unmap(dst);
                }
            }
        }
        else
#endif
        {
            if (iscontinuous)
            {
                retval = clEnqueueCopyBuffer(q, (cl_mem)src->handle, (cl_mem)dst->handle,
                                             srcrawofs, dstrawofs, total, 0, 0, 0);
                CV_OCL_CHECK_RESULT(retval, cv::format("clEnqueueCopyBuffer(q, src=%p, dst=%p, src_offset=%lld, dst_offset=%lld, sz=%lld, 0, 0, 0)",
                        (void*)src->handle, (void*)dst->handle, (long long int)srcrawofs, (long long int)dstrawofs, (long long int)total).c_str());
            }
            else if (CV_OPENCL_DISABLE_BUFFER_RECT_OPERATIONS)
            {
                const size_t padding = CV_OPENCL_DATA_PTR_ALIGNMENT;
                size_t new_srcrawofs = srcrawofs & ~(padding - 1);
                size_t srcmembuf_ofs = srcrawofs - new_srcrawofs;
                size_t new_dstrawofs = dstrawofs & ~(padding - 1);
                size_t dstmembuf_ofs = dstrawofs - new_dstrawofs;

                AlignedDataPtr2D<false, false> srcBuf(0, new_sz[1], new_srcstep[0], new_srcstep[0],
                                                      CV_OPENCL_DATA_PTR_ALIGNMENT, padding * 2);
                AlignedDataPtr2D<false, false> dstBuf(0, new_sz[1], new_dststep[0], new_dststep[0],
                                                      CV_OPENCL_DATA_PTR_ALIGNMENT, padding * 2);
                uchar* srcptr = srcBuf.getAlignedPtr();
                uchar* dstptr = dstBuf.getAlignedPtr();

                CV_Assert(new_dststep[0] >= new_sz[0] && new_srcstep[0] >= new_sz[0]);

                size_t src_total = alignSize(new_srcstep[0] * new_sz[1] + srcmembuf_ofs, padding);
                src_total = std::min(src_total, src->size - new_srcrawofs);
                size_t dst_total = alignSize(new_dststep[0] * new_sz[1] + dstmembuf_ofs, padding);
                dst_total = std::min(dst_total, dst->size - new_dstrawofs);

                CV_OCL_CHECK(clEnqueueReadBuffer(q, (cl_mem)src->handle, CL_TRUE,
                                                 new_srcrawofs, src_total, srcptr, 0, 0, 0));
                CV_OCL_CHECK(clEnqueueReadBuffer(q, (cl_mem)dst->handle, CL_TRUE,
                                                 new_dstrawofs, dst_total, dstptr, 0, 0, 0));

                for (size_t i = 0; i < new_sz[1]; i++)
                    memcpy(dstptr + dstmembuf_ofs + i * new_dststep[0],
                           srcptr + srcmembuf_ofs + i * new_srcstep[0], new_sz[0]);

                CV_OCL_CHECK(clEnqueueWriteBuffer(q, (cl_mem)dst->handle, CL_TRUE,
                                                  new_dstrawofs, dst_total, dstptr, 0, 0, 0));
            }
            else
            {
                CV_OCL_CHECK(retval = clEnqueueCopyBufferRect(q, (cl_mem)src->handle, (cl_mem)dst->handle,
                                                              new_srcofs, new_dstofs, new_sz,
                                                              new_srcstep[0], 0,
                                                              new_dststep[0], 0,
                                                              0, 0, 0));
            }
        }
        if (retval == CL_SUCCESS)
        {
            CV_IMPL_ADD(CV_IMPL_OCL)
        }

#ifdef HAVE_OPENCL_SVM
        if ((dst->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == svm::OPENCL_SVM_FINE_GRAIN_BUFFER ||
                (dst->allocatorFlags_ & svm::OPENCL_SVM_BUFFER_MASK) == svm::OPENCL_SVM_FINE_GRAIN_SYSTEM)
        {
            // nothing
        }
        else
#endif
        {
            dst->markHostCopyObsolete(true);
        }
        dst->markDeviceCopyObsolete(false);

        if (_sync)
        {
            CV_OCL_DBG_CHECK(clFinish(q));
        }
    }

    BufferPoolController* getBufferPoolController(const char* id) const CV_OVERRIDE
    {
        ocl::Context ctx = Context::getDefault();
        if (ctx.empty())
            return NULL;
#ifdef HAVE_OPENCL_SVM
        if ((svm::checkForceSVMUmatUsage() && (id == NULL || strcmp(id, "OCL") == 0)) || (id != NULL && strcmp(id, "SVM") == 0))
        {
            return &ctx.getImpl()->getBufferPoolSVM();
        }
#endif
        if (id != NULL && strcmp(id, "HOST_ALLOC") == 0)
        {
            return &ctx.getImpl()->getBufferPoolHostPtr();
        }
        if (id != NULL && strcmp(id, "OCL") != 0)
        {
            CV_Error(cv::Error::StsBadArg, "getBufferPoolController(): unknown BufferPool ID\n");
        }
        return &ctx.getImpl()->getBufferPool();
    }

    MatAllocator* matStdAllocator;

    mutable cv::Mutex cleanupQueueMutex;
    mutable std::deque<UMatData*> cleanupQueue;

    void flushCleanupQueue() const
    {
        if (!cleanupQueue.empty())
        {
            std::deque<UMatData*> q;
            {
                cv::AutoLock lock(cleanupQueueMutex);
                q.swap(cleanupQueue);
            }
            for (std::deque<UMatData*>::const_iterator i = q.begin(); i != q.end(); ++i)
            {
                deallocate_(*i);
            }
        }
    }
    void addToCleanupQueue(UMatData* u) const
    {
        //TODO: Validation check: CV_Assert(!u->tempUMat());
        {
            cv::AutoLock lock(cleanupQueueMutex);
            cleanupQueue.push_back(u);
        }
    }
};

static OpenCLAllocator* getOpenCLAllocator_() // call once guarantee
{
    static OpenCLAllocator* g_allocator = new OpenCLAllocator(); // intentionally leaked to avoid a destructor call (this object is used too widely)
    return g_allocator;
}

MatAllocator* getOpenCLAllocator()
{
    CV_SINGLETON_LAZY_INIT(MatAllocator, getOpenCLAllocator_())
}

}} // namespace cv::ocl


namespace cv {

// the two functions below are implemented in umatrix.cpp
void setSize(UMat& m, int _dims, const int* _sz, const size_t* _steps,
             bool autoSteps = false);
void finalizeHdr(UMat& m);

} // namespace cv


namespace cv { namespace ocl {

/*
// Convert OpenCL buffer memory to UMat
*/
void convertFromBuffer(void* cl_mem_buffer, size_t step, int rows, int cols, int type, UMat& dst)
{
    int d = 2;
    int sizes[] = { rows, cols };

    CV_Assert(0 <= d && d <= CV_MAX_DIM);

    dst.release();

    dst.flags = (type & Mat::TYPE_MASK) | Mat::MAGIC_VAL;
    dst.usageFlags = USAGE_DEFAULT;

    setSize(dst, d, sizes, 0, true);
    dst.offset = 0;

    cl_mem memobj = (cl_mem)cl_mem_buffer;
    cl_mem_object_type mem_type = 0;

    CV_OCL_CHECK(clGetMemObjectInfo(memobj, CL_MEM_TYPE, sizeof(cl_mem_object_type), &mem_type, 0));

    CV_Assert(CL_MEM_OBJECT_BUFFER == mem_type);

    size_t total = 0;
    CV_OCL_CHECK(clGetMemObjectInfo(memobj, CL_MEM_SIZE, sizeof(size_t), &total, 0));

    CV_OCL_CHECK(clRetainMemObject(memobj));

    CV_Assert((int)step >= cols * CV_ELEM_SIZE(type));
    CV_Assert(total >= rows * step);

    // attach clBuffer to UMatData
    dst.u = new UMatData(getOpenCLAllocator());
    dst.u->data = 0;
    dst.u->allocatorFlags_ = OpenCLAllocator::ALLOCATOR_FLAGS_EXTERNAL_BUFFER; // not allocated from any OpenCV buffer pool
    dst.u->flags = static_cast<UMatData::MemoryFlag>(0);
    dst.u->handle = cl_mem_buffer;
    dst.u->origdata = 0;
    dst.u->prevAllocator = 0;
    dst.u->size = total;

    finalizeHdr(dst);
    dst.addref();

    return;
} // convertFromBuffer()


/*
// Convert OpenCL image2d_t memory to UMat
*/
void convertFromImage(void* cl_mem_image, UMat& dst)
{
    cl_mem clImage = (cl_mem)cl_mem_image;
    cl_mem_object_type mem_type = 0;

    CV_OCL_CHECK(clGetMemObjectInfo(clImage, CL_MEM_TYPE, sizeof(cl_mem_object_type), &mem_type, 0));

    CV_Assert(CL_MEM_OBJECT_IMAGE2D == mem_type);

    cl_image_format fmt = { 0, 0 };
    CV_OCL_CHECK(clGetImageInfo(clImage, CL_IMAGE_FORMAT, sizeof(cl_image_format), &fmt, 0));

    int depth = CV_8U;
    switch (fmt.image_channel_data_type)
    {
    case CL_UNORM_INT8:
    case CL_UNSIGNED_INT8:
        depth = CV_8U;
        break;

    case CL_SNORM_INT8:
    case CL_SIGNED_INT8:
        depth = CV_8S;
        break;

    case CL_UNORM_INT16:
    case CL_UNSIGNED_INT16:
        depth = CV_16U;
        break;

    case CL_SNORM_INT16:
    case CL_SIGNED_INT16:
        depth = CV_16S;
        break;

    case CL_SIGNED_INT32:
        depth = CV_32S;
        break;

    case CL_FLOAT:
        depth = CV_32F;
        break;

    case CL_HALF_FLOAT:
        depth = CV_16F;
        break;

    default:
        CV_Error(cv::Error::OpenCLApiCallError, "Not supported image_channel_data_type");
    }

    int type = CV_8UC1;
    switch (fmt.image_channel_order)
    {
    case CL_R:
    case CL_A:
    case CL_INTENSITY:
    case CL_LUMINANCE:
        type = CV_MAKE_TYPE(depth, 1);
        break;

    case CL_RG:
    case CL_RA:
        type = CV_MAKE_TYPE(depth, 2);
        break;

    // CL_RGB has no mappings to OpenCV types because CL_RGB can only be used with
    // CL_UNORM_SHORT_565, CL_UNORM_SHORT_555, or CL_UNORM_INT_101010.
    /*case CL_RGB:
        type = CV_MAKE_TYPE(depth, 3);
        break;*/

    case CL_RGBA:
    case CL_BGRA:
    case CL_ARGB:
        type = CV_MAKE_TYPE(depth, 4);
        break;
    default:
        CV_Error(cv::Error::OpenCLApiCallError, "Not supported image_channel_order");
        break;
    }

    size_t step = 0;
    CV_OCL_CHECK(clGetImageInfo(clImage, CL_IMAGE_ROW_PITCH, sizeof(size_t), &step, 0));

    size_t w = 0;
    CV_OCL_CHECK(clGetImageInfo(clImage, CL_IMAGE_WIDTH, sizeof(size_t), &w, 0));

    size_t h = 0;
    CV_OCL_CHECK(clGetImageInfo(clImage, CL_IMAGE_HEIGHT, sizeof(size_t), &h, 0));

    dst.create((int)h, (int)w, type);

    cl_mem clBuffer = (cl_mem)dst.handle(ACCESS_READ);

    cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr();

    size_t offset = 0;
    size_t src_origin[3] = { 0, 0, 0 };
    size_t region[3] = { w, h, 1 };
    CV_OCL_CHECK(clEnqueueCopyImageToBuffer(q, clImage, clBuffer, src_origin, region, offset, 0, NULL, NULL));

    CV_OCL_CHECK(clFinish(q));

    return;
} // convertFromImage()
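
/*
Sketch (not OpenCV code) of the size arithmetic behind the copy above. Per the OpenCL specification, a copy from a 2D image with region {w, h, 1} writes w * h * (bytes per image element) bytes into the buffer, with rows packed back to back. CL_IMAGE_ROW_PITCH, which is queried into step, therefore does not affect the count. The helper names are hypothetical. The sketch assumes a non-packed channel format, in which an element's size is the number of channels times the bytes per channel.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes written by a full 2D image copy with
// src_origin = {0, 0, 0} and region = {w, h, 1}. Rows are written back to
// back, so the source image's row pitch plays no part.
inline std::size_t fullImageCopyBytes(std::size_t w, std::size_t h,
                                      std::size_t channels, std::size_t bytesPerChannel)
{
    const std::size_t bytesPerElement = channels * bytesPerChannel;
    return w * h * bytesPerElement;
}

// Hypothetical check: the destination buffer must hold that many bytes
// starting at `offset` (convertFromImage() uses offset 0).
inline bool bufferLargeEnough(std::size_t bufferBytes, std::size_t offset,
                              std::size_t w, std::size_t h,
                              std::size_t channels, std::size_t bytesPerChannel)
{
    return offset <= bufferBytes &&
           bufferBytes - offset >= fullImageCopyBytes(w, h, channels, bytesPerChannel);
}
```

Because dst is created as h x w with the matching type, a continuous UMat is exactly fullImageCopyBytes() long. That is why the copy can use offset 0 and ignore the image's row pitch.
*/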

///////////////////////////////////////////// Utility functions /////////////////////////////////////////////////

static void getDevices(std::vector<cl_device_id>& devices, cl_platform_id platform)
{
    cl_uint numDevices = 0;
    cl_int status = clGetDeviceIDs(platform, (cl_device_type)Device::TYPE_ALL, 0, NULL, &numDevices);
    if (status != CL_DEVICE_NOT_FOUND) // Not an error if platform has no devices
    {
        CV_OCL_DBG_CHECK_RESULT(status,
            cv::format("clGetDeviceIDs(platform, Device::TYPE_ALL, num_entries=0, devices=NULL, numDevices=%p)", &numDevices).c_str());
    }

    if (numDevices == 0)
    {
        devices.clear();
        return;
    }

    devices.resize((size_t)numDevices);
    CV_OCL_DBG_CHECK(clGetDeviceIDs(platform, (cl_device_type)Device::TYPE_ALL, numDevices, &devices[0], &numDevices));
}
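
/*
getDevices() uses the usual OpenCL two-call enumeration, and getPlatforms() uses the same pattern. The first call asks only for the count, passing a NULL output array. The code then sizes the vector and calls again to fill it. The sketch below separates that pattern from OpenCL. kSuccess, kNotFound, enumerateAll() and the two example query functions are hypothetical stand-ins for CL_SUCCESS, CL_DEVICE_NOT_FOUND and clGetDeviceIDs().

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for CL_SUCCESS and CL_DEVICE_NOT_FOUND.
constexpr int kSuccess = 0;
constexpr int kNotFound = -1;

// Two-call enumeration: ask for the count first, then size the vector and
// ask again for the handles. "Not found" on the first call means "no items",
// not a failure, as in getDevices().
template <typename T, typename Query>
std::vector<T> enumerateAll(Query query)
{
    std::vector<T> items;
    unsigned count = 0;
    int status = query(0u, static_cast<T*>(nullptr), &count);
    if ((status != kSuccess && status != kNotFound) || count == 0)
        return items;  // real code would report unexpected errors here
    items.resize(count);
    if (query(count, items.data(), &count) != kSuccess)
        items.clear();
    return items;
}

// Example queries with the same shape as clGetDeviceIDs(capacity, out, count).
inline int noDevices(unsigned, int*, unsigned*) { return kNotFound; }

inline int threeDevices(unsigned capacity, int* out, unsigned* count)
{
    *count = 3;
    for (unsigned i = 0; out != nullptr && i < capacity && i < 3; ++i)
        out[i] = 10 + static_cast<int>(i);
    return kSuccess;
}
```

Initializing count to 0 matters: when the first call reports "not found", the sketch relies on that initial value rather than on anything the query wrote.
*/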

struct PlatformInfo::Impl
{
    Impl(void* id)
    {
        refcount = 1;
        handle = *(cl_platform_id*)id;
        getDevices(devices, handle);

        version_ = getStrProp(CL_PLATFORM_VERSION);
        parseOpenCLVersion(version_, versionMajor_, versionMinor_);
    }

    String getStrProp(cl_platform_info prop) const
    {
        char buf[1024];
        size_t sz = 0;
        return clGetPlatformInfo(handle, prop, sizeof(buf) - 16, buf, &sz) == CL_SUCCESS &&
            sz < sizeof(buf) ? String(buf) : String();
    }

    IMPLEMENT_REFCOUNTABLE();

    std::vector<cl_device_id> devices;
    cl_platform_id handle;

    String version_;
    int versionMajor_;
    int versionMinor_;
};

PlatformInfo::PlatformInfo() CV_NOEXCEPT
{
    p = 0;
}

PlatformInfo::PlatformInfo(void* platform_id)
{
    p = new Impl(platform_id);
}

PlatformInfo::~PlatformInfo()
{
    if (p)
        p->release();
}

PlatformInfo::PlatformInfo(const PlatformInfo& i)
{
    if (i.p)
        i.p->addref();
    p = i.p;
}

PlatformInfo& PlatformInfo::operator=(const PlatformInfo& i)
{
    if (i.p != p)
    {
        if (i.p)
            i.p->addref();
        if (p)
            p->release();
        p = i.p;
    }
    return *this;
}

PlatformInfo::PlatformInfo(PlatformInfo&& i) CV_NOEXCEPT
{
    p = i.p;
    i.p = nullptr;
}

PlatformInfo& PlatformInfo::operator=(PlatformInfo&& i) CV_NOEXCEPT
{
    if (this != &i) {
        if (p)
            p->release();
        p = i.p;
        i.p = nullptr;
    }
    return *this;
}
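
/*
PlatformInfo is a handle around an intrusively reference-counted Impl. Copying adds a reference, the destructor releases one, and moving transfers the pointer without touching the count. The copy assignment takes the new reference before dropping the old one and skips the work when both handles already share an Impl. The self-contained sketch below reproduces these rules with hypothetical RefImpl/RefHandle types and a live-object counter, so the bookkeeping can be checked. It does not use OpenCV's IMPLEMENT_REFCOUNTABLE() macro, whose expansion is not shown here.

```cpp
#include <cassert>
#include <utility>

// Hypothetical ref-counted payload; `live` counts objects not yet deleted.
struct RefImpl
{
    static int live;
    int refcount = 1;  // the creating handle owns the first reference
    RefImpl() { ++live; }
    ~RefImpl() { --live; }
    void addref() { ++refcount; }
    void release() { if (--refcount == 0) delete this; }
};
int RefImpl::live = 0;

class RefHandle
{
public:
    RefHandle() noexcept : p(nullptr) {}
    static RefHandle create() { RefHandle h; h.p = new RefImpl; return h; }
    ~RefHandle() { if (p) p->release(); }

    RefHandle(const RefHandle& o) : p(o.p) { if (p) p->addref(); }
    RefHandle& operator=(const RefHandle& o)
    {
        if (o.p != p)  // also skips self-assignment
        {
            if (o.p) o.p->addref();  // take the new reference first
            if (p) p->release();     // then drop the old one
            p = o.p;
        }
        return *this;
    }

    RefHandle(RefHandle&& o) noexcept : p(o.p) { o.p = nullptr; }
    RefHandle& operator=(RefHandle&& o) noexcept
    {
        if (this != &o)
        {
            if (p) p->release();
            p = o.p;
            o.p = nullptr;
        }
        return *this;
    }

    int refs() const { return p ? p->refcount : 0; }

private:
    RefImpl* p;
};
```

Taking the new reference before releasing the old one keeps the shared object alive even if releasing the old reference would otherwise have deleted it.
*/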

int PlatformInfo::deviceNumber() const
{
    return p ? (int)p->devices.size() : 0;
}

void PlatformInfo::getDevice(Device& device, int d) const
{
    CV_Assert(p && d < (int)p->devices.size());
    if (p)
        device.set(p->devices[d]);
}

String PlatformInfo::name() const
{
    return p ? p->getStrProp(CL_PLATFORM_NAME) : String();
}

String PlatformInfo::vendor() const
{
    return p ? p->getStrProp(CL_PLATFORM_VENDOR) : String();
}

String PlatformInfo::version() const
{
    return p ? p->version_ : String();
}

int PlatformInfo::versionMajor() const
{
    CV_Assert(p);
    return p->versionMajor_;
}

int PlatformInfo::versionMinor() const
{
    CV_Assert(p);
    return p->versionMinor_;
}
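
/*
The Impl constructor derives versionMajor_ and versionMinor_ from the CL_PLATFORM_VERSION string through parseOpenCLVersion(), whose body is not part of this excerpt. The OpenCL specification fixes that string's format as "OpenCL<space><major.minor><space><platform-specific information>". A minimal parser for that format is sketched below. parseClVersionSketch() is a hypothetical name, and the sketch makes no claim about how OpenCV's own helper handles malformed strings.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical parser (not OpenCV's parseOpenCLVersion): reads the
// "<major>.<minor>" that follows the literal "OpenCL " prefix.
// On failure both outputs are left at 0 and false is returned.
inline bool parseClVersionSketch(const std::string& version, int& major, int& minor)
{
    major = minor = 0;
    int ma = 0, mi = 0;
    if (std::sscanf(version.c_str(), "OpenCL %d.%d", &ma, &mi) != 2)
        return false;
    major = ma;
    minor = mi;
    return true;
}
```

Anything after the minor number, such as the vendor's platform-specific suffix, is ignored.
*/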

static void getPlatforms(std::vector<cl_platform_id>& platforms)
{
    cl_uint numPlatforms = 0;
    CV_OCL_DBG_CHECK(clGetPlatformIDs(0, NULL, &numPlatforms));

    if (numPlatforms == 0)
    {
        platforms.clear();
        return;
    }

    platforms.resize((size_t)numPlatforms);
    CV_OCL_DBG_CHECK(clGetPlatformIDs(numPlatforms, &platforms[0], &numPlatforms));
}

void getPlatfomsInfo(std::vector<PlatformInfo>& platformsInfo)
{
    std::vector<cl_platform_id> platforms;
    getPlatforms(platforms);

    for (size_t i = 0; i < platforms.size(); i++)
        platformsInfo.push_back(PlatformInfo((void*)&platforms[i]));
}

const char* typeToStr(int type)
{
    static const char* tab[] =
    {
        "uchar", "uchar2", "uchar3", "uchar4", 0, 0, 0, "uchar8", 0, 0, 0, 0, 0, 0, 0, "uchar16",
        "char", "char2", "char3", "char4", 0, 0, 0, "char8", 0, 0, 0, 0, 0, 0, 0, "char16",
        "ushort", "ushort2", "ushort3", "ushort4", 0, 0, 0, "ushort8", 0, 0, 0, 0, 0, 0, 0, "ushort16",
        "short", "short2", "short3", "short4", 0, 0, 0, "short8", 0, 0, 0, 0, 0, 0, 0, "short16",
        "int", "int2", "int3", "int4", 0, 0, 0, "int8", 0, 0, 0, 0, 0, 0, 0, "int16",
        "float", "float2", "float3", "float4", 0, 0, 0, "float8", 0, 0, 0, 0, 0, 0, 0, "float16",
        "double", "double2", "double3", "double4", 0, 0, 0, "double8", 0, 0, 0, 0, 0, 0, 0, "double16",
        "half", "half2", "half3", "half4", 0, 0, 0, "half8", 0, 0, 0, 0, 0, 0, 0, "half16",
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    };
    int cn = CV_MAT_CN(type), depth = CV_MAT_DEPTH(type);
    const char* result = cn > 16 ? nullptr : tab[depth*16 + cn-1];
    CV_Assert(result);
    return result;
}
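
/*
typeToStr() indexes its table with row = depth and column = cn - 1. The null entries mark channel counts that have no OpenCL C vector type, because OpenCL C only provides vector widths 2, 3, 4, 8 and 16. The sketch below derives the same names arithmetically instead of from a table. It assumes OpenCV's type encoding: the depth sits in the low 3 bits and the channel count minus one is stored above them, with depth codes 0..7 for 8U, 8S, 16U, 16S, 32S, 32F, 64F and 16F. The helper names are hypothetical. Unlike typeToStr(), the sketch returns an empty string instead of asserting.

```cpp
#include <cassert>
#include <string>

// Assumed to mirror OpenCV's CV_MAKETYPE / CV_MAT_DEPTH / CV_MAT_CN encoding.
constexpr int makeType(int depth, int cn) { return depth + ((cn - 1) << 3); }
constexpr int typeDepth(int type) { return type & 7; }
constexpr int typeChannels(int type) { return ((type >> 3) & 511) + 1; }

// Builds the OpenCL C scalar/vector type name for a (depth, cn) type.
inline std::string oclTypeName(int type)
{
    static const char* const base[] =
        { "uchar", "char", "ushort", "short", "int", "float", "double", "half" };
    const int depth = typeDepth(type), cn = typeChannels(type);
    switch (cn)
    {
    case 1:
        return base[depth];
    case 2: case 3: case 4: case 8: case 16:
        return base[depth] + std::to_string(cn);
    default:
        return std::string();  // no OpenCL C vector of this width
    }
}
```

The table in typeToStr() avoids string building and returns pointers to static storage, which suits callers that paste the result into kernel build options.
*/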

const char* memopTypeToStr(int type)
{
    static const char* tab[] =
    {
        "uchar", "uchar2", "uchar3", "uchar4", 0, 0, 0, "uchar8", 0, 0, 0, 0, 0, 0, 0, "uchar16",
        "char", "char2", "char3", "char4", 0, 0, 0, "char8", 0, 0, 0, 0, 0, 0, 0, "char16",
        "ushort", "ushort2", "ushort3", "ushort4", 0, 0, 0, "ushort8", 0, 0, 0, 0, 0, 0, 0, "ushort16",
        "short", "short2", "short3", "short4", 0, 0, 0, "short8", 0, 0, 0, 0, 0, 0, 0, "short16",
        "int", "int2", "int3", "int4", 0, 0, 0, "int8", 0, 0, 0, 0, 0, 0, 0, "int16",
        "int", "int2", "int3", "int4", 0, 0, 0, "int8", 0, 0, 0, 0, 0, 0, 0, "int16",
        "ulong", "ulong2", "ulong3", "ulong4", 0, 0, 0, "ulong8", 0, 0, 0, 0, 0, 0, 0, "ulong16",
        "short", "short2", "short3", "short4", 0, 0, 0, "short8", 0, 0, 0, 0, 0, 0, 0, "short16",
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    };
    int cn = CV_MAT_CN(type), depth = CV_MAT_DEPTH(type);
    const char* result = cn > 16 ? nullptr : tab[depth*16 + cn-1];
    CV_Assert(result);
    return result;
}

const char* vecopTypeToStr(int type)
{
    static const char* tab[] =
    {
        "uchar", "short", "uchar3", "int", 0, 0, 0, "int2", 0, 0, 0, 0, 0, 0, 0, "int4",
        "char", "short", "char3", "int", 0, 0, 0, "int2", 0, 0, 0, 0, 0, 0, 0, "int4",
        "ushort", "int", "ushort3", "int2", 0, 0, 0, "int4", 0, 0, 0, 0, 0, 0, 0, "int8",
        "short", "int", "short3", "int2", 0, 0, 0, "int4", 0, 0, 0, 0, 0, 0, 0, "int8",
        "int", "int2", "int3", "int4", 0, 0, 0, "int8", 0, 0, 0, 0, 0, 0, 0, "int16",
        "int", "int2", "int3", "int4", 0, 0, 0, "int8", 0, 0, 0, 0, 0, 0, 0, "int16",
        "ulong", "ulong2", "ulong3", "ulong4", 0, 0, 0, "ulong8", 0, 0, 0, 0, 0, 0, 0, "ulong16",
        "short", "short2", "short3", "short4", 0, 0, 0, "short8", 0, 0, 0, 0, 0, 0, 0, "short16",
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    };
    int cn = CV_MAT_CN(type), depth = CV_MAT_DEPTH(type);
    const char* result = cn > 16 ? 0 : tab[depth*16 + cn-1];
    CV_Assert(result);
    return result;
}
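
/*
In the 8- and 16-bit rows of vecopTypeToStr(), a multi-channel vector is replaced by an integer type of the same total size. For example, uchar4 becomes int and ushort8 becomes int4. Single-channel and 3-channel types keep their own names. The hypothetical helper below re-derives only those four rows from the element size. It illustrates the table and is not a replacement for it. It returns an empty string for cases it does not cover.

```cpp
#include <cassert>
#include <string>

// Hypothetical re-derivation of the 8-/16-bit rows of vecopTypeToStr():
// pick the integer type whose size equals elemBytes * cn.
inline std::string vecopNameSketch(const std::string& elem, int elemBytes, int cn)
{
    if (cn == 1)
        return elem;
    if (cn == 3)
        return elem + "3";  // no same-size integer type for 3 elements
    switch (elemBytes * cn)
    {
    case 2:  return "short";
    case 4:  return "int";
    case 8:  return "int2";
    case 16: return "int4";
    case 32: return "int8";
    default: return std::string();  // outside the rows this sketch covers
    }
}
```

Keying on total byte size explains why, for example, uchar2 and char2 both map to short.
*/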

// Deprecated because the size of the buffer pointed to by buf cannot be known.
const char* convertTypeStr(int sdepth, int ddepth, int cn, char* buf)
{
    // Since the size of buf is not given, we assume 50 because that's what all callers use.
    constexpr size_t buf_max = 50;

    return convertTypeStr(sdepth, ddepth, cn, buf, buf_max);
}

const char* convertTypeStr(int sdepth, int ddepth, int cn, char* buf, size_t buf_size)
{
    if (sdepth == ddepth)
        return "noconvert";
    const char* typestr = typeToStr(CV_MAKETYPE(ddepth, cn));
    if (ddepth >= CV_32F ||
        (ddepth == CV_32S && sdepth < CV_32S) ||
        (ddepth == CV_16S && sdepth <= CV_8S) ||
        (ddepth == CV_16U && sdepth == CV_8U))
    {
        snprintf(buf, buf_size, "convert_%s", typestr);
    }
    else if (sdepth >= CV_32F)
        snprintf(buf, buf_size, "convert_%s%s_rte", typestr, (ddepth < CV_32S ? "_sat" : ""));
    else
        snprintf(buf, buf_size, "convert_%s_sat", typestr);

    return buf;
}
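
/*
convertTypeStr() chooses among three OpenCL C conversion forms:

- a plain convert_T when the destination is floating point or the conversion is a lossless integer widening;
- convert_T_sat_rte (for destinations narrower than 32 bits) or convert_T_rte (otherwise) when a floating-point source goes to an integer type;
- convert_T_sat for the remaining integer conversions.

The sketch below restates that decision as a standalone function so the rules can be checked. The Depth enum assumes OpenCV's depth ordering. The destination type name is passed in rather than computed with typeToStr(), and convertFnSketch() is a hypothetical name.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Depth codes assumed to follow OpenCV's ordering (CV_8U .. CV_16F).
enum Depth { D8U = 0, D8S, D16U, D16S, D32S, D32F, D64F, D16F };

inline std::string convertFnSketch(int sdepth, int ddepth, const char* dstType)
{
    if (sdepth == ddepth)
        return "noconvert";
    char buf[50];
    if (ddepth >= D32F ||
        (ddepth == D32S && sdepth < D32S) ||
        (ddepth == D16S && sdepth <= D8S) ||
        (ddepth == D16U && sdepth == D8U))
        // floating-point destination, or integer widening that cannot overflow
        std::snprintf(buf, sizeof(buf), "convert_%s", dstType);
    else if (sdepth >= D32F)
        // floating-point source, integer destination: round to nearest even,
        // saturating when the destination is narrower than 32 bits
        std::snprintf(buf, sizeof(buf), "convert_%s%s_rte", dstType,
                      ddepth < D32S ? "_sat" : "");
    else
        // integer narrowing or signedness change: saturate
        std::snprintf(buf, sizeof(buf), "convert_%s_sat", dstType);
    return buf;
}
```

For example, 8S to 16U is not treated as a widening, because negative inputs must clamp to 0, so it falls through to convert_ushort_sat.
*/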
Merge pull request #9114 from pengli:dnn_rebase
add libdnn acceleration to dnn module (#9114)
* import libdnn code
Signed-off-by: Li Peng <peng.li@intel.com>
* add convolution layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add pooling layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add softmax layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add lrn layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add innerproduct layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add HAVE_OPENCL macro
Signed-off-by: Li Peng <peng.li@intel.com>
* fix for convolution ocl
Signed-off-by: Li Peng <peng.li@intel.com>
* enable getUMat() for multi-dimension Mat
Signed-off-by: Li Peng <peng.li@intel.com>
* use getUMat for ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* use CV_OCL_RUN macro
Signed-off-by: Li Peng <peng.li@intel.com>
* set OPENCL target when it is available
and disable fuseLayer for OCL target for the time being
Signed-off-by: Li Peng <peng.li@intel.com>
* fix innerproduct accuracy test
Signed-off-by: Li Peng <peng.li@intel.com>
* remove trailing space
Signed-off-by: Li Peng <peng.li@intel.com>
* Fixed tensorflow demo bug.
Root cause is that tensorflow has different algorithm with libdnn
to calculate convolution output dimension.
libdnn don't calculate output dimension anymore and just use one
passed in by config.
* split gemm ocl file
split it into gemm_buffer.cl and gemm_image.cl
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix compile failure
Signed-off-by: Li Peng <peng.li@intel.com>
* check env flag for auto tuning
Signed-off-by: Li Peng <peng.li@intel.com>
* switch to new ocl kernels for softmax layer
Signed-off-by: Li Peng <peng.li@intel.com>
* update softmax layer
on some platform subgroup extension may not work well,
fallback to non subgroup ocl acceleration.
Signed-off-by: Li Peng <peng.li@intel.com>
* fallback to cpu path for fc layer with multi output
Signed-off-by: Li Peng <peng.li@intel.com>
* update output message
Signed-off-by: Li Peng <peng.li@intel.com>
* update fully connected layer
fallback to gemm API if libdnn return false
Signed-off-by: Li Peng <peng.li@intel.com>
* Add ReLU OCL implementation
* disable layer fusion for now
Signed-off-by: Li Peng <peng.li@intel.com>
* Add OCL implementation for concat layer
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
* libdnn: update license and copyrights
Also refine libdnn coding style
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
* DNN: Don't link OpenCL library explicitly
* DNN: Make default preferableTarget to DNN_TARGET_CPU
User should set it to DNN_TARGET_OPENCL explicitly if want to
use OpenCL acceleration.
Also don't fusion when using DNN_TARGET_OPENCL
* DNN: refine coding style
* Add getOpenCLErrorString
* DNN: Use int32_t/uint32_t instread of alias
* Use namespace ocl4dnn to include libdnn things
* remove extra copyTo in softmax ocl path
Signed-off-by: Li Peng <peng.li@intel.com>
* update ReLU layer ocl path
Signed-off-by: Li Peng <peng.li@intel.com>
* Add prefer target property for layer class
It is used to indicate the target for layer forwarding,
either the default CPU target or OCL target.
Signed-off-by: Li Peng <peng.li@intel.com>
* Add cl_event based timer for cv::ocl
* Rename libdnn to ocl4dnn
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>
* use UMat for ocl4dnn internal buffer
Remove allocateMemory which use clCreateBuffer directly
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>
* enable buffer gemm in ocl4dnn innerproduct
Signed-off-by: Li Peng <peng.li@intel.com>
* replace int_tp globally for ocl4dnn kernels.
Signed-off-by: wzw <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
* create UMat for layer params
Signed-off-by: Li Peng <peng.li@intel.com>
* update sign ocl kernel
Signed-off-by: Li Peng <peng.li@intel.com>
* update image based gemm of inner product layer
Signed-off-by: Li Peng <peng.li@intel.com>
* remove buffer gemm of inner product layer
call cv::gemm API instead
Signed-off-by: Li Peng <peng.li@intel.com>
* change ocl4dnn forward parameter to UMat
Signed-off-by: Li Peng <peng.li@intel.com>
* Refine auto-tuning mechanism.
- Use OPENCV_OCL4DNN_KERNEL_CONFIG_PATH to set cache directory
for fine-tuned kernel configuration.
e.g. export OPENCV_OCL4DNN_KERNEL_CONFIG_PATH=/home/tmp,
the cache directory will be /home/tmp/spatialkernels/ on Linux.
- Define environment OPENCV_OCL4DNN_ENABLE_AUTO_TUNING to enable
auto-tuning.
- OPENCV_OPENCL_ENABLE_PROFILING is only used to enable profiling
for OpenCL command queue. This fix basic kernel get wrong running
time, i.e. 0ms.
- If creating cache directory failed, disable auto-tuning.
* Detect and create cache dir on windows
Signed-off-by: Li Peng <peng.li@intel.com>
* Refine gemm like convolution kernel.
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix redundant swizzleWeights calling when use cached kernel config.
* Fix "out of resource" bug when auto-tuning too many kernels.
* replace cl_mem with UMat in ocl4dnnConvSpatial class
* OCL4DNN: reduce the tuning kernel candidate.
This patch could reduce 75% of the tuning candidates with less
than 2% performance impact for the final result.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* replace cl_mem with umat in ocl4dnn convolution
Signed-off-by: Li Peng <peng.li@intel.com>
* remove weight_image_ of ocl4dnn inner product
Actually it is unused in the computation
Signed-off-by: Li Peng <peng.li@intel.com>
* Various fixes for ocl4dnn
1. OCL_PERFORMANCE_CHECK(ocl::Device::getDefault().isIntel())
2. Ptr<OCL4DNNInnerProduct<float> > innerProductOp
3. Code comments cleanup
4. ignore check on OCL cpu device
Signed-off-by: Li Peng <peng.li@intel.com>
* add build option for log softmax
Signed-off-by: Li Peng <peng.li@intel.com>
* remove unused ocl kernels in ocl4dnn
Signed-off-by: Li Peng <peng.li@intel.com>
* replace ocl4dnnSet with opencv setTo
Signed-off-by: Li Peng <peng.li@intel.com>
* replace ALIGN with cv::alignSize
Signed-off-by: Li Peng <peng.li@intel.com>
* check kernel build options
Signed-off-by: Li Peng <peng.li@intel.com>
* Handle program compilation fail properly.
* Use std::numeric_limits<float>::infinity() for large float number
* check ocl4dnn kernel compilation result
Signed-off-by: Li Peng <peng.li@intel.com>
* remove unused ctx_id
Signed-off-by: Li Peng <peng.li@intel.com>
* change clEnqueueNDRangeKernel to kernel.run()
Signed-off-by: Li Peng <peng.li@intel.com>
* change cl_mem to UMat in image based gemm
Signed-off-by: Li Peng <peng.li@intel.com>
* check intel subgroup support for lrn and pooling layer
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix convolution bug if group is greater than 1
Signed-off-by: Li Peng <peng.li@intel.com>
* Set default layer preferableTarget to be DNN_TARGET_CPU
Signed-off-by: Li Peng <peng.li@intel.com>
* Add ocl perf test for convolution
Signed-off-by: Li Peng <peng.li@intel.com>
* Add more ocl accuracy test
Signed-off-by: Li Peng <peng.li@intel.com>
* replace cl_image with ocl::Image2D
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix build failure in elementwise layer
Signed-off-by: Li Peng <peng.li@intel.com>
* use getUMat() to get blob data
Signed-off-by: Li Peng <peng.li@intel.com>
* replace cl_mem handle with ocl::KernelArg
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(build): don't use C++11, OPENCL_LIBRARIES fix
* dnn(ocl4dnn): remove unused OpenCL kernels
* dnn(ocl4dnn): extract OpenCL code into .cl files
* dnn(ocl4dnn): refine auto-tuning
Disable auto-tuning by default; set the OPENCV_OCL4DNN_ENABLE_AUTO_TUNING
environment variable to enable it.
Use a set of pre-tuned configs as the default config if auto-tuning is disabled.
These configs are tuned for Intel GPUs with 48/72 EUs, and for GoogLeNet,
AlexNet, and ResNet-50.
If the default config is not suitable, use the first available kernel config
from the candidates. Candidate priority from high to low is: GEMM-like kernel,
IDLF kernel, basic kernel.
* dnn(ocl4dnn): pooling doesn't use OpenCL subgroups
* dnn(ocl4dnn): fix perf test
OpenCV has a default 3 s time limit for each performance test.
Warm up the OpenCL backend outside of the perf measurement loop.
* use ocl::KernelArg as much as possible
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): fix bias bug for gemm like kernel
* dnn(ocl4dnn): wrap cl_mem into UMat
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): Refine signature of kernel config
- Use a more readable string as the signature of the kernel config
- Don't count device name and vendor in signature string
- Default kernel configurations are tuned for Intel GPU with
24/48/72 EUs, and for googlenet, AlexNet, ResNet-50 net model.
* dnn(ocl4dnn): swap width/height in configuration
* dnn(ocl4dnn): enable configs for Intel OpenCL runtime only
* core: make configuration helper functions accessible from non-core modules
* dnn(ocl4dnn): update kernel auto-tuning behavior
Avoid unwanted creation of directories
* dnn(ocl4dnn): simplify kernel to work around OpenCL compiler crash
* dnn(ocl4dnn): remove redundant code
* dnn(ocl4dnn): Add a clearer message for SIMD size mismatch.
* dnn(ocl4dnn): add const to const argument
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): force the compiler to use a specific SIMD size for IDLF kernel
* dnn(ocl4dnn): drop unused tuneLocalSize()
* dnn(ocl4dnn): specify OpenCL queue for Timer and convolve() method
* dnn(ocl4dnn): sanitize file names used for cache
* dnn(perf): enable Network tests with OpenCL
* dnn(ocl4dnn/conv): drop computeGlobalSize()
* dnn(ocl4dnn/conv): drop unused fields
* dnn(ocl4dnn/conv): simplify ctor
* dnn(ocl4dnn/conv): refactor kernelConfig localSize=NULL
* dnn(ocl4dnn/conv): drop unsupported double / untested half types
* dnn(ocl4dnn/conv): drop unused variable
* dnn(ocl4dnn/conv): alignSize/divUp
* dnn(ocl4dnn/conv): use enum values
* dnn(ocl4dnn): drop unused innerproduct variable
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): add a generic function to check cl option support
* dnn(ocl4dnn): run softmax subgroup version kernel first
Signed-off-by: Li Peng <peng.li@intel.com>
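The auto-tuning notes above state that tuned kernel configs are cached under `$OPENCV_OCL4DNN_KERNEL_CONFIG_PATH/spatialkernels/` (e.g. `/home/tmp/spatialkernels/` on Linux), that auto-tuning is disabled when the cache directory cannot be used, and that cache file names are sanitized. The following is a minimal sketch of that path logic, not OCL4DNN's actual code: `kernelCacheDir` and `sanitizeForFileName` are hypothetical helpers, a POSIX `/` separator is assumed, and the "replace anything outside `[A-Za-z0-9_-]` with `_`" rule is an assumed sanitization scheme. It avoids C++11 features, in line with the "don't use C++11" note above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not the OCL4DNN API): builds "<base>/spatialkernels/".
// An empty result means "no cache location", which a caller could treat
// as "keep auto-tuning disabled".
std::string kernelCacheDir(const char* basePath)
{
    if (basePath == NULL || basePath[0] == '\0')
        return std::string();
    std::string dir(basePath);
    // Assumes a POSIX-style separator; Windows handling is not shown here.
    if (dir[dir.size() - 1] != '/')
        dir += '/';
    return dir + "spatialkernels/";
}

// Hypothetical sanitizer (assumed rule): keeps [A-Za-z0-9_-] and maps every
// other character to '_' so a kernel signature string is safe as a file name.
std::string sanitizeForFileName(const std::string& name)
{
    std::string out(name);
    for (std::string::size_type i = 0; i < out.size(); ++i)
    {
        unsigned char c = static_cast<unsigned char>(out[i]);
        if (!std::isalnum(c) && c != '_' && c != '-')
            out[i] = '_';
    }
    return out;
}
```

A caller would typically pass `std::getenv("OPENCV_OCL4DNN_KERNEL_CONFIG_PATH")` as `basePath`, and fall back to the pre-tuned default configs if the returned string is empty or the directory cannot be created.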
2017-10-02 20:38:00 +08:00
const char* getOpenCLErrorString(int errorCode)
{
2018-03-01 18:52:43 +08:00
#define CV_OCL_CODE(id) case id: return #id
#define CV_OCL_CODE_(id, name) case id: return #name
2017-10-02 20:38:00 +08:00
    switch (errorCode)
    {
2018-03-01 18:52:43 +08:00
    CV_OCL_CODE(CL_SUCCESS);
    CV_OCL_CODE(CL_DEVICE_NOT_FOUND);
    CV_OCL_CODE(CL_DEVICE_NOT_AVAILABLE);
    CV_OCL_CODE(CL_COMPILER_NOT_AVAILABLE);
    CV_OCL_CODE(CL_MEM_OBJECT_ALLOCATION_FAILURE);
    CV_OCL_CODE(CL_OUT_OF_RESOURCES);
    CV_OCL_CODE(CL_OUT_OF_HOST_MEMORY);
    CV_OCL_CODE(CL_PROFILING_INFO_NOT_AVAILABLE);
    CV_OCL_CODE(CL_MEM_COPY_OVERLAP);
    CV_OCL_CODE(CL_IMAGE_FORMAT_MISMATCH);
    CV_OCL_CODE(CL_IMAGE_FORMAT_NOT_SUPPORTED);
    CV_OCL_CODE(CL_BUILD_PROGRAM_FAILURE);
    CV_OCL_CODE(CL_MAP_FAILURE);
    CV_OCL_CODE(CL_MISALIGNED_SUB_BUFFER_OFFSET);
    CV_OCL_CODE(CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST);
    CV_OCL_CODE(CL_COMPILE_PROGRAM_FAILURE);
    CV_OCL_CODE(CL_LINKER_NOT_AVAILABLE);
    CV_OCL_CODE(CL_LINK_PROGRAM_FAILURE);
    CV_OCL_CODE(CL_DEVICE_PARTITION_FAILED);
    CV_OCL_CODE(CL_KERNEL_ARG_INFO_NOT_AVAILABLE);
    CV_OCL_CODE(CL_INVALID_VALUE);
    CV_OCL_CODE(CL_INVALID_DEVICE_TYPE);
    CV_OCL_CODE(CL_INVALID_PLATFORM);
    CV_OCL_CODE(CL_INVALID_DEVICE);
    CV_OCL_CODE(CL_INVALID_CONTEXT);
    CV_OCL_CODE(CL_INVALID_QUEUE_PROPERTIES);
    CV_OCL_CODE(CL_INVALID_COMMAND_QUEUE);
    CV_OCL_CODE(CL_INVALID_HOST_PTR);
    CV_OCL_CODE(CL_INVALID_MEM_OBJECT);
    CV_OCL_CODE(CL_INVALID_IMAGE_FORMAT_DESCRIPTOR);
    CV_OCL_CODE(CL_INVALID_IMAGE_SIZE);
    CV_OCL_CODE(CL_INVALID_SAMPLER);
    CV_OCL_CODE(CL_INVALID_BINARY);
    CV_OCL_CODE(CL_INVALID_BUILD_OPTIONS);
    CV_OCL_CODE(CL_INVALID_PROGRAM);
    CV_OCL_CODE(CL_INVALID_PROGRAM_EXECUTABLE);
    CV_OCL_CODE(CL_INVALID_KERNEL_NAME);
    CV_OCL_CODE(CL_INVALID_KERNEL_DEFINITION);
    CV_OCL_CODE(CL_INVALID_KERNEL);
    CV_OCL_CODE(CL_INVALID_ARG_INDEX);
    CV_OCL_CODE(CL_INVALID_ARG_VALUE);
    CV_OCL_CODE(CL_INVALID_ARG_SIZE);
    CV_OCL_CODE(CL_INVALID_KERNEL_ARGS);
    CV_OCL_CODE(CL_INVALID_WORK_DIMENSION);
    CV_OCL_CODE(CL_INVALID_WORK_GROUP_SIZE);
    CV_OCL_CODE(CL_INVALID_WORK_ITEM_SIZE);
    CV_OCL_CODE(CL_INVALID_GLOBAL_OFFSET);
    CV_OCL_CODE(CL_INVALID_EVENT_WAIT_LIST);
    CV_OCL_CODE(CL_INVALID_EVENT);
    CV_OCL_CODE(CL_INVALID_OPERATION);
    CV_OCL_CODE(CL_INVALID_GL_OBJECT);
    CV_OCL_CODE(CL_INVALID_BUFFER_SIZE);
    CV_OCL_CODE(CL_INVALID_MIP_LEVEL);
    CV_OCL_CODE(CL_INVALID_GLOBAL_WORK_SIZE);
    // OpenCL 1.1
    CV_OCL_CODE(CL_INVALID_PROPERTY);
    // OpenCL 1.2
    CV_OCL_CODE(CL_INVALID_IMAGE_DESCRIPTOR);
    CV_OCL_CODE(CL_INVALID_COMPILER_OPTIONS);
    CV_OCL_CODE(CL_INVALID_LINKER_OPTIONS);
    CV_OCL_CODE(CL_INVALID_DEVICE_PARTITION_COUNT);
    // OpenCL 2.0
    CV_OCL_CODE_(-69, CL_INVALID_PIPE_SIZE);
    CV_OCL_CODE_(-70, CL_INVALID_DEVICE_QUEUE);
    // Extensions
    CV_OCL_CODE_(-1000, CL_INVALID_GL_SHAREGROUP_REFERENCE_KHR);
    CV_OCL_CODE_(-1001, CL_PLATFORM_NOT_FOUND_KHR);
    CV_OCL_CODE_(-1002, CL_INVALID_D3D10_DEVICE_KHR);
    CV_OCL_CODE_(-1003, CL_INVALID_D3D10_RESOURCE_KHR);
    CV_OCL_CODE_(-1004, CL_D3D10_RESOURCE_ALREADY_ACQUIRED_KHR);
    CV_OCL_CODE_(-1005, CL_D3D10_RESOURCE_NOT_ACQUIRED_KHR);
2017-10-02 20:38:00 +08:00
    default: return "Unknown OpenCL error";
    }
2018-03-01 18:52:43 +08:00
#undef CV_OCL_CODE
#undef CV_OCL_CODE_
Merge pull request #9114 from pengli:dnn_rebase
add libdnn acceleration to dnn module (#9114)
* import libdnn code
Signed-off-by: Li Peng <peng.li@intel.com>
* add convolution layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add pooling layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add softmax layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add lrn layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add innerproduct layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add HAVE_OPENCL macro
Signed-off-by: Li Peng <peng.li@intel.com>
* fix for convolution ocl
Signed-off-by: Li Peng <peng.li@intel.com>
* enable getUMat() for multi-dimension Mat
Signed-off-by: Li Peng <peng.li@intel.com>
* use getUMat for ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* use CV_OCL_RUN macro
Signed-off-by: Li Peng <peng.li@intel.com>
* set OPENCL target when it is available
and disable fuseLayer for OCL target for the time being
Signed-off-by: Li Peng <peng.li@intel.com>
* fix innerproduct accuracy test
Signed-off-by: Li Peng <peng.li@intel.com>
* remove trailing space
Signed-off-by: Li Peng <peng.li@intel.com>
* Fixed tensorflow demo bug.
Root cause is that tensorflow has different algorithm with libdnn
to calculate convolution output dimension.
libdnn don't calculate output dimension anymore and just use one
passed in by config.
* split gemm ocl file
split it into gemm_buffer.cl and gemm_image.cl
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix compile failure
Signed-off-by: Li Peng <peng.li@intel.com>
* check env flag for auto tuning
Signed-off-by: Li Peng <peng.li@intel.com>
* switch to new ocl kernels for softmax layer
Signed-off-by: Li Peng <peng.li@intel.com>
* update softmax layer
on some platform subgroup extension may not work well,
fallback to non subgroup ocl acceleration.
Signed-off-by: Li Peng <peng.li@intel.com>
* fallback to cpu path for fc layer with multi output
Signed-off-by: Li Peng <peng.li@intel.com>
* update output message
Signed-off-by: Li Peng <peng.li@intel.com>
* update fully connected layer
fallback to gemm API if libdnn return false
Signed-off-by: Li Peng <peng.li@intel.com>
* Add ReLU OCL implementation
* disable layer fusion for now
Signed-off-by: Li Peng <peng.li@intel.com>
* Add OCL implementation for concat layer
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
* libdnn: update license and copyrights
Also refine libdnn coding style
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
* DNN: Don't link OpenCL library explicitly
* DNN: Make default preferableTarget to DNN_TARGET_CPU
User should set it to DNN_TARGET_OPENCL explicitly if want to
use OpenCL acceleration.
Also don't fusion when using DNN_TARGET_OPENCL
* DNN: refine coding style
* Add getOpenCLErrorString
* DNN: Use int32_t/uint32_t instread of alias
* Use namespace ocl4dnn to include libdnn things
* remove extra copyTo in softmax ocl path
Signed-off-by: Li Peng <peng.li@intel.com>
* update ReLU layer ocl path
Signed-off-by: Li Peng <peng.li@intel.com>
* Add prefer target property for layer class
It is used to indicate the target for layer forwarding,
either the default CPU target or OCL target.
Signed-off-by: Li Peng <peng.li@intel.com>
* Add cl_event based timer for cv::ocl
* Rename libdnn to ocl4dnn
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>
* use UMat for ocl4dnn internal buffer
Remove allocateMemory which use clCreateBuffer directly
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>
* enable buffer gemm in ocl4dnn innerproduct
Signed-off-by: Li Peng <peng.li@intel.com>
* replace int_tp globally for ocl4dnn kernels.
Signed-off-by: wzw <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
* create UMat for layer params
Signed-off-by: Li Peng <peng.li@intel.com>
* update sign ocl kernel
Signed-off-by: Li Peng <peng.li@intel.com>
* update image based gemm of inner product layer
Signed-off-by: Li Peng <peng.li@intel.com>
* remove buffer gemm of inner product layer
call cv::gemm API instead
Signed-off-by: Li Peng <peng.li@intel.com>
* change ocl4dnn forward parameter to UMat
Signed-off-by: Li Peng <peng.li@intel.com>
* Refine auto-tuning mechanism.
- Use OPENCV_OCL4DNN_KERNEL_CONFIG_PATH to set cache directory
for fine-tuned kernel configuration.
e.g. export OPENCV_OCL4DNN_KERNEL_CONFIG_PATH=/home/tmp,
the cache directory will be /home/tmp/spatialkernels/ on Linux.
- Define the environment variable OPENCV_OCL4DNN_ENABLE_AUTO_TUNING to enable
auto-tuning.
- OPENCV_OPENCL_ENABLE_PROFILING is only used to enable profiling
for the OpenCL command queue. This fixes the basic kernel reporting a wrong
running time, i.e. 0 ms.
- If creating the cache directory fails, disable auto-tuning.
* Detect and create cache dir on windows
Signed-off-by: Li Peng <peng.li@intel.com>
* Refine gemm like convolution kernel.
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix redundant swizzleWeights calls when using a cached kernel config.
* Fix "out of resource" bug when auto-tuning too many kernels.
* replace cl_mem with UMat in ocl4dnnConvSpatial class
* OCL4DNN: reduce the tuning kernel candidate.
This patch reduces the number of tuning candidates by 75% with less
than 2% performance impact on the final result.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* replace cl_mem with umat in ocl4dnn convolution
Signed-off-by: Li Peng <peng.li@intel.com>
* remove weight_image_ of ocl4dnn inner product
Actually it is unused in the computation
Signed-off-by: Li Peng <peng.li@intel.com>
* Various fixes for ocl4dnn
1. OCL_PERFORMANCE_CHECK(ocl::Device::getDefault().isIntel())
2. Ptr<OCL4DNNInnerProduct<float> > innerProductOp
3. Code comments cleanup
4. ignore check on OCL cpu device
Signed-off-by: Li Peng <peng.li@intel.com>
* add build option for log softmax
Signed-off-by: Li Peng <peng.li@intel.com>
* remove unused ocl kernels in ocl4dnn
Signed-off-by: Li Peng <peng.li@intel.com>
* replace ocl4dnnSet with opencv setTo
Signed-off-by: Li Peng <peng.li@intel.com>
* replace ALIGN with cv::alignSize
Signed-off-by: Li Peng <peng.li@intel.com>
* check kernel build options
Signed-off-by: Li Peng <peng.li@intel.com>
* Handle program compilation fail properly.
* Use std::numeric_limits<float>::infinity() for large float number
* check ocl4dnn kernel compilation result
Signed-off-by: Li Peng <peng.li@intel.com>
* remove unused ctx_id
Signed-off-by: Li Peng <peng.li@intel.com>
* change clEnqueueNDRangeKernel to kernel.run()
Signed-off-by: Li Peng <peng.li@intel.com>
* change cl_mem to UMat in image based gemm
Signed-off-by: Li Peng <peng.li@intel.com>
* check intel subgroup support for lrn and pooling layer
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix convolution bug if group is greater than 1
Signed-off-by: Li Peng <peng.li@intel.com>
* Set default layer preferableTarget to be DNN_TARGET_CPU
Signed-off-by: Li Peng <peng.li@intel.com>
* Add ocl perf test for convolution
Signed-off-by: Li Peng <peng.li@intel.com>
* Add more ocl accuracy test
Signed-off-by: Li Peng <peng.li@intel.com>
* replace cl_image with ocl::Image2D
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix build failure in elementwise layer
Signed-off-by: Li Peng <peng.li@intel.com>
* use getUMat() to get blob data
Signed-off-by: Li Peng <peng.li@intel.com>
* replace cl_mem handle with ocl::KernelArg
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(build): don't use C++11, OPENCL_LIBRARIES fix
* dnn(ocl4dnn): remove unused OpenCL kernels
* dnn(ocl4dnn): extract OpenCL code into .cl files
* dnn(ocl4dnn): refine auto-tuning
Disable auto-tuning by default; set the OPENCV_OCL4DNN_ENABLE_AUTO_TUNING
environment variable to enable it.
Use a set of pre-tuned configs as the default config if auto-tuning is disabled.
These configs are tuned for Intel GPUs with 48/72 EUs, and for the googlenet,
AlexNet and ResNet-50 models.
If the default config is not suitable, use the first available kernel config
from the candidates. Candidate priority from high to low is: gemm-like kernel,
IDLF kernel, basic kernel.
* dnn(ocl4dnn): pooling doesn't use OpenCL subgroups
* dnn(ocl4dnn): fix perf test
OpenCV has a default 3 s time limit for each performance test.
Warm up the OpenCL backend outside of the perf measurement loop.
* use ocl::KernelArg as much as possible
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): fix bias bug for gemm like kernel
* dnn(ocl4dnn): wrap cl_mem into UMat
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): Refine signature of kernel config
- Use a more readable string as the signature of a kernel config
- Don't include device name and vendor in the signature string
- Default kernel configurations are tuned for Intel GPUs with
24/48/72 EUs, and for the googlenet, AlexNet and ResNet-50 models.
* dnn(ocl4dnn): swap width/height in configuration
* dnn(ocl4dnn): enable configs for Intel OpenCL runtime only
* core: make configuration helper functions accessible from non-core modules
* dnn(ocl4dnn): update kernel auto-tuning behavior
Avoid unwanted creation of directories
* dnn(ocl4dnn): simplify kernel to work around an OpenCL compiler crash
* dnn(ocl4dnn): remove redundant code
* dnn(ocl4dnn): Add a clearer message for SIMD size mismatch.
* dnn(ocl4dnn): add const to const argument
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): force the compiler to use a specific SIMD size for IDLF kernel
* dnn(ocl4dnn): drop unused tuneLocalSize()
* dnn(ocl4dnn): specify OpenCL queue for Timer and convolve() method
* dnn(ocl4dnn): sanitize file names used for cache
* dnn(perf): enable Network tests with OpenCL
* dnn(ocl4dnn/conv): drop computeGlobalSize()
* dnn(ocl4dnn/conv): drop unused fields
* dnn(ocl4dnn/conv): simplify ctor
* dnn(ocl4dnn/conv): refactor kernelConfig localSize=NULL
* dnn(ocl4dnn/conv): drop unsupported double / untested half types
* dnn(ocl4dnn/conv): drop unused variable
* dnn(ocl4dnn/conv): alignSize/divUp
* dnn(ocl4dnn/conv): use enum values
* dnn(ocl4dnn): drop unused innerproduct variable
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): add a generic function to check cl option support
* dnn(ocl4dnn): run softmax subgroup version kernel first
Signed-off-by: Li Peng <peng.li@intel.com>
2017-10-02 20:38:00 +08:00
}

template <typename T>
static std::string kerToStr(const Mat & k)
{
    int width = k.cols - 1, depth = k.depth();
    const T * const data = k.ptr<T>();

    std::ostringstream stream;
    stream.precision(10);

    if (depth <= CV_8S)
    {
        for (int i = 0; i < width; ++i)
            stream << "DIG(" << (int)data[i] << ")";
        stream << "DIG(" << (int)data[width] << ")";
    }
    else if (depth == CV_32F)
    {
        stream.setf(std::ios_base::showpoint);
        for (int i = 0; i < width; ++i)
            stream << "DIG(" << data[i] << "f)";
        stream << "DIG(" << data[width] << "f)";
    }
    else if (depth == CV_16F)
    {
        stream.setf(std::ios_base::showpoint);
        for (int i = 0; i < width; ++i)
            stream << "DIG(" << (float)data[i] << "h)";
        stream << "DIG(" << (float)data[width] << "h)";
    }
    else
    {
        for (int i = 0; i < width; ++i)
            stream << "DIG(" << data[i] << ")";
        stream << "DIG(" << data[width] << ")";
    }

    return stream.str();
}

String kernelToStr(InputArray _kernel, int ddepth, const char * name)
{
    Mat kernel = _kernel.getMat().reshape(1, 1);

    int depth = kernel.depth();
    if (ddepth < 0)
        ddepth = depth;

    if (ddepth != depth)
        kernel.convertTo(kernel, ddepth);

    typedef std::string (* func_t)(const Mat &);
    static const func_t funcs[] = { kerToStr<uchar>, kerToStr<char>, kerToStr<ushort>, kerToStr<short>,
                                    kerToStr<int>, kerToStr<float>, kerToStr<double>, kerToStr<float16_t> };
    const func_t func = funcs[ddepth];
    CV_Assert(func != 0);

    return cv::format(" -D %s=%s", name ? name : "COEFF", func(kernel).c_str());
}

#define PROCESS_SRC(src) \
    do \
    { \
        if (!src.empty()) \
        { \
            CV_Assert(src.isMat() || src.isUMat()); \
            Size csize = src.size(); \
            int ctype = src.type(), ccn = CV_MAT_CN(ctype), cdepth = CV_MAT_DEPTH(ctype), \
                ckercn = vectorWidths[cdepth], cwidth = ccn * csize.width; \
            if (cwidth < ckercn || ckercn <= 0) \
                return 1; \
            cols.push_back(cwidth); \
            if (strat == OCL_VECTOR_OWN && ctype != ref_type) \
                return 1; \
            offsets.push_back(src.offset()); \
            steps.push_back(src.step()); \
            dividers.push_back(ckercn * CV_ELEM_SIZE1(ctype)); \
            kercns.push_back(ckercn); \
        } \
    } \
    while ((void)0, 0)

int predictOptimalVectorWidth(InputArray src1, InputArray src2, InputArray src3,
                              InputArray src4, InputArray src5, InputArray src6,
                              InputArray src7, InputArray src8, InputArray src9,
                              OclVectorStrategy strat)
{
    const ocl::Device & d = ocl::Device::getDefault();

    int vectorWidths[] = { d.preferredVectorWidthChar(), d.preferredVectorWidthChar(),
        d.preferredVectorWidthShort(), d.preferredVectorWidthShort(),
        d.preferredVectorWidthInt(), d.preferredVectorWidthFloat(),
        d.preferredVectorWidthDouble(), d.preferredVectorWidthHalf() };

    // If the device reports a preferred char vector width of 1 (i.e. "don't use vectors"),
    // fall back to heuristic widths instead.
    if (vectorWidths[0] == 1)
    {
        vectorWidths[CV_8U] = vectorWidths[CV_8S] = 4;
        vectorWidths[CV_16U] = vectorWidths[CV_16S] = vectorWidths[CV_16F] = 2;
        vectorWidths[CV_32S] = vectorWidths[CV_32F] = vectorWidths[CV_64F] = 1;
    }

    return checkOptimalVectorWidth(vectorWidths, src1, src2, src3, src4, src5, src6, src7, src8, src9, strat);
}

int checkOptimalVectorWidth(const int *vectorWidths,
                            InputArray src1, InputArray src2, InputArray src3,
                            InputArray src4, InputArray src5, InputArray src6,
                            InputArray src7, InputArray src8, InputArray src9,
                            OclVectorStrategy strat)
{
    CV_Assert(vectorWidths);

    int ref_type = src1.type();

    std::vector<size_t> offsets, steps, cols;
    std::vector<int> dividers, kercns;
    PROCESS_SRC(src1);
    PROCESS_SRC(src2);
    PROCESS_SRC(src3);
    PROCESS_SRC(src4);
    PROCESS_SRC(src5);
    PROCESS_SRC(src6);
    PROCESS_SRC(src7);
    PROCESS_SRC(src8);
    PROCESS_SRC(src9);

    size_t size = offsets.size();

    for (size_t i = 0; i < size; ++i)
        while (offsets[i] % dividers[i] != 0 || steps[i] % dividers[i] != 0 || cols[i] % kercns[i] != 0)
            dividers[i] >>= 1, kercns[i] >>= 1;

    // default strategy
    int kercn = *std::min_element(kercns.begin(), kercns.end());

    return kercn;
}
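The per-array loop in checkOptimalVectorWidth() halves the candidate vector width until three conditions hold: the byte offset is divisible, the row step is divisible, and the row width is divisible. It then takes the minimum over all arrays. The sketch below restates that rule on plain numbers, without OpenCV.

The sketch rests on several assumptions:
- ArrayLayout and sketchOptimalVectorWidth are hypothetical names.
- Widths are powers of two.
- A single preferred width is shared by all arrays, whereas OpenCV looks one up per depth.
- The loop stops at width 1 explicitly instead of relying on element-aligned offsets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one input array (not an OpenCV type).
struct ArrayLayout
{
    size_t offset;     // byte offset of the first element in the buffer
    size_t step;       // row stride in bytes
    size_t cols;       // row width in scalars (channels * width)
    size_t elemSize1;  // bytes per scalar
};

// Simplified restatement of the default strategy of checkOptimalVectorWidth().
int sketchOptimalVectorWidth(int preferredWidth, const std::vector<ArrayLayout>& arrays)
{
    if (preferredWidth <= 0)
        return 1;
    int result = preferredWidth;
    for (const ArrayLayout& a : arrays)
    {
        if (a.cols < static_cast<size_t>(preferredWidth))
            return 1;                                   // row too narrow to vectorize at all
        size_t kercn = static_cast<size_t>(preferredWidth);
        size_t divider = kercn * a.elemSize1;           // bytes covered by one vector
        // Halve until offset and step are multiples of the vector size in bytes
        // and the row width is a multiple of the vector width.
        while (kercn > 1 && (a.offset % divider != 0 || a.step % divider != 0 || a.cols % kercn != 0))
        {
            divider >>= 1;
            kercn >>= 1;
        }
        result = std::min(result, static_cast<int>(kercn));
    }
    return result;
}
```

With a preferred width of 4 and 4-byte scalars, an array at byte offset 8 can only use 2-wide vectors. Because the minimum is taken over all arrays, that single array limits the whole kernel to width 2.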

int predictOptimalVectorWidthMax(InputArray src1, InputArray src2, InputArray src3,
                                 InputArray src4, InputArray src5, InputArray src6,
                                 InputArray src7, InputArray src8, InputArray src9)
{
    return predictOptimalVectorWidth(src1, src2, src3, src4, src5, src6, src7, src8, src9, OCL_VECTOR_MAX);
}

#undef PROCESS_SRC

// TODO: make this a method of an OpenCL "BuildOptions" class
void buildOptionsAddMatrixDescription(String& buildOptions, const String& name, InputArray _m)
{
    if (!buildOptions.empty())
        buildOptions += " ";
    int type = _m.type(), depth = CV_MAT_DEPTH(type);
    buildOptions += format(
            "-D %s_T=%s -D %s_T1=%s -D %s_CN=%d -D %s_TSIZE=%d -D %s_T1SIZE=%d -D %s_DEPTH=%d",
            name.c_str(), ocl::typeToStr(type),
            name.c_str(), ocl::typeToStr(CV_MAKE_TYPE(depth, 1)),
            name.c_str(), (int)CV_MAT_CN(type),
            name.c_str(), (int)CV_ELEM_SIZE(type),
            name.c_str(), (int)CV_ELEM_SIZE1(type),
            name.c_str(), (int)depth
            );
}
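For a concrete picture of the options this function appends, the sketch below builds the same six -D definitions from values the caller supplies directly.

It is a simplified stand-in, not OpenCV code:
- addMatrixDescription is a hypothetical name.
- The type names, channel count and depth code are passed in, whereas OpenCV derives them from the matrix type via ocl::typeToStr() and the CV_* macros.
- TSIZE is computed as CN * T1SIZE, which matches CV_ELEM_SIZE for OpenCV's standard types.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for buildOptionsAddMatrixDescription(): appends
// NAME_T, NAME_T1, NAME_CN, NAME_TSIZE, NAME_T1SIZE and NAME_DEPTH macros.
void addMatrixDescription(std::string& buildOptions, const std::string& name,
                          const std::string& vecType, const std::string& scalarType,
                          int cn, int elemSize1, int depth)
{
    if (!buildOptions.empty())
        buildOptions += " ";                 // separate from previously added options
    std::ostringstream os;
    os << "-D " << name << "_T=" << vecType
       << " -D " << name << "_T1=" << scalarType
       << " -D " << name << "_CN=" << cn
       << " -D " << name << "_TSIZE=" << cn * elemSize1   // bytes per pixel
       << " -D " << name << "_T1SIZE=" << elemSize1       // bytes per channel
       << " -D " << name << "_DEPTH=" << depth;
    buildOptions += os.str();
}
```

For a 3-channel float matrix (OpenCL type float3, depth code 5 for CV_32F), the call appends "-D src_T=float3 -D src_T1=float -D src_CN=3 -D src_TSIZE=12 -D src_T1SIZE=4 -D src_DEPTH=5".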

struct Image2D::Impl
{
    Impl(const UMat &src, bool norm, bool alias)
    {
        handle = 0;
        refcount = 1;
        init(src, norm, alias);
    }

    ~Impl()
    {
        if (handle)
            clReleaseMemObject(handle);
    }

    static cl_image_format getImageFormat(int depth, int cn, bool norm)
    {
        cl_image_format format;
        // -1 marks combinations with no OpenCL equivalent; isFormatSupported() rejects them.
        static const int channelTypes[] = { CL_UNSIGNED_INT8, CL_SIGNED_INT8, CL_UNSIGNED_INT16,
                                            CL_SIGNED_INT16, CL_SIGNED_INT32, CL_FLOAT, -1, CL_HALF_FLOAT };
        static const int channelTypesNorm[] = { CL_UNORM_INT8, CL_SNORM_INT8, CL_UNORM_INT16,
                                                CL_SNORM_INT16, -1, -1, -1, -1 };
        // CL_RGB has no mappings to OpenCV types because CL_RGB can only be used with
        // CL_UNORM_SHORT_565, CL_UNORM_SHORT_555, or CL_UNORM_INT_101010.
        static const int channelOrders[] = { -1, CL_R, CL_RG, /*CL_RGB*/ -1, CL_RGBA };

        int channelType = norm ? channelTypesNorm[depth] : channelTypes[depth];
        int channelOrder = channelOrders[cn];
        format.image_channel_data_type = (cl_channel_type)channelType;
        format.image_channel_order = (cl_channel_order)channelOrder;
        return format;
    }

    static bool isFormatSupported(cl_image_format format)
    {
        if (!haveOpenCL())
            CV_Error(Error::OpenCLApiCallError, "OpenCL runtime not found!");

        cl_context context = (cl_context)Context::getDefault().ptr();
        if (!context)
            return false;

        // Figure out how many formats are supported by this context.
        cl_uint numFormats = 0;
        cl_int err = clGetSupportedImageFormats(context, CL_MEM_READ_WRITE,
                                                CL_MEM_OBJECT_IMAGE2D, numFormats,
                                                NULL, &numFormats);
        CV_OCL_DBG_CHECK_RESULT(err, "clGetSupportedImageFormats(CL_MEM_OBJECT_IMAGE2D, NULL)");
        if (numFormats > 0)
        {
            AutoBuffer<cl_image_format> formats(numFormats);
            err = clGetSupportedImageFormats(context, CL_MEM_READ_WRITE,
                                             CL_MEM_OBJECT_IMAGE2D, numFormats,
                                             formats.data(), NULL);
            CV_OCL_DBG_CHECK_RESULT(err, "clGetSupportedImageFormats(CL_MEM_OBJECT_IMAGE2D, formats)");
            for (cl_uint i = 0; i < numFormats; ++i)
            {
                if (!memcmp(&formats[i], &format, sizeof(format)))
                {
                    return true;
                }
            }
        }
        return false;
    }

    void init(const UMat &src, bool norm, bool alias)
    {
        if (!haveOpenCL())
            CV_Error(Error::OpenCLApiCallError, "OpenCL runtime not found!");

        CV_Assert(!src.empty());
        CV_Assert(ocl::Device::getDefault().imageSupport());

        int err, depth = src.depth(), cn = src.channels();
        CV_Assert(cn <= 4);
        cl_image_format format = getImageFormat(depth, cn, norm);

        if (!isFormatSupported(format))
            CV_Error(Error::OpenCLApiCallError, "Image format is not supported");

        if (alias && !src.handle(ACCESS_RW))
            CV_Error(Error::OpenCLApiCallError, "Incorrect UMat, handle is null");

        cl_context context = (cl_context)Context::getDefault().ptr();
        cl_command_queue queue = (cl_command_queue)Queue::getDefault().ptr();

#ifdef CL_VERSION_1_2
        // Backward portability: binaries compiled with OpenCL 1.2 support still run on an
        // OpenCL 1.1 platform, because clCreateImage() is only used when the device reports 1.2+.
        const Device & d = ocl::Device::getDefault();
        int minor = d.deviceVersionMinor(), major = d.deviceVersionMajor();
        CV_Assert(!alias || canCreateAlias(src));
        if (1 < major || (1 == major && 2 <= minor))
        {
            cl_image_desc desc;
            desc.image_type        = CL_MEM_OBJECT_IMAGE2D;
            desc.image_width       = src.cols;
            desc.image_height      = src.rows;
            desc.image_depth       = 0;
            desc.image_array_size  = 1;
            desc.image_row_pitch   = alias ? src.step[0] : 0;
            desc.image_slice_pitch = 0;
            desc.buffer            = alias ? (cl_mem)src.handle(ACCESS_RW) : 0;
            desc.num_mip_levels    = 0;
            desc.num_samples       = 0;
            handle = clCreateImage(context, CL_MEM_READ_WRITE, &format, &desc, NULL, &err);
        }
        else
#endif
        {
            CV_SUPPRESS_DEPRECATED_START
            CV_Assert(!alias);  // This is an OpenCL 1.2 extension
            handle = clCreateImage2D(context, CL_MEM_READ_WRITE, &format, src.cols, src.rows, 0, NULL, &err);
            CV_SUPPRESS_DEPRECATED_END
        }
        CV_OCL_DBG_CHECK_RESULT(err, "clCreateImage()");

        size_t origin[] = { 0, 0, 0 };
        size_t region[] = { static_cast<size_t>(src.cols), static_cast<size_t>(src.rows), 1 };

        cl_mem devData;
        if (!alias && !src.isContinuous())
        {
            // Non-continuous source: pack the rows into a temporary buffer first, since
            // clEnqueueCopyBufferToImage() has no source row-pitch parameter.
            devData = clCreateBuffer(context, CL_MEM_READ_ONLY, src.cols * src.rows * src.elemSize(), NULL, &err);
            CV_OCL_CHECK_RESULT(err, cv::format("clCreateBuffer(CL_MEM_READ_ONLY, sz=%lld) => %p",
                    (long long int)(src.cols * src.rows * src.elemSize()), (void*)devData
                ).c_str());

            const size_t roi[3] = { static_cast<size_t>(src.cols) * src.elemSize(), static_cast<size_t>(src.rows), 1 };
            CV_OCL_CHECK(clEnqueueCopyBufferRect(queue, (cl_mem)src.handle(ACCESS_READ), devData, origin, origin,
                roi, src.step, 0, src.cols * src.elemSize(), 0, 0, NULL, NULL));
            CV_OCL_DBG_CHECK(clFlush(queue));
        }
        else
        {
            devData = (cl_mem)src.handle(ACCESS_READ);
        }
        CV_Assert(devData != NULL);

        if (!alias)
        {
            CV_OCL_CHECK(clEnqueueCopyBufferToImage(queue, devData, handle, 0, origin, region, 0, NULL, 0));
            if (!src.isContinuous())
            {
                CV_OCL_DBG_CHECK(clFlush(queue));
                CV_OCL_DBG_CHECK(clReleaseMemObject(devData));
            }
        }
    }

    IMPLEMENT_REFCOUNTABLE();

    cl_mem handle;
};
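When the source UMat is not continuous, init() first packs its rows into a temporary buffer, because clEnqueueCopyBufferToImage() reads the source buffer as tightly packed rows. The CPU-only sketch below shows the same packing so the roles of src.step and of roi[0] (cols * elemSize) are visible. packRows is a hypothetical helper working on a plain byte array, not OpenCV code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical CPU analogue of the clEnqueueCopyBufferRect() call in init():
// rows that are srcStep bytes apart are copied into a buffer whose row pitch
// equals the payload width (cols * elemSize), i.e. a continuous layout.
std::vector<unsigned char> packRows(const unsigned char* src, size_t srcStep,
                                    size_t rows, size_t cols, size_t elemSize)
{
    const size_t rowBytes = cols * elemSize;   // corresponds to roi[0]
    std::vector<unsigned char> packed(rows * rowBytes);
    for (size_t y = 0; y < rows; ++y)
        std::memcpy(packed.data() + y * rowBytes, src + y * srcStep, rowBytes);
    return packed;
}
```

For a 2x3 single-byte image stored with a 4-byte step, the padding byte at the end of each row is dropped and six bytes remain.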

Image2D::Image2D() CV_NOEXCEPT
{
    p = NULL;
}

Image2D::Image2D(const UMat &src, bool norm, bool alias)
{
    p = new Impl(src, norm, alias);
}

bool Image2D::canCreateAlias(const UMat &m)
{
    bool ret = false;
    const Device & d = ocl::Device::getDefault();
    if (d.imageFromBufferSupport() && !m.empty())
    {
        // This is the required pitch alignment in pixels
        uint pitchAlign = d.imagePitchAlignment();
        if (pitchAlign && !(m.step % (pitchAlign * m.elemSize())))
        {
            // We don't currently handle the case where the buffer was created
            // with CL_MEM_USE_HOST_PTR
            if (!m.u->tempUMat())
            {
                ret = true;
            }
        }
    }
    return ret;
}
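The central condition in canCreateAlias() is arithmetic. The device reports its pitch alignment in pixels, so the row step in bytes must be a multiple of that alignment times the pixel size. The sketch below isolates that check.

Two caveats apply:
- rowStepAllowsAlias is a hypothetical name.
- The empty-matrix and tempUMat() (CL_MEM_USE_HOST_PTR) checks of the original are deliberately omitted.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical restatement of the pitch test in canCreateAlias():
// a 2D image may alias a buffer only when the device supports image-from-buffer,
// reports a non-zero pitch alignment (in pixels), and the row step in bytes
// is a multiple of pitchAlign * bytes-per-pixel.
bool rowStepAllowsAlias(bool imageFromBufferSupport, unsigned pitchAlignPixels,
                        size_t stepBytes, size_t elemSize)
{
    if (!imageFromBufferSupport || pitchAlignPixels == 0)
        return false;
    return stepBytes % (pitchAlignPixels * elemSize) == 0;
}
```

Take an alignment of 64 pixels and 4-byte pixels, which means the step must be a multiple of 256 bytes. A 1024-byte step therefore qualifies, while a 1000-byte step does not.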

bool Image2D::isFormatSupported(int depth, int cn, bool norm)
{
    cl_image_format format = Impl::getImageFormat(depth, cn, norm);

    return Impl::isFormatSupported(format);
}
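Impl::isFormatSupported() follows the usual two-call OpenCL query pattern: ask for the count, allocate, fetch the list, then compare entries bytewise with memcmp. The sketch below mimics that flow with a fake in-memory query so it runs without an OpenCL device.

It relies on hypothetical stand-ins:
- ImageFormat stands in for cl_image_format.
- querySupportedFormats stands in for clGetSupportedImageFormats().
- kDeviceFormats and its numeric codes are arbitrary placeholder data.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical stand-in for cl_image_format (two plain integers, no padding).
struct ImageFormat { unsigned order; unsigned dataType; };

// Arbitrary "device" capability list used only for this demo.
static const ImageFormat kDeviceFormats[] = { {1, 10}, {2, 10}, {4, 11} };

// Hypothetical query with the two-call convention: with out == nullptr only
// *numOut is written; otherwise up to `capacity` entries are copied to out.
int querySupportedFormats(unsigned capacity, ImageFormat* out, unsigned* numOut)
{
    const unsigned total = sizeof(kDeviceFormats) / sizeof(kDeviceFormats[0]);
    if (numOut)
        *numOut = total;
    if (out)
        for (unsigned i = 0; i < total && i < capacity; ++i)
            out[i] = kDeviceFormats[i];
    return 0;  // plays the role of CL_SUCCESS
}

bool formatSupported(const ImageFormat& wanted)
{
    unsigned num = 0;
    if (querySupportedFormats(0, nullptr, &num) != 0 || num == 0)   // pass 1: count
        return false;
    std::vector<ImageFormat> formats(num);
    if (querySupportedFormats(num, formats.data(), nullptr) != 0)   // pass 2: fill
        return false;
    for (const ImageFormat& f : formats)
        if (std::memcmp(&f, &wanted, sizeof(wanted)) == 0)          // bytewise match
            return true;
    return false;
}
```

A bytewise comparison is adequate here because the struct has no padding. In the same way, a format built with a -1 placeholder from getImageFormat() can never match a device entry.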

Image2D::Image2D(const Image2D & i)
{
    p = i.p;
    if (p)
        p->addref();
}

Image2D & Image2D::operator = (const Image2D & i)
{
    if (i.p != p)
    {
        if (i.p)
            i.p->addref();
        if (p)
            p->release();
        p = i.p;
    }
    return *this;
}

Image2D::Image2D(Image2D&& i) CV_NOEXCEPT
{
    p = i.p;
    i.p = nullptr;
}

Image2D& Image2D::operator = (Image2D&& i) CV_NOEXCEPT
{
    if (this != &i) {
        if (p)
            p->release();
        p = i.p;
        i.p = nullptr;
    }
    return *this;
}

Image2D::~Image2D()
{
    if (p)
        p->release();
}

void* Image2D::ptr() const
{
    return p ? p->handle : 0;
}
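Image2D is a thin handle around an intrusively reference-counted Impl. The sketch below reproduces the same rules in miniature, with no OpenCL involved, so the reference counts can be checked directly:
- Copying shares the Impl and increments the count.
- Moving transfers ownership and leaves the source empty.
- Destruction releases the reference.

Impl, Handle and the destroyed counter are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <utility>

// Hypothetical miniature of Image2D::Impl: starts with one reference.
struct Impl
{
    int refcount = 1;
    static int destroyed;                  // counts deleted Impl objects (demo only)
    void addref() { ++refcount; }
    void release() { if (--refcount == 0) { ++destroyed; delete this; } }
};
int Impl::destroyed = 0;

// Hypothetical miniature of Image2D: copy shares, move transfers.
class Handle
{
public:
    Handle() noexcept : p(nullptr) {}
    explicit Handle(bool create) : p(create ? new Impl : nullptr) {}
    Handle(const Handle& h) : p(h.p) { if (p) p->addref(); }
    Handle& operator=(const Handle& h)
    {
        if (h.p != p)                      // also covers self-assignment
        {
            if (h.p) h.p->addref();        // take the new reference first,
            if (p) p->release();           // then drop the old one
            p = h.p;
        }
        return *this;
    }
    Handle(Handle&& h) noexcept : p(h.p) { h.p = nullptr; }
    Handle& operator=(Handle&& h) noexcept
    {
        if (this != &h)
        {
            if (p) p->release();
            p = h.p;
            h.p = nullptr;
        }
        return *this;
    }
    ~Handle() { if (p) p->release(); }
    int refcount() const { return p ? p->refcount : 0; }
private:
    Impl* p;
};
```

In this scheme a moved-from handle holds nullptr, so its destructor is a no-op. The Impl is deleted exactly once, when the last owning handle goes away.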

bool internal::isOpenCLForced()
{
    static bool initialized = false;
    static bool value = false;
    if (!initialized)
    {
        value = utils::getConfigurationParameterBool("OPENCV_OPENCL_FORCE", false);
        initialized = true;
    }
    return value;
}

bool internal::isPerformanceCheckBypassed()
{
    static bool initialized = false;
    static bool value = false;
    if (!initialized)
    {
        value = utils::getConfigurationParameterBool("OPENCV_OPENCL_PERF_CHECK_BYPASS", false);
        initialized = true;
    }
    return value;
}
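isOpenCLForced() and isPerformanceCheckBypassed() read a configuration flag once and cache it. One way to express the same lazy caching in C++11 is a function-local static, whose initialization runs exactly once even with concurrent callers.

The sketch below makes several assumptions:
- The variable name DEMO_OPENCL_FORCE is hypothetical.
- The spellings accepted by the hypothetical parseBoolValue are an assumption.
- OpenCV's utils::getConfigurationParameterBool() may accept a different set of values and may treat unknown values as errors.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical boolean parser; unknown or empty values fall back to the default.
bool parseBoolValue(const char* value, bool defaultValue)
{
    if (value == nullptr || *value == '\0')
        return defaultValue;
    const std::string v(value);
    if (v == "1" || v == "true" || v == "True" || v == "TRUE" || v == "on" || v == "ON")
        return true;
    if (v == "0" || v == "false" || v == "False" || v == "FALSE" || v == "off" || v == "OFF")
        return false;
    return defaultValue;
}

// Reads the hypothetical DEMO_OPENCL_FORCE variable on the first call only;
// later changes to the environment are not observed.
bool isDemoOpenCLForced()
{
    static const bool value = parseBoolValue(std::getenv("DEMO_OPENCL_FORCE"), false);
    return value;
}
```

As in the original pattern, the cached value is fixed after the first call, so such flags have to be set before the process first queries them.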

bool internal::isCLBuffer(UMat& u)
{
    void* h = u.handle(ACCESS_RW);
    if (!h)
        return true;
    CV_DbgAssert(u.u->currAllocator == getOpenCLAllocator());
#if 1
    if ((u.u->allocatorFlags_ & 0xffff0000) != 0) // OpenCL SVM flags are stored here
        return false;
#else
    cl_mem_object_type type = 0;
    cl_int ret = clGetMemObjectInfo((cl_mem)h, CL_MEM_TYPE, sizeof(type), &type, NULL);
    if (ret != CL_SUCCESS || type != CL_MEM_OBJECT_BUFFER)
        return false;
#endif
    return true;
}
Merge pull request #9114 from pengli:dnn_rebase
add libdnn acceleration to dnn module (#9114)
* import libdnn code
Signed-off-by: Li Peng <peng.li@intel.com>
* add convolution layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add pooling layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add softmax layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add lrn layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add innerproduct layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add HAVE_OPENCL macro
Signed-off-by: Li Peng <peng.li@intel.com>
* fix for convolution ocl
Signed-off-by: Li Peng <peng.li@intel.com>
* enable getUMat() for multi-dimension Mat
Signed-off-by: Li Peng <peng.li@intel.com>
* use getUMat for ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* use CV_OCL_RUN macro
Signed-off-by: Li Peng <peng.li@intel.com>
* set OPENCL target when it is available
and disable fuseLayer for OCL target for the time being
Signed-off-by: Li Peng <peng.li@intel.com>
* fix innerproduct accuracy test
Signed-off-by: Li Peng <peng.li@intel.com>
* remove trailing space
Signed-off-by: Li Peng <peng.li@intel.com>
* Fixed tensorflow demo bug.
Root cause is that tensorflow uses a different algorithm than libdnn
to calculate the convolution output dimension.
libdnn doesn't calculate the output dimension anymore and just uses the one
passed in by the config.
* split gemm ocl file
split it into gemm_buffer.cl and gemm_image.cl
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix compile failure
Signed-off-by: Li Peng <peng.li@intel.com>
* check env flag for auto tuning
Signed-off-by: Li Peng <peng.li@intel.com>
* switch to new ocl kernels for softmax layer
Signed-off-by: Li Peng <peng.li@intel.com>
* update softmax layer
on some platform subgroup extension may not work well,
fallback to non subgroup ocl acceleration.
Signed-off-by: Li Peng <peng.li@intel.com>
* fallback to cpu path for fc layer with multi output
Signed-off-by: Li Peng <peng.li@intel.com>
* update output message
Signed-off-by: Li Peng <peng.li@intel.com>
* update fully connected layer
fallback to gemm API if libdnn return false
Signed-off-by: Li Peng <peng.li@intel.com>
* Add ReLU OCL implementation
* disable layer fusion for now
Signed-off-by: Li Peng <peng.li@intel.com>
* Add OCL implementation for concat layer
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
* libdnn: update license and copyrights
Also refine libdnn coding style
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
* DNN: Don't link OpenCL library explicitly
* DNN: Make default preferableTarget to DNN_TARGET_CPU
User should set it to DNN_TARGET_OPENCL explicitly if want to
use OpenCL acceleration.
Also don't fusion when using DNN_TARGET_OPENCL
* DNN: refine coding style
* Add getOpenCLErrorString
* DNN: Use int32_t/uint32_t instread of alias
* Use namespace ocl4dnn to include libdnn things
* remove extra copyTo in softmax ocl path
Signed-off-by: Li Peng <peng.li@intel.com>
* update ReLU layer ocl path
Signed-off-by: Li Peng <peng.li@intel.com>
* Add prefer target property for layer class
It is used to indicate the target for layer forwarding,
either the default CPU target or OCL target.
Signed-off-by: Li Peng <peng.li@intel.com>
* Add cl_event based timer for cv::ocl
* Rename libdnn to ocl4dnn
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>
* use UMat for ocl4dnn internal buffer
Remove allocateMemory which use clCreateBuffer directly
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>
* enable buffer gemm in ocl4dnn innerproduct
Signed-off-by: Li Peng <peng.li@intel.com>
* replace int_tp globally for ocl4dnn kernels.
Signed-off-by: wzw <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
* create UMat for layer params
Signed-off-by: Li Peng <peng.li@intel.com>
* update sign ocl kernel
Signed-off-by: Li Peng <peng.li@intel.com>
* update image based gemm of inner product layer
Signed-off-by: Li Peng <peng.li@intel.com>
* remove buffer gemm of inner product layer
call cv::gemm API instead
Signed-off-by: Li Peng <peng.li@intel.com>
* change ocl4dnn forward parameter to UMat
Signed-off-by: Li Peng <peng.li@intel.com>
* Refine auto-tuning mechanism.
- Use OPENCV_OCL4DNN_KERNEL_CONFIG_PATH to set cache directory
for fine-tuned kernel configuration.
e.g. export OPENCV_OCL4DNN_KERNEL_CONFIG_PATH=/home/tmp,
the cache directory will be /home/tmp/spatialkernels/ on Linux.
- Define environment OPENCV_OCL4DNN_ENABLE_AUTO_TUNING to enable
auto-tuning.
- OPENCV_OPENCL_ENABLE_PROFILING is only used to enable profiling
for OpenCL command queue. This fix basic kernel get wrong running
time, i.e. 0ms.
- If creating cache directory failed, disable auto-tuning.
* Detect and create cache dir on windows
Signed-off-by: Li Peng <peng.li@intel.com>
* Refine gemm like convolution kernel.
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix redundant swizzleWeights calling when use cached kernel config.
* Fix "out of resource" bug when auto-tuning too many kernels.
* replace cl_mem with UMat in ocl4dnnConvSpatial class
* OCL4DNN: reduce the tuning kernel candidate.
This patch could reduce 75% of the tuning candidates with less
than 2% performance impact for the final result.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* replace cl_mem with umat in ocl4dnn convolution
Signed-off-by: Li Peng <peng.li@intel.com>
* remove weight_image_ of ocl4dnn inner product
Actually it is unused in the computation
Signed-off-by: Li Peng <peng.li@intel.com>
* Various fixes for ocl4dnn
1. OCL_PERFORMANCE_CHECK(ocl::Device::getDefault().isIntel())
2. Ptr<OCL4DNNInnerProduct<float> > innerProductOp
3. Code comments cleanup
4. ignore check on OCL cpu device
Signed-off-by: Li Peng <peng.li@intel.com>
* add build option for log softmax
Signed-off-by: Li Peng <peng.li@intel.com>
* remove unused ocl kernels in ocl4dnn
Signed-off-by: Li Peng <peng.li@intel.com>
* replace ocl4dnnSet with opencv setTo
Signed-off-by: Li Peng <peng.li@intel.com>
* replace ALIGN with cv::alignSize
Signed-off-by: Li Peng <peng.li@intel.com>
* check kernel build options
Signed-off-by: Li Peng <peng.li@intel.com>
* Handle program compilation fail properly.
* Use std::numeric_limits<float>::infinity() for large float number
* check ocl4dnn kernel compilation result
Signed-off-by: Li Peng <peng.li@intel.com>
* remove unused ctx_id
Signed-off-by: Li Peng <peng.li@intel.com>
* change clEnqueueNDRangeKernel to kernel.run()
Signed-off-by: Li Peng <peng.li@intel.com>
* change cl_mem to UMat in image based gemm
Signed-off-by: Li Peng <peng.li@intel.com>
* check intel subgroup support for lrn and pooling layer
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix convolution bug if group is greater than 1
Signed-off-by: Li Peng <peng.li@intel.com>
* Set default layer preferableTarget to be DNN_TARGET_CPU
Signed-off-by: Li Peng <peng.li@intel.com>
* Add ocl perf test for convolution
Signed-off-by: Li Peng <peng.li@intel.com>
* Add more ocl accuracy test
Signed-off-by: Li Peng <peng.li@intel.com>
* replace cl_image with ocl::Image2D
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix build failure in elementwise layer
Signed-off-by: Li Peng <peng.li@intel.com>
* use getUMat() to get blob data
Signed-off-by: Li Peng <peng.li@intel.com>
* replace cl_mem handle with ocl::KernelArg
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(build): don't use C++11, OPENCL_LIBRARIES fix
* dnn(ocl4dnn): remove unused OpenCL kernels
* dnn(ocl4dnn): extract OpenCL code into .cl files
* dnn(ocl4dnn): refine auto-tuning
Defaultly disable auto-tuning, set OPENCV_OCL4DNN_ENABLE_AUTO_TUNING
environment variable to enable it.
Use a set of pre-tuned configs as default config if auto-tuning is disabled.
These configs are tuned for Intel GPU with 48/72 EUs, and for googlenet,
AlexNet, ResNet-50
If default config is not suitable, use the first available kernel config
from the candidates. Candidate priority from high to low is gemm like kernel,
IDLF kernel, basic kernel.
* dnn(ocl4dnn): pooling doesn't use OpenCL subgroups
* dnn(ocl4dnn): fix perf test
OpenCV has a default 3 s time limit for each performance test.
Warm up the OpenCL backend outside of the perf measurement loop.
* use ocl::KernelArg as much as possible
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): fix bias bug for gemm like kernel
* dnn(ocl4dnn): wrap cl_mem into UMat
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): Refine signature of kernel config
- Use more readable string as signature of kernel config
- Don't count device name and vendor in signature string
- Default kernel configurations are tuned for Intel GPU with
24/48/72 EUs, and for googlenet, AlexNet, ResNet-50 net model.
* dnn(ocl4dnn): swap width/height in configuration
* dnn(ocl4dnn): enable configs for Intel OpenCL runtime only
* core: make configuration helper functions accessible from non-core modules
* dnn(ocl4dnn): update kernel auto-tuning behavior
Avoid unwanted creation of directories
* dnn(ocl4dnn): simplify kernel to work around OpenCL compiler crash
* dnn(ocl4dnn): remove redundant code
* dnn(ocl4dnn): Add clearer message for SIMD size mismatch.
* dnn(ocl4dnn): add const to const argument
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): force compiler to use a specific SIMD size for IDLF kernel
* dnn(ocl4dnn): drop unused tuneLocalSize()
* dnn(ocl4dnn): specify OpenCL queue for Timer and convolve() method
* dnn(ocl4dnn): sanitize file names used for cache
* dnn(perf): enable Network tests with OpenCL
* dnn(ocl4dnn/conv): drop computeGlobalSize()
* dnn(ocl4dnn/conv): drop unused fields
* dnn(ocl4dnn/conv): simplify ctor
* dnn(ocl4dnn/conv): refactor kernelConfig localSize=NULL
* dnn(ocl4dnn/conv): drop unsupported double / untested half types
* dnn(ocl4dnn/conv): drop unused variable
* dnn(ocl4dnn/conv): alignSize/divUp
* dnn(ocl4dnn/conv): use enum values
* dnn(ocl4dnn): drop unused innerproduct variable
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): add a generic function to check cl option support
* dnn(ocl4dnn): run softmax subgroup version kernel first
Signed-off-by: Li Peng <peng.li@intel.com>
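The kernel-selection fallback described in the auto-tuning note above works as follows when auto-tuning is disabled. The pre-tuned default config is tried first. If it is not suitable, the first available candidate is used, in the priority order gemm-like, IDLF, basic. A minimal sketch of that ordering follows. `KernelKind`, `Candidate`, and `selectKernel` are hypothetical names, not ocl4dnn APIs. "Suitable" is modelled as a single boolean flag, which is an assumption. The sketch leaves out auto-tuning itself and anything else the real selection logic may do.

```cpp
#include <string>
#include <vector>

// Hypothetical kernel families, declared from highest to lowest priority.
enum KernelKind { KERNEL_GEMM_LIKE, KERNEL_IDLF, KERNEL_BASIC };

// Hypothetical description of one kernel configuration.
struct Candidate {
    KernelKind kind;
    std::string config;  // illustrative config label
    bool usable;         // assumption: "suitable" modelled as a single flag
};

// Returns the config to use when auto-tuning is disabled, or nullptr if
// nothing is usable. The returned pointer refers into the arguments.
const Candidate* selectKernel(const Candidate* pretuned,
                              const std::vector<Candidate>& candidates)
{
    // 1. Prefer the pre-tuned default config if one exists and is usable.
    if (pretuned != nullptr && pretuned->usable)
        return pretuned;

    // 2. Otherwise take the first usable candidate, walking the kernel
    //    families in priority order: gemm-like, IDLF, basic.
    const KernelKind priority[] = { KERNEL_GEMM_LIKE, KERNEL_IDLF, KERNEL_BASIC };
    for (KernelKind kind : priority)
        for (const Candidate& c : candidates)
            if (c.kind == kind && c.usable)
                return &c;

    return nullptr;
}
```

The sketch walks by kernel family first and by list position second. As a result, a usable gemm-like config wins even when it appears after a basic one in the candidate list.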
2017-10-02 20:38:00 +08:00
struct Timer::Impl
{
    const Queue queue;
    Impl(const Queue& q)
        : queue(q)
    {
    }
2017-10-02 19:22:28 +08:00
    ~Impl() { }
2017-10-02 20:38:00 +08:00
    void start()
    {
2017-11-01 23:18:54 +08:00
        CV_OCL_DBG_CHECK(clFinish((cl_command_queue)queue.ptr()));
2017-10-02 19:22:28 +08:00
        timer.start();
2017-10-02 20:38:00 +08:00
    }
    void stop()
    {
2017-11-01 23:18:54 +08:00
        CV_OCL_DBG_CHECK(clFinish((cl_command_queue)queue.ptr()));
2017-10-02 19:22:28 +08:00
        timer.stop();
Merge pull request #9114 from pengli:dnn_rebase
add libdnn acceleration to dnn module (#9114)
* import libdnn code
Signed-off-by: Li Peng <peng.li@intel.com>
* add convolution layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add pooling layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add softmax layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add lrn layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add innerproduct layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add HAVE_OPENCL macro
Signed-off-by: Li Peng <peng.li@intel.com>
* fix for convolution ocl
Signed-off-by: Li Peng <peng.li@intel.com>
* enable getUMat() for multi-dimension Mat
Signed-off-by: Li Peng <peng.li@intel.com>
* use getUMat for ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* use CV_OCL_RUN macro
Signed-off-by: Li Peng <peng.li@intel.com>
* set OPENCL target when it is available
and disable fuseLayer for OCL target for the time being
Signed-off-by: Li Peng <peng.li@intel.com>
* fix innerproduct accuracy test
Signed-off-by: Li Peng <peng.li@intel.com>
* remove trailing space
Signed-off-by: Li Peng <peng.li@intel.com>
* Fixed tensorflow demo bug.
Root cause is that tensorflow has different algorithm with libdnn
to calculate convolution output dimension.
libdnn don't calculate output dimension anymore and just use one
passed in by config.
* split gemm ocl file
split it into gemm_buffer.cl and gemm_image.cl
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix compile failure
Signed-off-by: Li Peng <peng.li@intel.com>
* check env flag for auto tuning
Signed-off-by: Li Peng <peng.li@intel.com>
* switch to new ocl kernels for softmax layer
Signed-off-by: Li Peng <peng.li@intel.com>
* update softmax layer
on some platform subgroup extension may not work well,
fallback to non subgroup ocl acceleration.
Signed-off-by: Li Peng <peng.li@intel.com>
* fallback to cpu path for fc layer with multi output
Signed-off-by: Li Peng <peng.li@intel.com>
* update output message
Signed-off-by: Li Peng <peng.li@intel.com>
* update fully connected layer
fallback to gemm API if libdnn return false
Signed-off-by: Li Peng <peng.li@intel.com>
* Add ReLU OCL implementation
* disable layer fusion for now
Signed-off-by: Li Peng <peng.li@intel.com>
* Add OCL implementation for concat layer
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
* libdnn: update license and copyrights
Also refine libdnn coding style
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
* DNN: Don't link OpenCL library explicitly
* DNN: Make default preferableTarget to DNN_TARGET_CPU
User should set it to DNN_TARGET_OPENCL explicitly if want to
use OpenCL acceleration.
Also don't fusion when using DNN_TARGET_OPENCL
* DNN: refine coding style
* Add getOpenCLErrorString
* DNN: Use int32_t/uint32_t instread of alias
* Use namespace ocl4dnn to include libdnn things
* remove extra copyTo in softmax ocl path
Signed-off-by: Li Peng <peng.li@intel.com>
* update ReLU layer ocl path
Signed-off-by: Li Peng <peng.li@intel.com>
* Add prefer target property for layer class
It is used to indicate the target for layer forwarding,
either the default CPU target or OCL target.
Signed-off-by: Li Peng <peng.li@intel.com>
* Add cl_event based timer for cv::ocl
* Rename libdnn to ocl4dnn
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>
* use UMat for ocl4dnn internal buffer
Remove allocateMemory which use clCreateBuffer directly
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>
* enable buffer gemm in ocl4dnn innerproduct
Signed-off-by: Li Peng <peng.li@intel.com>
* replace int_tp globally for ocl4dnn kernels.
Signed-off-by: wzw <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
* create UMat for layer params
Signed-off-by: Li Peng <peng.li@intel.com>
* update sign ocl kernel
Signed-off-by: Li Peng <peng.li@intel.com>
* update image based gemm of inner product layer
Signed-off-by: Li Peng <peng.li@intel.com>
* remove buffer gemm of inner product layer
call cv::gemm API instead
Signed-off-by: Li Peng <peng.li@intel.com>
* change ocl4dnn forward parameter to UMat
Signed-off-by: Li Peng <peng.li@intel.com>
* Refine auto-tuning mechanism.
- Use OPENCV_OCL4DNN_KERNEL_CONFIG_PATH to set cache directory
for fine-tuned kernel configuration.
e.g. export OPENCV_OCL4DNN_KERNEL_CONFIG_PATH=/home/tmp,
the cache directory will be /home/tmp/spatialkernels/ on Linux.
- Define environment OPENCV_OCL4DNN_ENABLE_AUTO_TUNING to enable
auto-tuning.
- OPENCV_OPENCL_ENABLE_PROFILING is only used to enable profiling
for OpenCL command queue. This fix basic kernel get wrong running
time, i.e. 0ms.
- If creating cache directory failed, disable auto-tuning.
* Detect and create cache dir on windows
Signed-off-by: Li Peng <peng.li@intel.com>
* Refine gemm like convolution kernel.
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix redundant swizzleWeights calling when use cached kernel config.
* Fix "out of resource" bug when auto-tuning too many kernels.
* replace cl_mem with UMat in ocl4dnnConvSpatial class
* OCL4DNN: reduce the tuning kernel candidate.
This patch could reduce 75% of the tuning candidates with less
than 2% performance impact for the final result.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* replace cl_mem with umat in ocl4dnn convolution
Signed-off-by: Li Peng <peng.li@intel.com>
* remove weight_image_ of ocl4dnn inner product
Actually it is unused in the computation
Signed-off-by: Li Peng <peng.li@intel.com>
* Various fixes for ocl4dnn
1. OCL_PERFORMANCE_CHECK(ocl::Device::getDefault().isIntel())
2. Ptr<OCL4DNNInnerProduct<float> > innerProductOp
3. Code comments cleanup
4. ignore check on OCL cpu device
Signed-off-by: Li Peng <peng.li@intel.com>
* add build option for log softmax
Signed-off-by: Li Peng <peng.li@intel.com>
* remove unused ocl kernels in ocl4dnn
Signed-off-by: Li Peng <peng.li@intel.com>
* replace ocl4dnnSet with opencv setTo
Signed-off-by: Li Peng <peng.li@intel.com>
* replace ALIGN with cv::alignSize
Signed-off-by: Li Peng <peng.li@intel.com>
* check kernel build options
Signed-off-by: Li Peng <peng.li@intel.com>
* Handle program compilation fail properly.
* Use std::numeric_limits<float>::infinity() for large float number
* check ocl4dnn kernel compilation result
Signed-off-by: Li Peng <peng.li@intel.com>
* remove unused ctx_id
Signed-off-by: Li Peng <peng.li@intel.com>
* change clEnqueueNDRangeKernel to kernel.run()
Signed-off-by: Li Peng <peng.li@intel.com>
* change cl_mem to UMat in image based gemm
Signed-off-by: Li Peng <peng.li@intel.com>
* check intel subgroup support for lrn and pooling layer
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix convolution bug if group is greater than 1
Signed-off-by: Li Peng <peng.li@intel.com>
* Set default layer preferableTarget to be DNN_TARGET_CPU
Signed-off-by: Li Peng <peng.li@intel.com>
* Add ocl perf test for convolution
Signed-off-by: Li Peng <peng.li@intel.com>
* Add more ocl accuracy test
Signed-off-by: Li Peng <peng.li@intel.com>
* replace cl_image with ocl::Image2D
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix build failure in elementwise layer
Signed-off-by: Li Peng <peng.li@intel.com>
* use getUMat() to get blob data
Signed-off-by: Li Peng <peng.li@intel.com>
* replace cl_mem handle with ocl::KernelArg
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(build): don't use C++11, OPENCL_LIBRARIES fix
* dnn(ocl4dnn): remove unused OpenCL kernels
* dnn(ocl4dnn): extract OpenCL code into .cl files
* dnn(ocl4dnn): refine auto-tuning
Defaultly disable auto-tuning, set OPENCV_OCL4DNN_ENABLE_AUTO_TUNING
environment variable to enable it.
Use a set of pre-tuned configs as default config if auto-tuning is disabled.
These configs are tuned for Intel GPU with 48/72 EUs, and for googlenet,
AlexNet, ResNet-50
If default config is not suitable, use the first available kernel config
from the candidates. Candidate priority from high to low is gemm like kernel,
IDLF kernel, basick kernel.
* dnn(ocl4dnn): pooling doesn't use OpenCL subgroups
* dnn(ocl4dnn): fix perf test
OpenCV has default 3sec time limit for each performance test.
Warmup OpenCL backend outside of perf measurement loop.
* use ocl::KernelArg as much as possible
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): fix bias bug for gemm like kernel
* dnn(ocl4dnn): wrap cl_mem into UMat
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): Refine signature of kernel config
- Use a more readable string as the signature of a kernel config
- Don't include device name and vendor in the signature string
- Default kernel configurations are tuned for Intel GPUs with
24/48/72 EUs, and for the GoogLeNet, AlexNet and ResNet-50 models.
* dnn(ocl4dnn): swap width/height in configuration
* dnn(ocl4dnn): enable configs for Intel OpenCL runtime only
* core: make configuration helper functions accessible from non-core modules
* dnn(ocl4dnn): update kernel auto-tuning behavior
Avoid unwanted creation of directories
* dnn(ocl4dnn): simplify kernel to work around OpenCL compiler crash
* dnn(ocl4dnn): remove redundant code
* dnn(ocl4dnn): add a clearer message for SIMD size mismatch.
* dnn(ocl4dnn): add const to const argument
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): force the compiler to use a specific SIMD size for IDLF kernel
* dnn(ocl4dnn): drop unused tuneLocalSize()
* dnn(ocl4dnn): specify OpenCL queue for Timer and convolve() method
* dnn(ocl4dnn): sanitize file names used for cache
* dnn(perf): enable Network tests with OpenCL
* dnn(ocl4dnn/conv): drop computeGlobalSize()
* dnn(ocl4dnn/conv): drop unused fields
* dnn(ocl4dnn/conv): simplify ctor
* dnn(ocl4dnn/conv): refactor kernelConfig localSize=NULL
* dnn(ocl4dnn/conv): drop unsupported double / untested half types
* dnn(ocl4dnn/conv): drop unused variable
* dnn(ocl4dnn/conv): alignSize/divUp
* dnn(ocl4dnn/conv): use enum values
* dnn(ocl4dnn): drop unused innerproduct variable
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): add a generic function to check cl option support
* dnn(ocl4dnn): run softmax subgroup version kernel first
Signed-off-by: Li Peng <peng.li@intel.com>
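The "dnn(ocl4dnn): refine auto-tuning" entry above describes an opt-in switch plus a priority-ordered fallback used when no tuning is done. A minimal standalone C++11 sketch of that selection logic follows; it is not the ocl4dnn code. The KernelType, KernelCandidate, autoTuningEnabled and selectKernel names are hypothetical, and treating any non-empty value other than "0" as enabling tuning is an assumption about how the variable is parsed. The log also notes the dnn module avoided C++11 at the time, so this sketch would not drop into that code base as-is.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical kernel families, ordered from highest to lowest priority
// as listed in the commit message: GEMM-like, IDLF, basic.
enum KernelType { KERNEL_GEMM_LIKE = 0, KERNEL_IDLF = 1, KERNEL_BASIC = 2 };

struct KernelCandidate {
    KernelType type;
    std::string name;
    bool available;  // e.g. the kernel compiled and passed checks on this device
};

// Auto-tuning stays off unless the environment variable is set.
// Assumption: any non-empty value other than "0" counts as "enabled".
inline bool autoTuningEnabled() {
    const char* v = std::getenv("OPENCV_OCL4DNN_ENABLE_AUTO_TUNING");
    return v != nullptr && v[0] != '\0' && std::strcmp(v, "0") != 0;
}

// Default path (no tuning): use the pre-tuned config when it is usable,
// otherwise return the first available candidate in priority order.
inline const KernelCandidate* selectKernel(const KernelCandidate* pretuned,
                                           const std::vector<KernelCandidate>& candidates) {
    if (pretuned != nullptr && pretuned->available)
        return pretuned;
    for (int t = KERNEL_GEMM_LIKE; t <= KERNEL_BASIC; ++t)
        for (const KernelCandidate& c : candidates)
            if (c.type == t && c.available)
                return &c;
    return nullptr;  // nothing usable on this device
}
```

Looping over priority levels rather than over the vector means a usable GEMM-like candidate is chosen even when it appears last in the list.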
2017-10-02 20:38:00 +08:00
}
2017-10-18 20:59:16 +08:00
uint64 durationNS() const
2017-10-02 20:38:00 +08:00
{
2017-10-18 20:59:16 +08:00
return (uint64)(timer.getTimeSec() * 1e9);
2017-10-02 20:38:00 +08:00
}
2017-10-02 19:22:28 +08:00
TickMeter timer;
Merge pull request #9114 from pengli:dnn_rebase
add libdnn acceleration to dnn module (#9114)
* import libdnn code
Signed-off-by: Li Peng <peng.li@intel.com>
* add convolution layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add pooling layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add softmax layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add lrn layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add innerproduct layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add HAVE_OPENCL macro
Signed-off-by: Li Peng <peng.li@intel.com>
* fix for convolution ocl
Signed-off-by: Li Peng <peng.li@intel.com>
* enable getUMat() for multi-dimension Mat
Signed-off-by: Li Peng <peng.li@intel.com>
* use getUMat for ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* use CV_OCL_RUN macro
Signed-off-by: Li Peng <peng.li@intel.com>
* set OPENCL target when it is available
and disable fuseLayer for OCL target for the time being
Signed-off-by: Li Peng <peng.li@intel.com>
* fix innerproduct accuracy test
Signed-off-by: Li Peng <peng.li@intel.com>
* remove trailing space
Signed-off-by: Li Peng <peng.li@intel.com>
* Fixed tensorflow demo bug.
Root cause is that tensorflow has different algorithm with libdnn
to calculate convolution output dimension.
libdnn don't calculate output dimension anymore and just use one
passed in by config.
* split gemm ocl file
split it into gemm_buffer.cl and gemm_image.cl
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix compile failure
Signed-off-by: Li Peng <peng.li@intel.com>
* check env flag for auto tuning
Signed-off-by: Li Peng <peng.li@intel.com>
* switch to new ocl kernels for softmax layer
Signed-off-by: Li Peng <peng.li@intel.com>
* update softmax layer
on some platform subgroup extension may not work well,
fallback to non subgroup ocl acceleration.
Signed-off-by: Li Peng <peng.li@intel.com>
* fallback to cpu path for fc layer with multi output
Signed-off-by: Li Peng <peng.li@intel.com>
* update output message
Signed-off-by: Li Peng <peng.li@intel.com>
* update fully connected layer
fallback to gemm API if libdnn return false
Signed-off-by: Li Peng <peng.li@intel.com>
* Add ReLU OCL implementation
* disable layer fusion for now
Signed-off-by: Li Peng <peng.li@intel.com>
* Add OCL implementation for concat layer
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
* libdnn: update license and copyrights
Also refine libdnn coding style
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
* DNN: Don't link OpenCL library explicitly
* DNN: Make default preferableTarget to DNN_TARGET_CPU
User should set it to DNN_TARGET_OPENCL explicitly if want to
use OpenCL acceleration.
Also don't fusion when using DNN_TARGET_OPENCL
* DNN: refine coding style
* Add getOpenCLErrorString
* DNN: Use int32_t/uint32_t instread of alias
* Use namespace ocl4dnn to include libdnn things
* remove extra copyTo in softmax ocl path
Signed-off-by: Li Peng <peng.li@intel.com>
* update ReLU layer ocl path
Signed-off-by: Li Peng <peng.li@intel.com>
* Add prefer target property for layer class
It is used to indicate the target for layer forwarding,
either the default CPU target or OCL target.
Signed-off-by: Li Peng <peng.li@intel.com>
* Add cl_event based timer for cv::ocl
* Rename libdnn to ocl4dnn
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>
* use UMat for ocl4dnn internal buffer
Remove allocateMemory which use clCreateBuffer directly
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>
* enable buffer gemm in ocl4dnn innerproduct
Signed-off-by: Li Peng <peng.li@intel.com>
* replace int_tp globally for ocl4dnn kernels.
Signed-off-by: wzw <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
* create UMat for layer params
Signed-off-by: Li Peng <peng.li@intel.com>
* update sign ocl kernel
Signed-off-by: Li Peng <peng.li@intel.com>
* update image based gemm of inner product layer
Signed-off-by: Li Peng <peng.li@intel.com>
* remove buffer gemm of inner product layer
call cv::gemm API instead
Signed-off-by: Li Peng <peng.li@intel.com>
* change ocl4dnn forward parameter to UMat
Signed-off-by: Li Peng <peng.li@intel.com>
* Refine auto-tuning mechanism.
- Use OPENCV_OCL4DNN_KERNEL_CONFIG_PATH to set cache directory
for fine-tuned kernel configuration.
e.g. export OPENCV_OCL4DNN_KERNEL_CONFIG_PATH=/home/tmp,
the cache directory will be /home/tmp/spatialkernels/ on Linux.
- Define environment OPENCV_OCL4DNN_ENABLE_AUTO_TUNING to enable
auto-tuning.
- OPENCV_OPENCL_ENABLE_PROFILING is only used to enable profiling
for OpenCL command queue. This fix basic kernel get wrong running
time, i.e. 0ms.
- If creating cache directory failed, disable auto-tuning.
* Detect and create cache dir on windows
Signed-off-by: Li Peng <peng.li@intel.com>
* Refine gemm like convolution kernel.
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix redundant swizzleWeights calling when use cached kernel config.
* Fix "out of resource" bug when auto-tuning too many kernels.
* replace cl_mem with UMat in ocl4dnnConvSpatial class
* OCL4DNN: reduce the tuning kernel candidate.
This patch could reduce 75% of the tuning candidates with less
than 2% performance impact for the final result.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* replace cl_mem with umat in ocl4dnn convolution
Signed-off-by: Li Peng <peng.li@intel.com>
* remove weight_image_ of ocl4dnn inner product
Actually it is unused in the computation
Signed-off-by: Li Peng <peng.li@intel.com>
* Various fixes for ocl4dnn
1. OCL_PERFORMANCE_CHECK(ocl::Device::getDefault().isIntel())
2. Ptr<OCL4DNNInnerProduct<float> > innerProductOp
3. Code comments cleanup
4. ignore check on OCL cpu device
Signed-off-by: Li Peng <peng.li@intel.com>
* add build option for log softmax
Signed-off-by: Li Peng <peng.li@intel.com>
* remove unused ocl kernels in ocl4dnn
Signed-off-by: Li Peng <peng.li@intel.com>
* replace ocl4dnnSet with opencv setTo
Signed-off-by: Li Peng <peng.li@intel.com>
* replace ALIGN with cv::alignSize
Signed-off-by: Li Peng <peng.li@intel.com>
* check kernel build options
Signed-off-by: Li Peng <peng.li@intel.com>
* Handle program compilation fail properly.
* Use std::numeric_limits<float>::infinity() for large float number
* check ocl4dnn kernel compilation result
Signed-off-by: Li Peng <peng.li@intel.com>
* remove unused ctx_id
Signed-off-by: Li Peng <peng.li@intel.com>
* change clEnqueueNDRangeKernel to kernel.run()
Signed-off-by: Li Peng <peng.li@intel.com>
* change cl_mem to UMat in image based gemm
Signed-off-by: Li Peng <peng.li@intel.com>
* check intel subgroup support for lrn and pooling layer
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix convolution bug if group is greater than 1
Signed-off-by: Li Peng <peng.li@intel.com>
* Set default layer preferableTarget to be DNN_TARGET_CPU
Signed-off-by: Li Peng <peng.li@intel.com>
* Add ocl perf test for convolution
Signed-off-by: Li Peng <peng.li@intel.com>
* Add more ocl accuracy test
Signed-off-by: Li Peng <peng.li@intel.com>
* replace cl_image with ocl::Image2D
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix build failure in elementwise layer
Signed-off-by: Li Peng <peng.li@intel.com>
* use getUMat() to get blob data
Signed-off-by: Li Peng <peng.li@intel.com>
* replace cl_mem handle with ocl::KernelArg
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(build): don't use C++11, OPENCL_LIBRARIES fix
* dnn(ocl4dnn): remove unused OpenCL kernels
* dnn(ocl4dnn): extract OpenCL code into .cl files
* dnn(ocl4dnn): refine auto-tuning
Defaultly disable auto-tuning, set OPENCV_OCL4DNN_ENABLE_AUTO_TUNING
environment variable to enable it.
Use a set of pre-tuned configs as default config if auto-tuning is disabled.
These configs are tuned for Intel GPU with 48/72 EUs, and for googlenet,
AlexNet, ResNet-50
If default config is not suitable, use the first available kernel config
from the candidates. Candidate priority from high to low is gemm like kernel,
IDLF kernel, basick kernel.
* dnn(ocl4dnn): pooling doesn't use OpenCL subgroups
* dnn(ocl4dnn): fix perf test
OpenCV has default 3sec time limit for each performance test.
Warmup OpenCL backend outside of perf measurement loop.
* use ocl::KernelArg as much as possible
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): fix bias bug for gemm like kernel
* dnn(ocl4dnn): wrap cl_mem into UMat
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): Refine signature of kernel config
- Use more readable string as signture of kernel config
- Don't count device name and vendor in signature string
- Default kernel configurations are tuned for Intel GPU with
24/48/72 EUs, and for googlenet, AlexNet, ResNet-50 net model.
* dnn(ocl4dnn): swap width/height in configuration
* dnn(ocl4dnn): enable configs for Intel OpenCL runtime only
* core: make configuration helper functions accessible from non-core modules
* dnn(ocl4dnn): update kernel auto-tuning behavior
Avoid unwanted creation of directories
* dnn(ocl4dnn): simplify kernel to workaround OpenCL compiler crash
* dnn(ocl4dnn): remove redundant code
* dnn(ocl4dnn): Add more clear message for simd size dismatch.
* dnn(ocl4dnn): add const to const argument
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): force compiler use a specific SIMD size for IDLF kernel
* dnn(ocl4dnn): drop unused tuneLocalSize()
* dnn(ocl4dnn): specify OpenCL queue for Timer and convolve() method
* dnn(ocl4dnn): sanitize file names used for cache
* dnn(perf): enable Network tests with OpenCL
* dnn(ocl4dnn/conv): drop computeGlobalSize()
* dnn(ocl4dnn/conv): drop unused fields
* dnn(ocl4dnn/conv): simplify ctor
* dnn(ocl4dnn/conv): refactor kernelConfig localSize=NULL
* dnn(ocl4dnn/conv): drop unsupported double / untested half types
* dnn(ocl4dnn/conv): drop unused variable
* dnn(ocl4dnn/conv): alignSize/divUp
* dnn(ocl4dnn/conv): use enum values
* dnn(ocl4dnn): drop unused innerproduct variable
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): add a generic function to check cl option support
* dnn(ocl4dnn): run softmax subgroup version kernel first
Signed-off-by: Li Peng <peng.li@intel.com>
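The auto-tuning notes above describe a fixed selection order for convolution kernel configurations when tuning is off. The sketch below illustrates that order under stated assumptions: `KernelType`, `KernelConfig`, `autoTuningEnabled()` and `selectWithoutTuning()` are hypothetical names, not the ocl4dnn API, and treating the environment variable as "enabled whenever it is defined" is also an assumption. Only the variable name and the gemm-like, IDLF, basic priority come from the log.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <optional>
#include <vector>

// Hypothetical kernel families, ordered from highest to lowest fallback
// priority as the commit message describes: gemm-like, IDLF, basic.
enum class KernelType { GemmLike = 0, IDLF = 1, Basic = 2 };

// Hypothetical description of one candidate kernel configuration.
struct KernelConfig {
    KernelType type;
    bool usable;  // e.g. it built successfully and fits the device limits
};

// Assumption: auto-tuning is opt-in and runs only if the variable is defined.
bool autoTuningEnabled() {
    return std::getenv("OPENCV_OCL4DNN_ENABLE_AUTO_TUNING") != nullptr;
}

// Selection used when auto-tuning is off: prefer the pre-tuned default,
// otherwise take the first usable candidate in family priority order.
std::optional<KernelConfig> selectWithoutTuning(
        const std::optional<KernelConfig>& preTuned,
        std::vector<KernelConfig> candidates) {
    if (preTuned && preTuned->usable)
        return preTuned;
    // Sort by family priority; stable_sort keeps the original order
    // among candidates of the same family.
    std::stable_sort(candidates.begin(), candidates.end(),
                     [](const KernelConfig& a, const KernelConfig& b) {
                         return static_cast<int>(a.type) < static_cast<int>(b.type);
                     });
    for (const KernelConfig& c : candidates)
        if (c.usable)
            return c;
    return std::nullopt;  // no usable candidate at all
}
```

With candidates `{basic, IDLF, unusable gemm-like}` and no pre-tuned default, the sketch returns the IDLF config: gemm-like ranks higher but is not usable, and IDLF outranks basic.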
2017-10-02 20:38:00 +08:00
};
2017-10-18 20:59:16 +08:00
Timer::Timer(const Queue& q) : p(new Impl(q)) {}
Timer::~Timer() { delete p; }
2017-10-02 20:38:00 +08:00
void Timer::start()
{
2017-10-18 20:59:16 +08:00
    CV_Assert(p);
    p->start();
2017-10-02 20:38:00 +08:00
}
void Timer::stop()
{
2017-10-18 20:59:16 +08:00
    CV_Assert(p);
    p->stop();
2017-10-02 20:38:00 +08:00
}
2017-10-18 20:59:16 +08:00
uint64 Timer::durationNS() const
{
    CV_Assert(p);
    return p->durationNS();
}
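The `Timer` methods above follow a pointer-to-implementation pattern: each public call asserts that the private `Impl` pointer is set and forwards to it. The sketch below shows that shape in isolation. It is a simplified stand-in, not OpenCV's implementation: the log describes the real `Impl` as `cl_event` based and bound to an OpenCL `Queue`, whereas this version assumes host wall-clock timing with `std::chrono::steady_clock` and uses `assert` in place of `CV_Assert`.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Minimal pimpl-style timer mirroring the shape of the Timer methods above.
// Assumption: host-side std::chrono timing replaces OpenCL event profiling.
class Timer {
public:
    Timer() : p(new Impl()) {}
    ~Timer() { delete p; }
    // The raw owning pointer must not be shared between copies.
    Timer(const Timer&) = delete;
    Timer& operator=(const Timer&) = delete;

    void start() { assert(p); p->start(); }
    void stop() { assert(p); p->stop(); }
    std::uint64_t durationNS() const { assert(p); return p->durationNS(); }

private:
    // Implementation details hidden behind the pointer.
    struct Impl {
        std::chrono::steady_clock::time_point t0, t1;  // both start at the epoch
        void start() { t0 = std::chrono::steady_clock::now(); }
        void stop() { t1 = std::chrono::steady_clock::now(); }
        std::uint64_t durationNS() const {
            return static_cast<std::uint64_t>(
                std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count());
        }
    };
    Impl* p;
};
```

Following the perf-test note in the log, one-time warm-up work (for an OpenCL backend, e.g. first-use kernel compilation) belongs before `start()`, so only steady-state execution falls between `start()` and `stop()`.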
Merge pull request #9114 from pengli:dnn_rebase
add libdnn acceleration to dnn module (#9114)
* import libdnn code
Signed-off-by: Li Peng <peng.li@intel.com>
* add convolution layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add pooling layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add softmax layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add lrn layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add innerproduct layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* add HAVE_OPENCL macro
Signed-off-by: Li Peng <peng.li@intel.com>
* fix for convolution ocl
Signed-off-by: Li Peng <peng.li@intel.com>
* enable getUMat() for multi-dimension Mat
Signed-off-by: Li Peng <peng.li@intel.com>
* use getUMat for ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>
* use CV_OCL_RUN macro
Signed-off-by: Li Peng <peng.li@intel.com>
* set OPENCL target when it is available
and disable fuseLayer for OCL target for the time being
Signed-off-by: Li Peng <peng.li@intel.com>
* fix innerproduct accuracy test
Signed-off-by: Li Peng <peng.li@intel.com>
* remove trailing space
Signed-off-by: Li Peng <peng.li@intel.com>
* Fixed TensorFlow demo bug.
The root cause is that TensorFlow uses a different algorithm from libdnn
to calculate the convolution output dimension.
libdnn no longer calculates the output dimension and just uses the one
passed in by the config.
* split gemm ocl file
split it into gemm_buffer.cl and gemm_image.cl
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix compile failure
Signed-off-by: Li Peng <peng.li@intel.com>
* check env flag for auto tuning
Signed-off-by: Li Peng <peng.li@intel.com>
* switch to new ocl kernels for softmax layer
Signed-off-by: Li Peng <peng.li@intel.com>
* update softmax layer
On some platforms the subgroup extension may not work well;
fall back to non-subgroup OCL acceleration.
Signed-off-by: Li Peng <peng.li@intel.com>
* fall back to the CPU path for FC layers with multiple outputs
Signed-off-by: Li Peng <peng.li@intel.com>
* update output message
Signed-off-by: Li Peng <peng.li@intel.com>
* update fully connected layer
fall back to the gemm API if libdnn returns false
Signed-off-by: Li Peng <peng.li@intel.com>
* Add ReLU OCL implementation
* disable layer fusion for now
Signed-off-by: Li Peng <peng.li@intel.com>
* Add OCL implementation for concat layer
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
* libdnn: update license and copyrights
Also refine libdnn coding style
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
* DNN: Don't link OpenCL library explicitly
* DNN: Make DNN_TARGET_CPU the default preferableTarget
Users should set it to DNN_TARGET_OPENCL explicitly if they want to
use OpenCL acceleration.
Also don't fuse layers when using DNN_TARGET_OPENCL
* DNN: refine coding style
* Add getOpenCLErrorString
* DNN: Use int32_t/uint32_t instead of alias
* Use namespace ocl4dnn to include libdnn things
* remove extra copyTo in softmax ocl path
Signed-off-by: Li Peng <peng.li@intel.com>
* update ReLU layer ocl path
Signed-off-by: Li Peng <peng.li@intel.com>
* Add prefer target property for layer class
It is used to indicate the target for layer forwarding,
either the default CPU target or OCL target.
Signed-off-by: Li Peng <peng.li@intel.com>
* Add cl_event based timer for cv::ocl
* Rename libdnn to ocl4dnn
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>
* use UMat for ocl4dnn internal buffer
Remove allocateMemory, which uses clCreateBuffer directly
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>
* enable buffer gemm in ocl4dnn innerproduct
Signed-off-by: Li Peng <peng.li@intel.com>
* replace int_tp globally for ocl4dnn kernels.
Signed-off-by: wzw <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>
* create UMat for layer params
Signed-off-by: Li Peng <peng.li@intel.com>
* update sign ocl kernel
Signed-off-by: Li Peng <peng.li@intel.com>
* update image based gemm of inner product layer
Signed-off-by: Li Peng <peng.li@intel.com>
* remove buffer gemm of inner product layer
call cv::gemm API instead
Signed-off-by: Li Peng <peng.li@intel.com>
* change ocl4dnn forward parameter to UMat
Signed-off-by: Li Peng <peng.li@intel.com>
* Refine auto-tuning mechanism.
- Use OPENCV_OCL4DNN_KERNEL_CONFIG_PATH to set cache directory
for fine-tuned kernel configuration.
e.g. export OPENCV_OCL4DNN_KERNEL_CONFIG_PATH=/home/tmp,
the cache directory will be /home/tmp/spatialkernels/ on Linux.
- Define the environment variable OPENCV_OCL4DNN_ENABLE_AUTO_TUNING to enable
auto-tuning.
- OPENCV_OPENCL_ENABLE_PROFILING is only used to enable profiling
for the OpenCL command queue. This fixes the basic kernel reporting a wrong
running time, i.e. 0 ms.
- If creating the cache directory fails, disable auto-tuning.
* Detect and create cache dir on windows
Signed-off-by: Li Peng <peng.li@intel.com>
* Refine gemm like convolution kernel.
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix redundant swizzleWeights calls when using a cached kernel config.
* Fix "out of resource" bug when auto-tuning too many kernels.
* replace cl_mem with UMat in ocl4dnnConvSpatial class
* OCL4DNN: reduce the tuning kernel candidate.
This patch can reduce the number of tuning candidates by 75% with less
than 2% performance impact on the final result.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* replace cl_mem with umat in ocl4dnn convolution
Signed-off-by: Li Peng <peng.li@intel.com>
* remove weight_image_ of ocl4dnn inner product
It is actually unused in the computation
Signed-off-by: Li Peng <peng.li@intel.com>
* Various fixes for ocl4dnn
1. OCL_PERFORMANCE_CHECK(ocl::Device::getDefault().isIntel())
2. Ptr<OCL4DNNInnerProduct<float> > innerProductOp
3. Code comments cleanup
4. ignore check on OCL cpu device
Signed-off-by: Li Peng <peng.li@intel.com>
* add build option for log softmax
Signed-off-by: Li Peng <peng.li@intel.com>
* remove unused ocl kernels in ocl4dnn
Signed-off-by: Li Peng <peng.li@intel.com>
* replace ocl4dnnSet with opencv setTo
Signed-off-by: Li Peng <peng.li@intel.com>
* replace ALIGN with cv::alignSize
Signed-off-by: Li Peng <peng.li@intel.com>
* check kernel build options
Signed-off-by: Li Peng <peng.li@intel.com>
* Handle program compilation fail properly.
* Use std::numeric_limits<float>::infinity() for large float values
* check ocl4dnn kernel compilation result
Signed-off-by: Li Peng <peng.li@intel.com>
* remove unused ctx_id
Signed-off-by: Li Peng <peng.li@intel.com>
* change clEnqueueNDRangeKernel to kernel.run()
Signed-off-by: Li Peng <peng.li@intel.com>
* change cl_mem to UMat in image based gemm
Signed-off-by: Li Peng <peng.li@intel.com>
* check intel subgroup support for lrn and pooling layer
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix convolution bug if group is greater than 1
Signed-off-by: Li Peng <peng.li@intel.com>
* Set default layer preferableTarget to be DNN_TARGET_CPU
Signed-off-by: Li Peng <peng.li@intel.com>
* Add ocl perf test for convolution
Signed-off-by: Li Peng <peng.li@intel.com>
* Add more ocl accuracy test
Signed-off-by: Li Peng <peng.li@intel.com>
* replace cl_image with ocl::Image2D
Signed-off-by: Li Peng <peng.li@intel.com>
* Fix build failure in elementwise layer
Signed-off-by: Li Peng <peng.li@intel.com>
* use getUMat() to get blob data
Signed-off-by: Li Peng <peng.li@intel.com>
* replace cl_mem handle with ocl::KernelArg
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(build): don't use C++11, OPENCL_LIBRARIES fix
* dnn(ocl4dnn): remove unused OpenCL kernels
* dnn(ocl4dnn): extract OpenCL code into .cl files
* dnn(ocl4dnn): refine auto-tuning
Disable auto-tuning by default; set the OPENCV_OCL4DNN_ENABLE_AUTO_TUNING
environment variable to enable it.
Use a set of pre-tuned configs as the default config if auto-tuning is disabled.
These configs are tuned for Intel GPUs with 48/72 EUs, and for the GoogLeNet,
AlexNet and ResNet-50 models.
If the default config is not suitable, use the first available kernel config
from the candidates. Candidate priority from high to low is gemm-like kernel,
IDLF kernel, basic kernel.
* dnn(ocl4dnn): pooling doesn't use OpenCL subgroups
* dnn(ocl4dnn): fix perf test
OpenCV has a default 3 s time limit for each performance test.
Warm up the OpenCL backend outside of the perf measurement loop.
* use ocl::KernelArg as much as possible
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): fix bias bug for gemm like kernel
* dnn(ocl4dnn): wrap cl_mem into UMat
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): Refine signature of kernel config
- Use a more readable string as the signature of the kernel config
- Don't include the device name and vendor in the signature string
- Default kernel configurations are tuned for Intel GPUs with
24/48/72 EUs, and for the GoogLeNet, AlexNet and ResNet-50 models.
* dnn(ocl4dnn): swap width/height in configuration
* dnn(ocl4dnn): enable configs for Intel OpenCL runtime only
* core: make configuration helper functions accessible from non-core modules
* dnn(ocl4dnn): update kernel auto-tuning behavior
Avoid unwanted creation of directories
* dnn(ocl4dnn): simplify kernel to work around an OpenCL compiler crash
* dnn(ocl4dnn): remove redundant code
* dnn(ocl4dnn): Add a clearer message for SIMD size mismatch.
* dnn(ocl4dnn): add const to const argument
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): force the compiler to use a specific SIMD size for IDLF kernel
* dnn(ocl4dnn): drop unused tuneLocalSize()
* dnn(ocl4dnn): specify OpenCL queue for Timer and convolve() method
* dnn(ocl4dnn): sanitize file names used for cache
* dnn(perf): enable Network tests with OpenCL
* dnn(ocl4dnn/conv): drop computeGlobalSize()
* dnn(ocl4dnn/conv): drop unused fields
* dnn(ocl4dnn/conv): simplify ctor
* dnn(ocl4dnn/conv): refactor kernelConfig localSize=NULL
* dnn(ocl4dnn/conv): drop unsupported double / untested half types
* dnn(ocl4dnn/conv): drop unused variable
* dnn(ocl4dnn/conv): alignSize/divUp
* dnn(ocl4dnn/conv): use enum values
* dnn(ocl4dnn): drop unused innerproduct variable
Signed-off-by: Li Peng <peng.li@intel.com>
* dnn(ocl4dnn): add a generic function to check cl option support
* dnn(ocl4dnn): run softmax subgroup version kernel first
Signed-off-by: Li Peng <peng.li@intel.com>
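The TensorFlow demo fix exists because frameworks compute convolution output sizes differently. With explicit symmetric padding (the Caffe-style formula, dilation 1) the output is floor((in + 2·pad − k) / stride) + 1. TensorFlow "SAME" gives ceil(in / stride), and "VALID" gives ceil((in − k + 1) / stride). The helper names below are hypothetical. Note also that how the importer mapped TensorFlow padding onto a single `pad` value is an assumption here. The sketch only shows one way the two formulas can disagree.

```cpp
#include <cassert>

// Explicit symmetric padding, dilation 1 (Caffe/libdnn-style formula).
int outSizeExplicitPad(int in, int k, int stride, int pad) {
    return (in + 2 * pad - k) / stride + 1;
}

// TensorFlow "SAME" padding: ceil(in / stride).
int outSizeTfSame(int in, int stride) {
    return (in + stride - 1) / stride;
}

// TensorFlow "VALID" padding: ceil((in - k + 1) / stride).
int outSizeTfValid(int in, int k, int stride) {
    return (in - k + stride) / stride;
}
```

In the common case the formulas agree: for in = 224, k = 7, stride = 2, pad = 3, both give 112. They can diverge when "SAME" needs odd total padding. For in = 5, k = 2, stride = 2, TensorFlow expects 3 outputs (it pads 0 before and 1 after). A symmetric pad of 0 yields only 2. Passing the output size in through the config avoids recomputing it with the wrong rule.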
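Several notes describe the same layered fallback. The softmax layer tries the subgroup kernel first and drops to a non-subgroup OpenCL kernel. The fully connected layer falls back to the gemm API when libdnn returns false. Some cases go back to the CPU path. Below is a generic sketch of that control flow under the assumption that each path reports failure by returning `false`. `Path` and `runWithFallback` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Each path returns false if it cannot run (e.g. unsupported extension).
typedef std::function<bool()> Path;

// Try paths in priority order and stop at the first that succeeds.
// Returns the index of the path that ran, or -1 if every path declined.
int runWithFallback(const std::vector<Path>& paths) {
    for (std::size_t i = 0; i < paths.size(); ++i)
        if (paths[i]())
            return static_cast<int>(i);
    return -1;
}
```

Typical usage: push the subgroup kernel, the plain OpenCL kernel, and the CPU implementation in that order. If the first declines, the second runs and the CPU path is never reached.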
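The auto-tuning notes define two environment variables. Defining OPENCV_OCL4DNN_ENABLE_AUTO_TUNING turns tuning on. OPENCV_OCL4DNN_KERNEL_CONFIG_PATH sets a base path whose `spatialkernels/` subdirectory holds the cache. If the directory cannot be created, tuning is disabled, and directories are not created unnecessarily. The following is a minimal sketch of that decision logic. `TuningConfig`, `makeTuningConfig` and the `ensureDir` callback are hypothetical. The variable values are passed in, so the caller would supply `std::getenv(...)` results. Treating an unset path as "nothing persisted" is an assumption.

```cpp
#include <cassert>
#include <string>

struct TuningConfig {
    bool autoTuning = false;   // benchmark kernel candidates at runtime?
    std::string cacheDir;      // empty: nothing is persisted (assumption)
};

// Hypothetical callback: creates the directory if needed, false on failure.
typedef bool (*EnsureDirFn)(const std::string&);

TuningConfig makeTuningConfig(const char* enableAutoTuning,
                              const char* kernelConfigPath,
                              EnsureDirFn ensureDir) {
    TuningConfig cfg;
    // Merely defining the variable enables tuning, whatever its value.
    cfg.autoTuning = (enableAutoTuning != nullptr);
    if (kernelConfigPath && *kernelConfigPath) {
        std::string dir(kernelConfigPath);
        char last = dir[dir.size() - 1];
        if (last != '/' && last != '\\')
            dir += '/';
        cfg.cacheDir = dir + "spatialkernels/";
    }
    // Touch the file system only when tuning was actually requested.
    if (cfg.autoTuning && !cfg.cacheDir.empty() && !ensureDir(cfg.cacheDir))
        cfg.autoTuning = false;   // cache dir unavailable: disable tuning
    return cfg;
}
```

With `OPENCV_OCL4DNN_KERNEL_CONFIG_PATH=/home/tmp`, the cache directory resolves to `/home/tmp/spatialkernels/`, matching the example in the notes.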
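When auto-tuning is off and no pre-tuned config fits, the notes say the first available candidate is used. Priority runs gemm-like, then IDLF, then basic. A minimal sketch of that selection is below, assuming each candidate carries its kernel family and a "usable" flag. `KernelType`, `Candidate` and `pickFallback` are hypothetical, not ocl4dnn's types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Kernel families, lower value = higher priority (order from the notes).
enum KernelType { KERNEL_GEMM_LIKE = 0, KERNEL_IDLF = 1, KERNEL_BASIC = 2 };

struct Candidate {
    KernelType type;
    std::string name;   // hypothetical label for the config
    bool usable;        // e.g. compiled successfully on this device
};

static bool higherPriority(const Candidate& a, const Candidate& b) {
    return a.type < b.type;
}

// Return the name of the highest-priority usable candidate, or "" if none.
std::string pickFallback(std::vector<Candidate> candidates) {
    // stable_sort preserves the original order within one kernel family.
    std::stable_sort(candidates.begin(), candidates.end(), higherPriority);
    for (std::size_t i = 0; i < candidates.size(); ++i)
        if (candidates[i].usable)
            return candidates[i].name;
    return "";
}
```

A stable sort is used so that, within one family, candidates keep whatever order their generator produced. Only the family decides precedence.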
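The perf-test fix moves OpenCL warm-up out of the measured loop. The first OpenCL run typically pays one-off costs such as program compilation and buffer allocation, which would otherwise count against the 3 s limit and skew the timing. The sketch below uses plain `std::chrono` instead of OpenCV's perf-test framework. `meanRunMs` is a hypothetical helper.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Run fn once untimed, then return the mean time in ms over iters runs.
double meanRunMs(const std::function<void()>& fn, int iters) {
    assert(iters > 0);
    fn();  // warm-up: one-off setup cost is excluded from the measurement
    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i)
        fn();
    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count() / iters;
}
```

The function is called `iters + 1` times in total. Only the last `iters` calls contribute to the reported mean.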
2017-10-02 20:38:00 +08:00
2017-10-18 20:59:16 +08:00
}} // namespace
2020-08-31 17:30:06 +08:00
#endif // HAVE_OPENCL