Merge pull request #1750 from alalek:ocl_update_documentation

This commit is contained in:
Andrey Pavlenko 2013-11-06 13:32:00 +04:00 committed by OpenCV Buildbot
commit 43c9157220
6 changed files with 221 additions and 219 deletions

View File

@ -86,8 +86,6 @@ Enables the stereo correspondence operator that finds the disparity for the spec
:param disparity: Output disparity map. It is a ``CV_8UC1`` image with the same size as the input images.
:param stream: Stream for the asynchronous version.
ocl::StereoBM_OCL::checkIfGpuCallReasonable
-----------------------------------------------
@ -218,8 +216,6 @@ Enables the stereo correspondence operator that finds the disparity for the spec
:param disparity: Output disparity map. If ``disparity`` is empty, the output type is ``CV_16SC1`` . Otherwise, the type is retained.
:param stream: Stream for the asynchronous version.
ocl::StereoConstantSpaceBP
------------------------------
.. ocv:class:: ocl::StereoConstantSpaceBP
@ -330,5 +326,3 @@ Enables the stereo correspondence operator that finds the disparity for the spec
:param right: Right image with the same size and the same type as the left one.
:param disparity: Output disparity map. If ``disparity`` is empty, the output type is ``CV_16SC1`` . Otherwise, the output type is ``disparity.type()`` .
:param stream: Stream for the asynchronous version.

View File

@ -37,7 +37,7 @@ OpenCV C++ 1-D or 2-D dense array class ::
oclMat &operator = (const oclMat &m);
//! assignment operator. Perfom blocking upload to device.
oclMat &operator = (const Mat &m);
oclMat &operator = (const oclMatExpr& expr);
//! pefroms blocking upload data to oclMat.
void upload(const cv::Mat &m);
@ -47,6 +47,11 @@ OpenCV C++ 1-D or 2-D dense array class ::
operator Mat() const;
void download(cv::Mat &m) const;
//! convert to _InputArray
operator _InputArray();
//! convert to _OutputArray
operator _OutputArray();
//! returns a new oclMatrix header for the specified row
oclMat row(int y) const;
@ -61,24 +66,20 @@ OpenCV C++ 1-D or 2-D dense array class ::
//! returns deep copy of the oclMatrix, i.e. the data is copied
oclMat clone() const;
//! copies the oclMatrix content to "m".
//! copies those oclMatrix elements to "m" that are marked with non-zero mask elements.
// It calls m.create(this->size(), this->type()).
// It supports any data type
void copyTo( oclMat &m ) const;
//! copies those oclMatrix elements to "m" that are marked with non-zero mask elements.
//It supports 8UC1 8UC4 32SC1 32SC4 32FC1 32FC4
void copyTo( oclMat &m, const oclMat &mask ) const;
void copyTo( oclMat &m, const oclMat &mask = oclMat()) const;
//! converts oclMatrix to another datatype with optional scalng. See cvConvertScale.
//It supports 8UC1 8UC4 32SC1 32SC4 32FC1 32FC4
void convertTo( oclMat &m, int rtype, double alpha = 1, double beta = 0 ) const;
void assignTo( oclMat &m, int type = -1 ) const;
//! sets every oclMatrix element to s
//It supports 8UC1 8UC4 32SC1 32SC4 32FC1 32FC4
oclMat& operator = (const Scalar &s);
//! sets some of the oclMatrix elements to s, according to the mask
//It supports 8UC1 8UC4 32SC1 32SC4 32FC1 32FC4
oclMat& setTo(const Scalar &s, const oclMat &mask = oclMat());
//! creates alternative oclMatrix header for the same data, with different
// number of channels and/or different number of rows. see cvReshape.
@ -88,6 +89,11 @@ OpenCV C++ 1-D or 2-D dense array class ::
// previous data is unreferenced if needed.
void create(int rows, int cols, int type);
void create(Size size, int type);
//! allocates new oclMatrix with specified device memory type.
void createEx(int rows, int cols, int type, DevMemRW rw_type, DevMemType mem_type);
void createEx(Size size, int type, DevMemRW rw_type, DevMemType mem_type);
//! decreases reference counter;
// deallocate the data when reference counter reaches 0.
void release();
@ -104,6 +110,11 @@ OpenCV C++ 1-D or 2-D dense array class ::
oclMat operator()( Range rowRange, Range colRange ) const;
oclMat operator()( const Rect &roi ) const;
oclMat& operator+=( const oclMat& m );
oclMat& operator-=( const oclMat& m );
oclMat& operator*=( const oclMat& m );
oclMat& operator/=( const oclMat& m );
//! returns true if the oclMatrix data is continuous
// (i.e. when there are no gaps between successive rows).
// similar to CV_IS_oclMat_CONT(cvoclMat->type)
@ -176,14 +187,11 @@ OpenCV C++ 1-D or 2-D dense array class ::
int wholecols;
};
Basically speaking, the oclMat is the mirror of Mat with the extension of ocl feature, the members have the same meaning and useage of Mat except following:
Basically speaking, the ``oclMat`` is the mirror of ``Mat`` with the extension of OCL feature, the members have the same meaning and useage of ``Mat`` except following:
datastart and dataend are replaced with wholerows and wholecols
* ``datastart`` and ``dataend`` are replaced with ``wholerows`` and ``wholecols``
add clCxt for oclMat
* Only basic flags are supported in ``oclMat`` (i.e. depth number of channels)
Only basic flags are supported in oclMat(i.e. depth number of channels)
All the 3-channel matrix(i.e. RGB image) are represented by 4-channel matrix in oclMat. It means 3-channel image have 4-channel space with the last channel unused. We provide a transparent interface to handle the difference between OpenCV Mat and oclMat.
For example: If a oclMat has 3 channels, channels() returns 3 and oclchannels() returns 4
* All the 3-channel matrix (i.e. RGB image) are represented by 4-channel matrix in ``oclMat``. It means 3-channel image have 4-channel space with the last channel unused. We provide a transparent interface to handle the difference between OpenCV ``Mat`` and ``oclMat``.
For example: If a ``oclMat`` has 3 channels, ``channels()`` returns 3 and ``oclchannels()`` returns 4

View File

@ -146,7 +146,7 @@ Returns void
.. ocv:function:: void ocl::remap(const oclMat &src, oclMat &dst, oclMat &map1, oclMat &map2, int interpolation, int bordertype, const Scalar &value = Scalar())
:param src: Source image. Only CV_8UC1 and CV_32FC1 images are supported now.
:param src: Source image.
:param dst: Destination image containing cornerness values. It has the same size as src and CV_32FC1 type.
@ -156,11 +156,11 @@ Returns void
:param interpolation: The interpolation method
:param bordertype: Pixel extrapolation method. Only BORDER_CONSTANT are supported now.
:param bordertype: Pixel extrapolation method.
:param value: The border value if borderType==BORDER CONSTANT
The function remap transforms the source image using the specified map: dst (x ,y) = src (map1(x , y) , map2(x , y)) where values of pixels with non-integer coordinates are computed using one of available interpolation methods. map1 and map2 can be encoded as separate floating-point maps in map1 and map2 respectively, or interleaved floating-point maps of (x,y) in map1. Supports CV_8UC1, CV_8UC3, CV_8UC4, CV_32FC1 , CV_32FC3 and CV_32FC4 data types.
The function remap transforms the source image using the specified map: dst (x ,y) = src (map1(x , y) , map2(x , y)) where values of pixels with non-integer coordinates are computed using one of available interpolation methods. map1 and map2 can be encoded as separate floating-point maps in map1 and map2 respectively, or interleaved floating-point maps of (x,y) in map1.
ocl::resize
------------------
@ -250,7 +250,7 @@ Returns Threshold value
:param type: Thresholding type
The function applies fixed-level thresholding to a single-channel array. The function is typically used to get a bi-level (binary) image out of a grayscale image or for removing a noise, i.e. filtering out pixels with too small or too large values. There are several types of thresholding that the function supports that are determined by thresholdType. Supports only CV_32FC1 and CV_8UC1 data type.
The function applies fixed-level thresholding to a single-channel array. The function is typically used to get a bi-level (binary) image out of a grayscale image or for removing a noise, i.e. filtering out pixels with too small or too large values. There are several types of thresholding that the function supports that are determined by thresholdType.
ocl::buildWarpPlaneMaps
-----------------------

View File

@ -6,53 +6,68 @@ OpenCL Module Introduction
General Information
-------------------
The OpenCV OCL module contains a set of classes and functions that implement and accelerate select openCV functionality on OpenCL compatible devices. OpenCL is a Khronos standard, implemented by a variety of devices (CPUs, GPUs, FPGAs, ARM), abstracting the exact hardware details, while enabling vendors to provide native implementation for maximal acceleration on their hardware. The standard enjoys wide industry support, and the end user of the module will enjoy the data parallelism benefits that the specific platform/hardware may be capable of, in a platform/hardware independent manner.
The OpenCV OCL module contains a set of classes and functions that implement and accelerate OpenCV functionality on OpenCL compatible devices. OpenCL is a Khronos standard, implemented by a variety of devices (CPUs, GPUs, FPGAs, ARM), abstracting the exact hardware details, while enabling vendors to provide native implementation for maximal acceleration on their hardware. The standard enjoys wide industry support, and the end user of the module will enjoy the data parallelism benefits that the specific platform/hardware may be capable of, in a platform/hardware independent manner.
While in the future we hope to validate (and enable) the OCL module in all OpenCL capable devices, we currently develop and test on GPU devices only. This includes both discrete GPUs (NVidia, AMD), as well as integrated chips(AMD APU and intel HD devices). Performance of any particular algorithm will depend on the particular platform characteristics and capabilities. However, currently (as of 2.4.4), accuracy and mathematical correctness has been verified to be identical to that of the pure CPU implementation on all tested GPU devices and platforms (both windows and linux).
While in the future we hope to validate (and enable) the OCL module in all OpenCL capable devices, we currently develop and test on GPU devices only. This includes both discrete GPUs (NVidia, AMD), as well as integrated chips (AMD APU and Intel HD devices). Performance of any particular algorithm will depend on the particular platform characteristics and capabilities. However, currently, accuracy and mathematical correctness has been verified to be identical to that of the pure CPU implementation on all tested GPU devices and platforms (both Windows and Linux).
The OpenCV OCL module includes utility functions, low-level vision primitives, and high-level algorithms. The utility functions and low-level primitives provide a powerful infrastructure for developing fast vision algorithms taking advangtage of OCL whereas the high-level functionality (samples)includes some state-of-the-art algorithms (including LK Optical flow, and Face detection) ready to be used by the application developers. The module is also accompanied by an extensive performance and accuracy test suite.
The OpenCV OCL module includes utility functions, low-level vision primitives, and high-level algorithms. The utility functions and low-level primitives provide a powerful infrastructure for developing fast vision algorithms taking advantage of OCL, whereas the high-level functionality (samples) includes some state-of-the-art algorithms (including LK Optical flow, and Face detection) ready to be used by the application developers. The module is also accompanied by an extensive performance and accuracy test suite.
The OpenCV OCL module is designed for ease of use and does not require any knowledge of OpenCL. At a minimuml level, it can be viewed as a set of accelerators, that can take advantage of the high compute throughput that GPU/APU devices can provide. However, it can also be viewed as a starting point to really integratethe built-in functionality with your own custom OpenCL kernels, with or without modifying the source of OpenCV-OCL. Of course, knowledge of OpenCL will certainly help, however we hope that OpenCV-OCL module, and the kernels it contains in source code, can be very useful as a means of actually learning openCL. Such a knowledge would be necessary to further fine-tune any of the existing OpenCL kernels, or for extending the framework with new kernels. As of OpenCV 2.4.4, we introduce interoperability with OpenCL, enabling easy use of custom OpenCL kernels within the OpenCV framework.
The OpenCV OCL module is designed for ease of use and does not require any knowledge of OpenCL. At a minimum level, it can be viewed as a set of accelerators, that can take advantage of the high compute throughput that GPU/APU devices can provide. However, it can also be viewed as a starting point to really integrate the built-in functionality with your own custom OpenCL kernels, with or without modifying the source of OpenCV-OCL. Of course, knowledge of OpenCL will certainly help, however we hope that OpenCV-OCL module, and the kernels it contains in source code, can be very useful as a means of actually learning openCL. Such a knowledge would be necessary to further fine-tune any of the existing OpenCL kernels, or for extending the framework with new kernels. As of OpenCV 2.4.4, we introduce interoperability with OpenCL, enabling easy use of custom OpenCL kernels within the OpenCV framework.
To use the OCL module, you need to make sure that you have the OpenCL SDK provided with your device vendor. To correctly run the OCL module, you need to have the OpenCL runtime provide by the device vendor, typically the device driver.
To correctly run the OCL module, you need to have the OpenCL runtime provided by the device vendor, typically the device driver.
To enable OCL support, configure OpenCV using CMake with WITH\_OPENCL=ON. When the flag is set and if OpenCL SDK is installed, the full-featured OpenCV OCL module is built. Otherwise, the module may be not built. If you have AMD'S FFT and BLAS library, you can select it with WITH\_OPENCLAMDFFT=ON, WITH\_OPENCLAMDBLAS=ON.
To enable OCL support, configure OpenCV using CMake with ``WITH_OPENCL=ON``. When the flag is set and if OpenCL SDK is installed, the full-featured OpenCV OCL module is built. Otherwise, the module may be not built. If you have AMD'S FFT and BLAS library, you can select it with ``WITH_OPENCLAMDFFT=ON``, ``WITH_OPENCLAMDBLAS=ON``.
The ocl module can be found under the "modules" directory. In "modules/ocl/src" you can find the source code for the cpp class that wrap around the direct kernel invocation. The kernels themselves can be found in "modules/ocl/src/kernels." Samples can be found under "samples/ocl." Accuracy tests can be found in "modules/ocl/test," and performance tests under "module/ocl/perf."
The ocl module can be found under the "modules" directory. In "modules/ocl/src" you can find the source code for the cpp class that wrap around the direct kernel invocation. The kernels themselves can be found in "modules/ocl/src/opencl". Samples can be found under "samples/ocl". Accuracy tests can be found in "modules/ocl/test", and performance tests under "module/ocl/perf".
Right now, the user can select OpenCL device by specifying the environment variable ``OPENCV_OPENCL_DEVICE``. Variable format:
Right now, the user should define the cv::ocl::Info class in the application and call cv::ocl::getDevice before any cv::ocl::func. This operation initialize OpenCL runtime and set the first found device as computing device. If there are more than one device and you want to use undefault device, you can call cv::ocl::setDevice then.
.. code-block:: cpp
In the current version, all the thread share the same context and device so the multi-devices are not supported. We will add this feature soon. If a function support 4-channel operator, it should support 3-channel operator as well, because All the 3-channel matrix(i.e. RGB image) are represented by 4-channel matrix in oclMat. It means 3-channel image have 4-channel space with the last channel unused. We provide a transparent interface to handle the difference between OpenCV Mat and oclMat.
<Platform>:<CPU|GPU|ACCELERATOR|nothing=GPU/CPU>:<DeviceName or ID>
**Note:** Device ID range is: 0..9 (only one digit, 10 - it is a part of name)
Samples:
.. code-block:: cpp
'' = ':' = '::' = ':GPU|CPU:'
'AMD:GPU|CPU:'
'AMD::Tahiti'
':GPU:1'
':CPU:2'
Also the user can use ``cv::ocl::setDevice`` function (with ``cv::ocl::getOpenCLPlatforms`` and ``cv::ocl::getOpenCLDevices``). This function initializes OpenCL runtime and setup the passed device as computing device.
In the current version, all the thread share the same context and device so the multi-devices are not supported. We will add this feature soon. If a function support 4-channel operator, it should support 3-channel operator as well, because All the 3-channel matrix(i.e. RGB image) are represented by 4-channel matrix in ``oclMat``. It means 3-channel image have 4-channel space with the last channel unused. We provide a transparent interface to handle the difference between OpenCV Mat and ``oclMat``.
Developer Notes
-------------------
In a heterogeneous device environment, there may be cost associated with data transfer. This would be the case, for example, when data needs to be moved from host memory (accessible to the CPU), to device memory (accessible to a discrete GPU). in the case of integrated graphics chips, there may be performance issues, relating to memory coherency between access from the GPU "part" of the integrated device, or the CPU "part." For best performance, in either case, it is recommended that you do not introduce dat transfers between CPU and the discrete GPU, except in the beginning and the end of the algorithmic pipeline.
In a heterogeneous device environment, there may be cost associated with data transfer. This would be the case, for example, when data needs to be moved from host memory (accessible to the CPU), to device memory (accessible to a discrete GPU). in the case of integrated graphics chips, there may be performance issues, relating to memory coherency between access from the GPU "part" of the integrated device, or the CPU "part." For best performance, in either case, it is recommended that you do not introduce data transfers between CPU and the discrete GPU, except in the beginning and the end of the algorithmic pipeline.
Some tidbits:
1. OpenCL version should be larger than 1.1 with FULL PROFILE.
2. Currently (2.4.4) the user call the cv::ocl::getDevice before any other function in the ocl module. This will initialize the OpenCL runtime and set the first found device as computing device. If there are more than one device and you want to use undefault device, you can call cv::ocl::setDevice thereafter.
2. Currently there's only one OpenCL context and command queue. We hope to implement multi device and multi queue support in the future.
3. Many kernels use 256 as its workgroup size if possible, so the max work group size of the device must larger than 256. All GPU devices we are aware of indeed support 256 workitems in a workgroup, however non GPU devices may not. This will be improved in the future.
4. If the device does not support double arithetic, we revert to float.
4. If the device does not support double arithmetic, then functions' implementation generates an error.
5. The oclMat uses buffer object, not image object.
5. The ``oclMat`` uses buffer object, not image object.
6. All the 3-channel matrices(i.e. RGB image) are represented by 4-channel matrices in oclMat, with the last channel unused. We provide a transparent interface to handle the difference between OpenCV Mat and oclMat.
6. All the 3-channel matrices (i.e. RGB image) are represented by 4-channel matrices in ``oclMat``, with the last channel unused. We provide a transparent interface to handle the difference between OpenCV Mat and ``oclMat``.
7. All the matrix in oclMat is aligned in column(now the alignment factor is 32 byte). It means, if a matrix is n columns m rows with the element size x byte, we will assign ALIGNMENT(x*n) bytes for each column with the last ALIGNMENT(x*n) - x*n bytes unused, so there's small holes after each column if its size is not the multiply of ALIGN.
7. All the matrix in ``oclMat`` is aligned in column (now the alignment factor for ``step`` is 32+ byte). It means, m.cols * m.elemSize() <= m.step.
8. Data transfer between Mat and oclMat: If the CPU matrix is aligned in column, we will use faster API to transfer between Mat and oclMat, otherwise, we will use clEnqueueRead/WriteBufferRect to transfer data to guarantee the alignment. 3-channel matrix is an exception, it's directly transferred to a temp buffer and then padded to 4-channel matrix(also aligned) when uploading and do the reverse operation when downloading.
8. Data transfer between Mat and ``oclMat``: If the CPU matrix is aligned in column, we will use faster API to transfer between Mat and ``oclMat``, otherwise, we will use clEnqueueRead/WriteBufferRect to transfer data to guarantee the alignment. 3-channel matrix is an exception, it's directly transferred to a temp buffer and then padded to 4-channel matrix(also aligned) when uploading and do the reverse operation when downloading.
9. Data transfer between Mat and oclMat: ROI is a feature of OpenCV, which allow users process a sub rectangle of a matrix. When a CPU matrix which has ROI will be transfered to GPU, the whole matrix will be transfered and set ROI as CPU's. In a word, we always transfer the whole matrix despite whether it has ROI or not.
9. Data transfer between Mat and ``oclMat``: ROI is a feature of OpenCV, which allow users process a sub rectangle of a matrix. When a CPU matrix which has ROI will be transfered to GPU, the whole matrix will be transfered and set ROI as CPU's. In a word, we always transfer the whole matrix despite whether it has ROI or not.
10. All the kernel file should locate in ocl/src/kernels/ with the extension ".cl". ALL the kernel files are transformed to pure characters at compilation time in kernels.cpp, and the file name without extension is the name of the characters.
10. All the kernel file should locate in "modules/ocl/src/opencl/" with the extension ".cl". All the kernel files are transformed to pure characters at compilation time in opencl_kernels.cpp, and the file name without extension is the name of the program sources.

View File

@ -117,7 +117,6 @@ Computes a dense optical flow using the Gunnar Farneback's algorithm.
:param frame1: Second 8-bit gray-scale input image
:param flowx: Flow horizontal component
:param flowy: Flow vertical component
:param s: Stream
.. seealso:: :ocv:func:`calcOpticalFlowFarneback`
@ -230,8 +229,6 @@ Interpolates frames (images) using provided optical flow (displacement field).
:param buf: Temporary buffer, will have width x 6*height size, CV_32FC1 type and contain 6 oclMat: occlusion masks for first frame, occlusion masks for second, interpolated forward horizontal flow, interpolated forward vertical flow, interpolated backward horizontal flow, interpolated backward vertical flow.
:param stream: Stream for the asynchronous version.
ocl::KalmanFilter
--------------------
.. ocv:class:: ocl::KalmanFilter
@ -418,8 +415,6 @@ Updates the background model and returns the foreground mask.
:param fgmask: The output foreground mask as an 8-bit binary image.
:param stream: Stream for the asynchronous version.
ocl::MOG::getBackgroundImage
--------------------------------
@ -429,8 +424,6 @@ Computes a background image.
:param backgroundImage: The output background image.
:param stream: Stream for the asynchronous version.
ocl::MOG::release
---------------------
@ -443,7 +436,9 @@ ocl::MOG2
-------------
.. ocv:class:: ocl::MOG2 : public ocl::BackgroundSubtractor
Gaussian Mixture-based Background/Foreground Segmentation Algorithm. ::
Gaussian Mixture-based Background/Foreground Segmentation Algorithm.
The class discriminates between foreground and background pixels by building and maintaining a model of the background. Any pixel which does not fit this model is then deemed to be foreground. The class implements algorithm described in [MOG2004]_. ::
class CV_EXPORTS MOG2: public cv::ocl::BackgroundSubtractor
{
@ -485,10 +480,6 @@ Gaussian Mixture-based Background/Foreground Segmentation Algorithm. ::
/* hidden */
};
The class discriminates between foreground and background pixels by building and maintaining a model of the background. Any pixel which does not fit this model is then deemed to be foreground. The class implements algorithm described in [MOG2004]_.
Here are important members of the class that control the algorithm, which you can set after constructing the class instance:
.. ocv:member:: float backgroundRatio
Threshold defining whether the component is significant enough to be included into the background model. ``cf=0.1 => TB=0.9`` is default. For ``alpha=0.001``, it means that the mode should exist for approximately 105 frames before it is considered foreground.
@ -525,6 +516,7 @@ Gaussian Mixture-based Background/Foreground Segmentation Algorithm. ::
Parameter defining whether shadow detection should be enabled.
.. seealso:: :ocv:class:`BackgroundSubtractorMOG2`
@ -549,8 +541,6 @@ Updates the background model and returns the foreground mask.
:param fgmask: The output foreground mask as an 8-bit binary image.
:param stream: Stream for the asynchronous version.
ocl::MOG2::getBackgroundImage
---------------------------------
@ -560,8 +550,6 @@ Computes a background image.
:param backgroundImage: The output background image.
:param stream: Stream for the asynchronous version.
ocl::MOG2::release
----------------------

View File

@ -308,16 +308,13 @@ namespace cv
void copyTo( oclMat &m, const oclMat &mask = oclMat()) const;
//! converts oclMatrix to another datatype with optional scalng. See cvConvertScale.
//It supports 8UC1 8UC4 32SC1 32SC4 32FC1 32FC4
void convertTo( oclMat &m, int rtype, double alpha = 1, double beta = 0 ) const;
void assignTo( oclMat &m, int type = -1 ) const;
//! sets every oclMatrix element to s
//It supports 8UC1 8UC4 32SC1 32SC4 32FC1 32FC4
oclMat& operator = (const Scalar &s);
//! sets some of the oclMatrix elements to s, according to the mask
//It supports 8UC1 8UC4 32SC1 32SC4 32FC1 32FC4
oclMat& setTo(const Scalar &s, const oclMat &mask = oclMat());
//! creates alternative oclMatrix header for the same data, with different
// number of channels and/or different number of rows. see cvReshape.