mirror of
https://github.com/opencv/opencv.git
synced 2025-06-07 17:44:04 +08:00
Add OpenCV parallel_for_ tutorial.
This commit is contained in:
parent
23e53a32e5
commit
e16e141c38
@ -0,0 +1,183 @@
|
|||||||
|
How to use the OpenCV parallel_for_ to parallelize your code {#tutorial_how_to_use_OpenCV_parallel_for_}
|
||||||
|
==================================================================
|
||||||
|
|
||||||
|
Goal
|
||||||
|
----
|
||||||
|
|
||||||
|
The goal of this tutorial is to show you how to use the OpenCV `parallel_for_` framework to easily
|
||||||
|
parallelize your code. To illustrate the concept, we will write a program to draw a Mandelbrot set
|
||||||
|
exploiting almost all the CPU load available.
|
||||||
|
The full tutorial code is [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp).
|
||||||
|
If you want more information about multithreading, you will have to refer to a reference book or course as this tutorial is intended
|
||||||
|
to remain simple.
|
||||||
|
|
||||||
|
Precondition
|
||||||
|
----
|
||||||
|
|
||||||
|
The first precondition is to have OpenCV built with a parallel framework.
|
||||||
|
In OpenCV 3.2, the following parallel frameworks are available in that order:
|
||||||
|
1. Intel Threading Building Blocks (3rdparty library, should be explicitly enabled)
|
||||||
|
2. C= Parallel C/C++ Programming Language Extension (3rdparty library, should be explicitly enabled)
|
||||||
|
3. OpenMP (integrated to compiler, should be explicitly enabled)
|
||||||
|
4. APPLE GCD (system wide, used automatically (APPLE only))
|
||||||
|
5. Windows RT concurrency (system wide, used automatically (Windows RT only))
|
||||||
|
6. Windows concurrency (part of runtime, used automatically (Windows only - MSVC++ >= 10))
|
||||||
|
7. Pthreads (if available)
|
||||||
|
|
||||||
|
As you can see, several parallel frameworks can be used in the OpenCV library. Some parallel libraries
|
||||||
|
are third party libraries and have to be explictly built and enabled in CMake (e.g. TBB, C=), others are
|
||||||
|
automatically available with the platform (e.g. APPLE GCD) but chances are that you should be enable to
|
||||||
|
have access to a parallel framework either directly or by enabling the option in CMake and rebuild the library.
|
||||||
|
|
||||||
|
The second (weak) precondition is more related to the task you want to achieve as not all computations
|
||||||
|
are suitable / can be adatapted to be run in a parallel way. To remain simple, tasks that can be splitted
|
||||||
|
into multiple elementary operations with no memory dependency (no possible race condition) are easily
|
||||||
|
parallelizable. Computer vision processing are often easily parallelizable as most of the time the processing of
|
||||||
|
one pixel does not depend to the state of other pixels.
|
||||||
|
|
||||||
|
Simple example: drawing a Mandelbrot set
|
||||||
|
----
|
||||||
|
|
||||||
|
We will use the example of drawing a Mandelbrot set to show how from a regular sequential code you can easily adapt
|
||||||
|
the code to parallize the computation.
|
||||||
|
|
||||||
|
Theory
|
||||||
|
-----------
|
||||||
|
|
||||||
|
The Mandelbrot set definition has been named in tribute to the mathematician Benoit Mandelbrot by the mathematician
|
||||||
|
Adrien Douady. It has been famous outside of the mathematics field as the image representation is an example of a
|
||||||
|
class of fractals, a mathematical set that exhibits a repeating pattern displayed at every scale (even more, a
|
||||||
|
Mandelbrot set is self-similar as the whole shape can be repeatedly seen at different scale). For a more in-depth
|
||||||
|
introduction, you can look at the corresponding [Wikipedia article](https://en.wikipedia.org/wiki/Mandelbrot_set).
|
||||||
|
Here, we will just introduce the formula to draw the Mandelbrot set (from the mentioned Wikipedia article).
|
||||||
|
|
||||||
|
> The Mandelbrot set is the set of values of \f$ c \f$ in the complex plane for which the orbit of 0 under iteration
|
||||||
|
> of the quadratic map
|
||||||
|
> \f[\begin{cases} z_0 = 0 \\ z_{n+1} = z_n^2 + c \end{cases}\f]
|
||||||
|
> remains bounded.
|
||||||
|
> That is, a complex number \f$ c \f$ is part of the Mandelbrot set if, when starting with \f$ z_0 = 0 \f$ and applying
|
||||||
|
> the iteration repeatedly, the absolute value of \f$ z_n \f$ remains bounded however large \f$ n \f$ gets.
|
||||||
|
> This can also be represented as
|
||||||
|
> \f[\limsup_{n\to\infty}|z_{n+1}|\leqslant2\f]
|
||||||
|
|
||||||
|
Pseudocode
|
||||||
|
-----------
|
||||||
|
|
||||||
|
A simple algorithm to generate a representation of the Mandelbrot set is called the
|
||||||
|
["escape time algorithm"](https://en.wikipedia.org/wiki/Mandelbrot_set#Escape_time_algorithm).
|
||||||
|
For each pixel in the rendered image, we test using the recurrence relation if the complex number is bounded or not
|
||||||
|
under a maximum number of iterations. Pixels that do not belong to the Mandelbrot set will escape quickly whereas
|
||||||
|
we assume that the pixel is in the set after a fixed maximum number of iterations. A high value of iterations will
|
||||||
|
produce a more detailed image but the computation time will increase accordingly. We use the number of iterations
|
||||||
|
needed to "escape" to depict the pixel value in the image.
|
||||||
|
|
||||||
|
```
|
||||||
|
For each pixel (Px, Py) on the screen, do:
|
||||||
|
{
|
||||||
|
x0 = scaled x coordinate of pixel (scaled to lie in the Mandelbrot X scale (-2, 1))
|
||||||
|
y0 = scaled y coordinate of pixel (scaled to lie in the Mandelbrot Y scale (-1, 1))
|
||||||
|
x = 0.0
|
||||||
|
y = 0.0
|
||||||
|
iteration = 0
|
||||||
|
max_iteration = 1000
|
||||||
|
while (x*x + y*y < 2*2 AND iteration < max_iteration) {
|
||||||
|
xtemp = x*x - y*y + x0
|
||||||
|
y = 2*x*y + y0
|
||||||
|
x = xtemp
|
||||||
|
iteration = iteration + 1
|
||||||
|
}
|
||||||
|
color = palette[iteration]
|
||||||
|
plot(Px, Py, color)
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
To relate between the pseudocode and the theory, we have:
|
||||||
|
* \f$ z = x + iy \f$
|
||||||
|
* \f$ z^2 = x^2 + i2xy - y^2 \f$
|
||||||
|
* \f$ c = x_0 + iy_0 \f$
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
On this figure, we recall that the real part of a complex number is on the x-axis and the imaginary part on the y-axis.
|
||||||
|
You can see that the whole shape can be repeatedly visible if we zoom at particular locations.
|
||||||
|
|
||||||
|
Implementation
|
||||||
|
-----------
|
||||||
|
|
||||||
|
Escape time algorithm implementation
|
||||||
|
--------------------------
|
||||||
|
|
||||||
|
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-escape-time-algorithm
|
||||||
|
|
||||||
|
Here, we used the [`std::complex`](http://en.cppreference.com/w/cpp/numeric/complex) template class to represent a
|
||||||
|
complex number. This function performs the test to check if the pixel is in set or not and returns the "escaped" iteration.
|
||||||
|
|
||||||
|
Sequential Mandelbrot implementation
|
||||||
|
--------------------------
|
||||||
|
|
||||||
|
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-sequential
|
||||||
|
|
||||||
|
In this implementation, we sequentially iterate over the pixels in the rendered image to perform the test to check if the
|
||||||
|
pixel is likely to belong to the Mandelbrot set or not.
|
||||||
|
|
||||||
|
Another thing to do is to transform the pixel coordinate into the Mandelbrot set space with:
|
||||||
|
|
||||||
|
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-transformation
|
||||||
|
|
||||||
|
Finally, to assign the grayscale value to the pixels, we use the following rule:
|
||||||
|
* a pixel is black if it reaches the maximum number of iterations (pixel is assumed to be in the Mandelbrot set),
|
||||||
|
* otherwise we assign a grayscale value depending on the escaped iteration and scaled to fit the grayscale range.
|
||||||
|
|
||||||
|
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-grayscale-value
|
||||||
|
|
||||||
|
Using a linear scale transformation is not enough to perceive the grayscale variation. To overcome this, we will boost
|
||||||
|
the perception by using a square root scale transformation (borrowed from Jeremy D. Frens in his
|
||||||
|
[blog post](http://www.programming-during-recess.net/2016/06/26/color-schemes-for-mandelbrot-sets/)):
|
||||||
|
\f$ f \left( x \right) = \sqrt{\frac{x}{\text{maxIter}}} \times 255 \f$
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
The green curve corresponds to a simple linear scale transformation, the blue one to a square root scale transformation
|
||||||
|
and you can observe how the lowest values will be boosted when looking at the slope at these positions.
|
||||||
|
|
||||||
|
Parallel Mandelbrot implementation
|
||||||
|
--------------------------
|
||||||
|
|
||||||
|
When looking at the sequential implementation, we can notice that each pixel is computed independently. To optimize the
|
||||||
|
computation, we can perform multiple pixel calculations in parallel, by exploiting the multi-core architecture of modern
|
||||||
|
processor. To achieve this easily, we will use the OpenCV @ref cv::parallel_for_ framework.
|
||||||
|
|
||||||
|
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-parallel
|
||||||
|
|
||||||
|
The first thing is to declare a custom class that inherits from @ref cv::ParallelLoopBody and to override the
|
||||||
|
`virtual void operator ()(const cv::Range& range) const`.
|
||||||
|
|
||||||
|
The range in the `operator ()` represents the subset of pixels that will be treated by an individual thread.
|
||||||
|
This splitting is done automatically to distribuate equally the computation load. We have to convert the pixel index coordinate
|
||||||
|
to a 2D `[row, col]` coordinate. Also note that we have to keep a reference on the mat image to be able to modify in-place
|
||||||
|
the image.
|
||||||
|
|
||||||
|
The parallel execution is called with:
|
||||||
|
|
||||||
|
@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-parallel-call
|
||||||
|
|
||||||
|
Here, the range represents the total number of operations to be executed, so the total number of pixels in the image.
|
||||||
|
To set the number of threads, you can use: @ref cv::setNumThreads. You can also specify the number of splitting using the
|
||||||
|
nstripes parameter in @ref cv::parallel_for_. For instance, if your processor has 4 threads, setting `cv::setNumThreads(2)`
|
||||||
|
or setting `nstripes=2` should be the same as by default it will use all the processor threads available but will split the
|
||||||
|
workload only on two threads.
|
||||||
|
|
||||||
|
Results
|
||||||
|
-----------
|
||||||
|
|
||||||
|
You can find the full tutorial code [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp).
|
||||||
|
The performance of the parallel implementation depends of the type of CPU you have. For instance, on 4 cores / 8 threads
|
||||||
|
CPU, you can expect a speed-up of around 6.9X. There are many factors to explain why we do not achieve a speed-up of almost 8X.
|
||||||
|
Main reasons should be mostly due to:
|
||||||
|
* the overhead to create and manage the threads,
|
||||||
|
* background processes running in parallel,
|
||||||
|
* the difference between 4 hardware cores with 2 logical threads for each core and 8 hardware cores.
|
||||||
|
|
||||||
|
The resulting image produced by the tutorial code (you can modify the code to use more iterations and assign a pixel color
|
||||||
|
depending on the escaped iteration and using a color palette to get more aesthetic images):
|
||||||
|

|
Binary file not shown.
After Width: | Height: | Size: 16 KiB |
Binary file not shown.
After Width: | Height: | Size: 62 KiB |
Binary file not shown.
After Width: | Height: | Size: 33 KiB |
@ -106,3 +106,10 @@ understanding how to manipulate the images on a pixel level.
|
|||||||
*Author:* Elena Gvozdeva
|
*Author:* Elena Gvozdeva
|
||||||
|
|
||||||
You will see how to use the IPP Async with OpenCV.
|
You will see how to use the IPP Async with OpenCV.
|
||||||
|
|
||||||
|
|
||||||
|
- @subpage tutorial_how_to_use_OpenCV_parallel_for_
|
||||||
|
|
||||||
|
*Compatibility:* \>= OpenCV 2.4.3
|
||||||
|
|
||||||
|
You will see how to use the OpenCV parallel_for_ to easily parallelize your code.
|
||||||
|
@ -0,0 +1,122 @@
|
|||||||
|
#include <iostream>
|
||||||
|
#include <opencv2/core.hpp>
|
||||||
|
#include <opencv2/imgcodecs.hpp>
|
||||||
|
|
||||||
|
using namespace std;
|
||||||
|
using namespace cv;
|
||||||
|
|
||||||
|
namespace
|
||||||
|
{
|
||||||
|
//! [mandelbrot-escape-time-algorithm]
|
||||||
|
int mandelbrot(const complex<float> &z0, const int max)
|
||||||
|
{
|
||||||
|
complex<float> z = z0;
|
||||||
|
for (int t = 0; t < max; t++)
|
||||||
|
{
|
||||||
|
if (z.real()*z.real() + z.imag()*z.imag() > 4.0f) return t;
|
||||||
|
z = z*z + z0;
|
||||||
|
}
|
||||||
|
|
||||||
|
return max;
|
||||||
|
}
|
||||||
|
//! [mandelbrot-escape-time-algorithm]
|
||||||
|
|
||||||
|
//! [mandelbrot-grayscale-value]
|
||||||
|
int mandelbrotFormula(const complex<float> &z0, const int maxIter=500) {
|
||||||
|
int value = mandelbrot(z0, maxIter);
|
||||||
|
if(maxIter - value == 0)
|
||||||
|
{
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
return cvRound(sqrt(value / (float) maxIter) * 255);
|
||||||
|
}
|
||||||
|
//! [mandelbrot-grayscale-value]
|
||||||
|
|
||||||
|
//! [mandelbrot-parallel]
|
||||||
|
class ParallelMandelbrot : public ParallelLoopBody
|
||||||
|
{
|
||||||
|
public:
|
||||||
|
ParallelMandelbrot (Mat &img, const float x1, const float y1, const float scaleX, const float scaleY)
|
||||||
|
: m_img(img), m_x1(x1), m_y1(y1), m_scaleX(scaleX), m_scaleY(scaleY)
|
||||||
|
{
|
||||||
|
}
|
||||||
|
|
||||||
|
virtual void operator ()(const Range& range) const
|
||||||
|
{
|
||||||
|
for (int r = range.start; r < range.end; r++)
|
||||||
|
{
|
||||||
|
int i = r / m_img.cols;
|
||||||
|
int j = r % m_img.cols;
|
||||||
|
|
||||||
|
float x0 = j / m_scaleX + m_x1;
|
||||||
|
float y0 = i / m_scaleY + m_y1;
|
||||||
|
|
||||||
|
complex<float> z0(x0, y0);
|
||||||
|
uchar value = (uchar) mandelbrotFormula(z0);
|
||||||
|
m_img.ptr<uchar>(i)[j] = value;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
ParallelMandelbrot& operator=(const ParallelMandelbrot &) {
|
||||||
|
return *this;
|
||||||
|
};
|
||||||
|
|
||||||
|
private:
|
||||||
|
Mat &m_img;
|
||||||
|
float m_x1;
|
||||||
|
float m_y1;
|
||||||
|
float m_scaleX;
|
||||||
|
float m_scaleY;
|
||||||
|
};
|
||||||
|
//! [mandelbrot-parallel]
|
||||||
|
|
||||||
|
//! [mandelbrot-sequential]
|
||||||
|
void sequentialMandelbrot(Mat &img, const float x1, const float y1, const float scaleX, const float scaleY)
|
||||||
|
{
|
||||||
|
for (int i = 0; i < img.rows; i++)
|
||||||
|
{
|
||||||
|
for (int j = 0; j < img.cols; j++)
|
||||||
|
{
|
||||||
|
float x0 = j / scaleX + x1;
|
||||||
|
float y0 = i / scaleY + y1;
|
||||||
|
|
||||||
|
complex<float> z0(x0, y0);
|
||||||
|
uchar value = (uchar) mandelbrotFormula(z0);
|
||||||
|
img.ptr<uchar>(i)[j] = value;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
//! [mandelbrot-sequential]
|
||||||
|
}
|
||||||
|
|
||||||
|
int main()
|
||||||
|
{
|
||||||
|
//! [mandelbrot-transformation]
|
||||||
|
Mat mandelbrotImg(4800, 5400, CV_8U);
|
||||||
|
float x1 = -2.1f, x2 = 0.6f;
|
||||||
|
float y1 = -1.2f, y2 = 1.2f;
|
||||||
|
float scaleX = mandelbrotImg.cols / (x2 - x1);
|
||||||
|
float scaleY = mandelbrotImg.rows / (y2 - y1);
|
||||||
|
//! [mandelbrot-transformation]
|
||||||
|
|
||||||
|
double t1 = (double) getTickCount();
|
||||||
|
//! [mandelbrot-parallel-call]
|
||||||
|
ParallelMandelbrot parallelMandelbrot(mandelbrotImg, x1, y1, scaleX, scaleY);
|
||||||
|
parallel_for_(Range(0, mandelbrotImg.rows*mandelbrotImg.cols), parallelMandelbrot);
|
||||||
|
//! [mandelbrot-parallel-call]
|
||||||
|
t1 = ((double) getTickCount() - t1) / getTickFrequency();
|
||||||
|
cout << "Parallel Mandelbrot: " << t1 << " s" << endl;
|
||||||
|
|
||||||
|
Mat mandelbrotImgSequential(4800, 5400, CV_8U);
|
||||||
|
double t2 = (double) getTickCount();
|
||||||
|
sequentialMandelbrot(mandelbrotImgSequential, x1, y1, scaleX, scaleY);
|
||||||
|
t2 = ((double) getTickCount() - t2) / getTickFrequency();
|
||||||
|
cout << "Sequential Mandelbrot: " << t2 << " s" << endl;
|
||||||
|
cout << "Speed-up: " << t2/t1 << " X" << endl;
|
||||||
|
|
||||||
|
imwrite("Mandelbrot_parallel.png", mandelbrotImg);
|
||||||
|
imwrite("Mandelbrot_sequential.png", mandelbrotImgSequential);
|
||||||
|
|
||||||
|
return EXIT_SUCCESS;
|
||||||
|
}
|
Loading…
Reference in New Issue
Block a user