diff --git a/doc/tutorials/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.markdown b/doc/tutorials/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.markdown new file mode 100644 index 0000000000..f2a511fc21 --- /dev/null +++ b/doc/tutorials/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.markdown @@ -0,0 +1,183 @@ +How to use the OpenCV parallel_for_ to parallelize your code {#tutorial_how_to_use_OpenCV_parallel_for_} +================================================================== + +Goal +---- + +The goal of this tutorial is to show you how to use the OpenCV `parallel_for_` framework to easily +parallelize your code. To illustrate the concept, we will write a program to draw a Mandelbrot set +exploiting almost all the CPU load available. +The full tutorial code is [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp). +If you want more information about multithreading, you will have to refer to a reference book or course as this tutorial is intended +to remain simple. + +Precondition +---- + +The first precondition is to have OpenCV built with a parallel framework. +In OpenCV 3.2, the following parallel frameworks are available in that order: +1. Intel Threading Building Blocks (3rdparty library, should be explicitly enabled) +2. C= Parallel C/C++ Programming Language Extension (3rdparty library, should be explicitly enabled) +3. OpenMP (integrated to compiler, should be explicitly enabled) +4. APPLE GCD (system wide, used automatically (APPLE only)) +5. Windows RT concurrency (system wide, used automatically (Windows RT only)) +6. Windows concurrency (part of runtime, used automatically (Windows only - MSVC++ >= 10)) +7. Pthreads (if available) + +As you can see, several parallel frameworks can be used in the OpenCV library. 
Some parallel libraries +are third-party libraries that have to be explicitly built and enabled in CMake (e.g. TBB, C=), while others are +automatically available with the platform (e.g. APPLE GCD). Chances are that you will be able to +access a parallel framework, either directly or by enabling the option in CMake and rebuilding the library. + +The second (weak) precondition is related to the task you want to achieve, as not all computations +are suitable for, or can be adapted to, parallel execution. To keep things simple, tasks that can be split +into multiple elementary operations with no memory dependency (no possible race condition) are easily +parallelizable. Computer vision processing is often easily parallelizable, as most of the time the processing of +one pixel does not depend on the state of other pixels. + +Simple example: drawing a Mandelbrot set +---- + +We will use the example of drawing a Mandelbrot set to show how you can easily adapt regular sequential code +to parallelize the computation. + +Theory +----------- + +The Mandelbrot set was named in tribute to the mathematician Benoit Mandelbrot by the mathematician +Adrien Douady. It has become famous outside of the mathematics field, as its image representation is an example of a +class of fractals, mathematical sets that exhibit a repeating pattern displayed at every scale (moreover, the +Mandelbrot set is self-similar, as the whole shape can be seen repeatedly at different scales). For a more in-depth +introduction, you can look at the corresponding [Wikipedia article](https://en.wikipedia.org/wiki/Mandelbrot_set). +Here, we will just introduce the formula to draw the Mandelbrot set (from the mentioned Wikipedia article). + +> The Mandelbrot set is the set of values of \f$ c \f$ in the complex plane for which the orbit of 0 under iteration +> of the quadratic map +> \f[\begin{cases} z_0 = 0 \\ z_{n+1} = z_n^2 + c \end{cases}\f] +> remains bounded.
+> That is, a complex number \f$ c \f$ is part of the Mandelbrot set if, when starting with \f$ z_0 = 0 \f$ and applying +> the iteration repeatedly, the absolute value of \f$ z_n \f$ remains bounded however large \f$ n \f$ gets. +> This can also be represented as +> \f[\limsup_{n\to\infty}|z_{n+1}|\leqslant2\f] + +Pseudocode +----------- + +A simple algorithm to generate a representation of the Mandelbrot set is called the +["escape time algorithm"](https://en.wikipedia.org/wiki/Mandelbrot_set#Escape_time_algorithm). +For each pixel in the rendered image, we use the recurrence relation to test whether the complex number stays bounded +within a maximum number of iterations. Pixels that do not belong to the Mandelbrot set escape quickly, whereas +a pixel that has not escaped after the fixed maximum number of iterations is assumed to be in the set. A higher maximum number of iterations +produces a more detailed image, but the computation time increases accordingly. We use the number of iterations +needed to "escape" to depict the pixel value in the image. + +``` +For each pixel (Px, Py) on the screen, do: +{ + x0 = scaled x coordinate of pixel (scaled to lie in the Mandelbrot X scale (-2, 1)) + y0 = scaled y coordinate of pixel (scaled to lie in the Mandelbrot Y scale (-1, 1)) + x = 0.0 + y = 0.0 + iteration = 0 + max_iteration = 1000 + while (x*x + y*y < 2*2 AND iteration < max_iteration) { + xtemp = x*x - y*y + x0 + y = 2*x*y + y0 + x = xtemp + iteration = iteration + 1 + } + color = palette[iteration] + plot(Px, Py, color) +} +``` + +To relate the pseudocode to the theory, we have: +* \f$ z = x + iy \f$ +* \f$ z^2 = x^2 + i2xy - y^2 \f$ +* \f$ c = x_0 + iy_0 \f$ + +![](images/how_to_use_OpenCV_parallel_for_640px-Mandelset_hires.png) + +On this figure, we recall that the real part of a complex number is on the x-axis and the imaginary part on the y-axis. +You can see that the whole shape appears again and again if we zoom in at particular locations.
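The correspondence between the pseudocode and the complex recurrence can be checked with a small standalone sketch. It is not part of the tutorial sample: the function names `escapeTimeReal` and `escapeTimeComplex` are hypothetical, `double` precision is an arbitrary choice, and both versions start from \f$ z_0 = 0 \f$ as in the theory section.

```cpp
#include <cassert>
#include <complex>

// Escape time with separate real (x) and imaginary (y) parts, as in the pseudocode.
// Hypothetical helper, not taken from the tutorial sample.
int escapeTimeReal(double x0, double y0, int maxIteration)
{
    double x = 0.0, y = 0.0;
    int iteration = 0;
    while (x * x + y * y < 4.0 && iteration < maxIteration)
    {
        const double xTemp = x * x - y * y + x0; // real part of z^2 + c
        y = 2.0 * x * y + y0;                    // imaginary part of z^2 + c
        x = xTemp;
        ++iteration;
    }
    return iteration;
}

// The same iteration written with std::complex: z_{n+1} = z_n^2 + c, z_0 = 0.
// Hypothetical helper, not taken from the tutorial sample.
int escapeTimeComplex(const std::complex<double>& c, int maxIteration)
{
    std::complex<double> z(0.0, 0.0);
    int iteration = 0;
    // std::norm returns the squared magnitude, i.e. x*x + y*y.
    while (std::norm(z) < 4.0 && iteration < maxIteration)
    {
        z = z * z + c;
        ++iteration;
    }
    return iteration;
}
```

For example, with \f$ c = 1 + i \f$ both functions return 2, and with \f$ c = 0 \f$ both run up to `maxIteration`. The tutorial's own `mandelbrot` function starts from `z = z0` and tests `> 4`, so its iteration counts can differ by one from this sketch.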
+ +Implementation +----------- + +Escape time algorithm implementation +-------------------------- + +@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-escape-time-algorithm + +Here, we used the [`std::complex`](http://en.cppreference.com/w/cpp/numeric/complex) template class to represent a +complex number. This function tests whether the pixel is in the set and returns the "escaped" iteration. + +Sequential Mandelbrot implementation +-------------------------- + +@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-sequential + +In this implementation, we sequentially iterate over the pixels in the rendered image to test whether each +pixel is likely to belong to the Mandelbrot set. + +We also need to transform the pixel coordinates into the Mandelbrot set space with: + +@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-transformation + +Finally, to assign the grayscale value to the pixels, we use the following rule: +* a pixel is black if it reaches the maximum number of iterations (pixel is assumed to be in the Mandelbrot set), +* otherwise we assign a grayscale value depending on the escaped iteration and scaled to fit the grayscale range. + +@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-grayscale-value + +Using a linear scale transformation is not enough to perceive the grayscale variation. To overcome this, we will boost +the perception by using a square root scale transformation (borrowed from Jeremy D.
Frens in his +[blog post](http://www.programming-during-recess.net/2016/06/26/color-schemes-for-mandelbrot-sets/)): +\f$ f \left( x \right) = \sqrt{\frac{x}{\text{maxIter}}} \times 255 \f$ + +![](images/how_to_use_OpenCV_parallel_for_sqrt_scale_transformation.png) + +The green curve corresponds to a simple linear scale transformation and the blue one to a square root scale transformation. +The steep slope of the blue curve near zero shows how the lowest values are boosted. + +Parallel Mandelbrot implementation +-------------------------- + +When looking at the sequential implementation, we can notice that each pixel is computed independently. To optimize the +computation, we can perform multiple pixel calculations in parallel, by exploiting the multi-core architecture of modern +processors. To achieve this easily, we will use the OpenCV @ref cv::parallel_for_ framework. + +@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-parallel + +The first thing is to declare a custom class that inherits from @ref cv::ParallelLoopBody and to override the +`virtual void operator ()(const cv::Range& range) const`. + +The range in the `operator ()` represents the subset of pixels that will be treated by an individual thread. +This splitting is done automatically to distribute the computation load equally. We have to convert the pixel index +to a 2D `[row, col]` coordinate. Also note that we have to keep a reference to the Mat image to be able to modify the image +in place. + +The parallel execution is called with: + +@snippet how_to_use_OpenCV_parallel_for_.cpp mandelbrot-parallel-call + +Here, the range represents the total number of operations to be executed, so the total number of pixels in the image. +To set the number of threads, you can use: @ref cv::setNumThreads. You can also specify the number of splits using the +nstripes parameter in @ref cv::parallel_for_.
For instance, if your processor has 4 threads, setting `cv::setNumThreads(2)` +or setting `nstripes=2` should have the same effect: by default, all available processor threads are used, whereas here the +workload is split across only two threads. + +Results +----------- + +You can find the full tutorial code [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp). +The performance of the parallel implementation depends on the type of CPU you have. For instance, on a 4-core / 8-thread +CPU, you can expect a speed-up of around 6.9X. Several factors explain why we do not reach a speed-up close to 8X. +The main reasons are most likely: +* the overhead to create and manage the threads, +* background processes running in parallel, +* the difference between 4 hardware cores with 2 logical threads for each core and 8 hardware cores. + +The resulting image produced by the tutorial code is shown below (you can modify the code to use more iterations and to assign a pixel color +from a color palette, depending on the escaped iteration, to get more aesthetic images): +![Mandelbrot set with xMin=-2.1, xMax=0.6, yMin=-1.2, yMax=1.2, maxIterations=500](images/how_to_use_OpenCV_parallel_for_Mandelbrot.png) diff --git a/doc/tutorials/core/how_to_use_OpenCV_parallel_for_/images/how_to_use_OpenCV_parallel_for_640px-Mandelset_hires.png b/doc/tutorials/core/how_to_use_OpenCV_parallel_for_/images/how_to_use_OpenCV_parallel_for_640px-Mandelset_hires.png new file mode 100644 index 0000000000..2b63916d77 Binary files /dev/null and b/doc/tutorials/core/how_to_use_OpenCV_parallel_for_/images/how_to_use_OpenCV_parallel_for_640px-Mandelset_hires.png differ diff --git a/doc/tutorials/core/how_to_use_OpenCV_parallel_for_/images/how_to_use_OpenCV_parallel_for_Mandelbrot.png b/doc/tutorials/core/how_to_use_OpenCV_parallel_for_/images/how_to_use_OpenCV_parallel_for_Mandelbrot.png new file mode 100644 index
0000000000..40eb579e78 Binary files /dev/null and b/doc/tutorials/core/how_to_use_OpenCV_parallel_for_/images/how_to_use_OpenCV_parallel_for_Mandelbrot.png differ diff --git a/doc/tutorials/core/how_to_use_OpenCV_parallel_for_/images/how_to_use_OpenCV_parallel_for_sqrt_scale_transformation.png b/doc/tutorials/core/how_to_use_OpenCV_parallel_for_/images/how_to_use_OpenCV_parallel_for_sqrt_scale_transformation.png new file mode 100644 index 0000000000..00b727f866 Binary files /dev/null and b/doc/tutorials/core/how_to_use_OpenCV_parallel_for_/images/how_to_use_OpenCV_parallel_for_sqrt_scale_transformation.png differ diff --git a/doc/tutorials/core/table_of_content_core.markdown b/doc/tutorials/core/table_of_content_core.markdown index 70d9c81a1b..2b9afb8b19 100644 --- a/doc/tutorials/core/table_of_content_core.markdown +++ b/doc/tutorials/core/table_of_content_core.markdown @@ -106,3 +106,10 @@ understanding how to manipulate the images on a pixel level. *Author:* Elena Gvozdeva You will see how to use the IPP Async with OpenCV. + + +- @subpage tutorial_how_to_use_OpenCV_parallel_for_ + + *Compatibility:* \>= OpenCV 2.4.3 + + You will see how to use the OpenCV parallel_for_ to easily parallelize your code. diff --git a/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp b/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp new file mode 100644 index 0000000000..c661b919b3 --- /dev/null +++ b/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp @@ -0,0 +1,122 @@ +#include <iostream> +#include <opencv2/core.hpp> +#include <opencv2/imgcodecs.hpp> + +using namespace std; +using namespace cv; + +namespace +{ +//! [mandelbrot-escape-time-algorithm] +int mandelbrot(const complex<float> &z0, const int max) +{ + complex<float> z = z0; + for (int t = 0; t < max; t++) + { + if (z.real()*z.real() + z.imag()*z.imag() > 4.0f) return t; + z = z*z + z0; + } + + return max; +} +//! 
[mandelbrot-escape-time-algorithm] + +//! [mandelbrot-grayscale-value] +int mandelbrotFormula(const complex<float> &z0, const int maxIter=500) { + int value = mandelbrot(z0, maxIter); + if(maxIter - value == 0) + { + return 0; + } + + return cvRound(sqrt(value / (float) maxIter) * 255); +} +//! [mandelbrot-grayscale-value] + +//! [mandelbrot-parallel] +class ParallelMandelbrot : public ParallelLoopBody +{ +public: + ParallelMandelbrot (Mat &img, const float x1, const float y1, const float scaleX, const float scaleY) + : m_img(img), m_x1(x1), m_y1(y1), m_scaleX(scaleX), m_scaleY(scaleY) + { + } + + virtual void operator ()(const Range& range) const + { + for (int r = range.start; r < range.end; r++) + { + int i = r / m_img.cols; + int j = r % m_img.cols; + + float x0 = j / m_scaleX + m_x1; + float y0 = i / m_scaleY + m_y1; + + complex<float> z0(x0, y0); + uchar value = (uchar) mandelbrotFormula(z0); + m_img.ptr<uchar>(i)[j] = value; + } + } + + ParallelMandelbrot& operator=(const ParallelMandelbrot &) { + return *this; + }; + +private: + Mat &m_img; + float m_x1; + float m_y1; + float m_scaleX; + float m_scaleY; +}; +//! [mandelbrot-parallel] + +//! [mandelbrot-sequential] +void sequentialMandelbrot(Mat &img, const float x1, const float y1, const float scaleX, const float scaleY) +{ + for (int i = 0; i < img.rows; i++) + { + for (int j = 0; j < img.cols; j++) + { + float x0 = j / scaleX + x1; + float y0 = i / scaleY + y1; + + complex<float> z0(x0, y0); + uchar value = (uchar) mandelbrotFormula(z0); + img.ptr<uchar>(i)[j] = value; + } + } +} +//! [mandelbrot-sequential] +} + +int main() +{ + //! [mandelbrot-transformation] + Mat mandelbrotImg(4800, 5400, CV_8U); + float x1 = -2.1f, x2 = 0.6f; + float y1 = -1.2f, y2 = 1.2f; + float scaleX = mandelbrotImg.cols / (x2 - x1); + float scaleY = mandelbrotImg.rows / (y2 - y1); + //! [mandelbrot-transformation] + + double t1 = (double) getTickCount(); + //! 
[mandelbrot-parallel-call] + ParallelMandelbrot parallelMandelbrot(mandelbrotImg, x1, y1, scaleX, scaleY); + parallel_for_(Range(0, mandelbrotImg.rows*mandelbrotImg.cols), parallelMandelbrot); + //! [mandelbrot-parallel-call] + t1 = ((double) getTickCount() - t1) / getTickFrequency(); + cout << "Parallel Mandelbrot: " << t1 << " s" << endl; + + Mat mandelbrotImgSequential(4800, 5400, CV_8U); + double t2 = (double) getTickCount(); + sequentialMandelbrot(mandelbrotImgSequential, x1, y1, scaleX, scaleY); + t2 = ((double) getTickCount() - t2) / getTickFrequency(); + cout << "Sequential Mandelbrot: " << t2 << " s" << endl; + cout << "Speed-up: " << t2/t1 << " X" << endl; + + imwrite("Mandelbrot_parallel.png", mandelbrotImg); + imwrite("Mandelbrot_sequential.png", mandelbrotImgSequential); + + return EXIT_SUCCESS; +}
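
To make the index handling in `ParallelMandelbrot::operator()` concrete, here is a minimal standalone sketch, without OpenCV or threads, of how a flat range `[0, rows*cols)` can be cut into stripes and how each flat index maps back to a `[row, col]` pair. The `Stripe` struct and the `splitRange` and `fillStripe` helpers are hypothetical names introduced for illustration only. The even split is an assumption: the actual splitting done by `parallel_for_` depends on the backend and is not reproduced here.

```cpp
#include <cassert>
#include <vector>

// Half-open interval [start, end), mirroring how cv::Range is used in the sample.
struct Stripe { int start; int end; };

// Cut [0, total) into nstripes contiguous stripes of near-equal size.
// Illustrative only; not OpenCV's splitting strategy.
std::vector<Stripe> splitRange(int total, int nstripes)
{
    std::vector<Stripe> stripes;
    for (int s = 0; s < nstripes; ++s)
    {
        // Integer division spreads the remainder over the stripes.
        const int start = static_cast<int>(static_cast<long long>(total) * s / nstripes);
        const int end = static_cast<int>(static_cast<long long>(total) * (s + 1) / nstripes);
        stripes.push_back({start, end});
    }
    return stripes;
}

// Loop body applied to one stripe: writes row*1000 + col into each pixel it owns.
void fillStripe(std::vector<int>& image, int cols, const Stripe& stripe)
{
    for (int r = stripe.start; r < stripe.end; ++r)
    {
        const int row = r / cols; // same conversion as i = r / m_img.cols
        const int col = r % cols; // same conversion as j = r % m_img.cols
        image[row * cols + col] = row * 1000 + col;
    }
}
```

Because the stripes are disjoint and together cover the whole range, each pixel is written exactly once. This is why the loop body in the sample can run on several threads without any synchronization.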