mirror of https://github.com/opencv/opencv.git, synced 2024-11-24 11:10:21 +08:00
copyediting on python object detection tutorial
clarify some passages, fix grammar errors
parent 21c8e6d02d
commit db0a159229
@ -20,65 +20,64 @@ other images.
|
|||||||
|
|
||||||
Here we will work with face detection. Initially, the algorithm needs a lot of positive images
|
Here we will work with face detection. Initially, the algorithm needs a lot of positive images
|
||||||
(images of faces) and negative images (images without faces) to train the classifier. Then we need
|
(images of faces) and negative images (images without faces) to train the classifier. Then we need
|
||||||
to extract features from it. For this, haar features shown in below image are used. They are just
|
to extract features from it. For this, Haar features shown in the below image are used. They are just
|
||||||
like our convolutional kernel. Each feature is a single value obtained by subtracting sum of pixels
|
like our convolutional kernel. Each feature is a single value obtained by subtracting sum of pixels
|
||||||
under white rectangle from sum of pixels under black rectangle.
|
under the white rectangle from sum of pixels under the black rectangle.
|
||||||
|
|
||||||
![image](images/haar_features.jpg)
|
![image](images/haar_features.jpg)
|
||||||
|
|
||||||
Now all possible sizes and locations of each kernel is used to calculate plenty of features. (Just
|
Now, all possible sizes and locations of each kernel are used to calculate lots of features. (Just
|
||||||
imagine how much computation it needs? Even a 24x24 window results over 160000 features). For each
|
imagine how much computation it needs? Even a 24x24 window results over 160000 features). For each
|
||||||
feature calculation, we need to find sum of pixels under white and black rectangles. To solve this,
|
feature calculation, we need to find the sum of the pixels under white and black rectangles. To solve
|
||||||
they introduced the integral images. It simplifies calculation of sum of pixels, how large may be
|
this, they introduced the integral image. However large your image, it reduces the calculations for a
|
||||||
the number of pixels, to an operation involving just four pixels. Nice, isn't it? It makes things
|
given pixel to an operation involving just four pixels. Nice, isn't it? It makes things super-fast.
|
||||||
super-fast.
|
|
||||||
|
|
||||||
But among all these features we calculated, most of them are irrelevant. For example, consider the
|
But among all these features we calculated, most of them are irrelevant. For example, consider the
|
||||||
image below. Top row shows two good features. The first feature selected seems to focus on the
|
image below. The top row shows two good features. The first feature selected seems to focus on the
|
||||||
property that the region of the eyes is often darker than the region of the nose and cheeks. The
|
property that the region of the eyes is often darker than the region of the nose and cheeks. The
|
||||||
second feature selected relies on the property that the eyes are darker than the bridge of the nose.
|
second feature selected relies on the property that the eyes are darker than the bridge of the nose.
|
||||||
But the same windows applying on cheeks or any other place is irrelevant. So how do we select the
|
But the same windows applied to cheeks or any other place is irrelevant. So how do we select the
|
||||||
best features out of 160000+ features? It is achieved by **Adaboost**.
|
best features out of 160000+ features? It is achieved by **Adaboost**.
|
||||||
|
|
||||||
![image](images/haar.png)
|
![image](images/haar.png)
|
||||||
|
|
||||||
For this, we apply each and every feature on all the training images. For each feature, it finds the
|
For this, we apply each and every feature on all the training images. For each feature, it finds the
|
||||||
best threshold which will classify the faces to positive and negative. But obviously, there will be
|
best threshold which will classify the faces to positive and negative. Obviously, there will be
|
||||||
errors or misclassifications. We select the features with minimum error rate, which means they are
|
errors or misclassifications. We select the features with minimum error rate, which means they are
|
||||||
the features that best classifies the face and non-face images. (The process is not as simple as
|
the features that most accurately classify the face and non-face images. (The process is not as simple as
|
||||||
this. Each image is given an equal weight in the beginning. After each classification, weights of
|
this. Each image is given an equal weight in the beginning. After each classification, weights of
|
||||||
misclassified images are increased. Then again same process is done. New error rates are calculated.
|
misclassified images are increased. Then the same process is done. New error rates are calculated.
|
||||||
Also new weights. The process is continued until required accuracy or error rate is achieved or
|
Also new weights. The process is continued until the required accuracy or error rate is achieved or
|
||||||
required number of features are found).
|
the required number of features are found).
|
||||||
|
|
||||||
Final classifier is a weighted sum of these weak classifiers. It is called weak because it alone
|
The final classifier is a weighted sum of these weak classifiers. It is called weak because it alone
|
||||||
can't classify the image, but together with others forms a strong classifier. The paper says even
|
can't classify the image, but together with others forms a strong classifier. The paper says even
|
||||||
200 features provide detection with 95% accuracy. Their final setup had around 6000 features.
|
200 features provide detection with 95% accuracy. Their final setup had around 6000 features.
|
||||||
(Imagine a reduction from 160000+ features to 6000 features. That is a big gain).
|
(Imagine a reduction from 160000+ features to 6000 features. That is a big gain).
|
||||||
|
|
||||||
So now you take an image. Take each 24x24 window. Apply 6000 features to it. Check if it is face or
|
So now you take an image. Take each 24x24 window. Apply 6000 features to it. Check if it is face or
|
||||||
not. Wow.. Wow.. Isn't it a little inefficient and time consuming? Yes, it is. Authors have a good
|
not. Wow.. Isn't it a little inefficient and time consuming? Yes, it is. The authors have a good
|
||||||
solution for that.
|
solution for that.
|
||||||
|
|
||||||
In an image, most of the image region is non-face region. So it is a better idea to have a simple
|
In an image, most of the image is non-face region. So it is a better idea to have a simple
|
||||||
method to check if a window is not a face region. If it is not, discard it in a single shot. Don't
|
method to check if a window is not a face region. If it is not, discard it in a single shot, and don't
|
||||||
process it again. Instead focus on region where there can be a face. This way, we can find more time
|
process it again. Instead, focus on regions where there can be a face. This way, we spend more time
|
||||||
to check a possible face region.
|
checking possible face regions.
|
||||||
|
|
||||||
For this they introduced the concept of **Cascade of Classifiers**. Instead of applying all the 6000
|
For this they introduced the concept of **Cascade of Classifiers**. Instead of applying all 6000
|
||||||
features on a window, group the features into different stages of classifiers and apply one-by-one.
|
features on a window, the features are grouped into different stages of classifiers and applied one-by-one.
|
||||||
(Normally first few stages will contain very less number of features). If a window fails the first
|
(Normally the first few stages will contain very many fewer features). If a window fails the first
|
||||||
stage, discard it. We don't consider remaining features on it. If it passes, apply the second stage
|
stage, discard it. We don't consider the remaining features on it. If it passes, apply the second stage
|
||||||
of features and continue the process. The window which passes all stages is a face region. How is
|
of features and continue the process. The window which passes all stages is a face region. How is
|
||||||
the plan !!!
|
that plan!
|
||||||
|
|
||||||
Authors' detector had 6000+ features with 38 stages with 1, 10, 25, 25 and 50 features in first five
|
The authors' detector had 6000+ features with 38 stages with 1, 10, 25, 25 and 50 features in the first five
|
||||||
stages. (Two features in the above image is actually obtained as the best two features from
|
stages. (The two features in the above image are actually obtained as the best two features from
|
||||||
Adaboost). According to authors, on an average, 10 features out of 6000+ are evaluated per
|
Adaboost). According to the authors, on average 10 features out of 6000+ are evaluated per
|
||||||
sub-window.
|
sub-window.
|
||||||
|
|
||||||
So this is a simple intuitive explanation of how Viola-Jones face detection works. Read paper for
|
So this is a simple intuitive explanation of how Viola-Jones face detection works. Read the paper for
|
||||||
more details or check out the references in Additional Resources section.
|
more details or check out the references in the Additional Resources section.
|
||||||
|
|
||||||
Haar-cascade Detection in OpenCV
|
Haar-cascade Detection in OpenCV
|
||||||
--------------------------------
|
--------------------------------
|
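Reviewer note on the integral-image passage in the hunk above: the claim that any rectangle sum reduces to four lookups is easier to check with a small sketch. The following is a minimal pure-Python illustration, not the code OpenCV uses. The function names (`integral_image`, `rect_sum`, `two_rect_feature`) and the left/right layout of the two-rectangle feature are assumptions made for this note. The sign convention follows the tutorial text (sum under the black rectangle minus sum under the white rectangle).

```python
def integral_image(img):
    """Return a summed-area table with an extra zero row and column.

    ii[y][x] holds the sum of all pixels above and to the left of (x, y),
    so ii has (h + 1) rows and (w + 1) columns.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left corner (x, y).

    Uses exactly four table lookups, whatever the rectangle size.
    """
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Illustrative two-rectangle feature: white left half, black right half.

    Returns the black sum minus the white sum, as the tutorial describes.
    """
    half = w // 2
    white = rect_sum(ii, x, y, half, h)
    black = rect_sum(ii, x + half, y, half, h)
    return black - white


if __name__ == "__main__":
    image = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16],
    ]
    table = integral_image(image)
    print(rect_sum(table, 1, 1, 2, 2))
    print(two_rect_feature(table, 0, 0, 4, 4))
```

The table is built once per image. After that, every feature at every size and location costs a fixed handful of lookups, which is what makes evaluating many features per window practical.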
||||||
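Reviewer note on the **Cascade of Classifiers** passage: the early-rejection idea can be sketched in a few lines. This is only an illustration of the control flow described in the tutorial text. The data layout (a list of stages, each a list of `(feature_fn, threshold, weight)` triples plus a stage threshold) and the name `cascade_classify` are hypothetical and do not reflect OpenCV's internal representation.

```python
def cascade_classify(window, stages):
    """Run one window through a cascade and stop at the first failed stage.

    `stages` is a list of (weak_classifiers, stage_threshold) pairs. Each weak
    classifier is a (feature_fn, threshold, weight) triple: it votes with
    `weight` when feature_fn(window) >= threshold. These structures are
    illustrative assumptions, not a real library format.

    Returns (is_face, features_evaluated).
    """
    evaluated = 0
    for weak_classifiers, stage_threshold in stages:
        score = 0.0
        for feature_fn, threshold, weight in weak_classifiers:
            evaluated += 1
            if feature_fn(window) >= threshold:
                score += weight
        if score < stage_threshold:
            # Reject immediately; later stages are never evaluated.
            return False, evaluated
    # The window passed every stage.
    return True, evaluated
```

Because most windows are rejected by the small early stages, the average number of features evaluated per window stays far below the total, which matches the figure of roughly 10 out of 6000+ quoted in the text.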
@ -88,8 +87,8 @@ object like car, planes etc. you can use OpenCV to create one. Its full details
|
|||||||
[Cascade Classifier Training](@ref tutorial_traincascade).
|
[Cascade Classifier Training](@ref tutorial_traincascade).
|
||||||
|
|
||||||
Here we will deal with detection. OpenCV already contains many pre-trained classifiers for face,
|
Here we will deal with detection. OpenCV already contains many pre-trained classifiers for face,
|
||||||
eyes, smile etc. Those XML files are stored in opencv/data/haarcascades/ folder. Let's create face
|
eyes, smiles, etc. Those XML files are stored in the opencv/data/haarcascades/ folder. Let's create a
|
||||||
and eye detector with OpenCV.
|
face and eye detector with OpenCV.
|
||||||
|
|
||||||
First we need to load the required XML classifiers. Then load our input image (or video) in
|
First we need to load the required XML classifiers. Then load our input image (or video) in
|
||||||
grayscale mode.
|
grayscale mode.
|
||||||
|