mirror of
https://github.com/opencv/opencv.git
synced 2024-11-24 03:00:14 +08:00
Merge pull request #17313 from hunter-college-ossd-spr-2020:revise-knn-tutorials
* Revise and expand kNN Python tutorials
* Correct NPTEL link
parent
0e1c7eda39
commit
07c56f149f
@ -4,20 +4,20 @@ OCR of Hand-written Data using kNN {#tutorial_py_knn_opencv}
|
||||
Goal
|
||||
----
|
||||
|
||||
In this chapter
|
||||
- We will use our knowledge on kNN to build a basic OCR application.
|
||||
- We will try with Digits and Alphabets data available that comes with OpenCV.
|
||||
In this chapter:
|
||||
- We will use our knowledge on kNN to build a basic OCR (Optical Character Recognition) application.
|
||||
- We will try our application on Digits and Alphabets data that comes with OpenCV.
|
||||
|
||||
OCR of Hand-written Digits
|
||||
--------------------------
|
||||
|
||||
Our goal is to build an application which can read the handwritten digits. For this we need some
|
||||
train_data and test_data. OpenCV comes with an image digits.png (in the folder
|
||||
Our goal is to build an application which can read handwritten digits. For this we need some
|
||||
training data and some test data. OpenCV comes with an image digits.png (in the folder
|
||||
opencv/samples/data/) which has 5000 handwritten digits (500 for each digit). Each digit is
|
||||
a 20x20 image. So our first step is to split this image into 5000 different digits. For each digit,
|
||||
we flatten it into a single row with 400 pixels. That is our feature set, ie intensity values of all
|
||||
pixels. It is the simplest feature set we can create. We use first 250 samples of each digit as
|
||||
train_data, and next 250 samples as test_data. So let's prepare them first.
|
||||
a 20x20 image. So our first step is to split this image into 5000 different digit images. Then for each digit (20x20 image),
|
||||
we flatten it into a single row with 400 pixels. That is our feature set, i.e. intensity values of all
|
||||
pixels. It is the simplest feature set we can create. We use the first 250 samples of each digit as
|
||||
training data, and the other 250 samples as test data. So let's prepare them first.
|
||||
@code{.py}
|
||||
import numpy as np
|
||||
import cv2 as cv
|
||||
@ -28,10 +28,10 @@ gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
|
||||
# Now we split the image to 5000 cells, each 20x20 size
|
||||
cells = [np.hsplit(row,100) for row in np.vsplit(gray,50)]
|
||||
|
||||
# Make it into a Numpy array. It size will be (50,100,20,20)
|
||||
# Make it into a Numpy array: its size will be (50,100,20,20)
|
||||
x = np.array(cells)
|
||||
|
||||
# Now we prepare train_data and test_data.
|
||||
# Now we prepare the training data and test data
|
||||
train = x[:,:50].reshape(-1,400).astype(np.float32) # Size = (2500,400)
|
||||
test = x[:,50:100].reshape(-1,400).astype(np.float32) # Size = (2500,400)
|
||||
|
||||
@ -40,7 +40,7 @@ k = np.arange(10)
|
||||
train_labels = np.repeat(k,250)[:,np.newaxis]
|
||||
test_labels = train_labels.copy()
|
||||
|
||||
# Initiate kNN, train the data, then test it with test data for k=1
|
||||
# Initiate kNN, train it on the training data, then test it with the test data with k=5
|
||||
knn = cv.ml.KNearest_create()
|
||||
knn.train(train, cv.ml.ROW_SAMPLE, train_labels)
|
||||
ret,result,neighbours,dist = knn.findNearest(test,k=5)
|
||||
@ -52,13 +52,15 @@ correct = np.count_nonzero(matches)
|
||||
accuracy = correct*100.0/result.size
|
||||
print( accuracy )
|
||||
@endcode
|
||||
So our basic OCR app is ready. This particular example gave me an accuracy of 91%. One option
|
||||
improve accuracy is to add more data for training, especially the wrong ones. So instead of finding
|
||||
this training data every time I start application, I better save it, so that next time, I directly
|
||||
read this data from a file and start classification. You can do it with the help of some Numpy
|
||||
functions like np.savetxt, np.savez, np.load etc. Please check their docs for more details.
|
||||
So our basic OCR app is ready. This particular example gave me an accuracy of 91%. One option to
|
||||
improve accuracy is to add more data for training, especially for the digits where we had more errors.
|
||||
|
||||
Instead of finding
|
||||
this training data every time I start the application, I'd better save it, so that the next time, I can directly
|
||||
read this data from a file and start classification. This can be done with the help of some Numpy
|
||||
functions like np.savetxt, np.savez, np.load, etc. Please check the NumPy docs for more details.
|
||||
@code{.py}
|
||||
# save the data
|
||||
# Save the data
|
||||
np.savez('knn_data.npz',train=train, train_labels=train_labels)
|
||||
|
||||
# Now load the data
|
||||
@ -71,36 +73,36 @@ In my system, it takes around 4.4 MB of memory. Since we are using intensity val
|
||||
features, it would be better to convert the data to np.uint8 first and then save it. It takes only
|
||||
1.1 MB in this case. Then while loading, you can convert back into float32.
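
The size saving described above can be sketched as follows. This is only an illustration under stated assumptions: the random array stands in for the real (2500, 400) training matrix, the file names are hypothetical, and the values are assumed to be whole numbers in the range 0 to 255 so that the round trip through np.uint8 loses nothing.

```python
import os
import tempfile
import numpy as np

# Stand-in for the (2500, 400) float32 training matrix (hypothetical data):
# whole-number values in 0..255, like pixel intensities.
rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(2500, 400)).astype(np.float32)

# Hypothetical file names inside a temporary directory
tmpdir = tempfile.mkdtemp()
path_f32 = os.path.join(tmpdir, 'knn_data_f32.npz')
path_u8 = os.path.join(tmpdir, 'knn_data_u8.npz')

# Save once as float32 and once converted to uint8
np.savez(path_f32, train=train)
np.savez(path_u8, train=train.astype(np.uint8))

size_f32 = os.path.getsize(path_f32)
size_u8 = os.path.getsize(path_u8)
print(size_f32, size_u8)  # the uint8 file is roughly a quarter of the size

# While loading, convert back into float32 before passing it to kNN
with np.load(path_u8) as data:
    restored = data['train'].astype(np.float32)
```

The conversion is only safe because the features are integer intensities; features with fractional values would be truncated by np.uint8.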
|
||||
|
||||
OCR of English Alphabets
|
||||
OCR of the English Alphabet
|
||||
------------------------
|
||||
|
||||
Next we will do the same for English alphabets, but there is a slight change in data and feature
|
||||
Next we will do the same for the English alphabet, but there is a slight change in data and feature
|
||||
set. Here, instead of images, OpenCV comes with a data file, letter-recognition.data in
|
||||
opencv/samples/cpp/ folder. If you open it, you will see 20000 lines which may, on first sight, look
|
||||
like garbage. Actually, in each row, first column is an alphabet which is our label. Next 16 numbers
|
||||
following it are its different features. These features are obtained from [UCI Machine Learning
|
||||
like garbage. Actually, in each row, the first column is a letter which is our label. The next 16 numbers
|
||||
following it are the different features. These features are obtained from the [UCI Machine Learning
|
||||
Repository](http://archive.ics.uci.edu/ml/). You can find the details of these features in [this
|
||||
page](http://archive.ics.uci.edu/ml/datasets/Letter+Recognition).
|
||||
|
||||
There are 20000 samples available, so we take first 10000 data as training samples and remaining
|
||||
10000 as test samples. We should change the alphabets to ascii characters because we can't work with
|
||||
alphabets directly.
|
||||
There are 20000 samples available, so we take the first 10000 as training samples and the remaining
|
||||
10000 as test samples. We should convert the letters to numeric labels (e.g. 'A' becomes 0) because we can't work with
|
||||
letters directly.
|
||||
@code{.py}
|
||||
import cv2 as cv
|
||||
import numpy as np
|
||||
|
||||
# Load the data, converters convert the letter to a number
|
||||
# Load the data and convert the letters to numbers
|
||||
data= np.loadtxt('letter-recognition.data', dtype= 'float32', delimiter = ',',
|
||||
converters= {0: lambda ch: ord(ch)-ord('A')})
|
||||
|
||||
# split the data to two, 10000 each for train and test
|
||||
# Split the dataset in two, with 10000 samples each for training and test sets
|
||||
train, test = np.vsplit(data,2)
|
||||
|
||||
# split trainData and testData to features and responses
|
||||
# Split trainData and testData into features and responses
|
||||
responses, trainData = np.hsplit(train,[1])
|
||||
labels, testData = np.hsplit(test,[1])
|
||||
|
||||
# Initiate the kNN, classify, measure accuracy.
|
||||
# Initiate the kNN, classify, measure accuracy
|
||||
knn = cv.ml.KNearest_create()
|
||||
knn.train(trainData, cv.ml.ROW_SAMPLE, responses)
|
||||
ret, result, neighbours, dist = knn.findNearest(testData, k=5)
|
||||
@ -110,10 +112,12 @@ accuracy = correct*100.0/10000
|
||||
print( accuracy )
|
||||
@endcode
|
||||
It gives me an accuracy of 93.22%. Again, if you want to increase accuracy, you can iteratively add
|
||||
error data in each level.
|
||||
more data.
|
||||
|
||||
Additional Resources
|
||||
--------------------
|
||||
1. [Wikipedia article on Optical character recognition](https://en.wikipedia.org/wiki/Optical_character_recognition)
|
||||
|
||||
Exercises
|
||||
---------
|
||||
1. Here we used k=5. What happens if you try other values of k? Can you find a value that maximizes accuracy (minimizes the number of errors)?
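
As a starting point for this exercise, here is a minimal sketch of a sweep over k. To keep it self-contained it uses a small brute-force kNN written in plain NumPy and synthetic two-class data, both of which are assumptions for illustration; they are not the cv.ml.KNearest classifier or the digits data used above.

```python
import numpy as np

def knn_predict(train, train_labels, test, k):
    """Brute-force kNN: majority vote among the k closest training samples."""
    # Squared Euclidean distance between every test and training sample
    d2 = ((test[:, None, :] - train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1, kind='stable')[:, :k]
    votes = train_labels[nearest]  # shape (n_test, k)
    n_classes = int(train_labels.max()) + 1
    # Count the votes per class for each test sample
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=n_classes)
    return counts.argmax(axis=1)

# Synthetic data (hypothetical): two Gaussian blobs, one per class
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 3.0]])
labels = rng.integers(0, 2, size=400)
data = centers[labels] + rng.normal(size=(400, 2))
train, test = data[:200], data[200:]
train_labels, test_labels = labels[:200], labels[200:]

# Measure accuracy (in percent) for several odd values of k
accuracy = {}
for k in (1, 3, 5, 7, 9):
    pred = knn_predict(train, train_labels, test, k)
    accuracy[k] = 100.0 * np.mean(pred == test_labels)
    print(k, accuracy[k])
```

With the tutorial's own data, the same loop can wrap the findNearest call and the accuracy computation shown earlier.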
|
@ -4,61 +4,55 @@ Understanding k-Nearest Neighbour {#tutorial_py_knn_understanding}
|
||||
Goal
|
||||
----
|
||||
|
||||
In this chapter, we will understand the concepts of k-Nearest Neighbour (kNN) algorithm.
|
||||
In this chapter, we will understand the concepts of the k-Nearest Neighbour (kNN) algorithm.
|
||||
|
||||
Theory
|
||||
------
|
||||
|
||||
kNN is one of the simplest of classification algorithms available for supervised learning. The idea
|
||||
is to search for closest match of the test data in feature space. We will look into it with below
|
||||
kNN is one of the simplest classification algorithms available for supervised learning. The idea
|
||||
is to search for the closest match(es) of the test data in the feature space. We will look into it with the below
|
||||
image.
|
||||
|
||||
![image](images/knn_theory.png)
|
||||
|
||||
In the image, there are two families, Blue Squares and Red Triangles. We call each family as
|
||||
**Class**. Their houses are shown in their town map which we call feature space. *(You can consider
|
||||
a feature space as a space where all datas are projected. For example, consider a 2D coordinate
|
||||
space. Each data has two features, x and y coordinates. You can represent this data in your 2D
|
||||
coordinate space, right? Now imagine if there are three features, you need 3D space. Now consider N
|
||||
features, where you need N-dimensional space, right? This N-dimensional space is its feature space.
|
||||
In our image, you can consider it as a 2D case with two features)*.
|
||||
In the image, there are two families: Blue Squares and Red Triangles. We refer to each family as
|
||||
a **Class**. Their houses are shown in their town map which we call the **Feature Space**. You can consider
|
||||
a feature space as a space where all data are projected. For example, consider a 2D coordinate
|
||||
space. Each datum has two features, an x coordinate and a y coordinate. You can represent this datum in your 2D
|
||||
coordinate space, right? Now imagine that there are three features, you will need 3D space. Now consider N
|
||||
features: you need N-dimensional space, right? This N-dimensional space is its feature space.
|
||||
In our image, you can consider it as a 2D case with two features.
|
||||
|
||||
Now a new member comes into the town and creates a new home, which is shown as green circle. He
|
||||
should be added to one of these Blue/Red families. We call that process, **Classification**. What we
|
||||
do? Since we are dealing with kNN, let us apply this algorithm.
|
||||
Now consider what happens if a new member comes into the town and creates a new home, which is shown as the green circle. He
|
||||
should be added to one of these Blue or Red families (or *classes*). We call that process, **Classification**. How exactly should this new member be classified? Since we are dealing with kNN, let us apply the algorithm.
|
||||
|
||||
One method is to check who is his nearest neighbour. From the image, it is clear it is the Red
|
||||
Triangle family. So he is also added into Red Triangle. This method is called simply **Nearest
|
||||
Neighbour**, because classification depends only on the nearest neighbour.
|
||||
One simple method is to check who is his nearest neighbour. From the image, it is clear that it is a member of the Red
|
||||
Triangle family. So he is classified as a Red Triangle. This method is called simply **Nearest Neighbour** classification, because classification depends only on the *nearest neighbour*.
|
||||
|
||||
But there is a problem with that. Red Triangle may be the nearest. But what if there are lot of Blue
|
||||
Squares near to him? Then Blue Squares have more strength in that locality than Red Triangle. So
|
||||
just checking nearest one is not sufficient. Instead we check some k nearest families. Then whoever
|
||||
is majority in them, the new guy belongs to that family. In our image, let's take k=3, ie 3 nearest
|
||||
families. He has two Red and one Blue (there are two Blues equidistant, but since k=3, we take only
|
||||
But there is a problem with this approach! Red Triangle may be the nearest neighbour, but what if there are also a lot of Blue
|
||||
Squares nearby? Then Blue Squares have more strength in that locality than Red Triangles, so
|
||||
just checking the nearest one is not sufficient. Instead we may want to check some **k** nearest families. Then whichever family is the majority amongst them, the new guy should belong to that family. In our image, let's take k=3, i.e. consider the 3 nearest
|
||||
neighbours. The new member has two Red neighbours and one Blue neighbour (there are two Blues equidistant, but since k=3, we can take only
|
||||
one of them), so again he should be added to Red family. But what if we take k=7? Then he has 5 Blue
|
||||
families and 2 Red families. Great!! Now he should be added to Blue family. So it all changes with
|
||||
value of k. More funny thing is, what if k = 4? He has 2 Red and 2 Blue neighbours. It is a tie !!!
|
||||
So better take k as an odd number. So this method is called **k-Nearest Neighbour** since
|
||||
classification depends on k nearest neighbours.
|
||||
neighbours and 2 Red neighbours and should be added to the Blue family. The result will vary with the selected
|
||||
value of k. Note that if k is not an odd number, we can get a tie, as would happen in the above case with k=4. We would see that our new member has 2 Red and 2 Blue neighbours as his four nearest neighbours and we would need to choose a method for breaking the tie to perform classification. So to reiterate, this method is called **k-Nearest Neighbour** since
|
||||
classification depends on the *k nearest neighbours*.
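
The voting rule described above can be sketched in a few lines of plain Python. The house coordinates and the helper name knn_classify are made up for illustration; they do not correspond to the image or to any OpenCV function.

```python
from collections import Counter
import math

def knn_classify(points, labels, query, k):
    """Return the majority label among the k points closest to query."""
    # Sort the training indices by Euclidean distance to the query point
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical town: a Red house is the closest, but Blue dominates nearby
houses = [(1.0, 0.0), (0.0, 2.0), (2.0, 0.0), (0.0, -2.5), (5.0, 5.0)]
family = ['Red', 'Blue', 'Blue', 'Blue', 'Red']
newcomer = (0.0, 0.0)

print(knn_classify(houses, family, newcomer, k=1))  # Red
print(knn_classify(houses, family, newcomer, k=3))  # Blue
```

With an even k the vote can tie; this sketch then simply returns whichever tied label was encountered first, which is one arbitrary way of breaking the tie.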
|
||||
|
||||
Again, in kNN, it is true we are considering k neighbours, but we are giving equal importance to
|
||||
all, right? Is it justice? For example, take the case of k=4. We told it is a tie. But see, the 2
|
||||
Red families are more closer to him than the other 2 Blue families. So he is more eligible to be
|
||||
added to Red. So how do we mathematically explain that? We give some weights to each family
|
||||
depending on their distance to the new-comer. For those who are near to him get higher weights while
|
||||
those are far away get lower weights. Then we add total weights of each family separately. Whoever
|
||||
gets highest total weights, new-comer goes to that family. This is called **modified kNN**.
|
||||
all, right? Is this justified? For example, take the tied case of k=4. As we can see, the 2
|
||||
Red neighbours are actually closer to the new member than the other 2 Blue neighbours, so he is more eligible to be
|
||||
added to the Red family. How do we mathematically explain that? We give some weights to each neighbour
|
||||
depending on their distance to the new-comer: those who are nearer to him get higher weights, while
|
||||
those that are farther away get lower weights. Then we add the total weights of each family separately and classify the new-comer as part of whichever family
|
||||
received higher total weights. This is called **modified kNN** or **weighted kNN**.
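
A minimal sketch of this weighted vote follows. The text does not say how the weights are computed, so the inverse-distance weight used here is an assumption, as are the coordinates and the helper name weighted_knn_classify.

```python
from collections import defaultdict
import math

def weighted_knn_classify(points, labels, query, k, eps=1e-9):
    """Pick the label whose k-nearest members have the largest total weight."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    totals = defaultdict(float)
    for i in order[:k]:
        # Assumed weighting: 1 / distance; eps avoids division by zero
        totals[labels[i]] += 1.0 / (math.dist(points[i], query) + eps)
    return max(totals, key=totals.get)

# Hypothetical k=4 tie: 2 Red and 2 Blue neighbours, but the Reds are closer
houses = [(1.0, 0.0), (0.0, 1.0), (3.0, 0.0), (0.0, 3.0)]
family = ['Red', 'Red', 'Blue', 'Blue']

print(weighted_knn_classify(houses, family, (0.0, 0.0), k=4))  # Red
```

Here each Red neighbour contributes a weight of about 1 and each Blue neighbour about 1/3, so the tie from the unweighted vote is resolved in favour of Red.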
|
||||
|
||||
So what are some important things you see here?
|
||||
|
||||
- You need to have information about all the houses in town, right? Because, we have to check
|
||||
the distance from new-comer to all the existing houses to find the nearest neighbour. If there
|
||||
are plenty of houses and families, it takes lots of memory, and more time for calculation
|
||||
also.
|
||||
- There is almost zero time for any kind of training or preparation.
|
||||
- Because we have to check
|
||||
the distance from the new-comer to all the existing houses to find the nearest neighbour(s), you need to have information about all of the houses in town, right? If there are plenty of houses and families, it takes a lot of memory, and also more time for calculation.
|
||||
- There is almost zero time for any kind of "training" or preparation. Our "learning" involves only memorizing (storing) the data, before testing and classifying.
|
||||
|
||||
Now let's see it in OpenCV.
|
||||
Now let's see this algorithm at work in OpenCV.
|
||||
|
||||
kNN in OpenCV
|
||||
-------------
|
||||
@ -67,11 +61,11 @@ We will do a simple example here, with two families (classes), just like above.
|
||||
chapter, we will do an even better example.
|
||||
|
||||
So here, we label the Red family as **Class-0** (so denoted by 0) and Blue family as **Class-1**
|
||||
(denoted by 1). We create 25 families or 25 training data, and label them either Class-0 or Class-1.
|
||||
We do all these with the help of Random Number Generator in Numpy.
|
||||
(denoted by 1). We create 25 neighbours or 25 training data, and label each of them as either part of Class-0 or Class-1.
|
||||
We can do this with the help of a Random Number Generator from NumPy.
|
||||
|
||||
Then we plot it with the help of Matplotlib. Red families are shown as Red Triangles and Blue
|
||||
families are shown as Blue Squares.
|
||||
Then we can plot it with the help of Matplotlib. Red neighbours are shown as Red Triangles and Blue
|
||||
neighbours are shown as Blue Squares.
|
||||
@code{.py}
|
||||
import cv2 as cv
|
||||
import numpy as np
|
||||
@ -80,36 +74,36 @@ import matplotlib.pyplot as plt
|
||||
# Feature set containing (x,y) values of 25 known/training data
|
||||
trainData = np.random.randint(0,100,(25,2)).astype(np.float32)
|
||||
|
||||
# Labels each one either Red or Blue with numbers 0 and 1
|
||||
# Label each one either Red or Blue with numbers 0 and 1
|
||||
responses = np.random.randint(0,2,(25,1)).astype(np.float32)
|
||||
|
||||
# Take Red families and plot them
|
||||
# Take Red neighbours and plot them
|
||||
red = trainData[responses.ravel()==0]
|
||||
plt.scatter(red[:,0],red[:,1],80,'r','^')
|
||||
|
||||
# Take Blue families and plot them
|
||||
# Take Blue neighbours and plot them
|
||||
blue = trainData[responses.ravel()==1]
|
||||
plt.scatter(blue[:,0],blue[:,1],80,'b','s')
|
||||
|
||||
plt.show()
|
||||
@endcode
|
||||
You will get something similar to our first image. Since you are using random number generator, you
|
||||
will be getting different data each time you run the code.
|
||||
You will get something similar to our first image. Since you are using a random number generator, you
|
||||
will get different data each time you run the code.
|
||||
|
||||
Next initiate the kNN algorithm and pass the trainData and responses to train the kNN (It constructs
|
||||
a search tree).
|
||||
Next initiate the kNN algorithm and pass the trainData and responses to train the kNN. (Under the hood, it constructs
|
||||
a search tree: see the Additional Resources section below for more information on this.)
|
||||
|
||||
Then we will bring one new-comer and classify him to a family with the help of kNN in OpenCV. Before
|
||||
going to kNN, we need to know something on our test data (data of new comers). Our data should be a
|
||||
Then we will bring one new-comer and classify him as belonging to a family with the help of kNN in OpenCV. Before
|
||||
running kNN, we need to know something about our test data (data of new-comers). Our data should be a
|
||||
floating point array with size \f$number \; of \; testdata \times number \; of \; features\f$. Then we
|
||||
find the nearest neighbours of new-comer. We can specify how many neighbours we want. It returns:
|
||||
find the nearest neighbours of the new-comer. We can specify *k*: how many neighbours we want. (Here we used 3.) It returns:
|
||||
|
||||
-# The label given to new-comer depending upon the kNN theory we saw earlier. If you want Nearest
|
||||
Neighbour algorithm, just specify k=1 where k is the number of neighbours.
|
||||
2. The labels of k-Nearest Neighbours.
|
||||
3. Corresponding distances from new-comer to each nearest neighbour.
|
||||
1. The label given to the new-comer depending upon the kNN theory we saw earlier. If you want the *Nearest
|
||||
Neighbour* algorithm, just specify k=1.
|
||||
2. The labels of the k-Nearest Neighbours.
|
||||
3. The corresponding distances from the new-comer to each nearest neighbour.
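
The required layout of the test data (one row per new-comer, one column per feature, floating point) can be sketched with NumPy alone. The coordinate values below are made up for illustration.

```python
import numpy as np

# A single new-comer with two features (x, y): shape must be (1, 2), not (2,)
point = [42, 17]
single = np.array(point, dtype=np.float32).reshape(1, -1)

# Several new-comers at once: one row per new-comer
batch = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.float32)

print(single.shape, single.dtype)  # (1, 2) float32
print(batch.shape, batch.dtype)    # (3, 2) float32
```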
|
||||
|
||||
So let's see how it works. New comer is marked in green color.
|
||||
So let's see how it works. The new-comer is marked in green.
|
||||
@code{.py}
|
||||
newcomer = np.random.randint(0,100,(1,2)).astype(np.float32)
|
||||
plt.scatter(newcomer[:,0],newcomer[:,1],80,'g','o')
|
||||
@ -124,21 +118,21 @@ print( "distance: {}\n".format(dist) )
|
||||
|
||||
plt.show()
|
||||
@endcode
|
||||
I got the result as follows:
|
||||
I got the following results:
|
||||
@code{.py}
|
||||
result: [[ 1.]]
|
||||
neighbours: [[ 1. 1. 1.]]
|
||||
distance: [[ 53. 58. 61.]]
|
||||
@endcode
|
||||
It says our new-comer got 3 neighbours, all from Blue family. Therefore, he is labelled as Blue
|
||||
family. It is obvious from plot below:
|
||||
It says that our new-comer's 3 nearest neighbours are all from the Blue family. Therefore, he is labelled as part of the Blue
|
||||
family. It is obvious from the plot below:
|
||||
|
||||
![image](images/knn_simple.png)
|
||||
|
||||
If you have large number of data, you can just pass it as array. Corresponding results are also
|
||||
If you have multiple new-comers (test data), you can just pass them as an array. Corresponding results are also
|
||||
obtained as arrays.
|
||||
@code{.py}
|
||||
# 10 new comers
|
||||
# 10 new-comers
|
||||
newcomers = np.random.randint(0,100,(10,2)).astype(np.float32)
|
||||
ret, results,neighbours,dist = knn.findNearest(newcomers, 3)
|
||||
# The results also will contain 10 labels.
|
||||
@ -146,8 +140,11 @@ ret, results,neighbours,dist = knn.findNearest(newcomer, 3)
|
||||
Additional Resources
|
||||
--------------------
|
||||
|
||||
-# [NPTEL notes on Pattern Recognition, Chapter
|
||||
11](http://www.nptel.iitm.ac.in/courses/106108057/12)
|
||||
1. [NPTEL notes on Pattern Recognition, Chapter
|
||||
11](https://nptel.ac.in/courses/106/108/106108057/)
|
||||
2. [Wikipedia article on Nearest neighbor search](https://en.wikipedia.org/wiki/Nearest_neighbor_search)
|
||||
3. [Wikipedia article on k-d tree](https://en.wikipedia.org/wiki/K-d_tree)
|
||||
|
||||
Exercises
|
||||
---------
|
||||
1. Try repeating the above with more classes and different choices of k. Does choosing k become harder with more classes in the same 2D feature space?
|