2011-05-16 03:15:36 +08:00
Expectation Maximization
2011-02-23 04:43:26 +08:00
========================
2011-05-16 03:15:36 +08:00
The EM (Expectation Maximization) algorithm estimates the parameters of the multivariate probability density function in the form of a Gaussian mixture distribution with a specified number of mixtures.
2011-02-23 04:43:26 +08:00
2011-05-16 03:15:36 +08:00
Consider the set of the
:math:`x_1, x_2,...,x_{N}` : N feature vectors?? from a d-dimensional Euclidean space drawn from a Gaussian mixture:
2011-02-23 04:43:26 +08:00
.. math ::
2011-02-26 19:05:10 +08:00
p(x;a_k,S_k, \pi _k) = \sum _{k=1}^{m} \pi _kp_k(x), \quad \pi _k \geq 0, \quad \sum _{k=1}^{m} \pi _k=1,
2011-02-23 04:43:26 +08:00
.. math ::
2011-02-26 19:05:10 +08:00
p_k(x)= \varphi (x;a_k,S_k)= \frac{1}{(2\pi)^{d/2}\mid{S_k}\mid^{1/2}} exp \left \{ - \frac{1}{2} (x-a_k)^TS_k^{-1}(x-a_k) \right \} ,
2011-02-23 04:43:26 +08:00
2011-02-26 19:05:10 +08:00
where
:math: `m` is the number of mixtures,
:math: `p_k` is the normal distribution
density with the mean
:math: `a_k` and covariance matrix
2011-05-16 03:15:36 +08:00
:math: `S_k` ,
:math: `\pi_k` is the weight of the k-th mixture. Given the number of mixtures
2011-02-26 19:05:10 +08:00
:math: `M` and the samples
2011-05-16 03:15:36 +08:00
:math: `x_i` ,
:math: `i=1..N` the algorithm finds the
maximum-likelihood estimates (MLE) of all the mixture parameters,
that is,
:math: `a_k` ,
:math: `S_k` and
2011-02-26 19:05:10 +08:00
:math: `\pi_k` :
2011-02-23 04:43:26 +08:00
.. math ::
2011-02-26 19:05:10 +08:00
L(x, \theta )=logp(x, \theta )= \sum _{i=1}^{N}log \left ( \sum _{k=1}^{m} \pi _kp_k(x) \right ) \to \max _{ \theta \in \Theta },
2011-02-23 04:43:26 +08:00
.. math ::
2011-02-26 19:05:10 +08:00
\Theta = \left \{ (a_k,S_k, \pi _k): a_k \in \mathbbm{R} ^d,S_k=S_k^T>0,S_k \in \mathbbm{R} ^{d \times d}, \pi _k \geq 0, \sum _{k=1}^{m} \pi _k=1 \right \} .
2011-02-23 04:43:26 +08:00
2011-05-16 03:15:36 +08:00
The EM algorithm is an iterative procedure. Each iteration includes
two steps. At the first step (Expectation step or E-step), you find a
2011-02-26 19:05:10 +08:00
probability
:math: `p_{i,k}` (denoted
:math: `\alpha_{i,k}` in the formula below) of
sample `` i `` to belong to mixture `` k `` using the currently
2011-02-23 04:43:26 +08:00
available mixture parameter estimates:
.. math ::
2011-02-26 19:05:10 +08:00
\alpha _{ki} = \frac{\pi_k\varphi(x;a_k,S_k)}{\sum\limits_{j=1}^{m}\pi_j\varphi(x;a_j,S_j)} .
2011-02-23 04:43:26 +08:00
2011-05-16 03:15:36 +08:00
At the second step (Maximization step or M-step), the mixture parameter estimates are refined using the computed probabilities:
2011-02-23 04:43:26 +08:00
.. math ::
2011-05-16 03:15:36 +08:00
\pi _k= \frac{1}{N} \sum _{i=1}^{N} \alpha _{ki}, \quad a_k= \frac{\sum\limits_{i=1}^{N}\alpha_{ki}x_i}{\sum\limits_{i=1}^{N}\alpha_{ki}} , \quad S_k= \frac{\sum\limits_{i=1}^{N}\alpha_{ki}(x_i-a_k)(x_i-a_k)^T}{\sum\limits_{i=1}^{N}\alpha_{ki}}
2011-02-23 04:43:26 +08:00
2011-02-26 19:05:10 +08:00
Alternatively, the algorithm may start with the M-step when the initial values for
:math: `p_{i,k}` can be provided. Another alternative when
2011-05-16 03:15:36 +08:00
:math: `p_{i,k}` are unknown is to use a simpler clustering algorithm to pre-cluster the input samples and thus obtain initial
:math: `p_{i,k}` . Often (including ML) the
2011-03-09 06:22:24 +08:00
:ref: `kmeans` algorithm is used for that purpose.
2011-02-23 04:43:26 +08:00
2011-05-16 03:15:36 +08:00
One of the main problems?? the EM algorithm should deal with is a large number
of parameters to estimate. The majority of the parameters reside in
2011-02-26 19:05:10 +08:00
covariance matrices, which are
:math: `d \times d` elements each
2011-05-16 03:15:36 +08:00
where
:math: `d` is the feature space dimensionality. However, in
many practical problems, the covariance matrices are close to diagonal
2011-02-26 19:05:10 +08:00
or even to
:math: `\mu_k*I` , where
2011-05-16 03:15:36 +08:00
:math: `I` is an identity matrix and
:math: `\mu_k` is a mixture-dependent "scale" parameter. So, a robust computation
scheme could start with harder constraints on the covariance
2011-02-23 04:43:26 +08:00
matrices and then use the estimated parameters as an input for a less
constrained optimization problem (often a diagonal covariance matrix is
already a good enough approximation).
**References:**
*
2011-05-16 03:15:36 +08:00
Bilmes98 J. A. Bilmes. *A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models* . Technical Report TR-97-021, International Computer Science Institute and Computer Science Division, University of California at Berkeley, April 1998.
2011-02-23 04:43:26 +08:00
.. index :: CvEMParams
.. _CvEMParams:
CvEMParams
----------
2011-03-01 05:26:43 +08:00
.. c:type :: CvEMParams
2011-02-23 04:43:26 +08:00
2011-05-16 03:15:36 +08:00
Parameters of the EM algorithm ::
2011-02-23 04:43:26 +08:00
struct CvEMParams
{
CvEMParams() : nclusters(10), cov_mat_type(CvEM::COV_MAT_DIAGONAL),
2011-02-26 19:05:10 +08:00
start_step(CvEM::START_AUTO_STEP), probs(0), weights(0), means(0),
2011-02-23 04:43:26 +08:00
covs(0)
{
2011-02-26 19:05:10 +08:00
term_crit=cvTermCriteria( CV_TERMCRIT_ITER+CV_TERMCRIT_EPS,
2011-02-23 04:43:26 +08:00
100, FLT_EPSILON );
}
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
CvEMParams( int _nclusters, int _cov_mat_type=1/*CvEM::COV_MAT_DIAGONAL* /,
int _start_step=0/*CvEM::START_AUTO_STEP* /,
CvTermCriteria _term_crit=cvTermCriteria(
2011-02-26 19:05:10 +08:00
CV_TERMCRIT_ITER+CV_TERMCRIT_EPS,
2011-02-23 04:43:26 +08:00
100, FLT_EPSILON),
2011-02-26 19:05:10 +08:00
CvMat* _probs=0, CvMat* _weights=0,
2011-02-23 04:43:26 +08:00
CvMat* _means=0, CvMat* * _covs=0 ) :
2011-02-26 19:05:10 +08:00
nclusters(_nclusters), cov_mat_type(_cov_mat_type),
2011-02-23 04:43:26 +08:00
start_step(_start_step),
2011-02-26 19:05:10 +08:00
probs(_probs), weights(_weights), means(_means), covs(_covs),
2011-02-23 04:43:26 +08:00
term_crit(_term_crit)
{}
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
int nclusters;
int cov_mat_type;
int start_step;
const CvMat* probs;
const CvMat* weights;
const CvMat* means;
const CvMat** covs;
CvTermCriteria term_crit;
};
2011-03-03 15:29:55 +08:00
2011-02-23 04:43:26 +08:00
2011-05-16 03:15:36 +08:00
The structure has two constructors. The default one represents a rough rule-of-the-thumb. With another one it is possible to override a variety of parameters from a single number of mixtures (the only essential problem-dependent parameter) to initial values for the mixture parameters.
2011-02-23 04:43:26 +08:00
.. index :: CvEM
.. _CvEM:
CvEM
----
2011-03-01 05:26:43 +08:00
.. c:type :: CvEM
2011-02-23 04:43:26 +08:00
2011-05-16 03:15:36 +08:00
EM model ::
2011-02-23 04:43:26 +08:00
class CV_EXPORTS CvEM : public CvStatModel
{
public:
// Type of covariance matrices
enum { COV_MAT_SPHERICAL=0, COV_MAT_DIAGONAL=1, COV_MAT_GENERIC=2 };
2011-02-26 19:05:10 +08:00
2011-05-16 03:15:36 +08:00
// Initial step
2011-02-23 04:43:26 +08:00
enum { START_E_STEP=1, START_M_STEP=2, START_AUTO_STEP=0 };
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
CvEM();
CvEM( const CvMat* samples, const CvMat* sample_idx=0,
CvEMParams params=CvEMParams(), CvMat* labels=0 );
virtual ~CvEM();
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
virtual bool train( const CvMat* samples, const CvMat* sample_idx=0,
CvEMParams params=CvEMParams(), CvMat* labels=0 );
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
virtual float predict( const CvMat* sample, CvMat* probs ) const;
virtual void clear();
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
int get_nclusters() const { return params.nclusters; }
const CvMat* get_means() const { return means; }
const CvMat** get_covs() const { return covs; }
const CvMat* get_weights() const { return weights; }
const CvMat* get_probs() const { return probs; }
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
protected:
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
virtual void set_params( const CvEMParams& params,
const CvVectors& train_data );
virtual void init_em( const CvVectors& train_data );
virtual double run_em( const CvVectors& train_data );
virtual void init_auto( const CvVectors& samples );
virtual void kmeans( const CvVectors& train_data, int nclusters,
CvMat* labels, CvTermCriteria criteria,
const CvMat* means );
CvEMParams params;
double log_likelihood;
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
CvMat* means;
CvMat** covs;
CvMat* weights;
CvMat* probs;
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
CvMat* log_weight_div_det;
CvMat* inv_eigen_values;
CvMat** cov_rotate_mats;
};
2011-03-03 15:29:55 +08:00
2011-02-23 04:43:26 +08:00
.. index :: CvEM::train
.. _CvEM :: train:
CvEM::train
-----------
2011-03-01 05:26:43 +08:00
.. c:function :: void CvEM::train( const CvMat* samples, const CvMat* sample_idx=0, CvEMParams params=CvEMParams(), CvMat* labels=0 )
2011-02-23 04:43:26 +08:00
2011-05-16 03:15:36 +08:00
Estimates the Gaussian mixture parameters from a sample set.
2011-02-23 04:43:26 +08:00
2011-05-16 03:15:36 +08:00
Unlike many of the ML models, EM is an unsupervised learning algorithm and it does not take responses (class labels or function values) as input. Instead, it computes the
*Maximum Likelihood Estimate* of the Gaussian mixture parameters from an input sample set, stores all the parameters inside the structure:
:math: `p_{i,k}` in `` probs `` ,
:math: `a_k` in `` means `` ,
:math: `S_k` in `` covs[k] `` ,
:math: `\pi_k` in `` weights `` , and optionally computes the output "class label" for each sample:
:math: `\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N` (indices of the most probable mixture for each sample).
2011-02-23 04:43:26 +08:00
2011-05-16 03:15:36 +08:00
The trained model can be used further for prediction, just like any other classifier. The trained model is similar to the
2011-03-09 06:22:24 +08:00
:ref: `Bayes classifier` .
2011-02-23 04:43:26 +08:00
2011-02-26 19:05:10 +08:00
Example: Clustering random samples of multi-Gaussian distribution using EM ::
2011-02-23 04:43:26 +08:00
#include "ml.h"
#include "highgui.h"
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
int main( int argc, char** argv )
{
const int N = 4;
const int N1 = (int)sqrt((double)N);
const CvScalar colors[] = {{0,0,255}},{{0,255,0}},
{{0,255,255}},{{255,255,0}
;
int i, j;
int nsamples = 100;
CvRNG rng_state = cvRNG(-1);
CvMat* samples = cvCreateMat( nsamples, 2, CV_32FC1 );
CvMat* labels = cvCreateMat( nsamples, 1, CV_32SC1 );
IplImage* img = cvCreateImage( cvSize( 500, 500 ), 8, 3 );
float _sample[2];
CvMat sample = cvMat( 1, 2, CV_32FC1, _sample );
CvEM em_model;
CvEMParams params;
CvMat samples_part;
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
cvReshape( samples, samples, 2, 0 );
for( i = 0; i < N; i++ )
{
CvScalar mean, sigma;
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
// form the training samples
2011-02-26 19:05:10 +08:00
cvGetRows( samples, &samples_part, i*nsamples/N,
2011-02-23 04:43:26 +08:00
(i+1)*nsamples/N );
mean = cvScalar(((i
((i/N1)+1.)*img->height/(N1+1));
sigma = cvScalar(30,30);
2011-02-26 19:05:10 +08:00
cvRandArr( &rng_state, &samples_part, CV_RAND_NORMAL,
2011-02-23 04:43:26 +08:00
mean, sigma );
}
cvReshape( samples, samples, 1, 0 );
2011-02-26 19:05:10 +08:00
2011-05-16 03:15:36 +08:00
// initialize model parameters
2011-02-23 04:43:26 +08:00
params.covs = NULL;
params.means = NULL;
params.weights = NULL;
params.probs = NULL;
params.nclusters = N;
params.cov_mat_type = CvEM::COV_MAT_SPHERICAL;
params.start_step = CvEM::START_AUTO_STEP;
params.term_crit.max_iter = 10;
params.term_crit.epsilon = 0.1;
params.term_crit.type = CV_TERMCRIT_ITER|CV_TERMCRIT_EPS;
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
// cluster the data
em_model.train( samples, 0, params, labels );
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
#if 0
// the piece of code shows how to repeatedly optimize the model
2011-02-26 19:05:10 +08:00
// with less-constrained parameters
2011-02-23 04:43:26 +08:00
//(COV_MAT_DIAGONAL instead of COV_MAT_SPHERICAL)
2011-05-16 03:15:36 +08:00
// when the output of the first stage is used as input for the second one.
2011-02-23 04:43:26 +08:00
CvEM em_model2;
params.cov_mat_type = CvEM::COV_MAT_DIAGONAL;
params.start_step = CvEM::START_E_STEP;
params.means = em_model.get_means();
params.covs = (const CvMat**)em_model.get_covs();
params.weights = em_model.get_weights();
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
em_model2.train( samples, 0, params, labels );
2011-02-26 19:05:10 +08:00
// to use em_model2, replace em_model.predict()
2011-02-23 04:43:26 +08:00
// with em_model2.predict() below
#endif
// classify every image pixel
cvZero( img );
for( i = 0; i < img->height; i++ )
{
for( j = 0; j < img->width; j++ )
{
CvPoint pt = cvPoint(j, i);
sample.data.fl[0] = (float)j;
sample.data.fl[1] = (float)i;
int response = cvRound(em_model.predict( &sample, NULL ));
CvScalar c = colors[response];
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
cvCircle( img, pt, 1, cvScalar(c.val[0]*0.75,
c.val[1]*0.75,c.val[2]* 0.75), CV_FILLED );
}
}
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
//draw the clustered samples
for( i = 0; i < nsamples; i++ )
{
CvPoint pt;
pt.x = cvRound(samples->data.fl[i*2]);
pt.y = cvRound(samples->data.fl[i*2+1]);
cvCircle( img, pt, 1, colors[labels->data.i[i]], CV_FILLED );
}
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
cvNamedWindow( "EM-clustering result", 1 );
cvShowImage( "EM-clustering result", img );
cvWaitKey(0);
2011-02-26 19:05:10 +08:00
2011-02-23 04:43:26 +08:00
cvReleaseMat( &samples );
cvReleaseMat( &labels );
return 0;
}
2011-03-03 15:29:55 +08:00
2011-02-23 04:43:26 +08:00