
\section{Object Recognition}

\subsection{Bag of Visual Words Matching}

The functions and classes described in this section allow OpenCV's 2D feature descriptors to be used in a bag of words framework, first described in \cite{sivic_zisserman_2003}.

\cvclass{BasicBOWTrainer}
Class used for training visual vocabularies using the bag of words approach.

\begin{lstlisting}
class BasicBOWTrainer : public BOWTrainer
{
public:
    BasicBOWTrainer(const int ClusterCenters);
    //generate vocabulary - input should contain one row per descriptor
    void compute(const Mat& descriptors, Mat& vocabulary);
    void saveVocabulary(const std::string filename, const Mat& vocabulary);
};
\end{lstlisting}

The class must be initialized using \texttt{BasicBOWTrainer(clusterCenterCount)}, where \texttt{clusterCenterCount} specifies how many visual words to learn during the training stage.

\cvCppFunc{BasicBOWTrainer::compute}
Computes a code-book of visual words or \emph{vocabulary} given a set of input descriptor vectors.

\cvdefCpp{void compute(const Mat\& descriptors, Mat\& vocabulary);}
\begin{description}
\cvarg{descriptors}{ Matrix of type CV\_32F containing the features (descriptors) to cluster to generate the code book. The size of the matrix is num\_features x feature\_dimensionality.}
\cvarg{vocabulary}{ Matrix of type CV\_32F which is filled with the code book visual words trained from the input descriptor set. The size of the matrix is cluster\_center\_count x feature\_dimensionality.}
\end{description}

\cvCppFunc{BasicBOWTrainer::saveVocabulary}
Saves a trained vocabulary to a file for later use, e.g. by the BOWGenerator class.

\cvdefCpp{void saveVocabulary(const std::string filename, const Mat\& vocabulary);}
\begin{description}
\cvarg{filename}{ Filename to save the vocabulary to.}
\cvarg{vocabulary}{ Matrix of type CV\_32F as returned from BasicBOWTrainer::compute.}
\end{description}

\cvclass{BOWGenerator}
Class used for generating image descriptors or `bag-of-visual-words' vectors for an image, given a set of keypoints and a vocabulary of visual words.

\begin{lstlisting}
template<class dExtractor>
class BOWGenerator: public ImagedescGenerator
{
public:
    /* constructors */
    BOWGenerator(const Mat& vocabulary);
    BOWGenerator(const std::string vocabulary);
    /* 'Bag of visual word' descriptor computation */
    void compute(KeyPointCollection& keypoints, Mat& image_descs);
    void compute(const Mat& image, std::vector<KeyPoint>& points,
        Mat& image_desc);
    void compute(const Mat& image, std::vector<KeyPoint>& points,
        Mat& image_desc, std::vector<std::vector<KeyPoint> >& keypoint_data);
    void compute(KeyPointCollection& keypoints, Mat& image_descs,
        std::vector<std::vector<std::vector<KeyPoint> > >& keypoint_data);
};
\end{lstlisting}

The class must first be initialized with a vocabulary of visual words trained using the BasicBOWTrainer class. Such a vocabulary can be specified directly by calling the \texttt{BOWGenerator(const Mat\& vocabulary)} constructor with a pre-computed vocabulary in the form of an OpenCV matrix. Alternatively, the \texttt{BOWGenerator(const std::string vocabulary)} constructor can be used, which loads a visual vocabulary previously saved to file using \texttt{BasicBOWTrainer::saveVocabulary}.

This is a template class, and it must also be initialized with a class type parameter derived from the features\_2d::DescriptorExtractor abstract base class. A simple example of usage in conjunction with the BasicBOWTrainer class and using SURF descriptors might be as follows:

\begin{lstlisting}
cv::Mat all_descriptors, voc_vocab;
//-- load descriptors from training images into all_descriptors matrix --
//Train a vocabulary of visual words using BasicBOWTrainer class
cv::BasicBOWTrainer bow_trainer(5000);
bow_trainer.compute(all_descriptors, voc_vocab);
//Initialize BOWGenerator using the trained visual vocabulary and
// specify that SURF visual features should be used when extracting
// feature descriptors
cv::BOWGenerator<cv::SurfDescriptorExtractor> bow_gen(voc_vocab);
\end{lstlisting}

Once the class has been properly initialized with a visual vocabulary and extractor type, the \texttt{BOWGenerator::compute} member function can be used to compute image descriptors using the vocabulary.

\cvCppFunc{BOWGenerator::compute}
Computes the `bag-of-visual-words' vector for a set of keypoints using the currently loaded visual vocabulary. There are several different ways in which this can be called. The most basic way is to compute the image descriptor for a single image using:

\cvdefCpp{void compute(const Mat\& image, std::vector<KeyPoint>\& points,
    Mat\& image\_desc);}
\begin{description}
\cvarg{image}{ Source image for which to compute the image descriptor.}
\cvarg{points}{ A vector of keypoints extracted from the image using a class derived from features\_2d::FeatureDetector.}
\cvarg{image\_desc}{ A vector of type CV\_32F in which the bag of words vector for \texttt{image} is returned. The vector is of the same length as the size of the visual vocabulary in use.}
\end{description}
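
The idea behind the returned vector can be illustrated without OpenCV. The following is a minimal sketch, not the library implementation: it assumes that each descriptor is assigned to its nearest visual word by squared Euclidean distance and that the vector holds raw occurrence counts (the actual assignment rule and any normalization used by \texttt{BOWGenerator} are not specified here). The names \texttt{RowMatrix} and \texttt{bowHistogramSketch} are hypothetical.

\begin{lstlisting}
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for a CV_32F matrix with one row per vector.
typedef std::vector<std::vector<float> > RowMatrix;

// Sketch only: counts how many descriptors fall closest to each visual word.
// Assumes nearest-word assignment by squared Euclidean distance.
std::vector<float> bowHistogramSketch(const RowMatrix& descriptors,
    const RowMatrix& vocabulary)
{
    assert(!vocabulary.empty());
    // one bin per visual word, so the result has vocabulary.size() entries
    std::vector<float> histogram(vocabulary.size(), 0.0f);
    for (std::size_t d = 0; d < descriptors.size(); ++d)
    {
        std::size_t best_word = 0;
        float best_dist = std::numeric_limits<float>::max();
        for (std::size_t w = 0; w < vocabulary.size(); ++w)
        {
            assert(descriptors[d].size() == vocabulary[w].size());
            float dist = 0.0f;
            for (std::size_t k = 0; k < vocabulary[w].size(); ++k)
            {
                const float diff = descriptors[d][k] - vocabulary[w][k];
                dist += diff * diff;
            }
            if (dist < best_dist)
            {
                best_dist = dist;
                best_word = w;
            }
        }
        histogram[best_word] += 1.0f;
    }
    return histogram;
}
\end{lstlisting}

The length of the result equals the number of rows in the vocabulary, which matches the description of \texttt{image\_desc} above.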

If information about the specific keypoint to which a given visual word occurrence in the returned image descriptor relates is required (e.g. to incorporate a spatial verification stage when matching bag of words vectors), the following overloaded version of the function can be used:

\cvdefCpp{void compute(const Mat\& image, std::vector<KeyPoint>\& points,
    Mat\& image\_desc, std::vector<std::vector<KeyPoint> >\& keypoint\_data);}
A two dimensional vector of keypoints is returned in \texttt{keypoint\_data} which can be used to establish which keypoint in the \texttt{points} input array each visual word occurrence in \texttt{image\_desc} relates to. Keypoints can be indexed in the form:
\[
\texttt{keypoint\_data}[\texttt{visual\_word\_index}][\texttt{occurrence\_index}]
\]

Finally, bag of words vectors can be returned for multiple images at the same time by passing image and keypoint data to the function using a KeyPointCollection structure. There are two versions which correspond to the overloaded function calls for single images above:

\cvdefCpp{void compute(KeyPointCollection\& keypoints, Mat\& image\_descs);
void compute(KeyPointCollection\& keypoints, Mat\& image\_descs, std::vector<std::vector<std::vector<KeyPoint> > >\& keypoint\_data);}

In the case of the version which returns keypoint data, as the keypoints from multiple images have now been used, \texttt{keypoint\_data} is indexed in the form:
\[
\texttt{keypoint\_data}[\texttt{image\_index}][\texttt{visual\_word\_index}][\texttt{occurrence\_index}]
\]

\subsection{PASCAL VOC Datasets}

This section documents OpenCV's interface to the PASCAL Visual Object Classes Challenge datasets\footnote{http://pascallin.ecs.soton.ac.uk/challenges/VOC/}. The interface can be used to load data from all VOC datasets from VOC2007 up to and including the most recent (VOC2010) and to evaluate the performance of a given approach to object recognition in a standardized manner. The VOC2005 and VOC2006 datasets are currently unsupported due to differences in the way these older datasets store ground truth data.

The interface conforms to the guidelines provided by the PASCAL VOC development kit\footnote{http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2010/index.html\#devkit}. It can be used to evaluate results, and to output standard VOC results files, for both the classification and detection tasks. It can also output standard classification performance metrics such as precision, recall and average precision (AP) for a given object classification/query result.

\cvclass{VocData}
Class used to encapsulate all input/output operations to the PASCAL VOC dataset and compute standard performance metrics for a given object classification/query result.

\begin{lstlisting}
class VocData : public ObdData
{
public:
    /* constructors */
    VocData(std::string rootDir, bool useTestDataset,
        VocDataset dataset = CV_VOC2010);
    /* functions for returning classification/object data for multiple
        images given an object class */
    void getClassImages(const std::string& obj_class,
        const ObdDatasetType dataset, vector<ObdImage>& images,
        vector<bool>& object_present);
    void getClassObjects(const std::string& obj_class,
        const ObdDatasetType dataset, vector<ObdImage>& images,
        vector<vector<ObdObject> >& objects);
    void getClassObjects(const std::string& obj_class,
        const ObdDatasetType dataset, vector<ObdImage>& images,
        vector<vector<ObdObject> >& objects,
        vector<vector<VocObjectData> >& object_data,
        vector<VocGT>& ground_truth);
    /* functions for returning object data for a single image
        given an image id */
    ObdImage getObjects(const std::string& id, vector<ObdObject>& objects);
    ObdImage getObjects(const std::string& id, vector<ObdObject>& objects,
        vector<VocObjectData>& object_data);
    ObdImage getObjects(const std::string& obj_class, const std::string id,
        vector<ObdObject>& objects, vector<VocObjectData>& object_data,
        VocGT& ground_truth);
    /* functions for returning the ground truth (present/absent) for
        groups of images */
    void getClassifierGroundTruth(const std::string& obj_class,
        const vector<ObdImage>& images, vector<bool>& ground_truth);
    void getClassifierGroundTruth(const std::string& obj_class,
        const vector<std::string>& images, vector<bool>& ground_truth);
    void getDetectorGroundTruth(const std::string& obj_class,
        const ObdDatasetType dataset, const vector<ObdImage>& images,
        const vector<vector<Rect> >& bounding_boxes,
        const vector<vector<float> >& scores,
        vector<vector<bool> >& ground_truth,
        vector<vector<bool> >& detection_difficult,
        bool ignore_difficult = true);
    /* functions for writing VOC-compatible results files */
    void writeClassifierResultsFile(const std::string& obj_class,
        const ObdDatasetType dataset, const vector<ObdImage>& images,
        const vector<float>& scores, const int competition = 1,
        const bool overwrite_ifexists = false);
    void writeDetectorResultsFile(const std::string& obj_class,
        const ObdDatasetType dataset, const vector<ObdImage>& images,
        const vector<vector<float> >& scores,
        const vector<vector<Rect> >& bounding_boxes,
        const int competition = 3,
        const bool overwrite_ifexists = false);
    /* functions for calculating metrics from a set of
        classification/detection results */
    string getResultsFilename(const std::string& obj_class,
        const VocTask task, const ObdDatasetType dataset,
        const int competition = -1, const int number = -1);
    void calcClassifierPrecRecall(const std::string& obj_class,
        const vector<ObdImage>& images, const vector<float>& scores,
        vector<float>& precision, vector<float>& recall, float& ap);
    void calcClassifierPrecRecall(const std::string& obj_class,
        const vector<ObdImage>& images, const vector<float>& scores,
        vector<float>& precision, vector<float>& recall, float& ap,
        vector<size_t>& ranking);
    void calcClassifierPrecRecall(const std::string& input_file,
        vector<float>& precision, vector<float>& recall, float& ap,
        bool outputRankingFile = false);
    void calcDetectorPrecRecall(const std::string& obj_class,
        const ObdDatasetType dataset, const vector<ObdImage>& images,
        const vector<vector<float> >& scores,
        const vector<vector<Rect> >& bounding_boxes,
        vector<float>& precision, vector<float>& recall, float& ap,
        bool ignore_difficult = true);
    void calcDetectorPrecRecall(const std::string& input_file,
        vector<float>& precision, vector<float>& recall, float& ap,
        bool ignore_difficult = true);
    /* functions for calculating confusion matrices */
    void calcClassifierConfMatRow(const std::string& obj_class,
        const vector<ObdImage>& images, const vector<float>& scores,
        const VocConfCond cond, const float threshold,
        vector<string>& output_headers, vector<float>& output_values);
    void calcDetectorConfMatRow(const std::string& obj_class,
        const ObdDatasetType dataset, const vector<ObdImage>& images,
        const vector<vector<float> >& scores,
        const vector<vector<Rect> >& bounding_boxes, const VocConfCond cond,
        const float threshold, vector<string>& output_headers,
        vector<float>& output_values, bool ignore_difficult = true);
    /* functions for outputting gnuplot output files */
    void savePrecRecallToGnuplot(const std::string output_file,
        const vector<float>& precision, const vector<float>& recall,
        const float ap, const std::string title = std::string(),
        const VocPlotType plot_type = CV_VOC_PLOT_SCREEN);
    /* functions for reading in result/ground truth files */
    void readClassifierGroundTruth(const std::string& obj_class,
        const ObdDatasetType dataset, vector<ObdObject>& images,
        vector<bool>& object_present);
    void readClassifierResultsFile(const std::string& input_file,
        vector<ObdImage>& images, vector<float>& scores);
    void readDetectorResultsFile(const std::string& input_file,
        vector<ObdImage>& images, vector<vector<float> >& scores,
        vector<vector<Rect> >& bounding_boxes);
    /* functions for getting dataset info */
    std::vector<std::string> getObjectClasses();
    std::string getResultsDirectory();
};
\end{lstlisting}
The first step in using the class is to initialize it with the desired VOC dataset and the path to the root directory where the VOC ground truth data is stored. Below is the description of the class constructor.

\cvCppFunc{VocData::VocData}

\cvdefCpp{VocData(std::string rootDir, bool useTestDataset, VocDataset dataset = CV\_VOC2010)}
\begin{description}
\cvarg{rootDir}{ The path to the directory which contains the ground truth data for the VOC dataset to load. For example, in the case of the VOC2010 dataset, this would be set to the location of the `VOC2010' directory. The VOC datasets can be downloaded from the PASCAL VOC website\footnote{http://pascallin.ecs.soton.ac.uk/challenges/VOC/}.}
\cvarg{useTestDataset}{ Determines whether the VOC test dataset is also available in the VOC dataset folder. This in general needs to be obtained separately from the VOC training/validation set and is the dataset used to evaluate performance in the final challenge. If the VOC test dataset is available, the combination of the VOC training and validation datasets is used as the class `training' dataset when retrieving ground truth data using the interface, and the VOC test dataset is used as the class `test' dataset. If the VOC test dataset is not available, the VOC training dataset is used as the class `training' dataset and the VOC validation dataset is used as the class `test' dataset.}
\cvarg{dataset}{ Specifies the VOC dataset to use. Must correspond to the ground truth data available at the location specified by \texttt{rootDir}. Can be one of the following values: \texttt{\{CV\_VOC2007, CV\_VOC2008, CV\_VOC2009, CV\_VOC2010\}}.}
\end{description}

\cvCppFunc{VocData::getClassImages}

Return the classification ground truth data for all images of a given VOC object class.

\cvdefCpp{void getClassImages(const std::string\& obj\_class, const ObdDatasetType dataset, vector<ObdImage>\& images, vector<bool>\& object\_present)}
\begin{description}
\cvarg{obj\_class}{ The VOC object class identifier string for the object class for which to retrieve ground truth data.}
\cvarg{dataset}{ Either \texttt{CV\_OBD\_TRAIN} or \texttt{CV\_OBD\_TEST}. Specifies whether to extract images from the training or test set.}
\cvarg{images}{ Used to return an array of \texttt{ObdImage} containing info of all images extracted from the ground truth file for the given object class.}
\cvarg{object\_present}{ An array of bools specifying whether the object specified by \texttt{obj\_class} is present in each image or not.}
\end{description}

This function is primarily useful for the classification task, where only whether a given object is present or not in an image is required, and not each object instance's position etc. For the detection task \texttt{getClassObjects} is more suitable.

\cvCppFunc{VocData::getClassObjects}

Return the object data for all images of a given VOC object class. This function returns extended object information in addition to the absent/present classification data returned by \texttt{getClassImages}.

\cvdefCpp{void getClassObjects(const std::string\& obj\_class, const ObdDatasetType dataset,
vector<ObdImage>\& images, vector<vector<ObdObject> >\& objects)}
\begin{description}
\cvarg{obj\_class}{ The VOC object class identifier string for the object class for which to retrieve ground truth data.}
\cvarg{dataset}{ Either \texttt{CV\_OBD\_TRAIN} or \texttt{CV\_OBD\_TEST}. Specifies whether to extract images from the training or test set.}
\cvarg{images}{ Used to return an array of \texttt{ObdImage} containing info of all images extracted from the ground truth file for the given object class.}
\cvarg{objects}{ A 2D vector returning the extended object info (bounding box etc.) for each object instance in each image. The first dimension indexes the image, and the second the objects within that image. See \texttt{ObdObject} for more details.}
\end{description}

There is a further overloaded version of the function which returns extended information in addition to the basic object bounding box data encapsulated in the array of \texttt{ObdObject} instances:

\cvdefCpp{void getClassObjects(const std::string\& obj\_class, const ObdDatasetType dataset,
vector<ObdImage>\& images, vector<vector<ObdObject> >\& objects,
vector<vector<VocObjectData> >\& object\_data, vector<VocGT>\& ground\_truth)}
\begin{description}
\cvarg{object\_data}{ A 2D vector returning VOC-specific extended object info (marked difficult etc.). See \texttt{VocObjectData} for more details.}
\cvarg{ground\_truth}{ Returns whether there are any difficult/non-difficult instances of the current object class within each image. If there are non-difficult instances, the value corresponding to the image is set to \texttt{CV\_VOC\_GT\_PRESENT}. If there are only difficult instances it is set to \texttt{CV\_VOC\_GT\_DIFFICULT}. Otherwise the object is not present, and it is set to \texttt{CV\_VOC\_GT\_NONE}.}
\end{description}
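
The three-way rule for \texttt{ground\_truth} can be written down compactly. The sketch below only restates the rule given above. The enum and function names are hypothetical stand-ins rather than the library's \texttt{VocGT} type, and the input is assumed to hold one `difficult' flag per instance of the object class in a single image.

\begin{lstlisting}
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for CV_VOC_GT_NONE, CV_VOC_GT_DIFFICULT and
// CV_VOC_GT_PRESENT; not the library's own VocGT type.
enum GroundTruthSketch { GT_SKETCH_NONE, GT_SKETCH_DIFFICULT, GT_SKETCH_PRESENT };

// difficult_flags: one entry per instance of the object class in the image,
// true if that instance is marked as difficult.
GroundTruthSketch imageGroundTruthSketch(const std::vector<bool>& difficult_flags)
{
    if (difficult_flags.empty())
        return GT_SKETCH_NONE;           // no instances at all
    for (std::size_t i = 0; i < difficult_flags.size(); ++i)
    {
        if (!difficult_flags[i])
            return GT_SKETCH_PRESENT;    // at least one non-difficult instance
    }
    return GT_SKETCH_DIFFICULT;          // only difficult instances
}
\end{lstlisting}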

\cvCppFunc{VocData::getObjects}

Return ground truth data for the objects present in an image with a given VOC image code. This is used to retrieve the ground truth data for a specific image from the VOC dataset given its identifier in the format \texttt{YYYY\_XXXXXX}, where \texttt{YYYY} specifies the year of the VOC dataset the image was originally from (e.g. 2010 in the case of the VOC2010 dataset) and \texttt{XXXXXX} is a unique identifying code\footnote{The VOC2007 dataset lacks the year portion of the code.}.

\cvdefCpp{ObdImage getObjects(const std::string\& id, vector<ObdObject>\& objects)}
\begin{description}
\cvarg{id}{ VOC unique identifier of the image for which ground truth data should be retrieved (string code in the form \texttt{YYYY\_XXXXXX}, where \texttt{YYYY} is the year).}
\cvarg{objects}{ Returns the extended object info (bounding box etc.) for each object in the image. See \texttt{ObdObject} for more details.}
\end{description}

The function returns an instance of \texttt{ObdImage} containing the path of the image in the filesystem with the given code. There are also two extended versions of this function which return additional information:

\cvdefCpp{ObdImage getObjects(const std::string\& id, vector<ObdObject>\& objects, vector<VocObjectData>\& object\_data)}
\begin{description}
\cvarg{object\_data}{ Returns VOC-specific extended object info (marked difficult etc.) for the objects in the image. See \texttt{VocObjectData} for more details.}
\end{description}

\cvdefCpp{ObdImage getObjects(const std::string\& obj\_class, const std::string id, vector<ObdObject>\& objects,
vector<VocObjectData>\& object\_data, VocGT\& ground\_truth)}
\begin{description}
\cvarg{ground\_truth}{ Returns whether there are any difficult/non-difficult instances of the object class specified by \texttt{obj\_class} within the image. If there are non-difficult instances, the value is set to \texttt{CV\_VOC\_GT\_PRESENT}. If there are only difficult instances it is set to \texttt{CV\_VOC\_GT\_DIFFICULT}. Otherwise the object is not present, and it is set to \texttt{CV\_VOC\_GT\_NONE}.}
\end{description}

\cvCppFunc{VocData::getClassifierGroundTruth}

Return ground truth classification data for the presence/absence of a given object class in an arbitrary array of images.

\cvdefCpp{void getClassifierGroundTruth(const std::string\& obj\_class, const vector<ObdImage>\& images,
vector<bool>\& ground\_truth);}
\begin{description}
\cvarg{obj\_class}{ The VOC object class identifier string for the object class for which to retrieve ground truth data.}
\cvarg{images}{ An input array of \texttt{ObdImage} containing the images for which ground truth data will be returned.}
\cvarg{ground\_truth}{ An output array indicating the presence/absence of \texttt{obj\_class} within each image.}
\end{description}

There is also an overloaded version which accepts a vector of image code strings instead of a vector of ObdImage:

\cvdefCpp{void getClassifierGroundTruth(const std::string\& obj\_class, const vector<std::string>\& images,
vector<bool>\& ground\_truth);}

\cvCppFunc{VocData::getDetectorGroundTruth}

Return ground truth detection data for the accuracy of an array of object detections.

\cvdefCpp{void getDetectorGroundTruth(const std::string\& obj\_class, const ObdDatasetType dataset, const vector<ObdImage>\& images, const vector<vector<Rect> >\& bounding\_boxes, const vector<vector<float> >\& scores, vector<vector<bool> >\& ground\_truth, vector<vector<bool> >\& detection\_difficult, bool ignore\_difficult = true);}
\begin{description}
\cvarg{obj\_class}{ The VOC object class identifier string for the object class represented by the detections in \texttt{bounding\_boxes}.}
\cvarg{dataset}{ Either \texttt{CV\_OBD\_TRAIN} or \texttt{CV\_OBD\_TEST}. Specifies whether to extract ground truth for the training or test set.}
\cvarg{images}{ An input array of \texttt{ObdImage} containing the images in which objects have been detected.}
\cvarg{bounding\_boxes}{ A 2D input array of detection bounding boxes. The first dimension relates to the image in which the object was detected, and the second dimension relates to the index of the detected object.}
\cvarg{scores}{ An input array containing the pre-calculated match score for each detection. The scores are needed because, in the case of multiple detections of the same object (see below), the detection with the highest score is assigned as a true positive and all others are marked as false positives.}
\cvarg{ground\_truth}{ A 2D output array of booleans which is set to \texttt{true} for every successful detection and \texttt{false} otherwise.}
\cvarg{detection\_difficult}{ A 2D output array indicating whether the detection fired on an object marked as `difficult'. This allows it to be ignored if necessary (the VOC documentation specifies that objects marked as difficult have no effect on the results and are effectively ignored).}
\cvarg{ignore\_difficult}{ Determines whether objects marked as `difficult' should be ignored for the purposes of evaluation (default \texttt{true}). As specified in the VOC documentation, in this case objects marked as difficult have no effect on the results, and even accurate detections of difficult objects are marked as \texttt{false}.}
\end{description}

Note that, as specified in the VOC development kit documentation, multiple detections of the same object in an image are considered false detections. For example, 5 detections of a single object are counted as one true positive (the detection with the highest score, as per the implementation in the VOC development kit) with the remaining 4 detections marked as false positives. This is generally not the desired behaviour, and as such it is the responsibility of the participant's system to filter such multiple detections from its output.
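
The duplicate-detection rule can be sketched for a single image as follows. This is an illustrative sketch, not the code used by \texttt{getDetectorGroundTruth}. It assumes that a detection matches a ground truth box when the ratio of intersection area to union area reaches a threshold (0.5 here, following the criterion commonly cited from the VOC development kit; it is a parameter, not something stated in this document), and it ignores the handling of `difficult' objects. \texttt{BoxSketch}, \texttt{overlapRatioSketch} and \texttt{matchDetectionsSketch} are hypothetical names.

\begin{lstlisting}
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Rect.
struct BoxSketch { float x, y, width, height; };

// Intersection area divided by union area of two boxes.
float overlapRatioSketch(const BoxSketch& a, const BoxSketch& b)
{
    const float ix = std::max(0.0f,
        std::min(a.x + a.width, b.x + b.width) - std::max(a.x, b.x));
    const float iy = std::max(0.0f,
        std::min(a.y + a.height, b.y + b.height) - std::max(a.y, b.y));
    const float inter = ix * iy;
    const float uni = a.width * a.height + b.width * b.height - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Returns one flag per detection: true for a true positive. Detections are
// visited in descending score order, so only the highest scoring detection
// of each ground truth box can be a true positive.
std::vector<bool> matchDetectionsSketch(const std::vector<BoxSketch>& detections,
    const std::vector<float>& scores, const std::vector<BoxSketch>& truth,
    float min_overlap = 0.5f)  // assumed threshold
{
    assert(detections.size() == scores.size());
    std::vector<std::size_t> order(detections.size());
    for (std::size_t i = 0; i < order.size(); ++i)
        order[i] = i;
    std::stable_sort(order.begin(), order.end(),
        [&scores](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });

    std::vector<bool> claimed(truth.size(), false);
    std::vector<bool> true_positive(detections.size(), false);
    for (std::size_t r = 0; r < order.size(); ++r)
    {
        const std::size_t d = order[r];
        std::size_t best = truth.size();  // sentinel: no overlapping box
        float best_overlap = 0.0f;
        for (std::size_t t = 0; t < truth.size(); ++t)
        {
            const float o = overlapRatioSketch(detections[d], truth[t]);
            if (o > best_overlap) { best_overlap = o; best = t; }
        }
        if (best < truth.size() && best_overlap >= min_overlap && !claimed[best])
        {
            claimed[best] = true;       // first (highest scoring) match wins
            true_positive[d] = true;
        }
    }
    return true_positive;
}
\end{lstlisting}

With one ground truth box and two overlapping detections, only the higher scoring detection is flagged as a true positive, which is the behaviour described in the note above.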

\cvCppFunc{VocData::writeClassifierResultsFile}

Write VOC-compliant classifier results file to the current dataset results directory (at the location defined by the VOC documentation).

\cvdefCpp{void writeClassifierResultsFile(const std::string\& obj\_class, const ObdDatasetType dataset, const vector<ObdImage>\& images, const vector<float>\& scores, const int competition = 1, const bool overwrite\_ifexists = false)}
\begin{description}
\cvarg{obj\_class}{ The VOC object class identifier string for the object class for which to write a results file.}
\cvarg{dataset}{ Either \texttt{CV\_OBD\_TRAIN} or \texttt{CV\_OBD\_TEST}. Specifies whether to extract images from the training or test set.}
\cvarg{images}{ An input array of \texttt{ObdImage} containing the images for which data will be saved to the result file.}
\cvarg{scores}{ A corresponding input array of confidence scores for the presence of the specified object class in each image of the \texttt{images} array.}
\cvarg{competition}{ If specified, defines which competition the results are for (see VOC development kit documentation -- default 1).}
\cvarg{overwrite\_ifexists}{ Specifies whether the classifier results file should be overwritten if it exists. By default, this is false and instead a new file with a numbered postfix will be created.}
\end{description}

Note that if the dataset results directory does not exist, the function call will fail. Therefore, it is important to make sure that this directory is created beforehand. Details as to its location can be found in the VOC documentation, but in general it is a sub-directory named `results' within the dataset root directory.

\cvCppFunc{VocData::writeDetectorResultsFile}

Write VOC-compliant detector results file to the current dataset results directory (at the location defined by the VOC documentation).

\cvdefCpp{void writeDetectorResultsFile(const std::string\& obj\_class, const ObdDatasetType dataset, const vector<ObdImage>\& images, const vector<vector<float> >\& scores, const vector<vector<Rect> >\& bounding\_boxes, const int competition = 3, const bool overwrite\_ifexists = false)}
\begin{description}
\cvarg{obj\_class}{ The VOC object class identifier string for the object class for which to write a results file.}
\cvarg{dataset}{ Either \texttt{CV\_OBD\_TRAIN} or \texttt{CV\_OBD\_TEST}. Specifies whether to extract images from the training or test set.}
\cvarg{images}{ An input array of \texttt{ObdImage} containing the images for which data will be saved to the result file.}
\cvarg{scores}{ A corresponding input array of confidence scores for the presence of the specified object class in each object detection within each image of the \texttt{images} array (the first array dimension corresponds to a given image, and the second dimension corresponds to a given object detection).}
\cvarg{bounding\_boxes}{ A corresponding input array of bounding boxes for the presence of the specified object class in each object detection within each image of the \texttt{images} array.}
\cvarg{competition}{ If specified, defines which competition the results are for (see VOC development kit documentation -- default 3).}
\cvarg{overwrite\_ifexists}{ Specifies whether the detector results file should be overwritten if it exists. By default, this is false and instead a new file with a numbered postfix will be created.}
\end{description}

Note that, as with \texttt{writeClassifierResultsFile}, if the dataset results directory does not exist, the function call will fail. Therefore, it is important to make sure that this directory is created beforehand. Details as to its location can be found in the VOC documentation, but in general it is a sub-directory named `results' within the dataset root directory.

\cvCppFunc{VocData::getResultsFilename}

Constructs the filename of a VOC-standard classification/detection results file from the object class and active dataset (see the VOC development kit documentation for more details). By default, \texttt{writeClassifierResultsFile} and \texttt{writeDetectorResultsFile} both save a file in this format to the current dataset results directory (again, at the location defined by the VOC documentation). This function can be used to reconstruct the filename so that the saved results can be loaded again, for example to calculate precision-recall for the result set. An example of this usage might be as follows:

\begin{lstlisting}
VocData voc_data("/home/user/VOC/",false);
voc_data.writeClassifierResultsFile("chair", cv::CV_OBD_TEST, images,
    confidences);
/* -- later read in results written by writeClassifierResultsFile and
    calculate precision-recall for the result set */
const std::string result_file =
    voc_data.getResultsFilename("chair", cv::CV_VOC_TASK_CLASSIFICATION,
    cv::CV_OBD_TEST);
voc_data.calcClassifierPrecRecall(result_file, precision, recall, ap);
\end{lstlisting}

\cvdefCpp{std::string getResultsFilename(const std::string\& obj\_class, const VocTask task,
const ObdDatasetType dataset, const int competition = -1, const int number = -1)}
\begin{description}
\cvarg{obj\_class}{ The VOC object class identifier string for the object class for which to construct a filename.}
\cvarg{task}{ Specifies whether to generate a filename for the classification (\texttt{CV\_VOC\_TASK\_CLASSIFICATION}) or detection (\texttt{CV\_VOC\_TASK\_DETECTION}) task.}
\cvarg{dataset}{ Either \texttt{CV\_OBD\_TRAIN} or \texttt{CV\_OBD\_TEST}. Specifies whether to extract images from the training or test set.}
\cvarg{competition}{ If specified, defines which competition the results are for (see VOC development kit documentation -- default -1 sets competition number 1 for the classification task or competition number 3 for the detection task).}
\cvarg{number}{ If specified and above 0, defines which of a number of duplicate results files produced for a given set of settings should be used. This number is added as a postfix to the filename (default -1).}
\end{description}

\cvCppFunc{VocData::calcClassifierPrecRecall}

Used to calculate precision, recall and average precision (AP) over a given set of classification results. The most straightforward way to use this function is to provide the filename of a VOC standard classification results file:

\cvdefCpp{void calcClassifierPrecRecall(const std::string\& input\_file, vector<float>\& precision,
vector<float>\& recall, float\& ap, bool outputRankingFile = false);}
\begin{description}
\cvarg{input\_file}{ The VOC standard classification results file from which to read data and calculate precision/recall. If a full path is not specified, it is assumed that this file is in the current dataset results directory. The filename itself can be constructed using \texttt{getResultsFilename}.}
\cvarg{precision}{ Returns a vector containing the precision calculated at each datapoint of a p-r curve generated from the result set.}
\cvarg{recall}{ Returns a vector containing the recall calculated at each datapoint of a p-r curve generated from the result set.}
\cvarg{ap}{ Returns the AP (average precision) metric calculated from the result set. This is equivalent to the area under the precision-recall curve.}
\cvarg{outputRankingFile}{ If true, also outputs a plain-text file in the same directory as the input file containing the ranking order (with scores) of the images contained in \texttt{input\_file}. The file is named `scoregt\_$\langle$ class$\rangle$ \_name.txt'.}
\end{description}

There is also a version of the function which can be used to calculate precision and recall from a set of input arrays instead of a VOC results file:

\cvdefCpp{void calcClassifierPrecRecall(const std::string\& obj\_class, const vector<ObdImage>\& images,
const vector<float>\& scores, vector<float>\& precision, vector<float>\& recall, float\& ap)}
\begin{description}
\cvarg{obj\_class}{ The VOC object class identifier string for the object class for which to calculate precision/recall metrics.}
\cvarg{images}{ An input array of \texttt{ObdImage} containing the images for which precision/recall will be calculated.}
\cvarg{scores}{ An input vector containing the similarity score for each input image (higher is more similar).}
\end{description}

There is no need for the input arrays (images and scores) to be sorted in any way. However, internally both are sorted in order of descending score. This ordering may be useful for constructing an ordered ranking list of results, and so there is another version of the function which returns this sorting order:

\cvdefCpp{void calcClassifierPrecRecall(const std::string\& obj\_class, const vector<ObdImage>\& images,
const vector<float>\& scores, vector<float>\& precision, vector<float>\& recall, float\& ap, vector<size\_t>\& ranking)}
\begin{description}
\cvarg{ranking}{ An output vector containing indices which can subsequently be used to retrieve elements of \texttt{images} and \texttt{scores} in descending order of similarity score. For example, to access the first sorted item in the ranked list in the \texttt{images} array use:
\[
\texttt{images}[\texttt{ranking}[\texttt{0}]]
\]}
\end{description}
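
As an illustration of how such a ranking relates to the input arrays, the following minimal sketch builds an index vector ordered by descending score using only the C++11 standard library. The helper name \texttt{rankByDescendingScore} is hypothetical and is not part of \texttt{VocData}. Breaking ties with a stable sort is an assumption, since the tie-breaking behaviour of the library is not specified here.

\begin{lstlisting}
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of VocData): returns indices into
// `scores`, ordered so that scores[ranking[0]] is the highest score.
// Equal scores keep their original relative order (stable sort).
std::vector<std::size_t> rankByDescendingScore(const std::vector<float>& scores)
{
    std::vector<std::size_t> ranking(scores.size());
    std::iota(ranking.begin(), ranking.end(), 0);  // 0, 1, 2, ...
    std::stable_sort(ranking.begin(), ranking.end(),
        [&scores](std::size_t a, std::size_t b)
        { return scores[a] > scores[b]; });
    return ranking;
}
\end{lstlisting}

The same index vector can be applied to both \texttt{images} and \texttt{scores}, which leaves the input arrays themselves unmodified.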

Note that, to calculate the average precision (AP), the area beneath the precision-recall curve is not taken as-is. Instead, a monotonically decreasing version of the curve is generated, with the precision $p_o$ at a given recall $r_o$ given by the maximum precision achieved at any recall $r \ge r_o$. Furthermore, for datasets prior to VOC2010, this curve is then sampled at the discrete points $r = 0.0, 0.1, 0.2, \cdots , 0.9, 1.0$ when calculating the bounded area.
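
The interpolation described above can be sketched as follows. This is a minimal standalone illustration, not the library implementation: the function name \texttt{interpolatedAveragePrecision} and the \texttt{elevenPoint} flag are hypothetical, and the input vectors are assumed to hold one precision/recall pair per rank position with non-decreasing recall.

\begin{lstlisting}
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch of interpolated AP. precision[i] and recall[i]
// describe the i-th point of the curve; recall must be non-decreasing.
// elevenPoint = true samples at r = 0.0, 0.1, ..., 1.0 (pre-VOC2010).
float interpolatedAveragePrecision(const std::vector<float>& precision,
                                   const std::vector<float>& recall,
                                   bool elevenPoint)
{
    const std::size_t n = precision.size();
    if (n == 0 || recall.size() != n) return 0.f;

    // Monotonically decreasing envelope: env[i] is the maximum
    // precision found at any point with recall >= recall[i].
    std::vector<float> env(precision);
    for (std::size_t i = n - 1; i > 0; --i)
        env[i - 1] = std::max(env[i - 1], env[i]);

    if (elevenPoint) {
        float sum = 0.f;
        for (int k = 0; k <= 10; ++k) {
            const float r = k / 10.0f;
            float p = 0.f;  // stays 0 if recall r is never reached
            for (std::size_t i = 0; i < n; ++i)
                if (recall[i] >= r) { p = env[i]; break; }
            sum += p;
        }
        return sum / 11.f;
    }

    // Otherwise integrate the envelope over every recall increment.
    float ap = 0.f, prevRecall = 0.f;
    for (std::size_t i = 0; i < n; ++i) {
        ap += (recall[i] - prevRecall) * env[i];
        prevRecall = recall[i];
    }
    return ap;
}
\end{lstlisting}

For a ranked list of four images in which the first and third are positives, the curve has precision $(1, 1/2, 2/3, 1/2)$ at recall $(0.5, 0.5, 1, 1)$; the sampled variant then gives $(6 \cdot 1 + 5 \cdot 2/3)/11 = 28/33$ and the area variant gives $0.5 \cdot 1 + 0.5 \cdot 2/3 = 5/6$.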

\cvCppFunc{VocData::calcDetectorPrecRecall}

Used to calculate precision, recall and average precision (AP) over a given set of detection results. The most straightforward way to use this function is to provide the filename of a VOC standard detection results file:

\cvdefCpp{void calcDetectorPrecRecall(const std::string\& input\_file, vector<float>\& precision,
vector<float>\& recall, float\& ap, bool ignore\_difficult = true);}
\begin{description}
\cvarg{input\_file}{ The VOC standard detection results file from which to read data and calculate precision/recall. If a full path is not specified, it is assumed that this file is in the current dataset results directory. The filename itself can be constructed using \texttt{getResultsFilename}.}
\cvarg{precision}{ Returns a vector containing the precision calculated at each datapoint of a p-r curve generated from the result set.}
\cvarg{recall}{ Returns a vector containing the recall calculated at each datapoint of a p-r curve generated from the result set.}
\cvarg{ap}{ Returns the AP (average precision) metric calculated from the result set. This is equivalent to the area under the precision-recall curve.}
\cvarg{ignore\_difficult}{ Determines whether objects marked as `difficult' should be ignored for the purposes of evaluation (default true, as specified in the VOC documentation; in this case objects marked as difficult have no effect on the results).}
\end{description}

There is also a version of the function which can be used to calculate precision and recall from a set of input arrays instead of a VOC results file:

\cvdefCpp{void calcDetectorPrecRecall(const std::string\& obj\_class, const ObdDatasetType dataset, const vector<ObdImage>\& images, const vector<vector<float> >\& scores, const vector<vector<Rect> >\& bounding\_boxes, vector<float>\& precision, vector<float>\& recall, float\& ap, bool ignore\_difficult = true);}
\begin{description}
\cvarg{obj\_class}{ The VOC object class identifier string for the object class for which to calculate precision/recall metrics.}
\cvarg{dataset}{ Either \texttt{CV\_OBD\_TRAIN} or \texttt{CV\_OBD\_TEST}. Specifies whether to extract ground truth for the training or test set.}
\cvarg{images}{ An input array of \texttt{ObdImage} containing the images for which precision/recall will be calculated.}
\cvarg{scores}{ A 2D input vector containing the similarity score for each detected object (higher is more similar -- the first dimension indexes the image within which the object was detected, and the second dimension indexes the collection of detected objects within each image).}
\cvarg{bounding\_boxes}{ A 2D input vector containing the predicted bounding box for each detected object.}
\end{description}

In both cases, the validity of a detection in the results set is calculated internally using \texttt{getDetectorGroundTruth} and the overlap criterion specified in the VOC documentation is used to determine whether a particular detection is accurate or not.
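
For reference, the VOC overlap criterion compares the area of intersection of the predicted and ground truth boxes with the area of their union, and accepts a detection when this ratio exceeds 50\%. The sketch below illustrates the test with a hypothetical \texttt{Box} structure standing in for \texttt{Rect}, so that it has no OpenCV dependency. The function names are illustrative only, and the areas are computed from width and height directly, which may differ slightly from the pixel-inclusive convention of the VOC development kit.

\begin{lstlisting}
#include <algorithm>

// Hypothetical stand-in for cv::Rect (top-left corner plus size).
struct Box { int x, y, width, height; };

// Area of intersection divided by area of union, in [0, 1].
double overlapRatio(const Box& a, const Box& b)
{
    const int ix = std::max(0, std::min(a.x + a.width,  b.x + b.width)
                             - std::max(a.x, b.x));
    const int iy = std::max(0, std::min(a.y + a.height, b.y + b.height)
                             - std::max(a.y, b.y));
    const double inter = static_cast<double>(ix) * iy;
    const double uni = static_cast<double>(a.width) * a.height
                     + static_cast<double>(b.width) * b.height - inter;
    return uni > 0.0 ? inter / uni : 0.0;
}

// A detection counts as matching when the ratio exceeds minOverlap.
bool passesOverlapCriterion(const Box& detection, const Box& groundTruth,
                            double minOverlap = 0.5)
{
    return overlapRatio(detection, groundTruth) > minOverlap;
}
\end{lstlisting}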

\cvCppFunc{VocData::calcClassifierConfMatRow}

Used to calculate the row of a confusion matrix given a set of classifier results for a VOC object class.

\cvdefCpp{void calcClassifierConfMatRow(const std::string\& obj\_class,
const vector<ObdImage>\& images, const vector<float>\& scores, const VocConfCond cond,
const float threshold, vector<string>\& output\_headers, vector<float>\& output\_values);}
\begin{description}
\cvarg{obj\_class}{ The VOC object class identifier string for the object class for which to calculate the confusion matrix row.}
\cvarg{images}{ An input array of \texttt{ObdImage} containing the images for which the confusion matrix row will be calculated.}
\cvarg{scores}{ An input vector containing the similarity score for each input image (higher is more similar).}
\cvarg{cond}{ The condition to use when determining the number of images which should be taken into account when calculating the confusion matrix row. If set to \texttt{CV\_VOC\_CCOND\_RECALL}, all images up to the recall proportion specified by \texttt{threshold} are considered. If set to \texttt{CV\_VOC\_CCOND\_SCORETHRESH}, all images with a score above the value specified by \texttt{threshold} are considered.}
\cvarg{threshold}{ The threshold to use when determining the number of images which should be taken into account when calculating the confusion matrix row. Used in conjunction with \texttt{cond}.}
\cvarg{output\_headers}{ An output vector of object class headers for the confusion matrix row.}
\cvarg{output\_values}{ An output vector of values for the confusion matrix row corresponding to the classes defined in \texttt{output\_headers}. This is normalized such that $\sum_i \texttt{output\_values}[i] = 1$.}
\end{description}

For the \texttt{cond} parameter, \texttt{CV\_VOC\_CCOND\_SCORETHRESH} is particularly useful when the scores for each image are the output of a classifier, with +1 defining positives, -1 defining negatives and 0 being the class boundary. In this case, to account only for cases in which the object class was detected, use \texttt{CV\_VOC\_CCOND\_SCORETHRESH} and set \texttt{threshold} to 0. Alternatively, when setting \texttt{cond} to \texttt{CV\_VOC\_CCOND\_RECALL}, the confusion matrix row at 50\% recall could be calculated by setting \texttt{threshold} to 0.5, for example.
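
One way to read the two conditions is as a cut-off on the ranked list of images. The sketch below is an interpretation under stated assumptions, not the library code: the enumeration \texttt{SketchCond} and the function \texttt{countConsideredImages} are hypothetical, the scores are assumed to be already sorted in descending order, and \texttt{objectPresent} is assumed to hold the ground truth for each ranked image.

\begin{lstlisting}
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the two VocConfCond values.
enum SketchCond { SKETCH_COND_RECALL, SKETCH_COND_SCORETHRESH };

// Returns how many of the top-ranked images would be considered.
// sortedScores must be in descending order; objectPresent[i] tells
// whether the object class is present in the i-th ranked image.
std::size_t countConsideredImages(const std::vector<float>& sortedScores,
                                  const std::vector<bool>& objectPresent,
                                  SketchCond cond, float threshold)
{
    const std::size_t n = sortedScores.size();
    if (objectPresent.size() != n) return 0;

    if (cond == SKETCH_COND_SCORETHRESH) {
        // All images scoring strictly above the threshold.
        std::size_t count = 0;
        while (count < n && sortedScores[count] > threshold) ++count;
        return count;
    }

    // Recall condition: walk down the ranking until the requested
    // proportion of all positives has been retrieved.
    std::size_t positives = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (objectPresent[i]) ++positives;
    if (positives == 0) return n;

    std::size_t found = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (objectPresent[i]) ++found;
        if (static_cast<float>(found) / positives >= threshold)
            return i + 1;
    }
    return n;
}
\end{lstlisting}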

The methodology used by the classifier version of this function is that true positives have a single unit added to the column corresponding to \texttt{obj\_class} in the confusion matrix row, whereas false positives have a single unit distributed in proportion between all the columns in the confusion matrix row corresponding to the objects present in the image.
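
The accumulation rule above can be sketched as follows. This is an illustrative reading rather than the library implementation: the function \texttt{confusionRow} is hypothetical, each considered image is represented simply by the class labels of its annotated objects, and a false positive's unit is assumed to be split equally between object instances (so that a class with two instances in the image receives twice the share of a class with one).

\begin{lstlisting}
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch. objectsPerImage[i] lists the class label of
// every annotated object in the i-th considered image. Returns the
// normalized confusion matrix row for objClass, keyed by class label.
std::map<std::string, float> confusionRow(const std::string& objClass,
    const std::vector<std::vector<std::string> >& objectsPerImage)
{
    std::map<std::string, float> row;
    float total = 0.f;
    for (const auto& objects : objectsPerImage) {
        if (objects.empty()) continue;  // no class to attribute to
        if (std::find(objects.begin(), objects.end(), objClass)
                != objects.end()) {
            row[objClass] += 1.f;       // true positive: whole unit
        } else {
            // False positive: share one unit between the instances.
            const float share = 1.f / static_cast<float>(objects.size());
            for (const auto& label : objects) row[label] += share;
        }
        total += 1.f;
    }
    if (total > 0.f)
        for (auto& entry : row) entry.second /= total;  // row sums to 1
    return row;
}
\end{lstlisting}

For example, with \texttt{objClass} set to \texttt{cat} and three considered images containing (cat, dog), (dog, dog, person) and (person), the row is $1/3$ for cat, $2/9$ for dog and $4/9$ for person.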

A full confusion matrix can be constructed by calling this function repeatedly with classification results for each one of the classes of the current dataset (retrieved using \texttt{getObjectClasses}). The individual rows calculated in this way can then be concatenated into a single confusion matrix. This can be useful for inspecting the performance of a given approach to classification more thoroughly, allowing frequently confused object classes to be identified and the classification algorithm to be optimised.

\cvCppFunc{VocData::calcDetectorConfMatRow}

Used to calculate the row of a confusion matrix given a set of detection results for a VOC object class.

\cvdefCpp{virtual void calcDetectorConfMatRow(const std::string\& obj\_class, ObdDatasetType dataset, const vector<ObdImage>\& images, const vector<vector<float> >\& scores, const vector<vector<Rect> >\& bounding\_boxes, const VocConfCond cond, const float threshold, vector<string>\& output\_headers, vector<float>\& output\_values, bool ignore\_difficult = true);}
\begin{description}
\cvarg{obj\_class}{ The VOC object class identifier string for the object class for which to calculate the confusion matrix row.}
\cvarg{dataset}{ Either \texttt{CV\_OBD\_TRAIN} or \texttt{CV\_OBD\_TEST}. Specifies whether to extract ground truth for the training or test set.}
\cvarg{images}{ An input array of \texttt{ObdImage} containing the images for which the confusion matrix row will be calculated.}
\cvarg{scores}{ A 2D input vector containing the similarity score for each object detected in each input image (higher is more similar).}
\cvarg{bounding\_boxes}{ A 2D input vector containing the predicted bounding box for each detected object.}
\cvarg{cond}{ The condition to use when determining the number of images which should be taken into account when calculating the confusion matrix row. If set to \texttt{CV\_VOC\_CCOND\_RECALL}, all images up to the recall proportion specified by \texttt{threshold} are considered. If set to \texttt{CV\_VOC\_CCOND\_SCORETHRESH}, all images with a score above the value specified by \texttt{threshold} are considered.}
\cvarg{threshold}{ The threshold to use when determining the number of images which should be taken into account when calculating the confusion matrix row. Used in conjunction with \texttt{cond}.}
\cvarg{output\_headers}{ An output vector of object class headers for the confusion matrix row. In addition to the object classes present in the currently active dataset, an additional object class labelled `background' is appended to the end of this array. Object detections are assigned to it when their bounding boxes do not overlap any objects in their parent images (as defined by the dataset ground truth) and thus are `background'.}
\cvarg{output\_values}{ An output vector of values for the confusion matrix row corresponding to the classes defined in \texttt{output\_headers}. This is normalized such that $\sum_i \texttt{output\_values}[i] = 1$.}
\cvarg{ignore\_difficult}{ Determines whether objects marked as `difficult' should be ignored for the purposes of evaluation (default true, as specified in the VOC documentation; in this case objects marked as difficult have no effect on the results).}
\end{description}

The methodology used by the detection version of this function is as follows: each object detection is assigned to the closest matching object in the ground truth, as determined by the overlap score defined in the VOC development kit documentation. If the object detection passes the overlap condition for the class defined by \texttt{obj\_class}, and thus counts as a true positive detection, but the overlap score is higher for a class \emph{other} than \texttt{obj\_class}, the detection will not be assigned to \texttt{obj\_class} but to that other object category. Furthermore, unlike the ground truth returned by \texttt{getDetectorGroundTruth} and used in the \texttt{calcDetectorPrecRecall} function, multiple detections \emph{are not} accounted for explicitly. This means that if three detections overlap the same object in the ground truth, they will all be assigned to that object's class (whereas when using \texttt{getDetectorGroundTruth} only the first such detection would be counted as a valid detection). This is another reason to follow the guidelines in the VOC development kit documentation and first filter multiple detections from the input arrays before calling this function.
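
The assignment step can be illustrated with the short sketch below. It is an interpretation of the description above, not the library code: the structure \texttt{GtOverlap} and the function \texttt{assignDetection} are hypothetical, the overlap scores against every ground truth object in the parent image are assumed to have been computed beforehand, and a detection that fails the overlap criterion for every object is assumed to be labelled `background'.

\begin{lstlisting}
#include <string>
#include <vector>

// Hypothetical record: one ground truth object in the parent image,
// with the overlap score of the detection against it.
struct GtOverlap
{
    std::string objectClass;
    double overlap;
};

// Returns the class of the ground truth object with the highest
// overlap, or "background" if no overlap exceeds minOverlap.
std::string assignDetection(const std::vector<GtOverlap>& candidates,
                            double minOverlap = 0.5)
{
    std::string best = "background";
    double bestOverlap = minOverlap;
    for (const auto& c : candidates) {
        if (c.overlap > bestOverlap) {
            bestOverlap = c.overlap;
            best = c.objectClass;
        }
    }
    return best;
}
\end{lstlisting}

Because no record is kept of which ground truth objects have already been matched, several detections over the same object all receive that object's class, which mirrors the behaviour described above.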

A full confusion matrix can be constructed by calling this function repeatedly with detection results for each one of the classes of the current dataset (retrieved using \texttt{getObjectClasses}). The individual rows calculated in this way can then be concatenated into a single confusion matrix. This can be useful for inspecting the performance of a given approach to detection more thoroughly, allowing frequently confused object classes to be identified and the detection algorithm to be optimised.

For further notes about the \texttt{cond} and \texttt{threshold} parameters, see the documentation for the \texttt{calcClassifierConfMatRow} function.

\cvCppFunc{VocData::savePrecRecallToGnuplot}

Used to output a set of precision-recall results (generated using \texttt{calcClassifierPrecRecall} or \texttt{calcDetectorPrecRecall}) to a GNUPlot\footnote{http://www.gnuplot.info/} compatible data file. In conjunction with GNUPlot, this can be used to easily produce a precision-recall plot and display that plot on screen or save it to a PDF file by passing the data file to GNUPlot as a parameter. For example:
\[
\texttt{>> gnuplot "datafile.dat"}
\]

\cvdefCpp{void savePrecRecallToGnuplot(const std::string\& output\_file, const vector<float>\& precision,
const vector<float>\& recall, const float ap, const std::string title, const VocPlotType plot\_type)}
\begin{description}
\cvarg{output\_file}{ The filename to which the GNUPlot datafile is saved. If a full path is not specified, it is assumed that this file should be saved to the current dataset results directory.}
\cvarg{precision}{ An input vector of precision values as returned from \texttt{calcClassifierPrecRecall} or \texttt{calcDetectorPrecRecall}.}
\cvarg{recall}{ An input vector of recall values as returned from \texttt{calcClassifierPrecRecall} or \texttt{calcDetectorPrecRecall}.}
\cvarg{ap}{ The AP (average precision) as returned from \texttt{calcClassifierPrecRecall} or \texttt{calcDetectorPrecRecall}.}
\cvarg{title}{ Title to use for the plot (if not specified, just the AP is printed as the title). This also specifies the filename of the output file if printing to pdf.}
\cvarg{plot\_type}{ Specifies whether to instruct GNUPlot to save to a PDF file (\texttt{CV\_VOC\_PLOT\_PDF}) or directly to screen (\texttt{CV\_VOC\_PLOT\_SCREEN}) in the datafile.}
\end{description}

Note that no plot file is produced nor is any plot displayed on the screen when this function is called. In order to do this, pass the generated datafile to GNUPlot via the command line as described above.
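
To give an idea of what a self-plotting GNUPlot file can look like, the sketch below assembles a script with inline data from a pair of precision and recall vectors. The exact layout written by \texttt{savePrecRecallToGnuplot} is not documented here, so this is an assumed format for illustration only, and the function name \texttt{makePrecRecallScript} is hypothetical.

\begin{lstlisting}
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: builds a gnuplot script with inline data.
// With toPdf = true the plot is written to "<title>.pdf"; otherwise
// it is shown on screen until the user presses return.
std::string makePrecRecallScript(const std::vector<float>& precision,
                                 const std::vector<float>& recall,
                                 const std::string& title, bool toPdf)
{
    std::ostringstream out;
    if (toPdf) {
        out << "set terminal pdf\n";
        out << "set output \"" << title << ".pdf\"\n";
    }
    out << "set title \"" << title << "\"\n";
    out << "set xlabel \"recall\"\n";
    out << "set ylabel \"precision\"\n";
    out << "set xrange [0:1]\n";
    out << "set yrange [0:1]\n";
    // "-" tells gnuplot to read the data that follows, up to "e".
    out << "plot \"-\" using 1:2 with lines notitle\n";
    const std::size_t n = std::min(precision.size(), recall.size());
    for (std::size_t i = 0; i < n; ++i)
        out << recall[i] << ' ' << precision[i] << '\n';
    out << "e\n";
    if (!toPdf) out << "pause -1\n";
    return out.str();
}
\end{lstlisting}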

\cvCppFunc{VocData::readClassifierGroundTruth}

Utility function which extracts data from the classification ground truth file for a given object class and dataset into a set of output vectors.

\cvdefCpp{void readClassifierGroundTruth(const std::string\& obj\_class,
const ObdDatasetType dataset, vector<ObdImage>\& images, vector<bool>\& object\_present);}
\begin{description}
\cvarg{obj\_class}{ The VOC object class identifier string for the object class for which to retrieve the classifier ground truth.}
\cvarg{dataset}{ Either \texttt{CV\_OBD\_TRAIN} or \texttt{CV\_OBD\_TEST}. Specifies whether to extract ground truth for the training or test set.}
\cvarg{images}{ An output array of \texttt{ObdImage} containing the images extracted from the ground truth file.}
\cvarg{object\_present}{ An output array of bools specifying whether the object specified by \texttt{obj\_class} is present in each image extracted from the ground truth file or not.}
\end{description}

\cvCppFunc{VocData::readClassifierResultsFile}

Utility function which extracts data from a given classifier results file into a set of output vectors.

\cvdefCpp{void readClassifierResultsFile(const std::string\& input\_file,
vector<ObdImage>\& images, vector<float>\& scores);}
\begin{description}
\cvarg{input\_file}{ The VOC standard classification results file from which to read data. If a full path is not specified, it is assumed that this file is in the current dataset results directory. The filename itself can be constructed using \texttt{getResultsFilename}.}
\cvarg{images}{ An output array of \texttt{ObdImage} containing the images extracted from the results file.}
\cvarg{scores}{ An output array containing the similarity scores of each image extracted from the results file.}
\end{description}

\cvCppFunc{VocData::readDetectorResultsFile}

Utility function which extracts data from a given detector results file into a set of output vectors.

\cvdefCpp{void readDetectorResultsFile(const std::string\& input\_file,
vector<ObdImage>\& images, vector<vector<float> >\& scores,
vector<vector<Rect> >\& bounding\_boxes);}
\begin{description}
\cvarg{input\_file}{ The VOC standard detection results file from which to read data. If a full path is not specified, it is assumed that this file is in the current dataset results directory. The filename itself can be constructed using \texttt{getResultsFilename}.}
\cvarg{images}{ An output array of \texttt{ObdImage} containing the images extracted from the results file.}
\cvarg{scores}{ A 2D output array containing the similarity scores of each object extracted from the results file.}
\cvarg{bounding\_boxes}{ A 2D output array containing the bounding boxes of each object extracted from the results file.}
\end{description}

\cvCppFunc{VocData::getObjectClasses}

\cvdefCpp{std::vector<std::string> getObjectClasses()}

Returns an array of valid object category class identifiers for the current dataset.

\cvCppFunc{VocData::getResultsDirectory}

\cvdefCpp{std::string getResultsDirectory()}

Returns the path to the results directory of the current dataset.

\cvclass{ObdImage}

Used to store information related to a single image within a dataset.

\begin{lstlisting}
class ObdImage
{
public:
    ObdImage(std::string p_id, std::string p_path):
        id(p_id), path(p_path) {};
    /* unique identifier code e.g. for VOC is in the form YYYY_XXXXXX */
    std::string id;
    /* path to the image in the filesystem */
    std::string path;
};
\end{lstlisting}

\cvclass{ObdObject}
Used to store bounding box information about object instances within a parent image.

\begin{lstlisting}
class ObdObject
{
public:
    /* object class of the defined object */
    std::string object_class;
    /* bounding box coordinates of the object in the parent image */
    Rect boundingBox;
};
\end{lstlisting}

\cvclass{VocObjectData}
Used to store VOC-specific datafields related to object instances as described in the VOC development kit documentation.

\begin{lstlisting}
class VocObjectData
{
public:
    /* determines whether the object is marked difficult or not */
    bool difficult;
    /* determines whether the object is mainly occluded by other
        objects or not */
    bool occluded;
    /* determines whether the object is truncated by the edge of the
        image frame or not */
    bool truncated;
    /* the pose of the object. Can be one of: CV_VOC_POSE_UNSPECIFIED,
        CV_VOC_POSE_FRONTAL, CV_VOC_POSE_REAR, CV_VOC_POSE_LEFT,
        CV_VOC_POSE_RIGHT */
    VocPose pose;
};
\end{lstlisting}

\subsection{Extended Visual Features}

\cvclass{DenseFeatureDetector}
Class used for extracting features densely from an image.

\begin{lstlisting}
class DenseFeatureDetector : public FeatureDetector
{
public:
    DenseFeatureDetector(const float feature_scale, const int bound,
        const int sampling_step = 6, const int scale_levels = 1,
        const float scale_mul = 0.1,
        const bool step_varies_with_scale = true,
        const bool bound_varies_with_scale = false);

    virtual void detectImpl(const cv::Mat& image, const cv::Mat& mask,
        std::vector<cv::KeyPoint>& keypoints) const;

    int getStep();

    void setStep(const int step);
};
\end{lstlisting}

\cvclass{ColorSurfDescriptorExtractor}
Class used for extracting color SURF features (calculated in the Opponent color space) from an image.

\begin{lstlisting}
class ColorSurfDescriptorExtractor : public SurfDescriptorExtractor
{
public:
    ColorSurfDescriptorExtractor(int nOctaves=4,
            int nOctaveLayers=2, bool extended=false);

    virtual void compute( const Mat& image,
        vector<KeyPoint>& keypoints, Mat& descriptors) const;
};
\end{lstlisting}