
:version: $RCSfile: index.rst,v $ $Revision: 76e0bf38aaba $ $Date: 2011/03/22 00:48:41 $

.. default-role:: fs

==========
 Overview
==========

The intended audience for this document is developers who want to
use Microsoft Visual Studio 2008 with `Tesseract-OCR
<http://code.google.com/p/tesseract-ocr/>`_. If you simply want to *run*
`tesseract` or its various language training applications, then see the
`ReadMe <http://code.google.com/p/tesseract-ocr/wiki/ReadMe>`_. You'll
find instructions there on how to download tesseract's Windows
installer.

|Tesseractocr| consists of:

+ `libtesseract` -- the static (or dynamic) library that does all the
  actual work. As of February 2012 it consists of 260+ `C++` files
  along with 290+ header files.

+ `tesseract.exe` -- the command-line OCR engine. It's built from a
  single, small `C++` file that just calls functions in
  `libtesseract`. There currently isn't very much documentation on how
  to use `tesseract.exe`, but you can look at what's there in the
  repository's `doc
  <http://code.google.com/p/tesseract-ocr/source/browse/#svn%2Ftrunk%2Fdoc>`_
  subdirectory.

+ Language packs -- needed by `tesseract.exe` in order to recognize
  particular languages.

.. _training-applications:

+ Language training applications -- used to teach `tesseract.exe` new
  languages. Each has its own (very brief) man page in the `doc
  <http://code.google.com/p/tesseract-ocr/source/browse/#svn%2Ftrunk%2Fdoc>`_
  subdirectory. They include:

  + `ambiguous_words.exe` -- generate sets of words Tesseract is likely
    to find ambiguous

  + `classifier_tester` -- test a Tesseract character classifier on
    data formatted for training

  + `cntraining.exe` -- character normalization training

  + `combine_tessdata.exe` -- combine/extract/overwrite Tesseract data

  + `dawg2wordlist.exe` -- convert a Tesseract DAWG to a wordlist

  + `mftraining.exe` -- feature training

  + `shapeclustering.exe` -- shape clustering training

  + `unicharset_extractor.exe` -- extract unicharset from Tesseract
    boxfiles

  + `wordlist2dawg.exe` -- convert a wordlist to a DAWG

  Their use is described in the `TrainingTesseract3
  <http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3>`_
  Wiki page.

This document explains how to:

+ :doc:`Set up <setup>` the proper directory structure required to use
  the supplied Visual Studio 2008 Solution

+ :doc:`Build <building>` `libtesseract`, `tesseract.exe`, and the
  training apps

+ :doc:`Write <programming>` programs that link with `libtesseract`


..         
   Local Variables:
   coding: utf-8
   mode: rst
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 72
   mode: auto-fill
   standard-indent: 3
   tab-stop-list: (3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60)
   End: