:version: $RCSfile: index.rst,v $ $Revision: 76e0bf38aaba $ $Date: 2011/03/22 00:48:41 $

.. default-role:: fs

===========================
 Setting up |Tesseractocr|
===========================

The Visual Studio 2008 Solutions included with |Tesseractocr| rely on
*relative paths* to reference files and directories, including
locations that are *outside* of the `tesseract-3.0x` tree. It is
therefore vitally important to set up the directories for the various
components correctly. This section describes how to do this.


.. _directory-setup:

Initial "Build" directory setup
===============================

First create an empty directory where you will unpack all the required
downloads. Assume you call this directory `C:\\BuildFolder`.

.. _download-leptonica:

1. Download the |Leptonica| 1.68 pre-built binary package
   (`leptonica-1.68-win32-lib-include-dirs.zip`) from:

      http://code.google.com/p/leptonica/downloads/detail?name=leptonica-1.68-win32-lib-include-dirs.zip

   and unpack it to `C:\\BuildFolder`.

2. |Leptonica|, even on Windows as of v1.68, still requires a few Unix
   utilities (like `rm`, `diff`, `sleep`). The easiest way to deal with
   this is to follow the instructions at `Installing Cygwin coreutils
   <http://tpgit.github.com/UnOfficialLeptDocs/vs2008/installing-cygwin.html>`_.

At this point, if all you want to do is link with `libtesseract`, you
can `download <http://code.google.com/p/tesseract-ocr/downloads/list>`_
the file that just contains the "public" |Tesseractocr| headers along
with the precompiled library binaries for Windows. Unpack it to
`C:\\BuildFolder` and you'll now have::

   C:\BuildFolder\

     include\
       leptonica\
       tesseract\

       leptonica_versionnumbers.vsprops
       tesseract_versionnumbers.vsprops

     lib\
       giflib416-static-mtdll-debug.lib
       giflib416-static-mtdll.lib
       libjpeg8c-static-mtdll-debug.lib
       libjpeg8c-static-mtdll.lib
       liblept168-static-mtdll-debug.lib
       liblept168-static-mtdll.lib
       liblept168.dll
       liblept168.lib
       liblept168d.dll
       liblept168d.lib
       libpng143-static-mtdll-debug.lib
       libpng143-static-mtdll.lib
       libtesseract302.dll
       libtesseract302.lib
       libtesseract302d.dll
       libtesseract302d.lib
       libtesseract302-static.lib
       libtesseract302-static-debug.lib
       libtiff394-static-mtdll-debug.lib
       libtiff394-static-mtdll.lib
       zlib125-static-mtdll-debug.lib
       zlib125-static-mtdll.lib

and you can skip the rest of this page and go directly to
:doc:`programming`.

The recommended action, however, is to download the |Tesseractocr|
sources and build them yourself. Therefore...

3. Download the |Tesseractocr| Visual Studio 2008 source files from the
   `downloads page
   <http://code.google.com/p/tesseract-ocr/downloads/list>`_. If, for
   example, you'd like to build v3.02 you would use the following link:

      http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.02-vs2008.zip

   Unpack the file to `C:\\BuildFolder`.

   You would now have the following directory structure::

      C:\BuildFolder\

        include\
          leptonica\

          leptonica_versionnumbers.vsprops
          tesseract_versionnumbers.vsprops

        lib\
          giflib416-static-mtdll-debug.lib
          giflib416-static-mtdll.lib
          libjpeg8c-static-mtdll-debug.lib
          libjpeg8c-static-mtdll.lib
          liblept168-static-mtdll-debug.lib
          liblept168-static-mtdll.lib
          liblept168.dll
          liblept168.lib
          liblept168d.dll
          liblept168d.lib
          libpng143-static-mtdll-debug.lib
          libpng143-static-mtdll.lib
          libtiff394-static-mtdll-debug.lib
          libtiff394-static-mtdll.lib
          zlib125-static-mtdll-debug.lib
          zlib125-static-mtdll.lib

        tesseract-3.02\
          vs2008\
            ambiguous_words\
            APITest\
            APIExamples\
            classifier_tester\
            cntraining\
            combine_tessdata\
            dawg2wordlist\
            doc\
            include\
            libtesseract\
              libtesseract.vcproj
            mftraining\
            port\
            shapeclustering\
            sphinx\
            tesseract\
              tesseract.vcproj
            unicharset_extractor\
            wordlist2dawg\

            tesseract.sln
            tesshelper.py

4. Download the |Tesseractocr| source files for the same version as the
   VS2008 files you just unpacked. In this case, the proper link would
   be:

      http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-3.02.tar.gz

   Unpack the file to `C:\\BuildFolder`.

   This will add a bunch of directories to your already existing
   `C:\\BuildFolder\\tesseract-3.0x` directory. You should now have (for
   v3.02)::

      C:\BuildFolder\

        include\
          leptonica\
        lib\
        tesseract-3.02\
          api\
          ccmain\
          ccstruct\
          ccutil\
          classify\
          config\
          contrib\
          cube\
          cutil\
          dict\
          doc\
          image\
          java\
          neural_networks\
          tessdata\
          testing\
          textord\
          training\
          viewer\
          vs2008\
          wordrec\

.. _copying-headers:

If you are planning on writing applications that link with
|Tesseractocr|, and you don't want to add all the `tesseract-3.0x`
directories to your project's list of ``include`` directories, then do
this additional step:

5. Copy all the required headers to the "public" include folder.

   If you already have a `C:\\BuildFolder\\include\\tesseract`
   directory, you should delete it first, in case some of the files have
   since been removed from the sources.

   Then use the Python `tesshelper.py` script to copy (possibly updated
   versions of) the required headers by doing::

      cd C:\BuildFolder\tesseract-3.02\vs2008
      python tesshelper.py .. copy ..\..\include

   See :ref:`tesshelper` for more details.

You are now ready to :doc:`build <building>` |Tesseractocr| using Visual
Studio 2008.

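
Before opening the solution, it can help to confirm that the directory
layout matches what the relative paths expect. The following is a
minimal sketch and not part of |Tesseractocr|: the function name
``missing_entries`` and the list of checked paths are assumptions,
chosen for illustration from the v3.02 layout described on this page.
Adjust them for the version you are building.

.. code-block:: python

   from pathlib import Path

   # Illustrative subset of the v3.02 layout described on this page
   # (an assumption; extend or trim it to suit your setup).
   EXPECTED = [
       "include/leptonica",
       "lib",
       "tesseract-3.02/api",
       "tesseract-3.02/vs2008/tesseract.sln",
   ]


   def missing_entries(build_folder, expected=EXPECTED):
       """Return the expected relative paths that do not exist."""
       root = Path(build_folder)
       return [rel for rel in expected if not (root / rel).exists()]

Calling ``missing_entries(r"C:\BuildFolder")`` returns an empty list
when every listed entry is present; any path it does return points at a
download that was unpacked to the wrong place or skipped.
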


.. _using-latest-sources:

Using the latest |Tesseractocr| sources
=======================================

If you'd like to try the absolute latest version of |Tesseractocr|,
here's how to download the source files from its SVN repository:

1. Follow Steps 1 and 2 :ref:`above <directory-setup>`.

#. `Check out <http://code.google.com/p/tesseract-ocr/source/checkout>`_
   the |Tesseractocr| sources to a directory on your computer. This
   directory should :bi:`not` be `C:\\BuildFolder`!

   If you are unfamiliar with `SVN <http://subversion.apache.org/>`_,
   the easiest way to do this is to first download and install
   `TortoiseSVN <http://tortoisesvn.net/>`_ and then:

   a. Right-click the (empty) directory where you want the working copy
      and choose :menuselection:`SVN Chec&kout...` from the pop-up
      menu.

   #. Enter ``http://tesseract-ocr.googlecode.com/svn/trunk/`` for
      :guilabel:`&URL of repository`. You can keep all the other
      settings at their defaults.

      .. image:: images/tortoisesvn_checkout.png
         :align: center
         :alt: TortoiseSVN Checkout Dialog Box

   #. Click the :guilabel:`&OK` button to commence downloading the
      |Tesseractocr| sources to your computer. This might take a while
      as the language data in the `tessdata` directory is quite large.
      As of February 2012, about 335 MB needs to be transferred for the
      initial checkout. The total size of the resulting working copy is
      about 1.2 GB.

   #. Keeping your working copy up to date after this is as simple as
      right-clicking its directory and choosing :menuselection:`SVN
      &Update`. Unlike the initial checkout, this will usually finish
      very quickly.

#. Copy the :bi:`contents` of your working directory, except for the
   `tessdata` directory, to `C:\\BuildFolder\\tesseract-3.0x`, where
   ``x`` should probably be the latest stable release + ``alpha``,
   ``beta``, etc.

#. Optionally, follow Step 5 from :ref:`above <copying-headers>`.

#. You'll probably want to set an environment variable named
   ``TESSDATA_PREFIX`` to point at your working copy directory (since
   that now contains the latest `tessdata` directory).

#. If someone hasn't already done so, you have to proceed to
   :ref:`updating-vs2008-directory`. You can skip all the steps that
   relate to updating the version number. Depending on how many changes
   have been made since the last stable release, you may have little or
   no work to do.

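
The copy step above (everything except the top-level `tessdata`
directory) can also be scripted. This is a minimal sketch using only the
Python standard library (Python 3.8 or later, for ``dirs_exist_ok``).
The function name ``copy_working_copy`` is hypothetical, and the script
is an illustration, not something shipped with |Tesseractocr|.

.. code-block:: python

   import shutil
   from pathlib import Path


   def copy_working_copy(working_copy, target, skip=("tessdata",)):
       """Copy the contents of working_copy into target.

       Top-level entries named in ``skip`` are left out; everything
       else, including nested directories, is copied.
       """
       src = Path(working_copy)

       def ignore(directory, names):
           # Filter only the top level of the working copy.
           if Path(directory) == src:
               return [name for name in names if name in skip]
           return []

       shutil.copytree(src, target, ignore=ignore, dirs_exist_ok=True)

A call might look like
``copy_working_copy(r"C:\svn\tesseract-ocr", r"C:\BuildFolder\tesseract-3.02alpha")``,
where both paths are placeholders for your own working copy and target
directory. Filtering only at the top level keeps any nested directory
that happens to share the name.
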

..
   Local Variables:
   coding: utf-8
   mode: rst
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 72
   mode: auto-fill
   standard-indent: 3
   tab-stop-list: (3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60)
   End: