:version: $RCSfile: index.rst,v $ $Revision: 76e0bf38aaba $ $Date: 2011/03/22 00:48:41 $

.. default-role:: fs

===========================
 Setting up |Tesseractocr|
===========================

The Visual Studio 2008 Solutions included with |Tesseractocr| rely on
*relative paths* to reference files and directories, including
locations that are *outside* of the `tesseract-3.0x` tree. It is
therefore vitally important to set up the directories for the various
components correctly. This section describes how to do this.

.. _directory-setup:

Initial "Build" directory setup
===============================

First create an empty directory where you will unpack all the required
downloads. Assume you call this directory `C:\\BuildFolder`.

.. _download-leptonica:

1. Download the |Leptonica| 1.68 pre-built binary package
   (`leptonica-1.68-win32-lib-include-dirs.zip`) from:

      http://code.google.com/p/leptonica/downloads/detail?name=leptonica-1.68-win32-lib-include-dirs.zip

   and unpack it to `C:\\BuildFolder`.

2. |Leptonica|, even on Windows as of v1.68, still requires a few Unix
   utilities (like `rm`, `diff`, `sleep`). The easiest way to deal with
   this is to follow the instructions at
   `Installing Cygwin coreutils `_.

At this point, if all you want to do is link with `libtesseract` you can
`download `_ the file that contains just the "public" |Tesseractocr|
headers along with the precompiled library binaries for Windows.
Unpack it to `C:\\BuildFolder` and you'll now have::

   C:\BuildFolder\
     include\
       leptonica\
       tesseract\
       leptonica_versionnumbers.vsprops
       tesseract_versionnumbers.vsprops
     lib\
       giflib416-static-mtdll-debug.lib
       giflib416-static-mtdll.lib
       libjpeg8c-static-mtdll-debug.lib
       libjpeg8c-static-mtdll.lib
       liblept168-static-mtdll-debug.lib
       liblept168-static-mtdll.lib
       liblept168.dll
       liblept168.lib
       liblept168d.dll
       liblept168d.lib
       libpng143-static-mtdll-debug.lib
       libpng143-static-mtdll.lib
       libtesseract302.dll
       libtesseract302.lib
       libtesseract302d.dll
       libtesseract302d.lib
       libtesseract302-static.lib
       libtesseract302-static-debug.lib
       libtiff394-static-mtdll-debug.lib
       libtiff394-static-mtdll.lib
       zlib125-static-mtdll-debug.lib
       zlib125-static-mtdll.lib

and you can skip the rest of this page and go directly to
:doc:`programming`.

The recommended action, however, is to download the |Tesseractocr|
sources and build them yourself. Therefore...

3. Download the |Tesseractocr| Visual Studio 2008 source files from the
   `downloads page `_.
   If, for example, you'd like to build v3.02 you would use the
   following link:

      http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.02-vs2008.zip

   Unpack the file to `C:\\BuildFolder`.

   You would now have the following directory structure::

      C:\BuildFolder\
        include\
          leptonica\
          leptonica_versionnumbers.vsprops
          tesseract_versionnumbers.vsprops
        lib\
          giflib416-static-mtdll-debug.lib
          giflib416-static-mtdll.lib
          libjpeg8c-static-mtdll-debug.lib
          libjpeg8c-static-mtdll.lib
          liblept168-static-mtdll-debug.lib
          liblept168-static-mtdll.lib
          liblept168.dll
          liblept168.lib
          liblept168d.dll
          liblept168d.lib
          libpng143-static-mtdll-debug.lib
          libpng143-static-mtdll.lib
          libtiff394-static-mtdll-debug.lib
          libtiff394-static-mtdll.lib
          zlib125-static-mtdll-debug.lib
          zlib125-static-mtdll.lib
        tesseract-3.02\
          vs2008\
            ambiguous_words\
            classifier_tester\
            cntraining\
            combine_tessdata\
            dawg2wordlist\
            doc\
            include\
            libtesseract\
              libtesseract.vcproj
            mftraining\
            port\
            shapeclustering\
            sphinx\
            tesseract\
              tesseract.vcproj
            unicharset_extractor\
            wordlist2dawg\
            tesseract.sln
            tesshelper.py

4. Download the |Tesseractocr| source files for the same version as the
   VS2008 files you just unpacked. In this case, the proper link would
   be:

      http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-3.02.tar.gz

   Unpack the file to `C:\\BuildFolder`.

   This adds a number of directories to your existing
   `C:\\BuildFolder\\tesseract-3.0x` directory. You should now have
   (for v3.02)::

      C:\BuildFolder\
        include\
          leptonica\
        lib\
        tesseract-3.02\
          api\
          ccmain\
          ccstruct\
          ccutil\
          classify\
          config\
          contrib\
          cube\
          cutil\
          dict\
          doc\
          image\
          java\
          image\
          neural_networks\
          tessdata\
          testing\
          textord\
          training\
          viewer\
          vs2008\
          wordrec\

.. _copying-headers:

If you are planning on writing applications that link with
|Tesseractocr|, and you don't want to add all the `tesseract-3.0x`
directories to your project's list of ``include`` directories, then do
this additional step:

5.
   Copy all the required headers to the "public" include folder.

   If you already have a `C:\\BuildFolder\\include\\tesseract` directory
   you should delete it first, in case some of the headers have since
   been removed. Then use the Python `tesshelper.py` script to copy
   (possibly updated versions of) the required headers by doing::

      cd C:\BuildFolder\tesseract-3.02\vs2008
      python tesshelper.py .. copy ..\..\include

   See :ref:`tesshelper` for more details.

You are now ready to :doc:`build ` |Tesseractocr| using
Visual Studio 2008.

.. _using-latest-sources:

Using the latest |Tesseractocr| sources
=======================================

If you'd like to try the absolute latest version of |Tesseractocr|,
here's how to download the source files from its SVN repository:

1. Follow Steps 1 and 2 :ref:`above `.

#. `Checkout `_ the |Tesseractocr| sources to a directory on your
   computer. This directory should :bi:`not` be `C:\\BuildFolder`!

   If you are unfamiliar with `SVN `_, the easiest way to do this is to
   first download and install `TortoiseSVN `_ and then:

   a. Right-click the (empty) directory where you want the working copy
      and choose :menuselection:`SVN Chec&kout...` from the pop-up menu.

   #. Enter ``http://tesseract-ocr.googlecode.com/svn/trunk/`` for
      :guilabel:`&URL of repository`. You can keep all the other
      settings at their defaults.

      .. image:: images/tortoisesvn_checkout.png
         :align: center
         :alt: TortoiseSVN Checkout Dialog Box

   #. Click the :guilabel:`&OK` button to commence downloading the
      |Tesseractocr| sources to your computer.

      This might take a while as the language data in the `tessdata`
      directory is quite large. As of February 2012, about 335MB needs
      to be transferred for the initial checkout. The total size of the
      resulting working copy is about 1.2GB.

   #. Keeping your working copy up to date after this is as simple as
      right-clicking its directory and choosing
      :menuselection:`SVN &Update`. Unlike the initial checkout, this
      will usually finish very quickly.

#.
   Copy the :bi:`contents` of your working directory, except for the
   `tessdata` directory, to `C:\\BuildFolder\\tesseract-3.0x`, where
   ``x`` should probably be the latest stable release number followed by
   ``alpha``, ``beta``, etc.

#. Optionally, follow Step 5 from :ref:`above `.

#. You'll probably want to set an environment variable named
   ``TESSDATA_PREFIX`` to point at your working copy directory (since
   that now contains the latest `tessdata` directory).

#. If someone hasn't already done so, you have to proceed to
   :ref:`updating-vs2008-directory`. You can skip all the steps that
   relate to updating the version number.

   Otherwise, depending on how many changes have been made since the
   last stable release, you may have little or no work to do.

..
   Local Variables:
   coding: utf-8
   mode: rst
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 72
   mode: auto-fill
   standard-indent: 3
   tab-stop-list: (3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60)
   End:
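
The ``TESSDATA_PREFIX`` step above can also be tried for a single
process rather than system-wide. The following is a minimal Python
sketch and not part of |Tesseractocr|: the helper name ``tessdata_env``
is hypothetical, and it assumes, as described above, that the variable
should name the directory that contains `tessdata`.

```python
import os
from pathlib import Path


def tessdata_env(working_copy, base_env=None):
    """Return a copy of the environment with TESSDATA_PREFIX set.

    working_copy is assumed to be the directory that contains the
    tessdata directory (hypothetical helper, not a Tesseract API).
    """
    root = Path(working_copy)
    # Refuse to point the variable at a directory without tessdata.
    if not (root / "tessdata").is_dir():
        raise FileNotFoundError("no tessdata directory in %s" % root)
    # Copy the environment so the caller's mapping is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["TESSDATA_PREFIX"] = str(root)
    return env
```

The returned dictionary could then be passed as the ``env`` argument of
``subprocess.run`` when launching a freshly built ``tesseract.exe``,
which leaves the system-wide environment unchanged.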