To use libtesseract in your own application you need to include Leptonica’s allheaders.h, and Tesseract-OCR’s baseapi.h and strngs.h.
Tesseract-OCR uses liblept mainly for image I/O, but you can also use any of Leptonica’s many image processing functions on PIX, while at the same time calling TessBaseAPI methods. See the Leptonica documentation for more details.
There doesn’t seem to be any documentation on api\baseapi.h, but it has extensive comments. You can also look at the APITest Sample and APIExamples projects.
See the APITest Sample project for an example of which compiler and linker settings you need for various build configurations. The easiest way to begin a new application is to just make a copy of the APITest directory. See this step for detailed instructions (skip the last step about adding Project Dependencies).
If you want to manually set the required settings, then here’s the list of things to do:
When linking with the static library versions of libtesseract, add the following Preprocessor Definition when compiling any file that includes baseapi.h:
USE_STD_NAMESPACE
If you are linking with the DLL versions of libtesseract, add these instead:
USE_STD_NAMESPACE;TESSDLL_IMPORTS;CCUTIL_IMPORTS;LIBLEPT_IMPORTS
Be sure to add the following to Additional Include Directories:
C:\BuildFolder\include
C:\BuildFolder\include\leptonica
C:\BuildFolder\include\tesseract or
<tesseract-3.0x dir> (all its sub-directories that contain header files)
Add C:\BuildFolder\lib to your Additional Library Directories.
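As a rough sanity check on the preprocessor definitions listed above, the following sketch counts how many of the three DLL import macros are defined in a translation unit and names the resulting link mode. CountImportMacros and LinkModeName are hypothetical helpers introduced only for illustration; they are not part of Tesseract-OCR or Leptonica.

```cpp
#include <cassert>
#include <string>

// Count how many of the DLL import macros are defined for this
// translation unit. Hypothetical helper, not part of libtesseract.
inline int CountImportMacros() {
    int count = 0;
#ifdef TESSDLL_IMPORTS
    ++count;
#endif
#ifdef CCUTIL_IMPORTS
    ++count;
#endif
#ifdef LIBLEPT_IMPORTS
    ++count;
#endif
    return count;
}

// Map the count to a label: none defined matches the static library
// settings, all three match the DLL settings, anything else is a mix.
inline std::string LinkModeName(int import_macro_count) {
    assert(import_macro_count >= 0 && import_macro_count <= 3);
    if (import_macro_count == 0) return "static";
    if (import_macro_count == 3) return "dll";
    return "mixed";
}
```

Printing LinkModeName(CountImportMacros()) from a test build would report "mixed" when only some of the import macros reached the compiler, which is one plausible cause of linker errors.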
In the C:\BuildFolder\include directory are two Visual Studio Property Sheet files:
tesseract_versionnumbers.vsprops
leptonica_versionnumbers.vsprops
Using tesseract_versionnumbers.vsprops (which automatically inherits leptonica_versionnumbers.vsprops) can make it easier to specify the libraries you need to import. For example, when creating a statically linked debug executable you can specify:
zlib$(ZLIB_VERSION)-static-mtdll-debug.lib
libpng$(LIBPNG_VERSION)-static-mtdll-debug.lib
libjpeg$(LIBJPEG_VERSION)-static-mtdll-debug.lib
giflib$(GIFLIB_VERSION)-static-mtdll-debug.lib
libtiff$(LIBTIFF_VERSION)-static-mtdll-debug.lib
liblept$(LIBLEPT_VERSION)-static-mtdll-debug.lib
libtesseract$(LIBTESS_VERSION)-static-debug.lib
to make your application less dependent on library version numbers.
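To make the effect of the version macros concrete, here is a minimal sketch of the substitution that turns a name such as zlib$(ZLIB_VERSION)-static-mtdll-debug.lib into an actual file name. ExpandMacros is a hypothetical helper, and the macro values passed to it are assumptions standing in for whatever the property sheets define. Visual Studio does the real expansion itself, and leaving unknown macros untouched is a choice of this sketch, not documented Visual Studio behavior.

```cpp
#include <cassert>
#include <map>
#include <string>

// Replace each $(NAME) in text with macros[NAME].
// Unknown macros are copied through unchanged (a choice of this sketch).
inline std::string ExpandMacros(
        const std::string& text,
        const std::map<std::string, std::string>& macros) {
    std::string out;
    std::string::size_type pos = 0;
    while (pos < text.size()) {
        const std::string::size_type start = text.find("$(", pos);
        if (start == std::string::npos) break;
        const std::string::size_type end = text.find(')', start + 2);
        if (end == std::string::npos) break;
        // Copy the literal text that precedes the macro reference.
        out.append(text, pos, start - pos);
        const std::string name = text.substr(start + 2, end - start - 2);
        const std::map<std::string, std::string>::const_iterator it =
            macros.find(name);
        if (it != macros.end()) {
            out += it->second;
        } else {
            out.append(text, start, end - start + 1);
        }
        pos = end + 1;
    }
    // Invariant: pos never runs past the end of the text.
    assert(pos <= text.size());
    out.append(text, pos, std::string::npos);
    return out;
}
```

With ZLIB_VERSION mapped to 125, the first library name in the list above expands to zlib125-static-mtdll-debug.lib.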
To add the Property Sheet to a Project, open its Property Pages dialog, and set the Configuration Properties | General | Inherited Project Property Sheets item to:
..\..\..\include\tesseract_versionnumbers.vsprops
Choosing View ‣ Other Windows ‣ Property Manager from the menubar will let you see the Properties attached to each Project’s configurations.
Note
The DLL versions of libtesseract currently export only the TessBaseAPI C++ class from baseapi.h; there is no C function interface yet.
Note
The DLL versions of libtesseract currently only export the TessBaseAPI and STRING classes. In theory, all you need are those classes. However, if you find yourself having to manipulate other “internal” tesseract objects then you currently have to link with the static library versions of libtesseract.
Warning
The Release versions of liblept, by design, never print out any possibly helpful messages to the console. Therefore, it is highly recommended that you do your initial development using the Debug versions of liblept. See Compile-time control over stderr output for details.
<<<Need to add the URL of the zip file that contains include & lib directory contents for those people who don’t want to build libtesseract themselves>>>
Before debugging programs written with libtesseract, you should first download the latest Leptonica sources (currently leptonica-1.68.tar.gz) and VS2008 source package (vs2008-1.68.zip) from:
http://code.google.com/p/leptonica/downloads/detail?name=leptonica-1.68.tar.gz
http://code.google.com/p/leptonica/downloads/detail?name=vs2008-1.68.zip
Unpack them to C:\BuildFolder to get the following directory structure:
C:\BuildFolder\
include\
lib\
leptonica-1.68\
vs2008\
tesseract-3.02\
vs2008\
testing\
tessdata\
(see Building the liblept library for more information)
Tesseract-OCR uses Leptonica “under the hood” for much of its image processing. Having the source available (and compiling it in debug mode) will make it easier to see what’s really going on.
You might want to add C:\BuildFolder\leptonica-1.68\vs2008\leptonica.vcproj and C:\BuildFolder\tesseract-3.02\vs2008\libtesseract\libtesseract.vcproj to your solution by right-clicking it and choosing Add ‣ Existing Project.... This seems to make VS2008’s Intellisense work better when finding “external” source files.
Definitely create a TESSDATA_PREFIX environment variable that contains the absolute path of the directory that contains the tessdata directory. Otherwise you’ll have to put a tessdata directory in every temporary build folder, which quickly becomes painful (especially since tessdata has gotten very big: 600MB!).
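A small pre-flight check along the following lines can report early when the variable is missing. This is only a sketch: TessdataDirFromPrefix and GetTessdataDir are hypothetical helpers, not Tesseract functions, and how libtesseract itself treats a trailing path separator is not covered here.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Join a TESSDATA_PREFIX value with the "tessdata" directory name,
// adding a backslash only when the prefix does not already end in a
// path separator. Hypothetical helper for illustration.
inline std::string TessdataDirFromPrefix(const std::string& prefix) {
    assert(!prefix.empty());
    std::string dir = prefix;
    const char last = dir[dir.size() - 1];
    if (last != '\\' && last != '/') dir += '\\';
    return dir + "tessdata";
}

// Returns true and fills *dir when TESSDATA_PREFIX is set and non-empty.
// Returns false and leaves *dir untouched otherwise.
inline bool GetTessdataDir(std::string* dir) {
    const char* prefix = std::getenv("TESSDATA_PREFIX");
    if (prefix == NULL || *prefix == '\0') return false;
    *dir = TessdataDirFromPrefix(prefix);
    return true;
}
```

An application could call GetTessdataDir() before initializing the OCR engine and print a clear message when it returns false, instead of failing later while loading language data.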
The APITest Solution contains the minimal settings needed to link with libtesseract. It demonstrates the typical situation, where the “external” application’s source files reside outside of the tesseract-3.0x directory tree.
To build the vs2008\APITest Solution, first copy it to your C:\BuildFolder directory. This should now look like:
C:\BuildFolder\
  include\
    leptonica\
    tesseract\
    leptonica_versionnumbers.vsprops
    tesseract_versionnumbers.vsprops
  lib\
    giflib416-static-mtdll-debug.lib
    giflib416-static-mtdll.lib
    libjpeg8c-static-mtdll-debug.lib
    libjpeg8c-static-mtdll.lib
    liblept168-static-mtdll-debug.lib
    liblept168-static-mtdll.lib
    liblept168.dll
    liblept168.lib
    liblept168d.dll
    liblept168d.lib
    libpng143-static-mtdll-debug.lib
    libpng143-static-mtdll.lib
    libtesseract302.dll
    libtesseract302.lib
    libtesseract302d.dll
    libtesseract302d.lib
    libtesseract302-static.lib
    libtesseract302-static-debug.lib
    libtiff394-static-mtdll-debug.lib
    libtiff394-static-mtdll.lib
    zlib125-static-mtdll-debug.lib
    zlib125-static-mtdll.lib
  tesseract-3.02\
  APITest\
    baseapitester\
      baseapitester.cpp
      baseapitester.rc
      baseapitester.vcproj
      resource.h
      stdafx.cpp
      stdafx.h
      targetver.h
    APITest.sln
The APITest Solution contains just the baseapitester project. It was created with the VS2008 Win32 Console Application Project Wizard, then by copying most of tesseractmain.cpp and making minor edits. Its settings correctly refer to the “public” include and lib directories using relative paths.
It assumes that the C:\BuildFolder\include directory has been properly set up. See this for more details.
The C:\BuildFolder\lib directory will automatically get libtesseract copied to it whenever it is built.
The include\tesseract_versionnumbers.vsprops Property Sheet is used to avoid explicit library version number dependencies. Precompiled headers are used. LIB_Release, LIB_Debug, DLL_Release, and DLL_Debug build configurations are supported.
The following are the compiler command lines and linker options used. See Compiling a C/C++ Program | Compiler Options for a detailed explanation of these options.
LIB_Release C/C++ Command Line:
/O2
/I "." /I "..\..\include" /I "..\..\include\leptonica"
/I "..\..\include\tesseract"
/D "WIN32" /D "_WINDOWS" /D "NDEBUG"
/D "USE_STD_NAMESPACE" /D "_MBCS"
/FD /EHsc /MD /Yc"stdafx.h"
/Fp"LIB_Release\baseapitester.pch" /Fo"LIB_Release\\"
/Fd"LIB_Release\vc90.pdb"
/W3 /nologo /c
/wd4244 /wd4305 /wd4018 /wd4267 /wd4996 /wd4800 /wd4005 /wd4355 /wd4099 /wd4566
/errorReport:prompt
LIB_Release Linker Additional Dependencies:
ws2_32.lib
user32.lib
zlib$(ZLIB_VERSION)-static-mtdll.lib
libpng$(LIBPNG_VERSION)-static-mtdll.lib
libjpeg$(LIBJPEG_VERSION)-static-mtdll.lib
giflib$(GIFLIB_VERSION)-static-mtdll.lib
libtiff$(LIBTIFF_VERSION)-static-mtdll.lib
liblept$(LIBLEPT_VERSION)-static-mtdll.lib
libtesseract$(LIBTESS_VERSION)-static.lib
LIB_Debug C/C++ Command Line:
/Od
/I "." /I "..\..\include" /I "..\..\include\leptonica"
/I "..\..\include\tesseract"
/D "WIN32" /D "_WINDOWS" /D "_DEBUG"
/D "USE_STD_NAMESPACE" /D "_MBCS"
/FD /EHsc /RTC1 /MDd /Yc"stdafx.h"
/Fp"LIB_Debug\baseapitesterd.pch" /Fo"LIB_Debug\\"
/Fd"LIB_Debug\vc90.pdb"
/W3 /nologo /c /Z7
/wd4244 /wd4305 /wd4018 /wd4267 /wd4996 /wd4800 /wd4005 /wd4355 /wd4099 /wd4566
/errorReport:prompt
LIB_Debug Linker Additional Dependencies:
ws2_32.lib
user32.lib
zlib$(ZLIB_VERSION)-static-mtdll-debug.lib
libpng$(LIBPNG_VERSION)-static-mtdll-debug.lib
libjpeg$(LIBJPEG_VERSION)-static-mtdll-debug.lib
giflib$(GIFLIB_VERSION)-static-mtdll-debug.lib
libtiff$(LIBTIFF_VERSION)-static-mtdll-debug.lib
liblept$(LIBLEPT_VERSION)-static-mtdll-debug.lib
libtesseract$(LIBTESS_VERSION)-static-debug.lib
DLL_Release C/C++ Command Line:
/O2
/I "." /I "..\..\include" /I "..\..\include\leptonica"
/I "..\..\include\tesseract"
/D "WIN32" /D "_WINDOWS" /D "NDEBUG"
/D "USE_STD_NAMESPACE" /D "_MBCS"
/D "TESSDLL_IMPORTS" /D "CCUTIL_IMPORTS" /D "LIBLEPT_IMPORTS"
/FD /EHsc /MD /Yc"stdafx.h"
/Fp"DLL_Release\baseapitester-dll.pch" /Fo"DLL_Release\\"
/Fd"DLL_Release\vc90.pdb"
/W3 /nologo /c
/wd4244 /wd4305 /wd4018 /wd4267 /wd4996 /wd4800 /wd4005 /wd4355 /wd4099 /wd4566
/errorReport:prompt
DLL_Release Linker Additional Dependencies:
ws2_32.lib
user32.lib
liblept$(LIBLEPT_VERSION).lib
libtesseract$(LIBTESS_VERSION).lib
DLL_Debug C/C++ Command Line:
/Od
/I "." /I "..\..\include" /I "..\..\include\leptonica"
/I "..\..\include\tesseract"
/D "WIN32" /D "_WINDOWS" /D "_DEBUG"
/D "USE_STD_NAMESPACE" /D "_MBCS"
/D "TESSDLL_IMPORTS" /D "CCUTIL_IMPORTS" /D "LIBLEPT_IMPORTS"
/FD /EHsc /RTC1 /MDd /Yc"stdafx.h"
/Fp"DLL_Debug\baseapitester-dlld.pch" /Fo"DLL_Debug\\"
/Fd"DLL_Debug\vc90.pdb"
/W3 /nologo /c /Z7
/wd4244 /wd4305 /wd4018 /wd4267 /wd4996 /wd4800 /wd4005 /wd4355 /wd4099 /wd4566
/errorReport:prompt
DLL_Debug Linker Additional Dependencies:
ws2_32.lib
user32.lib
liblept$(LIBLEPT_VERSION)d.lib
libtesseract$(LIBTESS_VERSION)d.lib
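The four Linker Additional Dependencies lists above follow a regular pattern, which the sketch below reproduces as a function. LibVersions and AdditionalDependencies are hypothetical names introduced for illustration, and the version strings are assumed to be supplied by the caller in place of the $(..._VERSION) property sheet macros.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Version strings standing in for the $(..._VERSION) macros.
// Hypothetical type, not part of Tesseract-OCR or Leptonica.
struct LibVersions {
    std::string zlib, libpng, libjpeg, giflib, libtiff, liblept, libtess;
};

// Rebuild the Additional Dependencies list for one build configuration:
// dll=false, debug=false -> LIB_Release; dll=false, debug=true -> LIB_Debug;
// dll=true,  debug=false -> DLL_Release; dll=true,  debug=true -> DLL_Debug.
inline std::vector<std::string> AdditionalDependencies(
        const LibVersions& v, bool dll, bool debug) {
    assert(!v.liblept.empty() && !v.libtess.empty());
    std::vector<std::string> libs;
    libs.push_back("ws2_32.lib");
    libs.push_back("user32.lib");
    if (dll) {
        // DLL builds only need the two import libraries.
        const std::string suffix = debug ? "d.lib" : ".lib";
        libs.push_back("liblept" + v.liblept + suffix);
        libs.push_back("libtesseract" + v.libtess + suffix);
        return libs;
    }
    // Static builds also link every image library that liblept uses.
    const std::string suffix =
        debug ? "-static-mtdll-debug.lib" : "-static-mtdll.lib";
    libs.push_back("zlib" + v.zlib + suffix);
    libs.push_back("libpng" + v.libpng + suffix);
    libs.push_back("libjpeg" + v.libjpeg + suffix);
    libs.push_back("giflib" + v.giflib + suffix);
    libs.push_back("libtiff" + v.libtiff + suffix);
    libs.push_back("liblept" + v.liblept + suffix);
    // libtesseract's static names carry no "-mtdll" part.
    libs.push_back("libtesseract" + v.libtess +
                   (debug ? "-static-debug.lib" : "-static.lib"));
    return libs;
}
```

The main point the sketch makes visible is that the DLL configurations need only two import libraries beyond the system ones, while the static configurations must list every dependency explicitly.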
<<<NEEDS WORK>>>
Currently two Projects are in the APIExamples solution:
preprocessing – Demonstrates how to use Leptonica’s image processing functions to clean up images before calling TessBaseAPI::SetImage().
getinfo – Demonstrates calling various TessBaseAPI methods to get back information on the OCR process.
HAVE_CONFIG_H is only defined when building under Linux. It causes the inclusion of config_auto.h, which is auto-generated during the ./configure process and thus not present on Windows.
That header is what sets the VERSION macro (and lots of other configuration-related macros).
TESSDLL_IMPORTS should be defined when building apps that link to a DLL version of libtesseract. It is used as follows in baseapi.h:
#ifdef TESSDLL_EXPORTS
#define TESSDLL_API __declspec(dllexport)
#elif defined(TESSDLL_IMPORTS)
#define TESSDLL_API __declspec(dllimport)
#else
#define TESSDLL_API
#endif
If you don’t define TESSDLL_IMPORTS then you’ll get “unresolved external symbol” linker errors.
CCUTIL_IMPORTS should be defined when building apps that link to a DLL version of libtesseract. It is used as follows in strngs.h:
#ifdef CCUTIL_EXPORTS
#define CCUTIL_API __declspec(dllexport)
#elif defined(CCUTIL_IMPORTS)
#define CCUTIL_API __declspec(dllimport)
#else
#define CCUTIL_API
#endif
If you don’t define CCUTIL_IMPORTS then you’ll get “unresolved external symbol” linker errors that mention STRING.
LIBLEPT_IMPORTS should be defined when building apps that link to a DLL version of Leptonica. It is used as follows in environ.h:
#if defined(LIBLEPT_EXPORTS) || defined(LEPTONLIB_EXPORTS)
#define LEPT_DLL __declspec(dllexport)
#elif defined(LIBLEPT_IMPORTS) || defined(LEPTONLIB_IMPORTS)
#define LEPT_DLL __declspec(dllimport)
#else
#define LEPT_DLL
#endif
If you don’t define LIBLEPT_IMPORTS then you’ll get “unresolved external symbol” linker errors.
USE_STD_NAMESPACE causes the following to be done:
#ifdef USE_STD_NAMESPACE
using std::string;
using std::vector;
#endif
_WIN32 is used to indicate that the build target is Windows 32-bit or 64-bit (WIN32 and WINDOWS are also added by the New Project Wizards).
See C/C++ Preprocessor Reference | The Preprocessor | Macros | Predefined Macros for the complete list for Visual Studio 2008.
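To illustrate the USE_STD_NAMESPACE entry above, the sketch below defines the macro in the source file (it would normally arrive via /D "USE_STD_NAMESPACE") and then uses string and vector without the std:: prefix. SplitLines is a hypothetical helper chosen only because OCR output is typically processed line by line; it is not part of libtesseract.

```cpp
#include <cassert>
#include <string>
#include <vector>

#define USE_STD_NAMESPACE  // normally passed on the compiler command line

#ifdef USE_STD_NAMESPACE
using std::string;
using std::vector;
#endif

// Split text at '\n' characters. The unqualified names string and
// vector compile only because of the using-declarations above.
inline vector<string> SplitLines(const string& text) {
    vector<string> lines;
    string::size_type begin = 0;
    while (begin <= text.size()) {
        string::size_type end = text.find('\n', begin);
        if (end == string::npos) end = text.size();
        lines.push_back(text.substr(begin, end - begin));
        begin = end + 1;
    }
    // Even an empty input yields one (empty) line.
    assert(!lines.empty());
    return lines;
}
```

Removing the #define line makes the unqualified names fail to compile, which mirrors what happens to code that relies on these using-declarations when the definition is left out of the project settings.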