Tesseract supports hierarchies of languages and has used them since
the new files best/*.traineddata were added.
Now `tesseract --list-langs` also shows any traineddata files in
subdirectories of the tessdata directory.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The default resolution is used for images without an explicit resolution
or with an unreasonable resolution (smaller than 70 or larger than 2400).
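
A minimal sketch of the rule described above, not the actual Tesseract code.
The bounds 70 and 2400 come from this message; the function name, the
constant names and the convention that a missing resolution is reported as 0
are assumptions made for illustration.

```cpp
#include <cassert>

// Bounds from the commit message: resolutions smaller than 70 or
// larger than 2400 are considered unreasonable.
// The constant names are illustrative.
constexpr int kMinReasonableResolution = 70;
constexpr int kMaxReasonableResolution = 2400;

// Hypothetical helper: returns the resolution to use for an image.
// Assumption: an image without an explicit resolution reports 0,
// which falls below the lower bound and so selects the default.
constexpr int EffectiveResolution(int image_resolution,
                                  int default_resolution) {
  return (image_resolution < kMinReasonableResolution ||
          image_resolution > kMaxReasonableResolution)
             ? default_resolution
             : image_resolution;
}

// Small usage example exercising both bounds and the missing case.
inline void ResolutionExamples() {
  assert(EffectiveResolution(0, 300) == 300);     // no explicit resolution
  assert(EffectiveResolution(69, 300) == 300);    // too small
  assert(EffectiveResolution(70, 300) == 70);     // lower bound is accepted
  assert(EffectiveResolution(2400, 300) == 2400); // upper bound is accepted
  assert(EffectiveResolution(2401, 300) == 300);  // too large
}
```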
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes a compiler warning:
api/baseapi.cpp:1621:17: warning:
variable 'font_name' set but not used [-Wunused-but-set-variable]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The indentation has been wrong since commit
fd0683f9e0 and results in a gcc warning:
api/baseapi.cpp: In member function 'bool tesseract::TessBaseAPI::ProcessPagesMultipageTiff(const l_uint8*, size_t, const char*, const char*, int, tesseract::TessResultRenderer*, int)':
api/baseapi.cpp:986:5: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
if (tessedit_page_number >= 0)
^~
api/baseapi.cpp:988:7: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'if'
pix = (data) ? pixReadMemFromMultipageTiff(data, size, &offset)
^~~
Signed-off-by: Stefan Weil <sw@weilnetz.de>
See issue #424.
The existing C API for TessBaseAPIDetectOS requires a C caller to successfully allocate struct OSResults, which is actually a C++ class. Generally, a regular C compiler cannot do this properly.
It's also assumed that most API-level users of Tesseract are only interested in Tesseract's best guess as to script and orientation, not in the individual values for all possible scripts.
This introduces a new API with a better name that is more closely aligned with the output of 'tesseract -psm 0'. Both 'tesseract -psm 0' and this API now share the same code in baseapi.cpp.
Calling TessBaseAPI::Clear(), which calls TessBaseAPI::ClearResults(),
which in turn calls SavePixForCrash(0, NULL), is needed to release objects
allocated in global_crash_pixes.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes a compiler warning:
api/baseapi.cpp:1743:11: warning:
unused variable 'kBytesPerBlob' [-Wunused-const-variable]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
It conflicts with a previous 'class' declaration for ETEXT_DESC:
include/tesseract/baseapi.h:594:21:
Struct 'ETEXT_DESC' was previously declared as a class
Signed-off-by: Stefan Weil <sw@weilnetz.de>
In Tesseract's coordinate system, width is simply right - left; cf. slide #2 of
github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/2ArchitectureAndDataStructures.pdf
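
A small sketch of that convention. The Box struct is a hypothetical
stand-in, not Tesseract's own bounding box class; it only illustrates that
no "+ 1" correction is applied when computing the width.

```cpp
#include <cassert>

// Hypothetical stand-in for a bounding box with horizontal edges only.
struct Box {
  int left;
  int right;

  // Width is right - left, without adding 1.
  constexpr int width() const { return right - left; }
};

// Usage example: a box spanning x = 10 to x = 25 is 15 wide, not 16.
inline void WidthExample() {
  constexpr Box box{10, 25};
  static_assert(box.width() == 15, "width is right - left");
  assert(box.width() == 15);
}
```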