This fixes a compiler warning:
api/baseapi.cpp:1621:17: warning:
variable 'font_name' set but not used [-Wunused-but-set-variable]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The indentation is wrong since commit
fd0683f9e0 and results in a gcc warning:
api/baseapi.cpp: In member function 'bool tesseract::TessBaseAPI::ProcessPagesMultipageTiff(const l_uint8*, size_t, const char*, const char*, int, tesseract::TessResultRenderer*, int)':
api/baseapi.cpp:986:5: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
if (tessedit_page_number >= 0)
^~
api/baseapi.cpp:988:7: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'if'
pix = (data) ? pixReadMemFromMultipageTiff(data, size, &offset)
^~~
Signed-off-by: Stefan Weil <sw@weilnetz.de>
See issue #424.
The existing C API for TessBaseAPIDetectOS requires a C caller to successfully allocate struct OSResults which is actually a C++ class. Generally it won't
be possible for a regular C compiler to do this properly.
It's also assumed that most API level users of Tesseract are only interested in Tesseract's best guess as to script and orientation, not the individual values for all possible scripts.
This introduces a new API with a better name that is more closely aligned with the output of 'tesseract -psm 0'. Both tesseract -psm 0 and this API now share the same code in baseapi.cpp.
Calling TessBaseAPI::Clear() which calls TessBaseAPI::ClearResults()
which calls SavePixForCrash(0, NULL) is needed to release objects
allocated in global_crash_pixes.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes a compiler warning:
api/baseapi.cpp:1743:11: warning:
unused variable 'kBytesPerBlob' [-Wunused-const-variable]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
It conflicts with a previous 'class' declaration for ETEXT_DESC:
include/tesseract/baseapi.h:594:21:
Struct 'ETEXT_DESC' was previously declared as a class
Signed-off-by: Stefan Weil <sw@weilnetz.de>
In Tesseract's coordinate system, width is just right - left, cf. slide #2 of
github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/2ArchitectureAndDataStructures.pdf
Takes advantage of inheritance and dir="ltr" default to:
- only generate paragraph dirs which are not ltr
- only generate word dirs which don't match enclosing paragraph
Tested against LTR, RTL, and mixed direction files. Files for the
latter two cases are in a separate commit on the ltr-test-files branch.
This fixes the duplicate line IDs caused by inserting height information
into the middle of the ID and it moves the line height info into
the title attribute like everything else, rather than using non-standard
HTML attributes (which won't validate).
This change may break consumers of the HTML output, but 3.04 has only
been in the wild for 6 months and the current HTML is invalid, so I
believe the benefit outweighs the cost for the fix.