See issue #424.
The existing C API for TessBaseAPIDetectOS requires a C caller to successfully allocate struct OSResults which is actually a C++ class. Generally it won't
be possible for a regular C compiler to do this properly.
It's also assumed that most API level users of Tesseract are only interested in Tesseract's best guess as to script and orientation, not the individual values for all possible scripts.
This introduces a new API with a better name that is more closely aligned with the output of 'tesseract -psm 0'. Both tesseract -psm 0 and this API now share the same code in baseapi.cpp.
Calling TessBaseAPI::Clear() which calls TessBaseAPI::ClearResults()
which calls SavePixForCrash(0, NULL) is needed to release objects
allocated in global_crash_pixes.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes a compiler warning:
api/baseapi.cpp:1743:11: warning:
unused variable 'kBytesPerBlob' [-Wunused-const-variable]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
It conflicts with a previous 'class' declaration for ETEXT_DESC:
include/tesseract/baseapi.h:594:21:
Struct 'ETEXT_DESC' was previously declared as a class
Signed-off-by: Stefan Weil <sw@weilnetz.de>
In Tesseract's coordinate system, width is just right - left, cf. slide #2 of
github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/2ArchitectureAndDataStructures.pdf
Takes advantage of inheritance and dir="ltr" default to:
- only generate paragraph dirs which are not ltr
- only generate word dirs which don't match enclosing paragraph
Tested against LTR, RTL, and mixed direction files. Files for the
latter two cases are in a separate commit on the ltr-test-files branch.
This fixes the duplicate line IDs caused by inserting height information
into the middle of the ID and it moves the line height info into
the title attribute like everything else, rather than using non-standard
HTML attributes (which won't validate).
This change may break consumers of the HTML output, but 3.04 has only
been in the wild for 6 months and the current HTML is invalid, so I
believe the benefit outweighs the cost for the fix.
Eliminated the flexfx scheme for calling global feature extractor functions
through an array of function pointers.
Deleted dead code I found as a by-product.
This CL does not change BlobToTrainingSample or ExtractFeatures to be full
members of Classify (the eventual goal) as that would make it even bigger,
since there are a lot of callers to these functions.
When ExtractFeatures and BlobToTrainingSample are members of Classify they
will be able to access control parameters in Classify, which will greatly
simplify developing variations to the feature extraction process.