This is a lightweight, semi-Pythonic conversion of tesstrain.sh that currently
supports only LSTM and not the Tesseract 3 training mode.
I attempted to keep source changes minimal so it would be easy to compare
bash to Python in code review and confirm equivalence.
Python 3.6+ is required. Ubuntu 18.04 ships Python 3.6 and it is a mandatory
package (the package manager is also written in Python), so it is available
in the baseline Tesseract 4.0 system.
There are minor output and behavioral changes, and advantages. Python's loggingis used. Temporary files are only deleted on success, so they can be inspected
if training files. Console output is more terse and the log file is more
verbose. And there are progress bars! (The python3-tqdm package is required.)
Where tesstrain.sh would sometimes fail without explanation and return an error
code of 1, it is much easier to find the point of failure in this version.
That was also the main motivation for this work.
Argument checking is also more comprehensive.
Compiler warning on macOS:
tesscallback.h:29:7: warning:
'TessClosure' has no out-of-line virtual method definitions;
its vtable will be emitted in every translation unit [-Wweak-vtables]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes a compiler warning:
globaloc.cpp:33:6: warning: no previous extern declaration for
non-static variable 'global_crash_pixes'
[-Wmissing-variable-declarations]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes some compiler warnings:
mainblk.cpp:28:9: warning: macro is not used [-Wunused-macros]
mainblk.cpp:29:9: warning: macro is not used [-Wunused-macros]
[...]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
TessBaseAPI::GetAvailableLanguagesAsVector returned the list of languages
without sorting, so the result was random and not user friendly.
Now `tesseract --list-langs` shows the available languages and scripts
in alphabetic order.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The format string which builds the command only takes one or two
string arguments, so the function allocated too much memory and
passed too many arguments to snprintf.
This also fixes a compiler warning (clang).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes two warnings from LGTM:
Parameter feature_defs hides a global variable with the same name.
Parameter Config hides a global variable with the same name.
Signed-off-by: Stefan Weil <sw@weilnetz.de>