This fixes several gcc warnings:
warning:
type qualifiers ignored on function return type [-Wignored-qualifiers]
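As an illustration of the pattern this warning flags, here is a minimal sketch. The `Box` class and its accessor are hypothetical and not taken from the Tesseract sources; they only show why the qualifier is redundant.

```cpp
// Hypothetical class, used only to illustrate the warning.
class Box {
 public:
  explicit Box(int width) : width_(width) {}

  // Before: `const int width() const` triggers -Wignored-qualifiers,
  // because a top-level const on a by-value scalar return has no effect.
  // After: drop that qualifier; the const on the member function stays.
  int width() const { return width_; }

 private:
  int width_;
};
```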
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Assertions are good for programming errors, but not for wrong user input.
The new code no longer needs File::ReadFileToStringOrDie, so remove that
method.
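A rough sketch of the idea, not the code from this commit: the free functions `ReadFileToString` and `LoadTrainingText` below are hypothetical stand-ins. The point is that an unreadable input file is reported to the user and turned into an error return instead of an assertion failure.

```cpp
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: returns false instead of aborting when the file
// cannot be opened, so the caller decides how to report the problem.
bool ReadFileToString(const std::string& filename, std::string* out) {
  std::ifstream in(filename, std::ios::binary);
  if (!in) return false;
  std::ostringstream buffer;
  buffer << in.rdbuf();
  *out = buffer.str();
  return true;
}

// Hypothetical caller: a missing input file is a user error, so print a
// message and return a failure code rather than tripping an assertion.
int LoadTrainingText(const std::string& filename, std::string* text) {
  if (!ReadFileToString(filename, text)) {
    std::cerr << "Failed to read file: " << filename << "\n";
    return 1;
  }
  return 0;
}
```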
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This changes the error message for a missing font from
Could not find font named Times New Roman.Please correct --font arg.
(missing space after first sentence) to
Could not find font named Times New Roman.
Please correct --font arg.
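How the original message was assembled is not shown here. As a hedged sketch, a hypothetical helper producing the corrected two-line message could look like this (the trailing newline is an assumption):

```cpp
#include <string>

// Hypothetical helper that builds the corrected message; the newline after
// the first sentence keeps the two sentences from running together.
std::string MissingFontMessage(const std::string& font_name) {
  return "Could not find font named " + font_name + ".\n" +
         "Please correct --font arg.\n";
}
```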
Signed-off-by: Stefan Weil <sw@weilnetz.de>
gcc reported this warning:
../training/stringrenderer.cpp:
In member function ‘void tesseract::StringRenderer::SetLayoutProperties()’:
../training/stringrenderer.cpp:211:42: warning:
ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
set_features("liga, clig, dlig, hlig");
^
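The usual remedy for this warning is to make the parameter a pointer to const. The sketch below assumes that approach; the `Renderer` class is a hypothetical stand-in, and the actual change made in this commit is not shown here.

```cpp
#include <string>

// Hypothetical stand-in for a renderer with a feature-string setter.
class Renderer {
 public:
  // Before: `void set_features(char* features)` means a string literal
  // argument needs a conversion that ISO C++ forbids (-Wwrite-strings).
  // After: take a pointer to const, which a literal converts to safely.
  void set_features(const char* features) { features_ = features; }
  const std::string& features() const { return features_; }

 private:
  std::string features_;
};
```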
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Commit 65504c8cd2 misplaced the #endif.
The definition of _GNU_SOURCE is only needed for Cygwin.
Defining _GNU_SOURCE on Linux results in compiler warnings because this
macro is already defined by the compiler.
Fix this by moving the #endif to the right place. In addition, the code
for Cygwin is made more robust: if a future Cygwin compiler also defines
_GNU_SOURCE, the code will still work.
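A guard of the kind described might look like the following sketch. The exact surrounding code in the commit is not reproduced here, and `kGnuSourceVisible` is a hypothetical constant added only so the effect can be inspected.

```cpp
// Define _GNU_SOURCE only on Cygwin, and only if the compiler has not
// already defined it. In real code this goes before the first system header.
#if defined(__CYGWIN__) && !defined(_GNU_SOURCE)
#define _GNU_SOURCE
#endif

// Hypothetical constant for illustration: records whether _GNU_SOURCE is
// visible after the guard, whoever defined it.
constexpr bool kGnuSourceVisible =
#ifdef _GNU_SOURCE
    true;
#else
    false;
#endif
```

On Linux the macro is left exactly as the compiler set it, so no redefinition warning is triggered.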
Signed-off-by: Stefan Weil <sw@weilnetz.de>
On msys2, Pango seems to always return an empty string for the suggested
font. It's a good idea to check that the string is not empty before
printing it, on all platforms.
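A minimal sketch of such a check, assuming the suggestion arrives as a `std::string`. The helper name and the message wording are assumptions for illustration.

```cpp
#include <string>

// Hypothetical helper: returns the hint to print, or an empty string when
// the font backend offered no suggestion.
std::string SuggestionHint(const std::string& suggested_font) {
  if (suggested_font.empty()) return "";
  return "Pango suggested font " + suggested_font + ".\n";
}
```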
Pango's OpenType feature selection functions are only available
from version 1.38 onwards, which is still quite new, so ensure the
setting is simply ignored when building against an older version.
This enables all OpenType ligatures for a specific font, where
available. Specifically, it explicitly enables the OpenType
features liga (standard ligatures), hlig (historical ligatures),
clig (contextual ligatures), and dlig (discretionary ligatures).
This feature requires Pango 1.38 or newer.
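The version gate can be sketched without linking against Pango. Both functions below are hypothetical and take the version as plain integers; real code would more likely decide this at compile time, for example with Pango's version macros.

```cpp
#include <string>

// Hypothetical predicate mirroring the gate described above:
// font feature selection needs Pango 1.38 or newer.
constexpr bool SupportsFontFeatures(int major, int minor) {
  return major > 1 || (major == 1 && minor >= 38);
}

// Returns the feature list to request, or an empty string when the
// library is too old and the setting should simply be ignored.
std::string LigatureFeatures(int major, int minor) {
  if (!SupportsFontFeatures(major, minor)) return "";
  return "liga, clig, dlig, hlig";
}
```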
The way tesstrain.sh handled font names was really weird, using '+'
signs as a delimiter. However, quoting arguments is a much more
straightforward, standard and sensible way to do things.
So whereas previously one would have used this:
--fontlist Times New Roman + Arial Black
fonts should now be specified like this:
--fontlist "Times New Roman" "Arial Black"
Character properties are autogenerated only if wctype is found on the
system. However, it is not possible to know if a version of
unicharset_extractor was compiled with this support (especially if it
was installed as a pre-compiled binary).
This commit adds a line to the usage output that states whether the
binary was compiled with wctype support.
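One way to produce such a line is sketched below. `HAVE_WCTYPE_SUPPORT` is a hypothetical macro standing in for whatever the build system actually defines, and the message wording is an assumption.

```cpp
#include <string>

// HAVE_WCTYPE_SUPPORT is a hypothetical build-time macro standing in for
// whatever the build system defines when wctype is available.
std::string WctypeSupportLine() {
#ifdef HAVE_WCTYPE_SUPPORT
  return "wctype support: enabled\n";
#else
  return "wctype support: disabled\n";
#endif
}
```

Appending this line to the usage text lets a user of a pre-compiled binary see which variant they have.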
This font list contains a selection of fonts produced by the Greek Font
Society <http://greekfontsociety.gr>, and is the result of testing
with a large corpus of a variety of scanned works.
Using mktemp is better for security, and it also lets users
specify a different directory using the TMPDIR environment variable,
which is useful if /tmp is a small tmpfs.
Also fix a bug where the first few log messages were failing because the
workspace directory wasn't being created early enough.
The --exposures flag can be used to specify multiple different exposure
levels for a training run. There was some code already in tesstrain_utils.sh
to deal with multiple exposure levels, so it looks like this
functionality was always intended.
The default usage does not change, with exposure level 0 being the
only one used if --exposures is not used.