The size() method returns a size_type value which is an unsigned type.
As there is no portable format string for that type, a type cast is needed.
Also fix several signed / unsigned mismatches which resulted in compiler
warnings.
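
A minimal sketch of both kinds of fix, assuming the reported values fit in
an unsigned int. DescribeSize and CountPositive are hypothetical helpers
used only for illustration, not functions from the code base.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats the element count of a vector.
std::string DescribeSize(const std::vector<int>& values) {
  char buffer[64];
  // size() returns an unsigned size_type whose width depends on the
  // platform, so cast it to a type with a well-known format specifier.
  std::snprintf(buffer, sizeof(buffer), "%u elements",
                static_cast<unsigned>(values.size()));
  return std::string(buffer);
}

// Hypothetical helper: counts the values greater than zero.
int CountPositive(const std::vector<int>& values) {
  int count = 0;
  // An index of type std::size_t matches the type returned by size(),
  // which avoids a signed / unsigned comparison warning.
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i] > 0) ++count;
  }
  return count;
}
```

The cast truncates sizes larger than the range of unsigned int, which is
acceptable only where such sizes cannot occur.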
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The code in ccutil/hashfn.h was needed for some old compilers. Now that we support MSVC >= 2010 and compilers that have good support for C++11, we can drop this code.
As a result of this file removal, we now use:
std::unordered_map
std::unordered_set
std::unique_ptr
directly in the codebase with '#include' for the needed headers.
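
A short sketch of what direct use of the standard types can look like once
the wrapper header is gone. CountWords, CountDistinct and MakeLabel are
hypothetical names chosen for illustration, not code from the repository.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical helper: counts how often each word occurs.
std::unordered_map<std::string, int> CountWords(
    const std::vector<std::string>& words) {
  std::unordered_map<std::string, int> counts;
  for (std::size_t i = 0; i < words.size(); ++i) {
    ++counts[words[i]];  // operator[] value-initializes missing keys to 0.
  }
  return counts;
}

// Hypothetical helper: counts the distinct words.
std::size_t CountDistinct(const std::vector<std::string>& words) {
  std::unordered_set<std::string> seen(words.begin(), words.end());
  return seen.size();
}

// Hypothetical helper: returns an owning pointer to a copy of the text.
// std::make_unique is C++14, so plain new is used to stay within C++11.
std::unique_ptr<std::string> MakeLabel(const std::string& text) {
  return std::unique_ptr<std::string>(new std::string(text));
}
```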
They were removed in commit d70f3c3663.
The old code implicitly added `-llept` by using the `AC_CHECK_LIB` macro.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Examples:
In training/tesstrain.sh line 64:
if (( ${LINEDATA} )); then
^-- SC2004: $/${} is unnecessary on arithmetic variables.
In training/tesstrain.sh line 56:
source `dirname $0`/language-specific.sh
^-- SC2006: Use $(..) instead of legacy `..`.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
training/commontraining.cpp:824:3: warning:
'register' storage class specifier is deprecated and incompatible with C++1z [-Wdeprecated-register]
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes several gcc warnings:
warning:
type qualifiers ignored on function return type [-Wignored-qualifiers]
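
As an illustration of this class of warning (Box is a hypothetical class,
not one from the code base), a const qualifier on a return type that is
returned by value has no effect and can simply be dropped:

```cpp
// Hypothetical class for illustration only.
class Box {
 public:
  explicit Box(int width) : width_(width) {}

  // Declaring this as `const int width() const` would trigger
  // -Wignored-qualifiers: the leading const on a by-value return type
  // is ignored. The trailing const, which marks the member function
  // as non-modifying, is kept.
  int width() const { return width_; }

 private:
  int width_;
};
```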
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Assertions are good for programming errors, but not for wrong user input.
The new code no longer needs File::ReadFileToStringOrDie, so remove that
method.
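
A minimal sketch of the idea, assuming a reader that reports failure to its
caller instead of aborting. The free function ReadFileToString shown here
and its signature are assumptions for illustration.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: reads a whole file into *out.
// Returns false instead of asserting when the file cannot be opened,
// because a wrong file name is a user error, not a programming error.
bool ReadFileToString(const std::string& filename, std::string* out) {
  std::ifstream stream(filename.c_str(), std::ios::in | std::ios::binary);
  if (!stream) {
    return false;
  }
  out->assign(std::istreambuf_iterator<char>(stream),
              std::istreambuf_iterator<char>());
  return true;
}
```

The caller can then print a helpful message and exit with an error code
when the function returns false.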
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This changes the error message for a missing font from
Could not find font named Times New Roman.Please correct --font arg.
(missing space after first sentence) to
Could not find font named Times New Roman.
Please correct --font arg.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
gcc reported this warning:
../training/stringrenderer.cpp:
In member function ‘void tesseract::StringRenderer::SetLayoutProperties()’:
../training/stringrenderer.cpp:211:42: warning:
ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
set_features("liga, clig, dlig, hlig");
^
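
One common way to address this kind of warning, shown here as a sketch and
not necessarily the change made in this commit, is to declare the parameter
as const char*. FeatureListLength is a hypothetical stand-in for a function
that receives the feature list.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a function taking a feature list.
// A `char*` parameter would require converting the string literal to a
// pointer to non-const char, which ISO C++ forbids; `const char*`
// accepts string literals without a warning.
std::size_t FeatureListLength(const char* features) {
  return std::strlen(features);
}
```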
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Commit 65504c8cd2 misplaced the #endif.
The definition of _GNU_SOURCE is only needed for Cygwin.
Defining _GNU_SOURCE on Linux results in compiler warnings because this
macro is already defined by the compiler.
Fix this by moving the #endif to the right place. In addition the code
for Cygwin is made more robust: If a future Cygwin compiler defines
_GNU_SOURCE, too, the code will still work.
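
A sketch of a guard with the described behaviour. The exact condition used
in the source file is not shown in this message, so the __CYGWIN__ test is
an assumption.

```cpp
// Define _GNU_SOURCE only on Cygwin, and only if the compiler has not
// already defined it. On Linux the block is skipped entirely, so the
// macro predefined by the compiler is not redefined.
#if defined(__CYGWIN__) && !defined(_GNU_SOURCE)
#define _GNU_SOURCE
#endif
```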
Signed-off-by: Stefan Weil <sw@weilnetz.de>
On msys2, Pango seems to always return an empty string for the suggested
font. It's a good idea to check that the string is not empty before
printing it - on all platforms.
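
A minimal sketch of such a check. MissingFontMessage is a hypothetical
helper, and the wording of the suggestion line is an assumption.

```cpp
#include <string>

// Hypothetical helper: builds the error text for a missing font.
// The suggestion line is added only when a suggestion is available.
std::string MissingFontMessage(const std::string& requested,
                               const std::string& suggestion) {
  std::string message = "Could not find font named " + requested + ".\n";
  if (!suggestion.empty()) {
    // Assumed wording for the suggestion line.
    message += "Pango suggested font " + suggestion + ".\n";
  }
  message += "Please correct --font arg.\n";
  return message;
}
```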
Pango's opentype feature selection functions are only available
from version 1.38+, which is still quite new, so ensure it's just
ignored if using an older version.
This enables all OpenType ligatures for a specific font, where
available. Specifically, it explicitly enables the OpenType
features liga (standard ligatures), hlig (historical ligatures),
clig (contextual ligatures), and dlig (discretionary ligatures).
This feature requires Pango 1.38 or newer.
The way tesstrain.sh handled font names was really weird, using '+'
signs as a delimiter. However, quoting arguments is a much more
straightforward, standard and sensible way to do things.
So whereas previously one would have used this:
--fontlist Times New Roman + Arial Black
Now they should be specified like this:
--fontlist "Times New Roman" "Arial Black"
Character properties are autogenerated only if wctype is found on the
system. However, it is not possible to know if a version of
unicharset_extractor was compiled with this support (especially if it
was installed as a pre-compiled binary).
This commit adds a line to the usage message stating whether the binary
was compiled with wctype support.
This font list contains a selection of fonts produced by the Greek Font
Society <http://greekfontsociety.gr>, and is the result of testing
with a large corpus of a variety of scanned works.
mktemp is a better idea for security, as well as enabling users to
specify a different directory using the TMPDIR environment variable,
which is useful if /tmp is a small tmpfs.
Also fix a bug where the first few log messages were failing as the
workspace directory wasn't being created early enough.
This flag can be used to specify multiple different exposure levels
for a training run. There was some code already in tesstrain_utils.sh
to deal with multiple exposure levels, so it looks like this
functionality was always intended.
The default usage does not change, with exposure level 0 being the
only one used if --exposures is not used.
The --bin_dir option to tesstrain.sh is not useful, as $PATH does the
same job much better, so switch to relying on that instead.
This also makes the code a bit more readable, as it removes the need
to refer to binaries as COMMAND_NAME_EXE rather than just command_name.
The fontconfig initialisation hardcodes using Arial. However it may
not be available, whereas the fonts being used later will be, so use
one of them for initialisation instead.
Previously the fonts specified in language-specific.sh would override
any specified on the command line.
This changes language-specific.sh from overriding a user request to
just setting the default fonts if none are specified with --fontlist.
The --fontlist argument to tesstrain.sh was always ignored, even if
the language had no specific fonts specified in language-specific.sh.
Change this behaviour so the --fontlist argument is used if no specific
fonts are selected by language-specific.sh.