Commit Graph

20 Commits

Author SHA1 Message Date
Ray Smith
a912967cc3 Rewrote unicharset_extractor to use the new string normalizer and read plain text as well as box files. 2017-09-08 11:49:57 +01:00
Ray Smith
2633fef0b6 Part 2 of separating out the unicharset from the LSTM model, fixing command line for training 2017-08-02 13:29:23 -07:00
Stefan Weil
b75beda7f9 Fix some issues reported by shellcheck (SC2004, SC2006)
Examples:

In training/tesstrain.sh line 64:
if (( ${LINEDATA} )); then
      ^-- SC2004: $/${} is unnecessary on arithmetic variables.

In training/tesstrain.sh line 56:
source `dirname $0`/language-specific.sh
       ^-- SC2006: Use $(..) instead of legacy `..`.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-14 14:11:24 +01:00
Ray Smith
ce76d1c569 Fixes to training process to allow incremental training from a recognition model 2016-11-30 15:51:17 -08:00
Ray Smith
a987e6d87c Major bug fixes to pango renderer and resolved issue of hash_map vs unordered_map 2016-11-07 11:35:45 -08:00
Ray Smith
2c837dffc3 Result of clang tidy on recent merge 2016-11-07 10:46:33 -08:00
zdenop
d025616af5 Merge pull request #148 from nickjwhite/tesstrainbetterargs
Use shell quoting rather than pluses to separate font arguments in tesstrain.sh
2015-11-27 21:56:40 +01:00
Nick White
306610c5ec Use shell quoting rather than pluses to separate font arguments in tesstrain.sh
The way tesstrain.sh handled font names was really weird, using '+'
signs as a delimiter. However quoting arguments is a much more
straightforward, standard and sensible way to do things.

So whereas previously one would have used this:
  --fontlist Times New Roman + Arial Black
Now they should be specified like this:
  --fontlist "Times New Roman" "Arial Black"
2015-10-30 13:26:45 +00:00
Nick White
b389e2782c Set default exposure settings for grc training 2015-09-11 13:30:05 +01:00
Nick White
de789ac8ea Use mktemp to create workspace directory
mktemp is a better idea for security, as well as enabling users to
specify a different directory using the TMPDIR environment variable,
which is useful if /tmp is a small tmpfs.

Also fix a bug where the first few log messages were failing as the
workspace directory wasn't been created early enough.
2015-09-10 15:05:07 +01:00
Nick White
c0133ecfa6 Add --exposures option to tesstrain.sh
This flag can be used to specify multiple different exposure levels
for a training. There was some code already in tesstrain_utils.sh
to deal with multiple exposure levels, so it looks like this
functionality was always intended.

The default usage does not change, with exposure level 0 being the
only one used if --exposures is not used.
2015-09-10 14:57:17 +01:00
Nick White
8e71c79dc2 Remove --bin_dir option from tesstrain.sh (should use $PATH instead)
The --bin_dir option to tesstrain.sh is not useful, as $PATH does the
same job much better, so switch to relying on that instead.

This also makes the code a bit more readable, as it removes the need
to refer to binaries as COMMAND_NAME_EXE rather than just command_name.
2015-08-26 18:49:14 +01:00
Nick White
e110b14465 tesstrain.sh: Initialise fontconfig even if Arial isn't available
The fontconfig initialisation hardcodes using Arial. However it may
not be available, whereas the fonts being used later will be, so use
one of them for initialisation instead.
2015-08-26 18:32:44 +01:00
Nick White
8d0f59d09d tesstrain.sh: Only fall back to default Latin fonts if none were provided
The --fontlist argument to tesstrain.sh was always ignored, even if
the language had no specific fonts specified in language-specific.sh.

Change this behaviour so the --fontlist argument is used if no specifc
fonts are selected by language-specific.sh.
2015-08-26 18:14:30 +01:00
Zdenko Podobný
bf3f125ad5 change links from code.google.com to github.com 2015-07-11 09:43:31 +02:00
Ray Smith
19c26f0646 Removed warning about ligatures and fixed root font dir 2015-07-10 15:56:33 -07:00
Ray Smith
a303ab9d00 Misc fixes, mostly clang formatting, but some bug fixes in matrix, werd, and tesstrain_utils. Also updates unicharset to match traineddata files. 2015-07-09 14:28:20 -07:00
Ray Smith
4c7ab0caea Fixed font lists, improved wordlist management 2015-06-12 10:56:40 -07:00
Jim O'Regan
16ac3b0a20 /usr/share/fonts is the wrong path on Mac 2015-05-18 09:53:14 +01:00
Ray Smith
6be25156f7 Major updates to training system as a result of extensive testing on 100 languages 2015-05-12 18:04:31 -07:00