Commit Graph

282 Commits

Author SHA1 Message Date
Raf Schietekat
7f382df5ec Fewer g++ -Wsign-compare warnings (cont.) 2017-05-11 23:14:52 +02:00
Raf Schietekat
c335508e84 Fewer g++ -Wsign-compare warnings 2017-05-11 23:14:52 +02:00
Stefan Weil
0c88b72909 training: Fix format error and some compiler warnings
The size() method returns a size_type value which is an unsigned type.
As there is no portable format string for that type, a type cast is needed.

Fix also several signed / unsigned mismatches which resulted in compiler
warnings.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 19:32:51 +02:00
Raf Schietekat
3983d2f76a Reviewed uses of reinterpret_cast 2017-05-11 01:58:40 +02:00
Egor Pugin
2ea946d11c Turn on building of text2image. 2017-05-07 20:05:12 +03:00
Ray Smith
8e79297dce Final part of endian improvement. Adds big-endian support to lstm and fixes issue 518 2017-05-03 16:09:44 -07:00
Stefan Weil
1d6dd03bfc training: Replace memfree by free
free also accepts a nullptr argument, so the code can be simplified.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-01 18:14:00 +02:00
Stefan Weil
445befd3cb Remove unused include statements for freelist.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-01 17:12:43 +02:00
Ray Smith
7a116ce8bb More formatting fixes from clang tidy 2017-04-28 13:38:32 -07:00
Ray Smith
500bfaf315 Added std:: to some stl types 2017-04-27 17:15:35 -07:00
Ray Smith
1cc511188d Added extra Init that takes a memory buffer or a filereader function pointer to enable read of traineddata from memory or foreign file systems. Updated existing readers to use TFile API instead of FILE. This does not yet add big-endian capability to LSTM, but it is very easy from here. 2017-04-27 15:48:23 -07:00
Egor Pugin
0dcb6b3547 Rename cppan/cmake projects. 2017-02-23 15:39:58 +03:00
Ray Smith
f566a45b30 clang-tidy changes from sync 2017-01-25 16:20:19 -08:00
Mikhail Solomennik
e2974cf953 err -> err_exit 2017-01-20 18:50:47 +03:00
amitdo
5d627aacae Remove code that is no longer needed
The code in ccutil/hashfn.h was needed for some old compilers. Now that we support MSVC >= 2010 and compilers that has good support for C++11, we can drop this code.

As a result of this file removal, we now use:
  std::unordered_map
  std::unordered_set
  std::unique_ptr
directly in the codebase with '#include' for the needed headers.
2017-01-16 01:49:17 +02:00
Egor Pugin
442b5b731a Fix building of training tools in shared configuration. 2016-12-17 16:19:35 +03:00
Zdenko Podobný
f8dffecf41 fix training build addition to 7c684be724 (Add missing linker flags for Leptonica) 2016-12-15 22:20:35 +01:00
Stefan Weil
7c684be724 Add missing linker flags for Leptonica
They were removed in commit d70f3c3663.
The old code implicitly added `-llept` by using the `AC_CHECK_LIB` macro.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-15 17:28:01 +01:00
zdenop
831e161066 Merge pull request #569 from stweil/nullptr
training: Replace NULL by nullptr
2016-12-15 09:05:20 +01:00
zdenop
a0201831c3 Merge pull request #576 from stweil/shellcheck
Fix some issues reported by shellcheck (SC2004, SC2006)
2016-12-15 08:30:30 +01:00
zdenop
da4c064c2e Merge pull request #531 from stweil/guards
Fix header file guards and replace reserved identifiers
2016-12-15 08:29:32 +01:00
Stefan Weil
cb6e9e0071 training: Replace NULL by nullptr
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-14 21:08:36 +01:00
Ray Smith
81ebba0394 More makefile changes to remove cube 2016-12-14 11:17:06 -08:00
Ray Smith
9f5ba9105f Removed dependency on cube from the code 2016-12-14 10:55:15 -08:00
Stefan Weil
b75beda7f9 Fix some issues reported by shellcheck (SC2004, SC2006)
Examples:

In training/tesstrain.sh line 64:
if (( ${LINEDATA} )); then
      ^-- SC2004: $/${} is unnecessary on arithmetic variables.

In training/tesstrain.sh line 56:
source `dirname $0`/language-specific.sh
       ^-- SC2006: Use $(..) instead of legacy `..`.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-14 14:11:24 +01:00
Stefan Weil
a9b300dc1d Use pkg-config for icu compiler and linker flags
The old settings are used as fallback if there is no configuration.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-13 13:29:34 +01:00
Stefan Weil
7755e05e50 training: Update Makefile for current Mingw-w64
Mingw-w64 no longer needs special linker options,
builds with those options fail.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-06 23:02:47 +01:00
Stefan Weil
70c6f1624c Fix #define guards in header files
Some guards were missing, others were not the first statement.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 15:43:03 +01:00
Stefan Weil
4897796d57 Replace reserved identifiers used in #define guards header files
Use macro names as suggested by the Google C++ Style Guide
(https://google.github.io/styleguide/cppguide.html#The__define_Guard).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 15:43:03 +01:00
Egor Pugin
afd069c219 Fix build. 2016-12-01 12:51:03 +03:00
Egor Pugin
68aa285dcc Update CMakeLists.txt 2016-12-01 12:38:45 +03:00
Ray Smith
ce76d1c569 Fixes to training process to allow incremental training from a recognition model 2016-11-30 15:51:17 -08:00
Ray Smith
9d9056716f Added std:: to vector 2016-11-30 15:45:36 -08:00
Ray Smith
53003f9074 Formatting changes from clang_tidy on latest pull 2016-11-30 15:44:25 -08:00
Stefan Weil
6158f7eae2 Simplify calls of free
It is not necessary to check for null pointers.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-24 17:59:13 +01:00
Egor Pugin
67deea5703 Fix unix build. 2016-11-24 17:39:16 +03:00
Egor Pugin
644469595c Fix windows build. 2016-11-24 17:32:23 +03:00
zdenop
ac3b40de2f Merge pull request #478 from stweil/w
Fix some compiler warnings
2016-11-22 08:30:57 +01:00
Ray Smith
5913d7344f Added missing license headers 2016-11-18 15:53:11 -08:00
Stefan Weil
4f45940050 training: Fix compiler warnings (deprecated register keyword)
training/commontraining.cpp:824:3: warning:
 'register' storage class specifier is deprecated and incompatible with C++1z [-Wdeprecated-register]
...

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-14 22:34:15 +01:00
Ray Smith
f24ef67df4 Limited max height to 48 even in variable height input, enabled neural nets via ocr engine mode 2016-11-08 14:01:04 -08:00
Ray Smith
c1c1e426b3 Added new LSTM-based neural network line recognizer 2016-11-07 15:38:07 -08:00
Ray Smith
5d21ecfad3 Rendering/hash map changes part 2 2016-11-07 11:56:07 -08:00
Ray Smith
a987e6d87c Major bug fixes to pango renderer and resolved issue of hash_map vs unordered_map 2016-11-07 11:35:45 -08:00
Ray Smith
2c837dffc3 Result of clang tidy on recent merge 2016-11-07 10:46:33 -08:00
Stefan Weil
34af6155eb training: Remove unnecessary const qualifiers
This fixes several gcc warnings:

warning:
 type qualifiers ignored on function return type [-Wignored-qualifiers]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-10-08 11:28:22 +02:00
Zdenko Podobný
61032d9b14 set fonts_dir to system default font location. Fixes #409 2016-09-01 18:27:00 +02:00
Zdenko Podobný
916897da1b print text2image info to stdout instead of strerr 2016-09-01 13:38:06 +02:00
Stefan Weil
6ec1a0a09b fileio: Replace assert with tprintf() and exit(1)
Assertions are good for programming errors, but not for wrong user input.

The new code no longer needs File::ReadFileToStringOrDie, so remove that
method.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-08-30 07:13:56 +02:00
Stefan Weil
1950fec7a2 tlog: Remove unused macro TLOG_FATAL
The implementation was also wrong because it did not use __VA_ARGS__.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-08-29 19:11:01 +02:00
Stefan Weil
3420acabe5 text2image: Add linefeed to error message
This changes the error message for a missing font from

  Could not find font named Times New Roman.Please correct --font arg.

(missing space after first sentence) to

  Could not find font named Times New Roman.
  Please correct --font arg.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-08-28 21:33:52 +02:00
Stefan Weil
34ed8ddf62 stringrenderer: Fix compiler warning (-Wwrite-strings)
gcc reported this warning:

../training/stringrenderer.cpp:
 In member function ‘void tesseract::StringRenderer::SetLayoutProperties()’:
../training/stringrenderer.cpp:211:42: warning:
 ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
     set_features("liga, clig, dlig, hlig");
                                          ^
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-08-28 14:07:03 +02:00
zdenop
939023ffb9 Merge pull request #391 from vidiecan/issue_390
fixed #390 by introducing new rotate_image flag
2016-08-15 20:04:30 +02:00
jm
b69561c802 fixed #390 by introducing new rotate_image flag 2016-08-15 18:16:35 +02:00
jm
941e1c4c84 fixes #388 by using raw bytes utf8 encoding 2016-08-15 18:11:01 +02:00
jm
8d2d94e4ed fixes some of the windows issue with text2image, see #380 2016-08-05 20:11:01 +02:00
zdenop
5ca73cca26 Merge pull request #355 from amitdo/pango-name-is-empty
Check that pango's suggested font name is not an empty string
2016-06-20 10:26:11 +02:00
Stefan Weil
ed053aab94 Fix Cygwin compatibility – part III
Commit 65504c8cd2 misplaced the #endif.
The definition of _GNU_SOURCE is only needed for Cygwin.

Defining _GNU_SOURCE on Linux results in compiler warnings because this
macro is already defined by the compiler.

Fix this by moving the #endif to the right place. In addition the code
for Cygwin is made more robust: If a future Cygwin compiler defines
_GNU_SOURCE, too, the code will still work.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-06-19 22:38:03 +02:00
amitdo
724fb894ac Check that pango's suggested font name is not an empty string
On msys2 pango seems to always returns empty string for the suggested
font. It's a good idea to check that the string is not empty before
printing it - on all platforms.
2016-06-19 13:40:17 +03:00
Amit
96720c785d Merge pull request #351 from amitdo/cygwin-compat
Fix Cygwin compatibility
2016-06-19 12:43:35 +03:00
Stefan Weil
65504c8cd2 Fix Cygwin compatibility - Part II 2016-06-19 11:59:58 +03:00
Amit Dovev
13d789d4df Merge pull request #288 from nickjwhite/opentypeligatures
Enable all ligatures available in a font for text2image rendering
2016-06-19 03:33:32 +03:00
Amit Dovev
034d666e7a Replace use of TLOG_FATAL() with tprintf() and exit(1) (#349)
Asserts should not be used for missing or invalid input in the command
line! This leads to a bad UX.
2016-06-16 12:10:53 +03:00
Shreeshrii
c3a7fab349 Replace asserts with tprintf() and exit(1)
Asserts should not be used for missing or invalid input in the command
line! This leads to a bad UX.
2016-06-14 14:35:05 +03:00
amitdo
cd1a14450c Training tools: Print help message when (argv == 1) 2016-05-22 11:16:42 +03:00
Zdenko Podobný
cab6de1740 remove unused GlyphLessFont files 2016-05-20 21:19:00 +02:00
Nick White
76ed9decb3 Only enable extra ligatures with recent Pango versions
Pango's opentype feature selection functions are only available
from version 1.38+, which is still quite new, so ensure it's just
ignored if using an older version.
2016-03-21 13:03:03 +00:00
Nick White
9100adcbde Enable all ligatures available in a font for text2image rendering
This enables all OpenType ligatures for a specific font, where
available. Specifically, it explicitly enables the OpenType
features liga (standard ligatures), hlig (historical ligatures),
clig (contextual ligatures), and dlig (discretionary ligatures).

This feature requires Pango 1.38 or newer.
2016-03-21 11:41:36 +00:00
Amit Dovev
96c2f637fd Add missing % char from format specifier in tlog()
- In training/ango_font_info.cpp
2016-03-17 01:09:46 +02:00
Egor Pugin
4d4bfb552c Add inactivity timeout for icu download on windows 2016-03-04 12:34:01 +03:00
Ryan Baumann
bd5452d40c Add Junicode to neo-Latin fonts 2016-01-13 10:15:57 -05:00
Ryan Baumann
5b40277d08 Use different font list and exposures for "lat" language training 2016-01-04 11:48:02 -05:00
Hamid Safdari
0cd6e17419 correct minor syntax errors language-specific.sh 2015-12-25 09:50:15 +04:30
Egor Pugin
c16c7831a2 Merge branch 'master' of github.com:tesseract-ocr/tesseract 2015-11-30 11:43:18 +03:00
Egor Pugin
f15cd961c6 Download icu on windows to build set_unicharset_properties target. 2015-11-30 11:43:01 +03:00
zdenop
d025616af5 Merge pull request #148 from nickjwhite/tesstrainbetterargs
Use shell quoting rather than pluses to separate font arguments in tesstrain.sh
2015-11-27 21:56:40 +01:00
zdenop
359593217b Merge pull request #149 from nickjwhite/updategrc
Add defaults for grc training to language-specific.sh
2015-11-27 21:55:46 +01:00
Stefan Weil
29f36d9264 training: Fix typos in comments and strings
All of them were found by codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-04 21:58:42 +01:00
Stefan Weil
38f3db8ca5 Fix more typos in comments (found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-04 21:58:42 +01:00
Nick White
306610c5ec Use shell quoting rather than pluses to separate font arguments in tesstrain.sh
The way tesstrain.sh handled font names was really weird, using '+'
signs as a delimiter. However quoting arguments is a much more
straightforward, standard and sensible way to do things.

So whereas previously one would have used this:
  --fontlist Times New Roman + Arial Black
Now they should be specified like this:
  --fontlist "Times New Roman" "Arial Black"
2015-10-30 13:26:45 +00:00
zdenop
896db80f26 Merge pull request #108 from johnteslade/unicharset_extractor
Unicharset extractor problems with wchar
2015-10-05 21:49:03 +02:00
John Slade
379da1f2e0 training/unicharset_extractor.cpp: Print whether WCTYPE is included
Character properties are autogenerated only if wctype is found on the
system.  However, it is not possible to know if a version of
unicharset_extractor was compiled with this support (especially if it
was installed as a pre-compiled binary).

This commit adds a print to the usage details to output if the binary
was compiled with wctype support.
2015-10-05 11:54:24 +01:00
Egor Pugin
f369585f56 Merge branch 'master' of github.com:tesseract-ocr/tesseract 2015-10-02 12:02:04 +03:00
Nick White
b389e2782c Set default exposure settings for grc training 2015-09-11 13:30:05 +01:00
Nick White
ac6050def0 Remove NUMBER_DAWG_FACTOR and WORD_DAWG_FACTOR from grc rules
These aren't used anywhere, and are difficult to calculate for grc,
so leave them as the default.
2015-09-11 09:29:55 +01:00
zdenop
b216f6f66b Merge pull request #92 from nickjwhite/bettertesstrain
Improve tesstrain.sh script
2015-09-10 19:00:26 +02:00
Nick White
714d2cc4ae Use different font list for grc training
This font list contains a selection fonts produced by the Greek Font
Society <http://greekfontsociety.gr>, and is the result of testing
with a large corpus of a variety of scanned works.
2015-09-10 15:34:44 +01:00
Nick White
de789ac8ea Use mktemp to create workspace directory
mktemp is a better idea for security, as well as enabling users to
specify a different directory using the TMPDIR environment variable,
which is useful if /tmp is a small tmpfs.

Also fix a bug where the first few log messages were failing as the
workspace directory wasn't been created early enough.
2015-09-10 15:05:07 +01:00
Nick White
c0133ecfa6 Add --exposures option to tesstrain.sh
This flag can be used to specify multiple different exposure levels
for a training. There was some code already in tesstrain_utils.sh
to deal with multiple exposure levels, so it looks like this
functionality was always intended.

The default usage does not change, with exposure level 0 being the
only one used if --exposures is not used.
2015-09-10 14:57:17 +01:00
Egor Pugin
25136e40ea Restore ICU_INCLUDE_DIRS for OS X. 2015-09-07 13:02:22 +03:00
Egor Pugin
670e0fafb3 Hide pango and cairo includes from targets that do not use it. 2015-09-07 12:49:08 +03:00
Egor Pugin
da3852dc77 Fix cygwin build. 2015-09-07 02:49:18 +03:00
Egor Pugin
c0d8a07b9d Fix a lot of training tool to work on windows with static build. 2015-09-07 01:46:33 +03:00
Egor Pugin
03531ba8a5 Fix linux build. 2015-09-06 20:43:28 +03:00
Egor Pugin
5e3b8d33e3 Add source groups. 2015-09-06 13:47:30 +03:00
Nick White
8e71c79dc2 Remove --bin_dir option from tesstrain.sh (should use $PATH instead)
The --bin_dir option to tesstrain.sh is not useful, as $PATH does the
same job much better, so switch to relying on that instead.

This also makes the code a bit more readable, as it removes the need
to refer to binaries as COMMAND_NAME_EXE rather than just command_name.
2015-08-26 18:49:14 +01:00
Nick White
e110b14465 tesstrain.sh: Initialise fontconfig even if Arial isn't available
The fontconfig initialisation hardcodes using Arial. However it may
not be available, whereas the fonts being used later will be, so use
one of them for initialisation instead.
2015-08-26 18:32:44 +01:00
Nick White
422c424995 tesstrain.sh: Only set FONTS if they weren't set on the command line
Previously the fonts specified in language-selection.sh would override
any specified on the command line.

This changes language-specific.sh from overriding a user request to
just setting the default fonts if none are specified with --fontlist.
2015-08-26 18:24:14 +01:00
Nick White
8d0f59d09d tesstrain.sh: Only fall back to default Latin fonts if none were provided
The --fontlist argument to tesstrain.sh was always ignored, even if
the language had no specific fonts specified in language-specific.sh.

Change this behaviour so the --fontlist argument is used if no specifc
fonts are selected by language-specific.sh.
2015-08-26 18:14:30 +01:00
James R. Barlow
18ac7ae7ef Get OpenCL to compile on OS X
However, the output of the OpenCL build is garbage....
2015-08-26 02:03:07 -07:00