315 Commits

Author SHA1 Message Date
Stefan Weil
61f96981e5 training: Fix typos in comments (found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-09-03 09:16:02 +02:00
Arkady Shapkin
d171488e21 Added CMake option to use system ICU library 2017-08-17 02:50:54 +03:00
Ray Smith
5f5e85e4a0 Fixed lack of error on non-existent traineddata 2017-08-07 09:58:43 -07:00
Ray Smith
0a91498195 Improved error message on missing optional config 2017-08-07 09:50:49 -07:00
Ray Smith
4b3c5f6c35 Added check for non-empty traineddata flag 2017-08-07 09:43:30 -07:00
Egor Pugin
c67c2e9f41 Add combine_lang_model to cmake and cppan builds. 2017-08-06 14:46:32 +03:00
Stefan Weil
cdec915e17 Fix broken build for Windows
Windows does not provide a mkdir function with two parameters.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-08-04 10:18:35 +02:00
Ray Smith
77c44cdecd Added convert to int and directory listing to combine_tessdata 2017-08-02 14:53:07 -07:00
Ray Smith
39b168a0b6 Removed errors introduced by git merge 2017-08-02 14:12:45 -07:00
Ray Smith
4e9665debf Added ADAM optimizer, unless git screwed it up, cos there is no diff 2017-08-02 14:03:50 -07:00
Ray Smith
2633fef0b6 Part 2 of separating out the unicharset from the LSTM model, fixing command line for training 2017-08-02 13:29:23 -07:00
Ray Smith
b0ead95d64 Changed the way unicharsets are handled to allow support for the ™ character. Can find the issue where it was requested. 2017-07-24 11:45:57 -07:00
Ray Smith
3f7735492f Removed unnecessary using statements and cleaned up google/non-google distinction 2017-07-19 16:42:48 -07:00
Stefan Weil
5a7b7ed7e1 PangoFontInfo: Remove unused method is_italic
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:22:05 +02:00
Stefan Weil
0cd71c67c9 PangoFontInfo: Remove unused method is_bold
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:21:59 +02:00
Stefan Weil
fbfbf67cf9 PangoFontInfo: Remove unused method is_smallcaps
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:21:49 +02:00
Stefan Weil
500f913b51 PangoFontInfo: Remove unused method is_monospace
Remove also some macros which are no longer needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:21:35 +02:00
Stefan Weil
059e30d4cb PangoFontInfo: Remove unused method is_fraktur
That restores commit 25e0c1accb and
partially reverts commit 4907a23fea
which added the now unused Shlwapi library.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:16:38 +02:00
Egor Pugin
4907a23fea Fix windows build. 2017-07-15 15:09:00 +03:00
Ray Smith
dc8745e6fd Move LSTM unicharset and recoder to traineddata with version string part1. Backwards compatible - maybe. 2017-07-14 11:14:23 -07:00
Ray Smith
df41eab6aa Added script-specific validation and normalization for virama-using scripts and updated normalization for others 2017-07-14 10:05:05 -07:00
Ray Smith
da03e4e910 Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion 2017-07-14 09:30:14 -07:00
Justin Hotchkiss Palermo
f057938069 fix filenames in comments 2017-07-02 17:35:47 -04:00
zdenop
59de660386 Merge pull request #969 from stweil/clean
PangoFontInfo: Remove some unused methods
2017-06-03 15:30:46 +02:00
Stefan Weil
2843739843 PangoFontInfo: Remove unused method is_italic
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-03 11:42:44 +02:00
Stefan Weil
e420417c85 PangoFontInfo: Remove unused method is_bold
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-03 11:42:44 +02:00
Stefan Weil
0d411cb5c5 PangoFontInfo: Remove unused method is_smallcaps
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-03 11:42:44 +02:00
Stefan Weil
8786e56084 PangoFontInfo: Remove unused method is_monospace
Remove also some macros which are no longer needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-03 11:42:19 +02:00
Egor Pugin
4ed4864dd6 Merge pull request #966 from rfschtkt/pen_color_
StringRenderer::pen_color_: int[3]->double[3]
2017-06-03 12:32:26 +03:00
Stefan Weil
8ec67a940d Remove strcasestr which is no longer needed
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-03 10:11:44 +02:00
Stefan Weil
25e0c1accb PangoFontInfo: Remove unused method is_fraktur
That allows removing a dirty hack which used the
non-portable function strcasestr.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-03 10:08:21 +02:00
Raf Schietekat
2981c6c585 StringRenderer::pen_color_: int[3]->double[3] 2017-06-02 09:58:26 +02:00
Raf Schietekat
8dad542f77 Fewer g++ -Wunused-variable warnings 2017-05-11 23:36:05 +02:00
Raf Schietekat
7f382df5ec Fewer g++ -Wsign-compare warnings (cont.) 2017-05-11 23:14:52 +02:00
Raf Schietekat
c335508e84 Fewer g++ -Wsign-compare warnings 2017-05-11 23:14:52 +02:00
Stefan Weil
0c88b72909 training: Fix format error and some compiler warnings
The size() method returns a size_type value which is an unsigned type.
As there is no portable format string for that type, a type cast is needed.

Fix also several signed / unsigned mismatches which resulted in compiler
warnings.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 19:32:51 +02:00
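The cast described in commit 0c88b72909 can be sketched as follows. This is a minimal illustration, not code from the repository: the helper name `FormatCount` is hypothetical, and `unsigned long` is just one possible target type for the cast.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not from the Tesseract source): formats the number of
// elements in a vector with a printf-style format string.
std::string FormatCount(const std::vector<int>& items) {
  char buffer[64];
  // size() returns an unsigned size_type. Casting it to unsigned long lets
  // the format string use %lu, which matches the argument on every platform.
  std::snprintf(buffer, sizeof(buffer), "%lu items",
                static_cast<unsigned long>(items.size()));
  return std::string(buffer);
}
```

Calling `FormatCount` on a three-element vector yields the string `3 items`. The explicit cast keeps the format specifier and the argument type in agreement regardless of how the platform defines `size_type`.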
Raf Schietekat
3983d2f76a Reviewed uses of reinterpret_cast 2017-05-11 01:58:40 +02:00
Egor Pugin
2ea946d11c Turn on building of text2image. 2017-05-07 20:05:12 +03:00
Ray Smith
8e79297dce Final part of endian improvement. Adds big-endian support to lstm and fixes issue 518 2017-05-03 16:09:44 -07:00
Stefan Weil
1d6dd03bfc training: Replace memfree by free
free also accepts a nullptr argument, so the code can be simplified.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-01 18:14:00 +02:00
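The simplification mentioned in commit 1d6dd03bfc relies on `free` accepting a null pointer. A minimal sketch, assuming a hypothetical `Buffer` struct and `ReleaseBuffer` function that are not part of the repository:

```cpp
#include <cstdlib>

// Hypothetical buffer holder used only for this illustration.
struct Buffer {
  char* data = nullptr;
};

// Releases the buffer. No null check is needed before the call because
// free() with a null pointer argument is defined to do nothing.
void ReleaseBuffer(Buffer* buffer) {
  std::free(buffer->data);
  buffer->data = nullptr;  // Avoid leaving a dangling pointer behind.
}
```

`ReleaseBuffer` behaves the same whether `data` was allocated with `malloc` or was never set, which is what makes a separate wrapper or an `if (ptr != nullptr)` guard redundant.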
Stefan Weil
445befd3cb Remove unused include statements for freelist.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-01 17:12:43 +02:00
Ray Smith
7a116ce8bb More formatting fixes from clang tidy 2017-04-28 13:38:32 -07:00
Ray Smith
500bfaf315 Added std:: to some stl types 2017-04-27 17:15:35 -07:00
Ray Smith
1cc511188d Added extra Init that takes a memory buffer or a filereader function pointer to enable read of traineddata from memory or foreign file systems. Updated existing readers to use TFile API instead of FILE. This does not yet add big-endian capability to LSTM, but it is very easy from here. 2017-04-27 15:48:23 -07:00
Egor Pugin
0dcb6b3547 Rename cppan/cmake projects. 2017-02-23 15:39:58 +03:00
Ray Smith
f566a45b30 clang-tidy changes from sync 2017-01-25 16:20:19 -08:00
Mikhail Solomennik
e2974cf953 err -> err_exit 2017-01-20 18:50:47 +03:00
amitdo
5d627aacae Remove code that is no longer needed
The code in ccutil/hashfn.h was needed for some old compilers. Now that we support MSVC >= 2010 and compilers that have good support for C++11, we can drop this code.

As a result of this file removal, we now use:
  std::unordered_map
  std::unordered_set
  std::unique_ptr
directly in the codebase with '#include' for the needed headers.
2017-01-16 01:49:17 +02:00
Egor Pugin
442b5b731a Fix building of training tools in shared configuration. 2016-12-17 16:19:35 +03:00
Zdenko Podobný
f8dffecf41 fix training build addition to 7c684be724 (Add missing linker flags for Leptonica) 2016-12-15 22:20:35 +01:00
Stefan Weil
7c684be724 Add missing linker flags for Leptonica
They were removed in commit d70f3c3663.
The old code implicitly added `-llept` by using the `AC_CHECK_LIB` macro.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-15 17:28:01 +01:00
zdenop
831e161066 Merge pull request #569 from stweil/nullptr
training: Replace NULL by nullptr
2016-12-15 09:05:20 +01:00
zdenop
a0201831c3 Merge pull request #576 from stweil/shellcheck
Fix some issues reported by shellcheck (SC2004, SC2006)
2016-12-15 08:30:30 +01:00
zdenop
da4c064c2e Merge pull request #531 from stweil/guards
Fix header file guards and replace reserved identifiers
2016-12-15 08:29:32 +01:00
Stefan Weil
cb6e9e0071 training: Replace NULL by nullptr
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-14 21:08:36 +01:00
Ray Smith
81ebba0394 More makefile changes to remove cube 2016-12-14 11:17:06 -08:00
Ray Smith
9f5ba9105f Removed dependency on cube from the code 2016-12-14 10:55:15 -08:00
Stefan Weil
b75beda7f9 Fix some issues reported by shellcheck (SC2004, SC2006)
Examples:

In training/tesstrain.sh line 64:
if (( ${LINEDATA} )); then
      ^-- SC2004: $/${} is unnecessary on arithmetic variables.

In training/tesstrain.sh line 56:
source `dirname $0`/language-specific.sh
       ^-- SC2006: Use $(..) instead of legacy `..`.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-14 14:11:24 +01:00
Stefan Weil
a9b300dc1d Use pkg-config for icu compiler and linker flags
The old settings are used as fallback if there is no configuration.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-13 13:29:34 +01:00
Stefan Weil
7755e05e50 training: Update Makefile for current Mingw-w64
Mingw-w64 no longer needs special linker options,
builds with those options fail.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-06 23:02:47 +01:00
Stefan Weil
70c6f1624c Fix #define guards in header files
Some guards were missing, others were not the first statement.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 15:43:03 +01:00
Stefan Weil
4897796d57 Replace reserved identifiers used in #define guards header files
Use macro names as suggested by the Google C++ Style Guide
(https://google.github.io/styleguide/cppguide.html#The__define_Guard).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 15:43:03 +01:00
Egor Pugin
afd069c219 Fix build. 2016-12-01 12:51:03 +03:00
Egor Pugin
68aa285dcc Update CMakeLists.txt 2016-12-01 12:38:45 +03:00
Ray Smith
ce76d1c569 Fixes to training process to allow incremental training from a recognition model 2016-11-30 15:51:17 -08:00
Ray Smith
9d9056716f Added std:: to vector 2016-11-30 15:45:36 -08:00
Ray Smith
53003f9074 Formatting changes from clang_tidy on latest pull 2016-11-30 15:44:25 -08:00
Stefan Weil
6158f7eae2 Simplify calls of free
It is not necessary to check for null pointers.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-24 17:59:13 +01:00
Egor Pugin
67deea5703 Fix unix build. 2016-11-24 17:39:16 +03:00
Egor Pugin
644469595c Fix windows build. 2016-11-24 17:32:23 +03:00
zdenop
ac3b40de2f Merge pull request #478 from stweil/w
Fix some compiler warnings
2016-11-22 08:30:57 +01:00
Ray Smith
5913d7344f Added missing license headers 2016-11-18 15:53:11 -08:00
Stefan Weil
4f45940050 training: Fix compiler warnings (deprecated register keyword)
training/commontraining.cpp:824:3: warning:
 'register' storage class specifier is deprecated and incompatible with C++1z [-Wdeprecated-register]
...

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-14 22:34:15 +01:00
Ray Smith
f24ef67df4 Limited max height to 48 even in variable height input, enabled neural nets via ocr engine mode 2016-11-08 14:01:04 -08:00
Ray Smith
c1c1e426b3 Added new LSTM-based neural network line recognizer 2016-11-07 15:38:07 -08:00
Ray Smith
5d21ecfad3 Rendering/hash map changes part 2 2016-11-07 11:56:07 -08:00
Ray Smith
a987e6d87c Major bug fixes to pango renderer and resolved issue of hash_map vs unordered_map 2016-11-07 11:35:45 -08:00
Ray Smith
2c837dffc3 Result of clang tidy on recent merge 2016-11-07 10:46:33 -08:00
Stefan Weil
34af6155eb training: Remove unnecessary const qualifiers
This fixes several gcc warnings:

warning:
 type qualifiers ignored on function return type [-Wignored-qualifiers]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-10-08 11:28:22 +02:00
Zdenko Podobný
61032d9b14 set fonts_dir to system default font location. Fixes #409 2016-09-01 18:27:00 +02:00
Zdenko Podobný
916897da1b print text2image info to stdout instead of stderr 2016-09-01 13:38:06 +02:00
Stefan Weil
6ec1a0a09b fileio: Replace assert with tprintf() and exit(1)
Assertions are good for programming errors, but not for wrong user input.

The new code no longer needs File::ReadFileToStringOrDie, so remove that
method.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-08-30 07:13:56 +02:00
Stefan Weil
1950fec7a2 tlog: Remove unused macro TLOG_FATAL
The implementation was also wrong because it did not use __VA_ARGS__.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-08-29 19:11:01 +02:00
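Commit 1950fec7a2 notes that the removed macro did not use `__VA_ARGS__`. The sketch below shows how a variadic macro forwards its arguments. The macro `FORMAT_MSG` and the function `DescribeFont` are hypothetical names chosen for this illustration and are not the removed `TLOG_FATAL`.

```cpp
#include <cstdio>
#include <string>

// Hypothetical macro: formats a message into the array `buf`. The variadic
// arguments reach snprintf only because the body expands __VA_ARGS__; a body
// that leaves it out would drop the format string and its arguments.
// Note: sizeof(buf) is only correct when `buf` is a real array, not a pointer.
#define FORMAT_MSG(buf, ...) std::snprintf((buf), sizeof(buf), __VA_ARGS__)

// Hypothetical caller that builds a short description string.
std::string DescribeFont(const char* name, int size) {
  char buf[128];
  FORMAT_MSG(buf, "font=%s size=%d", name, size);
  return std::string(buf);
}
```

With this definition, `DescribeFont("Arial", 12)` produces `font=Arial size=12`.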
Stefan Weil
3420acabe5 text2image: Add linefeed to error message
This changes the error message for a missing font from

  Could not find font named Times New Roman.Please correct --font arg.

(missing space after first sentence) to

  Could not find font named Times New Roman.
  Please correct --font arg.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-08-28 21:33:52 +02:00
Stefan Weil
34ed8ddf62 stringrenderer: Fix compiler warning (-Wwrite-strings)
gcc reported this warning:

../training/stringrenderer.cpp:
 In member function ‘void tesseract::StringRenderer::SetLayoutProperties()’:
../training/stringrenderer.cpp:211:42: warning:
 ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
     set_features("liga, clig, dlig, hlig");
                                          ^
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-08-28 14:07:03 +02:00
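The `-Wwrite-strings` warning quoted in commit 34ed8ddf62 appears when a string literal is passed to a parameter of type `char*`. One common way to avoid it is to declare the parameter as `const char*`. The sketch below assumes a hypothetical `FeatureHolder` class; it is not the actual `StringRenderer` code, and the commit may have resolved the warning differently.

```cpp
#include <string>

// Hypothetical class used only to illustrate the warning and one way to
// avoid it.
class FeatureHolder {
 public:
  // Taking const char* allows string literals to be passed without the
  // literal-to-char* conversion that -Wwrite-strings reports.
  void set_features(const char* features) { features_ = features; }
  const std::string& features() const { return features_; }

 private:
  std::string features_;
};
```

A call such as `holder.set_features("liga, clig, dlig, hlig")` then compiles without the warning, because no conversion from a string constant to `char*` takes place.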
zdenop
939023ffb9 Merge pull request #391 from vidiecan/issue_390
fixed #390 by introducing new rotate_image flag
2016-08-15 20:04:30 +02:00
jm
b69561c802 fixed #390 by introducing new rotate_image flag 2016-08-15 18:16:35 +02:00
jm
941e1c4c84 fixes #388 by using raw bytes utf8 encoding 2016-08-15 18:11:01 +02:00
jm
8d2d94e4ed fixes some of the windows issues with text2image, see #380 2016-08-05 20:11:01 +02:00
zdenop
5ca73cca26 Merge pull request #355 from amitdo/pango-name-is-empty
Check that pango's suggested font name is not an empty string
2016-06-20 10:26:11 +02:00
Stefan Weil
ed053aab94 Fix Cygwin compatibility – part III
Commit 65504c8cd2 misplaced the #endif.
The definition of _GNU_SOURCE is only needed for Cygwin.

Defining _GNU_SOURCE on Linux results in compiler warnings because this
macro is already defined by the compiler.

Fix this by moving the #endif to the right place. In addition the code
for Cygwin is made more robust: If a future Cygwin compiler defines
_GNU_SOURCE, too, the code will still work.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-06-19 22:38:03 +02:00
amitdo
724fb894ac Check that pango's suggested font name is not an empty string
On msys2 pango seems to always return an empty string for the suggested
font. It's a good idea to check that the string is not empty before
printing it - on all platforms.
2016-06-19 13:40:17 +03:00
Amit
96720c785d Merge pull request #351 from amitdo/cygwin-compat
Fix Cygwin compatibility
2016-06-19 12:43:35 +03:00
Stefan Weil
65504c8cd2 Fix Cygwin compatibility - Part II 2016-06-19 11:59:58 +03:00
Amit Dovev
13d789d4df Merge pull request #288 from nickjwhite/opentypeligatures
Enable all ligatures available in a font for text2image rendering
2016-06-19 03:33:32 +03:00
Amit Dovev
034d666e7a Replace use of TLOG_FATAL() with tprintf() and exit(1) (#349)
Asserts should not be used for missing or invalid input in the command
line! This leads to a bad UX.
2016-06-16 12:10:53 +03:00
Shreeshrii
c3a7fab349 Replace asserts with tprintf() and exit(1)
Asserts should not be used for missing or invalid input in the command
line! This leads to a bad UX.
2016-06-14 14:35:05 +03:00
amitdo
cd1a14450c Training tools: Print help message when (argc == 1) 2016-05-22 11:16:42 +03:00
Zdenko Podobný
cab6de1740 remove unused GlyphLessFont files 2016-05-20 21:19:00 +02:00
Nick White
76ed9decb3 Only enable extra ligatures with recent Pango versions
Pango's opentype feature selection functions are only available
from version 1.38+, which is still quite new, so ensure it's just
ignored if using an older version.
2016-03-21 13:03:03 +00:00