282 Commits

Author SHA1 Message Date
Ria
d751305804
fixed missing include for std::back_inserter.
with Visual Studio 2015 RTM:

Error C2039: 'back_inserter': is not a member of 'std'
Error C3861: 'back_inserter': identifier not found

Visual Studio 2015 (vc14) needs the "iterator" header.

#include <iterator>
2017-11-23 11:37:35 +03:30
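
The class of error fixed by this commit can be illustrated with a minimal sketch. `KeepPositive` is a hypothetical helper invented for illustration and is not taken from the Tesseract sources. The point is only that `std::back_inserter` is declared in `<iterator>`, so a file that uses it should include that header directly rather than rely on another standard header pulling it in.

```cpp
#include <algorithm>  // std::copy_if
#include <iterator>   // std::back_inserter (the header this commit adds)
#include <vector>

// Hypothetical helper, not part of Tesseract: returns the positive
// values of `input` in their original order.
std::vector<int> KeepPositive(const std::vector<int>& input) {
  std::vector<int> output;
  // std::back_inserter is declared in <iterator>. Some standard library
  // implementations expose it through other headers, but that is not
  // guaranteed, which is why the build failed with Visual Studio 2015.
  std::copy_if(input.begin(), input.end(), std::back_inserter(output),
               [](int value) { return value > 0; });
  return output;
}
```

Calling `KeepPositive({3, -1, 0, 7})` yields a vector holding 3 and 7.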
Stefan Weil
f3c4b894dc Fix help message for unicharset_extractor (#1206)
If unicharset_extractor was called without any argument,
a help message was printed by tesseract::ParseCommandLineFlags.

Replace that by the local help message which is better.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-11-10 15:45:35 +01:00
ivanzz1001
fb359fc981 Update unicharset_extractor.cpp (#1153)
* change IsWhitespace to IsUTF8Whitespace

To solve "Phase UP: Generating unicharset and unichar properties files" ERROR #1147

please reference: [#1147](https://github.com/tesseract-ocr/tesseract/issues/1147)

* Update unicharset_extractor.cpp

fix the "Phase UP: Generating unicharset and unichar properties files" ERROR

* Update unicharset_extractor.cpp

fix "Phase UP: Generating unicharset and unichar properties files" ERROR #1147

* Update unicharset_extractor.cpp

fix the encoding invalid problem and fix the comment
2017-10-13 11:46:42 +02:00
Stefan Weil
07f1400e6f Revert "change type to UChar32 to fix IsValidCodepoint"
This reverts commit a404c9cdb3.
That code no longer matched the specification (see code comment).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-09-18 07:42:00 +02:00
Shree Devi Kumar
a404c9cdb3 change type to UChar32 to fix IsValidCodepoint 2017-09-16 14:10:34 +05:30
amitdo
a905548ed6 Autotools build: Remove the option 'USING_MULTIPLELIBS'
Libtool's convenience libraries should never be installed. Fixes #985.
2017-09-11 15:03:53 +03:00
Shree Devi Kumar
4e9c975859 fix accidental overwrite using old version 2017-09-11 14:45:25 +05:30
Shreeshrii
9a038f893a Add merge_unicharsets to build 2017-09-10 21:51:52 +05:30
Egor Pugin
36e0d2093a Fix windows build. 2017-09-09 21:25:25 +03:00
Ray Smith
9d258e20d3 Fixed build of unicharset_extractor 2017-09-08 15:33:03 +01:00
Ray Smith
fc6a390c6c Added intsimdmatrix as a generic integer matrixdotvector function with AVX2 and SSE specializations 2017-09-08 15:06:19 +01:00
Ray Smith
4cf123e099 Added ability to randomly rotate images upside-down during training for training OSD 2017-09-08 12:42:57 +01:00
Ray Smith
3e63918f9d Fixed order of characters in ligatures of RTL languages issue #648 2017-09-08 11:55:11 +01:00
Ray Smith
a912967cc3 Rewrote unicharset_extractor to use the new string normalizer and read plain text as well as box files. 2017-09-08 11:49:57 +01:00
Ray Smith
c773eb5784 Fixed rendering of Thai and units of char spacing 2017-09-08 10:29:03 +01:00
Ray Smith
e96d1df072 Fixed leaks in pango font info 2017-09-08 10:28:22 +01:00
Ray Smith
a2a72d7ca7 Clang tidy changes from sync 2017-09-08 10:13:33 +01:00
Stefan Weil
61f96981e5 training: Fix typos in comments (found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-09-03 09:16:02 +02:00
Arkady Shapkin
d171488e21 Added CMake option to use system ICU library 2017-08-17 02:50:54 +03:00
Ray Smith
5f5e85e4a0 Fixed lack of error on non-existent traineddata 2017-08-07 09:58:43 -07:00
Ray Smith
0a91498195 Improved error message on missing optional config 2017-08-07 09:50:49 -07:00
Ray Smith
4b3c5f6c35 Added check for non-empty traineddata flag 2017-08-07 09:43:30 -07:00
Egor Pugin
c67c2e9f41 Add combine_lang_model to cmake and cppan builds. 2017-08-06 14:46:32 +03:00
Stefan Weil
cdec915e17 Fix broken build for Windows
Windows does not provide a mkdir function with two parameters.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-08-04 10:18:35 +02:00
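
One common way to handle the platform difference described in this commit is a small wrapper, sketched below. This is an assumption about the general technique, not the change the commit actually made. `MakeDirectory` is a hypothetical name. On Windows the C runtime offers `_mkdir` from `<direct.h>`, which takes only a path, while POSIX `mkdir` also takes a permission mode.

```cpp
#include <cerrno>
#include <string>
#ifdef _WIN32
#include <direct.h>    // _mkdir(const char* path)
#else
#include <sys/stat.h>  // mkdir(const char* path, mode_t mode)
#include <sys/types.h>
#endif

// Hypothetical wrapper, not the code from the commit. Returns true if
// the directory was created or if something already exists at `path`.
bool MakeDirectory(const std::string& path) {
#ifdef _WIN32
  // The Windows CRT variant has no permission argument.
  const int result = _mkdir(path.c_str());
#else
  // Request full permissions; the process umask still applies.
  const int result = mkdir(path.c_str(), S_IRWXU | S_IRWXG | S_IRWXO);
#endif
  return result == 0 || errno == EEXIST;
}
```

Callers can then write `MakeDirectory(output_dir)` on either platform without a preprocessor check at each call site.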
Ray Smith
77c44cdecd Added convert to int and directory listing to combine_tessdata 2017-08-02 14:53:07 -07:00
Ray Smith
39b168a0b6 Removed errors introduced by git merge 2017-08-02 14:12:45 -07:00
Ray Smith
4e9665debf Added ADAM optimizer, unless git screwed it up, cos there is no diff 2017-08-02 14:03:50 -07:00
Ray Smith
2633fef0b6 Part 2 of separating out the unicharset from the LSTM model, fixing command line for training 2017-08-02 13:29:23 -07:00
Ray Smith
b0ead95d64 Changed the way unicharsets are handled to allow support for the ™ character. Can find the issue where it was requested. 2017-07-24 11:45:57 -07:00
Ray Smith
3f7735492f Removed unnecessary using statements and cleaned up google/non-google distinction 2017-07-19 16:42:48 -07:00
Stefan Weil
5a7b7ed7e1 PangoFontInfo: Remove unused method is_italic
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:22:05 +02:00
Stefan Weil
0cd71c67c9 PangoFontInfo: Remove unused method is_bold
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:21:59 +02:00
Stefan Weil
fbfbf67cf9 PangoFontInfo: Remove unused method is_smallcaps
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:21:49 +02:00
Stefan Weil
500f913b51 PangoFontInfo: Remove unused method is_monospace
Remove also some macros which are no longer needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:21:35 +02:00
Stefan Weil
059e30d4cb PangoFontInfo: Remove unused method is_fraktur
That restores commit 25e0c1accb and
partially reverts commit 4907a23fea
which added the now unused Shlwapi library.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:16:38 +02:00
Egor Pugin
4907a23fea Fix windows build. 2017-07-15 15:09:00 +03:00
Ray Smith
dc8745e6fd Move LSTM unicharset and recoder to traineddata with version string part1. Backwards compatible - maybe. 2017-07-14 11:14:23 -07:00
Ray Smith
df41eab6aa Added script-specific validation and normalization for virama-using scripts and updated normalization for others 2017-07-14 10:05:05 -07:00
Ray Smith
da03e4e910 Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion 2017-07-14 09:30:14 -07:00
Justin Hotchkiss Palermo
f057938069 fix filenames in comments 2017-07-02 17:35:47 -04:00
zdenop
59de660386 Merge pull request #969 from stweil/clean
PangoFontInfo: Remove some unused methods
2017-06-03 15:30:46 +02:00
Stefan Weil
2843739843 PangoFontInfo: Remove unused method is_italic
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-03 11:42:44 +02:00
Stefan Weil
e420417c85 PangoFontInfo: Remove unused method is_bold
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-03 11:42:44 +02:00
Stefan Weil
0d411cb5c5 PangoFontInfo: Remove unused method is_smallcaps
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-03 11:42:44 +02:00
Stefan Weil
8786e56084 PangoFontInfo: Remove unused method is_monospace
Remove also some macros which are no longer needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-03 11:42:19 +02:00
Egor Pugin
4ed4864dd6 Merge pull request #966 from rfschtkt/pen_color_
StringRenderer::pen_color_: int[3]->double[3]
2017-06-03 12:32:26 +03:00
Stefan Weil
8ec67a940d Remove strcasestr which is no longer needed
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-03 10:11:44 +02:00
Stefan Weil
25e0c1accb PangoFontInfo: Remove unused method is_fraktur
That allows removing a dirty hack which used the
non-portable function strcasestr.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-03 10:08:21 +02:00
Raf Schietekat
2981c6c585 StringRenderer::pen_color_: int[3]->double[3] 2017-06-02 09:58:26 +02:00
Raf Schietekat
8dad542f77 Fewer g++ -Wunused-variable warnings 2017-05-11 23:36:05 +02:00