
315 Commits

Author SHA1 Message Date
Stefan Weil
e07414f425 training: Remove some cube relics
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-22 20:45:31 +02:00
zdenop
577f6f76d2
Merge pull request #1507 from stweil/nullptr
Replace NULL by nullptr
2018-04-22 19:30:02 +02:00
zdenop
d67df9ee57
Merge pull request #1510 from Shreeshrii/Shreeshrii-tesstrain-msg
Change info message from tesstrain.sh
2018-04-22 19:26:12 +02:00
Shreeshrii
86275c2187
Clarify message to indicate additional LSTM training required for 4.0.0 2018-04-22 22:26:07 +05:30
Shreeshrii
0f3d33f699
Change max_pages to zero
Fixes https://github.com/tesseract-ocr/tesseract/issues/1149 and https://github.com/tesseract-ocr/tesseract/issues/1508
2018-04-22 22:19:06 +05:30
Stefan Weil
5eca9143b9 training: Replace NULL by nullptr
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-22 17:42:36 +02:00
Stefan Weil
d68ab9f12e training: Support new command line option -v (short form for --version)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-20 22:36:22 +02:00
Stefan Weil
f95041faac lstmtraining: Fix handling of --max_iterations
The iteration counter should be checked for each iteration,
not only at the end of a batch.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-19 14:47:24 +02:00
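
The difference described in f95041faac can be sketched as follows. This is a minimal illustration and not the lstmtraining source: `RunTraining`, its parameters, and the convention that `max_iterations == 0` means "no limit" are assumptions made for this example.

```cpp
#include <cassert>

// Hypothetical stand-in for a training loop (not the lstmtraining code).
// Returns the number of iterations actually performed.
// Assumption of this sketch: max_iterations == 0 means "no limit".
int RunTraining(int max_iterations, int batch_size, int num_batches,
                bool check_every_iteration) {
  assert(batch_size > 0);
  int iteration = 0;
  for (int batch = 0; batch < num_batches; ++batch) {
    for (int i = 0; i < batch_size; ++i) {
      // Behaviour described by the fix: test the limit before every step.
      if (check_every_iteration && max_iterations > 0 &&
          iteration >= max_iterations) {
        return iteration;
      }
      ++iteration;  // one training step would happen here
    }
    // Behaviour before the fix: the limit is only tested once per batch.
    if (max_iterations > 0 && iteration >= max_iterations) {
      return iteration;
    }
  }
  return iteration;
}
```

With a limit of 250 and batches of 100, the per-iteration check stops at exactly 250 steps, while the per-batch check overshoots to the end of the third batch.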
Stefan Weil
8f656e49bc training: Fix dubious parsing of command line
- Don't support --helpshort as an undocumented alias for --help
- Don't allow any number of leading '-' characters.
  The preferred form uses --OPTION, and for compatibility reasons the new
  code still supports -OPTION.

Also update the related documentation comments.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-17 22:23:23 +02:00
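
The prefix rule from 8f656e49bc (accept `--OPTION` and, for compatibility, `-OPTION`, but no longer any number of leading '-' characters) could look roughly like the sketch below. `OptionName` is a hypothetical helper written for this illustration; it is not the function used in the training code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the Tesseract sources).
// Returns the option name for "--NAME" or "-NAME", or an empty string
// if the argument is not an option under the rule described above.
std::string OptionName(const char* arg) {
  assert(arg != nullptr);
  if (arg[0] != '-') return "";  // not an option at all
  const char* name = arg + 1;
  if (name[0] == '-') ++name;    // preferred form: two leading dashes
  // Reject a third dash ("---NAME") and empty names ("-", "--").
  if (name[0] == '-' || name[0] == '\0') return "";
  return name;
}
```

Under this sketch both `--version` and `-version` yield `version`, whereas `---version` is rejected.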
Zdenko Podobny
4b468e11fa improve readability of commit 198664fb0b 2018-04-17 17:41:59 +02:00
Amit D
88a1364699
Remove obsolete code
Pango versions older than 1.22.0 are not supported.
2018-04-17 15:58:34 +03:00
Stefan Weil
0998bcf1fc training: Support new argument --version for remaining executables
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-16 07:56:55 +02:00
zdenop
a07ee5c40b
Merge pull request #1479 from stweil/version
training: Add initial support for --version argument and check library version
2018-04-15 12:59:58 +02:00
Stefan Weil
a440bd8bf1 training: Support new argument --version
classifier_tester, cntraining, combine_lang_model, lstmeval, lstmtraining,
mftraining, set_unicharset_properties, shapeclustering, text2image and
unicharset_extractor now can show the version.

Still missing: ambiguous_words, combine_tessdata, dawg2wordlist,
merge_unicharsets and wordlist2dawg.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-15 12:39:18 +02:00
Stefan Weil
8c3045f161 Check library version for training executables
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-15 11:31:28 +02:00
Stefan Weil
a6fef12bd6 training: Add 'static' to some local functions
Fix also the missing exit value for text2image.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-15 11:31:22 +02:00
Stefan Weil
3f967d2abc training: Remove unused function prototypes
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-15 11:26:31 +02:00
Stefan Weil
03b0cb9160 training: Format code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-15 10:27:32 +02:00
Stefan Weil
5e9e22c719 training: Fix typo in help text
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-15 08:41:51 +02:00
zdenop
83f311f043
Merge pull request #1405 from Shreeshrii/patch-2
Add additional Unicodes to IsVedicAccent
2018-04-13 06:38:06 +02:00
zdenop
c869478825
Merge pull request #1406 from Shreeshrii/patch-1
Add kVedicMark to ConsumeVowelIfValid
2018-04-13 06:37:35 +02:00
FernandoGOT
3917a192ca Fix for mktemp bug on Mac OS X 2018-04-10 14:22:33 -03:00
FernandoGOT
7a5033d1d9 added sleep 1 before generate_font_image to fix the problem of not finding fonts 2018-04-10 10:16:37 -03:00
Zdenko Podobný
e9e1e93686 add tess_version.h to distribution 2018-04-02 21:48:29 +02:00
Stefan Weil
3fcb952dbf Remove unneeded CPPFLAGS (#1425)
* training: Remove unneeded CPPFLAGS

The training code does not need vs2010/port.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* unittest: Remove unneeded CPPFLAGS

The unittest code does not need vs2010/port.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-25 21:04:45 +02:00
Egor Pugin
3fa25d9bbc Install training tools with CMake. 2018-03-23 23:28:51 +03:00
Shreeshrii
198664fb0b
Add additional Unicodes to IsVedicAccent 2018-03-20 20:33:25 +05:30
Shreeshrii
aab83da67d
Add kVedicMark to ConsumeVowelIfValid 2018-03-20 20:31:25 +05:30
Stefan Weil
023e1b340e Use POSIX data types and macros (#878)
* api: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* cutil: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* training: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract data types by POSIX data types

Now all Tesseract data types which are no longer needed can be removed
from ccutil/host.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* lstm: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Remove the macros which are now unused from ccutil/host.h.
Remove also the obsolete history comments.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix build error caused by ambiguous ClipToRange

Error message from AppVeyor CI:

    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2672: 'ClipToRange': no matching overloaded function found [C:\projects\tesseract\build\libtesseract.vcxproj]
    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2782: 'T ClipToRange(const T &,const T &,const T &)': template parameter 'T' is ambiguous [C:\projects\tesseract\build\libtesseract.vcxproj]
      c:\projects\tesseract\ccutil\helpers.h(122): note: see declaration of 'ClipToRange'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: could be 'char'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: or       'int'

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* unittest: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* arch: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 21:36:30 +01:00
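
The `ClipToRange` error quoted in 023e1b340e arises whenever the three arguments of a single-parameter template have different types. The sketch below reproduces the situation with the signature shown in the compiler message and resolves it with an explicit template argument. This is one common way to fix such an error; the log does not show which change the commit actually made, and `ClipToInt8Range` is a hypothetical caller.

```cpp
#include <cassert>
#include <cstdint>

// Same signature as the declaration quoted in the compiler error above.
template <typename T>
T ClipToRange(const T& x, const T& lower_bound, const T& upper_bound) {
  assert(!(upper_bound < lower_bound));
  if (x < lower_bound) return lower_bound;
  if (upper_bound < x) return upper_bound;
  return x;
}

// Hypothetical caller that mixes an int value with int8_t bounds.
int ClipToInt8Range(int value) {
  const int8_t lower = INT8_MIN;
  const int8_t upper = INT8_MAX;
  // ClipToRange(value, lower, upper) would not compile here:
  // T could be deduced as int or as int8_t.
  return ClipToRange<int>(value, lower, upper);
}
```

Naming `T` explicitly converts the bounds to `int`, so the deduction conflict disappears without changing the template itself.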
Stefan Weil
7972b13e3a Remove macro USE_STD_NAMESPACE (#1360)
The related code in training/util.h now uses the GOOGLE_TESSERACT macro
to enable Google specific code to disable heap checking.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-04 14:43:28 +01:00
Stefan Weil
068d43d3d8 Remove old code for string class (no longer needed) (#1354)
* Remove old code for string class (no longer needed)

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Add std namespace to string class

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-03 14:36:28 +01:00
Jim Regan
0d1365a5ee gle_uncial (#1342) 2018-02-22 17:20:31 +01:00
Stefan Weil
2ca7d9451a Remove files generated by libtool (#1329)
It looks like those files were added accidentally
in commit fc6a390c6c.
Add them to .gitignore to avoid that from now on.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-20 17:17:49 +01:00
Ria
d751305804
Fixed missing include for std::back_inserter.
With Visual Studio 2015 RTM:

Error C2039: 'back_inserter': is not a member of 'std'
Error C3861: 'back_inserter': identifier not found

Visual Studio 2015 (vc14) needs the <iterator> header:

#include <iterator>
2017-11-23 11:37:35 +03:30
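
For context on d751305804: `std::back_inserter` is declared in `<iterator>`, and some standard library implementations happen to pull that header in through others, which is why the missing include can go unnoticed on one compiler and fail on another. A minimal sketch, with `DoubleAll` as a hypothetical function that is not taken from the Tesseract sources:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>  // declares std::back_inserter
#include <vector>

// Hypothetical example: append a transformed copy of each element.
std::vector<int> DoubleAll(const std::vector<int>& in) {
  std::vector<int> out;
  std::transform(in.begin(), in.end(), std::back_inserter(out),
                 [](int v) { return 2 * v; });
  assert(out.size() == in.size());
  return out;
}
```

Including `<iterator>` explicitly keeps the code portable regardless of which other headers a given implementation includes transitively.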
Stefan Weil
f3c4b894dc Fix help message for unicharset_extractor (#1206)
If unicharset_extractor was called without any argument,
a help message was printed by tesseract::ParseCommandLineFlags.

Replace that by the local help message which is better.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-11-10 15:45:35 +01:00
ivanzz1001
fb359fc981 Update unicharset_extractor.cpp (#1153)
* change IsWhitespace to IsUTF8Whitespace

To solve "Phase UP: Generating unicharset and unichar properties files" ERROR #1147

please reference: [#1147](https://github.com/tesseract-ocr/tesseract/issues/1147)

* Update unicharset_extractor.cpp

fix the "Phase UP: Generating unicharset and unichar properties files" ERROR

* Update unicharset_extractor.cpp

fix "Phase UP: Generating unicharset and unichar properties files" ERROR #1147

* Update unicharset_extractor.cpp

fix the encoding invalid problem and fix the comment
2017-10-13 11:46:42 +02:00
Stefan Weil
07f1400e6f Revert "change type to UChar32 to fix IsValidCodepoint"
This reverts commit a404c9cdb3.
That code no longer matched the specification (see code comment).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-09-18 07:42:00 +02:00
Shree Devi Kumar
a404c9cdb3 change type to UChar32 to fix IsValidCodepoint 2017-09-16 14:10:34 +05:30
amitdo
a905548ed6 Autotools build: Remove the option 'USING_MULTIPLELIBS'
Libtool's convenience libraries should never be installed. Fixes #985.
2017-09-11 15:03:53 +03:00
Shree Devi Kumar
4e9c975859 fix accidental overwrite using old version 2017-09-11 14:45:25 +05:30
Shreeshrii
9a038f893a Add merge_unicharsets to build 2017-09-10 21:51:52 +05:30
Egor Pugin
36e0d2093a Fix windows build. 2017-09-09 21:25:25 +03:00
Ray Smith
9d258e20d3 Fixed build of unicharset_extractor 2017-09-08 15:33:03 +01:00
Ray Smith
fc6a390c6c Added intsimdmatrix as a generic integer matrixdotvector function with AVX2 and SSE specializations 2017-09-08 15:06:19 +01:00
Ray Smith
4cf123e099 Added ability to randomly rotate images upside-down during training for training OSD 2017-09-08 12:42:57 +01:00
Ray Smith
3e63918f9d Fixed order of characters in ligatures of RTL languages issue #648 2017-09-08 11:55:11 +01:00
Ray Smith
a912967cc3 Rewrote unicharset_extractor to use the new string normalizer and read plain text as well as box files. 2017-09-08 11:49:57 +01:00
Ray Smith
c773eb5784 Fixed rendering of Thai and units of char spacing 2017-09-08 10:29:03 +01:00
Ray Smith
e96d1df072 Fixed leaks in pango font info 2017-09-08 10:28:22 +01:00
Ray Smith
a2a72d7ca7 Clang tidy changes from sync 2017-09-08 10:13:33 +01:00