Commit Graph

246 Commits

Author SHA1 Message Date
Stefan Weil
a4b03fbb27 Fix warning from shellcheck
shellcheck warning:

    In /tesseract/src/training/tesstrain_utils.sh line 209:
        TIMESTAMP=`date +%Y-%m-%d`
                  ^-- SC2006: Use $(..) instead of legacy `..`.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-15 17:45:20 +01:00
John Lin
bfe58aa56f Fix unbound variable $FONTS 2018-11-15 17:43:15 +01:00
Stefan Weil
0915cbd535 Simplify shell script using mktemp
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-15 13:36:52 +01:00
John Lin
edb76e281a Simplify MKTEMP_DT logic 2018-11-15 10:38:40 +08:00
John Lin
dbfc89f9af Fix mktemp in tesstrain_utils.sh
The commit 10f2c45c00 unified the usage of mktemp, but with a
incorrect bash syntax and unnecessary definition of LANG_CODE
and TIMESTAMP. This patch fixes the above problems.
2018-11-14 09:04:34 +08:00
Stefan Weil
286dfb031a Remove unused include statements
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-29 19:46:58 +01:00
zdenop
e60318f9c0 set PANGOCAIRO_BACKEND=fc to avoid crash; fixes #736 2018-10-23 13:22:38 +02:00
Matthias Geerdsen
eac2880c24 avoid unbound variable TESSDATA_PREFIX
set TESSDATA_PREFIX as empty, if not defined in environment to avoid an
unbound variable
2018-10-22 14:28:14 +02:00
Matthias Geerdsen
95d9c8c57a set default values for unset variables
setting default values for posibly unset variables avoids unbount
variabe errors
2018-10-21 21:30:52 +02:00
Matthias Geerdsen
7b32e64564 add shebang 2018-10-21 21:30:13 +02:00
zdenop
32c1e4f433 FLAGS_webtext_prefix: unbound variable; issue #2005 2018-10-21 14:00:06 +02:00
zdenop
4d3b0bc798 use <cstdio> instead of <stdio.h> 2018-10-20 21:46:40 +02:00
zdenop
8103d17c72 use _strdup instead of strdup in MSVC 2018-10-20 21:43:38 +02:00
zdenop
a033261f63 add info about used backend in text2image 2018-10-20 21:41:09 +02:00
Zdenko Podobný
486940687c Exit training script if run command failed; fixes #2005 2018-10-20 13:00:39 +02:00
Zdenko Podobný
1a523006a6 install training script with autotools. 2018-10-20 12:33:07 +02:00
Zdenko Podobný
1b2bda65e0 Revert "prefer to use FreeType for pango_cairo_font_map"
This reverts commit 345e5ee1f3.
2018-10-20 11:30:07 +02:00
Zdenko Podobný
276c6845ae Revert "free PangoFontMap; fixes #1999"
This reverts commit d1d73b9888.
2018-10-20 11:28:20 +02:00
Stefan Weil
b40151c200 training: Don't hide global variables
This fixes two warnings from LGTM:

    Parameter feature_defs hides a global variable with the same name.
    Parameter Config hides a global variable with the same name.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-19 22:37:37 +02:00
Zdenko Podobný
d1d73b9888 free PangoFontMap; fixes #1999 2018-10-19 00:48:20 +02:00
Stefan Weil
edbd07a5f9 lstmtraining: Handle failed remove syscall (CID 1396166)
This fixes a warning from Coverity Scan.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-16 16:53:23 +02:00
Stefan Weil
d0d73da65a commontraining: Fix two comments
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-15 11:15:49 +02:00
Zdenko Podobný
10f2c45c00 fix "mkdir -dt" for bds, mac and cygwin 2018-10-14 18:08:50 +02:00
Tom Morris
14af3f720b Add missing cerrno includes - fixes #1986 2018-10-13 16:02:48 -04:00
zdenop
4734317499 fixes #408 - text2image: comma in font name 2018-10-13 15:23:40 +02:00
zdenop
5f4f9372e9 revert debug message commited by mistake 2018-10-13 11:20:25 +02:00
Tom Morris
f6fd9b3a00 Handle null raw_choice - fixes #235, fixes #246 2018-10-13 11:14:26 +02:00
Stefan Weil
d86d520fd0 Remove tab character in source files
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-12 11:31:10 +02:00
zdenop
4044ba8260 fix "mktemp -d --tmpdir" on Mac OS; see #1453 2018-10-06 20:47:48 +02:00
Stefan Weil
0e71e5a754 lstmtraining: Remove dead code for purified model name
The purified model name `model_output` was unused,
so remove the comment and the unused code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-06 09:34:17 +02:00
Stefan Weil
f4e982e041 combine_tessdata: Handle failures when extracting
Report an error and terminate if that fails.

Use also EXIT_SUCCESS and EXIT_FAILURE for the return values of main()
and add missing return at end of main().

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-05 21:39:18 +02:00
Stefan Weil
7434590b9a lstmtraining: Check write permission for output model
This is done by creating a temporary file.
Report an error and terminate if that fails.

Use also EXIT_SUCCESS and EXIT_FAILURE for the return values of main().

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-05 20:38:02 +02:00
Zdenko Podobný
7dbf5a030f print help for tesstrain.sh; fixes #1469 2018-10-02 11:35:10 +02:00
zdenop
57a6f1d22e remove duplicate help from combine_lang_model 2018-10-01 21:22:51 +02:00
Stefan Weil
0f3206d5fe Format code (replace ( xxx ) by (xxx))
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-29 08:21:25 +02:00
zdenop
abe40f17c9 Win32: use the ISO C and C++ conformant name "_putenv" instead of deprecated "putenv" 2018-09-28 20:53:57 +02:00
zdenop
345e5ee1f3 prefer to use FreeType for pango_cairo_font_map 2018-09-28 11:07:26 +02:00
Stefan Weil
319de30814 Add missing include file (fixes linker error for Visual Studio)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-04 12:22:57 +02:00
Stefan Weil
46d2273e82 IcuErrorCode: Define virtual destructor in .cpp file
This fixes compiler warnings from clang:

src/training/icuerrorcode.h:44:7: warning:
 'IcuErrorCode' has no out-of-line virtual method definitions;
 its vtable will be emitted in every translation unit [-Wweak-vtables]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-04 12:11:23 +02:00
Stefan Weil
68bcd6ba90 Validator: Define virtual destructor in .cpp file
This fixes compiler warnings from clang:

src/training/validator.h:72:7: warning:
 'Validator' has no out-of-line virtual method definitions;
 its vtable will be emitted in every translation unit [-Wweak-vtables]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-04 07:48:43 +02:00
Shree Devi Kumar
70daecf267 Javanese Validation works now - for the most part 2018-08-27 21:00:35 +00:00
Shree Devi Kumar
3e8e338c06 taking as kCOnsonant in validate_grapheme 2018-08-27 12:09:34 +00:00
Shree Devi Kumar
a6c6b34bac Workaround for Javanese Aksara's Taling, do not label it as a combiner 2018-08-27 12:09:34 +00:00
Stefan Weil
7a2f8d9010 Move class tesseract::File from training to ccutil
This allows using the class for unittests, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-08-25 18:16:46 +02:00
Stefan Weil
63965bd750 Fix new whitespace issues
- add linefeed after last line
- remove blanks at line endings

This fixes some warnings from clang:

src/training/validate_javanese.h:63:51: warning:
 no newline at end of file [-Wnewline-eof]
src/training/validate_javanese.cpp:269:26: warning:
 no newline at end of file [-Wnewline-eof]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-08-23 18:18:15 +02:00
Shree Devi Kumar
43e3f24bb0 add variable --save_box_tiff to Save box/tiff pairs along with lstmf files. 2018-08-20 08:24:09 +00:00
Shree Devi Kumar
b34cf9d424 Javanese script training 2018-08-16 12:15:10 +00:00
Shree Devi Kumar
7957288fd5 chamge validate javanese similar to indic 2018-08-04 09:43:53 +00:00
Shree Devi Kumar
f93f9e8a09 fix typo re Javanese 2018-08-03 14:33:24 +00:00
Shree Devi Kumar
0eb7be1cd1 Initial COmmit to add Aksara Jawa - Javanese script 2018-08-03 13:59:27 +00:00
Stefan Weil
6a28cce96b Fix whitespace issues
* Remove whitespace (blanks, tabs, cr) at line endings

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-08-01 13:19:52 +02:00
Stefan Weil
9cf170cb7a Revert "Change default width for images output by text2image"
This reverts commit fdc243b363 because
it caused a regression reported in issue #1798.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-27 07:29:30 +02:00
Stefan Weil
b19e69086c training: Add new flag --workspace_dir to tesstraining_utils.sh
By default, that script creates two new temporary directories with random
names in /tmp.

The new command line flag --workspace_dir PATH uses the given path as
a base directory for all temporary files.

That allows better reproducable training results (no random directory
names in log files).

Signed-off-by: Stefan Weil <stweil@ub-backup.bib.uni-mannheim.de>
2018-07-26 17:14:19 +02:00
Stefan Weil
ca25d88538 Add missing execute permission for script files
It is needed for running the training tutorial on Linux.

The correct mode was lost when moving the files in
commit 104fe7931c.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 20:25:41 +02:00
Stefan Weil
216c2b31e7 Fix typo and add TODO comment
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-18 09:58:39 +02:00
Stefan Weil
0d4975933e Replace tprintf_internal by tprintf and clean tprintf code
Commit 4d514d5a60 introduced tprintf_internal
with an additional argument "level" which was removed again in commit
7dc5296fe9.

So we can now restore the original state without tprintf_internal.

Remove also the declaration of debug_window_on (it does not exist since
commit 030aae9896) and make the
configuration parameter debug_file local as it is only used by tprintf.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-07 21:47:10 +02:00
Stefan Weil
d2febafdcd Fix compiler warnings [-Wmissing-prototypes]
Add missing include statements, add missing "static" qualifiers or
remove functions which are not used at all.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-05 16:03:02 +02:00
Stefan Weil
296a836f4e Fix compiler warnings [-Wunused-const-variable]
clang warnings:

src/classify/trainingsampleset.cpp:39:11: warning:
 unused variable 'kMinOutlierSamples' [-Wunused-const-variable]
src/lstm/lstmrecognizer.cpp:45:11: warning:
 unused variable 'kMaxChoices' [-Wunused-const-variable]
src/training/dawg2wordlist.cpp:28:11: warning:
 unused variable 'kDictDebugLevel' [-Wunused-const-variable]
src/training/stringrenderer.cpp:50:21: warning:
 unused variable 'kWordJoiner' [-Wunused-const-variable]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-05 12:07:04 +02:00
Stefan Weil
bdf09f40b1 Fix compiler warnings [-Wzero-as-null-pointer-constant]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-04 20:40:56 +02:00
Stefan Weil
081793ff48 Fix build with legacy engine disabled
Instead of defining the DISABLED_LEGACY_ENGINE macro in config_auto.h
(which is not included by all source files), define it as a preprocessor
option for those parts of the code which require it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-04 17:56:42 +02:00
Amit D
62c7b796da
Merge branch 'master' into disable-legacy 2018-07-04 11:14:33 +03:00
amitdo
aa9f4b4861 Add an option to compile tesseract without the code of the legacy OCR engine 2018-07-03 18:49:42 +03:00
Stefan Weil
bb7bb1f0b8 Remove old comments for exceptions
Exceptions are no longer used.

Remove also some history comments and fix several comments.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 14:53:00 +02:00
Stefan Weil
872813245d Replace function DoError and remove danerror.cpp, danerror.h
This allows also removing all error trap macros.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 13:21:17 +02:00
zdenop
a0ed0b4987
Merge pull request #1732 from stweil/headerfiles
Remove unused include files
2018-07-03 07:57:15 +02:00
Stefan Weil
9325fbe322 Remove unused include files
ccstruct/hpdsizes.h was not used at all.
cutil/const.h was included, but not needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 07:25:38 +02:00
Stefan Weil
cbd7b15788 Remove unneeded macro definition for M_PI
There is already one in platform.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 21:59:16 +02:00
Stefan Weil
f7b61891bc Replace macro PI by macro M_PI
One definition for pi is sufficient.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 21:26:53 +02:00
Stefan Weil
b57afc7c78 Replace Efopen by fopen and remove efio.cpp, efio.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 17:46:28 +02:00
Stefan Weil
faae87beaa Replace FLOAT32 by float data type
On most systems float is the IEEE 754 single-precision binary
floating-point format (32 bits). Tesseract does not support other systems.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 13:29:39 +02:00
Stefan Weil
1371980f9f Replace string.h by standard C++ cstring
Remove the unneeded include statement in platform.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 20:40:26 +02:00
Stefan Weil
112aeb9826 Clean usage of assert.h
Remove unneeded include statements, remove conditional statements and
replace the remaining assert.h by their standard C++ variant cassert.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 19:31:05 +02:00
Stefan Weil
a9e2574eff Remove public API file ndminx.h
It is not needed for the Tesseract code, and the Tesseract API
should not provide MIN / MAX macros.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 08:33:30 +02:00
Stefan Weil
44450094c3 Replace ASSERT_HOST in genericvector.h
genericvector.h used a mix of assert and ASSERT_HOST.

By using assert only, it does no longer depend on errcode.h
which defines the ASSERT_HOST macro.

Other files which still use ASSERT_HOST now need an explicit
include statement for errcode.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 22:32:17 +02:00
Shreeshrii
a27e91c4f9
Update tesstrain_utils.sh 2018-06-11 09:35:14 +05:30
Shreeshrii
fdc243b363
Change default width for images output by text2image
Fixes
Image too large to learn!! Size = 2594x48
Image not trainable

See https://github.com/tesseract-ocr/tesseract/issues/590#issuecomment-271244655
for related discussion
2018-06-11 09:34:07 +05:30
Stefan Weil
0215d91f45 training: Add missing linefeed to error message
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-06 21:32:16 +02:00
Stefan Weil
4f3b266efe src/training: Replace more proprietary BOOL8 by standard bool data type
Update also callers of the modified functions to use
false / true instead of 0 / 1.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-04 16:08:03 +02:00
Stefan Weil
b292013bdc cntraining: Replace proprietary BOOL8 by standard bool data type
Add also "static" attribute to local functions and remove an old comment.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-04 16:08:03 +02:00
Stefan Weil
f2698c256d src/training: Replace proprietary BOOL8 by standard bool data type
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-03 21:13:40 +02:00
Stefan Weil
509a6f0ce0 Fix some typos (most found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-27 18:49:43 +02:00
Alexander Zaitsev
6049225d01 Merge remote-tracking branch 'my_repo/small_fixes' into small_fixes 2018-05-20 18:48:30 +03:00
Alexander Zaitsev
d54d7486b4 Use std::max/std::min instead of MAX/MIN macros. 2018-05-20 17:49:48 +03:00
Alexander Zaitsev
14ae0b8727 Use std::max/std::min instead of MAX/MIN macros. 2018-05-20 16:18:07 +03:00
Alexander Zaitsev
e7e8e20119 Remove deprecated in C++11 'register' keyword (removed since C++17). 2018-05-20 01:49:26 +03:00
Alexander Zaitsev
0697235bb2 Use using instead of typedef. Reason: https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rt-using 2018-05-20 01:31:03 +03:00
Alexander Zaitsev
0248c7ff9d Rename all C-style headers (e.g. <stdio.h>) to C++ style (<cstdio>). 2018-05-20 00:52:04 +03:00
Shreeshrii
6c08ec02e4
Copy .box and .tif files along with .lstmf files from /tmp 2018-05-17 22:45:22 +05:30
Stefan Weil
932a108b4d Revert "fixes #388 by using raw bytes utf8 encoding"
This reverts commit 941e1c4c84. It is no
longer needed since commit f54800f14b.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-07 06:06:42 +02:00
Stefan Weil
11609f9509 Fix CID 1386109 (Logically dead code)
The else statement is never executed.

Remove also an unused element from the names array
and add the "static" attribute.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-03 18:32:42 +02:00
Noah Metzger
a7d1402e5d Fixed access to uninitialized variable
Coverity ID: 1386084 the set_font method has accessed resolution_ before it was initialized by the set_resolution method.

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-05-02 16:11:35 +02:00
Stefan Weil
b87fc523ca Fix CID 1386084 (Uninitialized scalar variable)
The set_font method used the uninitialized member variable resolution_.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-26 18:02:43 +02:00
Stefan Weil
4f9493c409 Partial fix for autotools configuration after source tree reorganisation
This should fix "make" and "make training".

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-25 21:33:28 +02:00
Stefan Weil
dabf3c299f Fix file endings
Text files should end with a LF, but not additional empty lines.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-25 19:35:33 +02:00
Stefan Weil
9ceb0c6430 Fix line endings
Replace DOS line endings (CRLF) by standard (LF only).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-25 19:04:50 +02:00
Egor Pugin
104fe7931c Move training to src. 2018-04-25 11:35:26 +03:00