Stefan Weil
a4b03fbb27
Fix warning from shellcheck
...
shellcheck warning:
In /tesseract/src/training/tesstrain_utils.sh line 209:
TIMESTAMP=`date +%Y-%m-%d`
^-- SC2006: Use $(..) instead of legacy `..`.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-15 17:45:20 +01:00
John Lin
bfe58aa56f
Fix unbound variable $FONTS
2018-11-15 17:43:15 +01:00
Stefan Weil
0915cbd535
Simplify shell script using mktemp
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-15 13:36:52 +01:00
John Lin
edb76e281a
Simplify MKTEMP_DT logic
2018-11-15 10:38:40 +08:00
John Lin
dbfc89f9af
Fix mktemp in tesstrain_utils.sh
...
The commit 10f2c45c00 unified the usage of mktemp, but with
incorrect bash syntax and unnecessary definitions of LANG_CODE
and TIMESTAMP. This patch fixes the above problems.
2018-11-14 09:04:34 +08:00
Stefan Weil
286dfb031a
Remove unused include statements
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-29 19:46:58 +01:00
zdenop
e60318f9c0
set PANGOCAIRO_BACKEND=fc to avoid crash; fixes #736
2018-10-23 13:22:38 +02:00
Matthias Geerdsen
eac2880c24
avoid unbound variable TESSDATA_PREFIX
...
set TESSDATA_PREFIX as empty, if not defined in environment to avoid an
unbound variable
2018-10-22 14:28:14 +02:00
Matthias Geerdsen
95d9c8c57a
set default values for unset variables
...
setting default values for possibly unset variables avoids unbound
variable errors
2018-10-21 21:30:52 +02:00
Matthias Geerdsen
7b32e64564
add shebang
2018-10-21 21:30:13 +02:00
zdenop
32c1e4f433
FLAGS_webtext_prefix: unbound variable; issue #2005
2018-10-21 14:00:06 +02:00
zdenop
4d3b0bc798
use <cstdio> instead of <stdio.h>
2018-10-20 21:46:40 +02:00
zdenop
8103d17c72
use _strdup instead of strdup in MSVC
2018-10-20 21:43:38 +02:00
zdenop
a033261f63
add info about used backend in text2image
2018-10-20 21:41:09 +02:00
Zdenko Podobný
486940687c
Exit training script if run command failed; fixes #2005
2018-10-20 13:00:39 +02:00
Zdenko Podobný
1a523006a6
install training script with autotools.
2018-10-20 12:33:07 +02:00
Zdenko Podobný
1b2bda65e0
Revert "prefer to use FreeType for pango_cairo_font_map"
...
This reverts commit 345e5ee1f3.
2018-10-20 11:30:07 +02:00
Zdenko Podobný
276c6845ae
Revert "free PangoFontMap; fixes #1999"
...
This reverts commit d1d73b9888.
2018-10-20 11:28:20 +02:00
Stefan Weil
b40151c200
training: Don't hide global variables
...
This fixes two warnings from LGTM:
Parameter feature_defs hides a global variable with the same name.
Parameter Config hides a global variable with the same name.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-19 22:37:37 +02:00
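Note: the LGTM warning above concerns a function parameter that shares its name with a global. A minimal sketch of the pattern and one possible fix follows; the global `Config` here is a plain int stand-in and the functions `Scale` and `ScaleWithGlobal` are hypothetical, not the actual training code.

```cpp
// Hypothetical global, standing in for a global such as feature_defs or Config.
int Config = 3;

// Before (would trigger the warning): the parameter hides the global.
//   int Scale(int Config, int value) { return Config * value; }

// After: a distinct parameter name removes the ambiguity.
int Scale(int config, int value) { return config * value; }

// The global is still reachable and can be passed in explicitly.
int ScaleWithGlobal(int value) { return Scale(Config, value); }
```

With the rename, a reader of `Scale` no longer has to work out which `Config` a given line refers to.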
Zdenko Podobný
d1d73b9888
free PangoFontMap; fixes #1999
2018-10-19 00:48:20 +02:00
Stefan Weil
edbd07a5f9
lstmtraining: Handle failed remove syscall (CID 1396166)
...
This fixes a warning from Coverity Scan.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-16 16:53:23 +02:00
Stefan Weil
d0d73da65a
commontraining: Fix two comments
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-15 11:15:49 +02:00
Zdenko Podobný
10f2c45c00
fix "mkdir -dt" for bsd, mac and cygwin
2018-10-14 18:08:50 +02:00
Tom Morris
14af3f720b
Add missing cerrno includes - fixes #1986
2018-10-13 16:02:48 -04:00
zdenop
4734317499
fixes #408 - text2image: comma in font name
2018-10-13 15:23:40 +02:00
zdenop
5f4f9372e9
revert debug message committed by mistake
2018-10-13 11:20:25 +02:00
Tom Morris
f6fd9b3a00
Handle null raw_choice - fixes #235, fixes #246
2018-10-13 11:14:26 +02:00
Stefan Weil
d86d520fd0
Remove tab character in source files
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-12 11:31:10 +02:00
zdenop
4044ba8260
fix "mktemp -d --tmpdir" on Mac OS; see #1453
2018-10-06 20:47:48 +02:00
Stefan Weil
0e71e5a754
lstmtraining: Remove dead code for purified model name
...
The purified model name `model_output` was unused,
so remove the comment and the unused code.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-06 09:34:17 +02:00
Stefan Weil
f4e982e041
combine_tessdata: Handle failures when extracting
...
Report an error and terminate if that fails.
Use also EXIT_SUCCESS and EXIT_FAILURE for the return values of main()
and add missing return at end of main().
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-05 21:39:18 +02:00
Stefan Weil
7434590b9a
lstmtraining: Check write permission for output model
...
This is done by creating a temporary file.
Report an error and terminate if that fails.
Use also EXIT_SUCCESS and EXIT_FAILURE for the return values of main().
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-05 20:38:02 +02:00
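Note: the approach described above (probe for write permission by creating a temporary file, report an error and return EXIT_FAILURE otherwise) might look roughly like the sketch below. The function names, the `.probe` suffix, and the error text are assumptions for illustration and are not taken from lstmtraining.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Try to create and delete a small file next to the requested output.
bool CanCreateFile(const std::string& model_output) {
  const std::string probe = model_output + ".probe";  // hypothetical suffix
  FILE* f = std::fopen(probe.c_str(), "wb");
  if (f == nullptr) return false;
  std::fclose(f);
  // remove() can fail too, so its result is checked rather than ignored.
  return std::remove(probe.c_str()) == 0;
}

// Map the check to the exit codes a main() function would return.
int CheckOutput(const std::string& model_output) {
  if (!CanCreateFile(model_output)) {
    std::fprintf(stderr, "Error, cannot write model output: %s\n",
                 model_output.c_str());
    return EXIT_FAILURE;
  }
  return EXIT_SUCCESS;
}
```

Failing early this way avoids running a long training job only to discover at the end that the result cannot be saved.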
Zdenko Podobný
7dbf5a030f
print help for tesstrain.sh; fixes #1469
2018-10-02 11:35:10 +02:00
zdenop
57a6f1d22e
remove duplicate help from combine_lang_model
2018-10-01 21:22:51 +02:00
Stefan Weil
0f3206d5fe
Format code (replace ( xxx ) by (xxx))
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-29 08:21:25 +02:00
zdenop
abe40f17c9
Win32: use the ISO C and C++ conformant name "_putenv" instead of deprecated "putenv"
2018-09-28 20:53:57 +02:00
zdenop
345e5ee1f3
prefer to use FreeType for pango_cairo_font_map
2018-09-28 11:07:26 +02:00
Stefan Weil
319de30814
Add missing include file (fixes linker error for Visual Studio)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-04 12:22:57 +02:00
Stefan Weil
46d2273e82
IcuErrorCode: Define virtual destructor in .cpp file
...
This fixes compiler warnings from clang:
src/training/icuerrorcode.h:44:7: warning:
'IcuErrorCode' has no out-of-line virtual method definitions;
its vtable will be emitted in every translation unit [-Wweak-vtables]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-04 12:11:23 +02:00
Stefan Weil
68bcd6ba90
Validator: Define virtual destructor in .cpp file
...
This fixes compiler warnings from clang:
src/training/validator.h:72:7: warning:
'Validator' has no out-of-line virtual method definitions;
its vtable will be emitted in every translation unit [-Wweak-vtables]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-04 07:48:43 +02:00
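Note: the two -Wweak-vtables fixes above share one idea: declare the virtual destructor in the header and define it in exactly one .cpp file, so the compiler has a single translation unit in which to emit the vtable. A minimal sketch with hypothetical classes `Shape` and `Triangle` (not IcuErrorCode or Validator), shown as one file for brevity:

```cpp
// --- would live in shape.h (hypothetical) ---
class Shape {
 public:
  virtual ~Shape();               // declared only; no inline body
  virtual int Sides() const = 0;
};

// --- would live in shape.cpp (hypothetical) ---
// The out-of-line definition gives the class a "key function",
// so the vtable is emitted in this translation unit only.
Shape::~Shape() = default;

// A derived class used to exercise the base class.
class Triangle : public Shape {
 public:
  int Sides() const override { return 3; }
};
```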
Shree Devi Kumar
70daecf267
Javanese Validation works now - for the most part
2018-08-27 21:00:35 +00:00
Shree Devi Kumar
3e8e338c06
taking as kConsonant in validate_grapheme
2018-08-27 12:09:34 +00:00
Shree Devi Kumar
a6c6b34bac
Workaround for Javanese Aksara's Taling, do not label it as a combiner
2018-08-27 12:09:34 +00:00
Stefan Weil
7a2f8d9010
Move class tesseract::File from training to ccutil
...
This allows using the class for unittests, too.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-08-25 18:16:46 +02:00
Stefan Weil
63965bd750
Fix new whitespace issues
...
- add linefeed after last line
- remove blanks at line endings
This fixes some warnings from clang:
src/training/validate_javanese.h:63:51: warning:
no newline at end of file [-Wnewline-eof]
src/training/validate_javanese.cpp:269:26: warning:
no newline at end of file [-Wnewline-eof]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-08-23 18:18:15 +02:00
Shree Devi Kumar
43e3f24bb0
add variable --save_box_tiff to save box/tiff pairs along with lstmf files.
2018-08-20 08:24:09 +00:00
Shree Devi Kumar
b34cf9d424
Javanese script training
2018-08-16 12:15:10 +00:00
Shree Devi Kumar
7957288fd5
change validate javanese similar to indic
2018-08-04 09:43:53 +00:00
Shree Devi Kumar
f93f9e8a09
fix typo re Javanese
2018-08-03 14:33:24 +00:00
Shree Devi Kumar
0eb7be1cd1
Initial commit to add Aksara Jawa - Javanese script
2018-08-03 13:59:27 +00:00
Stefan Weil
6a28cce96b
Fix whitespace issues
...
* Remove whitespace (blanks, tabs, cr) at line endings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-08-01 13:19:52 +02:00
Stefan Weil
9cf170cb7a
Revert "Change default width for images output by text2image"
...
This reverts commit fdc243b363 because
it caused a regression reported in issue #1798.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-27 07:29:30 +02:00
Stefan Weil
b19e69086c
training: Add new flag --workspace_dir to tesstrain_utils.sh
...
By default, that script creates two new temporary directories with random
names in /tmp.
The new command line flag --workspace_dir PATH uses the given path as
a base directory for all temporary files.
That allows more reproducible training results (no random directory
names in log files).
Signed-off-by: Stefan Weil <stweil@ub-backup.bib.uni-mannheim.de>
2018-07-26 17:14:19 +02:00
Stefan Weil
ca25d88538
Add missing execute permission for script files
...
It is needed for running the training tutorial on Linux.
The correct mode was lost when moving the files in
commit 104fe7931c.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 20:25:41 +02:00
Stefan Weil
216c2b31e7
Fix typo and add TODO comment
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-18 09:58:39 +02:00
Stefan Weil
0d4975933e
Replace tprintf_internal by tprintf and clean tprintf code
...
Commit 4d514d5a60 introduced tprintf_internal with an additional argument
"level" which was removed again in commit 7dc5296fe9.
So we can now restore the original state without tprintf_internal.
Remove also the declaration of debug_window_on (it does not exist since
commit 030aae9896) and make the configuration parameter debug_file local
as it is only used by tprintf.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-07 21:47:10 +02:00
Stefan Weil
d2febafdcd
Fix compiler warnings [-Wmissing-prototypes]
...
Add missing include statements, add missing "static" qualifiers or
remove functions which are not used at all.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-05 16:03:02 +02:00
Stefan Weil
296a836f4e
Fix compiler warnings [-Wunused-const-variable]
...
clang warnings:
src/classify/trainingsampleset.cpp:39:11: warning:
unused variable 'kMinOutlierSamples' [-Wunused-const-variable]
src/lstm/lstmrecognizer.cpp:45:11: warning:
unused variable 'kMaxChoices' [-Wunused-const-variable]
src/training/dawg2wordlist.cpp:28:11: warning:
unused variable 'kDictDebugLevel' [-Wunused-const-variable]
src/training/stringrenderer.cpp:50:21: warning:
unused variable 'kWordJoiner' [-Wunused-const-variable]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-05 12:07:04 +02:00
Stefan Weil
bdf09f40b1
Fix compiler warnings [-Wzero-as-null-pointer-constant]
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-04 20:40:56 +02:00
Stefan Weil
081793ff48
Fix build with legacy engine disabled
...
Instead of defining the DISABLED_LEGACY_ENGINE macro in config_auto.h
(which is not included by all source files), define it as a preprocessor
option for those parts of the code which require it.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-04 17:56:42 +02:00
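Note: defining DISABLED_LEGACY_ENGINE as a preprocessor option means the macro arrives on the compiler command line (for example `-DDISABLED_LEGACY_ENGINE`), so every source file sees it whether or not it includes config_auto.h. A minimal sketch of code guarded by the macro; the function `AvailableEngines` is hypothetical:

```cpp
#include <string>

// Returns a comma-separated list of the engines compiled into this build.
std::string AvailableEngines() {
  std::string engines = "lstm";
#ifndef DISABLED_LEGACY_ENGINE
  // Compiled only when the legacy engine has not been disabled.
  engines += ",legacy";
#endif
  return engines;
}
```

Built without the option this returns "lstm,legacy"; built with `-DDISABLED_LEGACY_ENGINE` it returns "lstm".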
Amit D
62c7b796da
Merge branch 'master' into disable-legacy
2018-07-04 11:14:33 +03:00
amitdo
aa9f4b4861
Add an option to compile tesseract without the code of the legacy OCR engine
2018-07-03 18:49:42 +03:00
Stefan Weil
bb7bb1f0b8
Remove old comments for exceptions
...
Exceptions are no longer used.
Remove also some history comments and fix several comments.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 14:53:00 +02:00
Stefan Weil
872813245d
Replace function DoError and remove danerror.cpp, danerror.h
...
This allows also removing all error trap macros.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 13:21:17 +02:00
zdenop
a0ed0b4987
Merge pull request #1732 from stweil/headerfiles
...
Remove unused include files
2018-07-03 07:57:15 +02:00
Stefan Weil
9325fbe322
Remove unused include files
...
ccstruct/hpdsizes.h was not used at all.
cutil/const.h was included, but not needed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 07:25:38 +02:00
Stefan Weil
cbd7b15788
Remove unneeded macro definition for M_PI
...
There is already one in platform.h.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 21:59:16 +02:00
Stefan Weil
f7b61891bc
Replace macro PI by macro M_PI
...
One definition for pi is sufficient.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 21:26:53 +02:00
Stefan Weil
b57afc7c78
Replace Efopen by fopen and remove efio.cpp, efio.h
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 17:46:28 +02:00
Stefan Weil
faae87beaa
Replace FLOAT32 by float data type
...
On most systems float is the IEEE 754 single-precision binary
floating-point format (32 bits). Tesseract does not support other systems.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 13:29:39 +02:00
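Note: the assumption stated above (float is the 32-bit IEEE 754 format) can be checked at compile time. The following is a sketch of such a check, not something the commit itself adds:

```cpp
#include <climits>
#include <limits>

// Fail the build on platforms where float is not 32-bit IEEE 754.
static_assert(sizeof(float) * CHAR_BIT == 32, "float must be 32 bits wide");
static_assert(std::numeric_limits<float>::is_iec559,
              "float must be an IEEE 754 type");
```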
Stefan Weil
1371980f9f
Replace string.h by standard C++ cstring
...
Remove the unneeded include statement in platform.h.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 20:40:26 +02:00
Stefan Weil
112aeb9826
Clean usage of assert.h
...
Remove unneeded include statements, remove conditional statements and
replace the remaining assert.h by their standard C++ variant cassert.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 19:31:05 +02:00
Stefan Weil
a9e2574eff
Remove public API file ndminx.h
...
It is not needed for the Tesseract code, and the Tesseract API
should not provide MIN / MAX macros.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 08:33:30 +02:00
Stefan Weil
44450094c3
Replace ASSERT_HOST in genericvector.h
...
genericvector.h used a mix of assert and ASSERT_HOST.
By using assert only, it no longer depends on errcode.h
which defines the ASSERT_HOST macro.
Other files which still use ASSERT_HOST now need an explicit
include statement for errcode.h.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 22:32:17 +02:00
Shreeshrii
a27e91c4f9
Update tesstrain_utils.sh
2018-06-11 09:35:14 +05:30
Shreeshrii
fdc243b363
Change default width for images output by text2image
...
Fixes
Image too large to learn!! Size = 2594x48
Image not trainable
See https://github.com/tesseract-ocr/tesseract/issues/590#issuecomment-271244655
for related discussion
2018-06-11 09:34:07 +05:30
Stefan Weil
0215d91f45
training: Add missing linefeed to error message
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-06 21:32:16 +02:00
Stefan Weil
4f3b266efe
src/training: Replace more proprietary BOOL8 by standard bool data type
...
Update also callers of the modified functions to use
false / true instead of 0 / 1.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-04 16:08:03 +02:00
Stefan Weil
b292013bdc
cntraining: Replace proprietary BOOL8 by standard bool data type
...
Add also "static" attribute to local functions and remove an old comment.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-04 16:08:03 +02:00
Stefan Weil
f2698c256d
src/training: Replace proprietary BOOL8 by standard bool data type
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-03 21:13:40 +02:00
Stefan Weil
509a6f0ce0
Fix some typos (most found by codespell)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-27 18:49:43 +02:00
Alexander Zaitsev
6049225d01
Merge remote-tracking branch 'my_repo/small_fixes' into small_fixes
2018-05-20 18:48:30 +03:00
Alexander Zaitsev
d54d7486b4
Use std::max/std::min instead of MAX/MIN macros.
2018-05-20 17:49:48 +03:00
Alexander Zaitsev
14ae0b8727
Use std::max/std::min instead of MAX/MIN macros.
2018-05-20 16:18:07 +03:00
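Note: one common reason to prefer std::max/std::min over function-like macros is that a macro may evaluate an argument twice. The sketch below illustrates this with a hypothetical macro `MAX_MACRO` and a counter; it is not the macro definition that was removed.

```cpp
#include <algorithm>

// Hypothetical macro in the usual style; each argument may be expanded twice.
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

// Increments the counter and returns the new value (a visible side effect).
int NextId(int& counter) { return ++counter; }

// With the macro, NextId runs twice: once to compare, once to yield the value.
int MacroEvaluations() {
  int counter = 0;
  int result = MAX_MACRO(NextId(counter), 0);
  (void)result;
  return counter;
}

// With std::max, the argument is evaluated exactly once.
int FunctionEvaluations() {
  int counter = 0;
  int result = std::max(NextId(counter), 0);
  (void)result;
  return counter;
}
```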
Alexander Zaitsev
e7e8e20119
Remove the 'register' keyword (deprecated in C++11, removed in C++17).
2018-05-20 01:49:26 +03:00
Alexander Zaitsev
0697235bb2
Use using instead of typedef. Reason: https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rt-using
2018-05-20 01:31:03 +03:00
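Note: a short sketch of what the typedef-to-using change looks like. The alias names are hypothetical; the point is that `using` reads left to right and, unlike typedef, can also be templated.

```cpp
#include <map>
#include <string>
#include <type_traits>

// Old style: the new name is buried inside the declarator.
typedef int (*CompareFnOld)(const void*, const void*);

// New style: name on the left, type on the right.
using CompareFn = int (*)(const void*, const void*);

// Alias templates are only possible with using.
template <typename T>
using NameMap = std::map<std::string, T>;

// Both spellings name exactly the same type.
static_assert(std::is_same<CompareFnOld, CompareFn>::value,
              "typedef and using declare the same type");
```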
Alexander Zaitsev
0248c7ff9d
Rename all C-style headers (e.g. <stdio.h>) to C++ style (<cstdio>).
2018-05-20 00:52:04 +03:00
Shreeshrii
6c08ec02e4
Copy .box and .tif files along with .lstmf files from /tmp
2018-05-17 22:45:22 +05:30
Stefan Weil
932a108b4d
Revert "fixes #388 by using raw bytes utf8 encoding"
...
This reverts commit 941e1c4c84. It is no
longer needed since commit f54800f14b.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-07 06:06:42 +02:00
Stefan Weil
11609f9509
Fix CID 1386109 (Logically dead code)
...
The else statement is never executed.
Remove also an unused element from the names array
and add the "static" attribute.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-03 18:32:42 +02:00
Noah Metzger
a7d1402e5d
Fixed access to uninitialized variable
...
Coverity ID 1386084: the set_font method accessed resolution_ before it was initialized by the set_resolution method.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-05-02 16:11:35 +02:00
Stefan Weil
b87fc523ca
Fix CID 1386084 (Uninitialized scalar variable)
...
The set_font method used the uninitialized member variable resolution_.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-26 18:02:43 +02:00
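Note: the defect class behind CID 1386084 is a member variable that is read before any setter has assigned it. One way to rule this out is a default member initializer, sketched below with a hypothetical class `Renderer` and an assumed default of 300; this is not the actual fix applied to the renderer code.

```cpp
class Renderer {
 public:
  void set_resolution(int resolution) { resolution_ = resolution; }

  // Reads resolution_; safe even if set_resolution() was never called.
  int PointsToPixels(int points) const { return points * resolution_ / 72; }

 private:
  int resolution_ = 300;  // assumed default, always initialized
};
```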
Stefan Weil
4f9493c409
Partial fix for autotools configuration after source tree reorganisation
...
This should fix "make" and "make training".
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-25 21:33:28 +02:00
Stefan Weil
dabf3c299f
Fix file endings
...
Text files should end with an LF, but without additional empty lines.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-25 19:35:33 +02:00
Stefan Weil
9ceb0c6430
Fix line endings
...
Replace DOS line endings (CRLF) by standard (LF only).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-25 19:04:50 +02:00
Egor Pugin
104fe7931c
Move training to src.
2018-04-25 11:35:26 +03:00