356 Commits

Author SHA1 Message Date
Stefan Weil
cac116dd11 Replace more PointerVector by std::vector for src/training
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-19 12:27:48 +01:00
Stefan Weil
573e7d6bb9 Replace more GenericVector by std::vector
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-18 11:58:13 +01:00
Stefan Weil
576d8d6c63 Partially revert "Replace remaining GenericVector by std::vector for src/training"
This partially reverts commit 7df1cb0bab
which had broken lstm_squashed_test.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-18 10:59:07 +01:00
Stefan Weil
a847e0f9b5 Replace remaining GenericVector by std::vector for src/classify
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-18 08:57:36 +01:00
Stefan Weil
7df1cb0bab Replace remaining GenericVector by std::vector for src/training
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-18 08:57:36 +01:00
Stefan Weil
4d8e9dc659 Replace more GenericVector by std::vector for src/training
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-18 08:57:36 +01:00
Stefan Weil
37c9cf4940 Replace more GenericVector by std::vector for src/training
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-18 08:57:36 +01:00
Stefan Weil
a00e7bc2bb Replace more GenericVector by std::vector for src/training
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-18 08:57:35 +01:00
Stefan Weil
1609014525 Replace more GenericVector by std::vector for src/training
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-18 08:57:35 +01:00
Stefan Weil
cb207ce645 Replace more GenericVector by std::vector for src/training
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-18 08:57:35 +01:00
Stefan Weil
b0b6bbf019 Replace more GenericVector by std::vector for src/training
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-18 08:57:35 +01:00
Stefan Weil
699f727f3e Replace more GenericVector by std::vector for src/training
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-18 08:57:35 +01:00
Stefan Weil
9728bbc596 Replace more GenericVector by std::vector for src/training
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-17 20:28:04 +01:00
Stefan Weil
6fcbea3533 Replace more GenericVector by std::vector for src/classify
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-17 13:45:54 +01:00
Stefan Weil
576c09bf31 Replace remaining STRING by std::string in unittest
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-15 09:11:41 +01:00
Stefan Weil
0edd69eb10 Replace remaining STRING by std::string in src/training
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-15 09:11:41 +01:00
Stefan Weil
21cf7cf84e Replace remaining STRING by std::string in src/dict
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-15 09:11:41 +01:00
Stefan Weil
db9f963411 Replace remaining STRING by std::string in src/ccmain
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-15 09:11:41 +01:00
Egor Pugin
efd17e205a Replace typedef structs with structs.
typedef enums are left intact.
2021-03-15 09:47:04 +03:00
Egor Pugin
262f65a4d2 snprintf will add '\0' at the end itself.
2021-03-14 23:54:29 +03:00
Egor Pugin
26ceeef6c0 [training] Modernize.
2021-03-14 23:47:42 +03:00
Shree Devi Kumar
efe9ff611f Limit unicharset from training_text only to Indic languages
2021-03-14 17:58:57 +00:00
Shree Devi Kumar
a589ded25f Create unicharset from training text to avoid normalization errors
2021-03-14 16:39:00 +00:00
Stefan Weil
3b0759940c Replace more STRING by std::string
Remove STRING::add_str_int and STRING::add_str_double which are now unused.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-13 23:16:35 +01:00
Stefan Weil
c9f0da49ca Replace more STRING by std::string
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-13 21:15:52 +01:00
Stefan Weil
9cf5b9870d Replace more STRING by std::string
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-13 21:15:52 +01:00
Stefan Weil
51909d5a2e Replace more STRING by std::string
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-13 21:15:52 +01:00
Stefan Weil
d6495d9026 Replace STRING by std::string in src/lstm
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-13 21:15:51 +01:00
Egor Pugin
0eb7ba88bf [clang-format] Execute clang-format on include and src dirs.
Script:
find include src -type f | sort > all.txt
find include src -type f | grep -v "\.cpp" | grep -v "\.h" | sort > skip.txt
comm -23 all.txt skip.txt | xargs clang-format -i
2021-03-12 22:35:02 +03:00
Egor Pugin
d36adf3d40 Replace STRING::truncate_at() with resize().
2021-03-10 14:40:28 +03:00
Stefan Weil
422452b9f4 Check for float errors when running tesseract and lstmtraining
Some invalid floating-point operations such as division by zero,
invalid values or overflow will now abort tesseract with an error
message.

For lstmtraining there is now a new parameter --debug_float to
enable the same kind of checks. It is currently disabled by default
because such errors still occur during training and would abort it.
This should be fixed in the future.

If tesseract also shows floating point errors which cannot be
fixed easily, a similar parameter to enable the checks can be
added there, too.

The new code requires the function feenableexcept which is only
available with the GNU libc, so it is only used on Linux.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-02-26 21:49:27 +01:00
Stefan Weil
51a214a51b Remove unused include statements for imagedata.h and document used ones
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-02-26 21:42:28 +01:00
Stefan Weil
373a3527ec Format code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-02-26 14:22:09 +01:00
Stefan Weil
7097dfd41c Replace GenericVector by std::vector for parameters
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-02-23 20:20:48 +01:00
Stefan Weil
bc69e28de3 Update include statements for external header file allheaders.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-02-13 10:17:20 +01:00
Stefan Weil
e6f15621c2 Remove Python training scripts which were moved to tesstrain
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-02-04 14:45:19 +01:00
Shree Devi Kumar
40f3c8d104 Change LATIN_FONTS to use replacement fonts from TeX Gyre collection
2021-02-04 13:51:03 +01:00
Stefan Weil
4902e68682 cmake: Use pkg_config to find required libraries
This is needed for cmake builds on macOS (Intel and Amd64) with Homebrew.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-31 17:23:06 +01:00
Stefan Weil
139d127ff7 Remove unneeded include statement for genericvector.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-23 17:29:57 +01:00
Stefan Weil
5a3d6e5e0d Fix memory leak in mastertrainer_test (fixes issue #3215)
The issue was introduced in commit 6e9456415.

Partially reverting this commit fixes it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-23 14:54:38 +01:00
Stefan Weil
e3fd938bca lstmtrainer: Modernize code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-22 08:17:19 +01:00
Stefan Weil
0cdaab5ac9 lstmtrainer: Remove unused local variable
This fixes a compiler warning:
    src/training/unicharset/lstmtrainer.cpp:107:15: warning:
      unused variable 'shape' [-Wunused-variable]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-22 08:13:38 +01:00
Stefan Weil
3d47e0a91a Replace GenericVector by std::vector in LoadFileLinesToStrings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-22 08:13:38 +01:00
Stefan Weil
5d44a8216f Show names of failing lstmf files in error messages
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-20 13:36:59 +01:00
Stefan Weil
c7baf8f17d Add more information shown by combine_tessdata -l
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-15 18:49:51 +01:00
Stefan Weil
3195c8f75f Add new option -l for combine_tessdata to list the network string
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-15 18:49:51 +01:00
Stefan Weil
73ffcabfe9 lstmtraining: Interpret a negative value for --max_iterations as epochs
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-14 19:51:58 +01:00
Stefan Weil
80810218f7 Use explicit int32_t for serialized data type
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-14 18:06:39 +01:00
Stefan Weil
9b15e65900 Replace resize(0) by clear() for std::vector
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-12 19:24:54 +01:00
Shree Devi Kumar
5104af6a15 Remove --psm 6 for lstm.train in tesstrain.py
2021-01-12 13:26:33 +01:00