Commit Graph

5990 Commits

Author SHA1 Message Date
Stefan Weil
6bf5080d4c Remove unused include statements for strngs.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-12 23:11:08 +01:00
Egor Pugin
11a55c6c79 [readme] Require C++17 for building. 2021-03-13 00:56:40 +03:00
Egor Pugin
a393df5038 Add missing export header. 2021-03-13 00:07:19 +03:00
Egor Pugin
2d10be5209 [clang-format] Format generated protobuf source. 2021-03-13 00:07:03 +03:00
Egor Pugin
1d5b083447 [clang-format] Format unit tests. 2021-03-13 00:06:34 +03:00
Egor Pugin
618b185d14 Include missing config_auto.h 2021-03-12 23:39:18 +03:00
Egor Pugin
8b0c5405e2 Add missing forward decl. 2021-03-12 22:35:30 +03:00
Egor Pugin
0eb7ba88bf [clang-format] Execute clang format on include and src dirs.
Script:
find include src -type f | sort > all.txt
find include src -type f | grep -v "\.cpp" | grep -v "\.h" | sort > skip.txt
comm -23 all.txt skip.txt | xargs clang-format -i
2021-03-12 22:35:02 +03:00
Egor Pugin
afa476bc23 [clang-format] Update config. 2021-03-12 22:33:22 +03:00
Egor Pugin
0e9deb68c9 Revert "Format public API files with 'clang-format-11 -i include/tesseract/*.h'"
This reverts commit c20da5e10f.
2021-03-12 20:20:34 +03:00
Stefan Weil
c20da5e10f Format public API files with 'clang-format-11 -i include/tesseract/*.h'
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-12 13:26:38 +01:00
Stefan Weil
b68a2a7b47 Fix tatweel_test for C++-20
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-12 13:16:48 +01:00
Stefan Weil
4c6cc5a04d Replace GenericVector by std::vector in class ImageData
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-12 13:10:25 +01:00
Egor Pugin
520aeb34aa Merge pull request #3323 from Shreeshrii/ci
Actions CI: Add vcpkg build for tesseract 4.1 (windows and linux)
2021-03-12 11:51:44 +03:00
Shree
33c129f50f Actions CI: comment #push 2021-03-12 05:02:55 +00:00
Shree
edf6e0f433 Actions CI: Add vcpkg build for tesseract 4.1 2021-03-12 04:59:41 +00:00
Stefan Weil
fc00834920 autobuild: Require C++17
This completes commit 73a325494e.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-11 21:57:02 +01:00
Ger Hobbelt
779aa79350 Fix build (#3322)
* fix errors after merge commit: missing changes that are needed too to make this codebase compile.
* Update src/wordrec/wordrec.h

Co-authored-by: Stefan Weil <sw@weilnetz.de>
2021-03-11 21:43:07 +01:00
Egor Pugin
3444618075 Fix linux build. 2021-03-10 15:35:13 +03:00
Egor Pugin
ce058604ba Pass empty strings into Tesseract::init_tesseract(). 2021-03-10 15:21:03 +03:00
Egor Pugin
911dd93f12 Pass init strings as std::string instead of const char * internally. This does not affect public APIs. 2021-03-10 15:17:00 +03:00
Egor Pugin
9792f3c4ff Remove STRING::size() method. 2021-03-10 14:58:37 +03:00
Egor Pugin
6de97309a1 Remove unused STRING::strdup(). 2021-03-10 14:42:50 +03:00
Egor Pugin
f0e30a2af2 Remove unused STRING::unsigned_size(). 2021-03-10 14:41:31 +03:00
Egor Pugin
d36adf3d40 Replace STRING::truncate_at() with resize(). 2021-03-10 14:40:28 +03:00
Egor Pugin
e9a2fc0083 More std::string replacements. 2021-03-10 14:36:59 +03:00
Egor Pugin
73a325494e [cmake] Require C++17. 2021-03-10 00:41:47 +03:00
Stefan Weil
0f1296c6f6 Clean implementation for (de-)serialization of a vector
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-08 13:33:48 +01:00
Egor Pugin
0cd6a07e42 Update .travis.yml 2021-03-08 03:02:25 +03:00
Stefan Weil
6cfe604d58 Fix serialization for vector of RecodedCharID
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-07 23:01:25 +01:00
Shreeshrii
33868a52ae Travis: build linux matrix (#3320) 2021-03-07 19:31:02 +01:00
Egor Pugin
576c064b44 Merge pull request #3318 from Shreeshrii/travis
Add multiple architectures for travis run
2021-03-06 12:20:25 +03:00
Shree Devi Kumar
4fd0bca6c9 Add multiple architectures for travis run 2021-03-06 08:30:14 +00:00
Stefan Weil
0cde3ede98 Add heuristic to fix swap (partially fixes issue #2586)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-05 14:27:28 +01:00
Stefan Weil
a2769aebb4 Replace GenericVector<TBOX> by std::vector<TBOX>
Fix also endianness handling for (de)serialisation of TBOX.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-05 14:27:28 +01:00
Stefan Weil
c31c1a7d60 Fix two compiler warnings for serialis.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-05 14:27:28 +01:00
Stefan Weil
fe614c6069 Enable less FP exceptions for clang compiler when running tesseract
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-03 22:56:07 +01:00
Egor Pugin
c39b1daa6b GenericVector -> std::vector. 2021-03-03 22:22:00 +03:00
Egor Pugin
0a693a9519 Allow to serialize std vectors with classes from TFile. Implementation from GenericVector. 2021-03-03 22:21:40 +03:00
Stefan Weil
ff830775f9 Fix memory leak in DocumentCache
It was introduced in commit 5cac52173e.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-01 11:31:48 +01:00
Stefan Weil
339c01894e Avoid fp division by 0 (fix issue #3314)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-02-28 19:42:01 +01:00
Egor Pugin
838a754d24 Merge pull request #3313 from stweil/learning_rate
Add new checks for floating point errors and fix a division by zero
2021-02-27 23:20:09 +03:00
Stefan Weil
cd60728e8a Avoid float division by zero when calculating adaptive learning rate
The following line results in a division by zero when
momentum is -1 and num_samples is even:

     learning_rate /= 1.0f - pow(momentum, num_samples);

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-02-27 21:08:41 +01:00
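Note on cd60728e8a: the failure mode described in the commit message can be reproduced in isolation. With momentum equal to -1 and an even num_samples, pow(momentum, num_samples) is exactly 1, so the denominator becomes 0. The sketch below is only an illustration of one possible guard. The function name CorrectedLearningRate is hypothetical, and leaving the rate unchanged when the denominator is zero is an assumption, not necessarily what the commit does.

```cpp
#include <cmath>

// Hypothetical helper for illustration; not taken from the Tesseract sources.
// Computes learning_rate / (1 - momentum^num_samples), the expression quoted
// in the commit message, but skips the division when the denominator is zero.
inline float CorrectedLearningRate(float learning_rate, float momentum,
                                   int num_samples) {
  const float denominator =
      1.0f - std::pow(momentum, static_cast<float>(num_samples));
  if (denominator == 0.0f) {
    // Assumed fallback: keep the uncorrected rate instead of dividing by zero.
    return learning_rate;
  }
  return learning_rate / denominator;
}
```

Calling CorrectedLearningRate(0.1f, -1.0f, 2) takes the guarded branch and returns the rate unchanged, while a momentum such as 0.5f goes through the normal division.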
Stefan Weil
c12dde2862 Use float instead of double for learning_rate, momentum and adam_beta
Only WeightMatrix::Update used double parameters, all other functions
already used float. So this change avoids unnecessary conversions.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-02-27 21:08:41 +01:00
Stefan Weil
422452b9f4 Check for float errors when running tesseract and lstmtraining
Some illegal floating point calculations like division by zero,
illegal value or overflow will now abort tesseract with an error
message.

For lstmtraining there is now a new parameter --debug_float to
enable the same kind of checks. It is currently disabled by default
because such errors occur and would abort the training process.
That should be fixed in the future.

If tesseract also shows floating point errors which cannot be
fixed easily, a similar parameter to enable the checks can be
added there, too.

The new code requires the function feenableexcept which is only
available with the GNU libc, so it is only used on Linux.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-02-26 21:49:27 +01:00
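Note on 422452b9f4: the commit message names feenableexcept, a GNU libc extension declared in <fenv.h>, as the mechanism for turning floating point errors into aborts. A minimal sketch of that technique follows. The function name EnableFloatTraps is hypothetical, and the choice of FE_DIVBYZERO, FE_INVALID and FE_OVERFLOW is an assumption based on the three error kinds listed in the message; the actual Tesseract code may differ.

```cpp
#include <cfenv>

// Hypothetical helper for illustration; not taken from the Tesseract sources.
// Asks the FPU to raise SIGFPE on division by zero, invalid operations and
// overflow. Returns true if trapping was enabled, false where the GNU
// extension feenableexcept is not available.
inline bool EnableFloatTraps() {
#if defined(__GLIBC__) && defined(_GNU_SOURCE)
  // feenableexcept returns the previously enabled mask, or -1 on failure.
  return feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW) != -1;
#else
  return false;
#endif
}
```

After a successful call, an expression such as 1.0f / 0.0f evaluated at run time terminates the process with SIGFPE instead of silently producing infinity, which matches the abort behaviour the commit describes.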
Stefan Weil
51a214a51b Remove unused include statements for imagedata.h and document used ones
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-02-26 21:42:28 +01:00
Stefan Weil
1d7a981203 Disable code for unused classes WordFeature and FloatWordFeature
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-02-26 21:42:17 +01:00
Stefan Weil
5cac52173e Replace PointerVector by std::vector in class DocumentCache
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-02-26 21:42:07 +01:00
Stefan Weil
387acd9881 Initialize weight matrix with 0.0 (fix issue #3229)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-02-26 18:49:39 +01:00
Egor Pugin
1ab6b0fbc6 Merge pull request #3311 from stweil/master
Replace calls of exit function
2021-02-26 17:43:53 +03:00