Commit Graph

5817 Commits

Author SHA1 Message Date
Stefan Weil
981c167f8c Improve result message from lstmeval
Old message:

    At iteration 0, stage 0, BCER eval=2.553356, BWER eval=5.586173

New message:

    BCER eval=2.553356, BWER eval=5.586173

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-17 09:02:49 +01:00
Stefan Weil
c716ebdc42 Improve training messages (issue #3560) (#3644)
The old messages could wrongly be interpreted as CER / WER values,
but Tesseract training currently uses simple bag of characters /
bag of words error rates (see LSTMTrainer::ComputeCharError,
LSTMTrainer::ComputeWordError).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-17 09:39:23 +02:00
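
The difference between a bag-of-characters rate and a true CER, which commit c716ebdc42 refers to, can be illustrated with a minimal sketch. It is not the code of LSTMTrainer::ComputeCharError; the helper name `BagOfCharsErrorRate` and the use of plain `std::string` input are assumptions made for illustration only.

```cpp
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical helper (not part of Tesseract): bag-of-characters error rate
// between a ground truth string and an OCR result. Character order is
// ignored; only the per-character counts are compared.
double BagOfCharsErrorRate(const std::string &truth, const std::string &ocr) {
  std::map<char, int> counts;
  for (char ch : truth) {
    ++counts[ch];  // characters expected by the ground truth
  }
  for (char ch : ocr) {
    --counts[ch];  // characters produced by OCR
  }
  // Sum of missing and surplus characters over all symbols.
  int errors = 0;
  for (const auto &entry : counts) {
    errors += std::abs(entry.second);
  }
  if (truth.empty()) {
    // Avoid division by zero for an empty ground truth.
    return errors == 0 ? 0.0 : 1.0;
  }
  return static_cast<double>(errors) / truth.size();
}
```

With this sketch, `BagOfCharsErrorRate("abc", "cba")` is 0 although the two strings differ, whereas an edit-distance based CER would be non-zero. This is why such values should not be read as CER / WER.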
Stefan Weil
ef3bf98cc1 lstmtrainer: Fix comment
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-15 20:19:54 +01:00
Stefan Weil
83ad8a18de Clean code with clang-tidy (performance-move-const)
Command used:

    clang-tidy --checks="-*,performance-move-const-arg"

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-15 20:18:29 +01:00
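
For context on commit 83ad8a18de: the clang-tidy check performance-move-const-arg reports, among other things, `std::move` calls on const objects, where the call has no effect and the object is copied anyway. The following sketch shows that pattern and a cleaned-up form. Both functions are hypothetical examples and are not taken from the Tesseract sources; the actual changes made by the commit are not shown in this log.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical example. `name` is a const reference, so std::move(name)
// yields a const rvalue which cannot bind to the move overload of push_back.
// The string is copied, and the std::move is misleading.
void AppendWithUselessMove(std::vector<std::string> &out, const std::string &name) {
  out.push_back(std::move(name));  // reported by performance-move-const-arg
}

// Same behavior without the misleading std::move.
void AppendCopy(std::vector<std::string> &out, const std::string &name) {
  out.push_back(name);
}
```

Both variants leave the caller's string unchanged, so removing the `std::move` does not alter behavior. It only makes the copy explicit.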
Stefan Weil
f48620fffb scrollview: Add const attributes
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-15 20:17:59 +01:00
Stefan Weil
66dc90bc5f Create new release 5.0.0-rc2
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-14 20:04:23 +01:00
Stefan Weil
f0b8c0254b stepblob: Fix some warnings from clang-tidy
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-14 16:40:38 +01:00
Stefan Weil
25cdca6492 combine_tessdata: Print "Version:" instead of "Version string:"
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-14 16:38:52 +01:00
Stefan Weil
d8d63fd71b Optimize performance with clang-tidy
The code was partially formatted with clang-format and optimized with

    clang-tidy --checks="-*,perfor*" --fix src/*/*.cpp

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-14 15:54:04 +01:00
Stefan Weil
e5011c545a Remove unused function ScrollView::AwaitEventAnyWindow
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-14 12:10:37 +01:00
Stefan Weil
37b33749da ScrollView: Fix memory leak and modernize code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-14 10:34:20 +01:00
Stefan Weil
371ee2232e Remove spaces at line endings and empty last lines
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-13 22:45:47 +01:00
Stefan Weil
e18826cfab Fix some compiler warnings and modernize code in class TrainingSampleSet
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-13 22:33:22 +01:00
Stefan Weil
6360e60877 Modernize code in TessBaseAPI::Init
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-13 21:43:46 +01:00
Stefan Weil
03f2cfdf02 Show tessdata directory when listing models
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-13 21:43:01 +01:00
Stefan Weil
c2ee0cd06f Fix listing of languages
The last fix for OCR with more than one model introduced
a regression for `tesseract --list-langs`.

Fixes: 9091055783 ("Fix loading of additional model files")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-13 21:34:29 +01:00
Stefan Weil
ebce8ab2eb combine_tessdata: Support -dl and -ld options
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-13 11:33:10 +01:00
Stefan Weil
905795041f Fix new GitHub action CIFuzz
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-13 09:56:26 +01:00
Stefan Weil
3378d79ae6 Add new GitHub action CIFuzz
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-13 09:42:04 +01:00
Stefan Weil
5884036ecd Don't use compiler flags -march=native -mtune=native in autoconf builds
Using those flags is not acceptable for Linux distributions
because the resulting code then depends on the build
infrastructure, so the build result is not deterministic.

It is still possible to use those compiler flags by specifying
CXXFLAGS.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-11 12:29:51 +01:00
Stefan Weil
9091055783 Fix loading of additional model files (issue #3635)
Also modernize a for loop statement.

Fixes: d6de055acf ("Set default language for tesseract only if required")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-10 20:34:06 +01:00
Amit D
827900675b Don't add a page separator for a single page image (#3632)
This change was requested in issue #3628.
2021-11-08 20:49:49 +01:00
Stefan Weil
2fbe4f54bb Fix out-of-memory in fuzzer-api (oss-fuzz issue #39185)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-07 13:49:30 +01:00
Stefan Weil
183bb3f519 Use TDimension for arguments of make_edgept
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-06 10:01:22 +01:00
Stefan Weil
6c7cfe41cc Remove some unneeded type casts
Those type casts were also wrong for large image support.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-06 10:01:22 +01:00
Amit D
4469053a9b Update unittest-disablelegacy.yml
2021-11-05 14:06:46 +02:00
Amit D
8865fefdba Improve the disable legacy build (#3627)
Undo API changes done in e9b8b840bf.
2021-11-04 18:26:15 +02:00
Amit D
49715f4d27 pagesegmode_test.cc: Disable some code for disable legacy build (#3626)
Co-authored-by: Shree Devi Kumar <5095331+Shreeshrii@users.noreply.github.com>
Co-authored-by: Stefan Weil <sw@weilnetz.de>
2021-11-04 12:49:32 +01:00
Amit D
e9b8b840bf Improve the disable legacy build (#3624)
Disable more code related to equation detection and osd.
2021-11-03 19:15:15 +01:00
Amit D
5da09f241c README: Remove the reference to version 3.05.02
Versions 4.1.1 and 5.0.0 still support the legacy engine with the same functionality as 3.05.02, so there is no reason to mention 3.05.02.
2021-11-03 17:53:13 +01:00
Stefan Weil
62bfbf5aa4 Use bool instead of int8_t for boolean variable
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-03 11:22:14 +01:00
Stefan Weil
333f7bfc5c Use bool instead of int for boolean variable
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-03 11:02:30 +01:00
Stefan Weil
87a5689f8d Format code with clang-format
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-03 10:57:40 +01:00
Stefan Weil
a91ea10924 Optimize function ApproximateOutline
The compiler can now inline several functions which are
only used in this compilation unit.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-03 10:53:35 +01:00
Amit D
b77009bd59 configure.ac: Update minimum required autoconf version to 2.69
This version was released in April 2012.

It is supported by old Linux distros like RHEL/CentOS 7, SLES 12 and Ubuntu 14.04.
2021-11-02 15:49:46 +02:00
Stefan Weil
17e795aaae Add missing include statement for INT_MIN, INT_MAX
Fixes: c6b25f3b6e ("Add assertions in IntCastRounded")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-02 10:20:37 +01:00
Stefan Weil
c6b25f3b6e Add assertions in IntCastRounded
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=39185 could be
caused by an integer overflow in IntCastRounded.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-02 07:52:31 +01:00
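
The kind of range assertion described in commit c6b25f3b6e can be sketched as follows. This is an illustration under assumptions, not the Tesseract implementation: the name `RoundToInt` and the exact set of checks are hypothetical. `INT_MIN` and `INT_MAX` come from `<climits>`, the kind of include which the follow-up commit 17e795aaae adds.

```cpp
#include <cassert>
#include <climits>
#include <cmath>

// Hypothetical helper (not the Tesseract function): round a double to the
// nearest int, with assertions which catch values that would overflow int.
inline int RoundToInt(double x) {
  assert(std::isfinite(x));  // reject NaN and infinities
  assert(x < INT_MAX);       // casting a larger value would overflow
  assert(x > INT_MIN);
  // Round half away from zero.
  return x >= 0.0 ? static_cast<int>(x + 0.5) : -static_cast<int>(-x + 0.5);
}
```

The assertions are only active in builds without `NDEBUG`, so they help to locate an overflow in debug and fuzzing builds but do not protect release builds.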
Stefan Weil
565d3912c6 Fix compiler warnings with -Wformat-security
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-01 22:58:56 +01:00
Stefan Weil
7058bbf282 Move googletest to unittest/third_party/googletest
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-01 11:50:50 +01:00
Stefan Weil
a5f2f90c8d Fix legacy build
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-01 08:34:34 +01:00
Egor Pugin
1258386e72 Merge pull request #3619 from stweil/move_tesseractmain
Move src/api/tesseractmain.cpp to src/tesseract.cpp
2021-11-01 01:55:52 +03:00
Stefan Weil
104ef8f30e Move src/api/tesseractmain.cpp to src/tesseract.cpp
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-31 21:43:30 +01:00
Stefan Weil
c0b529f2e1 Move declaration of ThresholdMethod from public API to thresholder.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-30 20:15:25 +02:00
Stefan Weil
97cd07f2a0 Add format attributes
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-30 19:55:27 +02:00
Stefan Weil
68017dbf2a lstmtraining: Handle missing traineddata with error message (fix issue #1075)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-30 12:27:35 +02:00
Stefan Weil
2a66694754 Format API headers with clang-format
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-30 10:00:27 +02:00
Stefan Weil
ca9ea78494 Format code with clang-format
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-30 09:42:41 +02:00
Stefan Weil
57af712f2f Fix some compiler warnings for unused parameters
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-30 09:39:05 +02:00
Stefan Weil
20203de8d9 Fix format strings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-30 09:37:30 +02:00
Stefan Weil
8b6390846e Create new release 5.0.0-rc1
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-29 22:32:11 +02:00