Commit Graph

4078 Commits

Author SHA1 Message Date
bact
cd49b61c67 Fix Thai valid text and add Thai illegal sequences
- Fix an invalid sequence in "valid text" `kScriptText`
- Add two illegal sequences in `kBadlyFormedThaiWords`
2019-06-16 18:23:14 +02:00
Bharat123rox
40216e5a5e Fix bug in max_max_dist 2019-06-16 18:23:01 +02:00
Bharat123rox
a324c88563 Fix LGTM and revert bugfix for later PR 2019-06-16 18:22:50 +02:00
Bharat123rox
5f084891be Fix syntax error 2019-06-16 18:22:34 +02:00
Stefan Weil
26f05b1197 Remove SavePixForCrash and related code
That debugging code uses a lot of memory and is no longer useful.

    text	   data	    bss	    dec	    hex	filename
     815	      0	 262144	 262959	  4032f	src/ccutil/globaloc.o

Also remove the function err_exit, which was only used in ccmain/reject.cpp.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-16 18:22:21 +02:00
Stefan Weil
7b4b330176 tprintf: Make code reentrant and use less memory
Reduce the maximum message size from 64 KiB to 2 KiB, which should still
be large enough for trace messages.

Create the smaller message on the stack instead of using a global
array to allow reentrancy and to reduce the memory use of Tesseract.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-16 18:22:03 +02:00
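The idea behind commit 7b4b330176 can be sketched as follows. This is a minimal illustration, not the actual tprintf code: the names FormatTrace and kMaxMessageSize are hypothetical, and the function returns the message instead of printing it so that the result is easy to check.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical constant: 2 KiB, the new limit named in the commit message.
constexpr int kMaxMessageSize = 2048;

// Hypothetical stand-in for a trace function. The buffer lives on the
// stack of each call, so concurrent or nested calls do not share state.
std::string FormatTrace(const char* format, ...) {
  assert(format != nullptr);
  char msg[kMaxMessageSize];
  va_list args;
  va_start(args, format);
  // vsnprintf never writes more than sizeof(msg) bytes and always
  // NUL-terminates, so longer messages are truncated, not overflowed.
  std::vsnprintf(msg, sizeof(msg), format, args);
  va_end(args);
  return std::string(msg);
}
```

A call such as FormatTrace("%d-%s", 7, "x") yields "7-x". A message longer than the buffer is cut to 2047 characters plus the terminating NUL.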
Stefan Weil
ecd0384a31 configure: Use a hopefully more robust way to fix AX_CHECK_COMPILE_FLAG
The check for -Wno-extra-semi-stmt failed on Linux with clang++-7.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-16 18:21:50 +02:00
Bharat123rox
19274eebbb Some LGTM fixes and potential bugfixes 2019-06-16 18:21:36 +02:00
Stefan Weil
0190f398bd Remove local definition of M_PI
It is defined for all platforms when math.h or cmath is included
after defining the macro _USE_MATH_DEFINES.

Define _USE_MATH_DEFINES before any include statement to make sure
that M_PI gets defined. It is not necessary to define it conditionally
only for Windows.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-16 18:21:16 +02:00
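For commit 0190f398bd, the include order described in the message looks roughly like this. It is only a sketch: DegreesToRadians is a hypothetical helper and is not taken from the Tesseract sources.

```cpp
// The macro must be defined before the first include, because other
// headers may pull in <cmath> or math.h themselves.
#define _USE_MATH_DEFINES
#include <cassert>
#include <cmath>

// Hypothetical helper that relies on M_PI being available.
double DegreesToRadians(double degrees) {
  assert(std::isfinite(degrees));
  return degrees * M_PI / 180.0;
}
```

Defining the macro unconditionally is harmless on platforms that provide M_PI anyway, which is why no Windows-only guard is needed.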
Stefan Weil
0c70c2a69f configure: Fix for clang++-8 and newer
AX_CHECK_COMPILE_FLAG fails if it is used with -Werror and the compiler
raises error -Wextra-semi-stmt:

    configure:4224: checking whether C++ compiler accepts -mavx
    configure:4243: clang++-8 -c -g -O2 -Wall -Wextra -Wpedantic -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -march=native -Werror -Wno-unused-macros -mavx  conftest.cpp >&5
    conftest.cpp:20:3: error: empty expression statement has no effect; remove unnecessary ';' to silence this warning [-Werror,-Wextra-semi-stmt]
      ;
      ^
    1 error generated.

Add -Wno-extra-semi-stmt to disable those errors if possible.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-16 18:21:04 +02:00
Stefan Weil
2b75680e97 Fix compiler warnings
This fixes lots of warnings related to ERRCODE like the following one:

    src/ccutil/errcode.h:81:15: warning:
      declaration requires a global constructor [-Wglobal-constructors]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-16 18:20:32 +02:00
zdenop
315c9ad898
Merge pull request #2493 from jbarlow83/4.1-fix-text2image
[4.1] Fix text2image compilation on C++17 compilers
2019-06-14 20:44:18 +02:00
James R. Barlow
caee962da8 Fix text2image compilation on C++17 compilers
C++17 drops support for `std::random_shuffle`, breaking builds with C++17
compilers that try to compile text2image.cpp. std::shuffle is valid on C++11
through C++17, so use std::shuffle instead.

Due to the use of `std::random_shuffle`, `text2image --render_ngrams`
would not give consistent results for different compilers or platforms.
With the current change, the same random number generator is used for
all platforms and initialized to the same seed, so training output
should be consistent.
2019-06-14 01:26:28 -07:00
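The replacement described in commit caee962da8 can be sketched like this. The helper name ShuffleNgrams, the choice of std::mt19937, and the seed parameter are assumptions for illustration; the commit message does not say which engine or seed text2image uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: shuffles a copy of the n-grams with an explicitly
// seeded engine instead of the unspecified source used by
// std::random_shuffle.
std::vector<std::string> ShuffleNgrams(std::vector<std::string> ngrams,
                                       unsigned seed) {
  const std::size_t count = ngrams.size();
  std::mt19937 rng(seed);  // assumed engine; its output sequence is fixed by the standard
  std::shuffle(ngrams.begin(), ngrams.end(), rng);
  assert(ngrams.size() == count);  // shuffling only reorders elements
  return ngrams;
}
```

With the same seed and the same standard library, two calls return the same order. Note that the C++ standard fixes the number sequence of std::mt19937 but not how std::shuffle consumes it, so identical permutations across different standard library implementations are not guaranteed by the standard itself.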
zdenop
fe06d3a0b3
Update VERSION 2019-05-29 20:20:27 +02:00
Stefan Weil
3452c8ebee Fix out-of-bounds writes in Classify::ReadNewCutoffs
The function did not correctly read Chinese unichars into the local
Class variable if the locale was set to de_DE.UTF-8 (or other
incompatible locales). That resulted in a wrong ClassId which was
used to write into the Cutoffs array without checking for valid bounds.

On macOS the result was a runtime error in baseapi_test (see GitHub
issue #1250):

    [ RUN      ] TesseractTest.InitConfigOnlyTest
    baseapi_test(21845,0x1134c45c0) malloc: *** error for object 0x927f96c28005e0: pointer being freed was not allocated
    baseapi_test(21845,0x1134c45c0) malloc: *** set a breakpoint in malloc_error_break to debug

Replacing sscanf with std::istringstream fixes that.
Also add an assertion to catch future out-of-bounds writes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-18 13:59:48 +02:00
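The two parts of the fix in commit 3452c8ebee, stream-based parsing plus a bounds assertion before the write, can be sketched as follows. StoreCutoff and its simplified "<class id> <cutoff>" line format are hypothetical; the real function reads unichars and maps them to class ids.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified version of reading one cutoff line.
// Returns false if the line cannot be parsed.
bool StoreCutoff(const std::string& line, std::vector<int>& cutoffs) {
  std::istringstream stream(line);
  // Assumption: imbue the classic locale so that parsing does not depend
  // on the global C++ locale either.
  stream.imbue(std::locale::classic());
  int class_id = 0;
  int cutoff = 0;
  if (!(stream >> class_id >> cutoff)) {
    return false;
  }
  // Catch a wrong class id before it is used as an array index.
  assert(class_id >= 0 &&
         static_cast<std::size_t>(class_id) < cutoffs.size());
  cutoffs[static_cast<std::size_t>(class_id)] = cutoff;
  return true;
}
```

The assertion turns a silent memory corruption into an immediate, diagnosable failure in debug builds.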
Stefan Weil
4763f24cbb stringrenderer_test: Get system locale only once
This fixes a runtime exception on macOS.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-18 13:59:07 +02:00
Stefan Weil
26c294940d Update abseil submodule to HEAD
Abseil recommends using the latest code:
https://abseil.io/about/philosophy#we-recommend-that-you-choose-to-live-at-head

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-18 13:58:49 +02:00
Stefan Weil
d81f6a07a1 oldlist: Fix comments
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-17 14:41:11 +02:00
Stefan Weil
d0a43101c3 Remove space at line endings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-17 14:36:45 +02:00
Stefan Weil
403cf31e9f Replace CR-LF line endings by LF
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-17 14:35:23 +02:00
Stefan Weil
289404815e Remove space at line endings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-17 14:35:08 +02:00
Nick White
c0c53c785f Add different classes to hocr output depending on BlockType
These classes are taken from the hOCR specification, and seem
to map well onto the BlockType types. There are probably more that
could be added.
2019-05-17 14:34:20 +02:00
Stefan Weil
8887cad12f Run more unittests with the user's locale
Hopefully this improves the test coverage.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:25:12 +02:00
Stefan Weil
ce6b0c024c Fix more build errors for compilation without legacy engine
Skip the tests which need the legacy code.
Also make those tests use the user's locale so that it is tested, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:24:54 +02:00
Stefan Weil
80ba28ef65 Fix linker error for baseapi_test when building without legacy engine
Linker error reported in issue #2439:

    unittest/baseapi_test.cc:190:
      undefined reference to
      `tesseract::TessBaseAPI::AdaptToWordStr(tesseract::PageSegMode, char const*)'

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:24:35 +02:00
Stefan Weil
96f6fc28b8 Remove assertions for unsupported locale settings
The latest code passed all unittests with locale de_DE.UTF-8
and has fixed the locale issues which were reported on GitHub.
Therefore the assertions can be removed.

Any remaining locale issue will be fixed when it is identified.
To help find such remaining issues, debug code now uses the
user's locale settings instead of the default "C" locale for all
executables which use TessBaseAPI.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:24:02 +02:00
Stefan Weil
fb926243bd Fix UNICHARSET::save_to_string for locale de_DE.UTF-8
That function writes float values which must always use '.' as the
decimal separator, no matter what the current locale setting is.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:23:40 +02:00
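One way to write float values with '.' as the decimal separator regardless of the locale, as commit fb926243bd requires, is sketched below. FloatToClassicString is a hypothetical helper; the commit message does not describe how UNICHARSET::save_to_string was actually changed.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: formats a value with the classic "C" locale.
std::string FloatToClassicString(double value) {
  std::ostringstream stream;
  // The classic locale always uses '.' and no digit grouping.
  stream.imbue(std::locale::classic());
  stream << value;
  const std::string text = stream.str();
  assert(text.find(',') == std::string::npos);
  return text;
}
```

By contrast, printf-style functions follow the C locale set with setlocale, so under de_DE.UTF-8 they can write "0,5" where the file format expects "0.5".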
Stefan Weil
728f5d937f Fix baseapi_test with locale de_DE.UTF-8
The unittest failed with LANG=de_DE.UTF-8:

    $ unittest/baseapi_test
    Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc
    [==========] Running 12 tests from 2 test suites.
    [----------] Global test environment set-up.
    [----------] 10 tests from TesseractTest
    [ RUN      ] TesseractTest.ArraySizeTest
    [       OK ] TesseractTest.ArraySizeTest (0 ms)
    [ RUN      ] TesseractTest.BasicTesseractTest
    [       OK ] TesseractTest.BasicTesseractTest (1251 ms)
    [ RUN      ] TesseractTest.IteratesParagraphsEvenIfNotDetected
    [       OK ] TesseractTest.IteratesParagraphsEvenIfNotDetected (347 ms)
    [ RUN      ] TesseractTest.HOCRWorksWithoutSetInputName
    [       OK ] TesseractTest.HOCRWorksWithoutSetInputName (403 ms)
    [ RUN      ] TesseractTest.HOCRContainsBaseline
    [       OK ] TesseractTest.HOCRContainsBaseline (389 ms)
    [ RUN      ] TesseractTest.RickSnyderNotFuckSnyder
    [       OK ] TesseractTest.RickSnyderNotFuckSnyder (346 ms)
    [ RUN      ] TesseractTest.AdaptToWordStrTest
    Trying to adapt "136
    " to "1 3 6"
    Trying to adapt "256
    " to "2 5 6"
    Trying to adapt "410
    " to "4 1 0"
    Trying to adapt "432
    " to "4 3 2"
    Trying to adapt "540
    " to "5 4 0"
    Trying to adapt "692
    " to "6 9 2"
    Trying to adapt "779
    " to "7 7 9"
    Trying to adapt "793
    " to "7 9 3"
    Trying to adapt "808
    " to "8 0 8"
    Trying to adapt "815
    " to "8 1 5"
    Trying to adapt "12
    " to "1 2"
    Trying to adapt "12
    " to "1 2"
    [       OK ] TesseractTest.AdaptToWordStrTest (788 ms)
    [ RUN      ] TesseractTest.BasicLSTMTest
    [       OK ] TesseractTest.BasicLSTMTest (4525 ms)
    [ RUN      ] TesseractTest.LSTMGeometryTest
    [       OK ] TesseractTest.LSTMGeometryTest (615 ms)
    [ RUN      ] TesseractTest.InitConfigOnlyTest
    Error: unichar ? in normproto file is not in unichar set.
    Error: unichar 0.232621 in normproto file is not in unichar set.
    Error: unichar 0.000400 in normproto file is not in unichar set.
    Error: unichar 0.231864 in normproto file is not in unichar set.
    [...]
    Error: unichar ? in normproto file is not in unichar set.
    Error: unichar 0.233915 in normproto file is not in unichar set.
    Error: unichar 0.000400 in normproto file is not in unichar set.
    Error: unichar 0.221755 in normproto file is not in unichar set.
    Error: unichar 0.000400 in normproto file is not in unichar set.
    Error: unichar ? in normproto file is not in unichar set.
    baseapi_test(21845,0x1134c45c0) malloc: *** error for object 0x927f96c28005e0: pointer being freed was not allocated
    baseapi_test(21845,0x1134c45c0) malloc: *** set a breakpoint in malloc_error_break to debug
    [INFO]  Lang eng took 327ms in regular init
    [INFO]  Lang chi_tra took 1422ms in regular init
    Abort trap: 6

TesseractTest.InitConfigOnlyTest is fixed by using std::istringstream
instead of sscanf.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:23:24 +02:00
Stefan Weil
c2444e75e4 Fix apiexample_test with locale de_DE.UTF-8
The unittest failed with LANG=de_DE.UTF-8:

    $ unittest/apiexample_test
    Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc
    [==========] Running 4 tests from 2 test suites.
    [----------] Global test environment set-up.
    [----------] 1 test from EuroText
    [ RUN      ] EuroText.FastLatinOCR
    contains_unichar_id(unichar_id):Error:Assert failed:in file ../../../../../src/ccutil/unicharset.h, line 874

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:21:01 +02:00
Stefan Weil
ab695f882d configure: Fix for latest developer tools on macOS
AX_CHECK_COMPILE_FLAG fails if it is used with -Werror and the compiler
raises error -Wunused-macros. Add -Wno-unused-macros to disable those
errors if possible.

Simplify also the setting of several conditionals (AVX, AVX2, ...).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:19:26 +02:00
Stefan Weil
bf74471113 Fix Doxygen comments for void functions
Void functions should not use @return. It causes compiler warnings
like this one:

    src/classify/intproto.cpp:326:5: warning:
      '@return' command used in a comment that is attached to a function
      returning void [-Wdocumentation]

Some non-void functions also were documented with @return none.
Fix those comments, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:19:01 +02:00
Stefan Weil
9bc576fafc normmatch: Remove unused private function
PrintNormMatch was unused. Remove it and remove also an unused prototype.
Make the only remaining private function NormEvidenceOf static.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:18:44 +02:00
Egor Pugin
6ed503bfaa Update sw build. 2019-05-16 20:18:24 +02:00
Stefan Weil
25f2e0cb10 Replace sscanf by std::istringstream
Using std::istringstream allows conversion of string to float
independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:18:06 +02:00
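The replacement in commit 25f2e0cb10 follows a pattern like the one below. ParseFloat is a hypothetical helper, and the explicit imbue call is an added assumption that also guards against a changed global C++ locale.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parses a float that uses '.' as decimal separator.
// Returns false if the text does not start with a number.
bool ParseFloat(const std::string& text, float* value) {
  assert(value != nullptr);
  std::istringstream stream(text);
  // sscanf honors the C locale set by setlocale; the stream does not.
  // Imbuing the classic locale (an assumption here) makes that explicit.
  stream.imbue(std::locale::classic());
  stream >> *value;
  return !stream.fail();
}
```

With sscanf under de_DE.UTF-8, the input "0.232621" stops at the '.', which matches the "unichar 0.232621 in normproto file is not in unichar set" errors shown in the baseapi_test log.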
Stefan Weil
8cc751136d Fix reading of parameter from traineddata normproto component
The NonEssential parameter was wrongly derived from linear_token instead
of essential_token and therefore always set to true.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:17:51 +02:00
Stefan Weil
73a08678dc Fix Doxygen comment
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:17:16 +02:00
Stefan Weil
70ffe33976 Fix cast from pointer to integer type
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:16:56 +02:00
Zdenko Podobný
a14ae450b9 cmake: uninstall target 2019-05-16 20:16:22 +02:00
zdenop
ee024e0209 cmake: fix build without pkg-config (issue #2424) 2019-05-16 20:16:00 +02:00
Zdenko Podobný
5320320b63 autotools: remove list of traineddata files 2019-05-08 15:42:20 +02:00
James R. Barlow
8ef392cb08 Fix CPPFLAGS configuration for icu4c and libarchive missing from configure.ac 2019-05-08 15:42:10 +02:00
zdenop
57bf215d14 ScrollView: remove custom implementation of GetAddrInfo 2019-05-05 20:03:50 +02:00
zdenop
9cd60b2b90 remove unused include 2019-05-05 20:03:50 +02:00
Stefan Weil
98be949f5d tesscallback: Remove more unused code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-05 20:03:50 +02:00
Stefan Weil
3ae4069411 tesscallback: Remove unused code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-05 20:03:50 +02:00
zdenop
c4bb4b933b correct tessdata comment in baseapi.h 2019-05-04 14:35:41 +02:00
Stefan Weil
78ed5ef8b9 universalambigs: Add missing include file
This allows fixing two compiler warnings from clang++:

    src/ccutil/universalambigs.cpp:23:19: warning: no previous extern declaration for non-static variable 'kUniversalAmbigsFile' [-Wmissing-variable-declarations]
    src/ccutil/universalambigs.cpp:19019:18: warning: no previous extern declaration for non-static variable 'ksizeofUniversalAmbigsFile' [-Wmissing-variable-declarations]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-04 14:35:41 +02:00
Stefan Weil
a8c8a96107 commandlineflags: Replace strtod by std::stringstream
Using std::stringstream allows conversion of string to double
independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-04 14:34:58 +02:00
Stefan Weil
8831cbfead paramsd: Replace strtod by std::stringstream
Using std::stringstream allows conversion of string to double
independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-04 14:34:58 +02:00
Stefan Weil
231da0064a clusttool: Replace strtof by std::stringstream
Using std::stringstream allows conversion of string to float
independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-04 14:34:58 +02:00