3931 Commits

Author SHA1 Message Date
Stefan Weil
09edd1a604 Fix out-of-bounds writes in Classify::ReadNewCutoffs
The function did not correctly read Chinese unichars into the local
Class variable if the locale was set to de_DE.UTF-8 (or other
incompatible locales). That resulted in a wrong ClassId which was
used to write into the Cutoffs array without checking for valid bounds.

On macOS the result was a runtime error in baseapi_test (see GitHub
issue #1250):

    [ RUN      ] TesseractTest.InitConfigOnlyTest
    baseapi_test(21845,0x1134c45c0) malloc: *** error for object 0x927f96c28005e0: pointer being freed was not allocated
    baseapi_test(21845,0x1134c45c0) malloc: *** set a breakpoint in malloc_error_break to debug

Replacing sscanf by std::istringstream fixes that.
Also add an assertion to catch future out-of-bounds writes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-18 13:39:55 +02:00
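The pattern described in this commit (parse with a stream instead of sscanf, then guard the array write) could look roughly like the sketch below. It is not the code from Classify::ReadNewCutoffs: the function name `ReadCutoffs`, the constant `kMaxClasses`, and the simplified "<class-id> <cutoff>" line format are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical table size; not Tesseract's actual limit.
constexpr std::size_t kMaxClasses = 8;

// Reads lines of the assumed form "<class-id> <cutoff>" into cutoffs.
// Returns false on a malformed line or an out-of-range class id.
bool ReadCutoffs(std::istream& in, std::vector<int>& cutoffs) {
  cutoffs.assign(kMaxClasses, 0);
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream stream(line);
    // Parse independently of the user's locale.
    stream.imbue(std::locale::classic());
    int class_id = -1;
    int cutoff = 0;
    if (!(stream >> class_id >> cutoff)) {
      return false;
    }
    // Reject bad ids instead of writing past the end of the array.
    if (class_id < 0 ||
        static_cast<std::size_t>(class_id) >= cutoffs.size()) {
      return false;
    }
    // Backstop that documents the invariant for future changes.
    assert(static_cast<std::size_t>(class_id) < cutoffs.size());
    cutoffs[static_cast<std::size_t>(class_id)] = cutoff;
  }
  return true;
}
```

The explicit range check handles bad input at run time, while the assertion only fires in debug builds if a later change removes or weakens that check.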
Stefan Weil
639781b5c8 stringrenderer_test: Get system locale only once
This fixes a runtime exception on macOS.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-18 13:24:13 +02:00
Stefan Weil
bb226c19ab Update abseil submodule to HEAD
Abseil suggests to use the latest code:
https://abseil.io/about/philosophy#we-recommend-that-you-choose-to-live-at-head

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-17 15:03:43 +02:00
zdenop
2308cbf87f
Merge pull request #2444 from zdenop/fix_travis
fix typo
2019-05-17 11:26:40 +02:00
zdenop
a54e345c9b fix typo 2019-05-17 11:19:07 +02:00
Zdenko Podobný
5282cdf7be another improvement for ca0be2fb72 2019-05-17 11:04:42 +02:00
Zdenko Podobný
e92a424efa try to fix ca0be2fb72 2019-05-17 10:51:06 +02:00
zdenop
af3dd1af06 Merge branch 'master' of https://github.com/tesseract-ocr/tesseract 2019-05-16 23:19:42 +02:00
zdenop
ca0be2fb72 cmake: fix travis build 2019-05-16 23:18:13 +02:00
Stefan Weil
68d7a679e4 Replace CR-LF line endings by LF
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:49:01 +02:00
Stefan Weil
cc754ed1e0 Remove space at line endings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:49:01 +02:00
zdenop
198bbe3df5
Merge pull request #2441 from stweil/linkfix
Fix unittest build without legacy code and use locale for most unittests
2019-05-16 19:12:15 +02:00
Stefan Weil
8e7b1119b5 Run more unittests with the user's locale
Hopefully this improves the test coverage.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 18:12:55 +02:00
Stefan Weil
59e31e958b Fix more build errors for compilation without legacy engine
Skip the tests which need the legacy code.
Also make those tests use the user's locale so that it is tested, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 18:12:55 +02:00
Stefan Weil
780986ebfb Fix linker error for baseapi_test when building without legacy engine
Linker error reported in issue #2439:

    unittest/baseapi_test.cc:190:
      undefined reference to
      `tesseract::TessBaseAPI::AdaptToWordStr(tesseract::PageSegMode, char const*)'

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 18:12:55 +02:00
zdenop
3864d0d088
Merge pull request #2440 from stweil/linkfix
Fix linker error for baseapi_test when building without legacy engine
2019-05-16 17:31:35 +02:00
Stefan Weil
f097b8a358 Fix linker error for baseapi_test when building without legacy engine
Linker error reported in issue #2439:

    unittest/baseapi_test.cc:190:
      undefined reference to
      `tesseract::TessBaseAPI::AdaptToWordStr(tesseract::PageSegMode, char const*)'

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 17:17:56 +02:00
zdenop
7e9d2f4bc4
Merge pull request #2432 from nickjwhite/hocrmoretypes
Add different classes to hocr output depending on BlockType
2019-05-16 17:02:48 +02:00
zdenop
b124a5f6ca
Merge pull request #2437 from stweil/locale-fix
Fix some unittests with locale de_DE.UTF-8
2019-05-16 17:02:02 +02:00
Stefan Weil
331cc84d8d Remove assertions for unsupported locale settings
The latest code passed all unittests with locale de_DE.UTF-8
and has fixed the locale issues which were reported on GitHub.
Therefore the assertions can be removed.

Any remaining locale issue will be fixed when it is identified.
To help find such remaining issues, debug code now uses the
user's locale settings instead of the default "C" locale for all
executables which use TessBaseAPI.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 13:59:39 +02:00
Stefan Weil
77f9bad3c2 Fix UNICHARSET::save_to_string for locale de_DE.UTF-8
That function writes float values which must always use '.' as the
decimal separator, no matter what the current locale setting is.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 11:39:59 +02:00
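One way to write floats with '.' regardless of the current locale is to imbue the output stream with the classic "C" locale. The helper below is a minimal sketch of that idea; `FormatFloat` is a hypothetical name and this is not the actual body of UNICHARSET::save_to_string.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Formats a float with '.' as the decimal separator, whatever the
// global locale happens to be (hypothetical helper for illustration).
std::string FormatFloat(float value) {
  std::ostringstream stream;
  // The classic locale always uses '.' and no digit grouping.
  stream.imbue(std::locale::classic());
  stream << value;
  return stream.str();
}
```

Because the locale is set on the stream itself, the result does not change if the program later calls setlocale or std::locale::global.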
Stefan Weil
36ed6da349 Fix baseapi_test with locale de_DE.UTF-8
The unittest failed with LANG=de_DE.UTF-8:

    $ unittest/baseapi_test
    Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc
    [==========] Running 12 tests from 2 test suites.
    [----------] Global test environment set-up.
    [----------] 10 tests from TesseractTest
    [ RUN      ] TesseractTest.ArraySizeTest
    [       OK ] TesseractTest.ArraySizeTest (0 ms)
    [ RUN      ] TesseractTest.BasicTesseractTest
    [       OK ] TesseractTest.BasicTesseractTest (1251 ms)
    [ RUN      ] TesseractTest.IteratesParagraphsEvenIfNotDetected
    [       OK ] TesseractTest.IteratesParagraphsEvenIfNotDetected (347 ms)
    [ RUN      ] TesseractTest.HOCRWorksWithoutSetInputName
    [       OK ] TesseractTest.HOCRWorksWithoutSetInputName (403 ms)
    [ RUN      ] TesseractTest.HOCRContainsBaseline
    [       OK ] TesseractTest.HOCRContainsBaseline (389 ms)
    [ RUN      ] TesseractTest.RickSnyderNotFuckSnyder
    [       OK ] TesseractTest.RickSnyderNotFuckSnyder (346 ms)
    [ RUN      ] TesseractTest.AdaptToWordStrTest
    Trying to adapt "136
    " to "1 3 6"
    Trying to adapt "256
    " to "2 5 6"
    Trying to adapt "410
    " to "4 1 0"
    Trying to adapt "432
    " to "4 3 2"
    Trying to adapt "540
    " to "5 4 0"
    Trying to adapt "692
    " to "6 9 2"
    Trying to adapt "779
    " to "7 7 9"
    Trying to adapt "793
    " to "7 9 3"
    Trying to adapt "808
    " to "8 0 8"
    Trying to adapt "815
    " to "8 1 5"
    Trying to adapt "12
    " to "1 2"
    Trying to adapt "12
    " to "1 2"
    [       OK ] TesseractTest.AdaptToWordStrTest (788 ms)
    [ RUN      ] TesseractTest.BasicLSTMTest
    [       OK ] TesseractTest.BasicLSTMTest (4525 ms)
    [ RUN      ] TesseractTest.LSTMGeometryTest
    [       OK ] TesseractTest.LSTMGeometryTest (615 ms)
    [ RUN      ] TesseractTest.InitConfigOnlyTest
    Error: unichar ? in normproto file is not in unichar set.
    Error: unichar 0.232621 in normproto file is not in unichar set.
    Error: unichar 0.000400 in normproto file is not in unichar set.
    Error: unichar 0.231864 in normproto file is not in unichar set.
    [...]
    Error: unichar ? in normproto file is not in unichar set.
    Error: unichar 0.233915 in normproto file is not in unichar set.
    Error: unichar 0.000400 in normproto file is not in unichar set.
    Error: unichar 0.221755 in normproto file is not in unichar set.
    Error: unichar 0.000400 in normproto file is not in unichar set.
    Error: unichar ? in normproto file is not in unichar set.
    baseapi_test(21845,0x1134c45c0) malloc: *** error for object 0x927f96c28005e0: pointer being freed was not allocated
    baseapi_test(21845,0x1134c45c0) malloc: *** set a breakpoint in malloc_error_break to debug
    [INFO]  Lang eng took 327ms in regular init
    [INFO]  Lang chi_tra took 1422ms in regular init
    Abort trap: 6

TesseractTest.InitConfigOnlyTest is fixed by using std::istringstream
instead of sscanf.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 11:05:09 +02:00
Stefan Weil
0dcc889e8d Fix apiexample_test with locale de_DE.UTF-8
The unittest failed with LANG=de_DE.UTF-8:

    $ unittest/apiexample_test
    Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc
    [==========] Running 4 tests from 2 test suites.
    [----------] Global test environment set-up.
    [----------] 1 test from EuroText
    [ RUN      ] EuroText.FastLatinOCR
    contains_unichar_id(unichar_id):Error:Assert failed:in file ../../../../../src/ccutil/unicharset.h, line 874

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-15 22:43:47 +02:00
zdenop
4b397c70cc
Merge pull request #2434 from stweil/configure
configure: Fix for latest developer tools on macOS
2019-05-15 07:31:44 +02:00
Stefan Weil
7917ffb6c2 configure: Fix for latest developer tools on macOS
AX_CHECK_COMPILE_FLAG fails if it is used with -Werror and the compiler
raises error -Wunused-macros. Add -Wno-unused-macros to disable those
errors if possible.

Also simplify the setting of several conditionals (AVX, AVX2, ...).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-14 22:31:23 +02:00
Stefan Weil
6b1e709b19 Fix Doxygen comments for void functions
Void functions should not use @return. It causes compiler warnings
like this one:

    src/classify/intproto.cpp:326:5: warning:
      '@return' command used in a comment that is attached to a function
      returning void [-Wdocumentation]

Some non-void functions also were documented with @return none.
Fix those comments, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-14 21:57:17 +02:00
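As an illustration of the rule in this commit, the sketch below documents a void function without @return and a non-void function with a meaningful @return. Both functions are hypothetical examples, not code from src/classify/intproto.cpp.

```cpp
/**
 * Resets every element of the buffer to zero.
 *
 * No @return here: the function returns void, so clang's
 * -Wdocumentation would warn about it.
 *
 * @param buffer  Array to clear.
 * @param size    Number of elements in buffer.
 */
void ClearBuffer(int* buffer, int size) {
  for (int i = 0; i < size; ++i) {
    buffer[i] = 0;
  }
}

/**
 * Adds up the elements of the buffer.
 *
 * @param buffer  Array to sum.
 * @param size    Number of elements in buffer.
 * @return Sum of all elements.
 */
int SumBuffer(const int* buffer, int size) {
  int sum = 0;
  for (int i = 0; i < size; ++i) {
    sum += buffer[i];
  }
  return sum;
}
```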
Stefan Weil
caa04882fd normmatch: Remove unused private function
PrintNormMatch was unused. Remove it and remove also an unused prototype.
Make the only remaining private function NormEvidenceOf static.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-14 20:56:04 +02:00
Nick White
068eb4c35d Add different classes to hocr output depending on BlockType
These classes are taken from the hOCR specification, and seem
to map well onto the BlockType types. There are probably more that
could be added.
2019-05-14 13:25:08 +01:00
Egor Pugin
b9b74a6942 Update sw build. 2019-05-13 01:54:23 +03:00
zdenop
746674fcd5
Merge pull request #2430 from stweil/fix
Fix reading of parameter from traineddata normproto component and make function independent of locale
2019-05-12 15:59:41 +02:00
Stefan Weil
5d92fbf010 Replace sscanf by std::istringstream
Using std::istringstream allows conversion of string to float
independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-12 15:04:30 +02:00
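The difference matters because sscanf with %f follows the C locale set by setlocale, so under de_DE.UTF-8 it expects ',' as the decimal separator and stops at the '.' in "0.232621". A minimal sketch of the stream-based alternative follows; `ParseFloat` is a hypothetical helper, and the explicit imbue call is an assumption about how to pin the stream to the classic locale, not a quote from the commit.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Parses a float written with '.' as the decimal separator.
// Returns false if no number could be read (hypothetical helper).
bool ParseFloat(const std::string& text, float& value) {
  std::istringstream stream(text);
  // Pin the stream to the classic "C" locale so the user's
  // locale cannot change how the number is interpreted.
  stream.imbue(std::locale::classic());
  stream >> value;
  return !stream.fail();
}
```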
Stefan Weil
c76ceafcdf Fix reading of parameter from traineddata normproto component
The NonEssential parameter was wrongly derived from linear_token instead
of essential_token and therefore always set to true.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-12 14:43:58 +02:00
Stefan Weil
c07bc4e014 Fix Doxygen comment
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-12 08:55:23 +02:00
Stefan Weil
c8e96e2c02 Fix cast from pointer to integer type
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-12 08:54:46 +02:00
Zdenko Podobný
3f4dcf3c8b cmake: uninstall target 2019-05-08 19:19:26 +02:00
zdenop
a94334a255 cmake: fix build without pkg-config (issue #2424) 2019-05-08 18:49:48 +02:00
Zdenko Podobný
68ca3518be autotools: remove list of traineddata files 2019-05-08 15:36:58 +02:00
zdenop
28cfaaae43
Merge pull request #2423 from jbarlow83/fix-cppflags
Fix CPPFLAGS configuration for icu4c and libarchive
2019-05-07 11:28:59 +02:00
James R. Barlow
403361701a Fix CPPFLAGS configuration for icu4c and libarchive missing from configure.ac 2019-05-07 02:01:20 -07:00
zdenop
7a5b9b8fcd ScrollView: remove custom implementation of GetAddrInfo 2019-05-04 15:16:41 +02:00
zdenop
5e01f74648 remove unused include 2019-05-04 15:14:54 +02:00
zdenop
83e92e0179
Merge pull request #2422 from stweil/include
tesscallback: Remove unused code
2019-05-04 12:20:23 +02:00
Stefan Weil
aba037329a tesscallback: Remove more unused code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-04 11:05:50 +02:00
Stefan Weil
57ff92e4bf tesscallback: Remove unused code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-02 22:14:04 +02:00
zdenop
9192c3afe2 correct tessdata comment in baseapi.h 2019-05-02 08:43:04 +02:00
zdenop
7e48368a5e
Merge pull request #2421 from stweil/includes
universalambigs: Add missing include file
2019-05-02 08:36:49 +02:00
zdenop
39d3824c78
Merge pull request #2420 from stweil/locale
Fix more locale dependencies
2019-05-02 08:31:41 +02:00
zdenop
4b77d9e806
Merge pull request #2419 from stweil/typos
Fix some typos (most found and fixed by codespell)
2019-05-02 08:29:13 +02:00
Stefan Weil
cd749be473 universalambigs: Add missing include file
This allows fixing two compiler warnings from clang++:

    src/ccutil/universalambigs.cpp:23:19: warning: no previous extern declaration for non-static variable 'kUniversalAmbigsFile' [-Wmissing-variable-declarations]
    src/ccutil/universalambigs.cpp:19019:18: warning: no previous extern declaration for non-static variable 'ksizeofUniversalAmbigsFile' [-Wmissing-variable-declarations]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-02 07:36:31 +02:00
Stefan Weil
4fbc0a257b commandlineflags: Replace strtod by std::stringstream
Using std::stringstream allows conversion of string to double
independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-02 07:33:46 +02:00
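A command-line flag parser usually also has to reject trailing garbage, which strtod reports through its end pointer. The sketch below shows one way to get the same check with a stream; `SafeParseDouble` is a hypothetical name and the code is an assumption about the approach, not the implementation in commandlineflags.

```cpp
#include <istream>
#include <locale>
#include <sstream>
#include <string>

// Parses a double using '.' as the decimal separator and accepts the
// input only if nothing but whitespace follows the number.
bool SafeParseDouble(const std::string& text, double& value) {
  std::istringstream stream(text);
  stream.imbue(std::locale::classic());
  if (!(stream >> value)) {
    return false;  // no number at the start of the text
  }
  stream >> std::ws;    // skip trailing whitespace, if any
  return stream.eof();  // true only if the whole text was consumed
}
```

With this check, an input such as "1,5" is rejected rather than silently read as 1, which makes a locale mix-up visible to the caller.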