Commit Graph

858 Commits

Author SHA1 Message Date
Stefan Weil
3452c8ebee Fix out-of-bounds writes in Classify::ReadNewCutoffs
The function did not correctly read Chinese unichars into the local
Class variable if the locale was set to de_DE.UTF-8 (or other
incompatible locales). That resulted in a wrong ClassId which was
used to write into the Cutoffs array without checking for valid bounds.

On macOS the result was a runtime error in baseapi_test (see GitHub
issue #1250):

    [ RUN      ] TesseractTest.InitConfigOnlyTest
    baseapi_test(21845,0x1134c45c0) malloc: *** error for object 0x927f96c28005e0: pointer being freed was not allocated
    baseapi_test(21845,0x1134c45c0) malloc: *** set a breakpoint in malloc_error_break to debug

Replacing sscanf by std::istringstream fixes that.
Add also an assertion to catch future out-of-bounds writes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-18 13:59:48 +02:00
Stefan Weil
d81f6a07a1 oldlist: Fix comments
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-17 14:41:11 +02:00
Stefan Weil
d0a43101c3 Remove space at line endings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-17 14:36:45 +02:00
Nick White
c0c53c785f Add different classes to hocr output depending on BlockType
These classes are taken from the hOCR specification, and seem
to map well onto the BlockType types. There are probably more that
could be added.
2019-05-17 14:34:20 +02:00
Stefan Weil
96f6fc28b8 Remove assertions for unsupported locale settings
The latest code passed all unittests with locale de_DE.UTF-8
and has fixed the locale issues which were reported on GitHub.
Therefore the assertions can be removed.

Any remaining locale issue will be fixed when it is identified.
To help finding such remaining isses, debug code now uses the
user's locale settings instead of the default "C" locale for all
executables which use TessBaseAPI.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:24:02 +02:00
Stefan Weil
fb926243bd Fix UNICHARSET::save_to_string for locale de_DE.UTF-8
That function writes float values which must always use '.' as the
decimal separator, no matter what the current locale setting is.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:23:40 +02:00
Stefan Weil
728f5d937f Fix baseapi_test with locale de_DE.UTF-8
The unittest failed with LANG=de_DE.UTF-8:

    $ unittest/baseapi_test
    Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc
    [==========] Running 12 tests from 2 test suites.
    [----------] Global test environment set-up.
    [----------] 10 tests from TesseractTest
    [ RUN      ] TesseractTest.ArraySizeTest
    [       OK ] TesseractTest.ArraySizeTest (0 ms)
    [ RUN      ] TesseractTest.BasicTesseractTest
    [       OK ] TesseractTest.BasicTesseractTest (1251 ms)
    [ RUN      ] TesseractTest.IteratesParagraphsEvenIfNotDetected
    [       OK ] TesseractTest.IteratesParagraphsEvenIfNotDetected (347 ms)
    [ RUN      ] TesseractTest.HOCRWorksWithoutSetInputName
    [       OK ] TesseractTest.HOCRWorksWithoutSetInputName (403 ms)
    [ RUN      ] TesseractTest.HOCRContainsBaseline
    [       OK ] TesseractTest.HOCRContainsBaseline (389 ms)
    [ RUN      ] TesseractTest.RickSnyderNotFuckSnyder
    [       OK ] TesseractTest.RickSnyderNotFuckSnyder (346 ms)
    [ RUN      ] TesseractTest.AdaptToWordStrTest
    Trying to adapt "136
    " to "1 3 6"
    Trying to adapt "256
    " to "2 5 6"
    Trying to adapt "410
    " to "4 1 0"
    Trying to adapt "432
    " to "4 3 2"
    Trying to adapt "540
    " to "5 4 0"
    Trying to adapt "692
    " to "6 9 2"
    Trying to adapt "779
    " to "7 7 9"
    Trying to adapt "793
    " to "7 9 3"
    Trying to adapt "808
    " to "8 0 8"
    Trying to adapt "815
    " to "8 1 5"
    Trying to adapt "12
    " to "1 2"
    Trying to adapt "12
    " to "1 2"
    [       OK ] TesseractTest.AdaptToWordStrTest (788 ms)
    [ RUN      ] TesseractTest.BasicLSTMTest
    [       OK ] TesseractTest.BasicLSTMTest (4525 ms)
    [ RUN      ] TesseractTest.LSTMGeometryTest
    [       OK ] TesseractTest.LSTMGeometryTest (615 ms)
    [ RUN      ] TesseractTest.InitConfigOnlyTest
    Error: unichar ? in normproto file is not in unichar set.
    Error: unichar 0.232621 in normproto file is not in unichar set.
    Error: unichar 0.000400 in normproto file is not in unichar set.
    Error: unichar 0.231864 in normproto file is not in unichar set.
    [...]
    Error: unichar ? in normproto file is not in unichar set.
    Error: unichar 0.233915 in normproto file is not in unichar set.
    Error: unichar 0.000400 in normproto file is not in unichar set.
    Error: unichar 0.221755 in normproto file is not in unichar set.
    Error: unichar 0.000400 in normproto file is not in unichar set.
    Error: unichar ? in normproto file is not in unichar set.
    baseapi_test(21845,0x1134c45c0) malloc: *** error for object 0x927f96c28005e0: pointer being freed was not allocated
    baseapi_test(21845,0x1134c45c0) malloc: *** set a breakpoint in malloc_error_break to debug
    [INFO]  Lang eng took 327ms in regular init
    [INFO]  Lang chi_tra took 1422ms in regular init
    Abort trap: 6

TesseractTest.InitConfigOnlyTest is fixed by using std::istringstream
instead of sscanf.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:23:24 +02:00
Stefan Weil
c2444e75e4 Fix apiexample_test with locale de_DE.UTF-8
The unittest failed with LANG=de_DE.UTF-8:

    $ unittest/apiexample_test
    Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc
    [==========] Running 4 tests from 2 test suites.
    [----------] Global test environment set-up.
    [----------] 1 test from EuroText
    [ RUN      ] EuroText.FastLatinOCR
    contains_unichar_id(unichar_id):Error:Assert failed:in file ../../../../../src/ccutil/unicharset.h, line 874

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:21:01 +02:00
Stefan Weil
bf74471113 Fix Doxygen comments for void functions
Void functions should not use @return. It causes compiler warnings
like this one:

    src/classify/intproto.cpp:326:5: warning:
      '@return' command used in a comment that is attached to a function
      returning void [-Wdocumentation]

Some non-void functions also were documented with @return none.
Fix those comments, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:19:01 +02:00
Stefan Weil
9bc576fafc normmatch: Remove unused private function
PrintNormMatch was unused. Remove it and remove also an unused prototype.
Make the only remaining private function NormEvidenceOf static.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:18:44 +02:00
Stefan Weil
25f2e0cb10 Replace sscanf by std::istringstream
Using std::istringstream allows conversion of string to float
independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:18:06 +02:00
Stefan Weil
8cc751136d Fix reading of parameter from traineddata normproto component
The NonEssential parameter was wrongly derived from linear_token instead
of essential_token and therefore always set to true.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:17:51 +02:00
Stefan Weil
73a08678dc Fix Doxygen comment
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:17:16 +02:00
Stefan Weil
70ffe33976 Fix cast from pointer to integer type
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:16:56 +02:00
zdenop
57bf215d14 ScrollView: remove custom implementation of GetAddrInfo 2019-05-05 20:03:50 +02:00
zdenop
9cd60b2b90 remove unused include 2019-05-05 20:03:50 +02:00
Stefan Weil
98be949f5d tesscallback: Remove more unused code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-05 20:03:50 +02:00
Stefan Weil
3ae4069411 tesscallback: Remove unused code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-05 20:03:50 +02:00
zdenop
c4bb4b933b correct tessdata comment in baseapi.h 2019-05-04 14:35:41 +02:00
Stefan Weil
78ed5ef8b9 universalambigs: Add missing include file
This allows fixing two compiler warnings from clang++:

    src/ccutil/universalambigs.cpp:23:19: warning: no previous extern declaration for non-static variable 'kUniversalAmbigsFile' [-Wmissing-variable-declarations]
    src/ccutil/universalambigs.cpp:19019:18: warning: no previous extern declaration for non-static variable 'ksizeofUniversalAmbigsFile' [-Wmissing-variable-declarations]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-04 14:35:41 +02:00
Stefan Weil
a8c8a96107 commandlineflags: Replace strtod by std::stringstream
Using std::stringstream allows conversion of double to string
independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-04 14:34:58 +02:00
Stefan Weil
8831cbfead paramsd: Replace strtod by std::stringstream
Using std::stringstream allows conversion of double to string
independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-04 14:34:58 +02:00
Stefan Weil
231da0064a clusttool: Replace strtof by std::stringstream
Using std::stringstream allows conversion of float to string
independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-04 14:34:58 +02:00
Stefan Weil
97f6711ce0 clusttool: Remove unused code and some global functions
* WriteProtoList is unused. Remove it.

* ReadNFloats, WriteNFloats and WriteProtoStyle are only used locally,
  so make them local.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-04 14:34:34 +02:00
Stefan Weil
1d14d15902 Fix some typos (most found and fixed by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-04 14:34:34 +02:00
zdenop
ef33a06e65 fix crash in case of missing PNG support in Leptonica see #2333 2019-05-01 20:14:26 +02:00
zdenop
b2fc3eba8f fix documentation about datapath: ending "/" is not relevant 2019-05-01 12:26:52 +02:00
Jeff Breidenbach
f70859f1fa fix #1900: intraword spacing for slightly better pdf copy-paste performance 2019-05-01 12:26:52 +02:00
zdenop
2746566ecc Print info when uzn file is used. 2019-05-01 12:26:52 +02:00
Zdenko Podobný
0d132e40d8 fix spelling 2019-05-01 12:26:52 +02:00
Zdenko Podobný
9132bc73ef remove unused variable 2019-05-01 12:26:52 +02:00
Stefan Weil
c1f70e27c9 Fix build for Windows
* winsock2.h is case sensitive, lower case is required for cross build.
* ws2tcpip.h is required for addrinfo.
* FreeAddrInfo conflicts with existing freeaddrinfo, so rename it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-01 12:26:52 +02:00
zdenop
9587e17043 svutil.cpp: fix windows build 2019-05-01 12:26:52 +02:00
Stefan Weil
315bd3a9c8 Only include windows.h using host.h
host.h sets the macros NOMINMAX and WIN32_LEAN_AND_MEAN which must be
set before including windows.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-01 12:26:52 +02:00
Stefan Weil
668f59f3f8 Clean macros in platform.h
* Remove unused macros ultoa, SIGNED.
* Move macros NOMINMAX and WIN32_LEAN_AND_MEAN to host.h
  because they are used when including windows.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-01 12:26:52 +02:00
Stefan Weil
b7e3122174 svutil: Clean include file
* Remove MIN, MAX macros. They are unused.
* Include windows.h indirectly by including host.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-01 12:26:52 +02:00
Stefan Weil
c774471086 Remove host.h from Tesseract API
It is not needed by other API header files.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-01 12:26:52 +02:00
Stefan Weil
57604ec59d Fix Windows build
timeval is declared in winsock2.h, so add the missing include statement.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-01 12:16:01 +02:00
Stefan Weil
53dd6ca0d2 Fix typo in description
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-01 11:30:00 +02:00
Shree
6aa887d6d4 fix the coordinates for EOL tab 2019-05-01 11:29:44 +02:00
zdenop
08b6dc504e remove unused includes 2019-05-01 11:29:29 +02:00
zdenop
8ee5c865f1 MSVS support inttypes.h from VS 2015 2019-05-01 11:29:03 +02:00
zdenop
30078d8aa8 fix missing EOL 2019-05-01 11:26:06 +02:00
Stefan Weil
66e35c171a Don't include windows.h from platform.h
This partially reverts commit c150b9832d.
Now params.cpp includes host.h which also gets the definition for MAX_PATH.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-01 11:25:45 +02:00
Stefan Weil
89d09cf537 Remove unneeded include statements for pgedit.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-01 11:24:20 +02:00
Stefan Weil
1d6e57adb8 pgedit: Remove unused global functions
pgeditor_show_point is unused, so remove it completely.
Some more functions are only used locally, so make them static functions.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-01 11:24:08 +02:00
Stefan Weil
afa2fff536 pdfrenderer: Replace snprintf by std::stringstream
Using std::stringstream allows conversion of float to string
independent of the current locale setting.

Some snprintf statements are not needed at all because a constant string
can be appended directly.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-01 11:23:42 +02:00
Stefan Weil
09cb0bcc7a baseapi: Use std::stringstream to format float values
Using std::stringstream allows conversion of float to string
independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-01 11:23:30 +02:00
Stefan Weil
1eea24ea78 Remove strtofloat
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-01 11:23:13 +02:00
Stefan Weil
38f68aa615 Replace sscanf by std::stringstream
Using std::stringstream allows working with the C locale, independent
of the current locale settings.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-01 11:22:45 +02:00