4858 Commits

Author SHA1 Message Date
zdenop
e44c60c3b2 cmake: respect -DTESSDATA_PREFIX=/path (on linux) 2019-05-25 08:31:26 +02:00
Stefan Weil
32dcfd06ba Replace Tensorflow by TensorFlow
The name is written in camel case, see https://www.tensorflow.org/.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-24 17:14:28 +02:00
Stefan Weil
1ba8c97cac Fix linking of unittest with Tensorflow
This does not add Tensorflow tests. It only fixes the linker errors.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-24 17:08:48 +02:00
Stefan Weil
2441e4d8ac Implement check for Tensorflow header file
This looks for one of the header files which are included by Tesseract.
It currently uses a hard coded path which works for Debian / Ubuntu.

Simplify also the rules for linking Tensorflow.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-24 16:52:14 +02:00
Stefan Weil
9cdf041448 Remove "third_party/" in comments and update path names
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-24 14:12:52 +02:00
Stefan Weil
4382ab1a34 Support build with Tensorflow
It expects include files in /usr/include/tensorflow.

* Add configure option --with-tensorflow (disabled by default)
* Fix data type tensorflow::int64
* Remove "third_party/" in include statements
* Add dummy implementations for Backward and DebugWeights in TFNetwork
* Add files generated with protoc from tfnetwork.proto
  (so the Tensorflow sources are not needed for the build)
* Update Makefiles

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-24 14:11:31 +02:00
Zdenko Podobný
c69ee9af24 cmake: fix tiff linking to executable if tiffio.h is found 2019-05-24 11:12:39 +02:00
Zdenko Podobný
0f1e13a859 cmake: fix warning 2019-05-24 10:59:59 +02:00
Zdenko Podobný
294f548ac1 fix missing tiff format 2019-05-24 10:39:17 +02:00
Stefan Weil
3f74da5da9 lstmtrainer: Set constant kLearningRateDecay at compile time
sqrt(0.5) = 1 / sqrt(2) can be replaced by the macro M_SQRT1_2.

This also fixes a compiler warning:

    src/lstm/lstmtrainer.cpp:51:14: warning: declaration requires a global constructor [-Wglobal-constructors]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-23 15:01:23 +02:00
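
A minimal sketch of the idea in the commit above, not the actual Tesseract source: the type and exact declaration of `kLearningRateDecay` are assumptions. An initializer that calls `sqrt()` runs at program start-up, which is what `-Wglobal-constructors` reports. A constant built from the `M_SQRT1_2` macro can be evaluated by the compiler.

```cpp
// Defined before any include so that MSVC's <cmath> also provides M_SQRT1_2.
#define _USE_MATH_DEFINES
#include <cmath>

// Before (assumed form): needs start-up code to call sqrt().
//   const double kLearningRateDecay = sqrt(0.5);

// After (assumed form): M_SQRT1_2 expands to a literal for 1/sqrt(2),
// so the value is fixed at compile time.
constexpr double kLearningRateDecay = M_SQRT1_2;

// The compiler can verify the value because it is a constant expression.
static_assert(kLearningRateDecay > 0.7071 && kLearningRateDecay < 0.7072,
              "1/sqrt(2) is roughly 0.70711");
```

Note that `M_SQRT1_2` is a POSIX extension rather than standard C++, so this sketch relies on the platform's math header providing it.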
zdenop
4bab7dd83d
Merge pull request #2451 from Bharat123rox/lgtm
Some LGTM alert fixes and potential bugfixes
2019-05-22 12:19:44 +02:00
Egor Pugin
fea1f3970b
Merge pull request #2452 from stweil/tprintf
tprintf: Make code reentrant and use less memory
2019-05-22 12:31:34 +03:00
Egor Pugin
8f99880a7a
Merge pull request #2453 from stweil/crashcode
Remove SavePixForCrash and related code
2019-05-22 12:30:29 +03:00
bact
aac6f593f3
Update normstrngs_test.cc 2019-05-22 15:21:16 +07:00
bact
e05c5ecfcc
Fix Thai valid text and add Thai illegal sequences
- Fix an invalid sequence in "valid text" `kScriptText`
- Add two illegal sequences in `kBadlyFormedThaiWords`
2019-05-22 15:19:49 +07:00
Bharat123rox
bc3ea622a6 Fix bug in max_max_dist 2019-05-22 08:21:30 +02:00
Bharat123rox
0bf45e81e7 Fix LGTM and revert bugfix for later PR 2019-05-22 11:23:27 +05:30
Bharat123rox
945ccac85a Fix syntax error 2019-05-22 10:10:12 +05:30
Stefan Weil
6514479e8f Remove SavePixForCrash and related code
That debugging code uses a lot of memory and is no longer useful.

    text	   data	    bss	    dec	    hex	filename
     815	      0	 262144	 262959	  4032f	src/ccutil/globaloc.o

Remove also the function err_exit which was only used in ccmain/reject.cpp.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-21 20:25:58 +02:00
Stefan Weil
078a129674 tprintf: Make code reentrant and use less memory
Reduce the maximum message size from 64 KiB to 2 KiB which should still
be large enough for trace messages.

Create the smaller message on the stack instead of using a global
array to allow reentrancy and to reduce the memory use of Tesseract.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-21 20:22:58 +02:00
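
The reentrancy argument in the commit above can be sketched as follows. This is an illustration only: `TracePrintf` and `kMaxMessageSize` are hypothetical names, and the real `tprintf` may differ in where it writes and what it returns. The point is that a buffer on the stack belongs to one call, so concurrent or nested calls cannot overwrite each other's message the way they could with one global array.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

// Assumed limit, taken from the 2 KiB figure in the commit message.
constexpr int kMaxMessageSize = 2048;

// Hypothetical tprintf-like helper: formats into a stack buffer and writes
// the result to stderr. Returns the number of characters written.
int TracePrintf(const char* format, ...) {
  assert(format != nullptr);
  char message[kMaxMessageSize];  // lives on this call's stack frame
  va_list args;
  va_start(args, format);
  // vsnprintf never writes more than sizeof(message) bytes, terminator included.
  const int needed = std::vsnprintf(message, sizeof(message), format, args);
  va_end(args);
  if (needed < 0) {
    return 0;  // encoding error, nothing to print
  }
  std::fputs(message, stderr);
  // Longer messages are truncated to the buffer size minus the terminator.
  return needed < kMaxMessageSize ? needed : kMaxMessageSize - 1;
}
```

A call such as `TracePrintf("page %d done\n", 3)` then uses 2 KiB of stack for the duration of the call and no static storage.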
Stefan Weil
c926bdb265 configure: Use a hopefully more robust way to fix AX_CHECK_COMPILE_FLAG
The check for -Wno-extra-semi-stmt failed on Linux with clang++-7.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-21 20:21:05 +02:00
Bharat123rox
7f31a0634d Some LGTM fixes and potential bugfixes 2019-05-21 23:24:50 +05:30
zdenop
b96df3a33a
Merge pull request #2448 from stweil/pi
Remove local definition of M_PI
2019-05-21 11:47:51 +02:00
Stefan Weil
d2ca81e794 Remove local definition of M_PI
It is defined for all platforms when math.h or cmath is included
after defining the macro _USE_MATH_DEFINES.

Define _USE_MATH_DEFINES before any include statement to make sure
that M_PI gets defined. It is not necessary to define it conditionally
only for Windows.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-20 21:18:52 +02:00
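
A short sketch of the include order described in the commit above. `DegreesToRadians` is a hypothetical helper used only to show `M_PI` in use. The macro has to be defined before the first include because another header may pull in the math header indirectly, after which defining it has no effect.

```cpp
// First line of the translation unit, before every include.
#define _USE_MATH_DEFINES
#include <cmath>

// Hypothetical helper: no local fallback definition of M_PI is needed.
constexpr double DegreesToRadians(double degrees) {
  return degrees * M_PI / 180.0;
}
```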
Stefan Weil
d6c1fa766c configure: Fix for clang++-8 and newer
AX_CHECK_COMPILE_FLAG fails if it is used with -Werror and the compiler
raises error -Wextra-semi-stmt:

    configure:4224: checking whether C++ compiler accepts -mavx
    configure:4243: clang++-8 -c -g -O2 -Wall -Wextra -Wpedantic -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -march=native -Werror -Wno-unused-macros -mavx  conftest.cpp >&5
    conftest.cpp:20:3: error: empty expression statement has no effect; remove unnecessary ';' to silence this warning [-Werror,-Wextra-semi-stmt]
      ;
      ^
    1 error generated.

Add -Wno-extra-semi-stmt to disable those errors if possible.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-20 10:52:39 +02:00
zdenop
b753ff62ee
Merge pull request #2445 from stweil/errcode
Fix compiler warnings
2019-05-20 09:31:28 +02:00
Stefan Weil
64bdceee69 Fix compiler warnings
This fixes lots of warnings related to ERRCODE like the following one:

    src/ccutil/errcode.h:81:15: warning:
      declaration requires a global constructor [-Wglobal-constructors]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-19 22:10:22 +02:00
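
One common way to silence this kind of warning is a `constexpr` constructor, sketched below. Whether the commit above uses exactly this approach is an assumption, and `ErrorCode` and `kAssertFailed` are hypothetical names, not the real `ERRCODE` class. A global object with a `constexpr` constructor can be initialized at compile time, so no start-up constructor call is required.

```cpp
// Hypothetical stand-in for an error code class that only stores a message.
class ErrorCode {
 public:
  // constexpr lets the compiler build global instances without start-up code.
  constexpr explicit ErrorCode(const char* text) : message_(text) {}
  constexpr const char* message() const { return message_; }

 private:
  const char* message_;
};

// Global constant that is initialized at compile time.
constexpr ErrorCode kAssertFailed("Assert failed");
```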
Stefan Weil
09edd1a604 Fix out-of-bounds writes in Classify::ReadNewCutoffs
The function did not correctly read Chinese unichars into the local
Class variable if the locale was set to de_DE.UTF-8 (or other
incompatible locales). That resulted in a wrong ClassId which was
used to write into the Cutoffs array without checking for valid bounds.

On macOS the result was a runtime error in baseapi_test (see GitHub
issue #1250):

    [ RUN      ] TesseractTest.InitConfigOnlyTest
    baseapi_test(21845,0x1134c45c0) malloc: *** error for object 0x927f96c28005e0: pointer being freed was not allocated
    baseapi_test(21845,0x1134c45c0) malloc: *** set a breakpoint in malloc_error_break to debug

Replacing sscanf by std::istringstream fixes that.
Add also an assertion to catch future out-of-bounds writes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-18 13:39:55 +02:00
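
The two measures named in the commit above (parsing with `std::istringstream` and asserting the index before writing) can be sketched like this. It is a simplified illustration, not the code of `Classify::ReadNewCutoffs`: `StoreCutoff`, the map-based lookup, and the explicit `imbue` call are assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parses one "<class> <cutoff>" line and stores the
// cutoff at the index that belongs to the class name.
// Returns false if the line is malformed or the class is unknown.
bool StoreCutoff(const std::string& line,
                 const std::map<std::string, int>& class_ids,
                 std::vector<int>* cutoffs) {
  std::istringstream stream(line);
  // Parse independently of the user's locale (assumed detail).
  stream.imbue(std::locale::classic());
  std::string name;
  int cutoff = 0;
  stream >> name >> cutoff;
  if (stream.fail()) {
    return false;
  }
  const auto it = class_ids.find(name);
  if (it == class_ids.end()) {
    return false;
  }
  const int id = it->second;
  // Catch a wrong index before it can corrupt memory.
  assert(id >= 0 && static_cast<std::size_t>(id) < cutoffs->size());
  (*cutoffs)[id] = cutoff;
  return true;
}
```

With the classic locale, multi-byte UTF-8 names are read as opaque byte sequences up to the next ASCII whitespace, so a Chinese class name arrives intact in `name`.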
Stefan Weil
639781b5c8 stringrenderer_test: Get system locale only once
This fixes a runtime exception on macOS.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-18 13:24:13 +02:00
Stefan Weil
bb226c19ab Update abseil submodule to HEAD
Abseil suggests using the latest code:
https://abseil.io/about/philosophy#we-recommend-that-you-choose-to-live-at-head

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-17 15:03:43 +02:00
zdenop
2308cbf87f
Merge pull request #2444 from zdenop/fix_travis
fix typo
2019-05-17 11:26:40 +02:00
zdenop
a54e345c9b fix typo 2019-05-17 11:19:07 +02:00
Zdenko Podobný
5282cdf7be another improvement for ca0be2fb72 2019-05-17 11:04:42 +02:00
Zdenko Podobný
e92a424efa try to fix ca0be2fb72 2019-05-17 10:51:06 +02:00
zdenop
af3dd1af06 Merge branch 'master' of https://github.com/tesseract-ocr/tesseract 2019-05-16 23:19:42 +02:00
zdenop
ca0be2fb72 cmake: fix travis build 2019-05-16 23:18:13 +02:00
Stefan Weil
68d7a679e4 Replace CR-LF line endings by LF
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:49:01 +02:00
Stefan Weil
cc754ed1e0 Remove space at line endings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:49:01 +02:00
zdenop
198bbe3df5
Merge pull request #2441 from stweil/linkfix
Fix unittest build without legacy code and use locale for most unittests
2019-05-16 19:12:15 +02:00
Stefan Weil
8e7b1119b5 Run more unittests with the user's locale
Hopefully this improves the test coverage.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 18:12:55 +02:00
Stefan Weil
59e31e958b Fix more build errors for compilation without legacy engine
Skip the tests which need the legacy code.
Add also code to those tests to use the user's locale to test that, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 18:12:55 +02:00
Stefan Weil
780986ebfb Fix linker error for baseapi_test when building without legacy engine
Linker error reported in issue #2439:

    unittest/baseapi_test.cc:190:
      undefined reference to
      `tesseract::TessBaseAPI::AdaptToWordStr(tesseract::PageSegMode, char const*)'

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 18:12:55 +02:00
zdenop
3864d0d088
Merge pull request #2440 from stweil/linkfix
Fix linker error for baseapi_test when building without legacy engine
2019-05-16 17:31:35 +02:00
Stefan Weil
f097b8a358 Fix linker error for baseapi_test when building without legacy engine
Linker error reported in issue #2439:

    unittest/baseapi_test.cc:190:
      undefined reference to
      `tesseract::TessBaseAPI::AdaptToWordStr(tesseract::PageSegMode, char const*)'

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 17:17:56 +02:00
zdenop
7e9d2f4bc4
Merge pull request #2432 from nickjwhite/hocrmoretypes
Add different classes to hocr output depending on BlockType
2019-05-16 17:02:48 +02:00
zdenop
b124a5f6ca
Merge pull request #2437 from stweil/locale-fix
Fix some unittests with locale de_DE.UTF-8
2019-05-16 17:02:02 +02:00
Stefan Weil
331cc84d8d Remove assertions for unsupported locale settings
The latest code passed all unittests with locale de_DE.UTF-8
and has fixed the locale issues which were reported on GitHub.
Therefore the assertions can be removed.

Any remaining locale issue will be fixed when it is identified.
To help find such remaining issues, debug code now uses the
user's locale settings instead of the default "C" locale for all
executables which use TessBaseAPI.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 13:59:39 +02:00
Stefan Weil
77f9bad3c2 Fix UNICHARSET::save_to_string for locale de_DE.UTF-8
That function writes float values which must always use '.' as the
decimal separator, no matter what the current locale setting is.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 11:39:59 +02:00
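
One way to keep '.' as the decimal separator regardless of the active locale is to format through a stream that is pinned to the classic "C" locale. The sketch below shows that technique under the assumption that stream formatting is acceptable here. `FormatFloatPortable` is a hypothetical helper, not part of `UNICHARSET`, and the commit above may solve the problem differently.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: converts a value to text that always uses '.'
// as the decimal separator, whatever the global locale is.
std::string FormatFloatPortable(double value) {
  std::ostringstream stream;
  // Pin the stream to the classic locale instead of the global one.
  stream.imbue(std::locale::classic());
  stream << value;
  const std::string text = stream.str();
  // Sanity check: a locale-specific ',' must never appear in the output.
  assert(text.find(',') == std::string::npos);
  return text;
}
```

Data files written this way can be read back on a machine with a different locale, which is the property the commit message asks for.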
Stefan Weil
36ed6da349 Fix baseapi_test with locale de_DE.UTF-8
The unittest failed with LANG=de_DE.UTF-8:

    $ unittest/baseapi_test
    Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc
    [==========] Running 12 tests from 2 test suites.
    [----------] Global test environment set-up.
    [----------] 10 tests from TesseractTest
    [ RUN      ] TesseractTest.ArraySizeTest
    [       OK ] TesseractTest.ArraySizeTest (0 ms)
    [ RUN      ] TesseractTest.BasicTesseractTest
    [       OK ] TesseractTest.BasicTesseractTest (1251 ms)
    [ RUN      ] TesseractTest.IteratesParagraphsEvenIfNotDetected
    [       OK ] TesseractTest.IteratesParagraphsEvenIfNotDetected (347 ms)
    [ RUN      ] TesseractTest.HOCRWorksWithoutSetInputName
    [       OK ] TesseractTest.HOCRWorksWithoutSetInputName (403 ms)
    [ RUN      ] TesseractTest.HOCRContainsBaseline
    [       OK ] TesseractTest.HOCRContainsBaseline (389 ms)
    [ RUN      ] TesseractTest.RickSnyderNotFuckSnyder
    [       OK ] TesseractTest.RickSnyderNotFuckSnyder (346 ms)
    [ RUN      ] TesseractTest.AdaptToWordStrTest
    Trying to adapt "136
    " to "1 3 6"
    Trying to adapt "256
    " to "2 5 6"
    Trying to adapt "410
    " to "4 1 0"
    Trying to adapt "432
    " to "4 3 2"
    Trying to adapt "540
    " to "5 4 0"
    Trying to adapt "692
    " to "6 9 2"
    Trying to adapt "779
    " to "7 7 9"
    Trying to adapt "793
    " to "7 9 3"
    Trying to adapt "808
    " to "8 0 8"
    Trying to adapt "815
    " to "8 1 5"
    Trying to adapt "12
    " to "1 2"
    Trying to adapt "12
    " to "1 2"
    [       OK ] TesseractTest.AdaptToWordStrTest (788 ms)
    [ RUN      ] TesseractTest.BasicLSTMTest
    [       OK ] TesseractTest.BasicLSTMTest (4525 ms)
    [ RUN      ] TesseractTest.LSTMGeometryTest
    [       OK ] TesseractTest.LSTMGeometryTest (615 ms)
    [ RUN      ] TesseractTest.InitConfigOnlyTest
    Error: unichar ? in normproto file is not in unichar set.
    Error: unichar 0.232621 in normproto file is not in unichar set.
    Error: unichar 0.000400 in normproto file is not in unichar set.
    Error: unichar 0.231864 in normproto file is not in unichar set.
    [...]
    Error: unichar ? in normproto file is not in unichar set.
    Error: unichar 0.233915 in normproto file is not in unichar set.
    Error: unichar 0.000400 in normproto file is not in unichar set.
    Error: unichar 0.221755 in normproto file is not in unichar set.
    Error: unichar 0.000400 in normproto file is not in unichar set.
    Error: unichar ? in normproto file is not in unichar set.
    baseapi_test(21845,0x1134c45c0) malloc: *** error for object 0x927f96c28005e0: pointer being freed was not allocated
    baseapi_test(21845,0x1134c45c0) malloc: *** set a breakpoint in malloc_error_break to debug
    [INFO]  Lang eng took 327ms in regular init
    [INFO]  Lang chi_tra took 1422ms in regular init
    Abort trap: 6

TesseractTest.InitConfigOnlyTest is fixed by using std::istringstream
instead of sscanf.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 11:05:09 +02:00
Stefan Weil
0dcc889e8d Fix apiexample_test with locale de_DE.UTF-8
The unittest failed with LANG=de_DE.UTF-8:

    $ unittest/apiexample_test
    Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc
    [==========] Running 4 tests from 2 test suites.
    [----------] Global test environment set-up.
    [----------] 1 test from EuroText
    [ RUN      ] EuroText.FastLatinOCR
    contains_unichar_id(unichar_id):Error:Assert failed:in file ../../../../../src/ccutil/unicharset.h, line 874

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-15 22:43:47 +02:00