This looks for one of the header files which are included by Tesseract.
It currently uses a hard coded path which works for Debian / Ubuntu.
Simplify also the rules for linking Tensorflow.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
It expects include files in /usr/include/tensorflow.
* Add configure option --with-tensorflow (disabled by default)
* Fix data type tensorflow::int64
* Remove "third_party/" in include statements
* Add dummy implementations for Backward and DebugWeights in TFNetwork
* Add files generated with protoc from tfnetwork.proto
(so the Tensorflow sources are not needed for the build)
* Update Makefiles
Signed-off-by: Stefan Weil <sw@weilnetz.de>
sqrt(0.5) = 1 / sqrt(2) can be replaced by the macro M_SQRT1_2.
This also fixes a compiler warning:
src/lstm/lstmtrainer.cpp:51:14: warning: declaration requires a global constructor [-Wglobal-constructors]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
That debugging code uses very much memory and is no longer useful.
text data bss dec hex filename
815 0 262144 262959 4032f src/ccutil/globaloc.o
Remove also the function err_exit which was only used in ccmain/reject.cpp.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reduce the maximum message size from 64 KiB to 2 KiB which still should
be large enought for trace messages.
Create the smaller message on the stack instead of using a global
array to allow reentrancy and to reduce the memory use of Tesseract.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
It is defined for all platforms when math.h or cmath is included
after defining the macro _USE_MATH_DEFINES.
Define _USE_MATH_DEFINES before any include statement to make sure
that M_PI gets defined. It is not necessary to define it conditionally
only for Windows.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
AX_CHECK_COMPILE_FLAG fails if it is used with -Werror and the compiler
raises error -Wextra-semi-stmt:
configure:4224: checking whether C++ compiler accepts -mavx
configure:4243: clang++-8 -c -g -O2 -Wall -Wextra -Wpedantic -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -march=native -Werror -Wno-unused-macros -mavx conftest.cpp >&5
conftest.cpp:20:3: error: empty expression statement has no effect; remove unnecessary ';' to silence this warning [-Werror,-Wextra-semi-stmt]
;
^
1 error generated.
Add -Wno-extra-semi-stmt to disable those errors if possible.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes lots of warnings related to ERRCODE like the following one:
src/ccutil/errcode.h:81:15: warning:
declaration requires a global constructor [-Wglobal-constructors]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The function did not correctly read Chinese unichars into the local
Class variable if the locale was set to de_DE.UTF-8 (or other
incompatible locales). That resulted in a wrong ClassId which was
used to write into the Cutoffs array without checking for valid bounds.
On macOS the result was a runtime error in baseapi_test (see GitHub
issue #1250):
[ RUN ] TesseractTest.InitConfigOnlyTest
baseapi_test(21845,0x1134c45c0) malloc: *** error for object 0x927f96c28005e0: pointer being freed was not allocated
baseapi_test(21845,0x1134c45c0) malloc: *** set a breakpoint in malloc_error_break to debug
Replacing sscanf by std::istringstream fixes that.
Add also an assertion to catch future out-of-bounds writes.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Skip the tests which need the legacy code.
Add also code to those tests to use the user's locale to test that, too.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Linker error reported in issue #2439:
unittest/baseapi_test.cc:190:
undefined reference to
`tesseract::TessBaseAPI::AdaptToWordStr(tesseract::PageSegMode, char const*)'
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Linker error reported in issue #2439:
unittest/baseapi_test.cc:190:
undefined reference to
`tesseract::TessBaseAPI::AdaptToWordStr(tesseract::PageSegMode, char const*)'
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The latest code passed all unittests with locale de_DE.UTF-8
and has fixed the locale issues which were reported on GitHub.
Therefore the assertions can be removed.
Any remaining locale issue will be fixed when it is identified.
To help finding such remaining isses, debug code now uses the
user's locale settings instead of the default "C" locale for all
executables which use TessBaseAPI.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
That function writes float values which must always use '.' as the
decimal separator, no matter what the current locale setting is.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The unittest failed with LANG=de_DE.UTF-8:
$ unittest/baseapi_test
Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc
[==========] Running 12 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 10 tests from TesseractTest
[ RUN ] TesseractTest.ArraySizeTest
[ OK ] TesseractTest.ArraySizeTest (0 ms)
[ RUN ] TesseractTest.BasicTesseractTest
[ OK ] TesseractTest.BasicTesseractTest (1251 ms)
[ RUN ] TesseractTest.IteratesParagraphsEvenIfNotDetected
[ OK ] TesseractTest.IteratesParagraphsEvenIfNotDetected (347 ms)
[ RUN ] TesseractTest.HOCRWorksWithoutSetInputName
[ OK ] TesseractTest.HOCRWorksWithoutSetInputName (403 ms)
[ RUN ] TesseractTest.HOCRContainsBaseline
[ OK ] TesseractTest.HOCRContainsBaseline (389 ms)
[ RUN ] TesseractTest.RickSnyderNotFuckSnyder
[ OK ] TesseractTest.RickSnyderNotFuckSnyder (346 ms)
[ RUN ] TesseractTest.AdaptToWordStrTest
Trying to adapt "136
" to "1 3 6"
Trying to adapt "256
" to "2 5 6"
Trying to adapt "410
" to "4 1 0"
Trying to adapt "432
" to "4 3 2"
Trying to adapt "540
" to "5 4 0"
Trying to adapt "692
" to "6 9 2"
Trying to adapt "779
" to "7 7 9"
Trying to adapt "793
" to "7 9 3"
Trying to adapt "808
" to "8 0 8"
Trying to adapt "815
" to "8 1 5"
Trying to adapt "12
" to "1 2"
Trying to adapt "12
" to "1 2"
[ OK ] TesseractTest.AdaptToWordStrTest (788 ms)
[ RUN ] TesseractTest.BasicLSTMTest
[ OK ] TesseractTest.BasicLSTMTest (4525 ms)
[ RUN ] TesseractTest.LSTMGeometryTest
[ OK ] TesseractTest.LSTMGeometryTest (615 ms)
[ RUN ] TesseractTest.InitConfigOnlyTest
Error: unichar ? in normproto file is not in unichar set.
Error: unichar 0.232621 in normproto file is not in unichar set.
Error: unichar 0.000400 in normproto file is not in unichar set.
Error: unichar 0.231864 in normproto file is not in unichar set.
[...]
Error: unichar ? in normproto file is not in unichar set.
Error: unichar 0.233915 in normproto file is not in unichar set.
Error: unichar 0.000400 in normproto file is not in unichar set.
Error: unichar 0.221755 in normproto file is not in unichar set.
Error: unichar 0.000400 in normproto file is not in unichar set.
Error: unichar ? in normproto file is not in unichar set.
baseapi_test(21845,0x1134c45c0) malloc: *** error for object 0x927f96c28005e0: pointer being freed was not allocated
baseapi_test(21845,0x1134c45c0) malloc: *** set a breakpoint in malloc_error_break to debug
[INFO] Lang eng took 327ms in regular init
[INFO] Lang chi_tra took 1422ms in regular init
Abort trap: 6
TesseractTest.InitConfigOnlyTest is fixed by using std::istringstream
instead of sscanf.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The unittest failed with LANG=de_DE.UTF-8:
$ unittest/apiexample_test
Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc
[==========] Running 4 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 1 test from EuroText
[ RUN ] EuroText.FastLatinOCR
contains_unichar_id(unichar_id):Error:Assert failed:in file ../../../../../src/ccutil/unicharset.h, line 874
Signed-off-by: Stefan Weil <sw@weilnetz.de>