tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-05 02:47:00 +08:00

Author	SHA1	Message	Date
zdenop	e44c60c3b2	cmake: respect -DTESSDATA_PREFIX=/path (on linux)	2019-05-25 08:31:26 +02:00
Stefan Weil	32dcfd06ba	Replace Tensorflow by TensorFlow The name is written in camel case, see https://www.tensorflow.org/. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-24 17:14:28 +02:00
Stefan Weil	1ba8c97cac	Fix linking of unittest with Tensorflow This does not add Tensorflow tests. It only fixes the linker errors. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-24 17:08:48 +02:00
Stefan Weil	2441e4d8ac	Implement check for Tensorflow header file This looks for one of the header files which are included by Tesseract. It currently uses a hard coded path which works for Debian / Ubuntu. Simplify also the rules for linking Tensorflow. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-24 16:52:14 +02:00
Stefan Weil	9cdf041448	Remove "third_party/" in comments and update path names Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-24 14:12:52 +02:00
Stefan Weil	4382ab1a34	Support build with Tensorflow It expects include files in /usr/include/tensorflow. * Add configure option --with-tensorflow (disabled by default) * Fix data type tensorflow::int64 * Remove "third_party/" in include statements * Add dummy implementations for Backward and DebugWeights in TFNetwork * Add files generated with protoc from tfnetwork.proto (so the Tensorflow sources are not needed for the build) * Update Makefiles Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-24 14:11:31 +02:00
Zdenko Podobný	c69ee9af24	cmake: fix tiff linking to executable if tiffio.h is found	2019-05-24 11:12:39 +02:00
Zdenko Podobný	0f1e13a859	cmake: fix warning	2019-05-24 10:59:59 +02:00
Zdenko Podobný	294f548ac1	fix missing tiff format	2019-05-24 10:39:17 +02:00
Stefan Weil	3f74da5da9	lstmtrainer: Set constant kLearningRateDecay at compile time sqrt(0.5) = 1 / sqrt(2) can be replaced by the macro M_SQRT1_2. This also fixes a compiler warning: src/lstm/lstmtrainer.cpp:51:14: warning: declaration requires a global constructor [-Wglobal-constructors] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-23 15:01:23 +02:00
zdenop	4bab7dd83d	Merge pull request #2451 from Bharat123rox/lgtm Some LGTM alert fixes and potential bugfixes	2019-05-22 12:19:44 +02:00
Egor Pugin	fea1f3970b	Merge pull request #2452 from stweil/tprintf tprintf: Make code reentrant and use less memory	2019-05-22 12:31:34 +03:00
Egor Pugin	8f99880a7a	Merge pull request #2453 from stweil/crashcode Remove SavePixForCrash and related code	2019-05-22 12:30:29 +03:00
bact	aac6f593f3	Update normstrngs_test.cc	2019-05-22 15:21:16 +07:00
bact	e05c5ecfcc	Fix Thai valid text and add Thai illegal sequences - Fix an invalid sequence in "valid text" `kScriptText` - Add two illegal sequences in `kBadlyFormedThaiWords`	2019-05-22 15:19:49 +07:00
Bharat123rox	bc3ea622a6	Fix bug in max_max_dist	2019-05-22 08:21:30 +02:00
Bharat123rox	0bf45e81e7	Fix LGTM and revert bugfix for later PR	2019-05-22 11:23:27 +05:30
Bharat123rox	945ccac85a	Fix syntax error	2019-05-22 10:10:12 +05:30
Stefan Weil	6514479e8f	Remove SavePixForCrash and related code That debugging code uses very much memory and is no longer useful. text data bss dec hex filename 815 0 262144 262959 4032f src/ccutil/globaloc.o Remove also the function err_exit which was only used in ccmain/reject.cpp. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-21 20:25:58 +02:00
Stefan Weil	078a129674	tprintf: Make code reentrant and use less memory Reduce the maximum message size from 64 KiB to 2 KiB, which should still be large enough for trace messages. Create the smaller message on the stack instead of using a global array to allow reentrancy and to reduce the memory use of Tesseract. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-21 20:22:58 +02:00
Stefan Weil	c926bdb265	configure: Use a hopefully more robust way to fix AX_CHECK_COMPILE_FLAG The check for -Wno-extra-semi-stmt failed on Linux with clang++-7. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-21 20:21:05 +02:00
Bharat123rox	7f31a0634d	Some LGTM fixes and potential bugfixes	2019-05-21 23:24:50 +05:30
zdenop	b96df3a33a	Merge pull request #2448 from stweil/pi Remove local definition of M_PI	2019-05-21 11:47:51 +02:00
Stefan Weil	d2ca81e794	Remove local definition of M_PI It is defined for all platforms when math.h or cmath is included after defining the macro _USE_MATH_DEFINES. Define _USE_MATH_DEFINES before any include statement to make sure that M_PI gets defined. It is not necessary to define it conditionally only for Windows. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-20 21:18:52 +02:00
Stefan Weil	d6c1fa766c	configure: Fix for clang++-8 and newer AX_CHECK_COMPILE_FLAG fails if it is used with -Werror and the compiler raises error -Wextra-semi-stmt: configure:4224: checking whether C++ compiler accepts -mavx configure:4243: clang++-8 -c -g -O2 -Wall -Wextra -Wpedantic -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -march=native -Werror -Wno-unused-macros -mavx conftest.cpp >&5 conftest.cpp:20:3: error: empty expression statement has no effect; remove unnecessary ';' to silence this warning [-Werror,-Wextra-semi-stmt] ; ^ 1 error generated. Add -Wno-extra-semi-stmt to disable those errors if possible. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-20 10:52:39 +02:00
zdenop	b753ff62ee	Merge pull request #2445 from stweil/errcode Fix compiler warnings	2019-05-20 09:31:28 +02:00
Stefan Weil	64bdceee69	Fix compiler warnings This fixes lots of warnings related to ERRCODE like the following one: src/ccutil/errcode.h:81:15: warning: declaration requires a global constructor [-Wglobal-constructors] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-19 22:10:22 +02:00
Stefan Weil	09edd1a604	Fix out-of-bounds writes in Classify::ReadNewCutoffs The function did not correctly read Chinese unichars into the local Class variable if the locale was set to de_DE.UTF-8 (or other incompatible locales). That resulted in a wrong ClassId which was used to write into the Cutoffs array without checking for valid bounds. On macOS the result was a runtime error in baseapi_test (see GitHub issue #1250): [ RUN ] TesseractTest.InitConfigOnlyTest baseapi_test(21845,0x1134c45c0) malloc: * error for object 0x927f96c28005e0: pointer being freed was not allocated baseapi_test(21845,0x1134c45c0) malloc: * set a breakpoint in malloc_error_break to debug Replacing sscanf by std::istringstream fixes that. Add also an assertion to catch future out-of-bounds writes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-18 13:39:55 +02:00
Stefan Weil	639781b5c8	stringrenderer_test: Get system locale only once This fixes a runtime exception on macOS. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-18 13:24:13 +02:00
Stefan Weil	bb226c19ab	Update abseil submodule to HEAD Abseil suggests to use the latest code: https://abseil.io/about/philosophy#we-recommend-that-you-choose-to-live-at-head Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-17 15:03:43 +02:00
zdenop	2308cbf87f	Merge pull request #2444 from zdenop/fix_travis fix typo	2019-05-17 11:26:40 +02:00
zdenop	a54e345c9b	fix typo	2019-05-17 11:19:07 +02:00
Zdenko Podobný	5282cdf7be	another improvement for `ca0be2fb72`	2019-05-17 11:04:42 +02:00
Zdenko Podobný	e92a424efa	try to fix `ca0be2fb72`	2019-05-17 10:51:06 +02:00
zdenop	af3dd1af06	Merge branch 'master' of https://github.com/tesseract-ocr/tesseract	2019-05-16 23:19:42 +02:00
zdenop	ca0be2fb72	cmake: fix travis build	2019-05-16 23:18:13 +02:00
Stefan Weil	68d7a679e4	Replace CR-LF line endings by LF Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 20:49:01 +02:00
Stefan Weil	cc754ed1e0	Remove space at line endings Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 20:49:01 +02:00
zdenop	198bbe3df5	Merge pull request #2441 from stweil/linkfix Fix unittest build without legacy code and use locale for most unittests	2019-05-16 19:12:15 +02:00
Stefan Weil	8e7b1119b5	Run more unittests with the user's locale Hopefully this improves the test coverage. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 18:12:55 +02:00
Stefan Weil	59e31e958b	Fix more build errors for compilation without legacy engine Skip the tests which need the legacy code. Add also code to those tests to use the user's locale to test that, too. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 18:12:55 +02:00
Stefan Weil	780986ebfb	Fix linker error for baseapi_test when building without legacy engine Linker error reported in issue #2439: unittest/baseapi_test.cc:190: undefined reference to `tesseract::TessBaseAPI::AdaptToWordStr(tesseract::PageSegMode, char const*)' Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 18:12:55 +02:00
zdenop	3864d0d088	Merge pull request #2440 from stweil/linkfix Fix linker error for baseapi_test when building without legacy engine	2019-05-16 17:31:35 +02:00
Stefan Weil	f097b8a358	Fix linker error for baseapi_test when building without legacy engine Linker error reported in issue #2439: unittest/baseapi_test.cc:190: undefined reference to `tesseract::TessBaseAPI::AdaptToWordStr(tesseract::PageSegMode, char const*)' Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 17:17:56 +02:00
zdenop	7e9d2f4bc4	Merge pull request #2432 from nickjwhite/hocrmoretypes Add different classes to hocr output depending on BlockType	2019-05-16 17:02:48 +02:00
zdenop	b124a5f6ca	Merge pull request #2437 from stweil/locale-fix Fix some unittests with locale de_DE.UTF-8	2019-05-16 17:02:02 +02:00
Stefan Weil	331cc84d8d	Remove assertions for unsupported locale settings The latest code passed all unittests with locale de_DE.UTF-8 and has fixed the locale issues which were reported on GitHub. Therefore the assertions can be removed. Any remaining locale issue will be fixed when it is identified. To help find such remaining issues, debug code now uses the user's locale settings instead of the default "C" locale for all executables which use TessBaseAPI. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 13:59:39 +02:00
Stefan Weil	77f9bad3c2	Fix UNICHARSET::save_to_string for locale de_DE.UTF-8 That function writes float values which must always use '.' as the decimal separator, no matter what the current locale setting is. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 11:39:59 +02:00
Stefan Weil	36ed6da349	Fix baseapi_test with locale de_DE.UTF-8 The unittest failed with LANG=de_DE.UTF-8: $ unittest/baseapi_test Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc [==========] Running 12 tests from 2 test suites. [----------] Global test environment set-up. [----------] 10 tests from TesseractTest [ RUN ] TesseractTest.ArraySizeTest [ OK ] TesseractTest.ArraySizeTest (0 ms) [ RUN ] TesseractTest.BasicTesseractTest [ OK ] TesseractTest.BasicTesseractTest (1251 ms) [ RUN ] TesseractTest.IteratesParagraphsEvenIfNotDetected [ OK ] TesseractTest.IteratesParagraphsEvenIfNotDetected (347 ms) [ RUN ] TesseractTest.HOCRWorksWithoutSetInputName [ OK ] TesseractTest.HOCRWorksWithoutSetInputName (403 ms) [ RUN ] TesseractTest.HOCRContainsBaseline [ OK ] TesseractTest.HOCRContainsBaseline (389 ms) [ RUN ] TesseractTest.RickSnyderNotFuckSnyder [ OK ] TesseractTest.RickSnyderNotFuckSnyder (346 ms) [ RUN ] TesseractTest.AdaptToWordStrTest Trying to adapt "136 " to "1 3 6" Trying to adapt "256 " to "2 5 6" Trying to adapt "410 " to "4 1 0" Trying to adapt "432 " to "4 3 2" Trying to adapt "540 " to "5 4 0" Trying to adapt "692 " to "6 9 2" Trying to adapt "779 " to "7 7 9" Trying to adapt "793 " to "7 9 3" Trying to adapt "808 " to "8 0 8" Trying to adapt "815 " to "8 1 5" Trying to adapt "12 " to "1 2" Trying to adapt "12 " to "1 2" [ OK ] TesseractTest.AdaptToWordStrTest (788 ms) [ RUN ] TesseractTest.BasicLSTMTest [ OK ] TesseractTest.BasicLSTMTest (4525 ms) [ RUN ] TesseractTest.LSTMGeometryTest [ OK ] TesseractTest.LSTMGeometryTest (615 ms) [ RUN ] TesseractTest.InitConfigOnlyTest Error: unichar ? in normproto file is not in unichar set. Error: unichar 0.232621 in normproto file is not in unichar set. Error: unichar 0.000400 in normproto file is not in unichar set. Error: unichar 0.231864 in normproto file is not in unichar set. [...] Error: unichar ? in normproto file is not in unichar set. Error: unichar 0.233915 in normproto file is not in unichar set. Error: unichar 0.000400 in normproto file is not in unichar set. Error: unichar 0.221755 in normproto file is not in unichar set. Error: unichar 0.000400 in normproto file is not in unichar set. Error: unichar ? in normproto file is not in unichar set. baseapi_test(21845,0x1134c45c0) malloc: * error for object 0x927f96c28005e0: pointer being freed was not allocated baseapi_test(21845,0x1134c45c0) malloc: * set a breakpoint in malloc_error_break to debug [INFO] Lang eng took 327ms in regular init [INFO] Lang chi_tra took 1422ms in regular init Abort trap: 6 TesseractTest.InitConfigOnlyTest is fixed by using std::istringstream instead of sscanf. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 11:05:09 +02:00
Stefan Weil	0dcc889e8d	Fix apiexample_test with locale de_DE.UTF-8 The unittest failed with LANG=de_DE.UTF-8: $ unittest/apiexample_test Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc [==========] Running 4 tests from 2 test suites. [----------] Global test environment set-up. [----------] 1 test from EuroText [ RUN ] EuroText.FastLatinOCR contains_unichar_id(unichar_id):Error:Assert failed:in file ../../../../../src/ccutil/unicharset.h, line 874 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-15 22:43:47 +02:00


4108 Commits