tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-23 15:07:49 +08:00

Author	SHA1	Message	Date
zdenop	4bab7dd83d	Merge pull request #2451 from Bharat123rox/lgtm Some LGTM alert fixes and potential bugfixes	2019-05-22 12:19:44 +02:00
Egor Pugin	fea1f3970b	Merge pull request #2452 from stweil/tprintf tprintf: Make code reentrant and use less memory	2019-05-22 12:31:34 +03:00
Egor Pugin	8f99880a7a	Merge pull request #2453 from stweil/crashcode Remove SavePixForCrash and related code	2019-05-22 12:30:29 +03:00
Bharat123rox	bc3ea622a6	Fix bug in max_max_dist	2019-05-22 08:21:30 +02:00
Bharat123rox	0bf45e81e7	Fix LGTM and revert bugfix for later PR	2019-05-22 11:23:27 +05:30
Bharat123rox	945ccac85a	Fix syntax error	2019-05-22 10:10:12 +05:30
Stefan Weil	6514479e8f	Remove SavePixForCrash and related code That debugging code uses very much memory and is no longer useful. text data bss dec hex filename 815 0 262144 262959 4032f src/ccutil/globaloc.o Remove also the function err_exit which was only used in ccmain/reject.cpp. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-21 20:25:58 +02:00
Stefan Weil	078a129674	tprintf: Make code reentrant and use less memory Reduce the maximum message size from 64 KiB to 2 KiB which still should be large enough for trace messages. Create the smaller message on the stack instead of using a global array to allow reentrancy and to reduce the memory use of Tesseract. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-21 20:22:58 +02:00
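The idea in the tprintf commit above (format into a small stack buffer instead of a shared global array) can be sketched as follows. This is only an illustration: `format_trace` and `kMaxMessageSize` are hypothetical names, not Tesseract's actual tprintf code; only the 2 KiB limit is taken from the commit message.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical limit; the commit message names 2 KiB as the new maximum.
constexpr int kMaxMessageSize = 2048;

// Hypothetical stand-in for a trace function, not Tesseract's tprintf.
std::string format_trace(const char* format, ...) {
  // The buffer lives on the stack, so every call (and every thread) gets
  // its own copy. A global array would be shared and make the function
  // non-reentrant.
  char msg[kMaxMessageSize];
  va_list args;
  va_start(args, format);
  // vsnprintf never writes more than sizeof(msg) bytes and always
  // NUL-terminates, so overlong messages are truncated, not overflowed.
  std::vsnprintf(msg, sizeof(msg), format, args);
  va_end(args);
  return std::string(msg);
}
```

A call such as `format_trace("%d lines", 3)` returns the formatted text; messages longer than the buffer are cut off at 2047 characters.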
Bharat123rox	7f31a0634d	Some LGTM fixes and potential bugfixes	2019-05-21 23:24:50 +05:30
Stefan Weil	d2ca81e794	Remove local definition of M_PI It is defined for all platforms when math.h or cmath is included after defining the macro _USE_MATH_DEFINES. Define _USE_MATH_DEFINES before any include statement to make sure that M_PI gets defined. It is not necessary to define it conditionally only for Windows. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-20 21:18:52 +02:00
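A minimal sketch of the include order described in the M_PI commit above. The function `degrees_to_radians` is a hypothetical example, not code from the repository. `M_PI` is not part of the C++ standard; the sketch assumes a platform that provides it (for example MSVC with the macro defined, or glibc).

```cpp
// The macro has to be defined before the first include of <cmath> or
// <math.h>, otherwise MSVC does not declare M_PI.
#define _USE_MATH_DEFINES
#include <cmath>

// Hypothetical helper that relies on M_PI being available.
double degrees_to_radians(double degrees) {
  return degrees * M_PI / 180.0;
}
```

Defining the macro unconditionally is harmless on platforms that ignore it, which matches the commit's remark that a Windows-only condition is not needed.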
Stefan Weil	64bdceee69	Fix compiler warnings This fixes lots of warnings related to ERRCODE like the following one: src/ccutil/errcode.h:81:15: warning: declaration requires a global constructor [-Wglobal-constructors] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-19 22:10:22 +02:00
Stefan Weil	09edd1a604	Fix out-of-bounds writes in Classify::ReadNewCutoffs The function did not correctly read Chinese unichars into the local Class variable if the locale was set to de_DE.UTF-8 (or other incompatible locales). That resulted in a wrong ClassId which was used to write into the Cutoffs array without checking for valid bounds. On macOS the result was a runtime error in baseapi_test (see GitHub issue #1250): [ RUN ] TesseractTest.InitConfigOnlyTest baseapi_test(21845,0x1134c45c0) malloc: * error for object 0x927f96c28005e0: pointer being freed was not allocated baseapi_test(21845,0x1134c45c0) malloc: * set a breakpoint in malloc_error_break to debug Replacing sscanf by std::istringstream fixes that. Add also an assertion to catch future out-of-bounds writes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-18 13:39:55 +02:00
zdenop	7e9d2f4bc4	Merge pull request #2432 from nickjwhite/hocrmoretypes Add different classes to hocr output depending on BlockType	2019-05-16 17:02:48 +02:00
Stefan Weil	331cc84d8d	Remove assertions for unsupported locale settings The latest code passed all unittests with locale de_DE.UTF-8 and has fixed the locale issues which were reported on GitHub. Therefore the assertions can be removed. Any remaining locale issue will be fixed when it is identified. To help find such remaining issues, debug code now uses the user's locale settings instead of the default "C" locale for all executables which use TessBaseAPI. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 13:59:39 +02:00
Stefan Weil	77f9bad3c2	Fix UNICHARSET::save_to_string for locale de_DE.UTF-8 That function writes float values which must always use '.' as the decimal separator, no matter what the current locale setting is. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 11:39:59 +02:00
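The requirement in the save_to_string commit above (floats must always be written with '.' as the decimal separator) can be illustrated with a stream that is explicitly bound to the classic "C" locale. `format_double_c_locale` is a hypothetical helper for illustration, not the function changed in the commit.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: formats a value with a fixed number of decimals,
// always using '.' as the decimal separator.
std::string format_double_c_locale(double value, int precision) {
  std::ostringstream stream;
  // Bind the stream to the classic "C" locale so that neither setlocale()
  // nor a changed global C++ locale can alter the decimal separator.
  stream.imbue(std::locale::classic());
  stream.precision(precision);
  stream << std::fixed << value;
  return stream.str();
}
```

By contrast, the printf family follows the C locale set with `setlocale`, so under de_DE.UTF-8 it would write a decimal comma.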
Stefan Weil	36ed6da349	Fix baseapi_test with locale de_DE.UTF-8 The unittest failed with LANG=de_DE.UTF-8: $ unittest/baseapi_test Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc [==========] Running 12 tests from 2 test suites. [----------] Global test environment set-up. [----------] 10 tests from TesseractTest [ RUN ] TesseractTest.ArraySizeTest [ OK ] TesseractTest.ArraySizeTest (0 ms) [ RUN ] TesseractTest.BasicTesseractTest [ OK ] TesseractTest.BasicTesseractTest (1251 ms) [ RUN ] TesseractTest.IteratesParagraphsEvenIfNotDetected [ OK ] TesseractTest.IteratesParagraphsEvenIfNotDetected (347 ms) [ RUN ] TesseractTest.HOCRWorksWithoutSetInputName [ OK ] TesseractTest.HOCRWorksWithoutSetInputName (403 ms) [ RUN ] TesseractTest.HOCRContainsBaseline [ OK ] TesseractTest.HOCRContainsBaseline (389 ms) [ RUN ] TesseractTest.RickSnyderNotFuckSnyder [ OK ] TesseractTest.RickSnyderNotFuckSnyder (346 ms) [ RUN ] TesseractTest.AdaptToWordStrTest Trying to adapt "136 " to "1 3 6" Trying to adapt "256 " to "2 5 6" Trying to adapt "410 " to "4 1 0" Trying to adapt "432 " to "4 3 2" Trying to adapt "540 " to "5 4 0" Trying to adapt "692 " to "6 9 2" Trying to adapt "779 " to "7 7 9" Trying to adapt "793 " to "7 9 3" Trying to adapt "808 " to "8 0 8" Trying to adapt "815 " to "8 1 5" Trying to adapt "12 " to "1 2" Trying to adapt "12 " to "1 2" [ OK ] TesseractTest.AdaptToWordStrTest (788 ms) [ RUN ] TesseractTest.BasicLSTMTest [ OK ] TesseractTest.BasicLSTMTest (4525 ms) [ RUN ] TesseractTest.LSTMGeometryTest [ OK ] TesseractTest.LSTMGeometryTest (615 ms) [ RUN ] TesseractTest.InitConfigOnlyTest Error: unichar ? in normproto file is not in unichar set. Error: unichar 0.232621 in normproto file is not in unichar set. Error: unichar 0.000400 in normproto file is not in unichar set. Error: unichar 0.231864 in normproto file is not in unichar set. [...] Error: unichar ? in normproto file is not in unichar set. Error: unichar 0.233915 in normproto file is not in unichar set. Error: unichar 0.000400 in normproto file is not in unichar set. Error: unichar 0.221755 in normproto file is not in unichar set. Error: unichar 0.000400 in normproto file is not in unichar set. Error: unichar ? in normproto file is not in unichar set. baseapi_test(21845,0x1134c45c0) malloc: * error for object 0x927f96c28005e0: pointer being freed was not allocated baseapi_test(21845,0x1134c45c0) malloc: * set a breakpoint in malloc_error_break to debug [INFO] Lang eng took 327ms in regular init [INFO] Lang chi_tra took 1422ms in regular init Abort trap: 6 TesseractTest.InitConfigOnlyTest is fixed by using std::istringstream instead of sscanf. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 11:05:09 +02:00
Stefan Weil	0dcc889e8d	Fix apiexample_test with locale de_DE.UTF-8 The unittest failed with LANG=de_DE.UTF-8: $ unittest/apiexample_test Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc [==========] Running 4 tests from 2 test suites. [----------] Global test environment set-up. [----------] 1 test from EuroText [ RUN ] EuroText.FastLatinOCR contains_unichar_id(unichar_id):Error:Assert failed:in file ../../../../../src/ccutil/unicharset.h, line 874 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-15 22:43:47 +02:00
Stefan Weil	6b1e709b19	Fix Doxygen comments for void functions Void functions should not use @return. It causes compiler warnings like this one: src/classify/intproto.cpp:326:5: warning: '@return' command used in a comment that is attached to a function returning void [-Wdocumentation] Some non-void functions also were documented with @return none. Fix those comments, too. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-14 21:57:17 +02:00
Stefan Weil	caa04882fd	normmatch: Remove unused private function PrintNormMatch was unused. Remove it and remove also an unused prototype. Make the only remaining private function NormEvidenceOf static. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-14 20:56:04 +02:00
Nick White	068eb4c35d	Add different classes to hocr output depending on BlockType These classes are taken from the hOCR specification, and seem to map well onto the BlockType types. There are probably more that could be added.	2019-05-14 13:25:08 +01:00
Stefan Weil	5d92fbf010	Replace sscanf by std::istringstream Using std::istringstream allows conversion of string to float independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-12 15:04:30 +02:00
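A minimal sketch of the technique named in the commit above: reading a float through `std::istringstream` so that the result does not depend on the locale set with `setlocale`. `parse_float_c_locale` is a hypothetical helper; the explicit `imbue` call is an assumption about how one might pin the locale, not a quote from the actual change.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parses a float that uses '.' as decimal separator.
// Returns false if the text does not start with a valid number.
bool parse_float_c_locale(const std::string& text, float* value) {
  std::istringstream stream(text);
  // sscanf("%f") honours the C locale, so under de_DE.UTF-8 it expects
  // "0,232621" and stops at the '.' in "0.232621". A stream imbued with
  // the classic locale always expects '.'.
  stream.imbue(std::locale::classic());
  stream >> *value;
  return !stream.fail();
}
```

With this helper, `"0.232621"` parses to the same value regardless of `LANG`, which is the property the normproto reader needs.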
Stefan Weil	c76ceafcdf	Fix reading of parameter from traineddata normproto component The NonEssential parameter was wrongly derived from linear_token instead of essential_token and therefore always set to true. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-12 14:43:58 +02:00
Stefan Weil	c07bc4e014	Fix Doxygen comment Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-12 08:55:23 +02:00
Stefan Weil	c8e96e2c02	Fix cast from pointer to integer type Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-12 08:54:46 +02:00
zdenop	7a5b9b8fcd	ScrollView: remove custom implementation of GetAddrInfo	2019-05-04 15:16:41 +02:00
zdenop	5e01f74648	remove unused include	2019-05-04 15:14:54 +02:00
Stefan Weil	aba037329a	tesscallback: Remove more unused code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-04 11:05:50 +02:00
Stefan Weil	57ff92e4bf	tesscallback: Remove unused code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-02 22:14:04 +02:00
zdenop	9192c3afe2	correct tessdata comment in baseapi.h	2019-05-02 08:43:04 +02:00
zdenop	7e48368a5e	Merge pull request #2421 from stweil/includes universalambigs: Add missing include file	2019-05-02 08:36:49 +02:00
zdenop	39d3824c78	Merge pull request #2420 from stweil/locale Fix more locale dependencies	2019-05-02 08:31:41 +02:00
Stefan Weil	cd749be473	universalambigs: Add missing include file This allows fixing two compiler warnings from clang++: src/ccutil/universalambigs.cpp:23:19: warning: no previous extern declaration for non-static variable 'kUniversalAmbigsFile' [-Wmissing-variable-declarations] src/ccutil/universalambigs.cpp:19019:18: warning: no previous extern declaration for non-static variable 'ksizeofUniversalAmbigsFile' [-Wmissing-variable-declarations] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-02 07:36:31 +02:00
Stefan Weil	4fbc0a257b	commandlineflags: Replace strtod by std::stringstream Using std::stringstream allows conversion of string to double independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-02 07:33:46 +02:00
Stefan Weil	d047fa1d1b	paramsd: Replace strtod by std::stringstream Using std::stringstream allows conversion of string to double independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-02 07:33:46 +02:00
Stefan Weil	e3860e45b7	clusttool: Replace strtof by std::stringstream Using std::stringstream allows conversion of string to float independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-02 07:33:45 +02:00
Stefan Weil	ed45656ec8	clusttool: Remove unused code and some global functions * WriteProtoList is unused. Remove it. * ReadNFloats, WriteNFloats and WriteProtoStyle are only used locally, so make them local. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-02 07:33:45 +02:00
Stefan Weil	28a521fec2	Fix some typos (most found and fixed by codespell) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-01 20:30:41 +02:00
zdenop	41f50b19bb	fix crash in case of missing PNG support in Leptonica see #2333	2019-05-01 19:51:54 +02:00
zdenop	90aef80dd7	fix documentation about datapath: ending "/" is not relevant	2019-05-01 11:37:50 +02:00
Jeff Breidenbach	546a9e81eb	fix #1900 : intraword spacing for slightly better pdf copy-paste performance	2019-04-29 11:28:30 +02:00
zdenop	137e6de56f	Print info when uzn file is used.	2019-04-28 19:06:38 +02:00
Zdenko Podobný	80e54e401d	fix spelling	2019-04-24 15:35:22 +02:00
Zdenko Podobný	832c257771	remove unused variable	2019-04-24 14:55:35 +02:00
Stefan Weil	b7bc71e987	Fix build for Windows * winsock2.h is case sensitive, lower case is required for cross build. * ws2tcpip.h is required for addrinfo. * FreeAddrInfo conflicts with existing freeaddrinfo, so rename it. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-24 11:24:47 +02:00
zdenop	129fe95390	svutil.cpp: fix windows build	2019-04-23 23:03:28 +02:00
zdenop	7bacc8852b	Merge branch 'master' of https://github.com/tesseract-ocr/tesseract	2019-04-23 22:01:30 +02:00
zdenop	5c6ac61fe2	remove unused includes	2019-04-23 20:59:36 +02:00
zdenop	27f0f2ecea	MSVS support inttypes.h from VS 2015	2019-04-23 20:45:14 +02:00
Stefan Weil	708511adcb	Only include windows.h using host.h host.h sets the macros NOMINMAX and WIN32_LEAN_AND_MEAN which must be set before including windows.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-22 21:51:07 +02:00
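The ordering constraint in the commit above (NOMINMAX and WIN32_LEAN_AND_MEAN must be set before windows.h is included) can be sketched like this. It is a generic illustration, not the contents of Tesseract's host.h; `smaller` is a hypothetical function.

```cpp
#ifdef _WIN32
// Both macros only take effect if they are defined before <windows.h>.
#ifndef NOMINMAX
#define NOMINMAX  // stops windows.h from defining min() and max() macros
#endif
#ifndef WIN32_LEAN_AND_MEAN
#define WIN32_LEAN_AND_MEAN  // skips rarely used parts of the Windows API
#endif
#include <windows.h>
#endif

#include <algorithm>

// Without NOMINMAX, the min macro from windows.h would break std::min here.
int smaller(int a, int b) {
  return std::min(a, b);
}
```

Routing every include of windows.h through one header that sets the macros, as the commit describes, avoids a translation unit accidentally including it first without them.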
Stefan Weil	53f1265362	Clean macros in platform.h * Remove unused macros ultoa, SIGNED. * Move macros NOMINMAX and WIN32_LEAN_AND_MEAN to host.h because they are used when including windows.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-22 21:51:07 +02:00
Stefan Weil	3bd61bfae4	svutil: Clean include file * Remove MIN, MAX macros. They are unused. * Include windows.h indirectly by including host.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-22 21:51:07 +02:00
Stefan Weil	e12b99d49b	Remove host.h from Tesseract API It is not needed by other API header files. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-22 21:51:07 +02:00
Stefan Weil	8a34da027f	Fix typo in description Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-22 21:50:37 +02:00
Shree	f8fba6362b	fix the coordinates for EOL tab	2019-04-22 09:54:20 +00:00
zdenop	3ec7c22a87	fix missing EOL	2019-04-22 08:49:55 +02:00
Stefan Weil	09255ebe44	Don't include windows.h from platform.h This partially reverts commit `c150b9832d`. Now params.cpp includes host.h which also gets the definition for MAX_PATH. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-21 22:20:13 +02:00
zdenop	6781d78211	Merge pull request #2399 from stweil/pgedit pgedit: Remove unused global functions	2019-04-20 19:26:02 +02:00
Stefan Weil	4ac1fad18a	pdfrenderer: Replace snprintf by std::stringstream Using std::stringstream allows conversion of float to string independent of the current locale setting. Some snprintf statements are not needed at all because a constant string can be appended directly. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-20 19:05:29 +02:00
Stefan Weil	07d5365a1f	baseapi: Use std::stringstream to format float values Using std::stringstream allows conversion of float to string independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-20 19:05:29 +02:00
Stefan Weil	743fc2562d	Remove unneeded include statements for pgedit.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-20 19:00:07 +02:00
Stefan Weil	26dd0b82bf	pgedit: Remove unused global functions pgeditor_show_point is unused, so remove it completely. Some more functions are only used locally, so make them static functions. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-20 19:00:07 +02:00
Stefan Weil	217c2530e6	Remove strtofloat Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-19 11:19:04 +02:00
Stefan Weil	7c3f9000cd	Replace sscanf by std::stringstream Using std::stringstream allows working with the C locale, independent of the current locale settings. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-19 11:19:04 +02:00
Stefan Weil	5529a5db11	unittest: Fix and enable params_model_test This needs the latest test submodule. The test uses LoadFromFile which is not used otherwise, so remove that function from class ParamsModel. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-18 17:06:48 +02:00
Stefan Weil	a1ffcd3654	Use std::stringstream for add_str_double Using std::stringstream allows conversion of double to string independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-14 16:16:16 +02:00
Stefan Weil	aa64a63f69	Use std::stringstream to generate PDF output Using std::stringstream simplifies the code and allows conversion of double to string independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-14 16:15:39 +02:00
Stefan Weil	78a957b989	Remove spaces at line endings Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-13 18:54:42 +02:00
Stefan Weil	12ca2513d4	Revert "e" flag for fopen clang-tidy added it in commit `ac0b191f6b`. The "e" flag is an extension for glibc which sets the O_CLOEXEC flag, so the file handle is not leaked to child processes. It is not needed here. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-13 18:53:57 +02:00
Samuel Lee	e32b3360aa	Fix for MSVC LoadDataFromFile/SaveDataToFile use fopen with the file mode 'e', which is unsupported in MSVC.	2019-04-11 02:33:51 +09:00
Stefan Weil	f88a7f28e3	fontinfo: Fix wrong delete Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-07 12:16:04 +02:00
Stefan Weil	3dfe1b8807	classify: Modernize function UniformDensity This should fix an issue reported by Codacy. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-07 12:13:45 +02:00
Stefan Weil	72c874140e	Modernize code by replacing C type casts This was done using clang-tidy. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-07 09:04:51 +02:00
zdenop	95a15a7a82	fix cmake&clang build	2019-04-06 15:31:53 +02:00
zdenop	ab09b09da6	Merge pull request #2294 from bertsky/lstm-with-char-whitelist trying to add tessedit_char_whitelist etc. again:	2019-04-06 14:41:30 +02:00
Robert Schubert	25a42ea42f	fixed failure report for tesstrain commands: - with `set -e` in effect, looking at stdout to detect failure is too late	2019-04-06 08:13:03 +02:00
Robert Schubert	d5584e793e	fixed failure report for tesstrain commands: - with `set -e` in effect, it does not make sense to query `$?` indirectly	2019-04-06 08:13:03 +02:00
zdenop	be617b3722	Merge pull request #2361 from Shreeshrii/truth Change message display for debug_level -1 during lstmtraining	2019-04-05 10:52:21 +02:00
zdenop	2982cb4ff3	Merge pull request #2368 from amitdo/no-legacy-fix disable-legacy build: Do not include unused headers	2019-04-05 09:35:04 +02:00
Stefan Weil	d35a6f2de5	Modernize code (clang-tidy check modernize-deprecated-headers) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-05 08:29:00 +02:00
Stefan Weil	20d5eedd45	Modernize code (clang-tidy check modernize-loop-convert) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-05 08:29:00 +02:00
amitdo	fab9a54981	Remove unneeded 'SUBDIRS=' from 3 Makefile.am files	2019-04-04 19:31:39 +02:00
Shree	6673347986	Change page to line in message	2019-04-04 15:43:29 +00:00
Shree	51c3535310	Always display GROUND TRUTH. BEST OCR and ALIGNED TRUTH only if different for debug_level -1	2019-04-04 15:33:22 +00:00
Shree	84d4cc2e95	Display OCR TEXT and GROUND TRUTH only when different for debug_level = -1	2019-04-04 15:33:22 +00:00
Amit D	2069c057d6	Merge branch 'master' into no-legacy-fix	2019-04-04 18:26:22 +03:00
Egor Pugin	2a1d238bd5	Merge pull request #2366 from stweil/modernize Modernize code with "using"	2019-04-04 15:13:10 +03:00
amitdo	546014aecd	disable-legacy build: Do not include unused headers	2019-04-04 15:09:08 +03:00
Stefan Weil	98346c2cd4	Modernize and format code The code was modernized using clang-tidy with "modernize-use-using". The modified files were then formatted using clang-tidy with "google-readability-braces-around-statements", then clang-format was applied. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-03 21:02:23 +02:00
Shreeshrii	613c2bf6e4	Change pages to lines in message The pages variables refer to the lines in document. This change makes the messages clearer without changing the variable names.	2019-04-03 10:41:14 +05:30
Egor Pugin	af7cc1ce4c	Fix windows build.	2019-04-01 22:38:01 +03:00
Stefan Weil	81fbd878dd	Add more missing include statements for Windows build Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-01 08:10:25 +02:00
Stefan Weil	ab009fae94	Remove macro WINDLLNAME It is now no longer used. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 20:05:41 +02:00
Stefan Weil	77a5f2623e	Remove unused config variable tessedit_module_name It was only defined for Windows builds. Use also false instead of 0 to set the default value of two boolean config variables. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 20:04:00 +02:00
Stefan Weil	c150b9832d	Add missing include statements for Windows build The last commits which removed BOOL8 had broken the Windows build. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 19:02:29 +02:00
Stefan Weil	802f42e821	Remove BOOL8, TRUE, FALSE from host.h Remove unneeded include statements for host.h, add required ones and update the comments for the remaining include statements. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 18:27:20 +02:00
Stefan Weil	be96b7b660	bits16: Format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 18:26:50 +02:00
Stefan Weil	146079f31d	api: Replace BOOL8, TRUE, FALSE by bool, true, false and modernize code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 18:15:53 +02:00
Stefan Weil	4e0c726d6c	ccutil: replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:56:47 +02:00
Stefan Weil	da0c14ae45	cutil: Replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:56:19 +02:00
Stefan Weil	87a973652c	classify: Replace BOOL8, TRUE, FALSE by bool, true, false Simplify also some related code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:55:48 +02:00
Stefan Weil	30ee3afc29	textord: Replace TRUE, FALSE by true, false and use bool instead of BOOL8 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:55:20 +02:00
Stefan Weil	b391ab84d0	wordrec: Replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:54:21 +02:00
Stefan Weil	cbb5e729a1	classify: Use bool and replace TRUE, FALSE Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:53:50 +02:00
Stefan Weil	46fa59aadc	ccstruct: Replace BOOL8, TRUE, FALSE by bool, true, false and modernize code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:53:06 +02:00
Stefan Weil	92b9f9f8de	ccmain: Replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:52:09 +02:00
Stefan Weil	7db25e15c0	Remove unused config variable tessedit_single_match Replace also TRUE, FALSE by true, false. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:38:35 +02:00
Stefan Weil	ca2947a2c0	blobclass: Remove unused macros Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:36:46 +02:00
Stefan Weil	f2bd98e656	PageIterator: Remove useless const Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:35:43 +02:00
Stefan Weil	813b7803e0	pgedit: Replace BOOL8 by bool Replace also TRUE, FALSE by true, false and add some static attributes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:29:15 +02:00
Stefan Weil	664811a869	Replace BOOL8, TRUE, FALSE by bool, true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:28:28 +02:00
Stefan Weil	51a2c2eae8	Format code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:24:02 +02:00
Stefan Weil	95ea778745	capi: Replace FALSE, TRUE and simplify and format code Format code using clang-format and clang-tidy. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:19:04 +02:00
Stefan Weil	89ba48b106	strngs: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:13:38 +02:00
Stefan Weil	127d0e31f0	serialis: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:12:11 +02:00
Stefan Weil	8b663e7620	helpers: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:06:19 +02:00
zdenop	3bb8f9cd49	Merge branch 'master' of https://github.com/tesseract-ocr/tesseract	2019-03-31 16:54:15 +02:00
zdenop	5f06402755	python: optimize imports, reformat code	2019-03-31 16:53:39 +02:00
zdenop	2e9fd69c9e	use 'import pathlib'; fix "TypeError: argument of type 'WindowsPath' is not iterable"	2019-03-31 16:53:33 +02:00
zdenop	a0527b41bd	fix LGTM reports for python	2019-03-31 16:53:25 +02:00
Stefan Weil	1948f0d520	ocrclass: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:39:44 +02:00
Stefan Weil	85957e9673	WERD: Don't print space character after "FALSE" at end of line Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:32:42 +02:00
Stefan Weil	83d4433d3b	Modernize and format unichar.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:30:15 +02:00
Stefan Weil	ac0b191f6b	Modernize and format genericvector.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:21:32 +02:00
Stefan Weil	36ed08636b	Modernize and format tesscallback.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:16:00 +02:00
zdenop	f47c7c92dd	fix uninitialized variables in wordstrboxrenderer and lstmboxrenderer; CID 1399132, 1399134, 1399135, 1399137, 1399140, 1399141, 1399142	2019-03-31 12:26:49 +02:00
Shreeshrii	ea36e94e58	fix Could not parse bool from flag (#2359 )	2019-03-29 14:50:21 +01:00
Stefan Weil	852598eecf	Remove file tessedit.h It only declared the unused global variable global_monitor which is now removed, too. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-27 19:03:42 +01:00
Stefan Weil	6e59abcce2	Remove file cutil.h It only contained three type definitions which fit better in other include files. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-27 19:03:42 +01:00
Stefan Weil	b6bfb20f1d	Improve readability of conditional code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 12:35:56 +01:00
Stefan Weil	36a1a30c22	Remove some old type casts Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 12:35:56 +01:00
Stefan Weil	a44bf41f14	Modernize C++ loops The modifications were done using this command: run-clang-tidy-8.py -header-filter='.' -checks='-,modernize-loop-convert' -fix Then the resulting code was cleaned manually. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 08:38:21 +01:00
Stefan Weil	ed011670c8	Modernize C++ code using bool literals The modifications were done using this command: run-clang-tidy-8.py -header-filter='.' -checks='-,modernize-use-bool-literals' -fix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 07:58:02 +01:00
Stefan Weil	a0fd90583b	Modernize C++ code using auto The modifications were done using this command: run-clang-tidy-8.py -header-filter='.' -checks='-,modernize-use-auto' -fix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 07:55:08 +01:00
Stefan Weil	36f768853a	Modernize C++ code using override The modifications were done using this command: run-clang-tidy-8.py -header-filter='.' -checks='-,modernize-use-override' -fix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 07:37:52 +01:00
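The four clang-tidy checks named in the commits above (modernize-loop-convert, modernize-use-bool-literals, modernize-use-auto, modernize-use-override) each perform a mechanical rewrite. The sketch below shows the kind of result each one produces on made-up code; `Shape`, `Triangle`, `total_sides` and `has_triangle` are hypothetical and do not appear in Tesseract.

```cpp
#include <vector>

struct Shape {
  virtual ~Shape() = default;
  virtual int sides() const { return 0; }
};

struct Triangle : Shape {
  // modernize-use-override: was `virtual int sides() const { ... }`
  int sides() const override { return 3; }
};

int total_sides(const std::vector<const Shape*>& shapes) {
  int total = 0;
  // modernize-loop-convert: was `for (size_t i = 0; i < shapes.size(); ++i)`
  for (const Shape* shape : shapes) {
    total += shape->sides();
  }
  return total;
}

bool has_triangle(const std::vector<const Shape*>& shapes) {
  // modernize-use-bool-literals: was `bool found = 0;`
  bool found = false;
  // modernize-use-auto: was `std::vector<const Shape*>::const_iterator it`
  for (auto it = shapes.begin(); it != shapes.end(); ++it) {
    if ((*it)->sides() == 3) found = true;
  }
  return found;
}
```

None of these rewrites is meant to change behavior, although the loop-convert commit notes that its result was cleaned up manually afterwards.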
Stefan Weil	f877640bc9	Merge pull request #2319 from bertsky/tesstrain-parallel-wait-retval tesstrain: check failure of subjobs	2019-03-25 16:10:09 +01:00
Stefan Weil	d8d2f6f48a	Fix broken shell scripts for training Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-25 15:32:43 +01:00
Stefan Weil	631882a346	Fix compiler warnings (signed / unsigned mismatch) clang warnings: src/ccutil/unicharcompress.cpp:172:27: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] src/lstm/recodebeam.cpp:129:29: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare] src/lstm/recodebeam.cpp:276:48: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare] unittest/imagedata_test.cc:101:21: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] unittest/linlsq_test.cc:33:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] unittest/linlsq_test.cc:44:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] unittest/nthitem_test.cc:27:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare] unittest/nthitem_test.cc:68:21: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare] unittest/stats_test.cc:26:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-25 08:36:07 +01:00
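The -Wsign-compare warnings quoted above come from comparing an `int` with the unsigned `size_type` returned by `std::vector::size()`. One common way to remove the warning is to give the loop counter the unsigned type, as in this hypothetical example (`count_positive` is not a Tesseract function, and the actual commit may have used casts or other fixes instead).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: counts the positive entries of a vector.
int count_positive(const std::vector<int>& values) {
  int count = 0;
  // `int i` would trigger -Wsign-compare in `i < values.size()`;
  // std::size_t matches the unsigned type returned by size().
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i] > 0) ++count;
  }
  return count;
}
```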
Stefan Weil	ecaad2aca8	ccstruct/werd: Format code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-25 07:57:34 +01:00
Stefan Weil	b1e305f38c	Simplify code which tests for non-empty StringParam Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:35:52 +01:00
Stefan Weil	f9860cda41	Optimize functions ResetFrom The loop can terminate as soon as the parameter name was found. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:21:23 +01:00
Stefan Weil	41da5afe9d	UNICHARSET: Fix compiler warning (signed/unsigned mismatch) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:18:21 +01:00
Stefan Weil	91e2b253c0	Format modified code with clang-format Format the files which were changed in commit `297d7d86ce`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:10:29 +01:00
Stefan Weil	06acbaf99c	IntegerMatcher: Fix division by zero Credit to OSS-Fuzz which reported this issue: intmatcher.cpp:1231:62: runtime error: division by zero #0 0x6119d5 in IntegerMatcher::ApplyCNCorrection(float, int, int, int) tesseract/src/classify/intmatcher.cpp:1231:62 #1 0x5fe9c4 in tesseract::Classify::ComputeCorrectedRating(bool, int, double, double, int, int, int, int, int, unsigned char const) tesseract/src/classify/adaptmatch.cpp:1213:29 #2 0x5fdc22 in tesseract::Classify::ExpandShapesAndApplyCorrections(ADAPT_CLASS_STRUCT, bool, int, int, int, float, int, int, unsigned char const, tesseract::UnicharRating, ADAPT_RESULTS) tesseract/src/classify/adaptmatch.cpp:1184:13 #3 0x5fe421 in tesseract::Classify::MasterMatcher(INT_TEMPLATES_STRUCT, short, INT_FEATURE_STRUCT const, unsigned char const, ADAPT_CLASS_STRUCT, int, int, TBOX const&, GenericVector<CP_RESULT_STRUCT> const&, ADAPT_RESULTS) tesseract/src/classify/adaptmatch.cpp:1119:5 #4 0x6003eb in tesseract::Classify::CharNormTrainingSample(bool, int, tesseract::TrainingSample const&, GenericVector<tesseract::UnicharRating>*) tesseract/src/classify/adaptmatch.cpp:1374:5 See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13712. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 19:39:31 +01:00
Stefan Weil	58423d2f6c	Merge pull request #2328 from bertsky/lstm-with-user-patterns2 Add user words / patterns again	2019-03-24 19:38:40 +01:00
zdenop	0d36d9a9d7	Merge pull request #2341 from Shreeshrii/fix Fix	2019-03-24 18:21:09 +01:00
Stefan Weil	da6305b632	Fix compiler warnings caused by ASSERT_HOST The modified definition avoids warnings caused by redundant semicolons. Now a semicolon is required when using the macro, so a few code locations had to be updated. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 17:47:04 +01:00
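The ASSERT_HOST commit above describes a macro that now requires a trailing semicolon at the call site. A common way to get that property is the `do { ... } while (0)` idiom, sketched here with a hypothetical `CHECK_HOST` macro; this is an assumption for illustration and not necessarily how ASSERT_HOST is defined.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical macro. The body has no trailing semicolon, so the caller
// must write one, and `CHECK_HOST(x);` expands to exactly one statement.
#define CHECK_HOST(x)                                               \
  do {                                                              \
    if (!(x)) {                                                     \
      std::fprintf(stderr, "Assert failed: %s, file %s, line %d\n", \
                   #x, __FILE__, __LINE__);                         \
      std::abort();                                                 \
    }                                                               \
  } while (0)

// Hypothetical caller: because the macro is a single statement, it can be
// used in an unbraced if/else without a stray empty statement.
int checked_half(int value) {
  if (value >= 0)
    CHECK_HOST(value % 2 == 0);
  else
    return 0;
  return value / 2;
}
```

If the macro definition itself ended in a semicolon, `CHECK_HOST(x);` would expand to two statements, which causes the redundant-semicolon warnings and would break the `else` above.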
Stefan Weil	44a6d9f4d4	intmatcher: Catch more out of bounds reads Credit to OSS-Fuzz which reported this issue: intmatcher.cpp:1121:17: runtime error: index 24 out of bounds for type 'uint8_t [24]' #0 0x61034b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT, unsigned int, short) tesseract/src/classify/intmatcher.cpp:1121:17 #1 0x60f560 in IntegerMatcher::Match(INT_CLASS_STRUCT, unsigned int, unsigned int, short, INT_FEATURE_STRUCT const, tesseract::UnicharRating, int, int, bool) tesseract/src/classify/intmatcher.cpp:514:11 #2 0x5f3a25 in tesseract::Classify::AdaptToChar(TBLOB, int, int, float, ADAPT_TEMPLATES_STRUCT) tesseract/src/classify/adaptmatch.cpp:894:9 #3 0x5f2ccd in tesseract::Classify::LearnPieces(char const, int, int, float, tesseract::CharSegmentationType, char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:430:5 #4 0x5f16ee in tesseract::Classify::LearnWord(char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:293:7 This catches the out of bounds data reads in release builds. Add also assertions for debug builds. See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13818. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 17:27:43 +01:00
Stefan Weil	5fd7228414	intmatcher: Catch out of bounds reads Credit to OSS-Fuzz which reported this issue: intmatcher.cpp:1163:17: runtime error: index 24 out of bounds for type 'uint8_t [24]' #0 0x610d3b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT, unsigned int) tesseract/src/classify/intmatcher.cpp:1163:17 #1 0x60ff4e in IntegerMatcher::Match(INT_CLASS_STRUCT, unsigned int, unsigned int, short, INT_FEATURE_STRUCT const, tesseract::UnicharRating, int, int, bool) tesseract/src/classify/intmatcher.cpp:563:11 #2 0x5f4355 in tesseract::Classify::AdaptToChar(TBLOB, int, int, float, ADAPT_TEMPLATES_STRUCT) tesseract/src/classify/adaptmatch.cpp:894:9 #3 0x5f35fd in tesseract::Classify::LearnPieces(char const, int, int, float, tesseract::CharSegmentationType, char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:430:5 #4 0x5f201e in tesseract::Classify::LearnWord(char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:293:7 This catches the out of bounds data reads, but does not fix the primary reason: ProtoLengths currently gets values which are larger than the allowed index. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 15:44:33 +01:00
Stefan Weil	509ee95023	IntegerMatcher: Fix data type of loop counters ClassTemplate->ProtoLengths[n] is of type uint8_t, so use that for the related loop counters, too. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 15:35:06 +01:00
Stefan Weil	f4f34a87db	WERD_RES: Fix uninitialized member variable Credit to OSS-Fuzz which reported this issue: pageres.cpp:1143:7: runtime error: load of value 249, which is not a valid value for type 'bool' #0 0x6ba560 in WERD_RES::Clear() tesseract/src/ccstruct/pageres.cpp:1143:7 #1 0x6b9fd1 in WERD_RES::operator=(WERD_RES const&) tesseract/src/ccstruct/pageres.cpp:193:3 #2 0x49a9ad in WERD_RES::WERD_RES(WERD_RES const&) tesseract/src/ccstruct/pageres.h:356:11 See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13707. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 14:59:08 +01:00
Stefan Weil	afc099b9f4	intmatcher: Split data_table The old code was a hack to improve the performance. The new code is clearer and results in the same binary when compiling with gcc 8.3.0, so it looks like the old hack is no longer needed with modern compilers. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 08:15:40 +01:00
Shreeshrii	8749f3553e	LINEDATA=false	2019-03-23 19:16:49 +05:30
Shree	bcb7cf9846	sort arguments, use true/false instead of 1/0	2019-03-23 12:28:53 +00:00
Shree	c2db272134	Modify distort_image for Boolean	2019-03-22 17:02:46 +00:00
Shree	259d5af6b1	Add PSM values to the definition	2019-03-22 15:29:02 +00:00
Shree	8eafec0d17	Fix comments with current values of PSM codes	2019-03-22 14:10:49 +00:00
Stefan Weil	e1e56d9d66	Remove local function declarations from intmatcher.h This requires moving the local function HeapSort to the beginning. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-22 11:39:39 +01:00
Stefan Weil	2ba194ca8d	Remove four unused parameters This fixes some compiler warnings: src/classify/intmatcher.cpp:711:63: warning: unused parameter ‘ConfigMask’ [-Wunused-parameter] src/classify/intmatcher.cpp:1007:16: warning: unused parameter ‘ProtoMask’ [-Wunused-parameter] src/classify/intmatcher.cpp:1095:61: warning: unused parameter ‘NumFeatures’ [-Wunused-parameter] src/classify/intmatcher.cpp:1136:59: warning: unused parameter ‘used_features’ [-Wunused-parameter] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-22 11:30:24 +01:00
Stefan Weil	dd79d56e9f	Remove unused parameter BlobLength This fixes two compiler warnings: src/classify/intmatcher.cpp:553:14: warning: unused parameter ‘BlobLength’ [-Wunused-parameter] src/classify/intmatcher.cpp:622:14: warning: unused parameter ‘BlobLength’ [-Wunused-parameter] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-22 11:17:19 +01:00
Shree	9b915d5efb	add --distort_image	2019-03-22 05:39:38 +00:00
Shree	f7ffde99d5	add --distort_image	2019-03-22 05:34:00 +00:00
zdenop	ac7ea4322a	Merge pull request #2335 from Shreeshrii/master Changes to tesstrain.py - max_workers=8, distort_image=false	2019-03-17 15:27:34 +01:00
zdenop	26877ba703	check min. python version; os.uname is not available on windows	2019-03-17 15:25:48 +01:00
Shreeshrii	f8e8521606	Update tesstrain_utils.py	2019-03-17 15:32:35 +05:30
Shree	6fa8e1bb15	Set max_workers=8	2019-03-17 09:58:11 +00:00
Shree	e21499e81e	Set default value for distort_image	2019-03-17 09:54:16 +00:00
Stefan Weil	ee2f9bf7bf	Remove old comments in file headers Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-16 10:55:00 +01:00
Shree	d47b0d588a	Use LATIN_FONTS for kmr	2019-03-15 15:47:56 +00:00
Shree	3eee1d217a	Add kmr and kur_ara, remove kur from training scripts	2019-03-15 15:37:49 +00:00
Robert Schubert	297d7d86ce	trying to add user words/patterns again: - pass in ParamsVectors from Tesseract (carrying values from langdata/config/api) into LSTMRecognizer::Load and LoadDictionary - after LSTMRecognizer's Dict is initialised (with default values), reset the variables user_{words,patterns}_{suffix,file} from the corresponding entries in the passed vector	2019-03-15 16:06:19 +01:00
Shree	b2ebf0195f	Add kmr and kur_ara, remove kur from training scripts	2019-03-15 14:39:39 +00:00
Shree	37befdf6c4	Add option for --distort_image	2019-03-15 13:32:36 +00:00
zdenop	0a36b38169	Merge pull request #2317 from eighttails/master Added missing linker flags for MinGW.	2019-03-15 08:01:21 +01:00
Robert Schubert	14346e56b0	tesstrain: catch+handle SIGINT (to stop waiting on subjobs)	2019-03-15 00:03:16 +01:00
Robert Schubert	6cbad17e30	tesstrain: check all subjobs' retval	2019-03-14 14:38:51 +01:00
Robert Schubert	5316bcbb94	tesstrain: check failure of subjobs	2019-03-14 11:42:01 +01:00
Stefan Weil	4c2bbebecc	Fix compiler warning (-Wunused-value) Warning from clang++: ..\src\ccmain\ltrresultiterator.cpp(454,8): warning: expression result unused [-Wunused-value] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-13 20:56:03 +01:00
Stefan Weil	ed84ba0a44	Fix wrong comparison symbol_steps is a vector, so testing for a nullptr was wrong. clang++ reports: ..\src\ccmain\ltrresultiterator.cpp(440,19): warning: comparison of address of 'this->word_res_->symbol_steps' equal to a null pointer is always false [-Wtautological-pointer-compare] if (&word_res_->symbol_steps == nullptr || !LSTM_mode_) return nullptr; ~~~~~~~~~~~^~~~~~~~~~~~ ~~~~~~~ Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-13 20:38:38 +01:00
Tadahito Yao	bbbd262a8d	Added missing linker flags for MinGW.	2019-03-13 22:10:36 +09:00
jm server2	1206362d30	`accumulated_timesteps` is not a pointer but a vector and in case we use ChoiceIterator without `lstm_choice_mode` tesseract crashes (or similar) because the check is true and we reference not existing item	2019-03-13 12:55:14 +01:00
Stefan Weil	3baf0d8076	Fix boolean assignments Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 15:34:24 +01:00
Stefan Weil	8ad0489f0f	Remove svpaint.cpp from libtesseract svpaint is a standalone application (it includes a main function) and should not be part of the Tesseract library. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 12:22:53 +01:00
zdenop	7546a01020	Merge pull request #2310 from noahmetzger/LSTMChoiceRIL Lstm choice ril	2019-03-12 10:46:11 +01:00
Stefan Weil	35a999f91a	Fix assertion caused by wrong unicharset Credit to OSS-Fuzz: it found another case which triggered this assertion: contains_unichar_id(unichar_id):Error:Assert failed:in file ../../src/ccutil/unicharset.h, line 502 This is the OSS-Fuzz testcase: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13662 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 09:31:21 +01:00
Stefan Weil	56a39bda77	Fix float division by zero That runtime error is normally not visible because it does not abort the program, but is detected when the code was compiled with sanitizers. It can be triggered with this OSS-Fuzz testcase: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13662 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 09:28:16 +01:00
Noah Metzger	5b3e2fe812	Integrated accumulated Symbol Choice in the Choice Iterator and made the api lstm_choice_mode independent Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-03-12 09:15:10 +01:00
Stefan Weil	4c0b98bd12	Replace undefined shift operations by multiplications Shift operations are undefined for negative numbers, but at least on Intel they return the same value as a multiplication with 2 ^ shift value. This fixes runtime errors reported by sanitizers and OSS-Fuzz: intmatcher.cpp:821:59: runtime error: left shift of negative value -14 intmatcher.cpp:823:75: runtime error: left shift of negative value -512 intmatcher.cpp:820:50: runtime error: left shift of negative value -80 See issue #2297 and https://oss-fuzz.com/testcase-detail/4845195990925312 for details. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 06:56:54 +01:00
Stefan Weil	896698a4f5	Fix runtime error (left shift of negative value) Runtime error: src/training/util.h:37:28: runtime error: left shift of negative value -17 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 06:56:54 +01:00
Stefan Weil	5202208a8c	Remove globals.h It only included other files which are already included where needed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-11 19:01:23 +01:00
Noah Metzger	bc2b919805	Integrated Timesteps per symbol into ChoiceIterator Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-03-11 10:50:56 +01:00
Noah Metzger	754e38d2b4	Added the option to get the timesteps separated by the suggested segmentation Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-03-11 10:50:56 +01:00
zdenop	e817607280	archive_version_details is available from libArchive version 3.2.0	2019-03-10 22:57:48 +01:00
zdenop	5cfe4cc1f0	Merge pull request #2286 from Shreeshrii/lstmbox Rename function to TessBaseAPIGetTsvText to be consistent to Create method	2019-03-10 21:41:52 +01:00
zdenop	02a1ffe87a	Report libArchive support	2019-03-10 20:08:45 +01:00
Stefan Weil	b3aff7d633	Fix Index-out-of-bounds in IntegerMatcher::UpdateTablesForFeature This fixes issue #2299, an issue which was already reported by static code analyzers and now by OSS-Fuzz, see details at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13597. The Tesseract code assigns an address which is out-of-bounds to a pointer variable, but increments that variable later. So this is a false positive. Change the code nevertheless to satisfy OSS-Fuzz. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-10 18:26:40 +01:00
Stefan Weil	91d0a71d51	Fix assertion caused by wrong unicharset (issue #2301 ) Credit to OSS-Fuzz: This fixes an issue which was reported by OSS-Fuzz, see details at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13592. OSS-Fuzz triggered this assertion: contains_unichar_id(unichar_id):Error:Assert failed:in file ../../src/ccutil/unicharset.h, line 502 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-10 16:42:54 +01:00
Stefan Weil	71d4990c6d	Fix Heap-buffer-overflow in GenericVector<int>::size (issue #2298 ) Credit to OSS-Fuzz: This fixes a security issue which was reported by OSS-Fuzz, see details at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13590. Add also some assertions to catch similar bugs. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-10 16:12:30 +01:00
Robert Schubert	3912cb1c33	LSTM char_whitelist/blacklist (`6ac2ff0`): more robust - unicharset can be null too	2019-03-09 10:40:40 +01:00
Robert Schubert	b45999088c	LSTM char_whitelist/blacklist (`6ac2ff0`): multi-code chars - move decision from ComputeTopN to ContinueContext, where it belongs: block context continuations which emit final codes translating to disabled unichar_ids. (The normal logic for fallback from top2 > top2 > rest will apply.) - pass UNICHARSET refs appropriately	2019-03-08 12:30:16 +01:00
Robert Schubert	8012d5e653	LSTM char_whitelist/blacklist (`6ac2ff0`): also sublangs	2019-03-07 18:32:50 +01:00
Robert Schubert	6ac2ff083e	trying to add tessedit_char_whitelist etc. again: - ignore matrix outputs in ComputeTopN if they belong to a disabled unichar_id - pass UNICHARSET refs to check that - in SetBlackAndWhitelist, also update the unicharset of the lstm_recognizer_ instance, if any	2019-03-07 01:37:23 +01:00
zdenop	f80085c0bf	Merge pull request #2289 from Armyke/master Added an additional optional --tmp_dir parameter to specify the tempo…	2019-03-06 15:03:14 +01:00
Stefan Weil	1c7e00611b	Add initial support for traineddata files in standard archive formats This requires libarchive-dev. Tesseract can now load traineddata files in any of the archive formats which are supported by libarchive. Example of a zipped BagIt archive: $ unzip -l /usr/local/share/tessdata/zip.traineddata Archive: /usr/local/share/tessdata/zip.traineddata Length Date Time Name --------- ---------- ----- ---- 55 2019-03-05 15:27 bagit.txt 0 2019-03-05 15:25 data/ 1557 2019-03-05 15:28 manifest-sha256.txt 1082890 2019-03-05 15:25 data/eng.word-dawg 1487588 2019-03-05 15:25 data/eng.lstm 7477 2019-03-05 15:25 data/eng.unicharset 63346 2019-03-05 15:25 data/eng.shapetable 976552 2019-03-05 15:25 data/eng.inttemp 13408 2019-03-05 15:25 data/eng.normproto 4322 2019-03-05 15:25 data/eng.punc-dawg 4738 2019-03-05 15:25 data/eng.lstm-number-dawg 1410 2019-03-05 15:25 data/eng.freq-dawg 844 2019-03-05 15:25 data/eng.pffmtable 6360 2019-03-05 15:25 data/eng.lstm-unicharset 1012 2019-03-05 15:25 data/eng.lstm-recoder 1047 2019-03-05 15:25 data/eng.unicharambigs 4322 2019-03-05 15:25 data/eng.lstm-punc-dawg 16109842 2019-03-05 15:25 data/eng.bigram-dawg 80 2019-03-05 15:25 data/eng.version 6426 2019-03-05 15:25 data/eng.number-dawg 3694794 2019-03-05 15:25 data/eng.lstm-word-dawg --------- ------- 23468070 21 files `combine_tessdata -d` and `combine_tessdata -u` also work. The traineddata files in the new format can be generated with standard tools like zip or tar. More work is needed for other training tools and big endian support. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-05 17:18:48 +01:00
Armyke	56b04d4ea7	Added the same --tmp_dir flag to tesstrain_utils.sh	2019-03-04 14:05:25 +00:00
Armyke	25fa392887	Added an additional optional --tmp_dir parameter to specify the temporary directory in which tesstrain.py creates the training temporary files. The main reason is due to the slow R/W on HDD, if anyone wants to speed up this process can use as tmp_dir a directory on an SSDrive	2019-03-04 13:26:53 +00:00
Stefan Weil	7fbde96a04	Format new code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 20:26:07 +01:00
Stefan Weil	38fac625cd	Format new code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 20:01:48 +01:00
Shree	a0202bac70	Rename function to TessBaseAPIGetTsvText to be consistent to the Create method	2019-03-02 16:29:53 +00:00
zdenop	5de2a21b3f	Merge pull request #2283 from Shreeshrii/lstmbox Add missing renderers to C-API	2019-03-02 15:15:34 +01:00
Stefan Weil	9c90894ff0	PAGE_RES_IT: Optimize compare operators by using inline code Avoiding a function call will make both == and != operator faster. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:57:16 +01:00
Stefan Weil	295996ed05	commandlineflags: Fix compiler warnings (signed/unsigned) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:21:04 +01:00
Stefan Weil	eb14726aac	ICOORD: Fix old type casts This fixes compiler warnings and avoids unnecessary conversions between float and double. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:04:54 +01:00
Stefan Weil	fb0f1bcf66	BoxChar: Fix compiler warnings (signed/unsigned) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:04:54 +01:00
Stefan Weil	0e1a1fc3cf	Validator: Fix compiler warnings (signed/unsigned) This also fixes a regression in validate_grapheme_test introduced by commit `32e9d7c8f5`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 13:05:03 +01:00
Shree	c7e8131efc	Add TSV option to C-API	2019-03-02 09:50:54 +00:00
Shree	22c099348b	rename LSTMBOX to LSTMBox	2019-03-02 09:11:47 +00:00
zdenop	2ba8e0061a	Merge branch 'master' into mya	2019-03-01 18:37:24 +01:00
Shree	c33f03e33e	Add lstmbox and wordstrbox to capi.h	2019-03-01 17:16:59 +00:00
Shree	76ec21df3d	Add lstmbox and wordstrbox to C-API	2019-03-01 16:40:41 +00:00
zdenop	646b043d2c	use space instead of tab	2019-03-01 14:36:09 +01:00
Shree	5ee1deaea2	correct handling of 0BF0-0BFA Tamil numbers and symbols	2019-03-01 13:21:49 +00:00
zdenop	d7ddc4c5b7	Merge pull request #2270 from Shreeshrii/U_ARABIC_NUMBER Treat U_ARABIC_NUMBER as LTR	2019-02-28 09:27:54 +01:00
zdenop	12c1225a5f	Merge pull request #2271 from stweil/refactor Refactor class Network	2019-02-27 07:43:13 +01:00
Michal Čihař	14c4494f42	Allow UTF-8 variant of C locale It behaves same in scanf, but it allows proper handling of unicode chars.	2019-02-26 21:37:33 +01:00
Stefan Weil	98dd3b6351	Refactor class Network That class is an abstract class with several pure virtual functions. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-26 16:55:31 +01:00
Shree	25b02bf1f2	Treat U_ARABIC_NUMBER as LTR	2019-02-26 09:51:21 +00:00
Shreeshrii	2f71fe280c	Use alternative way to comment a block of code (using the c preprocessor). https://github.com/tesseract-ocr/tesseract/pull/2268#pullrequestreview-207605382 Thanks @amitdo	2019-02-26 15:05:51 +05:30
Shree	449f1cd4ba	Remove test for Word started with a combiner	2019-02-25 18:47:42 +00:00
zdenop	25c43b1e7c	Merge branch 'master' into distort	2019-02-23 18:23:14 +01:00
Stefan Weil	b3e355a682	Remove whitespace at line endings Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-23 17:49:56 +01:00
Shreeshrii	34e4d6b1d7	Revert to 0 (50% percents of images inverted).	2019-02-23 17:59:00 +05:30
Shreeshrii	287d5341bf	TODO	2019-02-23 17:56:02 +05:30
Shreeshrii	3e3e1ed55d	Remove commented Code	2019-02-23 17:54:00 +05:30
zdenop	c02f5e99fc	Merge pull request #2259 from Shreeshrii/distort implement PrepareDistortedPix as part of DegradeImage	2019-02-22 21:06:29 +01:00
Shree	2aded47a3c	Implement distort_image in text2image - default false	2019-02-22 12:27:27 +00:00
Shree	49ed3a72d4	implement PrepareDistortedPix as part of DegradeImage	2019-02-21 14:48:29 +00:00
zdenop	e250f3422d	Merge pull request #2258 from stweil/doc Fix doxygen comments	2019-02-21 07:41:22 +01:00
Stefan Weil	2cbe723d03	Fix doxygen comments Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-20 21:11:38 +01:00
Stefan Weil	ef4d5b2e69	Optimize calculation of dot product for double vectors with AVX This improves the performance with best models and should also make training faster. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-20 17:45:38 +01:00
Stefan Weil	b3bd23edb7	Remove whitespace at line endings Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-19 13:53:31 +01:00
Stefan Weil	b95598a0b1	Merge pull request #2070 from pndaza/master add missed letters ( ၌ ၍ ၎ ၏ ) and symbols ( ၊ ။ ) - 0x104a to 0x104f -	2019-02-19 12:22:53 +01:00
Stefan Weil	38861be639	Use __builtin_trap instead of null pointer dereference to abort This fixes a warning from Apple's clang compiler: [ 34%] Building CXX object CMakeFiles/libtesseract.dir/src/ccutil/errcode.cpp.o /Users/travis/build/stweil/tesseract/src/ccutil/errcode.cpp:83:7: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference] *reinterpret_cast<int*>(0) = 0; ^~~~~~~~~~~~~~~~~~~~~~~~~~ /Users/travis/build/stweil/tesseract/src/ccutil/errcode.cpp:83:7: note: consider using __builtin_trap() or qualifying pointer with 'volatile' Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-18 10:49:51 +01:00
Stefan Weil	ddea230b1b	Don't compute function tables at compile time with clang The current code fails to compile with clang compilers on Linux and macOS. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-17 08:38:42 +01:00
zdenop	15f2a4b2c1	Merge pull request #2231 from Shreeshrii/wordstr Add renderer to create WordStr box files from images	2019-02-16 13:48:06 +01:00
Stefan Weil	862322c18c	Fix check for images which are too small to scale Images with width == min_width are not too small. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-15 13:53:11 +01:00
Shree	a044f64375	fix Myanmar validation rules as per Unicode charts	2019-02-15 04:40:55 +00:00
Stefan Weil	c0523ee5a2	Fix compiler warning g++ warning: src/lstm/functions.h:152:35: warning: unused parameter ‘x’ [-Wunused-parameter] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-14 10:29:39 +01:00
Stefan Weil	3556152412	Compute function tables at compile time This requires C++ 14. Older compilers still use the old code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-14 10:29:39 +01:00
Stefan Weil	f491eb6188	Simplify tanh and logistic functions and precompute function tables Both functions are called very often, so computing the table values at program start should be faster than computing them on demand. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-12 12:04:08 +01:00
Shree Devi Kumar	f3362a4b5b	Add renderer to create WordStr box files from images	2019-02-10 19:59:17 +00:00
zdenop	2ae65b2493	Merge pull request #2216 from Shreeshrii/lstmbox Lstmbox	2019-02-10 13:53:41 +01:00
Shree Devi Kumar	311053681c	put common code in AddBoxToLSTM	2019-02-10 09:16:45 +00:00
zdenop	e51f1885e6	Merge pull request #2229 from stweil/warn Fix some compiler warnings	2019-02-10 08:20:23 +01:00
Shree Devi Kumar	b51c1bf05a	change to const char* as suggested by @stweil	2019-02-10 05:13:18 +00:00
Stefan Weil	0c9f7db536	Fix compiler warning (-Wimplicit-fallthrough) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 16:53:44 +01:00
Stefan Weil	d91c316ab1	FontInfo: Make sure that deleted member variables can no longer be used Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 16:32:20 +01:00
Stefan Weil	877e62db55	Fix compiler warning (-Wmaybe-uninitialized) gcc warning: src/lstm/recodebeam.cpp:270:41: warning: ‘current_char’ may be used uninitialized in this function [-Wmaybe-uninitialized] It's a false positive, but setting the variable to 0 satisfies the compiler. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 16:32:20 +01:00
Stefan Weil	33f6dc2a67	Fix compiler warnings (-Wformat-truncation=) gcc warnings: src/viewer/scrollview.cpp:404:31: warning: ‘%s’ directive output may be truncated writing up to 4095 bytes into a region of size between 4084 and 4093 [-Wformat-truncation=] src/viewer/scrollview.cpp:572:31: warning: ‘%s’ directive output may be truncated writing up to 4095 bytes into a region of size between 4084 and 4093 [-Wformat-truncation=] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 16:32:20 +01:00
Stefan Weil	2a355ea103	Fix compiler warnings (-Wimplicit-fallthrough) gcc warnings: src/ccmain/docqual.cpp:734:26: warning: this statement may fall through [-Wimplicit-fallthrough=] src/ccmain/docqual.cpp:764:26: warning: this statement may fall through [-Wimplicit-fallthrough=] src/ccmain/docqual.cpp:782:26: warning: this statement may fall through [-Wimplicit-fallthrough=] [...] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 16:32:20 +01:00
Stefan Weil	aa2dcca295	Fix compiler warnings (-Wstringop-truncation) gcc warnings: src/api/tesseractmain.cpp:252:14: warning: ‘char* strncpy(char*, const char*, size_t)’ specified bound 255 equals destination size [-Wstringop-truncation] src/ccutil/unicharset.h:66:12: warning: ‘char* strncpy(char*, const char*, size_t)’ output may be truncated copying 30 bytes from a string of length 30 [-Wstringop-truncation] src/ccutil/unicharset.cpp:806:12: warning: ‘char* strncpy(char*, const char*, size_t)’ specified bound 64 equals destination size [-Wstringop-truncation] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 16:32:09 +01:00
Stefan Weil	d42413dd17	OpenCL: Remove PERF_COUNT framework It was rarely used, but added a lot of code and an unconditional dependency on openclwrapper.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 10:58:15 +01:00
Shree Devi Kumar	0f42fd8c69	change to use bbox coordinates for TEXTLINE for all characters (cherry picked from commit 049db108b2d6cd3a7f52e480212320613117d50b)	2019-02-05 14:03:29 +00:00
Shree Devi Kumar	9c89cd51cf	Add a new renderer to create box files from images for LSTM training (cherry picked from commit 921da6be2bdbda2ddd64514f9b6bec40a336246a) fix typo (cherry picked from commit 7bd1a0c80393fce2f34e2845cb26760bcf3791cd) Add lstmboxrenderer to CMakeLists (cherry picked from commit cfef3a889aef830725921b5c0218d5e9c633b03e) fix formatting (cherry picked from commit 7ba2b01ede7940ed609a073364948ef8c838cd10)	2019-02-05 14:03:29 +00:00
Shreeshrii	c28a68115e	Merge branch 'master' into boxtiff	2019-02-02 23:42:39 +05:30
Shree Devi Kumar	d9590f8adf	allow user specified box/tiff pairs with tesstrain.sh	2019-02-02 11:35:45 +00:00
Shree Devi Kumar	323361b902	allow user specified box/tiff pairs with tesstrain.sh	2019-02-02 11:33:32 +00:00
Shree Devi Kumar	ad223296af	use --xsize instead of --x_size (cherry picked from commit 94b8988b8cca3812137933db00750bd6e2e84e32)	2019-02-02 11:08:34 +00:00
Mikhail Akopov	7be04342cf	Fix typo	2019-02-01 09:58:44 +01:00
Stefan Weil	b49806766e	Fix AVX2 support for Windows builds with MSC It was never detected, so the existing code for AVX2 was compiled but never used. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-30 11:40:17 +01:00
Shree Devi Kumar	4d9bc11fd3	add --xsize as parameter for tesstrain	2019-01-27 07:00:25 +00:00
zdenop	12c1abcb6b	Merge pull request #2189 from stweil/fix Fix memory leak for PNG images	2019-01-24 07:59:55 +01:00
zdenop	059c50be8c	Merge pull request #2184 from stweil/tests Fix and enable stringrenderer_test	2019-01-24 07:59:07 +01:00
Stefan Weil	9e6e3a0232	Fix memory leak for PNG images Commit `5fe1390748` used an implementation which created a new Pix object. That object was never destroyed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-23 20:05:10 +01:00
Diego de la Hera	1a398a5b5d	removed reference to unbound variable	2019-01-23 15:04:16 -03:00
Stefan Weil	ecf73f5bc7	training: Don't terminate after processing 8 fonts or 8 images tesstrain_utils.sh sets the shell flag -e, so it exits immediately if a command exits with a non-zero status. The following command returns a non-zero status as soon as counter is a multiple of par_factor (par_factor=8, that means as soon as 8 fonts or images are processed): let rem=counter%par_factor The new code fixes this undesired exit. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-23 17:26:40 +01:00
Stefan Weil	32e9d7c8f5	training: Fix some compiler warnings (signed/unsigned) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-23 13:55:13 +01:00
Stefan Weil	e4b862d588	pango_font_info: Fix runtime error messages from Pango pango_coverage_get and pango_coverage_unref should not be called with coverage == nullptr. pango_font_get_coverage should not be called with font == nullptr. Otherwise Pango prints runtime error messages: (process:12657): Pango-CRITICAL : pango_coverage_get: assertion 'coverage != NULL' failed (process:12657): Pango-CRITICAL : pango_coverage_unref: assertion 'coverage != NULL' failed (process:12657): Pango-CRITICAL : pango_font_get_coverage: assertion 'font != NULL' failed (process:12657): GLib-GObject-CRITICAL : g_object_unref: assertion 'G_IS_OBJECT (object)' failed Typically those errors occur if a required font is not installed, so this can be a quite common error. Fix also a potential resource leak in PangoFontInfo::CoversUTF8Text. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-23 13:55:13 +01:00
Shree Devi Kumar	77d0b6ce8e	fix WORDLIST filename	2019-01-22 15:49:55 +01:00
Stefan Weil	564482db30	Fix selection of IntSimdMatrix method Commit `d36231e3e4` did not distinguish between AVX and AVX2, so AVX2 code was enabled for IntSimdMatrix even when only AVX was supported. This resulted in an illegal instruction. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-20 22:13:04 +01:00
Stefan Weil	66e31bfd8c	OpenCL: Fix alloc-dealloc mismatch Bug message from AddressSanitizer: ==7153==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs free) on 0x602000072cb0 #0 0x7ffff70c6a10 in free (/usr/lib/x86_64-linux-gnu/libasan.so.3+0xc1a10) #1 0x555557188638 in writeProfileToFile ../../../../../src/opencl/openclwrapper.cpp:541 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-19 08:06:26 +01:00
Stefan Weil	ad19183b92	OpenCL: Fix heap buffer overflow Bug message from AddressSanitizer: ==6158==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fffe774b7fc at pc 0x555557086b54 bp 0x7fffffffcee0 sp 0x7fffffffced8 READ of size 1 at 0x7fffe774b7fc thread T0 #0 0x555557086b53 in tesseract::HistogramRect(Pix, int, int, int, int, int, int) ../../../../../src/ccstruct/otsuthr.cpp:163 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-19 07:58:16 +01:00
Stefan Weil	502bb624c2	More optimisations for IntSimdMatrix * Move IntDotProductSSE. That allows inlining of the code. * Improve IntDotProductSSE by moving some instructions. * Remove unused num_input_groups_ from IntSimdMatrix. * Re-order elements in IntSimdMatrix to avoid padding. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	95606398f5	Clean code for IntSimdMatrix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	7fc7d28dd0	Compile files for AVX, AVX2 or SSE only when needed Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	a9a1035e55	Move IntSimdMatrixNative from IntSimdMatrix to unittest It is only used for the unit test. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	d36231e3e4	Set best or user selected IntSimdMatrix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	605b4d66c7	Replace dynamically allocated IntSimdMatrix instances by constants Two header files are no longer needed and could be removed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	26be7c5d2e	Use constructor with parameters for IntSimdMatrix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	e237a38405	Add const attributes to IntSimMatrix multiplier IntSimMatrix no longer contains variable members. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	7c70147701	Move shaped weights from IntSimMatrix to WeightMatrix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	ea4d0d354b	Format comment Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	c79d613b65	Replace ASSERT_HOST by assert Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
zdenop	f75b2c1948	Merge pull request #310 from nickjwhite/hocrcharboxes Character boxes in hOCR output	2019-01-14 19:19:04 +01:00
Stefan Weil	9adf6e442b	Revert `59fb3370bb` (-ffast-math) It breaks intsimdmatrix_test. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 17:56:35 +01:00
Nick White	ebbf907c56	Fix typo in hocr character box output	2019-01-13 16:28:31 +00:00
Nick White	4ce797b6f6	Fix hocr character box info to use new hocr renderer correctly	2019-01-13 13:01:14 +00:00
Nick White	c43e4501e3	Merge remote-tracking branch 'origin/master' into hocrcharboxes	2019-01-13 12:41:42 +00:00
zdenop	238cb219d5	Merge pull request #2152 from stweil/clean Remove opencl_device_selection.h	2019-01-09 15:02:59 +01:00
Stefan Weil	a0e6586e63	Fix documentation for page segmentation mode 2 It never worked, so add a comment that the implementation is missing. Add also a to-do comment. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-09 13:51:44 +01:00
Stefan Weil	0fae848b58	OpenCL: Add comments to users of openclwrapper.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-09 12:11:00 +01:00
Stefan Weil	e0fc4f2945	Remove opencl_device_selection.h Always use OpenCL device selection if OpenCL is enabled. This fixes a regression which was introduced by commit `5c6a57b727` which removed the definition for USE_DEVICE_SELECTION. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-09 12:09:56 +01:00
Stefan Weil	595bb7df16	OpenCL: Remove unused code The OpenCL kernel pixSubtract is never used, so remove it. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-05 16:41:20 +01:00
Nick White	b8de06430d	Ensure baseapi.h header is used by commontraining.h regardless of autotools usage	2019-01-04 20:20:00 +00:00
Nick White	cd34ee55ec	Add necessary intproto.h header to protos.cpp	2019-01-04 20:19:54 +00:00
Stefan Weil	62b635a74e	Remove unused functions from cluster.cpp Add also missing static attributes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-03 13:16:31 +01:00
Stefan Weil	f76d8a14cd	Remove unused code from oldlist Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-03 12:27:10 +01:00
Stefan Weil	7719f80155	Add missing std namespace in tensorflow code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-03 11:15:36 +01:00
Stefan Weil	8a6fa452dc	Fix build for architectures without CPUID Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-03 09:32:36 +01:00
Stefan Weil	91af010200	Fix compiler warning gcc warning: src/training/text2image.cpp:694:35: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings] putenv expects a string which can be modified. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-01 22:49:04 +01:00
Stefan Weil	5dd606c631	Replace NULL by nullptr Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-01 22:45:49 +01:00
Stefan Weil	d9600cd82e	Fix and simplify SIMD tests The tests for SSE and AVX must only be done if the correct compiler flags were used. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-01 11:19:17 +01:00
zdenop	d3065520fa	fix 2 clang warnings	2018-12-30 20:25:24 +01:00
Stefan Weil	cb049133cd	Fix compiler warning clang warning: tesseractmain.cpp(512,21): warning: '&&' within '||' [-Wlogical-op-parentheses] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-29 22:17:33 +01:00
zdenop	420fb0ced0	Merge branch 'master' of https://github.com/tesseract-ocr/tesseract	2018-12-29 10:31:33 +01:00
zdenop	8885fe2ccb	provide info about compiled openmp version	2018-12-29 10:18:27 +01:00
Stefan Weil	993e56ffde	Don't try to create text output if other renderers failed (fix regression) Commit `49d7df6dc3` added error handling, but since that commit Tesseract used the text fallback if the user selected output failed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-27 10:23:28 +01:00
zdenop	cc997b53c7	add the missing implementation for TessBaseAPIGetAltoText method in C-API	2018-12-26 21:35:47 +01:00
Stefan Weil	db9c7e0312	Use std::stringstream to generate hOCR output Using std::stringstream simplifies the code and allows conversion of double to string independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-16 20:14:11 +01:00
zdenop	72d8df581b	Merge pull request #2121 from stweil/hocr Move code for hOCR renderer to new file	2018-12-16 16:26:27 +01:00
Stefan Weil	c7e8d30280	Fix value for PHYSICAL_IMG_NR in ALTO output Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-16 15:07:02 +01:00
Stefan Weil	457c53026d	Fix indentation of hOCR output Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-15 17:51:59 +01:00
Stefan Weil	5de3fc47bb	Format code in new file hocrrenderer.cpp Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-15 15:35:21 +01:00
Stefan Weil	48713f7df2	Move code for hOCR renderer to new file Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-15 15:33:47 +01:00
zdenop	1f5fb15af3	remove setting constant resolution from ImageThresholder::SetImage. Credible resolution will be set afterward. Fixes #2080.	2018-12-14 19:23:22 +01:00
zdenop	6d06d39bf4	Merge pull request #2118 from stweil/clean protos: Remove several unused macros, functions and global variables	2018-12-14 09:20:53 +01:00
Stefan Weil	b8c4f1b9fc	protos: Remove unused config variable Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-13 21:37:33 +01:00
Stefan Weil	f35eeb3b4a	protos: Remove several unused macros, functions and global variables The unused global variable TrainingData used a lot of runtime memory. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-13 21:32:56 +01:00
Stefan Weil	fbbbdb4565	Use std::stringstream to generate ALTO output and add <SP> element Using std::stringstream simplifies the code. The <SP> element is needed between two <String> elements. Remove also some unneeded spaces in the ALTO output. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-12 22:29:35 +01:00
Stefan Weil	7ebd3153ae	Fix several typos (most of them found by codespell) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-10 18:59:58 +01:00
Stefan Weil	81ab302d52	FPRow: Remove three unused methods This fixes warnings from the Intel compiler: src/textord/cjkpitch.cpp(319): warning #177: function "<unnamed>::FPRow::good_gaps" was declared but never referenced src/textord/cjkpitch.cpp(383): warning #177: function "<unnamed>::FPRow::is_bad" was declared but never referenced src/textord/cjkpitch.cpp(387): warning #177: function "<unnamed>::FPRow::is_unknown" was declared but never referenced Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-08 16:43:52 +01:00
Stefan Weil	404f9cd147	SimpleStats: Remove unused method This fixes a warning from the Intel compiler: src/textord/cjkpitch.cpp(79): warning #177: function "<unnamed>::SimpleStats::maximum" was declared but never referenced Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-08 16:39:46 +01:00
Stefan Weil	a9121d28f3	Merge pull request #2107 from stweil/march Add check whether compiler supports -march=native flag	2018-12-08 10:53:09 +01:00
Stefan Weil	2c044df959	Fix wrong x_fsize in hOCR output (regression) The regression was caused by the latest commit `c9e85ab78f`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-08 10:39:31 +01:00
Stefan Weil	2ccc5810f3	Add check whether compiler supports -march=native flag Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-05 20:13:28 +01:00
Stefan Weil	c9e85ab78f	Fix wrong font attributes in hOCR output Instrumented code throws this runtime error during OCR: ../../src/api/baseapi.cpp:1616:5: runtime error: load of value 128, which is not a valid value for type 'bool' ../../src/api/baseapi.cpp:1627:5: runtime error: load of value 128, which is not a valid value for type 'bool' If there is no font information (typical for Tesseract with a LSTM model), the font attributes got random values resulting in wrong hOCR output. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-04 10:52:46 +01:00
Stefan Weil	0bdae8f8bf	GENERIC_2D_ARRAY: Fix runtime error in assignment operator Instrumented code throws this runtime error during OCR: ../../src/ccstruct/matrix.h:84:11: runtime error: null pointer passed as argument 2, which is declared to never be null Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-04 10:48:46 +01:00
Stefan Weil	f0a4d04187	Add config variable for selection of dot product function Add also a C++ implementation with more aggressive compiler options which is optimized for the CPU where the software was built. It is now possible to select the function used for the dot product with -c dotproduct=FUNCTION where FUNCTION can be one of those values: * auto selection based on detected hardware (default) * generic C++ code with default compiler options * native C++ code optimized for build host * avx optimized code for AVX * sse optimized code for SSE Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-01 00:19:28 +01:00
zdenop	b527b37825	Merge pull request #2097 from stweil/namespace SIMDDetect: Use tesseract namespace and format code	2018-12-01 00:02:18 +01:00
Stefan Weil	1910b1a72b	SIMDDetect: Use tesseract namespace and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 23:36:39 +01:00
Stefan Weil	66d3275d0b	IntSimdMatrixSSE: Remove unused include statement and simplify code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 23:14:11 +01:00
Stefan Weil	048eb34934	Add missing static attribute to local inline functions Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 23:14:11 +01:00
Stefan Weil	b73370aac9	Remove unneeded test for nullptr IntSimdMatrix::GetFastestMultiplier never returns a nullptr. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 23:14:11 +01:00
Stefan Weil	e2419b1968	Fix potential crash in tprintf Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 23:14:11 +01:00
Stefan Weil	6b6d9de497	Fix potential crash in STRING class Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 23:14:11 +01:00
Stefan Weil	59fb3370bb	Use -ffast-math for calculation of dot product This reduces the code size for intsimdmatrixavx2 from 2700 to 2668 and slightly improves the performance for fast models with AVX2. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 22:52:04 +01:00
Stefan Weil	fda3ba9009	IntSimdMatrixSSE: Fix comment Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 22:13:32 +01:00
zdenop	07b140364f	Merge pull request #2093 from stweil/python Updates for Python scripts	2018-11-30 08:10:20 +01:00
zdenop	53600c677e	Merge pull request #2092 from stweil/format Format new ALTO code with clang-format	2018-11-30 08:08:52 +01:00
zdenop	f6493dd5e8	Merge pull request #2090 from stweil/inline Optimize performance by using inline functions	2018-11-30 08:07:45 +01:00
Stefan Weil	c59c45fb3e	Fix Amharic font list This was reported for the Python code by LGTM. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 08:00:22 +01:00

1165 Commits