tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-23 23:17:49 +08:00

Author	SHA1	Message	Date
Stefan Weil	3bd61bfae4	svutil: Clean include file * Remove MIN, MAX macros. They are unused. * Include windows.h indirectly by including host.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-22 21:51:07 +02:00
Stefan Weil	e12b99d49b	Remove host.h from Tesseract API It is not needed by other API header files. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-22 21:51:07 +02:00
Stefan Weil	8a34da027f	Fix typo in description Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-22 21:50:37 +02:00
Shree	f8fba6362b	fix the coordinates for EOL tab	2019-04-22 09:54:20 +00:00
zdenop	3ec7c22a87	fix missing EOL	2019-04-22 08:49:55 +02:00
Stefan Weil	09255ebe44	Don't include windows.h from platform.h This partially reverts commit `c150b9832d`. Now params.cpp includes host.h which also gets the definition for MAX_PATH. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-21 22:20:13 +02:00
zdenop	6781d78211	Merge pull request #2399 from stweil/pgedit pgedit: Remove unused global functions	2019-04-20 19:26:02 +02:00
Stefan Weil	4ac1fad18a	pdfrenderer: Replace snprintf by std::stringstream Using std::stringstream allows conversion of float to string independent of the current locale setting. Some snprintf statements are not needed at all because a constant string can be appended directly. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-20 19:05:29 +02:00
Stefan Weil	07d5365a1f	baseapi: Use std::stringstream to format float values Using std::stringstream allows conversion of float to string independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-20 19:05:29 +02:00
Stefan Weil	743fc2562d	Remove unneeded include statements for pgedit.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-20 19:00:07 +02:00
Stefan Weil	26dd0b82bf	pgedit: Remove unused global functions pgeditor_show_point is unused, so remove it completely. Some more functions are only used locally, so make them static functions. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-20 19:00:07 +02:00
Stefan Weil	217c2530e6	Remove strtofloat Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-19 11:19:04 +02:00
Stefan Weil	7c3f9000cd	Replace sscanf by std::stringstream Using std::stringstream allows working with the C locale, independent of the current locale settings. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-19 11:19:04 +02:00
Stefan Weil	5529a5db11	unittest: Fix and enable params_model_test This needs the latest test submodule. The test uses LoadFromFile which is not used otherwise, so remove that function from class ParamsModel. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-18 17:06:48 +02:00
Stefan Weil	a1ffcd3654	Use std::stringstream for add_str_double Using std::stringstream allows conversion of double to string independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-14 16:16:16 +02:00
Stefan Weil	aa64a63f69	Use std::stringstream to generate PDF output Using std::stringstream simplifies the code and allows conversion of double to string independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-14 16:15:39 +02:00
Stefan Weil	78a957b989	Remove spaces at line endings Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-13 18:54:42 +02:00
Stefan Weil	12ca2513d4	Revert "e" flag for fopen clang-tidy added it in commit `ac0b191f6b`. The "e" flag is an extension for glibc which sets the O_CLOEXEC flag, so the file handle is not leaked to child processes. It is not needed here. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-13 18:53:57 +02:00
Samuel Lee	e32b3360aa	Fix for MSVC: LoadDataFromFile/SaveDataToFile use fopen with file mode 'e', which is unsupported in MSVC.	2019-04-11 02:33:51 +09:00
Stefan Weil	f88a7f28e3	fontinfo: Fix wrong delete Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-07 12:16:04 +02:00
Stefan Weil	3dfe1b8807	classify: Modernize function UniformDensity This should fix an issue reported by Codacy. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-07 12:13:45 +02:00
Stefan Weil	72c874140e	Modernize code by replacing C type casts This was done using clang-tidy. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-07 09:04:51 +02:00
zdenop	95a15a7a82	fix cmake&clang build	2019-04-06 15:31:53 +02:00
zdenop	ab09b09da6	Merge pull request #2294 from bertsky/lstm-with-char-whitelist trying to add tessedit_char_whitelist etc. again:	2019-04-06 14:41:30 +02:00
Robert Schubert	25a42ea42f	fixed failure report for tesstrain commands: - with `set -e` in effect, looking at stdout to detect failure is too late	2019-04-06 08:13:03 +02:00
Robert Schubert	d5584e793e	fixed failure report for tesstrain commands: - with `set -e` in effect, it does not make sense to query `$?` indirectly	2019-04-06 08:13:03 +02:00
zdenop	be617b3722	Merge pull request #2361 from Shreeshrii/truth Change message display for debug_level -1 during lstmtraining	2019-04-05 10:52:21 +02:00
zdenop	2982cb4ff3	Merge pull request #2368 from amitdo/no-legacy-fix disable-legacy build: Do not include unused headers	2019-04-05 09:35:04 +02:00
Stefan Weil	d35a6f2de5	Modernize code (clang-tidy check modernize-deprecated-headers) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-05 08:29:00 +02:00
Stefan Weil	20d5eedd45	Modernize code (clang-tidy check modernize-loop-convert) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-05 08:29:00 +02:00
amitdo	fab9a54981	Remove unneeded 'SUBDIRS=' from 3 Makefile.am files	2019-04-04 19:31:39 +02:00
Shree	6673347986	Change page to line in message	2019-04-04 15:43:29 +00:00
Shree	51c3535310	Always display GROUND TRUTH. BEST OCR and ALIGNED TRUTH only if different for debug_level -1	2019-04-04 15:33:22 +00:00
Shree	84d4cc2e95	Display OCR TEXT and GROUND TRUTH only when different for debug_level = -1	2019-04-04 15:33:22 +00:00
Amit D	2069c057d6	Merge branch 'master' into no-legacy-fix	2019-04-04 18:26:22 +03:00
Egor Pugin	2a1d238bd5	Merge pull request #2366 from stweil/modernize Modernize code with "using"	2019-04-04 15:13:10 +03:00
amitdo	546014aecd	disable-legacy build: Do not include unused headers	2019-04-04 15:09:08 +03:00
Stefan Weil	98346c2cd4	Modernize and format code The code was modernized using clang-tidy with "modernize-use-using". The modified files were then formatted using clang-tidy with "google-readability-braces-around-statements", then clang-format was applied. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-03 21:02:23 +02:00
Shreeshrii	613c2bf6e4	Change pages to lines in message The pages variables refer to the lines in document. This change makes the messages clearer without changing the variable names.	2019-04-03 10:41:14 +05:30
Egor Pugin	af7cc1ce4c	Fix windows build.	2019-04-01 22:38:01 +03:00
Stefan Weil	81fbd878dd	Add more missing include statements for Windows build Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-01 08:10:25 +02:00
Stefan Weil	ab009fae94	Remove macro WINDLLNAME It is now no longer used. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 20:05:41 +02:00
Stefan Weil	77a5f2623e	Remove unused config variable tessedit_module_name It was only defined for Windows builds. Use also false instead of 0 to set the default value of two boolean config variables. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 20:04:00 +02:00
Stefan Weil	c150b9832d	Add missing include statements for Windows build The last commits which removed BOOL8 had broken the Windows build. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 19:02:29 +02:00
Stefan Weil	802f42e821	Remove BOOL8, TRUE, FALSE from host.h Remove unneeded include statements for host.h, add required ones and update the comments for the remaining include statements. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 18:27:20 +02:00
Stefan Weil	be96b7b660	bits16: Format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 18:26:50 +02:00
Stefan Weil	146079f31d	api: Replace BOOL8, TRUE, FALSE by bool, true, false and modernize code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 18:15:53 +02:00
Stefan Weil	4e0c726d6c	ccutil: replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:56:47 +02:00
Stefan Weil	da0c14ae45	cutil: Replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:56:19 +02:00
Stefan Weil	87a973652c	classify: Replace BOOL8, TRUE, FALSE by bool, true, false Simplify also some related code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:55:48 +02:00
Stefan Weil	30ee3afc29	textord: Replace TRUE, FALSE by true, false and use bool instead of BOOL8 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:55:20 +02:00
Stefan Weil	b391ab84d0	wordrec: Replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:54:21 +02:00
Stefan Weil	cbb5e729a1	classify: Use bool and replace TRUE, FALSE Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:53:50 +02:00
Stefan Weil	46fa59aadc	ccstruct: Replace BOOL8, TRUE, FALSE by bool, true, false and modernize code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:53:06 +02:00
Stefan Weil	92b9f9f8de	ccmain: Replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:52:09 +02:00
Stefan Weil	7db25e15c0	Remove unused config variable tessedit_single_match Replace also TRUE, FALSE by true, false. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:38:35 +02:00
Stefan Weil	ca2947a2c0	blobclass: Remove unused macros Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:36:46 +02:00
Stefan Weil	f2bd98e656	PageIterator: Remove useless const Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:35:43 +02:00
Stefan Weil	813b7803e0	pgedit: Replace BOOL8 by bool Replace also TRUE, FALSE by true, false and add some static attributes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:29:15 +02:00
Stefan Weil	664811a869	Replace BOOL8, TRUE, FALSE by bool, true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:28:28 +02:00
Stefan Weil	51a2c2eae8	Format code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:24:02 +02:00
Stefan Weil	95ea778745	capi: Replace FALSE, TRUE and simplify and format code Format code using clang-format and clang-tidy. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:19:04 +02:00
Stefan Weil	89ba48b106	strngs: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:13:38 +02:00
Stefan Weil	127d0e31f0	serialis: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:12:11 +02:00
Stefan Weil	8b663e7620	helpers: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:06:19 +02:00
zdenop	3bb8f9cd49	Merge branch 'master' of https://github.com/tesseract-ocr/tesseract	2019-03-31 16:54:15 +02:00
zdenop	5f06402755	python: optimize imports, reformat code	2019-03-31 16:53:39 +02:00
zdenop	2e9fd69c9e	use 'import pathlib'; fix "TypeError: argument of type 'WindowsPath' is not iterable"	2019-03-31 16:53:33 +02:00
zdenop	a0527b41bd	fix LGTM reports for python	2019-03-31 16:53:25 +02:00
Stefan Weil	1948f0d520	ocrclass: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:39:44 +02:00
Stefan Weil	85957e9673	WERD: Don't print space character after "FALSE" at end of line Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:32:42 +02:00
Stefan Weil	83d4433d3b	Modernize and format unichar.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:30:15 +02:00
Stefan Weil	ac0b191f6b	Modernize and format genericvector.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:21:32 +02:00
Stefan Weil	36ed08636b	Modernize and format tesscallback.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:16:00 +02:00
zdenop	f47c7c92dd	fix uninitialized variables in wordstrboxrenderer and lstmboxrenderer; CID 1399132, 1399134, 1399135, 1399137, 1399140, 1399141, 1399142	2019-03-31 12:26:49 +02:00
Shreeshrii	ea36e94e58	fix Could not parse bool from flag (#2359)	2019-03-29 14:50:21 +01:00
Stefan Weil	852598eecf	Remove file tessedit.h It only declared the unused global variable global_monitor which is now removed, too. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-27 19:03:42 +01:00
Stefan Weil	6e59abcce2	Remove file cutil.h It only contained three type definitions which fit better in other include files. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-27 19:03:42 +01:00
Stefan Weil	b6bfb20f1d	Improve readability of conditional code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 12:35:56 +01:00
Stefan Weil	36a1a30c22	Remove some old type casts Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 12:35:56 +01:00
Stefan Weil	a44bf41f14	Modernize C++ loops The modifications were done using this command: run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-loop-convert' -fix Then the resulting code was cleaned manually. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 08:38:21 +01:00
Stefan Weil	ed011670c8	Modernize C++ code using bool literals The modifications were done using this command: run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-bool-literals' -fix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 07:58:02 +01:00
Stefan Weil	a0fd90583b	Modernize C++ code using auto The modifications were done using this command: run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-auto' -fix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 07:55:08 +01:00
Stefan Weil	36f768853a	Modernize C++ code using override The modifications were done using this command: run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-override' -fix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 07:37:52 +01:00
Stefan Weil	f877640bc9	Merge pull request #2319 from bertsky/tesstrain-parallel-wait-retval tesstrain: check failure of subjobs	2019-03-25 16:10:09 +01:00
Stefan Weil	d8d2f6f48a	Fix broken shell scripts for training Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-25 15:32:43 +01:00
Stefan Weil	631882a346	Fix compiler warnings (signed / unsigned mismatch) clang warnings: src/ccutil/unicharcompress.cpp:172:27: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] src/lstm/recodebeam.cpp:129:29: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare] src/lstm/recodebeam.cpp:276:48: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare] unittest/imagedata_test.cc:101:21: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] unittest/linlsq_test.cc:33:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] unittest/linlsq_test.cc:44:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] unittest/nthitem_test.cc:27:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare] unittest/nthitem_test.cc:68:21: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare] unittest/stats_test.cc:26:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-25 08:36:07 +01:00
Stefan Weil	ecaad2aca8	ccstruct/werd: Format code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-25 07:57:34 +01:00
Stefan Weil	b1e305f38c	Simplify code which tests for non-empty StringParam Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:35:52 +01:00
Stefan Weil	f9860cda41	Optimize functions ResetFrom The loop can terminate as soon as the parameter name was found. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:21:23 +01:00
Stefan Weil	41da5afe9d	UNICHARSET: Fix compiler warning (signed/unsigned mismatch) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:18:21 +01:00
Stefan Weil	91e2b253c0	Format modified code with clang-format Format the files which were changed in commit `297d7d86ce`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:10:29 +01:00
Stefan Weil	06acbaf99c	IntegerMatcher: Fix division by zero Credit to OSS-Fuzz which reported this issue: intmatcher.cpp:1231:62: runtime error: division by zero #0 0x6119d5 in IntegerMatcher::ApplyCNCorrection(float, int, int, int) tesseract/src/classify/intmatcher.cpp:1231:62 #1 0x5fe9c4 in tesseract::Classify::ComputeCorrectedRating(bool, int, double, double, int, int, int, int, int, unsigned char const) tesseract/src/classify/adaptmatch.cpp:1213:29 #2 0x5fdc22 in tesseract::Classify::ExpandShapesAndApplyCorrections(ADAPT_CLASS_STRUCT, bool, int, int, int, float, int, int, unsigned char const, tesseract::UnicharRating, ADAPT_RESULTS) tesseract/src/classify/adaptmatch.cpp:1184:13 #3 0x5fe421 in tesseract::Classify::MasterMatcher(INT_TEMPLATES_STRUCT, short, INT_FEATURE_STRUCT const, unsigned char const, ADAPT_CLASS_STRUCT, int, int, TBOX const&, GenericVector<CP_RESULT_STRUCT> const&, ADAPT_RESULTS) tesseract/src/classify/adaptmatch.cpp:1119:5 #4 0x6003eb in tesseract::Classify::CharNormTrainingSample(bool, int, tesseract::TrainingSample const&, GenericVector<tesseract::UnicharRating>*) tesseract/src/classify/adaptmatch.cpp:1374:5 See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13712. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 19:39:31 +01:00
Stefan Weil	58423d2f6c	Merge pull request #2328 from bertsky/lstm-with-user-patterns2 Add user words / patterns again	2019-03-24 19:38:40 +01:00
zdenop	0d36d9a9d7	Merge pull request #2341 from Shreeshrii/fix Fix	2019-03-24 18:21:09 +01:00
Stefan Weil	da6305b632	Fix compiler warnings caused by ASSERT_HOST The modified definition avoids warnings caused by redundant semicolons. Now a semicolon is required when using the macro, so a few code locations had to be updated. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 17:47:04 +01:00
Stefan Weil	44a6d9f4d4	intmatcher: Catch more out of bounds reads Credit to OSS-Fuzz which reported this issue: intmatcher.cpp:1121:17: runtime error: index 24 out of bounds for type 'uint8_t [24]' #0 0x61034b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT, unsigned int, short) tesseract/src/classify/intmatcher.cpp:1121:17 #1 0x60f560 in IntegerMatcher::Match(INT_CLASS_STRUCT, unsigned int, unsigned int, short, INT_FEATURE_STRUCT const, tesseract::UnicharRating, int, int, bool) tesseract/src/classify/intmatcher.cpp:514:11 #2 0x5f3a25 in tesseract::Classify::AdaptToChar(TBLOB, int, int, float, ADAPT_TEMPLATES_STRUCT) tesseract/src/classify/adaptmatch.cpp:894:9 #3 0x5f2ccd in tesseract::Classify::LearnPieces(char const, int, int, float, tesseract::CharSegmentationType, char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:430:5 #4 0x5f16ee in tesseract::Classify::LearnWord(char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:293:7 This catches the out of bounds data reads in release builds. Add also assertions for debug builds. See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13818. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 17:27:43 +01:00
Stefan Weil	5fd7228414	intmatcher: Catch out of bounds reads Credit to OSS-Fuzz which reported this issue: intmatcher.cpp:1163:17: runtime error: index 24 out of bounds for type 'uint8_t [24]' #0 0x610d3b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT, unsigned int) tesseract/src/classify/intmatcher.cpp:1163:17 #1 0x60ff4e in IntegerMatcher::Match(INT_CLASS_STRUCT, unsigned int, unsigned int, short, INT_FEATURE_STRUCT const, tesseract::UnicharRating, int, int, bool) tesseract/src/classify/intmatcher.cpp:563:11 #2 0x5f4355 in tesseract::Classify::AdaptToChar(TBLOB, int, int, float, ADAPT_TEMPLATES_STRUCT) tesseract/src/classify/adaptmatch.cpp:894:9 #3 0x5f35fd in tesseract::Classify::LearnPieces(char const, int, int, float, tesseract::CharSegmentationType, char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:430:5 #4 0x5f201e in tesseract::Classify::LearnWord(char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:293:7 This catches the out of bounds data reads, but does not fix the primary reason: ProtoLengths currently gets values which are larger than the allowed index. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 15:44:33 +01:00
Stefan Weil	509ee95023	IntegerMatcher: Fix data type of loop counters ClassTemplate->ProtoLengths[n] is of type uint8_t, so use that for the related loop counters, too. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 15:35:06 +01:00
Stefan Weil	f4f34a87db	WERD_RES: Fix uninitialized member variable Credit to OSS-Fuzz which reported this issue: pageres.cpp:1143:7: runtime error: load of value 249, which is not a valid value for type 'bool' #0 0x6ba560 in WERD_RES::Clear() tesseract/src/ccstruct/pageres.cpp:1143:7 #1 0x6b9fd1 in WERD_RES::operator=(WERD_RES const&) tesseract/src/ccstruct/pageres.cpp:193:3 #2 0x49a9ad in WERD_RES::WERD_RES(WERD_RES const&) tesseract/src/ccstruct/pageres.h:356:11 See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13707. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 14:59:08 +01:00
Stefan Weil	afc099b9f4	intmatcher: Split data_table The old code was a hack to improve the performance. The new code is clearer and results in the same binary when compiling with gcc 8.3.0, so it looks like the old hack is no longer needed with modern compilers. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 08:15:40 +01:00
Shreeshrii	8749f3553e	LINEDATA=false	2019-03-23 19:16:49 +05:30
Shree	bcb7cf9846	sort arguments, use true/false instead of 1/0	2019-03-23 12:28:53 +00:00
Shree	c2db272134	Modify distort_image for Boolean	2019-03-22 17:02:46 +00:00
Shree	259d5af6b1	Add PSM values to the definition	2019-03-22 15:29:02 +00:00
Shree	8eafec0d17	Fix comments with current values of PSM codes	2019-03-22 14:10:49 +00:00
Stefan Weil	e1e56d9d66	Remove local function declarations from intmatcher.h This requires moving the local function HeapSort to the beginning. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-22 11:39:39 +01:00
Stefan Weil	2ba194ca8d	Remove four unused parameters This fixes some compiler warnings: src/classify/intmatcher.cpp:711:63: warning: unused parameter ‘ConfigMask’ [-Wunused-parameter] src/classify/intmatcher.cpp:1007:16: warning: unused parameter ‘ProtoMask’ [-Wunused-parameter] src/classify/intmatcher.cpp:1095:61: warning: unused parameter ‘NumFeatures’ [-Wunused-parameter] src/classify/intmatcher.cpp:1136:59: warning: unused parameter ‘used_features’ [-Wunused-parameter] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-22 11:30:24 +01:00
Stefan Weil	dd79d56e9f	Remove unused parameter BlobLength This fixes two compiler warnings: src/classify/intmatcher.cpp:553:14: warning: unused parameter ‘BlobLength’ [-Wunused-parameter] src/classify/intmatcher.cpp:622:14: warning: unused parameter ‘BlobLength’ [-Wunused-parameter] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-22 11:17:19 +01:00
Shree	9b915d5efb	add --distort_image	2019-03-22 05:39:38 +00:00
Shree	f7ffde99d5	add --distort_image	2019-03-22 05:34:00 +00:00
zdenop	ac7ea4322a	Merge pull request #2335 from Shreeshrii/master Changes to tesstrain.py - max_workers=8, distort_image=false	2019-03-17 15:27:34 +01:00
zdenop	26877ba703	check min. python version; os.uname is not available on windows	2019-03-17 15:25:48 +01:00
Shreeshrii	f8e8521606	Update tesstrain_utils.py	2019-03-17 15:32:35 +05:30
Shree	6fa8e1bb15	Set max_workers=8	2019-03-17 09:58:11 +00:00
Shree	e21499e81e	Set default value for distort_image	2019-03-17 09:54:16 +00:00
Stefan Weil	ee2f9bf7bf	Remove old comments in file headers Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-16 10:55:00 +01:00
Shree	d47b0d588a	Use LATIN_FONTS for kmr	2019-03-15 15:47:56 +00:00
Shree	3eee1d217a	Add kmr and kur_ara, remove kur from training scripts	2019-03-15 15:37:49 +00:00
Robert Schubert	297d7d86ce	trying to add user words/patterns again: - pass in ParamsVectors from Tesseract (carrying values from langdata/config/api) into LSTMRecognizer::Load and LoadDictionary - after LSTMRecognizer's Dict is initialised (with default values), reset the variables user_{words,patterns}_{suffix,file} from the corresponding entries in the passed vector	2019-03-15 16:06:19 +01:00
Shree	b2ebf0195f	Add kmr and kur_ara, remove kur from training scripts	2019-03-15 14:39:39 +00:00
Shree	37befdf6c4	Add option for --distort_image	2019-03-15 13:32:36 +00:00
zdenop	0a36b38169	Merge pull request #2317 from eighttails/master Added missing linker flags for MinGW.	2019-03-15 08:01:21 +01:00
Robert Schubert	14346e56b0	tesstrain: catch+handle SIGINT (to stop waiting on subjobs)	2019-03-15 00:03:16 +01:00
Robert Schubert	6cbad17e30	tesstrain: check all subjobs' retval	2019-03-14 14:38:51 +01:00
Robert Schubert	5316bcbb94	tesstrain: check failure of subjobs	2019-03-14 11:42:01 +01:00
Stefan Weil	4c2bbebecc	Fix compiler warning (-Wunused-value) Warning from clang++: ..\src\ccmain\ltrresultiterator.cpp(454,8): warning: expression result unused [-Wunused-value] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-13 20:56:03 +01:00
Stefan Weil	ed84ba0a44	Fix wrong comparison symbol_steps is a vector, so testing for a nullptr was wrong. clang++ reports: ..\src\ccmain\ltrresultiterator.cpp(440,19): warning: comparison of address of 'this->word_res_->symbol_steps' equal to a null pointer is always false [-Wtautological-pointer-compare] if (&word_res_->symbol_steps == nullptr || !LSTM_mode_) return nullptr; ~~~~~~~~~~~^~~~~~~~~~~~ ~~~~~~~ Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-13 20:38:38 +01:00
Tadahito Yao	bbbd262a8d	Added missing linker flags for MinGW.	2019-03-13 22:10:36 +09:00
jm server2	1206362d30	`accumulated_timesteps` is not a pointer but a vector. If ChoiceIterator is used without `lstm_choice_mode`, tesseract crashes (or similar) because the check is true and a nonexistent item is referenced	2019-03-13 12:55:14 +01:00
Stefan Weil	3baf0d8076	Fix boolean assignments Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 15:34:24 +01:00
Stefan Weil	8ad0489f0f	Remove svpaint.cpp from libtesseract svpaint is a standalone application (it includes a main function) and should not be part of the Tesseract library. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 12:22:53 +01:00
zdenop	7546a01020	Merge pull request #2310 from noahmetzger/LSTMChoiceRIL Lstm choice ril	2019-03-12 10:46:11 +01:00
Stefan Weil	35a999f91a	Fix assertion caused by wrong unicharset Credit to OSS-Fuzz: it found another case which triggered this assertion: contains_unichar_id(unichar_id):Error:Assert failed:in file ../../src/ccutil/unicharset.h, line 502 This is the OSS-Fuzz testcase: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13662 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 09:31:21 +01:00
Stefan Weil	56a39bda77	Fix float division by zero That runtime error is normally not visible because it does not abort the program, but is detected when the code was compiled with sanitizers. It can be triggered with this OSS-Fuzz testcase: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13662 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 09:28:16 +01:00
Noah Metzger	5b3e2fe812	Integrated accumulated Symbol Choice in the Choice Iterator and made the api lstm_choice_mode independent Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-03-12 09:15:10 +01:00
Stefan Weil	4c0b98bd12	Replace undefined shift operations by multiplications Shift operations are undefined for negative numbers, but at least on Intel they return the same value as a multiplication with 2 ^ shift value. This fixes runtime errors reported by sanitizers and OSS-Fuzz: intmatcher.cpp:821:59: runtime error: left shift of negative value -14 intmatcher.cpp:823:75: runtime error: left shift of negative value -512 intmatcher.cpp:820:50: runtime error: left shift of negative value -80 See issue #2297 and https://oss-fuzz.com/testcase-detail/4845195990925312 for details. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 06:56:54 +01:00
Stefan Weil	896698a4f5	Fix runtime error (left shift of negative value) Runtime error: src/training/util.h:37:28: runtime error: left shift of negative value -17 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 06:56:54 +01:00
Stefan Weil	5202208a8c	Remove globals.h It only included other files which are already included where needed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-11 19:01:23 +01:00
Noah Metzger	bc2b919805	Integrated Timesteps per symbol into ChoiceIterator Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-03-11 10:50:56 +01:00
Noah Metzger	754e38d2b4	Added the option to get the timesteps separated by the suggested segmentation Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-03-11 10:50:56 +01:00
zdenop	e817607280	archive_version_details is available from libArchive version 3.2.0	2019-03-10 22:57:48 +01:00
zdenop	5cfe4cc1f0	Merge pull request #2286 from Shreeshrii/lstmbox Rename function to TessBaseAPIGetTsvText to be consistent to Create method	2019-03-10 21:41:52 +01:00
zdenop	02a1ffe87a	Report libArchive support	2019-03-10 20:08:45 +01:00
Stefan Weil	b3aff7d633	Fix Index-out-of-bounds in IntegerMatcher::UpdateTablesForFeature This fixes issue #2299, an issue which was already reported by static code analyzers and now by OSS-Fuzz, see details at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13597. The Tesseract code assigns an address which is out-of-bounds to a pointer variable, but increments that variable later. So this is a false positive. Change the code nevertheless to satisfy OSS-Fuzz. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-10 18:26:40 +01:00
Stefan Weil	91d0a71d51	Fix assertion caused by wrong unicharset (issue #2301) Credit to OSS-Fuzz: This fixes an issue which was reported by OSS-Fuzz, see details at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13592. OSS-Fuzz triggered this assertion: contains_unichar_id(unichar_id):Error:Assert failed:in file ../../src/ccutil/unicharset.h, line 502 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-10 16:42:54 +01:00
Stefan Weil	71d4990c6d	Fix Heap-buffer-overflow in GenericVector<int>::size (issue #2298) Credit to OSS-Fuzz: This fixes a security issue which was reported by OSS-Fuzz, see details at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13590. Add also some assertions to catch similar bugs. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-10 16:12:30 +01:00
Robert Schubert	3912cb1c33	LSTM char_whitelist/blacklist (`6ac2ff0`): more robust - unicharset can be null too	2019-03-09 10:40:40 +01:00
Robert Schubert	b45999088c	LSTM char_whitelist/blacklist (`6ac2ff0`): multi-code chars - move decision from ComputeTopN to ContinueContext, where it belongs: block context continuations which emit final codes translating to disabled unichar_ids. (The normal logic for fallback from top2 > top2 > rest will apply.) - pass UNICHARSET refs appropriately	2019-03-08 12:30:16 +01:00
Robert Schubert	8012d5e653	LSTM char_whitelist/blacklist (`6ac2ff0`): also sublangs	2019-03-07 18:32:50 +01:00
Robert Schubert	6ac2ff083e	trying to add tessedit_char_whitelist etc. again: - ignore matrix outputs in ComputeTopN if they belong to a disabled unichar_id - pass UNICHARSET refs to check that - in SetBlackAndWhitelist, also update the unicharset of the lstm_recognizer_ instance, if any	2019-03-07 01:37:23 +01:00
zdenop	f80085c0bf	Merge pull request #2289 from Armyke/master Added an additional optional --tmp_dir parameter to specify the tempo…	2019-03-06 15:03:14 +01:00
Stefan Weil	1c7e00611b	Add initial support for traineddata files in standard archive formats This requires libarchive-dev. Tesseract can now load traineddata files in any of the archive formats which are supported by libarchive. Example of a zipped BagIt archive: $ unzip -l /usr/local/share/tessdata/zip.traineddata Archive: /usr/local/share/tessdata/zip.traineddata Length Date Time Name --------- ---------- ----- ---- 55 2019-03-05 15:27 bagit.txt 0 2019-03-05 15:25 data/ 1557 2019-03-05 15:28 manifest-sha256.txt 1082890 2019-03-05 15:25 data/eng.word-dawg 1487588 2019-03-05 15:25 data/eng.lstm 7477 2019-03-05 15:25 data/eng.unicharset 63346 2019-03-05 15:25 data/eng.shapetable 976552 2019-03-05 15:25 data/eng.inttemp 13408 2019-03-05 15:25 data/eng.normproto 4322 2019-03-05 15:25 data/eng.punc-dawg 4738 2019-03-05 15:25 data/eng.lstm-number-dawg 1410 2019-03-05 15:25 data/eng.freq-dawg 844 2019-03-05 15:25 data/eng.pffmtable 6360 2019-03-05 15:25 data/eng.lstm-unicharset 1012 2019-03-05 15:25 data/eng.lstm-recoder 1047 2019-03-05 15:25 data/eng.unicharambigs 4322 2019-03-05 15:25 data/eng.lstm-punc-dawg 16109842 2019-03-05 15:25 data/eng.bigram-dawg 80 2019-03-05 15:25 data/eng.version 6426 2019-03-05 15:25 data/eng.number-dawg 3694794 2019-03-05 15:25 data/eng.lstm-word-dawg --------- ------- 23468070 21 files `combine_tessdata -d` and `combine_tessdata -u` also work. The traineddata files in the new format can be generated with standard tools like zip or tar. More work is needed for other training tools and big endian support. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-05 17:18:48 +01:00
Armyke	56b04d4ea7	Added the same --tmp_dir flag to tesstrain_utils.sh	2019-03-04 14:05:25 +00:00
Armyke	25fa392887	Added an optional --tmp_dir parameter to specify the directory in which tesstrain.py creates its temporary training files. The main reason is slow R/W on HDDs: anyone who wants to speed up this process can use a directory on an SSD as tmp_dir	2019-03-04 13:26:53 +00:00
Stefan Weil	7fbde96a04	Format new code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 20:26:07 +01:00
Stefan Weil	38fac625cd	Format new code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 20:01:48 +01:00
Shree	a0202bac70	Rename function to TessBaseAPIGetTsvText to be consistent with the Create method	2019-03-02 16:29:53 +00:00
zdenop	5de2a21b3f	Merge pull request #2283 from Shreeshrii/lstmbox Add missing renderers to C-API	2019-03-02 15:15:34 +01:00
Stefan Weil	9c90894ff0	PAGE_RES_IT: Optimize compare operators by using inline code Avoiding a function call will make both == and != operator faster. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:57:16 +01:00
Stefan Weil	295996ed05	commandlineflags: Fix compiler warnings (signed/unsigned) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:21:04 +01:00
Stefan Weil	eb14726aac	ICOORD: Fix old type casts This fixes compiler warnings and avoids unnecessary conversions between float and double. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:04:54 +01:00
Stefan Weil	fb0f1bcf66	BoxChar: Fix compiler warnings (signed/unsigned) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:04:54 +01:00
Stefan Weil	0e1a1fc3cf	Validator: Fix compiler warnings (signed/unsigned) This also fixes a regression in validate_grapheme_test introduced by commit `32e9d7c8f5`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 13:05:03 +01:00
Shree	c7e8131efc	Add TSV option to C-API	2019-03-02 09:50:54 +00:00
Shree	22c099348b	rename LSTMBOX to LSTMBox	2019-03-02 09:11:47 +00:00
zdenop	2ba8e0061a	Merge branch 'master' into mya	2019-03-01 18:37:24 +01:00
Shree	c33f03e33e	Add lstmbox and wordstrbox to capi.h	2019-03-01 17:16:59 +00:00
Shree	76ec21df3d	Add lstmbox and wordstrbox to C-API	2019-03-01 16:40:41 +00:00
zdenop	646b043d2c	use space instead of tab	2019-03-01 14:36:09 +01:00
Shree	5ee1deaea2	correct handling of 0BF0-0BFA Tamil numbers and symbols	2019-03-01 13:21:49 +00:00
zdenop	d7ddc4c5b7	Merge pull request #2270 from Shreeshrii/U_ARABIC_NUMBER Treat U_ARABIC_NUMBER as LTR	2019-02-28 09:27:54 +01:00
zdenop	12c1225a5f	Merge pull request #2271 from stweil/refactor Refactor class Network	2019-02-27 07:43:13 +01:00
Michal Čihař	14c4494f42	Allow UTF-8 variant of C locale It behaves the same in scanf, but it allows proper handling of unicode chars.	2019-02-26 21:37:33 +01:00
Stefan Weil	98dd3b6351	Refactor class Network That class is an abstract class with several pure virtual functions. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-26 16:55:31 +01:00
Shree	25b02bf1f2	Treat U_ARABIC_NUMBER as LTR	2019-02-26 09:51:21 +00:00
Shreeshrii	2f71fe280c	Use alternative way to comment a block of code (using the c preprocessor). https://github.com/tesseract-ocr/tesseract/pull/2268#pullrequestreview-207605382 Thanks @amitdo	2019-02-26 15:05:51 +05:30
Shree	449f1cd4ba	Remove test for Word started with a combiner	2019-02-25 18:47:42 +00:00
zdenop	25c43b1e7c	Merge branch 'master' into distort	2019-02-23 18:23:14 +01:00
Stefan Weil	b3e355a682	Remove whitespace at line endings Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-23 17:49:56 +01:00
Shreeshrii	34e4d6b1d7	Revert to 0 (50% of images inverted).	2019-02-23 17:59:00 +05:30
Shreeshrii	287d5341bf	TODO	2019-02-23 17:56:02 +05:30
Shreeshrii	3e3e1ed55d	Remove commented Code	2019-02-23 17:54:00 +05:30
zdenop	c02f5e99fc	Merge pull request #2259 from Shreeshrii/distort implement PrepareDistortedPix as part of DegradeImage	2019-02-22 21:06:29 +01:00
Shree	2aded47a3c	Implement distort_image in text2image - default false	2019-02-22 12:27:27 +00:00
Shree	49ed3a72d4	implement PrepareDistortedPix as part of DegradeImage	2019-02-21 14:48:29 +00:00
zdenop	e250f3422d	Merge pull request #2258 from stweil/doc Fix doxygen comments	2019-02-21 07:41:22 +01:00
Stefan Weil	2cbe723d03	Fix doxygen comments Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-20 21:11:38 +01:00
Stefan Weil	ef4d5b2e69	Optimize calculation of dot product for double vectors with AVX This improves the performance with best models and should also make training faster. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-20 17:45:38 +01:00
Stefan Weil	b3bd23edb7	Remove whitespace at line endings Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-19 13:53:31 +01:00
Stefan Weil	b95598a0b1	Merge pull request #2070 from pndaza/master add missed letters ( ၌ ၍ ၎ ၏ ) and symbols ( ၊ ။ ) - 0x104a to 0x104f -	2019-02-19 12:22:53 +01:00
Stefan Weil	38861be639	Use __builtin_trap instead of null pointer dereference to abort This fixes a warning from Apple's clang compiler: [ 34%] Building CXX object CMakeFiles/libtesseract.dir/src/ccutil/errcode.cpp.o /Users/travis/build/stweil/tesseract/src/ccutil/errcode.cpp:83:7: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference] *reinterpret_cast<int*>(0) = 0; ^~~~~~~~~~~~~~~~~~~~~~~~~~ /Users/travis/build/stweil/tesseract/src/ccutil/errcode.cpp:83:7: note: consider using __builtin_trap() or qualifying pointer with 'volatile' Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-18 10:49:51 +01:00
Stefan Weil	ddea230b1b	Don't compute function tables at compile time with clang The current code fails to compile with clang compilers on Linux and macOS. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-17 08:38:42 +01:00
zdenop	15f2a4b2c1	Merge pull request #2231 from Shreeshrii/wordstr Add renderer to create WordStr box files from images	2019-02-16 13:48:06 +01:00
Stefan Weil	862322c18c	Fix check for images which are too small to scale Images with width == min_width are not too small. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-15 13:53:11 +01:00
Shree	a044f64375	fix Myanmar validation rules as per Unicode charts	2019-02-15 04:40:55 +00:00
Stefan Weil	c0523ee5a2	Fix compiler warning g++ warning: src/lstm/functions.h:152:35: warning: unused parameter ‘x’ [-Wunused-parameter] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-14 10:29:39 +01:00
Stefan Weil	3556152412	Compute function tables at compile time This requires C++ 14. Older compilers still use the old code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-14 10:29:39 +01:00
Stefan Weil	f491eb6188	Simplify tanh and logistic functions and precompute function tables Both functions are called very often, so computing the table values at program start should be faster than computing them on demand. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-12 12:04:08 +01:00
Shree Devi Kumar	f3362a4b5b	Add renderer to create WordStr box files from images	2019-02-10 19:59:17 +00:00
zdenop	2ae65b2493	Merge pull request #2216 from Shreeshrii/lstmbox Lstmbox	2019-02-10 13:53:41 +01:00
Shree Devi Kumar	311053681c	put common code in AddBoxToLSTM	2019-02-10 09:16:45 +00:00
zdenop	e51f1885e6	Merge pull request #2229 from stweil/warn Fix some compiler warnings	2019-02-10 08:20:23 +01:00
Shree Devi Kumar	b51c1bf05a	change to const char* as suggested by @stweil	2019-02-10 05:13:18 +00:00
Stefan Weil	0c9f7db536	Fix compiler warning (-Wimplicit-fallthrough) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 16:53:44 +01:00
Stefan Weil	d91c316ab1	FontInfo: Make sure that deleted member variables can no longer be used Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 16:32:20 +01:00
Stefan Weil	877e62db55	Fix compiler warning (-Wmaybe-uninitialized) gcc warning: src/lstm/recodebeam.cpp:270:41: warning: ‘current_char’ may be used uninitialized in this function [-Wmaybe-uninitialized] It's a false positive, but setting the variable to 0 satisfies the compiler. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 16:32:20 +01:00
Stefan Weil	33f6dc2a67	Fix compiler warnings (-Wformat-truncation=) gcc warnings: src/viewer/scrollview.cpp:404:31: warning: ‘%s’ directive output may be truncated writing up to 4095 bytes into a region of size between 4084 and 4093 [-Wformat-truncation=] src/viewer/scrollview.cpp:572:31: warning: ‘%s’ directive output may be truncated writing up to 4095 bytes into a region of size between 4084 and 4093 [-Wformat-truncation=] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 16:32:20 +01:00
Stefan Weil	2a355ea103	Fix compiler warnings (-Wimplicit-fallthrough) gcc warnings: src/ccmain/docqual.cpp:734:26: warning: this statement may fall through [-Wimplicit-fallthrough=] src/ccmain/docqual.cpp:764:26: warning: this statement may fall through [-Wimplicit-fallthrough=] src/ccmain/docqual.cpp:782:26: warning: this statement may fall through [-Wimplicit-fallthrough=] [...] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 16:32:20 +01:00
Stefan Weil	aa2dcca295	Fix compiler warnings (-Wstringop-truncation) gcc warnings: src/api/tesseractmain.cpp:252:14: warning: ‘char* strncpy(char*, const char*, size_t)’ specified bound 255 equals destination size [-Wstringop-truncation] src/ccutil/unicharset.h:66:12: warning: ‘char* strncpy(char*, const char*, size_t)’ output may be truncated copying 30 bytes from a string of length 30 [-Wstringop-truncation] src/ccutil/unicharset.cpp:806:12: warning: ‘char* strncpy(char*, const char*, size_t)’ specified bound 64 equals destination size [-Wstringop-truncation] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 16:32:09 +01:00
Stefan Weil	d42413dd17	OpenCL: Remove PERF_COUNT framework It was rarely used, but added a lot of code and an unconditional dependency on openclwrapper.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 10:58:15 +01:00
Shree Devi Kumar	0f42fd8c69	change to use bbox coordinates for TEXTLINE for all characters (cherry picked from commit 049db108b2d6cd3a7f52e480212320613117d50b)	2019-02-05 14:03:29 +00:00
Shree Devi Kumar	9c89cd51cf	Add a new renderer to create box files from images for LSTM training (cherry picked from commit 921da6be2bdbda2ddd64514f9b6bec40a336246a) fix typo (cherry picked from commit 7bd1a0c80393fce2f34e2845cb26760bcf3791cd) Add lstmboxrenderer to CMakeLists (cherry picked from commit cfef3a889aef830725921b5c0218d5e9c633b03e) fix formatting (cherry picked from commit 7ba2b01ede7940ed609a073364948ef8c838cd10)	2019-02-05 14:03:29 +00:00
Shreeshrii	c28a68115e	Merge branch 'master' into boxtiff	2019-02-02 23:42:39 +05:30
Shree Devi Kumar	d9590f8adf	allow user specified box/tiff pairs with tesstrain.sh	2019-02-02 11:35:45 +00:00
Shree Devi Kumar	323361b902	allow user specified box/tiff pairs with tesstrain.sh	2019-02-02 11:33:32 +00:00
Shree Devi Kumar	ad223296af	use --xsize instead of --x_size (cherry picked from commit 94b8988b8cca3812137933db00750bd6e2e84e32)	2019-02-02 11:08:34 +00:00
Mikhail Akopov	7be04342cf	Fix typo	2019-02-01 09:58:44 +01:00
Stefan Weil	b49806766e	Fix AVX2 support for Windows builds with MSC It was never detected, so the existing code for AVX2 was compiled but never used. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-30 11:40:17 +01:00
Shree Devi Kumar	4d9bc11fd3	add --xsize as parameter for tesstrain	2019-01-27 07:00:25 +00:00
zdenop	12c1abcb6b	Merge pull request #2189 from stweil/fix Fix memory leak for PNG images	2019-01-24 07:59:55 +01:00
zdenop	059c50be8c	Merge pull request #2184 from stweil/tests Fix and enable stringrenderer_test	2019-01-24 07:59:07 +01:00
Stefan Weil	9e6e3a0232	Fix memory leak for PNG images Commit `5fe1390748` used an implementation which created a new Pix object. That object was never destroyed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-23 20:05:10 +01:00
Diego de la Hera	1a398a5b5d	removed reference to unbound variable	2019-01-23 15:04:16 -03:00
Stefan Weil	ecf73f5bc7	training: Don't terminate after processing 8 fonts or 8 images tesstrain_utils.sh sets the shell flag -e, so it exits immediately if a command exits with a non-zero status. The following command returns a non-zero status as soon as counter is a multiple of par_factor (par_factor=8, that means as soon as 8 fonts or images are processed): let rem=counter%par_factor The new code fixes this undesired exit. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-23 17:26:40 +01:00
Stefan Weil	32e9d7c8f5	training: Fix some compiler warnings (signed/unsigned) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-23 13:55:13 +01:00
Stefan Weil	e4b862d588	pango_font_info: Fix runtime error messages from Pango pango_coverage_get and pango_coverage_unref should not be called with coverage == nullptr. pango_font_get_coverage should not be called with font == nullptr. Otherwise Pango prints runtime error messages: (process:12657): Pango-CRITICAL **: pango_coverage_get: assertion 'coverage != NULL' failed (process:12657): Pango-CRITICAL **: pango_coverage_unref: assertion 'coverage != NULL' failed (process:12657): Pango-CRITICAL **: pango_font_get_coverage: assertion 'font != NULL' failed (process:12657): GLib-GObject-CRITICAL **: g_object_unref: assertion 'G_IS_OBJECT (object)' failed Typically those errors occur if a required font is not installed, so this can be a quite common error. Fix also a potential resource leak in PangoFontInfo::CoversUTF8Text. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-23 13:55:13 +01:00
Shree Devi Kumar	77d0b6ce8e	fix WORDLIST filename	2019-01-22 15:49:55 +01:00
Stefan Weil	564482db30	Fix selection of IntSimdMatrix method Commit `d36231e3e4` did not distinguish between AVX and AVX2, so AVX2 code was enabled for IntSimdMatrix even when only AVX was supported. This resulted in an illegal instruction. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-20 22:13:04 +01:00
Stefan Weil	66e31bfd8c	OpenCL: Fix alloc-dealloc mismatch Bug message from AddressSanitizer: ==7153==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs free) on 0x602000072cb0 #0 0x7ffff70c6a10 in free (/usr/lib/x86_64-linux-gnu/libasan.so.3+0xc1a10) #1 0x555557188638 in writeProfileToFile ../../../../../src/opencl/openclwrapper.cpp:541 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-19 08:06:26 +01:00
Stefan Weil	ad19183b92	OpenCL: Fix heap buffer overflow Bug message from AddressSanitizer: ==6158==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fffe774b7fc at pc 0x555557086b54 bp 0x7fffffffcee0 sp 0x7fffffffced8 READ of size 1 at 0x7fffe774b7fc thread T0 #0 0x555557086b53 in tesseract::HistogramRect(Pix*, int, int, int, int, int, int*) ../../../../../src/ccstruct/otsuthr.cpp:163 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-19 07:58:16 +01:00
Stefan Weil	502bb624c2	More optimisations for IntSimdMatrix * Move IntDotProductSSE. That allows inlining of the code. * Improve IntDotProductSSE by moving some instructions. * Remove unused num_input_groups_ from IntSimdMatrix. * Re-order elements in IntSimdMatrix to avoid padding. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	95606398f5	Clean code for IntSimdMatrix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	7fc7d28dd0	Compile files for AVX, AVX2 or SSE only when needed Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	a9a1035e55	Move IntSimdMatrixNative from IntSimdMatrix to unittest It is only used for the unit test. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	d36231e3e4	Set best or user selected IntSimdMatrix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	605b4d66c7	Replace dynamically allocated IntSimdMatrix instances by constants Two header files are no longer needed and could be removed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	26be7c5d2e	Use constructor with parameters for IntSimdMatrix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	e237a38405	Add const attributes to IntSimdMatrix multiplier IntSimdMatrix no longer contains variable members. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	7c70147701	Move shaped weights from IntSimdMatrix to WeightMatrix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	ea4d0d354b	Format comment Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	c79d613b65	Replace ASSERT_HOST by assert Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
zdenop	f75b2c1948	Merge pull request #310 from nickjwhite/hocrcharboxes Character boxes in hOCR output	2019-01-14 19:19:04 +01:00
Stefan Weil	9adf6e442b	Revert `59fb3370bb` (-ffast-math) It breaks intsimdmatrix_test. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 17:56:35 +01:00
Nick White	ebbf907c56	Fix typo in hocr character box output	2019-01-13 16:28:31 +00:00
Nick White	4ce797b6f6	Fix hocr character box info to use new hocr renderer correctly	2019-01-13 13:01:14 +00:00
Nick White	c43e4501e3	Merge remote-tracking branch 'origin/master' into hocrcharboxes	2019-01-13 12:41:42 +00:00
zdenop	238cb219d5	Merge pull request #2152 from stweil/clean Remove opencl_device_selection.h	2019-01-09 15:02:59 +01:00
Stefan Weil	a0e6586e63	Fix documentation for page segmentation mode 2 It never worked, so add a comment that the implementation is missing. Add also a to-do comment. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-09 13:51:44 +01:00
Stefan Weil	0fae848b58	OpenCL: Add comments to users of openclwrapper.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-09 12:11:00 +01:00
Stefan Weil	e0fc4f2945	Remove opencl_device_selection.h Always use OpenCL device selection if OpenCL is enabled. This fixes a regression which was introduced by commit `5c6a57b727` which removed the definition for USE_DEVICE_SELECTION. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-09 12:09:56 +01:00
Stefan Weil	595bb7df16	OpenCL: Remove unused code The OpenCL kernel pixSubtract is never used, so remove it. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-05 16:41:20 +01:00
Nick White	b8de06430d	Ensure baseapi.h header is used by commontraining.h regardless of autotools usage	2019-01-04 20:20:00 +00:00
Nick White	cd34ee55ec	Add necessary intproto.h header to protos.cpp	2019-01-04 20:19:54 +00:00
Stefan Weil	62b635a74e	Remove unused functions from cluster.cpp Add also missing static attributes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-03 13:16:31 +01:00
Stefan Weil	f76d8a14cd	Remove unused code from oldlist Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-03 12:27:10 +01:00
Stefan Weil	7719f80155	Add missing std namespace in tensorflow code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-03 11:15:36 +01:00
Stefan Weil	8a6fa452dc	Fix build for architectures without CPUID Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-03 09:32:36 +01:00
Stefan Weil	91af010200	Fix compiler warning gcc warning: src/training/text2image.cpp:694:35: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings] putenv expects a string which can be modified. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-01 22:49:04 +01:00
Stefan Weil	5dd606c631	Replace NULL by nullptr Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-01 22:45:49 +01:00
Stefan Weil	d9600cd82e	Fix and simplify SIMD tests The tests for SSE and AVX must only be done if the correct compiler flags were used. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-01 11:19:17 +01:00
zdenop	d3065520fa	fix 2 clang warnings	2018-12-30 20:25:24 +01:00
Stefan Weil	cb049133cd	Fix compiler warning clang warning: tesseractmain.cpp(512,21): warning: '&&' within '||' [-Wlogical-op-parentheses] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-29 22:17:33 +01:00
zdenop	420fb0ced0	Merge branch 'master' of https://github.com/tesseract-ocr/tesseract	2018-12-29 10:31:33 +01:00
zdenop	8885fe2ccb	provide info about compiled openmp version	2018-12-29 10:18:27 +01:00
Stefan Weil	993e56ffde	Don't try to create text output if other renderers failed (fix regression) Commit `49d7df6dc3` added error handling, but since that commit Tesseract used the text fallback if the user selected output failed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-27 10:23:28 +01:00
zdenop	cc997b53c7	add the missing implementation for TessBaseAPIGetAltoText method in C-API	2018-12-26 21:35:47 +01:00
Stefan Weil	db9c7e0312	Use std::stringstream to generate hOCR output Using std::stringstream simplifies the code and allows conversion of double to string independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-16 20:14:11 +01:00
zdenop	72d8df581b	Merge pull request #2121 from stweil/hocr Move code for hOCR renderer to new file	2018-12-16 16:26:27 +01:00
Stefan Weil	c7e8d30280	Fix value for PHYSICAL_IMG_NR in ALTO output Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-16 15:07:02 +01:00
Stefan Weil	457c53026d	Fix indentation of hOCR output Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-15 17:51:59 +01:00
Stefan Weil	5de3fc47bb	Format code in new file hocrrenderer.cpp Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-15 15:35:21 +01:00
Stefan Weil	48713f7df2	Move code for hOCR renderer to new file Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-15 15:33:47 +01:00
zdenop	1f5fb15af3	remove setting constant resolution from ImageThresholder::SetImage. Credible resolution will be set afterward. Fixes #2080.	2018-12-14 19:23:22 +01:00
zdenop	6d06d39bf4	Merge pull request #2118 from stweil/clean protos: Remove several unused macros, functions and global variables	2018-12-14 09:20:53 +01:00
Stefan Weil	b8c4f1b9fc	protos: Remove unused config variable Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-13 21:37:33 +01:00
Stefan Weil	f35eeb3b4a	protos: Remove several unused macros, functions and global variables The unused global variable TrainingData used a lot of runtime memory. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-13 21:32:56 +01:00
Stefan Weil	fbbbdb4565	Use std::stringstream to generate ALTO output and add <SP> element Using std::stringstream simplifies the code. The <SP> element is needed between two <String> elements. Remove also some unneeded spaces in the ALTO output. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-12 22:29:35 +01:00
Stefan Weil	7ebd3153ae	Fix several typos (most of them found by codespell) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-10 18:59:58 +01:00
Stefan Weil	81ab302d52	FPRow: Remove three unused methods This fixes warnings from the Intel compiler: src/textord/cjkpitch.cpp(319): warning #177: function "<unnamed>::FPRow::good_gaps" was declared but never referenced src/textord/cjkpitch.cpp(383): warning #177: function "<unnamed>::FPRow::is_bad" was declared but never referenced src/textord/cjkpitch.cpp(387): warning #177: function "<unnamed>::FPRow::is_unknown" was declared but never referenced Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-08 16:43:52 +01:00
Stefan Weil	404f9cd147	SimpleStats: Remove unused method This fixes a warning from the Intel compiler: src/textord/cjkpitch.cpp(79): warning #177: function "<unnamed>::SimpleStats::maximum" was declared but never referenced Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-08 16:39:46 +01:00
Stefan Weil	a9121d28f3	Merge pull request #2107 from stweil/march Add check whether compiler supports -march=native flag	2018-12-08 10:53:09 +01:00
Stefan Weil	2c044df959	Fix wrong x_fsize in hOCR output (regression) The regression was caused by the latest commit `c9e85ab78f`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-08 10:39:31 +01:00
Stefan Weil	2ccc5810f3	Add check whether compiler supports -march=native flag Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-05 20:13:28 +01:00
Stefan Weil	c9e85ab78f	Fix wrong font attributes in hOCR output Instrumented code throws this runtime error during OCR: ../../src/api/baseapi.cpp:1616:5: runtime error: load of value 128, which is not a valid value for type 'bool' ../../src/api/baseapi.cpp:1627:5: runtime error: load of value 128, which is not a valid value for type 'bool' If there is no font information (typical for Tesseract with a LSTM model), the font attributes got random values resulting in wrong hOCR output. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-04 10:52:46 +01:00
Stefan Weil	0bdae8f8bf	GENERIC_2D_ARRAY: Fix runtime error in assignment operator Instrumented code throws this runtime error during OCR: ../../src/ccstruct/matrix.h:84:11: runtime error: null pointer passed as argument 2, which is declared to never be null Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-04 10:48:46 +01:00
Stefan Weil	f0a4d04187	Add config variable for selection of dot product function Add also a C++ implementation with more aggressive compiler options which is optimized for the CPU where the software was built. It is now possible to select the function used for the dot product with -c dotproduct=FUNCTION where FUNCTION can be one of those values: * auto selection based on detected hardware (default) * generic C++ code with default compiler options * native C++ code optimized for build host * avx optimized code for AVX * sse optimized code for SSE Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-01 00:19:28 +01:00
zdenop	b527b37825	Merge pull request #2097 from stweil/namespace SIMDDetect: Use tesseract namespace and format code	2018-12-01 00:02:18 +01:00
Stefan Weil	1910b1a72b	SIMDDetect: Use tesseract namespace and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 23:36:39 +01:00
Stefan Weil	66d3275d0b	IntSimdMatrixSSE: Remove unused include statement and simplify code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 23:14:11 +01:00
Stefan Weil	048eb34934	Add missing static attribute to local inline functions Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 23:14:11 +01:00
Stefan Weil	b73370aac9	Remove unneeded test for nullptr IntSimdMatrix::GetFastestMultiplier never returns a nullptr. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 23:14:11 +01:00
Stefan Weil	e2419b1968	Fix potential crash in tprintf Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 23:14:11 +01:00
Stefan Weil	6b6d9de497	Fix potential crash in STRING class Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 23:14:11 +01:00
Stefan Weil	59fb3370bb	Use -ffast-math for calculation of dot product This reduces the code size for intsimdmatrixavx2 from 2700 to 2668 and slightly improves the performance for fast models with AVX2. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 22:52:04 +01:00
Stefan Weil	fda3ba9009	IntSimdMatrixSSE: Fix comment Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 22:13:32 +01:00
zdenop	07b140364f	Merge pull request #2093 from stweil/python Updates for Python scripts	2018-11-30 08:10:20 +01:00
zdenop	53600c677e	Merge pull request #2092 from stweil/format Format new ALTO code with clang-format	2018-11-30 08:08:52 +01:00
zdenop	f6493dd5e8	Merge pull request #2090 from stweil/inline Optimize performance by using inline functions	2018-11-30 08:07:45 +01:00
Stefan Weil	c59c45fb3e	Fix Amharic font list This was reported for the Python code by LGTM. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 08:00:22 +01:00
Stefan Weil	b148644c1b	Make Python script executable Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 07:08:45 +01:00
Stefan Weil	ed48b2a8f5	Format new ALTO code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-30 06:37:25 +01:00
Jake Sebright	d7cee03a94	Add support for ALTO output	2018-11-30 06:09:36 +01:00
Stefan Weil	3c047f0ac8	Optimize performance by using inline function DotProduct This improves performance for the "best" models because it avoids function calls. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-29 21:43:41 +01:00
Stefan Weil	e161501df6	Optimize performance by using inline MatrixDotVectorInternal This improves performance for the "best" models because it avoids function calls. The compiler also knows the passed values for the parameters add_bias_fwd and skip_bias_back. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-29 21:37:32 +01:00
Egor Pugin	685b136d89	Fix incorrect condition.	2018-11-29 19:02:54 +03:00
Egor Pugin	267b79982d	Merge pull request #2076 from jbarlow83/pythonize-training RFC: Pythonize tesstrain.sh and friends	2018-11-25 13:31:48 +03:00
James R. Barlow	8aa25239ae	Fix some of Codacy's complaints	2018-11-24 16:59:01 -08:00
James R. Barlow	9122e6249e	Autoreformat code This increases the deviation from the bash scripts so is done separately.	2018-11-24 00:50:29 -08:00
James R. Barlow	d9ae7ecc49	Pythonize tesstrain.sh -> tesstrain.py This is a lightweight, semi-Pythonic conversion of tesstrain.sh that currently supports only LSTM and not the Tesseract 3 training mode. I attempted to keep source changes minimal so it would be easy to compare bash to Python in code review and confirm equivalence. Python 3.6+ is required. Ubuntu 18.04 ships Python 3.6 and it is a mandatory package (the package manager is also written in Python), so it is available in the baseline Tesseract 4.0 system. There are minor output and behavioral changes, and advantages. Python's logging is used. Temporary files are only deleted on success, so they can be inspected if training fails. Console output is more terse and the log file is more verbose. And there are progress bars! (The python3-tqdm package is required.) Where tesstrain.sh would sometimes fail without explanation and return an error code of 1, it is much easier to find the point of failure in this version. That was also the main motivation for this work. Argument checking is also more comprehensive.	2018-11-24 00:45:35 -08:00
pndaza	fc8a3d5bbc	combine condition with next	2018-11-24 09:21:05 +06:30
pndaza	5c85d8e03d	add missed letters and symbols - 0x104a to 0x104f -	2018-11-24 09:14:31 +06:30
Stefan Weil	9b783822a0	Remove unused include statements for tprintf.h Format also a call of tprintf and add a missing explicit include statement. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-18 17:25:01 +01:00
Stefan Weil	a93426c9ff	Fix wrong results from function streamtofloat The local variable k should be 10 ^ (number of digits after comma), but will overflow when there are more than 9 digits after the comma because an int value cannot store 10000000000. This results in wrong double values read from .tr files for example (or in a runtime exception if Tesseract was compiled with -ftrapv). Using uint64_t does not fix the general problem but allows more digits which should be sufficient for the data read by Tesseract. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-17 20:02:21 +01:00
Stefan Weil	acca4fb999	Fix some unbound variables and other small issues in training shell scripts Fix also the logging helper functions to work without log file. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-16 11:13:46 +01:00
Stefan Weil	a4b03fbb27	Fix warning from shellcheck shellcheck warning: In /tesseract/src/training/tesstrain_utils.sh line 209: TIMESTAMP=`date +%Y-%m-%d` ^-- SC2006: Use $(..) instead of legacy `..`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-15 17:45:20 +01:00
John Lin	bfe58aa56f	Fix unbound variable $FONTS	2018-11-15 17:43:15 +01:00
Stefan Weil	0915cbd535	Simplify shell script using mktemp Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-11-15 13:36:52 +01:00
John Lin	edb76e281a	Simplify MKTEMP_DT logic	2018-11-15 10:38:40 +08:00
John Lin	dbfc89f9af	Fix mktemp in tesstrain_utils.sh The commit `10f2c45c00` unified the usage of mktemp, but with an incorrect bash syntax and unnecessary definition of LANG_CODE and TIMESTAMP. This patch fixes the above problems.	2018-11-14 09:04:34 +08:00
Ray Smith	ce88adbf32	fix issue #1192	2018-11-12 12:53:12 +01:00
zdenop	724957167e	fix typo in non VS build	2018-11-08 23:10:14 +01:00
zdenop	eb104f9fe4	VS build: fix warning C4996: The POSIX name for this item is deprecated. Instead, use the ISO C and C++ conformant name.	2018-11-08 22:55:04 +01:00
zdenop	cbef2ebe12	implement patches vcpkg tesseract	2018-11-08 21:37:47 +01:00
zdenop	7a7f226228	ocrclass: Remove unused macros Signed-off-by: Stefan Weil <sw@weilnetz.de> # Conflicts: # src/ccutil/ocrclass.h	2018-11-08 20:23:36 +01:00
Zdenko Podobný	2dd753ee4c	replace VS implementation of gettimeofday with std::chrono::steady_clock::now(); fixes #2038	2018-11-08 19:43:46 +01:00
chrismamo1	439dfaaf8b	un-fix one of the warnings	2018-10-30 18:10:48 -06:00
chrismamo1	30be5aaaac	fix a couple minor compiler warnings	2018-10-30 18:00:32 -06:00
Stefan Weil	6f8bd340d9	Remove chopper.h It is no longer needed after some reordering of code in chopper.cpp. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-29 19:51:44 +01:00
Stefan Weil	286dfb031a	Remove unused include statements Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-29 19:46:58 +01:00
Stefan Weil	2098bb6daf	Remove unused function ComputeOrientation Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-29 19:43:56 +01:00
Stefan Weil	cad6ebb5ff	LIST: Remove old comments Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-29 19:43:56 +01:00
zdenop	99054f10c7	Merge pull request #2027 from stweil/warn Fix compiler warning	2018-10-24 07:31:15 +02:00
Stefan Weil	eefb8348f7	Fix compiler warning Compiler warning on macOS: tesscallback.h:29:7: warning: 'TessClosure' has no out-of-line virtual method definitions; its vtable will be emitted in every translation unit [-Wweak-vtables] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-23 17:01:53 +02:00
Noah Metzger	f7f5f41073	Fixed a mac compiler warning in recodebeam.cpp Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2018-10-23 16:57:39 +02:00
zdenop	e60318f9c0	set PANGOCAIRO_BACKEND=fc to avoid crash; fixes #736	2018-10-23 13:22:38 +02:00
Zdenko Podobný	3d508a65a7	set unlv_tilde_crunching to false; fixes #1449 #948	2018-10-23 09:26:32 +02:00
Stefan Weil	7ebbb7370a	ColPartition: Fix CID 1164543 (Division or modulo by float zero) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-22 22:14:15 +02:00
Stefan Weil	eaabe4a3ce	ErrorCounter: Fix CID 1164538 (Division or modulo by float zero) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-22 22:14:15 +02:00
Stefan Weil	8f615d44f1	osdetect: Fix CID 1164539 (Division or modulo by float zero) Avoid also a conversion from int16_t to double to float. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-22 22:14:15 +02:00
Stefan Weil	be0cf03778	tesseractmain: Fix memory leak Commit `49d7df6dc3` introduced a memory leak when the output file could not be created. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-22 18:50:47 +02:00
Stefan Weil	9c0799314e	Add parenthesis in boolean expression This fixes a compiler warning: scanutils.cpp:444:32: warning: '&&' within '||' [-Wlogical-op-parentheses] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-22 17:48:17 +02:00
Stefan Weil	0f973e1d62	Add missing 'static' keyword This fixes a compiler warning: globaloc.cpp:33:6: warning: no previous extern declaration for non-static variable 'global_crash_pixes' [-Wmissing-variable-declarations] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-22 17:48:17 +02:00
Stefan Weil	a71ad455be	Remove unused macros This fixes some compiler warnings: mainblk.cpp:28:9: warning: macro is not used [-Wunused-macros] mainblk.cpp:29:9: warning: macro is not used [-Wunused-macros] [...] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-22 17:48:17 +02:00
zdenop	dba7f456d5	Merge pull request #2018 from stweil/sort Get sorted list of available languages	2018-10-22 16:06:42 +02:00
Matthias Geerdsen	eac2880c24	avoid unbound variable TESSDATA_PREFIX set TESSDATA_PREFIX as empty, if not defined in environment to avoid an unbound variable	2018-10-22 14:28:14 +02:00
Stefan Weil	d75ef80f12	Get sorted list of available languages TessBaseAPI::GetAvailableLanguagesAsVector returned the list of languages without sorting, so the result was random and not user friendly. Now `tesseract --list-langs` shows the available languages and scripts in alphabetic order. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-22 14:07:03 +02:00
Matthias Geerdsen	95d9c8c57a	set default values for unset variables setting default values for possibly unset variables avoids unbound variable errors	2018-10-21 21:30:52 +02:00
Matthias Geerdsen	7b32e64564	add shebang	2018-10-21 21:30:13 +02:00
zdenop	32c1e4f433	FLAGS_webtext_prefix: unbound variable; issue #2005	2018-10-21 14:00:06 +02:00
Stefan Weil	34a89e54db	Fix function ScrollViewCommand The format string which builds the command only takes one or two string arguments, so the function allocated too much memory and passed too many arguments to snprintf. This also fixes a compiler warning (clang). Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-21 08:13:16 +02:00
zdenop	4d3b0bc798	use <cstdio> instead of <stdio.h>	2018-10-20 21:46:40 +02:00
zdenop	8103d17c72	use _strdup instead of strdup in MSVC	2018-10-20 21:43:38 +02:00
zdenop	a033261f63	add info about used backend in text2image	2018-10-20 21:41:09 +02:00
Stefan Weil	e232114089	Fix use of undefined macro USE_DEVICE_SELECTION This fixes compiler warnings. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-20 13:58:12 +02:00
Zdenko Podobný	486940687c	Exit training script if run command failed; fixes #2005	2018-10-20 13:00:39 +02:00
Egor Pugin	5a4288f2fc	Merge pull request #2011 from stweil/fix Small fix and optimization	2018-10-20 13:48:51 +03:00
Zdenko Podobný	1a523006a6	install training script with autotools.	2018-10-20 12:33:07 +02:00
Stefan Weil	b0ace0e850	ScrollView: Optimize local table_colors It is constant, and the values are in the range 0...255, so its size can be reduced. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-20 12:05:38 +02:00
Stefan Weil	d364750cb3	Remove type cast and fix compiler warning (-Wcast-qual) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-20 12:04:46 +02:00
Zdenko Podobný	1b2bda65e0	Revert "prefer to use FreeType for pango_cairo_font_map" This reverts commit `345e5ee1f3`.	2018-10-20 11:30:07 +02:00
Zdenko Podobný	276c6845ae	Revert "free PangoFontMap; fixes #1999 " This reverts commit `d1d73b9888`.	2018-10-20 11:28:20 +02:00
Zdenko Podobný	a03f23e05e	Merge branch 'master' of https://github.com/tesseract-ocr/tesseract	2018-10-20 11:26:23 +02:00
Marco Atzeri	ebbd4e3efc	fixes #426 ; define NOUNDEFINED for cygwin	2018-10-20 11:25:28 +02:00
Stefan Weil	b40151c200	training: Don't hide global variables This fixes two warnings from LGTM: Parameter feature_defs hides a global variable with the same name. Parameter Config hides a global variable with the same name. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-19 22:37:37 +02:00
Stefan Weil	bb181ec8d3	Rename API function from GetBestLSTMChoices to GetBestLSTMSymbolChoices Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-19 10:50:38 +02:00
Stefan Weil	df7d1e1f97	Rename API function for getting LSTM choices Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-19 10:50:38 +02:00
Stefan Weil	830b9c715a	BLOBNBOX: Declare signed bit field This fixes a warning from LGTM: Bit field area of type int should have explicitly unsigned integral, explicitly signed integral, or enumeration type. Maybe area should be unsigned, but that would require lots of other changes, so for now signedness is not changed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-19 10:30:05 +02:00
Stefan Weil	d9c472b988	cluster: Fix some potential overflows This fixes several issues reported by LGTM: Multiplication result may overflow 'int' before it is converted to 'size_type'. Multiplication result may overflow 'float' before it is converted to 'double'. Multiplication result may overflow 'int' before it is converted to 'unsigned long'. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-19 10:23:17 +02:00
Zdenko Podobný	d1d73b9888	free PangoFontMap; fixes #1999	2018-10-19 00:48:20 +02:00
zdenop	bbe7a4cc10	Merge pull request #2002 from stweil/err Show error message when output file could not be created	2018-10-18 19:27:01 +02:00
Stefan Weil	49d7df6dc3	tesseractmain: Show error message when output file could not be created Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-18 19:22:49 +02:00
Stefan Weil	b0b8dfbc81	TessResultRenderer: Extend API to access status of renderer Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-18 19:22:48 +02:00
Stefan Weil	f0c9b753c6	BlamerBundle: Add declaration for copy assignment operator It does not need an implementation as it is currently not used. This fixes a warning from LGTM: No matching copy assignment operator in class BlamerBundle. It is good practice to match a copy constructor with a copy assignment operator. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-18 15:36:32 +02:00
Stefan Weil	e3658bbc78	C_OUTLINE_FRAG: Add declaration for copy constructor It does not need an implementation as it is currently not used. This fixes a warning from LGTM: No matching copy constructor in class C_OUTLINE_FRAG. It is good practice to match a copy assignment operator with a copy constructor. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-18 15:31:45 +02:00
Stefan Weil	5585ed8d85	ROW: Add declaration for copy constructor It does not need an implementation as it is currently not used. This fixes a warning from LGTM: No matching copy constructor in class ROW. It is good practice to match a copy assignment operator with a copy constructor. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-18 15:31:10 +02:00
Stefan Weil	a1f0c66be1	BLOB_CHOICE: Add copy assignment operator This fixes a warning from LGTM: No matching copy assignment operator in class BLOB_CHOICE. It is good practice to match a copy constructor with a copy assignment operator. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-18 15:29:07 +02:00
Stefan Weil	7100a14636	ParamsTrainingHypothesis: Add copy assignment operator This fixes a warning from LGTM: No matching copy assignment operator in class ParamsTrainingHypothesis. It is good practice to match a copy constructor with a copy assignment operator. Use also a simpler expression for the size of features. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-18 15:28:12 +02:00
Stefan Weil	0bbd5c5d1c	LineHypothesis: Add copy assignment operator This fixes a warning from LGTM: No matching copy assignment operator in class LineHypothesis. It is good practice to match a copy constructor with a copy assignment operator. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-18 15:23:28 +02:00
Noah Metzger	c13371d6e0	Renamed GetGlyphConfidences() to GetChoices() and glyph_confidences to lstm_choice_mode Renamed the global attribute glyph_confidences to lstm_choice_mode and the method GetGlyphConfidences() to GetChoices(). All Variables and comments contained in related methods were renamed as well. Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2018-10-17 16:43:39 +02:00
zdenop	e93e8f063f	Merge pull request #1994 from stweil/lgtm Fix several warnings from LGTM	2018-10-16 18:18:43 +02:00
Stefan Weil	4b800ccaa7	Fix sum computation in higher precision This also fixes two warnings from LGTM: Multiplication result may overflow 'float' before it is converted to 'double'. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-16 18:01:27 +02:00
Stefan Weil	fd84f7b666	LLSQ: Replace sqrt by std::sqrt This should fix warnings from LGTM: Multiplication result may overflow 'float' before it is converted to 'double'. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-16 17:57:26 +02:00
Stefan Weil	7c2af45713	Fix sum computation in higher precision This also fixes two warnings from LGTM: Multiplication result may overflow 'float' before it is converted to 'double'. Replace also FALSE / TRUE by false / true for bool return value. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-16 17:50:12 +02:00
Stefan Weil	1730b8ccbe	classify/cluster: Replace Emalloc by std::vector This should fix a warning from LGTM: Multiplication result may overflow 'int' before it is converted to 'unsigned long'. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-16 17:14:51 +02:00
Stefan Weil	5fb461a563	SVNetwork: Handle failed socket call (CID 1164597) This fixes a warning from Coverity Scan. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-16 16:53:24 +02:00
Stefan Weil	2d2b269e02	OpenclDevice: Catch negative index (CID 1395110) This fixes a warning from Coverity Scan. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-16 16:53:24 +02:00
Stefan Weil	146d2caa9d	Classify: Fix new resource leak (CID 1396163) This fixes a warning from Coverity Scan. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-16 16:53:23 +02:00
Stefan Weil	edbd07a5f9	lstmtraining: Handle failed remove syscall (CID 1396166) This fixes a warning from Coverity Scan. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-16 16:53:23 +02:00
Stefan Weil	32e1e4b6b4	TessPDFRenderer: Remove unused member variable jpg_quality_ (CID 1396172) This fixes a warning from Coverity Scan Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-16 16:53:23 +02:00
Stefan Weil	d89ec15571	Revert "Fix CID 1396172 (Uninitialized members)" This reverts commit `cbd09de7fe`. The variable can be removed as it is not used. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-16 16:53:23 +02:00
Zdenko Podobný	cbd09de7fe	Fix CID 1396172 (Uninitialized members)	2018-10-16 12:24:10 +02:00
Stefan Weil	d0d73da65a	commontraining: Fix two comments Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-15 11:15:49 +02:00
Zdenko Podobný	10f2c45c00	fix "mkdir -dt" for bsd, mac and cygwin	2018-10-14 18:08:50 +02:00
zdenop	524c23de53	Merge pull request #1987 from tfmorris/1986_errno_include Add missing cerrno includes - fixes #1986	2018-10-13 22:06:00 +02:00
Tom Morris	14af3f720b	Add missing cerrno includes - fixes #1986	2018-10-13 16:02:48 -04:00
zdenop	83f80054f6	Merge pull request #1985 from stweil/win32 win32: Show TIFF errors on console	2018-10-13 20:51:26 +02:00
Stefan Weil	6ffb53f815	win32: Show TIFF errors on console Showing them in a window (default) is not acceptable for a console application like Tesseract which must be able to work in batch mode. Such error messages can be triggered by TIFF files which include vendor specific tags. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-13 20:42:14 +02:00
zdenop	4734317499	fixes #408 - text2image: comma in font name	2018-10-13 15:23:40 +02:00
zdenop	5f4f9372e9	revert debug message committed by mistake	2018-10-13 11:20:25 +02:00

1165 Commits