tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-05 10:49:01 +08:00

Author	SHA1	Message	Date
Stefan Weil	f88a7f28e3	fontinfo: Fix wrong delete Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-07 12:16:04 +02:00
Stefan Weil	3dfe1b8807	classify: Modernize function UniformDensity This should fix an issue reported by Codacy. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-07 12:13:45 +02:00
Stefan Weil	72c874140e	Modernize code by replacing C type casts This was done using clang-tidy. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-07 09:04:51 +02:00
zdenop	95a15a7a82	fix cmake&clang build	2019-04-06 15:31:53 +02:00
zdenop	ab09b09da6	Merge pull request #2294 from bertsky/lstm-with-char-whitelist trying to add tessedit_char_whitelist etc. again:	2019-04-06 14:41:30 +02:00
Robert Schubert	25a42ea42f	fixed failure report for tesstrain commands: - with `set -e` in effect, looking at stdout to detect failure is too late	2019-04-06 08:13:03 +02:00
Robert Schubert	d5584e793e	fixed failure report for tesstrain commands: - with `set -e` in effect, it does not make sense to query `$?` indirectly	2019-04-06 08:13:03 +02:00
zdenop	be617b3722	Merge pull request #2361 from Shreeshrii/truth Change message display for debug_level -1 during lstmtraining	2019-04-05 10:52:21 +02:00
zdenop	2982cb4ff3	Merge pull request #2368 from amitdo/no-legacy-fix disable-legacy build: Do not include unused headers	2019-04-05 09:35:04 +02:00
Stefan Weil	d35a6f2de5	Modernize code (clang-tidy check modernize-deprecated-headers) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-05 08:29:00 +02:00
Stefan Weil	20d5eedd45	Modernize code (clang-tidy check modernize-loop-convert) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-05 08:29:00 +02:00
amitdo	fab9a54981	Remove unneeded 'SUBDIRS=' from 3 Makefile.am files	2019-04-04 19:31:39 +02:00
Shree	6673347986	Change page to line in message	2019-04-04 15:43:29 +00:00
Shree	51c3535310	Always display GROUND TRUTH. BEST OCR and ALIGNED TRUTH only if different for debug_level -1	2019-04-04 15:33:22 +00:00
Shree	84d4cc2e95	Display OCR TEXT and GROUND TRUTH only when different for debug_level = -1	2019-04-04 15:33:22 +00:00
Amit D	2069c057d6	Merge branch 'master' into no-legacy-fix	2019-04-04 18:26:22 +03:00
Egor Pugin	2a1d238bd5	Merge pull request #2366 from stweil/modernize Modernize code with "using"	2019-04-04 15:13:10 +03:00
amitdo	546014aecd	disable-legacy build: Do not include unused headers	2019-04-04 15:09:08 +03:00
Stefan Weil	98346c2cd4	Modernize and format code The code was modernized using clang-tidy with "modernize-use-using". The modified files were then formatted using clang-tidy with "google-readability-braces-around-statements", then clang-format was applied. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-03 21:02:23 +02:00
Shreeshrii	613c2bf6e4	Change pages to lines in message The pages variables refer to the lines in document. This change makes the messages clearer without changing the variable names.	2019-04-03 10:41:14 +05:30
Egor Pugin	af7cc1ce4c	Fix windows build.	2019-04-01 22:38:01 +03:00
Stefan Weil	81fbd878dd	Add more missing include statements for Windows build Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-01 08:10:25 +02:00
Stefan Weil	ab009fae94	Remove macro WINDLLNAME It is now no longer used. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 20:05:41 +02:00
Stefan Weil	77a5f2623e	Remove unused config variable tessedit_module_name It was only defined for Windows builds. Use also false instead of 0 to set the default value of two boolean config variables. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 20:04:00 +02:00
Stefan Weil	c150b9832d	Add missing include statements for Windows build The last commits which removed BOOL8 had broken the Windows build. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 19:02:29 +02:00
Stefan Weil	802f42e821	Remove BOOL8, TRUE, FALSE from host.h Remove unneeded include statements for host.h, add required ones and update the comments for the remaining include statements. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 18:27:20 +02:00
Stefan Weil	be96b7b660	bits16: Format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 18:26:50 +02:00
Stefan Weil	146079f31d	api: Replace BOOL8, TRUE, FALSE by bool, true, false and modernize code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 18:15:53 +02:00
Stefan Weil	4e0c726d6c	ccutil: replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:56:47 +02:00
Stefan Weil	da0c14ae45	cutil: Replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:56:19 +02:00
Stefan Weil	87a973652c	classify: Replace BOOL8, TRUE, FALSE by bool, true, false Simplify also some related code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:55:48 +02:00
Stefan Weil	30ee3afc29	textord: Replace TRUE, FALSE by true, false and use bool instead of BOOL8 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:55:20 +02:00
Stefan Weil	b391ab84d0	wordrec: Replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:54:21 +02:00
Stefan Weil	cbb5e729a1	classify: Use bool and replace TRUE, FALSE Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:53:50 +02:00
Stefan Weil	46fa59aadc	ccstruct: Replace BOOL8, TRUE, FALSE by bool, true, false and modernize code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:53:06 +02:00
Stefan Weil	92b9f9f8de	ccmain: Replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:52:09 +02:00
Stefan Weil	7db25e15c0	Remove unused config variable tessedit_single_match Replace also TRUE, FALSE by true, false. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:38:35 +02:00
Stefan Weil	ca2947a2c0	blobclass: Remove unused macros Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:36:46 +02:00
Stefan Weil	f2bd98e656	PageIterator: Remove useless const Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:35:43 +02:00
Stefan Weil	813b7803e0	pgedit: Replace BOOL8 by bool Replace also TRUE, FALSE by true, false and add some static attributes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:29:15 +02:00
Stefan Weil	664811a869	Replace BOOL8, TRUE, FALSE by bool, true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:28:28 +02:00
Stefan Weil	51a2c2eae8	Format code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:24:02 +02:00
Stefan Weil	95ea778745	capi: Replace FALSE, TRUE and simplify and format code Format code using clang-format and clang-tidy. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:19:04 +02:00
Stefan Weil	89ba48b106	strngs: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:13:38 +02:00
Stefan Weil	127d0e31f0	serialis: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:12:11 +02:00
Stefan Weil	8b663e7620	helpers: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:06:19 +02:00
zdenop	3bb8f9cd49	Merge branch 'master' of https://github.com/tesseract-ocr/tesseract	2019-03-31 16:54:15 +02:00
zdenop	5f06402755	python: optimize imports, reformat code	2019-03-31 16:53:39 +02:00
zdenop	2e9fd69c9e	use 'import pathlib'; fix "TypeError: argument of type 'WindowsPath' is not iterable"	2019-03-31 16:53:33 +02:00
zdenop	a0527b41bd	fix LGTM reports for python	2019-03-31 16:53:25 +02:00
Stefan Weil	1948f0d520	ocrclass: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:39:44 +02:00
Stefan Weil	85957e9673	WERD: Don't print space character after "FALSE" at end of line Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:32:42 +02:00
Stefan Weil	83d4433d3b	Modernize and format unichar.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:30:15 +02:00
Stefan Weil	ac0b191f6b	Modernize and format genericvector.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:21:32 +02:00
Stefan Weil	36ed08636b	Modernize and format tesscallback.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:16:00 +02:00
zdenop	f47c7c92dd	fix uninitialized variables in wordstrboxrenderer and lstmboxrenderer; CID 1399132, 1399134, 1399135, 1399137, 1399140, 1399141, 1399142	2019-03-31 12:26:49 +02:00
Shreeshrii	ea36e94e58	fix Could not parse bool from flag (#2359)	2019-03-29 14:50:21 +01:00
Stefan Weil	852598eecf	Remove file tessedit.h It only declared the unused global variable global_monitor which is now removed, too. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-27 19:03:42 +01:00
Stefan Weil	6e59abcce2	Remove file cutil.h It only contained three type definitions which fit better in other include files. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-27 19:03:42 +01:00
Stefan Weil	b6bfb20f1d	Improve readability of conditional code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 12:35:56 +01:00
Stefan Weil	36a1a30c22	Remove some old type casts Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 12:35:56 +01:00
Stefan Weil	a44bf41f14	Modernize C++ loops The modifications were done using this command: run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-loop-convert' -fix Then the resulting code was cleaned manually. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 08:38:21 +01:00
Stefan Weil	ed011670c8	Modernize C++ code using bool literals The modifications were done using this command: run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-bool-literals' -fix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 07:58:02 +01:00
Stefan Weil	a0fd90583b	Modernize C++ code using auto The modifications were done using this command: run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-auto' -fix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 07:55:08 +01:00
Stefan Weil	36f768853a	Modernize C++ code using override The modifications were done using this command: run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-override' -fix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 07:37:52 +01:00
Stefan Weil	f877640bc9	Merge pull request #2319 from bertsky/tesstrain-parallel-wait-retval tesstrain: check failure of subjobs	2019-03-25 16:10:09 +01:00
Stefan Weil	d8d2f6f48a	Fix broken shell scripts for training Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-25 15:32:43 +01:00
Stefan Weil	631882a346	Fix compiler warnings (signed / unsigned mismatch) clang warnings: src/ccutil/unicharcompress.cpp:172:27: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] src/lstm/recodebeam.cpp:129:29: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare] src/lstm/recodebeam.cpp:276:48: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare] unittest/imagedata_test.cc:101:21: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] unittest/linlsq_test.cc:33:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] unittest/linlsq_test.cc:44:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] unittest/nthitem_test.cc:27:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare] unittest/nthitem_test.cc:68:21: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare] unittest/stats_test.cc:26:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-25 08:36:07 +01:00
Stefan Weil	ecaad2aca8	ccstruct/werd: Format code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-25 07:57:34 +01:00
Stefan Weil	b1e305f38c	Simplify code which tests for non-empty StringParam Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:35:52 +01:00
Stefan Weil	f9860cda41	Optimize functions ResetFrom The loop can terminate as soon as the parameter name was found. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:21:23 +01:00
Stefan Weil	41da5afe9d	UNICHARSET: Fix compiler warning (signed/unsigned mismatch) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:18:21 +01:00
Stefan Weil	91e2b253c0	Format modified code with clang-format Format the files which were changed in commit `297d7d86ce`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:10:29 +01:00
Stefan Weil	06acbaf99c	IntegerMatcher: Fix division by zero Credit to OSS-Fuzz which reported this issue: intmatcher.cpp:1231:62: runtime error: division by zero #0 0x6119d5 in IntegerMatcher::ApplyCNCorrection(float, int, int, int) tesseract/src/classify/intmatcher.cpp:1231:62 #1 0x5fe9c4 in tesseract::Classify::ComputeCorrectedRating(bool, int, double, double, int, int, int, int, int, unsigned char const) tesseract/src/classify/adaptmatch.cpp:1213:29 #2 0x5fdc22 in tesseract::Classify::ExpandShapesAndApplyCorrections(ADAPT_CLASS_STRUCT, bool, int, int, int, float, int, int, unsigned char const, tesseract::UnicharRating, ADAPT_RESULTS) tesseract/src/classify/adaptmatch.cpp:1184:13 #3 0x5fe421 in tesseract::Classify::MasterMatcher(INT_TEMPLATES_STRUCT, short, INT_FEATURE_STRUCT const, unsigned char const, ADAPT_CLASS_STRUCT, int, int, TBOX const&, GenericVector<CP_RESULT_STRUCT> const&, ADAPT_RESULTS) tesseract/src/classify/adaptmatch.cpp:1119:5 #4 0x6003eb in tesseract::Classify::CharNormTrainingSample(bool, int, tesseract::TrainingSample const&, GenericVector<tesseract::UnicharRating>*) tesseract/src/classify/adaptmatch.cpp:1374:5 See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13712. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 19:39:31 +01:00
Stefan Weil	58423d2f6c	Merge pull request #2328 from bertsky/lstm-with-user-patterns2 Add user words / patterns again	2019-03-24 19:38:40 +01:00
zdenop	0d36d9a9d7	Merge pull request #2341 from Shreeshrii/fix Fix	2019-03-24 18:21:09 +01:00
Stefan Weil	da6305b632	Fix compiler warnings caused by ASSERT_HOST The modified definition avoids warnings caused by redundant semicolons. Now a semicolon is required when using the macro, so a few code locations had to be updated. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 17:47:04 +01:00
Stefan Weil	44a6d9f4d4	intmatcher: Catch more out of bounds reads Credit to OSS-Fuzz which reported this issue: intmatcher.cpp:1121:17: runtime error: index 24 out of bounds for type 'uint8_t [24]' #0 0x61034b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT, unsigned int, short) tesseract/src/classify/intmatcher.cpp:1121:17 #1 0x60f560 in IntegerMatcher::Match(INT_CLASS_STRUCT, unsigned int, unsigned int, short, INT_FEATURE_STRUCT const, tesseract::UnicharRating, int, int, bool) tesseract/src/classify/intmatcher.cpp:514:11 #2 0x5f3a25 in tesseract::Classify::AdaptToChar(TBLOB, int, int, float, ADAPT_TEMPLATES_STRUCT) tesseract/src/classify/adaptmatch.cpp:894:9 #3 0x5f2ccd in tesseract::Classify::LearnPieces(char const, int, int, float, tesseract::CharSegmentationType, char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:430:5 #4 0x5f16ee in tesseract::Classify::LearnWord(char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:293:7 This catches the out of bounds data reads in release builds. Add also assertions for debug builds. See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13818. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 17:27:43 +01:00
Stefan Weil	5fd7228414	intmatcher: Catch out of bounds reads Credit to OSS-Fuzz which reported this issue: intmatcher.cpp:1163:17: runtime error: index 24 out of bounds for type 'uint8_t [24]' #0 0x610d3b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT, unsigned int) tesseract/src/classify/intmatcher.cpp:1163:17 #1 0x60ff4e in IntegerMatcher::Match(INT_CLASS_STRUCT, unsigned int, unsigned int, short, INT_FEATURE_STRUCT const, tesseract::UnicharRating, int, int, bool) tesseract/src/classify/intmatcher.cpp:563:11 #2 0x5f4355 in tesseract::Classify::AdaptToChar(TBLOB, int, int, float, ADAPT_TEMPLATES_STRUCT) tesseract/src/classify/adaptmatch.cpp:894:9 #3 0x5f35fd in tesseract::Classify::LearnPieces(char const, int, int, float, tesseract::CharSegmentationType, char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:430:5 #4 0x5f201e in tesseract::Classify::LearnWord(char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:293:7 This catches the out of bounds data reads, but does not fix the primary reason: ProtoLengths currently gets values which are larger than the allowed index. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 15:44:33 +01:00
Stefan Weil	509ee95023	IntegerMatcher: Fix data type of loop counters ClassTemplate->ProtoLengths[n] is of type uint8_t, so use that for the related loop counters, too. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 15:35:06 +01:00
Stefan Weil	f4f34a87db	WERD_RES: Fix uninitialized member variable Credit to OSS-Fuzz which reported this issue: pageres.cpp:1143:7: runtime error: load of value 249, which is not a valid value for type 'bool' #0 0x6ba560 in WERD_RES::Clear() tesseract/src/ccstruct/pageres.cpp:1143:7 #1 0x6b9fd1 in WERD_RES::operator=(WERD_RES const&) tesseract/src/ccstruct/pageres.cpp:193:3 #2 0x49a9ad in WERD_RES::WERD_RES(WERD_RES const&) tesseract/src/ccstruct/pageres.h:356:11 See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13707. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 14:59:08 +01:00
Stefan Weil	afc099b9f4	intmatcher: Split data_table The old code was a hack to improve the performance. The new code is clearer and results in the same binary when compiling with gcc 8.3.0, so it looks like the old hack is no longer needed with modern compilers. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 08:15:40 +01:00
Shreeshrii	8749f3553e	LINEDATA=false	2019-03-23 19:16:49 +05:30
Shree	bcb7cf9846	sort arguments, use true/false instead of 1/0	2019-03-23 12:28:53 +00:00
Shree	c2db272134	Modify distort_image for Boolean	2019-03-22 17:02:46 +00:00
Shree	259d5af6b1	Add PSM values to the definition	2019-03-22 15:29:02 +00:00
Shree	8eafec0d17	Fix comments with current values of PSM codes	2019-03-22 14:10:49 +00:00
Stefan Weil	e1e56d9d66	Remove local function declarations from intmatcher.h This requires moving the local function HeapSort to the beginning. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-22 11:39:39 +01:00
Stefan Weil	2ba194ca8d	Remove four unused parameters This fixes some compiler warnings: src/classify/intmatcher.cpp:711:63: warning: unused parameter ‘ConfigMask’ [-Wunused-parameter] src/classify/intmatcher.cpp:1007:16: warning: unused parameter ‘ProtoMask’ [-Wunused-parameter] src/classify/intmatcher.cpp:1095:61: warning: unused parameter ‘NumFeatures’ [-Wunused-parameter] src/classify/intmatcher.cpp:1136:59: warning: unused parameter ‘used_features’ [-Wunused-parameter] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-22 11:30:24 +01:00
Stefan Weil	dd79d56e9f	Remove unused parameter BlobLength This fixes two compiler warnings: src/classify/intmatcher.cpp:553:14: warning: unused parameter ‘BlobLength’ [-Wunused-parameter] src/classify/intmatcher.cpp:622:14: warning: unused parameter ‘BlobLength’ [-Wunused-parameter] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-22 11:17:19 +01:00
Shree	9b915d5efb	add --distort_image	2019-03-22 05:39:38 +00:00
Shree	f7ffde99d5	add --distort_image	2019-03-22 05:34:00 +00:00
zdenop	ac7ea4322a	Merge pull request #2335 from Shreeshrii/master Changes to tesstrain.py - max_workers=8, distort_image=false	2019-03-17 15:27:34 +01:00
zdenop	26877ba703	check min. python version; os.uname is not available on windows	2019-03-17 15:25:48 +01:00
Shreeshrii	f8e8521606	Update tesstrain_utils.py	2019-03-17 15:32:35 +05:30
Shree	6fa8e1bb15	Set max_workers=8	2019-03-17 09:58:11 +00:00
Shree	e21499e81e	Set default value for distort_image	2019-03-17 09:54:16 +00:00
Stefan Weil	ee2f9bf7bf	Remove old comments in file headers Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-16 10:55:00 +01:00
Shree	d47b0d588a	Use LATIN_FONTS for kmr	2019-03-15 15:47:56 +00:00
Shree	3eee1d217a	Add kmr and kur_ara, remove kur from training scripts	2019-03-15 15:37:49 +00:00
Robert Schubert	297d7d86ce	trying to add user words/patterns again: - pass in ParamsVectors from Tesseract (carrying values from langdata/config/api) into LSTMRecognizer::Load and LoadDictionary - after LSTMRecognizer's Dict is initialised (with default values), reset the variables user_{words,patterns}_{suffix,file} from the corresponding entries in the passed vector	2019-03-15 16:06:19 +01:00
Shree	b2ebf0195f	Add kmr and kur_ara, remove kur from training scripts	2019-03-15 14:39:39 +00:00
Shree	37befdf6c4	Add option for --distort_image	2019-03-15 13:32:36 +00:00
zdenop	0a36b38169	Merge pull request #2317 from eighttails/master Added missing linker flags for MinGW.	2019-03-15 08:01:21 +01:00
Robert Schubert	14346e56b0	tesstrain: catch+handle SIGINT (to stop waiting on subjobs)	2019-03-15 00:03:16 +01:00
Robert Schubert	6cbad17e30	tesstrain: check all subjobs' retval	2019-03-14 14:38:51 +01:00
Robert Schubert	5316bcbb94	tesstrain: check failure of subjobs	2019-03-14 11:42:01 +01:00
Stefan Weil	4c2bbebecc	Fix compiler warning (-Wunused-value) Warning from clang++: ..\src\ccmain\ltrresultiterator.cpp(454,8): warning: expression result unused [-Wunused-value] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-13 20:56:03 +01:00
Stefan Weil	ed84ba0a44	Fix wrong comparison symbol_steps is a vector, so testing for a nullptr was wrong. clang++ reports: ..\src\ccmain\ltrresultiterator.cpp(440,19): warning: comparison of address of 'this->word_res_->symbol_steps' equal to a null pointer is always false [-Wtautological-pointer-compare] if (&word_res_->symbol_steps == nullptr || !LSTM_mode_) return nullptr; ~~~~~~~~~~~^~~~~~~~~~~~ ~~~~~~~ Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-13 20:38:38 +01:00
Tadahito Yao	bbbd262a8d	Added missing linker flags for MinGW.	2019-03-13 22:10:36 +09:00
jm server2	1206362d30	`accumulated_timesteps` is a vector, not a pointer; when ChoiceIterator is used without `lstm_choice_mode`, tesseract crashes (or similar) because the check is true and a nonexistent item is referenced	2019-03-13 12:55:14 +01:00
Stefan Weil	3baf0d8076	Fix boolean assignments Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 15:34:24 +01:00
Stefan Weil	8ad0489f0f	Remove svpaint.cpp from libtesseract svpaint is a standalone application (it includes a main function) and should not be part of the Tesseract library. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 12:22:53 +01:00
zdenop	7546a01020	Merge pull request #2310 from noahmetzger/LSTMChoiceRIL Lstm choice ril	2019-03-12 10:46:11 +01:00
Stefan Weil	35a999f91a	Fix assertion caused by wrong unicharset Credit to OSS-Fuzz: it found another case which triggered this assertion: contains_unichar_id(unichar_id):Error:Assert failed:in file ../../src/ccutil/unicharset.h, line 502 This is the OSS-Fuzz testcase: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13662 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 09:31:21 +01:00
Stefan Weil	56a39bda77	Fix float division by zero That runtime error is normally not visible because it does not abort the program, but is detected when the code was compiled with sanitizers. It can be triggered with this OSS-Fuzz testcase: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13662 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 09:28:16 +01:00
Noah Metzger	5b3e2fe812	Integrated accumulated Symbol Choice in the Choice Iterator and made the api lstm_choice_mode independent Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-03-12 09:15:10 +01:00
Stefan Weil	4c0b98bd12	Replace undefined shift operations by multiplications Shift operations are undefined for negative numbers, but at least on Intel they return the same value as a multiplication with 2 ^ shift value. This fixes runtime errors reported by sanitizers and OSS-Fuzz: intmatcher.cpp:821:59: runtime error: left shift of negative value -14 intmatcher.cpp:823:75: runtime error: left shift of negative value -512 intmatcher.cpp:820:50: runtime error: left shift of negative value -80 See issue #2297 and https://oss-fuzz.com/testcase-detail/4845195990925312 for details. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 06:56:54 +01:00
Stefan Weil	896698a4f5	Fix runtime error (left shift of negative value) Runtime error: src/training/util.h:37:28: runtime error: left shift of negative value -17 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 06:56:54 +01:00
Stefan Weil	5202208a8c	Remove globals.h It only included other files which are already included where needed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-11 19:01:23 +01:00
Noah Metzger	bc2b919805	Integrated Timesteps per symbol into ChoiceIterator Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-03-11 10:50:56 +01:00
Noah Metzger	754e38d2b4	Added the option to get the timesteps separated by the suggested segmentation Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-03-11 10:50:56 +01:00
zdenop	e817607280	archive_version_details is available from libArchive version 3.2.0	2019-03-10 22:57:48 +01:00
zdenop	5cfe4cc1f0	Merge pull request #2286 from Shreeshrii/lstmbox Rename function to TessBaseAPIGetTsvText to be consistent to Create method	2019-03-10 21:41:52 +01:00
zdenop	02a1ffe87a	Report libArchive support	2019-03-10 20:08:45 +01:00
Stefan Weil	b3aff7d633	Fix Index-out-of-bounds in IntegerMatcher::UpdateTablesForFeature This fixes issue #2299, an issue which was already reported by static code analyzers and now by OSS-Fuzz, see details at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13597. The Tesseract code assigns an address which is out-of-bounds to a pointer variable, but increments that variable later. So this is a false positive. Change the code nevertheless to satisfy OSS-Fuzz. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-10 18:26:40 +01:00
Stefan Weil	91d0a71d51	Fix assertion caused by wrong unicharset (issue #2301) Credit to OSS-Fuzz: This fixes an issue which was reported by OSS-Fuzz, see details at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13592. OSS-Fuzz triggered this assertion: contains_unichar_id(unichar_id):Error:Assert failed:in file ../../src/ccutil/unicharset.h, line 502 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-10 16:42:54 +01:00
Stefan Weil	71d4990c6d	Fix Heap-buffer-overflow in GenericVector<int>::size (issue #2298) Credit to OSS-Fuzz: This fixes a security issue which was reported by OSS-Fuzz, see details at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13590. Add also some assertions to catch similar bugs. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-10 16:12:30 +01:00
Robert Schubert	3912cb1c33	LSTM char_whitelist/blacklist (`6ac2ff0`): more robust - unicharset can be null too	2019-03-09 10:40:40 +01:00
Robert Schubert	b45999088c	LSTM char_whitelist/blacklist (`6ac2ff0`): multi-code chars - move decision from ComputeTopN to ContinueContext, where it belongs: block context continuations which emit final codes translating to disabled unichar_ids. (The normal logic for fallback from top2 > top2 > rest will apply.) - pass UNICHARSET refs appropriately	2019-03-08 12:30:16 +01:00
Robert Schubert	8012d5e653	LSTM char_whitelist/blacklist (`6ac2ff0`): also sublangs	2019-03-07 18:32:50 +01:00
Robert Schubert	6ac2ff083e	trying to add tessedit_char_whitelist etc. again: - ignore matrix outputs in ComputeTopN if they belong to a disabled unichar_id - pass UNICHARSET refs to check that - in SetBlackAndWhitelist, also update the unicharset of the lstm_recognizer_ instance, if any	2019-03-07 01:37:23 +01:00
zdenop	f80085c0bf	Merge pull request #2289 from Armyke/master Added an additional optional --tmp_dir parameter to specify the tempo…	2019-03-06 15:03:14 +01:00
Stefan Weil	1c7e00611b	Add initial support for traineddata files in standard archive formats This requires libarchive-dev. Tesseract can now load traineddata files in any of the archive formats which are supported by libarchive. Example of a zipped BagIt archive: $ unzip -l /usr/local/share/tessdata/zip.traineddata Archive: /usr/local/share/tessdata/zip.traineddata Length Date Time Name --------- ---------- ----- ---- 55 2019-03-05 15:27 bagit.txt 0 2019-03-05 15:25 data/ 1557 2019-03-05 15:28 manifest-sha256.txt 1082890 2019-03-05 15:25 data/eng.word-dawg 1487588 2019-03-05 15:25 data/eng.lstm 7477 2019-03-05 15:25 data/eng.unicharset 63346 2019-03-05 15:25 data/eng.shapetable 976552 2019-03-05 15:25 data/eng.inttemp 13408 2019-03-05 15:25 data/eng.normproto 4322 2019-03-05 15:25 data/eng.punc-dawg 4738 2019-03-05 15:25 data/eng.lstm-number-dawg 1410 2019-03-05 15:25 data/eng.freq-dawg 844 2019-03-05 15:25 data/eng.pffmtable 6360 2019-03-05 15:25 data/eng.lstm-unicharset 1012 2019-03-05 15:25 data/eng.lstm-recoder 1047 2019-03-05 15:25 data/eng.unicharambigs 4322 2019-03-05 15:25 data/eng.lstm-punc-dawg 16109842 2019-03-05 15:25 data/eng.bigram-dawg 80 2019-03-05 15:25 data/eng.version 6426 2019-03-05 15:25 data/eng.number-dawg 3694794 2019-03-05 15:25 data/eng.lstm-word-dawg --------- ------- 23468070 21 files `combine_tessdata -d` and `combine_tessdata -u` also work. The traineddata files in the new format can be generated with standard tools like zip or tar. More work is needed for other training tools and big endian support. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-05 17:18:48 +01:00
Armyke	56b04d4ea7	Added the same --tmp_dir flag to tesstrain_utils.sh	2019-03-04 14:05:25 +00:00
Armyke	25fa392887	Added an optional --tmp_dir parameter to specify the temporary directory in which tesstrain.py creates the temporary training files. The main reason is slow read/write on HDDs: anyone who wants to speed up this process can use a directory on an SSD as tmp_dir	2019-03-04 13:26:53 +00:00
Stefan Weil	7fbde96a04	Format new code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 20:26:07 +01:00
Stefan Weil	38fac625cd	Format new code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 20:01:48 +01:00
Shree	a0202bac70	Rename function to TessBaseAPIGetTsvText to be consistent to the Create method	2019-03-02 16:29:53 +00:00
zdenop	5de2a21b3f	Merge pull request #2283 from Shreeshrii/lstmbox Add missing renderers to C-API	2019-03-02 15:15:34 +01:00
Stefan Weil	9c90894ff0	PAGE_RES_IT: Optimize compare operators by using inline code Avoiding a function call will make both == and != operator faster. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:57:16 +01:00
Stefan Weil	295996ed05	commandlineflags: Fix compiler warnings (signed/unsigned) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:21:04 +01:00
Stefan Weil	eb14726aac	ICOORD: Fix old type casts This fixes compiler warnings and avoids unnecessary conversions between float and double. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:04:54 +01:00
Stefan Weil	fb0f1bcf66	BoxChar: Fix compiler warnings (signed/unsigned) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:04:54 +01:00
Stefan Weil	0e1a1fc3cf	Validator: Fix compiler warnings (signed/unsigned) This also fixes a regression in validate_grapheme_test introduced by commit `32e9d7c8f5`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 13:05:03 +01:00
Shree	c7e8131efc	Add TSV option to C-API	2019-03-02 09:50:54 +00:00
Shree	22c099348b	rename LSTMBOX to LSTMBox	2019-03-02 09:11:47 +00:00
zdenop	2ba8e0061a	Merge branch 'master' into mya	2019-03-01 18:37:24 +01:00
Shree	c33f03e33e	Add lstmbox and wordstrbox to capi.h	2019-03-01 17:16:59 +00:00
Shree	76ec21df3d	Add lstmbox and wordstrbox to C-API	2019-03-01 16:40:41 +00:00
zdenop	646b043d2c	use space instead of tab	2019-03-01 14:36:09 +01:00
Shree	5ee1deaea2	correct handling of 0BF0-0BFA Tamil numbers and symbols	2019-03-01 13:21:49 +00:00
zdenop	d7ddc4c5b7	Merge pull request #2270 from Shreeshrii/U_ARABIC_NUMBER Treat U_ARABIC_NUMBER as LTR	2019-02-28 09:27:54 +01:00
zdenop	12c1225a5f	Merge pull request #2271 from stweil/refactor Refactor class Network	2019-02-27 07:43:13 +01:00
Michal Čihař	14c4494f42	Allow UTF-8 variant of C locale It behaves the same in scanf, but it allows proper handling of Unicode chars.	2019-02-26 21:37:33 +01:00
Stefan Weil	98dd3b6351	Refactor class Network That class is an abstract class with several pure virtual functions. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-26 16:55:31 +01:00
Shree	25b02bf1f2	Treat U_ARABIC_NUMBER as LTR	2019-02-26 09:51:21 +00:00
Shreeshrii	2f71fe280c	Use alternative way to comment a block of code (using the c preprocessor). https://github.com/tesseract-ocr/tesseract/pull/2268#pullrequestreview-207605382 Thanks @amitdo	2019-02-26 15:05:51 +05:30
Shree	449f1cd4ba	Remove test for Word started with a combiner	2019-02-25 18:47:42 +00:00
zdenop	25c43b1e7c	Merge branch 'master' into distort	2019-02-23 18:23:14 +01:00
Stefan Weil	b3e355a682	Remove whitespace at line endings Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-23 17:49:56 +01:00
Shreeshrii	34e4d6b1d7	Revert to 0 (50% of images inverted).	2019-02-23 17:59:00 +05:30
Shreeshrii	287d5341bf	TODO	2019-02-23 17:56:02 +05:30
Shreeshrii	3e3e1ed55d	Remove commented Code	2019-02-23 17:54:00 +05:30
zdenop	c02f5e99fc	Merge pull request #2259 from Shreeshrii/distort implement PrepareDistortedPix as part of DegradeImage	2019-02-22 21:06:29 +01:00
Shree	2aded47a3c	Implement distort_image in text2image - default false	2019-02-22 12:27:27 +00:00
Shree	49ed3a72d4	implement PrepareDistortedPix as part of DegradeImage	2019-02-21 14:48:29 +00:00
zdenop	e250f3422d	Merge pull request #2258 from stweil/doc Fix doxygen comments	2019-02-21 07:41:22 +01:00
Stefan Weil	2cbe723d03	Fix doxygen comments Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-20 21:11:38 +01:00
Stefan Weil	ef4d5b2e69	Optimize calculation of dot product for double vectors with AVX This improves the performance with best models and should also make training faster. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-20 17:45:38 +01:00
Stefan Weil	b3bd23edb7	Remove whitespace at line endings Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-19 13:53:31 +01:00
Stefan Weil	b95598a0b1	Merge pull request #2070 from pndaza/master add missed letters ( ၌ ၍ ၎ ၏ ) and symbols ( ၊ ။ ) - 0x104a to 0x104f -	2019-02-19 12:22:53 +01:00
Stefan Weil	38861be639	Use __builtin_trap instead of null pointer dereference to abort This fixes a warning from Apple's clang compiler: [ 34%] Building CXX object CMakeFiles/libtesseract.dir/src/ccutil/errcode.cpp.o /Users/travis/build/stweil/tesseract/src/ccutil/errcode.cpp:83:7: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference] *reinterpret_cast<int*>(0) = 0; ^~~~~~~~~~~~~~~~~~~~~~~~~~ /Users/travis/build/stweil/tesseract/src/ccutil/errcode.cpp:83:7: note: consider using __builtin_trap() or qualifying pointer with 'volatile' Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-18 10:49:51 +01:00
Stefan Weil	ddea230b1b	Don't compute function tables at compile time with clang The current code fails to compile with clang compilers on Linux and macOS. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-17 08:38:42 +01:00
zdenop	15f2a4b2c1	Merge pull request #2231 from Shreeshrii/wordstr Add renderer to create WordStr box files from images	2019-02-16 13:48:06 +01:00
Stefan Weil	862322c18c	Fix check for images which are too small to scale Images with width == min_width are not too small. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-15 13:53:11 +01:00
Shree	a044f64375	fix Myanmar validation rules as per Unicode charts	2019-02-15 04:40:55 +00:00
Stefan Weil	c0523ee5a2	Fix compiler warning g++ warning: src/lstm/functions.h:152:35: warning: unused parameter ‘x’ [-Wunused-parameter] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-14 10:29:39 +01:00
Stefan Weil	3556152412	Compute function tables at compile time This requires C++ 14. Older compilers still use the old code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-14 10:29:39 +01:00
Stefan Weil	f491eb6188	Simplify tanh and logistic functions and precompute function tables Both functions are called very often, so computing the table values at program start should be faster than computing them on demand. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-12 12:04:08 +01:00
Shree Devi Kumar	f3362a4b5b	Add renderer to create WordStr box files from images	2019-02-10 19:59:17 +00:00
zdenop	2ae65b2493	Merge pull request #2216 from Shreeshrii/lstmbox Lstmbox	2019-02-10 13:53:41 +01:00
Shree Devi Kumar	311053681c	put common code in AddBoxToLSTM	2019-02-10 09:16:45 +00:00
zdenop	e51f1885e6	Merge pull request #2229 from stweil/warn Fix some compiler warnings	2019-02-10 08:20:23 +01:00
Shree Devi Kumar	b51c1bf05a	change to const char* as suggested by @stweil	2019-02-10 05:13:18 +00:00
Stefan Weil	0c9f7db536	Fix compiler warning (-Wimplicit-fallthrough) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 16:53:44 +01:00
Stefan Weil	d91c316ab1	FontInfo: Make sure that deleted member variables can no longer be used Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 16:32:20 +01:00
Stefan Weil	877e62db55	Fix compiler warning (-Wmaybe-uninitialized) gcc warning: src/lstm/recodebeam.cpp:270:41: warning: ‘current_char’ may be used uninitialized in this function [-Wmaybe-uninitialized] It's a false positive, but setting the variable to 0 satisfies the compiler. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 16:32:20 +01:00
Stefan Weil	33f6dc2a67	Fix compiler warnings (-Wformat-truncation=) gcc warnings: src/viewer/scrollview.cpp:404:31: warning: ‘%s’ directive output may be truncated writing up to 4095 bytes into a region of size between 4084 and 4093 [-Wformat-truncation=] src/viewer/scrollview.cpp:572:31: warning: ‘%s’ directive output may be truncated writing up to 4095 bytes into a region of size between 4084 and 4093 [-Wformat-truncation=] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 16:32:20 +01:00
Stefan Weil	2a355ea103	Fix compiler warnings (-Wimplicit-fallthrough) gcc warnings: src/ccmain/docqual.cpp:734:26: warning: this statement may fall through [-Wimplicit-fallthrough=] src/ccmain/docqual.cpp:764:26: warning: this statement may fall through [-Wimplicit-fallthrough=] src/ccmain/docqual.cpp:782:26: warning: this statement may fall through [-Wimplicit-fallthrough=] [...] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 16:32:20 +01:00
Stefan Weil	aa2dcca295	Fix compiler warnings (-Wstringop-truncation) gcc warnings: src/api/tesseractmain.cpp:252:14: warning: ‘char* strncpy(char*, const char*, size_t)’ specified bound 255 equals destination size [-Wstringop-truncation] src/ccutil/unicharset.h:66:12: warning: ‘char* strncpy(char*, const char*, size_t)’ output may be truncated copying 30 bytes from a string of length 30 [-Wstringop-truncation] src/ccutil/unicharset.cpp:806:12: warning: ‘char* strncpy(char*, const char*, size_t)’ specified bound 64 equals destination size [-Wstringop-truncation] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 16:32:09 +01:00
Stefan Weil	d42413dd17	OpenCL: Remove PERF_COUNT framework It was rarely used, but added a lot of code and an unconditional dependency on openclwrapper.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 10:58:15 +01:00
Shree Devi Kumar	0f42fd8c69	change to use bbox coordinates for TEXTLINE for all characters (cherry picked from commit 049db108b2d6cd3a7f52e480212320613117d50b)	2019-02-05 14:03:29 +00:00
Shree Devi Kumar	9c89cd51cf	Add a new renderer to create box files from images for LSTM training (cherry picked from commit 921da6be2bdbda2ddd64514f9b6bec40a336246a) fix typo (cherry picked from commit 7bd1a0c80393fce2f34e2845cb26760bcf3791cd) Add lstmboxrenderer to CMakeLists (cherry picked from commit cfef3a889aef830725921b5c0218d5e9c633b03e) fix formatting (cherry picked from commit 7ba2b01ede7940ed609a073364948ef8c838cd10)	2019-02-05 14:03:29 +00:00
Shreeshrii	c28a68115e	Merge branch 'master' into boxtiff	2019-02-02 23:42:39 +05:30
Shree Devi Kumar	d9590f8adf	allow user specified box/tiff pairs with tesstrain.sh	2019-02-02 11:35:45 +00:00
Shree Devi Kumar	323361b902	allow user specified box/tiff pairs with tesstrain.sh	2019-02-02 11:33:32 +00:00
Shree Devi Kumar	ad223296af	use --xsize instead of --x_size (cherry picked from commit 94b8988b8cca3812137933db00750bd6e2e84e32)	2019-02-02 11:08:34 +00:00
Mikhail Akopov	7be04342cf	Fix typo	2019-02-01 09:58:44 +01:00
Stefan Weil	b49806766e	Fix AVX2 support for Windows builds with MSC It was never detected, so the existing code for AVX2 was compiled but never used. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-30 11:40:17 +01:00
Shree Devi Kumar	4d9bc11fd3	add --xsize as parameter for tesstrain	2019-01-27 07:00:25 +00:00
zdenop	12c1abcb6b	Merge pull request #2189 from stweil/fix Fix memory leak for PNG images	2019-01-24 07:59:55 +01:00
zdenop	059c50be8c	Merge pull request #2184 from stweil/tests Fix and enable stringrenderer_test	2019-01-24 07:59:07 +01:00
Stefan Weil	9e6e3a0232	Fix memory leak for PNG images Commit `5fe1390748` used an implementation which created a new Pix object. That object was never destroyed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-23 20:05:10 +01:00
Diego de la Hera	1a398a5b5d	removed reference to unbound variable	2019-01-23 15:04:16 -03:00
Stefan Weil	ecf73f5bc7	training: Don't terminate after processing 8 fonts or 8 images tesstrain_utils.sh sets the shell flag -e, so it exits immediately if a command exits with a non-zero status. The following command returns a non-zero status as soon as counter is a multiple of par_factor (par_factor=8, that means as soon as 8 fonts or images are processed): let rem=counter%par_factor The new code fixes this undesired exit. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-23 17:26:40 +01:00
Stefan Weil	32e9d7c8f5	training: Fix some compiler warnings (signed/unsigned) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-23 13:55:13 +01:00
Stefan Weil	e4b862d588	pango_font_info: Fix runtime error messages from Pango pango_coverage_get and pango_coverage_unref should not be called with coverage == nullptr. pango_font_get_coverage should not be called with font == nullptr. Otherwise Pango prints runtime error messages: (process:12657): Pango-CRITICAL : pango_coverage_get: assertion 'coverage != NULL' failed (process:12657): Pango-CRITICAL : pango_coverage_unref: assertion 'coverage != NULL' failed (process:12657): Pango-CRITICAL : pango_font_get_coverage: assertion 'font != NULL' failed (process:12657): GLib-GObject-CRITICAL : g_object_unref: assertion 'G_IS_OBJECT (object)' failed Typically those errors occur if a required font is not installed, so this can be a quite common error. Fix also a potential resource leak in PangoFontInfo::CoversUTF8Text. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-23 13:55:13 +01:00
Shree Devi Kumar	77d0b6ce8e	fix WORDLIST filename	2019-01-22 15:49:55 +01:00
Stefan Weil	564482db30	Fix selection of IntSimdMatrix method Commit `d36231e3e4` did not distinguish between AVX and AVX2, so AVX2 code was enabled for IntSimdMatrix even when only AVX was supported. This resulted in an illegal instruction. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-20 22:13:04 +01:00
Stefan Weil	66e31bfd8c	OpenCL: Fix alloc-dealloc mismatch Bug message from AddressSanitizer: ==7153==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs free) on 0x602000072cb0 #0 0x7ffff70c6a10 in free (/usr/lib/x86_64-linux-gnu/libasan.so.3+0xc1a10) #1 0x555557188638 in writeProfileToFile ../../../../../src/opencl/openclwrapper.cpp:541 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-19 08:06:26 +01:00
Stefan Weil	ad19183b92	OpenCL: Fix heap buffer overflow Bug message from AddressSanitizer: ==6158==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fffe774b7fc at pc 0x555557086b54 bp 0x7fffffffcee0 sp 0x7fffffffced8 READ of size 1 at 0x7fffe774b7fc thread T0 #0 0x555557086b53 in tesseract::HistogramRect(Pix, int, int, int, int, int, int) ../../../../../src/ccstruct/otsuthr.cpp:163 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-19 07:58:16 +01:00
Stefan Weil	502bb624c2	More optimisations for IntSimdMatrix * Move IntDotProductSSE. That allows inlining of the code. * Improve IntDotProductSSE by moving some instructions. * Remove unused num_input_groups_ from IntSimdMatrix. * Re-order elements in IntSimdMatrix to avoid padding. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	95606398f5	Clean code for IntSimdMatrix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	7fc7d28dd0	Compile files for AVX, AVX2 or SSE only when needed Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	a9a1035e55	Move IntSimdMatrixNative from IntSimdMatrix to unittest It is only used for the unit test. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	d36231e3e4	Set best or user selected IntSimdMatrix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	605b4d66c7	Replace dynamically allocated IntSimdMatrix instances by constants Two header files are no longer needed and could be removed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	26be7c5d2e	Use constructor with parameters for IntSimdMatrix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	e237a38405	Add const attributes to IntSimdMatrix multiplier IntSimdMatrix no longer contains variable members. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	7c70147701	Move shaped weights from IntSimdMatrix to WeightMatrix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	ea4d0d354b	Format comment Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
Stefan Weil	c79d613b65	Replace ASSERT_HOST by assert Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 21:34:37 +01:00
zdenop	f75b2c1948	Merge pull request #310 from nickjwhite/hocrcharboxes Character boxes in hOCR output	2019-01-14 19:19:04 +01:00
Stefan Weil	9adf6e442b	Revert `59fb3370bb` (-ffast-math) It breaks intsimdmatrix_test. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-14 17:56:35 +01:00
Nick White	ebbf907c56	Fix typo in hocr character box output	2019-01-13 16:28:31 +00:00
Nick White	4ce797b6f6	Fix hocr character box info to use new hocr renderer correctly	2019-01-13 13:01:14 +00:00
Nick White	c43e4501e3	Merge remote-tracking branch 'origin/master' into hocrcharboxes	2019-01-13 12:41:42 +00:00
zdenop	238cb219d5	Merge pull request #2152 from stweil/clean Remove opencl_device_selection.h	2019-01-09 15:02:59 +01:00
Stefan Weil	a0e6586e63	Fix documentation for page segmentation mode 2 It never worked, so add a comment that the implementation is missing. Add also a to-do comment. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-09 13:51:44 +01:00
Stefan Weil	0fae848b58	OpenCL: Add comments to users of openclwrapper.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-09 12:11:00 +01:00
Stefan Weil	e0fc4f2945	Remove opencl_device_selection.h Always use OpenCL device selection if OpenCL is enabled. This fixes a regression which was introduced by commit `5c6a57b727` which removed the definition for USE_DEVICE_SELECTION. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-09 12:09:56 +01:00
Stefan Weil	595bb7df16	OpenCL: Remove unused code The OpenCL kernel pixSubtract is never used, so remove it. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-05 16:41:20 +01:00
Nick White	b8de06430d	Ensure baseapi.h header is used by commontraining.h regardless of autotools usage	2019-01-04 20:20:00 +00:00
Nick White	cd34ee55ec	Add necessary intproto.h header to protos.cpp	2019-01-04 20:19:54 +00:00
Stefan Weil	62b635a74e	Remove unused functions from cluster.cpp Add also missing static attributes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-03 13:16:31 +01:00
Stefan Weil	f76d8a14cd	Remove unused code from oldlist Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-03 12:27:10 +01:00
Stefan Weil	7719f80155	Add missing std namespace in tensorflow code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-03 11:15:36 +01:00
Stefan Weil	8a6fa452dc	Fix build for architectures without CPUID Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-03 09:32:36 +01:00
Stefan Weil	91af010200	Fix compiler warning gcc warning: src/training/text2image.cpp:694:35: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings] putenv expects a string which can be modified. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-01 22:49:04 +01:00
Stefan Weil	5dd606c631	Replace NULL by nullptr Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-01 22:45:49 +01:00
Stefan Weil	d9600cd82e	Fix and simplify SIMD tests The tests for SSE and AVX must only be done if the correct compiler flags were used. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-01-01 11:19:17 +01:00
zdenop	d3065520fa	fix 2 clang warnings	2018-12-30 20:25:24 +01:00
Stefan Weil	cb049133cd	Fix compiler warning clang warning: tesseractmain.cpp(512,21): warning: '&&' within '\|\|' [-Wlogical-op-parentheses] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-29 22:17:33 +01:00
zdenop	420fb0ced0	Merge branch 'master' of https://github.com/tesseract-ocr/tesseract	2018-12-29 10:31:33 +01:00
zdenop	8885fe2ccb	provide info about compiled openmp version	2018-12-29 10:18:27 +01:00
Stefan Weil	993e56ffde	Don't try to create text output if other renderers failed (fix regression) Commit `49d7df6dc3` added error handling, but since that commit Tesseract used the text fallback if the user selected output failed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-27 10:23:28 +01:00
zdenop	cc997b53c7	add the missing implementation for TessBaseAPIGetAltoText method in C-API	2018-12-26 21:35:47 +01:00
Stefan Weil	db9c7e0312	Use std::stringstream to generate hOCR output Using std::stringstream simplifies the code and allows conversion of double to string independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-12-16 20:14:11 +01:00
zdenop	72d8df581b	Merge pull request #2121 from stweil/hocr Move code for hOCR renderer to new file	2018-12-16 16:26:27 +01:00
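
Note on commit ecf73f5bc7 (training: Don't terminate after processing 8 fonts or 8 images): the behaviour it describes can be reproduced with a small sketch. In bash, `let` returns a non-zero status when its expression evaluates to 0, so under `set -e` a script stops as soon as `counter` is a multiple of `par_factor`. The second command below shows one possible way to avoid that, using arithmetic expansion in a plain assignment. This is an assumption for illustration and not necessarily the change made in the commit. The variable names `old_out`, `old_status`, `new_out` and `new_status` are hypothetical.

```shell
#!/bin/bash
# Each case runs in its own bash process so that `set -e` can end that
# process without ending this demonstration.

# Case 1: `let` yields status 1 because 8 % 8 evaluates to 0,
# so `set -e` aborts before the echo is reached.
old_out=$(bash -c 'set -e; counter=8; par_factor=8; let rem=counter%par_factor; echo reached')
old_status=$?

# Case 2: a plain assignment with arithmetic expansion returns status 0,
# so the script continues even when the remainder is 0.
new_out=$(bash -c 'set -e; counter=8; par_factor=8; rem=$((counter % par_factor)); echo reached')
new_status=$?

echo "let:        status=$old_status output='$old_out'"
echo "assignment: status=$new_status output='$new_out'"
```

The first case exits with a non-zero status and prints nothing, while the second reaches the `echo`.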


996 Commits