tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-16 02:09:30 +08:00

Author	SHA1	Message	Date
Stefan Weil	a0fd90583b	Modernize C++ code using auto The modifications were done using this command: run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-auto' -fix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 07:55:08 +01:00
Stefan Weil	06acbaf99c	IntegerMatcher: Fix division by zero Credit to OSS-Fuzz which reported this issue: intmatcher.cpp:1231:62: runtime error: division by zero #0 0x6119d5 in IntegerMatcher::ApplyCNCorrection(float, int, int, int) tesseract/src/classify/intmatcher.cpp:1231:62 #1 0x5fe9c4 in tesseract::Classify::ComputeCorrectedRating(bool, int, double, double, int, int, int, int, int, unsigned char const) tesseract/src/classify/adaptmatch.cpp:1213:29 #2 0x5fdc22 in tesseract::Classify::ExpandShapesAndApplyCorrections(ADAPT_CLASS_STRUCT, bool, int, int, int, float, int, int, unsigned char const, tesseract::UnicharRating, ADAPT_RESULTS) tesseract/src/classify/adaptmatch.cpp:1184:13 #3 0x5fe421 in tesseract::Classify::MasterMatcher(INT_TEMPLATES_STRUCT, short, INT_FEATURE_STRUCT const, unsigned char const, ADAPT_CLASS_STRUCT, int, int, TBOX const&, GenericVector<CP_RESULT_STRUCT> const&, ADAPT_RESULTS) tesseract/src/classify/adaptmatch.cpp:1119:5 #4 0x6003eb in tesseract::Classify::CharNormTrainingSample(bool, int, tesseract::TrainingSample const&, GenericVector<tesseract::UnicharRating>*) tesseract/src/classify/adaptmatch.cpp:1374:5 See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13712. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 19:39:31 +01:00
Stefan Weil	44a6d9f4d4	intmatcher: Catch more out of bounds reads Credit to OSS-Fuzz which reported this issue: intmatcher.cpp:1121:17: runtime error: index 24 out of bounds for type 'uint8_t [24]' #0 0x61034b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT, unsigned int, short) tesseract/src/classify/intmatcher.cpp:1121:17 #1 0x60f560 in IntegerMatcher::Match(INT_CLASS_STRUCT, unsigned int, unsigned int, short, INT_FEATURE_STRUCT const, tesseract::UnicharRating, int, int, bool) tesseract/src/classify/intmatcher.cpp:514:11 #2 0x5f3a25 in tesseract::Classify::AdaptToChar(TBLOB, int, int, float, ADAPT_TEMPLATES_STRUCT) tesseract/src/classify/adaptmatch.cpp:894:9 #3 0x5f2ccd in tesseract::Classify::LearnPieces(char const, int, int, float, tesseract::CharSegmentationType, char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:430:5 #4 0x5f16ee in tesseract::Classify::LearnWord(char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:293:7 This catches the out of bounds data reads in release builds. Add also assertions for debug builds. See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13818. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 17:27:43 +01:00
Stefan Weil	5fd7228414	intmatcher: Catch out of bounds reads Credit to OSS-Fuzz which reported this issue: intmatcher.cpp:1163:17: runtime error: index 24 out of bounds for type 'uint8_t [24]' #0 0x610d3b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT, unsigned int) tesseract/src/classify/intmatcher.cpp:1163:17 #1 0x60ff4e in IntegerMatcher::Match(INT_CLASS_STRUCT, unsigned int, unsigned int, short, INT_FEATURE_STRUCT const, tesseract::UnicharRating, int, int, bool) tesseract/src/classify/intmatcher.cpp:563:11 #2 0x5f4355 in tesseract::Classify::AdaptToChar(TBLOB, int, int, float, ADAPT_TEMPLATES_STRUCT) tesseract/src/classify/adaptmatch.cpp:894:9 #3 0x5f35fd in tesseract::Classify::LearnPieces(char const, int, int, float, tesseract::CharSegmentationType, char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:430:5 #4 0x5f201e in tesseract::Classify::LearnWord(char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:293:7 This catches the out of bounds data reads, but does not fix the primary reason: ProtoLengths currently gets values which are larger than the allowed index. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 15:44:33 +01:00
Stefan Weil	509ee95023	IntegerMatcher: Fix data type of loop counters ClassTemplate->ProtoLengths[n] is of type uint8_t, so use that for the related loop counters, too. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 15:35:06 +01:00
Stefan Weil	afc099b9f4	intmatcher: Split data_table The old code was a hack to improve the performance. The new code is clearer and results in the same binary when compiling with gcc 8.3.0, so it looks like the old hack is no longer needed with modern compilers. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 08:15:40 +01:00
Stefan Weil	e1e56d9d66	Remove local function declarations from intmatcher.h This requires moving the local function HeapSort to the beginning. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-22 11:39:39 +01:00
Stefan Weil	2ba194ca8d	Remove four unused parameters This fixes some compiler warnings: src/classify/intmatcher.cpp:711:63: warning: unused parameter ‘ConfigMask’ [-Wunused-parameter] src/classify/intmatcher.cpp:1007:16: warning: unused parameter ‘ProtoMask’ [-Wunused-parameter] src/classify/intmatcher.cpp:1095:61: warning: unused parameter ‘NumFeatures’ [-Wunused-parameter] src/classify/intmatcher.cpp:1136:59: warning: unused parameter ‘used_features’ [-Wunused-parameter] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-22 11:30:24 +01:00
Stefan Weil	dd79d56e9f	Remove unused parameter BlobLength This fixes two compiler warnings: src/classify/intmatcher.cpp:553:14: warning: unused parameter ‘BlobLength’ [-Wunused-parameter] src/classify/intmatcher.cpp:622:14: warning: unused parameter ‘BlobLength’ [-Wunused-parameter] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-22 11:17:19 +01:00
Stefan Weil	4c0b98bd12	Replace undefined shift operations by multiplications Shift operations are undefined for negative numbers, but at least on Intel they return the same value as a multiplication with 2 ^ shift value. This fixes runtime errors reported by sanitizers and OSS-Fuzz: intmatcher.cpp:821:59: runtime error: left shift of negative value -14 intmatcher.cpp:823:75: runtime error: left shift of negative value -512 intmatcher.cpp:820:50: runtime error: left shift of negative value -80 See issue #2297 and https://oss-fuzz.com/testcase-detail/4845195990925312 for details. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 06:56:54 +01:00
Stefan Weil	5202208a8c	Remove globals.h It only included other files which are already included where needed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-11 19:01:23 +01:00
Stefan Weil	b3aff7d633	Fix Index-out-of-bounds in IntegerMatcher::UpdateTablesForFeature This fixes issue #2299, an issue which was already reported by static code analyzers and now by OSS-Fuzz, see details at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13597. The Tesseract code assigns an address which is out-of-bounds to a pointer variable, but increments that variable later. So this is a false positive. Change the code nevertheless to satisfy OSS-Fuzz. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-10 18:26:40 +01:00
Stefan Weil	18edac4086	Fix CID 1164623 (Uninitialized scalar field) Fix it by combining constructor and Init method. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-07-06 17:34:46 +02:00
Stefan Weil	d2febafdcd	Fix compiler warnings [-Wmissing-prototypes] Add missing include statements, add missing "static" qualifiers or remove functions which are not used at all. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-07-05 16:03:02 +02:00
Stefan Weil	bb7bb1f0b8	Remove old comments for exceptions Exceptions are no longer used. Remove also some history comments and fix several comments. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-07-03 14:53:00 +02:00
Stefan Weil	faae87beaa	Replace FLOAT32 by float data type On most systems float is the IEEE 754 single-precision binary floating-point format (32 bits). Tesseract does not support other systems. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-07-02 13:29:39 +02:00
Stefan Weil	509a6f0ce0	Fix some typos (most found by codespell) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-05-27 18:49:43 +02:00
Alexander Zaitsev	e7e8e20119	Remove deprecated in C++11 'register' keyword (removed since C++17).	2018-05-20 01:49:26 +03:00
Alexander Zaitsev	0248c7ff9d	Rename all C-style headers (e.g. <stdio.h>) to C++ style (<cstdio>).	2018-05-20 00:52:04 +03:00
Egor Pugin	e95ff1159e	Move sources into src dir. Update build scripts.	2018-04-25 11:02:54 +03:00

20 Commits
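
Commit 4c0b98bd12 replaces left shifts of negative values with multiplications. Before C++20, left-shifting a negative signed integer is undefined behavior, whereas multiplying by the matching power of two is well defined as long as the result does not overflow. The following is a minimal sketch of that idea, not the code from intmatcher.cpp: the helper name `ScaleByPowerOfTwo` and the shift amounts are assumptions for illustration, and only the negative operands (-14, -512, -80) are taken from the sanitizer messages quoted in the commit.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Tesseract): scales a signed value by
// 2^shift. The shift is applied to the positive constant 1, and the possibly
// negative operand is only ever multiplied, never shifted.
constexpr int32_t ScaleByPowerOfTwo(int32_t value, unsigned shift) {
  return value * (static_cast<int32_t>(1) << shift);
}

// Operands taken from the sanitizer reports; the shift amounts are made up.
static_assert(ScaleByPowerOfTwo(-14, 9) == -7168, "-14 * 512");
static_assert(ScaleByPowerOfTwo(-512, 1) == -1024, "-512 * 2");
static_assert(ScaleByPowerOfTwo(-80, 3) == -640, "-80 * 8");
static_assert(ScaleByPowerOfTwo(5, 0) == 5, "a shift of zero leaves the value unchanged");
```

The caller still has to keep the product within the range of `int32_t`, since signed overflow is undefined for multiplication as well.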