Commit Graph

3765 Commits

Author SHA1 Message Date
zdenop
3bb8f9cd49 Merge branch 'master' of https://github.com/tesseract-ocr/tesseract 2019-03-31 16:54:15 +02:00
zdenop
5f06402755 python: optimize imports, reformat code 2019-03-31 16:53:39 +02:00
zdenop
2e9fd69c9e use 'import pathlib'; fix "TypeError: argument of type 'WindowsPath' is not iterable" 2019-03-31 16:53:33 +02:00
zdenop
a0527b41bd fix LGTM reports for python 2019-03-31 16:53:25 +02:00
Stefan Weil
1948f0d520 ocrclass: Modernize and format code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 16:39:44 +02:00
Stefan Weil
85957e9673 WERD: Don't print space character after "FALSE" at end of line
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 16:32:42 +02:00
Stefan Weil
83d4433d3b Modernize and format unichar.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 16:30:15 +02:00
Stefan Weil
ac0b191f6b Modernize and format genericvector.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 16:21:32 +02:00
Stefan Weil
36ed08636b Modernize and format tesscallback.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 16:16:00 +02:00
zdenop
f47c7c92dd fix uninitialized variables in wordstrboxrenderer and lstmboxrenderer;
CID 1399132, 1399134, 1399135, 1399137, 1399140, 1399141, 1399142
2019-03-31 12:26:49 +02:00
Egor Pugin
497d1c543c
Update appveyor.yml 2019-03-30 04:17:32 +03:00
Egor Pugin
5767fe4ae8
Update appveyor.yml 2019-03-29 19:06:00 +03:00
Shreeshrii
ea36e94e58 fix Could not parse bool from flag (#2359) 2019-03-29 14:50:21 +01:00
zdenop
521707ebfb
Merge pull request #2350 from stweil/fuzzing
Add test code for fuzzing
2019-03-29 07:48:50 +01:00
Stefan Weil
852598eecf Remove file tessedit.h
It only declared the unused global variable global_monitor
which is now removed, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-27 19:03:42 +01:00
Stefan Weil
6e59abcce2 Remove file cutil.h
It only contained three type definitions which fit better in other
include files.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-27 19:03:42 +01:00
Zdenko Podobný
3bbe4327c0 fix #2344 libpthread under-linking on FreeBSD 2019-03-27 15:37:14 +01:00
Stefan Weil
4ccbb9f830 configure: Check support of compile flags with -Werror
gcc fails if an unsupported compile flag is given, but clang and clang++
normally only emit a warning "argument unused during compilation".

The old test had accepted flags like -mavx for clang++ on non Intel hosts.
This resulted in build failures because Intel code was included.

Now the check runs with -Werror, and unsupported flags are detected as
an error. This fixes the build problem with clang++ on non Intel hosts.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 16:41:52 +01:00
Stefan Weil
b6bfb20f1d Improve readability of conditional code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 12:35:56 +01:00
Stefan Weil
36a1a30c22 Remove some old type casts
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 12:35:56 +01:00
Stefan Weil
59b90bd466 Update googletest
This fixes a regression which was introduced in
commit 37b0c36e32.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 12:33:52 +01:00
Stefan Weil
2718b81a3e fuzzer-api: Use environment variable TESSDATA_PREFIX if set
Clean also the code a little bit.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 11:09:22 +01:00
Stefan Weil
7e9970b4b1 Format fuzzer code with clang-format
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 11:09:22 +01:00
Stefan Weil
270e466d75 Add build rule for fuzzer-api
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 11:09:22 +01:00
Stefan Weil
7cd012f3dd Move fuzzer-api.cpp to subdirectory unittest/fuzzers
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 11:09:10 +01:00
Guido Vranken
b6b00083b0 Use relative path to set TESSDATA_PREFIX 2019-03-26 10:20:14 +01:00
Guido Vranken
e9b72d8ca3 Add API fuzzer 2019-03-26 10:20:14 +01:00
zdenop
aa78a720a3
Merge pull request #2348 from stweil/modernize
Modernize C++ code
2019-03-26 09:33:27 +01:00
Stefan Weil
a44bf41f14 Modernize C++ loops
The modifications were done using this command:

    run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-loop-convert' -fix

Then the resulting code was cleaned manually.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 08:38:21 +01:00
Stefan Weil
ed011670c8 Modernize C++ code using bool literals
The modifications were done using this command:

    run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-bool-literals' -fix

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 07:58:02 +01:00
Stefan Weil
a0fd90583b Modernize C++ code using auto
The modifications were done using this command:

    run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-auto' -fix

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 07:55:08 +01:00
Stefan Weil
36f768853a Modernize C++ code using override
The modifications were done using this command:

    run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-override' -fix

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 07:37:52 +01:00
Stefan Weil
f877640bc9
Merge pull request #2319 from bertsky/tesstrain-parallel-wait-retval
tesstrain: check failure of subjobs
2019-03-25 16:10:09 +01:00
Stefan Weil
d8d2f6f48a Fix broken shell scripts for training
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-25 15:32:43 +01:00
Stefan Weil
aaf8c50a12 unittest: Use range-for-loops
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-25 09:36:32 +01:00
Stefan Weil
631882a346 Fix compiler warnings (signed / unsigned mismatch)
clang warnings:

    src/ccutil/unicharcompress.cpp:172:27: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    src/lstm/recodebeam.cpp:129:29: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare]
    src/lstm/recodebeam.cpp:276:48: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare]
    unittest/imagedata_test.cc:101:21: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    unittest/linlsq_test.cc:33:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    unittest/linlsq_test.cc:44:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    unittest/nthitem_test.cc:27:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]
    unittest/nthitem_test.cc:68:21: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]
    unittest/stats_test.cc:26:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-25 08:36:07 +01:00
Stefan Weil
ecaad2aca8 ccstruct/werd: Format code with clang-format
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-25 07:57:34 +01:00
Stefan Weil
b1e305f38c Simplify code which tests for non-empty StringParam
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 21:35:52 +01:00
Stefan Weil
f9860cda41 Optimize functions ResetFrom
The loop can terminate as soon as the parameter name was found.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 21:21:23 +01:00
Stefan Weil
41da5afe9d UNICHARSET: Fix compiler warning (signed/unsigned mismatch)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 21:18:21 +01:00
Stefan Weil
91e2b253c0 Format modified code with clang-format
Format the files which were changed in
commit 297d7d86ce.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 21:10:29 +01:00
Stefan Weil
06acbaf99c IntegerMatcher: Fix division by zero
Credit to OSS-Fuzz which reported this issue:

    intmatcher.cpp:1231:62: runtime error: division by zero
	    #0 0x6119d5 in IntegerMatcher::ApplyCNCorrection(float, int, int, int) tesseract/src/classify/intmatcher.cpp:1231:62
	    #1 0x5fe9c4 in tesseract::Classify::ComputeCorrectedRating(bool, int, double, double, int, int, int, int, int, unsigned char const*) tesseract/src/classify/adaptmatch.cpp:1213:29
	    #2 0x5fdc22 in tesseract::Classify::ExpandShapesAndApplyCorrections(ADAPT_CLASS_STRUCT**, bool, int, int, int, float, int, int, unsigned char const*, tesseract::UnicharRating*, ADAPT_RESULTS*) tesseract/src/classify/adaptmatch.cpp:1184:13
	    #3 0x5fe421 in tesseract::Classify::MasterMatcher(INT_TEMPLATES_STRUCT*, short, INT_FEATURE_STRUCT const*, unsigned char const*, ADAPT_CLASS_STRUCT**, int, int, TBOX const&, GenericVector<CP_RESULT_STRUCT> const&, ADAPT_RESULTS*) tesseract/src/classify/adaptmatch.cpp:1119:5
	    #4 0x6003eb in tesseract::Classify::CharNormTrainingSample(bool, int, tesseract::TrainingSample const&, GenericVector<tesseract::UnicharRating>*) tesseract/src/classify/adaptmatch.cpp:1374:5

See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13712.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 19:39:31 +01:00
Stefan Weil
58423d2f6c
Merge pull request #2328 from bertsky/lstm-with-user-patterns2
Add user words / patterns again
2019-03-24 19:38:40 +01:00
zdenop
0d36d9a9d7
Merge pull request #2341 from Shreeshrii/fix
Fix
2019-03-24 18:21:09 +01:00
zdenop
b233fd3d4c
Merge pull request #2345 from stweil/assert
Fix compiler warnings caused by ASSERT_HOST
2019-03-24 18:20:35 +01:00
Stefan Weil
da6305b632 Fix compiler warnings caused by ASSERT_HOST
The modified definition avoids warnings caused by redundant semicolons.
Now a semicolon is required when using the macro, so a few code locations
had to be updated.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 17:47:04 +01:00
Stefan Weil
44a6d9f4d4 intmatcher: Catch more out of bounds reads
Credit to OSS-Fuzz which reported this issue:

intmatcher.cpp:1121:17: runtime error: index 24 out of bounds for type 'uint8_t [24]'
	    #0 0x61034b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT*, unsigned int*, short) tesseract/src/classify/intmatcher.cpp:1121:17
	    #1 0x60f560 in IntegerMatcher::Match(INT_CLASS_STRUCT*, unsigned int*, unsigned int*, short, INT_FEATURE_STRUCT const*, tesseract::UnicharRating*, int, int, bool) tesseract/src/classify/intmatcher.cpp:514:11
	    #2 0x5f3a25 in tesseract::Classify::AdaptToChar(TBLOB*, int, int, float, ADAPT_TEMPLATES_STRUCT*) tesseract/src/classify/adaptmatch.cpp:894:9
	    #3 0x5f2ccd in tesseract::Classify::LearnPieces(char const*, int, int, float, tesseract::CharSegmentationType, char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:430:5
	    #4 0x5f16ee in tesseract::Classify::LearnWord(char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:293:7

This catches the out of bounds data reads in release builds.
Add also assertions for debug builds.

See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13818.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 17:27:43 +01:00
Stefan Weil
5fd7228414 intmatcher: Catch out of bounds reads
Credit to OSS-Fuzz which reported this issue:

    intmatcher.cpp:1163:17: runtime error: index 24 out of bounds for type 'uint8_t [24]'
	    #0 0x610d3b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT*, unsigned int*) tesseract/src/classify/intmatcher.cpp:1163:17
	    #1 0x60ff4e in IntegerMatcher::Match(INT_CLASS_STRUCT*, unsigned int*, unsigned int*, short, INT_FEATURE_STRUCT const*, tesseract::UnicharRating*, int, int, bool) tesseract/src/classify/intmatcher.cpp:563:11
	    #2 0x5f4355 in tesseract::Classify::AdaptToChar(TBLOB*, int, int, float, ADAPT_TEMPLATES_STRUCT*) tesseract/src/classify/adaptmatch.cpp:894:9
	    #3 0x5f35fd in tesseract::Classify::LearnPieces(char const*, int, int, float, tesseract::CharSegmentationType, char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:430:5
	    #4 0x5f201e in tesseract::Classify::LearnWord(char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:293:7

This catches the out of bounds data reads, but does not fix the primary
reason: ProtoLengths currently gets values which are larger than the
allowed index.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 15:44:33 +01:00
Stefan Weil
509ee95023 IntegerMatcher: Fix data type of loop counters
ClassTemplate->ProtoLengths[n] is of type uint8_t, so use that for
the related loop counters, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 15:35:06 +01:00
Stefan Weil
f4f34a87db WERD_RES: Fix uninitialized member variable
Credit to OSS-Fuzz which reported this issue:

    pageres.cpp:1143:7: runtime error: load of value 249, which is not a valid value for type 'bool'
	    #0 0x6ba560 in WERD_RES::Clear() tesseract/src/ccstruct/pageres.cpp:1143:7
	    #1 0x6b9fd1 in WERD_RES::operator=(WERD_RES const&) tesseract/src/ccstruct/pageres.cpp:193:3
	    #2 0x49a9ad in WERD_RES::WERD_RES(WERD_RES const&) tesseract/src/ccstruct/pageres.h:356:11

See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13707.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 14:59:08 +01:00