This fixes a warning from LGTM:
Poor global variable name 'rgb'. Prefer longer, descriptive
names for globals (eg. kMyGlobalConstant, not foo).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes a warning from LGTM:
This parameter of type ScrollView is 144 bytes
- consider passing a pointer/reference instead.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
* 'master' of https://github.com/tesseract-ocr/tesseract: (27 commits)
Rework check for readable input file
fix "mktemp -d --tmpdir" on Mac OS; see #1453
pgedit: Change some variables from global to local ones
improve description of min_characters_to_try variable
WERD_RES: Remove comparisons which are constant
GENERIC_2D_ARRAY: Pass parameters by reference
genericvector: Pass parameters by reference
chop: Use more efficient float calculations for sqrt
rect: Use more efficient float calculations for ceil, floor
intproto: Use more efficient float calculations for floor
genericvector: Rewrite code to satisfy static code analyzer
Fix constructor for class Dict (uninitialized member variables)
Fix use of wrong UNICHARSET
lstmtraining: Remove dead code for purified model name
combine_tessdata: Handle failures when extracting
lstmtraining: Check write permission for output model
implement parameter min_characters_to_try for minimum characters to try to skip page entirely. fixes #1729
Merge and enhance documentation on language and script models
Document some more config options for tesseract
Add Makefile rule to build HTML manpages
...
This fixes compiler warnings and a warning from LGTM:
Poor global variable name 'pe'. Prefer longer, descriptive names [...]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes warnings from LGTM:
Comparison is always false because id >= 0.
Comparison is always true because mirrored >= 1.
Comparison is always false because id >= 0.
INVALID_UNICHAR_ID is -1, so the warnings are correct.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes warnings from LGTM:
This parameter of type FontClassInfo is 192 bytes
- consider passing a pointer/reference instead.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes warnings like the following one from LGTM:
This parameter of type ParamsTrainingHypothesis is 112 bytes
- consider passing a pointer/reference instead.
Most of these parameters can also be declared const.
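
The kind of change described above can be sketched as follows. This is a minimal illustration, not the actual Tesseract code: `Hypothesis`, `CostByValue` and `CostByRef` are hypothetical names, and the struct is only a stand-in for a parameter type of roughly a hundred bytes.

```cpp
// Hypothetical stand-in for a large parameter type (not a Tesseract struct).
struct Hypothesis {
  float features[24];  // bulk of the payload
  double cost;
  int length;
};

// Before: every call copies the whole struct onto the stack.
double CostByValue(Hypothesis hypo) { return hypo.cost; }

// After: the caller's object is passed by reference, so nothing is copied,
// and const documents that the callee does not modify it.
double CostByRef(const Hypothesis& hypo) { return hypo.cost; }
```

Both functions return the same result; only the calling convention and the const guarantee differ.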
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes warnings from LGTM:
Multiplication result may overflow 'float' before it is converted
to 'double'.
While the sqrt function always calculates with double, here the
overloaded std::sqrt can be used to handle the float arguments
more efficiently.
Also replace an old-style type cast with a static_cast.
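
A minimal sketch of the pattern, assuming a hypothetical helper pair `Length` / `LengthAsInt` (these names are illustrative and do not come from the Tesseract sources):

```cpp
#include <cmath>
#include <type_traits>

// std::sqrt is overloaded: called with a float it returns a float, so the
// float product below is never widened to double.
static_assert(std::is_same<decltype(std::sqrt(1.0f)), float>::value,
              "std::sqrt(float) returns float");

// Hypothetical helper: Euclidean length computed entirely in float.
inline float Length(float dx, float dy) {
  return std::sqrt(dx * dx + dy * dy);
}

// static_cast makes the narrowing conversion to int explicit and easy to find.
inline int LengthAsInt(float dx, float dy) {
  return static_cast<int>(std::sqrt(dx * dx + dy * dy));
}
```

With the C function `sqrt(double)`, the same float expression would be computed in float and only then converted to double, which is what the analyzer warns about.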
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes warnings from LGTM:
Multiplication result may overflow 'float' before it is converted
to 'double'.
While the floor function always calculates with double, here the
overloaded std::floor can be used to handle the float arguments
more efficiently.
Also replace old-style type casts with static_cast.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes warnings from LGTM:
Multiplication result may overflow 'float' before it is converted
to 'double'.
While the floor function always calculates with double, here the
overloaded std::floor can be used to handle the float arguments
more efficiently.
Also replace old-style type casts with static_cast.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Warning from LGTM:
Resource data_ is acquired by class GenericVector<FontSpacingInfo *>
but not released in the destructor.
LGTM complains about data_ not being deleted in the destructor.
The destructor calls the clear() method, but the delete there
was conditional, which confuses the static code analyzer.
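
One common way to make such a release unconditional is sketched below. `SimpleVector` is a hypothetical, much reduced container and not GenericVector itself; the exact form of the original condition is not stated here, so this only illustrates the general idea.

```cpp
#include <cstddef>

// Hypothetical minimal container that owns a heap array.
template <typename T>
class SimpleVector {
 public:
  SimpleVector() : data_(nullptr), size_(0) {}
  explicit SimpleVector(std::size_t n) : data_(new T[n]()), size_(n) {}
  ~SimpleVector() { clear(); }

  // Copying is disabled to keep ownership of data_ unambiguous.
  SimpleVector(const SimpleVector&) = delete;
  SimpleVector& operator=(const SimpleVector&) = delete;

  void clear() {
    // delete[] on a null pointer is a no-op, so no guarding condition is
    // needed and the analyzer can see that data_ is always released.
    delete[] data_;
    data_ = nullptr;
    size_ = 0;
  }

  std::size_t size() const { return size_; }

 private:
  T* data_;
  std::size_t size_;
};
```

Calling clear() twice is safe because data_ is reset to nullptr after the first call.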
Signed-off-by: Stefan Weil <sw@weilnetz.de>
wildcard_unichar_id_, apostrophe_unichar_id_, question_unichar_id_ and
slash_unichar_id_ were not initialized in the constructor.
slash_unichar_id_ was used later in a conditional.
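
A minimal sketch of the fix, assuming the members are initialized to INVALID_UNICHAR_ID (-1). `DictSketch` and `has_slash()` are hypothetical; only the four member names are taken from the message above, and the chosen initial value is an assumption.

```cpp
typedef int UNICHAR_ID;
const UNICHAR_ID INVALID_UNICHAR_ID = -1;

// Hypothetical reduced class with only the four affected members.
class DictSketch {
 public:
  // All four ids get a defined value in the member initializer list
  // (assumed here to be INVALID_UNICHAR_ID), in declaration order.
  DictSketch()
      : wildcard_unichar_id_(INVALID_UNICHAR_ID),
        apostrophe_unichar_id_(INVALID_UNICHAR_ID),
        question_unichar_id_(INVALID_UNICHAR_ID),
        slash_unichar_id_(INVALID_UNICHAR_ID) {}

  // A later conditional on slash_unichar_id_ now reads a defined value.
  bool has_slash() const { return slash_unichar_id_ != INVALID_UNICHAR_ID; }

 private:
  UNICHAR_ID wildcard_unichar_id_;
  UNICHAR_ID apostrophe_unichar_id_;
  UNICHAR_ID question_unichar_id_;
  UNICHAR_ID slash_unichar_id_;
};
```

Without the initializer list, reading slash_unichar_id_ before it is assigned would be undefined behavior.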
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Report an error and terminate if that fails.
Also use EXIT_SUCCESS and EXIT_FAILURE for the return values of main()
and add the missing return at the end of main().
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This is done by creating a temporary file.
Report an error and terminate if that fails.
Also use EXIT_SUCCESS and EXIT_FAILURE for the return values of main().
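
The approach can be sketched roughly as follows. `CanWrite`, `Run`, the `.tmp_probe` suffix and the error text are all hypothetical and chosen for illustration; they are not the names or messages used in the training tools.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: checks writability by creating a probe file next to
// the intended output and removing it again.
bool CanWrite(const std::string& output_base) {
  const std::string probe = output_base + ".tmp_probe";  // assumed suffix
  std::FILE* f = std::fopen(probe.c_str(), "wb");
  if (f == nullptr) return false;
  std::fclose(f);
  std::remove(probe.c_str());
  return true;
}

// Hypothetical entry point whose result main() would return directly.
int Run(const std::string& output_base) {
  if (!CanWrite(output_base)) {
    std::fprintf(stderr, "Error, cannot write to %s\n", output_base.c_str());
    return EXIT_FAILURE;
  }
  // ... the real work would happen here ...
  return EXIT_SUCCESS;
}
```

Failing early avoids discovering an unwritable output location only after a long run has finished.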
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Also clarify the name(s) of the generated OCR result file(s):
Tesseract does not create a file named outbase.txt by default.
Also fix a sentence in the language section.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
- move Tesseract 4 release note to other release notes
- format command line options in text
- add link to release notes (wiki)
- add link to contributors (GitHub)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Setting the page segmentation mode to 6 ("Assume a single uniform block
of text") typically improves the layout detection for such texts, but
should not be done in the config file.
unlvtests/runtestset.sh adds `--psm 6` explicitly, so test results
won't change when using that script.
This is similar to commit ecfee53bac.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
While orientation and script detection (OSD) normally requires
osd.traineddata to detect both, it must also be possible to do
only orientation detection with eng.traineddata or any other
traineddata.
Enforce osd.traineddata only if there was no `-l` command line option.
Commit 27ce472666 was too restrictive.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Setting the page segmentation mode in those config files gives unexpected
results: the text recognized with no config or with only txt differs from
the text recognized when txt is combined with any of hocr, pdf or tsv.
In a test set of nearly 200 pages from historical books, using
segmentation mode 1 is typically slightly better than the default,
but there are also cases where it is much worse. Therefore the user
should be able to decide which page segmentation mode is best.
Old results for hocr, pdf or tsv now need an explicit `--psm 1` for
reproduction.
Signed-off-by: Stefan Weil <sw@weilnetz.de>