The modifications were done using this command:
run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-loop-convert' -fix
Then the resulting code was cleaned manually.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The modifications were done using this command:
run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-bool-literals' -fix
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The modifications were done using this command:
run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-auto' -fix
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The modifications were done using this command:
run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-override' -fix
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The modified definition avoids warnings caused by redundant semicolons.
Now a semicolon is required when using the macro, so a few code locations
had to be updated.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
- pass in ParamsVectors from Tesseract
(carrying values from langdata/config/api)
into LSTMRecognizer::Load and LoadDictionary
- after LSTMRecognizer's Dict is initialised
(with default values), reset the variables
user_{words,patterns}_{suffix,file} from the
corresponding entries in the passed vector
Warning from clang++:
..\src\ccmain\ltrresultiterator.cpp(454,8): warning: expression result unused [-Wunused-value]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
symbol_steps is a vector, so testing for a nullptr was wrong.
clang++ reports:
..\src\ccmain\ltrresultiterator.cpp(440,19): warning: comparison of address of 'this->word_res_->symbol_steps' equal to a null pointer is always false [-Wtautological-pointer-compare]
if (&word_res_->symbol_steps == nullptr || !LSTM_mode_) return nullptr;
~~~~~~~~~~~^~~~~~~~~~~~ ~~~~~~~
Signed-off-by: Stefan Weil <sw@weilnetz.de>
- ignore matrix outputs in ComputeTopN if they
belong to a disabled unichar_id
- pass UNICHARSET refs to check that
- in SetBlackAndWhitelist, also update the unicharset
of the lstm_recognizer_ instance, if any
gcc warnings:
src/ccmain/docqual.cpp:734:26: warning: this statement may fall through [-Wimplicit-fallthrough=]
src/ccmain/docqual.cpp:764:26: warning: this statement may fall through [-Wimplicit-fallthrough=]
src/ccmain/docqual.cpp:782:26: warning: this statement may fall through [-Wimplicit-fallthrough=]
[...]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Instrumented code throws this runtime error during OCR:
../../src/api/baseapi.cpp:1616:5: runtime error: load of value 128,
which is not a valid value for type 'bool'
../../src/api/baseapi.cpp:1627:5: runtime error: load of value 128,
which is not a valid value for type 'bool'
If there is no font information (typical for Tesseract with a LSTM model),
the font attributes got random values resulting in wrong hOCR output.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes some compiler warnings:
mainblk.cpp:28:9: warning: macro is not used [-Wunused-macros]
mainblk.cpp:29:9: warning: macro is not used [-Wunused-macros]
[...]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes a warning from LGTM:
No matching copy assignment operator in class LineHypothesis.
It is good practice to match a copy constructor
with a copy assignment operator.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Renamed the global attribute glyph_confidences to lstm_choice_mode and the method GetGlyphConfidences() to GetChoices(). All Variables and comments contained in related methods were renamed as well.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
isspace() must only used with an unsigned char or EOF argument,
and even then its result can depend on the current locale settings.
While this is not a problem for C/C++ executables which use the default
"C" locale, it becomes a problem when the Tesseract API is called from
languages like Python or Java which don't use the "C" locale.
By calling isasci() before calling isspace() this uncertainty can be
avoided, because any locale will hopefully give identical results for
the basic ASCII character set.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes a warning from LGTM:
This parameter of type ScrollView is 144 bytes
- consider passing a pointer/reference instead.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes compiler warnings and a warning from LGTM:
Poor global variable name 'pe'. Prefer longer, descriptive names [...]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes compiler warnings from clang:
src/ccmain/mutableiterator.h:44:7: warning:
'MutableIterator' has no out-of-line virtual method definitions;
its vtable will be emitted in every translation unit [-Wweak-vtables]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes compiler warnings from clang:
src/ccmain/ltrresultiterator.h:48:16: warning:
'LTRResultIterator' has no out-of-line virtual method definitions;
its vtable will be emitted in every translation unit [-Wweak-vtables]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Either it was not needed, or it could be replaced by checking
for not _WIN32.
This fixes a compiler warning from clang:
src/ccutil/platform.h:41:9: warning:
macro name is a reserved identifier [-Wreserved-id-macro]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Compiler warnings from clang:
src/ccstruct/genblob.cpp:34:20: warning:
use of old-style cast [-Wold-style-cast]
src/ccstruct/genblob.cpp:34:32: warning:
cast from 'const void *' to 'C_BLOB **' drops const qualifier [-Wcast-qual]
src/ccstruct/genblob.cpp:35:20: warning:
use of old-style cast [-Wold-style-cast]
src/ccstruct/genblob.cpp:35:32: warning:
cast from 'const void *' to 'C_BLOB **' drops const qualifier [-Wcast-qual]
The function c_blob_comparator is only used in fixspace.cpp,
so move it to that file, make it a local function, and remove
genblob.cpp and genblob.h which are no longer needed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Instead of adding an empty TBOX at the end of the box list,
that corner case is now handled by passing a nullptr (like
it was already done for the first box in the list).
This avoids the calls of BoxMissMetric with a TBOX
which raises an assertion there (b == 0).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Let's hope that word->best_choice is never NULL.
Overwise both the old and the new code would abort with SIGSEGV.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The parameter glyph_confidences is changed from bool to int.
An execution with value 1 outputs the hOCR file enriched with glyph confidences
for every timestep like before. An execution with value 2 outputs the timesteps
accumulated over the recognized characters.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
By using the parameter -c glyph_confidences=true the user is able to enrich
the hOCR output with additional information. Tesseract then lists additionally
the timesteps with all glyphs that were considered with their confidence
for every timestep of the LSTM.
The format of the hOCR output is slightly changed: There is now a linebreak
after every word for better readability by humans.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
Add missing include statements, add missing "static" qualifiers or
remove functions which are not used at all.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Instead of defining the DISABLED_LEGACY_ENGINE macro in config_auto.h
(which is not included by all source files), define it as a preprocessor
option for those parts of the code which require it.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The changes are based on an analysis done with include-what-you-use.
Replace also some standard header files by the corresponding
standard C++ header files.
Signed-off-by: Stefan Weil <sw@weilnetz.de>