This fixes compiler warnings from clang:
src/viewer/scrollview.h:86:7: warning:
'SVEventHandler' has no out-of-line virtual method definitions;
its vtable will be emitted in every translation unit [-Wweak-vtables]
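For illustration, a minimal sketch of the usual remedy for -Wweak-vtables:
give the class one virtual method that is defined out of line. The class
names below are hypothetical stand-ins, not the real SVEventHandler, and in
a real project the destructor definition would live in the matching .cpp file.

```cpp
#include <cassert>

// Hypothetical stand-in for a class such as SVEventHandler.
class EventHandler {
 public:
  // Declared here and defined out of line below. This "key function"
  // lets the compiler emit the vtable in a single translation unit.
  virtual ~EventHandler();
  virtual int Notify(int event) { return event; }
};

// Out-of-line definition (would normally be placed in the .cpp file).
EventHandler::~EventHandler() = default;

// Hypothetical derived handler, only used to exercise virtual dispatch.
class CountingHandler : public EventHandler {
 public:
  int Notify(int event) override { return event + 1; }
};

int NotifyThrough(EventHandler& handler, int event) {
  assert(event >= 0);  // illustrative precondition only
  return handler.Notify(event);
}
```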
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes compiler warnings from clang:
src/ccmain/mutableiterator.h:44:7: warning:
'MutableIterator' has no out-of-line virtual method definitions;
its vtable will be emitted in every translation unit [-Wweak-vtables]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes compiler warnings from clang:
src/ccmain/ltrresultiterator.h:48:16: warning:
'LTRResultIterator' has no out-of-line virtual method definitions;
its vtable will be emitted in every translation unit [-Wweak-vtables]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Either it was not needed, or it could be replaced by checking
for not _WIN32.
This fixes a compiler warning from clang:
src/ccutil/platform.h:41:9: warning:
macro name is a reserved identifier [-Wreserved-id-macro]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Compiler warning from clang:
src/api/pdfrenderer.cpp:848:28: warning:
cast from 'const char *' to 'char *' drops const qualifier [-Wcast-qual]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
size_t would require a different format string. Here an unsigned int
is sufficient in both cases, so use that.
This error was found by lgtm, see
https://lgtm.com/projects/g/tesseract-ocr/tesseract/.
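A minimal sketch of the format string issue, assuming a hypothetical helper
FormatCount (not part of Tesseract): %u matches an unsigned int argument,
while a size_t argument would need %zu.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a count that is known to fit in unsigned int.
std::string FormatCount(unsigned int count) {
  char buffer[32];
  // %u matches unsigned int; a size_t argument would need %zu instead.
  int written = std::snprintf(buffer, sizeof(buffer), "%u", count);
  assert(written > 0 && written < static_cast<int>(sizeof(buffer)));
  (void)written;  // avoid an unused variable warning when NDEBUG is set
  return std::string(buffer);
}
```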
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Compiler warnings from clang:
src/textord/makerow.cpp:2579:36: warning:
cast from 'const void *' to 'BLOBNBOX **' drops const qualifier [-Wcast-qual]
src/textord/makerow.cpp:2581:36: warning:
cast from 'const void *' to 'BLOBNBOX **' drops const qualifier [-Wcast-qual]
src/textord/makerow.cpp:2601:31: warning:
cast from 'const void *' to 'TO_ROW **' drops const qualifier [-Wcast-qual]
src/textord/makerow.cpp:2603:31: warning:
cast from 'const void *' to 'TO_ROW **' drops const qualifier [-Wcast-qual]
src/textord/makerow.cpp:2623:31: warning:
cast from 'const void *' to 'TO_ROW **' drops const qualifier [-Wcast-qual]
src/textord/makerow.cpp:2625:31: warning:
cast from 'const void *' to 'TO_ROW **' drops const qualifier [-Wcast-qual]
Warning from lgtm:
Local variable 'blob' hides a parameter of the same name.
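A minimal sketch of a qsort comparator which keeps the const qualifier.
Blob, BlobXOrder and SortBlobs are hypothetical names used for illustration;
the real comparators in makerow.cpp work on BLOBNBOX and TO_ROW.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct Blob {  // hypothetical stand-in for BLOBNBOX
  int x;
};

// qsort passes pointers to the array elements as const void*.
// The elements are Blob*, so each argument is really a Blob* const*.
static int BlobXOrder(const void* item1, const void* item2) {
  const Blob* blob1 = *static_cast<const Blob* const*>(item1);
  const Blob* blob2 = *static_cast<const Blob* const*>(item2);
  return (blob1->x > blob2->x) - (blob1->x < blob2->x);
}

// Sorts an array of Blob pointers by ascending x.
void SortBlobs(Blob** blobs, std::size_t count) {
  assert(blobs != nullptr || count == 0);
  if (count > 1) {
    std::qsort(blobs, count, sizeof(*blobs), BlobXOrder);
  }
}
```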
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Compiler warnings from clang:
src/ccstruct/werd.cpp:128:4: warning:
use of old-style cast [-Wold-style-cast]
src/ccstruct/werd.cpp:394:18: warning:
use of old-style cast [-Wold-style-cast]
src/ccstruct/werd.cpp:394:27: warning:
cast from 'const void *' to 'WERD **' drops const qualifier [-Wcast-qual]
src/ccstruct/werd.cpp:395:18: warning:
use of old-style cast [-Wold-style-cast]
src/ccstruct/werd.cpp:395:27: warning:
cast from 'const void *' to 'WERD **' drops const qualifier [-Wcast-qual]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Compiler warnings from clang:
src/ccstruct/polyblk.cpp:194:16: warning:
use of old-style cast [-Wold-style-cast]
src/ccstruct/polyblk.cpp:195:16: warning:
use of old-style cast [-Wold-style-cast]
src/ccstruct/polyblk.cpp:292:45: warning:
use of old-style cast [-Wold-style-cast]
src/ccstruct/polyblk.cpp:30:9: warning:
macro is not used [-Wunused-macros]
src/ccstruct/polyblk.cpp:348:8: warning:
use of old-style cast [-Wold-style-cast]
src/ccstruct/polyblk.cpp:358:12: warning:
use of old-style cast [-Wold-style-cast]
src/ccstruct/polyblk.cpp:362:26: warning:
use of old-style cast [-Wold-style-cast]
src/ccstruct/polyblk.cpp:383:21: warning:
use of old-style cast [-Wold-style-cast]
src/ccstruct/polyblk.cpp:383:36: warning:
cast from 'const void *' to 'ICOORDELT **' drops const qualifier [-Wcast-qual]
src/ccstruct/polyblk.cpp:384:21: warning:
use of old-style cast [-Wold-style-cast]
src/ccstruct/polyblk.cpp:384:36: warning:
cast from 'const void *' to 'ICOORDELT **' drops const qualifier [-Wcast-qual]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Compiler warnings from clang:
src/ccstruct/ocrblock.cpp:74:12: warning:
use of old-style cast [-Wold-style-cast]
src/ccstruct/ocrblock.cpp:74:21: warning:
cast from 'const void *' to 'ROW **' drops const qualifier [-Wcast-qual]
src/ccstruct/ocrblock.cpp:75:16: warning:
cast from 'const void *' to 'ROW **' drops const qualifier [-Wcast-qual]
src/ccstruct/ocrblock.cpp:75:7: warning:
use of old-style cast [-Wold-style-cast]
Also make the function decreasing_top_order a local function, as it is
only used locally, and remove its global declarations (2 locations).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Compiler warnings from clang:
src/ccstruct/mod128.cpp:57:15: warning:
no previous extern declaration for non-static variable 'dirtab' [-Wmissing-variable-declarations]
src/ccstruct/mod128.cpp:57:24: warning:
use of old-style cast [-Wold-style-cast]
src/ccstruct/mod128.cpp:57:35: warning:
cast from 'const short *' to 'ICOORD *' drops const qualifier [-Wcast-qual]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Compiler warnings from clang:
src/ccstruct/genblob.cpp:34:20: warning:
use of old-style cast [-Wold-style-cast]
src/ccstruct/genblob.cpp:34:32: warning:
cast from 'const void *' to 'C_BLOB **' drops const qualifier [-Wcast-qual]
src/ccstruct/genblob.cpp:35:20: warning:
use of old-style cast [-Wold-style-cast]
src/ccstruct/genblob.cpp:35:32: warning:
cast from 'const void *' to 'C_BLOB **' drops const qualifier [-Wcast-qual]
The function c_blob_comparator is only used in fixspace.cpp,
so move it to that file, make it a local function, and remove
genblob.cpp and genblob.h which are no longer needed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
It is only used in textord/topitch.cpp, so move it into that file.
Also remove the inline attribute, as it has no effect here, and
update the type casts to fix some compiler warnings from clang.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
- add linefeed after last line
- remove blanks at line endings
This fixes some warnings from clang:
src/training/validate_javanese.h:63:51: warning:
no newline at end of file [-Wnewline-eof]
src/training/validate_javanese.cpp:269:26: warning:
no newline at end of file [-Wnewline-eof]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Instead of adding an empty TBOX at the end of the box list,
that corner case is now handled by passing a nullptr (like
it was already done for the first box in the list).
This avoids the calls of BoxMissMetric with a TBOX
which raises an assertion there (b == 0).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
It looks like the check cblob_ptr != nullptr is not needed.
If cblob_ptr were NULL, we would have seen crashes in compute_bounding_box.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Let's hope that word->best_choice is never NULL.
Otherwise both the old and the new code would abort with SIGSEGV.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The parameter glyph_confidences is changed from bool to int.
An execution with value 1 outputs the hOCR file enriched with glyph confidences
for every timestep like before. An execution with value 2 outputs the timesteps
accumulated over the recognized characters.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
Page segmentation mode "OSD only" requires osd.traineddata,
so use it automatically.
Report a warning if the user specified a different language.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
By default, that script creates two new temporary directories with random
names in /tmp.
The new command line flag --workspace_dir PATH uses the given path as
a base directory for all temporary files.
That makes training results more reproducible (no random directory
names in log files).
Signed-off-by: Stefan Weil <stweil@ub-backup.bib.uni-mannheim.de>
By using the parameter -c glyph_confidences=true the user is able to enrich
the hOCR output with additional information. Tesseract then additionally lists
the timesteps with all glyphs that were considered with their confidence
for every timestep of the LSTM.
The format of the hOCR output is slightly changed: There is now a linebreak
after every word for better readability by humans.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
One of the checks was too restrictive, as lstmeval deserializes
char arrays with 14000000 elements, so raise the limit to 30000000.
That check was added in commit 992031e824.
Also add assertions which help to find such problems in debug mode.
Signed-off-by: Stefan Weil <stweil@ub-backup.bib.uni-mannheim.de>
It is needed for running the training tutorial on Linux.
The correct mode was lost when moving the files in
commit 104fe7931c.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The Serialize method is used indirectly by MasterTrainer::Serialize,
but there is no corresponding MasterTrainer::DeSerialize.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
OpenclDevice::getDeviceSelection crashed when outdated information
was read from file and device.score was not set.
Also change the struct definitions from C to C++ and
eliminate some type casts.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Commit 4d514d5a60 introduced tprintf_internal
with an additional argument "level" which was removed again in commit
7dc5296fe9.
So we can now restore the original state without tprintf_internal.
Also remove the declaration of debug_window_on (it has not existed since
commit 030aae9896) and make the
configuration parameter debug_file local as it is only used by tprintf.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
`int depth = strtol(*str + 1, str, 10);`
`**str` holds the words in the VGSL specification, and `*str` holds a single word, let's say `Fr64`. `strtol` skips leading white space (but no other characters), expects its first argument to point to a number in valid integer format, and sets `*str` to point to the first character after that number. Here, however, `*str + 1` points to the `r` in `Fr64`. This is a bad argument, which results in `strtol` returning 0.
`strtol(*str + 2, str, 10)` should be passed instead.
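The difference can be reproduced with a small sketch. ParseDepthAt is a
hypothetical helper written for this illustration, not the Tesseract function.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: parses the decimal number which starts `offset`
// characters into `word`. strtol returns 0 if no digits are found there.
long ParseDepthAt(const char* word, int offset) {
  assert(word != nullptr && offset >= 0);
  char* end = nullptr;
  return std::strtol(word + offset, &end, 10);
}
```

With the word `Fr64`, an offset of 1 starts at `r` and yields 0, while an
offset of 2 starts at the first digit and yields 64.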
Limit the matrix to UINT16_MAX x UINT16_MAX.
Larger dimensions could also result in an arithmetic overflow
when multiplying the two dimensions.
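A minimal sketch of such a limit check, assuming a hypothetical helper
CheckedElementCount (the real code may be structured differently). With both
dimensions limited to UINT16_MAX, the product always fits into 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true and stores dim1 * dim2 in *count
// only if both dimensions are within the uint16_t range.
bool CheckedElementCount(uint32_t dim1, uint32_t dim2, uint32_t* count) {
  assert(count != nullptr);
  if (dim1 > UINT16_MAX || dim2 > UINT16_MAX) {
    return false;
  }
  *count = dim1 * dim2;  // at most 65535 * 65535, which is below 2^32
  return true;
}
```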
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Wrong file data could give a large value for the number of vector elements
resulting in very large memory allocations.
Limit the allowed data range to UINT16_MAX (65535) elements
which hopefully should be sufficient for all use cases.
Changing the data type of the related member variables from int to
uint32_t allowed removing several type casts.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Add missing include statements, add missing "static" qualifiers or
remove functions which are not used at all.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
* Add break in default case to avoid potential problems with
future case statements following the default case.
* Remove empty statement.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
clang warnings:
src/ccstruct/coutln.cpp:231:15: warning:
variable 'destindex' may be uninitialized when used here [-Wconditional-uninitialized]
src/wordrec/language_model.cpp:1170:27: warning:
variable 'expected_gap' may be uninitialized when used here [-Wconditional-uninitialized]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
clang warnings:
src/api/baseapi.cpp:1642:18: warning:
possible misuse of comma operator here [-Wcomma]
src/api/baseapi.cpp:1642:31: warning:
possible misuse of comma operator here [-Wcomma]
src/api/baseapi.cpp:1642:45: warning:
possible misuse of comma operator here [-Wcomma]
src/api/baseapi.cpp:1652:16: warning:
possible misuse of comma operator here [-Wcomma]
src/api/baseapi.cpp:1652:30: warning:
possible misuse of comma operator here [-Wcomma]
src/api/baseapi.cpp:1662:17: warning:
possible misuse of comma operator here [-Wcomma]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
clang warning:
src/ccstruct/polyblk.cpp:48:36: warning:
constructor parameter 'box' shadows the field 'box' of 'POLY_BLOCK'
[-Wshadow-field-in-constructor]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
clang warning:
src/lstm/networkio.cpp:56:15: warning:
'this' pointer cannot be null in well-defined C++ code;
comparison may be assumed to always evaluate to true [-Wtautological-undefined-compare]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
clang warning:
src/lstm/lstmrecognizer.cpp:411:13: warning:
unused function 'NullIsBest' [-Wunused-function]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
clang warning:
src/lstm/network.cpp:249:7: warning:
'break' will never be executed [-Wunreachable-code-break]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The functions TessBaseAPIInitLangMod, TessBaseAPIClearAdaptiveClassifier
and TessBaseAPIDetectOrientationScript need conditional compilation.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Instead of defining the DISABLED_LEGACY_ENGINE macro in config_auto.h
(which is not included by all source files), define it as a preprocessor
option for those parts of the code which require it.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
On most systems float is the IEEE 754 single-precision binary
floating-point format (32 bits). Tesseract does not support other systems.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
On most systems double is the IEEE 754 double-precision binary
floating-point format (64 bits). Tesseract does not support other systems.
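One possible way to state such an assumption in code is a compile-time
check. This is only a sketch; the constant name is hypothetical and the
actual check in Tesseract may look different.

```cpp
#include <cassert>
#include <limits>

// Hypothetical constant: true if float and double have the expected
// IEEE 754 formats (32 and 64 bits).
constexpr bool kFloatFormatsSupported =
    sizeof(float) == 4 && sizeof(double) == 8 &&
    std::numeric_limits<float>::is_iec559 &&
    std::numeric_limits<double>::is_iec559;

// Fails the build on systems with other floating-point formats.
static_assert(kFloatFormatsSupported,
              "float and double must be IEEE 754 binary32 and binary64");
```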
Signed-off-by: Stefan Weil <sw@weilnetz.de>
It did not cause a problem as both arguments were 0.
Also update the function prototype of HistogramRectOCL to
accept a void pointer which allows removing a type cast.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The division was made with integers, giving a wrong result.
* Avoid division and use pure integer operations.
* Add missing "static" attribute.
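The kind of bug can be sketched as follows. The expression is hypothetical
and was chosen for illustration only; it is not the code which was changed.

```cpp
#include <cassert>

// Buggy variant: count / total is an integer division which truncates,
// so the result is 0 for every count smaller than total.
static bool MoreThanHalfBuggy(int count, int total) {
  return count / total > 0.5;
}

// Variant without division which only uses integer operations.
static bool MoreThanHalf(int count, int total) {
  assert(total > 0 && count >= 0);
  return 2 * count > total;
}
```

For count = 3 and total = 4, the buggy variant returns false although three
quarters is more than half.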
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Remove unneeded assignments and a wrong comment in the destructor.
Fix wrong data type for local variable xstarts.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The changes are based on an analysis done with include-what-you-use.
Also replace some standard header files with the corresponding
standard C++ header files.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Remove unneeded include statements, remove conditional statements and
replace the remaining assert.h with its standard C++ variant cassert.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
genericvector.h used a mix of assert and ASSERT_HOST.
By using assert only, it no longer depends on errcode.h
which defines the ASSERT_HOST macro.
Other files which still use ASSERT_HOST now need an explicit
include statement for errcode.h.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Coverity Scan does not like incrementing of a null pointer,
so increment an index value instead of a pointer.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The tesseract/ subdirectory is no longer automatically added to the
include path of the compiler. Therefore old code which used code like
#include "capi.h"
must now change that to
#include "tesseract/capi.h"
This avoids name conflicts with header files from other projects.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Check whether the top right point of the block is inside of the
thresholded image t_pix. Otherwise the following code would make
illegal memory accesses.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Commit 87d33b6c9e added code which uses bool.
Therefore stdbool.h must be included for compilations with a C compiler.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Normal C++ programs like those which are built for tesseract automatically
set the locale "C".
There can be different locale settings if the tesseract library is used
in other software.
A wrong locale can cause wrong results from sscanf which is used at
different places in the tesseract code, so make sure that we have the
right locale settings and fail if that is not the case.
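A minimal sketch of such a check, assuming hypothetical helper names
(LocaleIsC, RequireCLocale). Calling setlocale with a null pointer only
queries the current locale and does not change it.

```cpp
#include <cassert>
#include <clocale>
#include <cstring>

// Hypothetical helper: returns true if the current locale is "C".
bool LocaleIsC() {
  // Passing nullptr queries the current locale without changing it.
  const char* locale = std::setlocale(LC_ALL, nullptr);
  return locale != nullptr && std::strcmp(locale, "C") == 0;
}

// Hypothetical helper: fails (in debug builds) for any other locale.
void RequireCLocale() {
  assert(LocaleIsC());
}
```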
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The assertions introduced by commit 8bea6bcc12
were too strict. The first one failed in osd_test, the second one failed
in `tesseract IMAGE BASE --psm 13 lstm.train`.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Raise an assertion for unexpected arguments and use size_t instead of int
for the size argument which is typically sizeof(some_datatype).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
When Tesseract is called without any argument, the help message is still
printed, but the exit status no longer indicates success (EXIT_OK).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The progress reporting function returns a boolean. The returned
value is never used by Tesseract and its meaning is not
documented, which renders the value meaningless. Still, a missing
return should not be permitted.
The C API is missing the ability to monitor the progress of the
recognition. This patch adds C wrappers to the progress monitor
that allow monitoring the progress and canceling the recognition
process early.
The progress_callback field in the ETEXT_DESC monitor type does not
take any 'context' parameter, which may make implementing callback
functions difficult and may require use of global variables.
The new function receives the ETEXT_DESC pointer as an argument.
This makes it possible to share the cancel_this field as context
carrier if required.
The change is backwards-compatible: the old pointer remains as a
member of the class, and the default value for the new pointer is
a function calling the classic progress notifier. This way the code
unaware of the new member will continue to work as before.
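The backwards-compatible design can be sketched like this. Monitor,
progress_with_context and CallOldProgress are hypothetical, simplified
stand-ins; the real ETEXT_DESC callbacks may take more arguments.

```cpp
#include <cassert>

struct Monitor;  // hypothetical stand-in for ETEXT_DESC

using OldProgress = bool (*)(int progress);
using NewProgress = bool (*)(Monitor* monitor, int progress);

// Default for the new callback: forwards to the classic notifier.
bool CallOldProgress(Monitor* monitor, int progress);

struct Monitor {
  OldProgress progress_callback = nullptr;              // classic pointer
  NewProgress progress_with_context = CallOldProgress;  // new pointer
  void* cancel_this = nullptr;                          // may carry context
};

bool CallOldProgress(Monitor* monitor, int progress) {
  assert(monitor != nullptr);
  if (monitor->progress_callback != nullptr) {
    return monitor->progress_callback(progress);
  }
  return false;
}
```

Code which only sets the classic pointer still gets its function called,
because the default of the new pointer forwards to it.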
Commit 0248c7ff9d replaced math.h by cmath.
Therefore isinf and isnan are no longer declared.
Replace them by their C++11 variants (std::isinf and std::isnan).
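A minimal sketch of the C++11 style, using a hypothetical helper IsFinite:
with cmath the classification functions are used with the std:: prefix.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: true if value is neither infinite nor NaN.
bool IsFinite(double value) {
  return !std::isinf(value) && !std::isnan(value);
}
```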
Signed-off-by: Stefan Weil <stweil@ub-blade-02.bib.uni-mannheim.de>
The following code caused a crash when Tesseract was compiled with -ftrapv:
1259 int width = right - left;
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff665c231 in __GI_abort () at abort.c:79
#2 0x00007ffff69e34d8 in __subvsi3 () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#3 0x000055555560c1c5 in tesseract::ColPartitionGrid::FindVPartitionPartners (this=0x55555717e3c0, to_the_left=true, part=0x5555571fa380)
at ../../../src/textord/colpartitiongrid.cpp:1259
#4 0x000055555560bda0 in tesseract::ColPartitionGrid::FindPartitionPartners (this=0x55555717e3c0) at ../../../src/textord/colpartitiongrid.cpp:1196
#5 0x00005555555f52b6 in tesseract::ColumnFinder::FindBlocks (this=0x55555717e280, pageseg_mode=tesseract::PSM_AUTO, scaled_color=0x0, scaled_factor=-1,
input_block=0x555555f91390, photo_mask_pix=0x555555f73300, thresholds_pix=0x555555f76290, grey_pix=0x555555f762e0, pixa_debug=0x7ffff7fc88d8, blocks=0x7fffffffd250,
diacritic_blobs=0x7fffffffd330, to_blocks=0x7fffffffd328) at ../../../src/textord/colfind.cpp:431
#6 0x00005555555c240d in tesseract::Tesseract::AutoPageSeg (this=0x7ffff7fa5010, pageseg_mode=tesseract::PSM_AUTO, blocks=0x555555f761d0, to_blocks=0x7fffffffd328,
diacritic_blobs=0x7fffffffd330, osd_tess=0x0, osr=0x7fffffffd6d0) at ../../../src/ccmain/pagesegmain.cpp:229
#7 0x00005555555c1ffd in tesseract::Tesseract::SegmentPage (this=0x7ffff7fa5010, input_file=0x555555f7bd90, blocks=0x555555f761d0, osd_tess=0x0, osr=0x7fffffffd6d0)
at ../../../src/ccmain/pagesegmain.cpp:141
#8 0x0000555555582540 in tesseract::TessBaseAPI::FindLines (this=0x555555a9a580 <main::api>) at ../../../src/api/baseapi.cpp:2291
#9 0x000055555557ce42 in tesseract::TessBaseAPI::Recognize (this=0x555555a9a580 <main::api>, monitor=0x0) at ../../../src/api/baseapi.cpp:802
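One possible way to avoid a signed overflow in such a subtraction is to
calculate in 64 bits and clamp the result. This is only a sketch with a
hypothetical helper name, assuming a 32 bit int; it is not necessarily the
fix which was applied here.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical helper: right - left without signed overflow.
// The difference is calculated in 64 bits and clamped to the int range.
int ClampedWidth(int left, int right) {
  static_assert(sizeof(int) < sizeof(int64_t), "int must be narrower");
  int64_t width = static_cast<int64_t>(right) - left;
  if (width > INT_MAX) return INT_MAX;
  if (width < INT_MIN) return INT_MIN;
  return static_cast<int>(width);
}
```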
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes a compiler warning:
warning: ‘tesseract::TabFind::v_it_’ will be initialized after [-Wreorder]
warning: ‘ICOORD tesseract::TabFind::image_origin_’ [-Wreorder]
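A minimal sketch of the rule behind -Wreorder, using a hypothetical class:
members are always initialized in declaration order, so the initializer
list should use the same order.

```cpp
#include <cassert>

// Hypothetical class, not related to TabFind.
class Origin {
 public:
  // x_ is declared before twice_x_ and is therefore initialized first.
  // The initializer list is written in the same order.
  explicit Origin(int x) : x_(x), twice_x_(2 * x_) {}
  int x() const { return x_; }
  int twice_x() const { return twice_x_; }

 private:
  int x_;
  int twice_x_;
};
```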
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes a compiler warning:
warning: ‘BLOCK::filename’ will be initialized after [-Wreorder]
warning: ‘PDBLK BLOCK::pdblk’ [-Wreorder]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Commit effa574 of 20.01.2017 added the bool textonly to the constructor of TessPDFRenderer. To maintain compatibility with older API users which still pass only two parameters, a default value for the textonly parameter is set.
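The effect of a default argument can be sketched with a hypothetical class
(PdfLikeRenderer is not the real TessPDFRenderer, and the default value
false is an assumption made for this example).

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a renderer with a newly added third parameter.
class PdfLikeRenderer {
 public:
  // Callers which pass only two arguments still compile and get the default.
  PdfLikeRenderer(std::string outputbase, std::string datadir,
                  bool textonly = false)
      : outputbase_(std::move(outputbase)),
        datadir_(std::move(datadir)),
        textonly_(textonly) {
    assert(!outputbase_.empty());
  }
  const std::string& outputbase() const { return outputbase_; }
  const std::string& datadir() const { return datadir_; }
  bool textonly() const { return textonly_; }

 private:
  std::string outputbase_;
  std::string datadir_;
  bool textonly_;
};
```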
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
The else statement is never executed.
Also remove an unused element from the names array
and add the "static" attribute.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
It's still possible to set the warning level in the project settings,
but single source files should normally not disable compiler warnings.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Coverity ID: 1386084. The set_font method accessed resolution_ before it was initialized by the set_resolution method.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
Tesseract code does not use strings.h (strngs.h was once called strings.h),
so that dependency can also be removed from cmake and cppan configuration.
Signed-off-by: Stefan Weil <sw@weilnetz.de>