Tesseract crashes with an unhandled exception (std::bad_alloc) if it gets
a bad tessdata file where the numEntries data field is very large (also
after swapping), for example 0x77777777.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
It's not necessary to initialize the vector with 0,
because the initial values are read from file.
Fix also an assertion when trying to read an empty file.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
There exist standard macro definitions for the printf format specifiers.
MS Visual Studio does not support that standard (at least not in older
versions), so local definitions are needed there.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Both functions simply call malloc, free.
Remove also unneeded null pointer checks and use calloc where possible.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes a compiler warning:
ccutil/scanutils.cpp:284:7: warning:
variable 'sign' set but not used [-Wunused-but-set-variable]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The code in ccutil/hashfn.h was needed for some old compilers. Now that we support MSVC >= 2010 and compilers that has good support for C++11, we can drop this code.
As a result of this file removal, we now use:
std::unordered_map
std::unordered_set
std::unique_ptr
directly in the codebase with '#include' for the needed headers.
This fixes a compiler warning from clang:
ccutil/platform.h:88:13: warning:
macro name is a reserved identifier [-Wreserved-id-macro]
#define _TESS_FILE_BASENAME_ \
Signed-off-by: Stefan Weil <sw@weilnetz.de>
clang++ report:
api/baseapi.h:852:4: warning:
extra ';' after member function definition [-Wextra-semi]
[...]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
GNU compiler report (cross build for Windows on Debian):
In file included from ../ccutil/host.h:63:0,
from ../arch/dotproductsse.h:22,
from ../arch/dotproductsse.cpp:43:
../ccutil/platform.h:27:0: warning: "NOMINMAX" redefined
#define NOMINMAX
In file included from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/c++/i686-w64-mingw32/bits/c++config.h:495:0,
from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/c++/cstdlib:41,
from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/c++/stdlib.h:36,
from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/mm_malloc.h:27,
from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/xmmintrin.h:34,
from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/emmintrin.h:31,
from ../arch/dotproductsse.cpp:40:
/usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/c++/i686-w64-mingw32/bits/os_defines.h:45:0:
note: this is the location of the previous definition
#define NOMINMAX 1
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The compare method is called very often, so even small improvements
are important.
The new code avoids one comparison in each loop iteration.
This results in smaller code (60 bytes for x86_64, gcc).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
POSIX provides portable data types for signed and unsigned integer values
of different size.
This patch maps those POSIX data types to the Tesseract specific types.
In a next step, the Tesseract data types can be eliminated by replacing
them with the POSIX data types.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Coverity report:
CID 1340278 (#1 of 1): Resource leak (RESOURCE_LEAK)
11. leaked_storage: Variable output_file going out of scope leaks the storage it points to.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The implementation for MS C did not pass the variable arguments to
tprintf.
The standard is supported since C99 / C++11, so one implementation
is sufficient.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
As pointed out by Stefan Weil, conditionally defining off_t using a
macro isn't a valid approach. off_t does not have a fixed size and is
used in ABI definitions (e.g. syscalls), so silently guessing its size
risks breaking the build. Additionally, all sane and modern platforms
will have off_t.
gcc reports these warnings with -Wextra:
ccstruct/pageres.h:330:3: warning:
base class 'class ELIST_LINK' should be explicitly initialized
in the copy constructor [-Wextra]
ccstruct/ratngs.cpp:115:1: warning:
base class 'class ELIST_LINK' should be explicitly initialized
in the copy constructor [-Wextra]
ccstruct/ratngs.h:291:3: warning:
base class 'class ELIST_LINK' should be explicitly initialized
in the copy constructor [-Wextra]
ccutil/genericvector.h:435:3: warning:
base class 'class GenericVector<WERD_RES*>' should be explicitly initialized
in the copy constructor [-Wextra]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes compiler warnings like this one:
api/baseapi.h:739:32: warning:
type qualifiers ignored on function return type [-Wignored-qualifiers]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
In VC14 snprintf function is provided in standard library there triggering error. "snprintf Do not define snprintf as a macro. Macro definition of snprintf conflicts with Standard Library function declaration"
Signed-off-by: Jaka Konda <jaka.konda@outlook.com>
Font recognition was poor, due to forcing a 1st and 2nd choice at
a character level, when the total score for the correct font is often
correct at the word level, so allowed the propagation of a full set
of fonts and scores to the word recognizer, which can now decide word
level fonts using the scores instead of simple votes.
Change precipitated a cleanup of output data structures for classifier
results, eliminating ScoredClass and INT_RESULT_STRUCT, with a few
extra elements going in UnicharRating, and using that wherever possible.
That added the extra complexity of 1-rating due to a flip between 0 is
good and 0 is bad for the internal classifier scores before they are
converted to rating and certainty.