Commit Graph

72 Commits

Author SHA1 Message Date
Stefan Weil
32098b7d4d IndexMap: Define virtual destructor in .cpp file
This fixes compiler warnings from clang:

src/ccutil/indexmapbidi.h:102:7: warning:
 'IndexMapBiDi' has no out-of-line virtual method definitions;
 its vtable will be emitted in every translation unit [-Wweak-vtables]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-04 07:45:28 +02:00
Stefan Weil
5b8162f0ef CCUtil: Define virtual destructor in .cpp file
This fixes compiler warnings from clang:

src/ccutil/ccutil.h:51:7: warning:
 'CCUtil' has no out-of-line virtual method definitions;
 its vtable will be emitted in every translation unit [-Wweak-vtables]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-04 07:44:27 +02:00
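
The two -Wweak-vtables fixes above follow a common pattern, which can be sketched roughly as follows. The class name Widget and the file names in the comments are hypothetical and are not taken from the Tesseract sources; the sketch only illustrates declaring the virtual destructor in the header and defining it in exactly one source file.

```cpp
// --- widget.h (hypothetical header) ---
class Widget {
 public:
  // Declared here, but deliberately not defined inline.
  virtual ~Widget();
  int Id() const { return id_; }

 private:
  int id_ = 0;
};

// --- widget.cpp (hypothetical source file) ---
// The first non-inline virtual member acts as the "key function":
// the vtable is emitted only in the translation unit that defines it.
Widget::~Widget() = default;
```

With the destructor defined out of line, clang no longer needs to emit the vtable in every translation unit that includes the header.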
Stefan Weil
c635cdf5d5 Do not define or use macro __UNIX__
Either it was not needed, or it could be replaced by checking
for not _WIN32.

This fixes a compiler warning from clang:

src/ccutil/platform.h:41:9: warning:
 macro name is a reserved identifier [-Wreserved-id-macro]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-04 07:34:11 +02:00
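
A minimal sketch of the replacement described above, assuming a hypothetical helper PathSeparator that is not part of Tesseract: instead of defining and testing a project-specific macro such as __UNIX__, the code tests the compiler-provided _WIN32 and treats everything else as the non-Windows case.

```cpp
// Hypothetical helper, used only to illustrate the preprocessor check.
inline char PathSeparator() {
#if defined(_WIN32)
  return '\\';  // Windows builds
#else
  return '/';   // everything that is not _WIN32
#endif
}
```

Because _WIN32 is predefined by the compiler, no reserved identifier has to be introduced by the project itself.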
Stefan Weil
69a111a739 Clean use of qsort function sort_floats
It is only used in textord/topitch.cpp, so move it into that file.

Also remove the inline attribute, as it has no effect here, and
update the type casts to fix some compiler warnings from clang.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-08-31 23:17:27 +02:00
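
As an illustration of the cleanup above, a file-local qsort comparator might look like the sketch below. Only the name sort_floats comes from the commit message; the function body and the wrapper SortAscending are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdlib>

// File-local comparator with the signature std::qsort expects.
// "inline" would have no effect, because qsort calls it through a pointer.
static int sort_floats(const void* arg1, const void* arg2) {
  const float a = *static_cast<const float*>(arg1);
  const float b = *static_cast<const float*>(arg2);
  return (a > b) - (a < b);  // -1, 0 or 1 without subtracting floats
}

// Hypothetical wrapper that sorts an array in ascending order.
void SortAscending(float* values, std::size_t count) {
  std::qsort(values, count, sizeof(float), sort_floats);
}
```

Using static_cast on the const void* arguments avoids the C-style casts that clang tends to warn about.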
Stefan Weil
7a2f8d9010 Move class tesseract::File from training to ccutil
This allows using the class for unittests, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-08-25 18:16:46 +02:00
Stefan Weil
6a28cce96b Fix whitespace issues
* Remove whitespace (blanks, tabs, cr) at line endings

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-08-01 13:19:52 +02:00
Stefan Weil
132c540c85 Increase limit for deserialization of large arrays
The last limit was still too small.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-21 11:10:09 +02:00
Stefan Weil
f577e292c2 Increase limit and add assertions for deserialization of large arrays
One of the checks was too restrictive, as lstmeval deserializes
char arrays with 14000000 elements, so raise the limit to 30000000.
That check was added in commit 992031e824.

Also add assertions which help to find such problems in debug mode.

Signed-off-by: Stefan Weil <stweil@ub-backup.bib.uni-mannheim.de>
2018-07-20 11:47:49 +02:00
Stefan Weil
88b3d940be TessdataManager: Use new serialization API
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-18 17:28:13 +02:00
Stefan Weil
da0217fa75 STRING: Use new serialization API
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-18 17:17:22 +02:00
Stefan Weil
5e05f2cb84 IndexMap: Use new serialization API and optimize code
By changing the type of sparse_size_ from int to int32_t,
a local copy can be removed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-18 17:12:44 +02:00
Stefan Weil
edff1d1882 BitVector: Use new serialization API
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-18 17:07:03 +02:00
Stefan Weil
66bc012d27 UNICHARSET: Use new serialization API
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-18 16:22:02 +02:00
Stefan Weil
eb90068b5f RecodedCharID: Use new serialization API
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-18 16:22:01 +02:00
Stefan Weil
c383b1aaca TFile: Add helper functions for serialization of simple data types
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-18 11:19:37 +02:00
Stefan Weil
16832f9878 Add helper functions for serialization of simple data types
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-18 11:19:37 +02:00
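
The idea of helper functions for simple data types can be sketched as below. This is not the Tesseract TFile API: the names Buffer, AppendValues and ReadValues are hypothetical, and an in-memory byte vector stands in for a file.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

using Buffer = std::vector<unsigned char>;  // hypothetical in-memory "file"

// Appends `count` values of a trivially copyable type as raw bytes.
template <typename T>
void AppendValues(Buffer* buf, const T* data, std::size_t count = 1) {
  static_assert(std::is_trivially_copyable<T>::value, "simple types only");
  const unsigned char* bytes = reinterpret_cast<const unsigned char*>(data);
  buf->insert(buf->end(), bytes, bytes + count * sizeof(T));
}

// Reads `count` values starting at *offset; returns false on short input.
template <typename T>
bool ReadValues(const Buffer& buf, std::size_t* offset, T* data,
                std::size_t count = 1) {
  static_assert(std::is_trivially_copyable<T>::value, "simple types only");
  const std::size_t bytes = count * sizeof(T);
  if (*offset > buf.size() || buf.size() - *offset < bytes) return false;
  if (bytes > 0) std::memcpy(data, buf.data() + *offset, bytes);
  *offset += bytes;
  return true;
}
```

A single templated pair like this replaces repeated size and pointer arithmetic at each call site, which is presumably what the per-class "Use new serialization API" commits above take advantage of.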
zdenop
e9cd6024d7
Merge pull request #1767 from stweil/unused
Remove unused macros and fix comments
2018-07-07 21:55:17 +02:00
Stefan Weil
0d4975933e Replace tprintf_internal by tprintf and clean tprintf code
Commit 4d514d5a60 introduced tprintf_internal
with an additional argument "level" which was removed again in commit
7dc5296fe9.

So we can now restore the original state without tprintf_internal.

Also remove the declaration of debug_window_on (it has not existed since
commit 030aae9896) and make the
configuration parameter debug_file local, as it is only used by tprintf.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-07 21:47:10 +02:00
Stefan Weil
8bd9567355 Fix some comments
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-07 21:19:01 +02:00
Stefan Weil
7e80a850ad Remove unused macros
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-07 21:19:01 +02:00
Stefan Weil
0eb239ee8b Fix typo in comments
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-06 22:00:00 +02:00
Stefan Weil
4bb41b8952 Fix CID 1164693 (Untrusted value as argument)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-06 16:11:29 +02:00
Stefan Weil
992031e824 Fix CID 1164702 (Untrusted value as argument)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-06 16:11:29 +02:00
Stefan Weil
8871f4d622 Fix CID 1164686 (Use of untrusted scalar value)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-06 16:11:29 +02:00
Stefan Weil
92e2ad0471 Fix CID 1164703 (Untrusted value as argument)
Wrong file data could give a large value for the number of vector elements
resulting in very large memory allocations.

Limit the allowed data range to UINT16_MAX (65535) elements
which hopefully should be sufficient for all use cases.

Changing the data type of the related member variables from int to
uint32_t allowed removing several type casts.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-06 16:11:10 +02:00
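
The kind of check described above can be sketched as follows. The function ReadFloatVector and its buffer layout (a 32-bit element count followed by the elements) are assumptions for illustration, not the code from the commit; only the UINT16_MAX limit is taken from the message.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical reader: rejects an untrusted element count before allocating.
bool ReadFloatVector(const unsigned char* data, std::size_t size,
                     std::vector<float>* out) {
  std::uint32_t count = 0;
  if (size < sizeof(count)) return false;
  std::memcpy(&count, data, sizeof(count));
  // Corrupt files could claim billions of elements; refuse them early.
  if (count > UINT16_MAX) return false;
  const std::size_t bytes = static_cast<std::size_t>(count) * sizeof(float);
  if (size - sizeof(count) < bytes) return false;  // truncated payload
  out->resize(count);
  if (bytes > 0) std::memcpy(out->data(), data + sizeof(count), bytes);
  return true;
}
```

Keeping the count in an unsigned 32-bit variable means the comparison against the limit needs no casts, which matches the remark about the changed member types.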
Stefan Weil
d2febafdcd Fix compiler warnings [-Wmissing-prototypes]
Add missing include statements, add missing "static" qualifiers or
remove functions which are not used at all.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-05 16:03:02 +02:00
Stefan Weil
bdf09f40b1 Fix compiler warnings [-Wzero-as-null-pointer-constant]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-04 20:40:56 +02:00
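
For context, -Wzero-as-null-pointer-constant flags code that uses 0 (or NULL defined as 0) where a pointer is meant. A minimal sketch of the usual fix, using a hypothetical Node type that does not come from Tesseract:

```cpp
// Hypothetical singly linked list node.
struct Node {
  Node* next = nullptr;  // previously this might have been written as "= 0"
};

// Counts the nodes in a list; compares against nullptr instead of 0.
int Length(const Node* head) {
  int n = 0;
  for (const Node* p = head; p != nullptr; p = p->next) ++n;
  return n;
}
```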
Stefan Weil
c8b5a29ce9 Remove unneeded type casts
This removes unneeded type casts to (char*) and (const char*).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-04 14:23:55 +02:00
Stefan Weil
6d170a15ec Replace tabs by blanks in source code
blobs.cpp had many tabs and was formatted with clang-format.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 16:29:14 +02:00
Stefan Weil
626a229cac Remove nwmain.h
The macro DECLARE_MAIN is not used by the current Tesseract code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 15:54:41 +02:00
Stefan Weil
faae87beaa Replace FLOAT32 by float data type
On most systems float is the IEEE 754 single-precision binary
floating-point format (32 bits). Tesseract does not support other systems.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 13:29:39 +02:00
Stefan Weil
f6c3c8cf4d Replace MAX_FLOAT32 by standard FLT_MAX and remove unused MIN_FLOAT32
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 13:29:39 +02:00
Stefan Weil
919901eb19 Replace FLOAT64 by double data type
On most systems double is the IEEE 754 double-precision binary
floating-point format (64 bits). Tesseract does not support other systems.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 08:07:37 +02:00
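
The assumption stated in the FLOAT32 and FLOAT64 commits above could be made explicit at compile time. The following is only a sketch and is not part of the commits; the constant kLargestFloat is a hypothetical name.

```cpp
#include <cfloat>
#include <limits>

// Fail the build on platforms where float/double are not IEEE 754 types
// of the expected size.
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "float must be IEEE 754 single precision");
static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "double must be IEEE 754 double precision");

// FLT_MAX from <cfloat> replaces a project-specific maximum such as MAX_FLOAT32.
constexpr float kLargestFloat = FLT_MAX;
```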
Stefan Weil
9bb5a87760 Remove stderr.h and its include statements
MEMORY_OUT is no longer used.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-25 16:14:20 +02:00
Stefan Weil
db7f2009d9 Remove memry.cpp, memry.h
The proprietary memory allocators alloc_string, alloc_mem
are no longer used.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-25 16:13:59 +02:00
Stefan Weil
20e243d5c9 strngs: Replace alloc_mem, free_mem by standard functions
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-25 14:31:48 +02:00
Stefan Weil
91522dfba5 Remove memry.h from public API
It is no longer needed by genericvector.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-23 21:15:54 +02:00
Stefan Weil
1a151781ea Clean some include statements
The changes are based on an analysis done with include-what-you-use.

Also replace some standard C header files with the corresponding
standard C++ header files.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-23 21:15:54 +02:00
Stefan Weil
484a1be98a Remove unneeded include statements for scanutils.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-22 19:16:08 +02:00
Stefan Weil
1371980f9f Replace string.h by standard C++ cstring
Remove the unneeded include statement in platform.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 20:40:26 +02:00
Stefan Weil
112aeb9826 Clean usage of assert.h
Remove unneeded include statements, remove conditional statements and
replace the remaining assert.h by their standard C++ variant cassert.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 19:31:05 +02:00
Stefan Weil
a9e2574eff Remove public API file ndminx.h
It is not needed for the Tesseract code, and the Tesseract API
should not provide MIN / MAX macros.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 08:33:30 +02:00
Stefan Weil
0cb128d56b Remove errcode.h from public API
It is no longer needed by genericvector.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 06:20:26 +02:00
Stefan Weil
44450094c3 Replace ASSERT_HOST in genericvector.h
genericvector.h used a mix of assert and ASSERT_HOST.

By using assert only, it no longer depends on errcode.h
which defines the ASSERT_HOST macro.

Other files which still use ASSERT_HOST now need an explicit
include statement for errcode.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 22:32:17 +02:00
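
A small sketch of what relying on the standard assert alone can look like in a header. CheckedGet is a hypothetical function and std::vector stands in for the project's own container; the point is that <cassert> is the only dependency.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bounds-checked access that needs nothing beyond the standard assert macro.
template <typename T>
const T& CheckedGet(const std::vector<T>& v, int index) {
  assert(index >= 0 && static_cast<std::size_t>(index) < v.size());
  return v[static_cast<std::size_t>(index)];
}
```

Note that assert is compiled out when NDEBUG is defined, so checks of this kind are active only in debug builds.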
Stefan Weil
d6391ee811 Fix CID 1393540 (Explicit null dereferenced)
Coverity Scan does not like incrementing of a null pointer,
so increment an index value instead of a pointer.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 17:32:02 +02:00
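
The change described above can be illustrated with a hypothetical function whose output pointer is optional. The code below is an assumption about the general pattern, not the code touched by the commit: an index is advanced instead of the possibly null pointer.

```cpp
#include <cstddef>

// Counts the positive values in `in`; also stores them when `out` is non-null.
std::size_t CopyPositive(const int* in, std::size_t count, int* out) {
  std::size_t written = 0;  // advanced instead of doing "++out"
  for (std::size_t i = 0; i < count; ++i) {
    if (in[i] > 0) {
      if (out != nullptr) out[written] = in[i];
      ++written;
    }
  }
  return written;
}
```

Since the pointer itself is never modified, a static analyzer has no arithmetic on a null pointer to complain about.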
Stefan Weil
e87e8967d7 Remove more header files from public API
Install only those headers which are needed by third party applications.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 11:54:38 +02:00
Stefan Weil
c1c87d73ee Require tesseract/ for API header files (fixes potential name conflicts)
The tesseract/ subdirectory is no longer automatically added to the
include path of the compiler. Therefore old code which used code like

    #include "capi.h"

must now change that to

    #include "tesseract/capi.h"

This avoids name conflicts with header files from other projects.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-17 22:01:19 +02:00
Stefan Weil
02277bed34 scanutils: Fix typos in comments
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-09 07:53:20 +02:00
Stefan Weil
280db06bbf scanutils: Fix illegal memory access
Format strings which contain "%*s" show this error in Valgrind:

==32503== Conditional jump or move depends on uninitialised value(s)
==32503==    at 0x2B8BB0: tvfscanf(_IO_FILE*, char const*, __va_list_tag*) (scanutils.cpp:486)
==32503==    by 0x2B825A: tfscanf(_IO_FILE*, char const*, ...) (scanutils.cpp:234)
==32503==    by 0x272B01: read_unlv_file(STRING, int, int, BLOCK_LIST*) (blread.cpp:54)
==32503==    by 0x1753CD: tesseract::Tesseract::SegmentPage(STRING const*, BLOCK_LIST*, tesseract::Tesseract*, OSResults*) (pagesegmain.cpp:115)
==32503==    by 0x1363CD: tesseract::TessBaseAPI::FindLines() (baseapi.cpp:2291)
==32503==    by 0x130CF1: tesseract::TessBaseAPI::Recognize(ETEXT_DESC*) (baseapi.cpp:802)
==32503==    by 0x1322D3: tesseract::TessBaseAPI::ProcessPage(Pix*, int, char const*, char const*, int, tesseract::TessResultRenderer*) (baseapi.cpp:1176)
==32503==    by 0x131A84: tesseract::TessBaseAPI::ProcessPagesMultipageTiff(unsigned char const*, unsigned long, char const*, char const*, int, tesseract::TessResultRenderer*, int) (baseapi.cpp:1013)
==32503==    by 0x132052: tesseract::TessBaseAPI::ProcessPagesInternal(char const*, char const*, int, tesseract::TessResultRenderer*) (baseapi.cpp:1129)
==32503==    by 0x131B1E: tesseract::TessBaseAPI::ProcessPages(char const*, char const*, int, tesseract::TessResultRenderer*) (baseapi.cpp:1032)
==32503==    by 0x12E00C: main (tesseractmain.cpp:537)
==32503==  Uninitialised value was created by a stack allocation
==32503==    at 0x272A60: read_unlv_file(STRING, int, int, BLOCK_LIST*) (blread.cpp:41)

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-08 15:28:30 +02:00
zdenop
ee2ab73224
Merge pull request #1637 from paulk124/master
Reserve extra byte in LoadDataFromFile() in case caller wants to appe…
2018-06-05 16:57:40 +02:00