Commit Graph

111 Commits

Author SHA1 Message Date
Stefan Weil
a32d24fa65 Remove empty tessbox.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-24 19:45:12 +02:00
Stefan Weil
91522dfba5 Remove memry.h from public API
It is no longer needed by genericvector.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-23 21:15:54 +02:00
Stefan Weil
1a151781ea Clean some include statements
The changes are based on an analysis done with include-what-you-use.

Also replace some standard C header files with the corresponding
standard C++ header files.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-23 21:15:54 +02:00
Egor Pugin
15f64e0232 Remove recursive header. 2018-06-23 17:32:42 +03:00
Stefan Weil
484a1be98a Remove unneeded include statements for scanutils.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-22 19:16:08 +02:00
Stefan Weil
11f2b12fda Remove arch header files from public API
The arch header files are only used in the Tesseract code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 21:46:48 +02:00
Stefan Weil
2bafff4c64 Remove LSTM header files from public API
The LSTM header files are only used in the Tesseract code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 21:46:48 +02:00
Stefan Weil
1371980f9f Replace string.h by standard C++ cstring
Remove the unneeded include statement in platform.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 20:40:26 +02:00
Stefan Weil
112aeb9826 Clean usage of assert.h
Remove unneeded include statements, remove conditional statements and
replace the remaining assert.h includes with the standard C++ variant cassert.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 19:31:05 +02:00
Stefan Weil
a9e2574eff Remove public API file ndminx.h
It is not needed for the Tesseract code, and the Tesseract API
should not provide MIN / MAX macros.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 08:33:30 +02:00
Stefan Weil
0cb128d56b Remove errcode.h from public API
It is no longer needed by genericvector.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 06:20:26 +02:00
Stefan Weil
44450094c3 Replace ASSERT_HOST in genericvector.h
genericvector.h used a mix of assert and ASSERT_HOST.

By using assert only, it no longer depends on errcode.h,
which defines the ASSERT_HOST macro.

Other files which still use ASSERT_HOST now need an explicit
include statement for errcode.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 22:32:17 +02:00
Stefan Weil
2a5a092469 Fix CID 1393241 (Dereference null return value)
Also add some error handling if fopen fails.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 21:17:02 +02:00
Stefan Weil
09976e6125 Fix CID 1393238 (Dereference null return value)
Also add some error handling if fopen fails.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 21:17:02 +02:00
Stefan Weil
27a5908a55 Fix CID 1393239 (Dereference null return value)
Also add some error handling if fopen fails.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 21:17:02 +02:00
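
The three "Dereference null return value" fixes above all concern a result of fopen that was used without a check. A minimal sketch of that kind of check follows. The helper name ReadFirstLine, the buffer size, and the error text are assumptions made for illustration and are not taken from the Tesseract sources.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not Tesseract code): reads the first line of a file.
// Returns false, leaving *line untouched, if the file cannot be opened or
// contains no data.
bool ReadFirstLine(const char* filename, std::string* line) {
  assert(filename != nullptr && line != nullptr);
  std::FILE* fp = std::fopen(filename, "r");
  if (fp == nullptr) {
    // fopen returns a null pointer on failure; report it instead of
    // passing the null pointer on to fgets.
    std::fprintf(stderr, "Error: cannot open %s\n", filename);
    return false;
  }
  char buffer[256];
  const bool ok = std::fgets(buffer, sizeof(buffer), fp) != nullptr;
  if (ok) {
    *line = buffer;
  }
  std::fclose(fp);
  return ok;
}
```

A caller would test the return value before using the string, so a missing file leads to an error message rather than a crash.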
Stefan Weil
f482ebdca1 Fix CID 1393243 (Uninitialized scalar field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 20:06:28 +02:00
Stefan Weil
2ceb200186 Fix CID 1393244 and CID 1393244 (Uninitialized scalar variable)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 19:28:04 +02:00
Stefan Weil
d6391ee811 Fix CID 1393540 (Explicit null dereferenced)
Coverity Scan does not like incrementing of a null pointer,
so increment an index value instead of a pointer.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 17:32:02 +02:00
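
The idea of incrementing an index instead of a possibly null pointer can be sketched as below. The function WeightedSum and its optional weights array are hypothetical and only illustrate the pattern; the code touched by the commit is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example (not Tesseract code). `weights` is optional:
// nullptr means that every value has weight 1.
int WeightedSum(const int* values, const int* weights, std::size_t count) {
  assert(values != nullptr || count == 0);
  int sum = 0;
  for (std::size_t i = 0; i < count; ++i) {
    // Only the index is incremented. The optional array is indexed solely
    // when it exists, so no null pointer is ever advanced.
    const int weight = (weights != nullptr) ? weights[i] : 1;
    sum += values[i] * weight;
  }
  return sum;
}
```

With values {1, 2, 3}, calling the function with a null weights pointer sums the plain values, while passing weights {2, 2, 2} doubles the result.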
Stefan Weil
e87e8967d7 Remove more header files from public API
Install only those headers which are needed by third party applications.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 11:54:38 +02:00
Stefan Weil
c1c87d73ee Require tesseract/ for API header files (fixes potential name conflicts)
The tesseract/ subdirectory is no longer automatically added to the
include path of the compiler. Therefore old code which used code like

    #include "capi.h"

must now change that to

    #include "tesseract/capi.h"

This avoids name conflicts with header files from other projects.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-17 22:01:19 +02:00
Amit D
6f85de22bc
WordFontAttributes: Check that word != nullptr earlier. Fix #1665 2018-06-13 23:38:27 +03:00
Egor Pugin
8b64602a86
Merge pull request #1660 from Shreeshrii/master
Change default width for images output by text2image
2018-06-11 14:23:22 +03:00
Shreeshrii
a27e91c4f9
Update tesstrain_utils.sh 2018-06-11 09:35:14 +05:30
Shreeshrii
fdc243b363
Change default width for images output by text2image
Fixes these errors:

    Image too large to learn!! Size = 2594x48
    Image not trainable

See https://github.com/tesseract-ocr/tesseract/issues/590#issuecomment-271244655
for related discussion
2018-06-11 09:34:07 +05:30
Stefan Weil
fcdcba70f4 Remove some header files from public API
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-10 16:19:58 +02:00
Stefan Weil
5812972775 block_edges: Add assertions for block coordinates
Check whether the top right point of the block is inside of the
thresholded image t_pix. Otherwise the following code would make
illegal memory accesses.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-09 14:06:33 +02:00
Egor Pugin
cd58a861d9
Merge pull request #1653 from stweil/typo
scanutils: Fix typos in comments
2018-06-09 11:00:22 +03:00
Stefan Weil
a709018e94 capi: Fix regression caused by use of bool data type
Commit 87d33b6c9e added code which uses bool.
Therefore stdbool.h must be included for compilations with a C compiler.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-09 08:45:45 +02:00
Stefan Weil
02277bed34 scanutils: Fix typos in comments
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-09 07:53:20 +02:00
zdenop
e7c1e0739c
Merge pull request #1649 from stweil/locale
Test for correct locale settings
2018-06-08 19:02:38 +02:00
Stefan Weil
3292484f67 Test for correct locale settings
Normal C++ programs like those which are built for tesseract automatically
set the locale "C".

There can be different locale settings if the tesseract library is used
in other software.

A wrong locale can cause wrong results from sscanf which is used at
different places in the tesseract code, so make sure that we have the
right locale settings and fail if that is not the case.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-08 17:40:10 +02:00
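
A locale test of the kind described in this commit might look like the following sketch. The helper names IsCLocale and RequireCLocale, and the choice of which categories to check, are assumptions; the actual check in Tesseract may differ.

```cpp
#include <cassert>
#include <clocale>
#include <cstring>

// Hypothetical helper, not taken from the Tesseract sources: returns true
// when the given locale category is currently set to "C".
bool IsCLocale(int category) {
  // Passing nullptr only queries the current setting; it changes nothing.
  const char* name = std::setlocale(category, nullptr);
  return name != nullptr && std::strcmp(name, "C") == 0;
}

// Fails (via assert) if a category that can affect sscanf-style parsing
// is not "C". Checking exactly these two categories is an assumption.
void RequireCLocale() {
  assert(IsCLocale(LC_CTYPE));
  assert(IsCLocale(LC_NUMERIC));
}
```

An application that embeds the library and has called setlocale(LC_ALL, "") under, for example, a German locale would trip this check, because the decimal separator expected by sscanf would then be a comma rather than a period.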
Stefan Weil
280db06bbf scanutils: Fix illegal memory access
Format strings which contain "%*s" show this error in Valgrind:

==32503== Conditional jump or move depends on uninitialised value(s)
==32503==    at 0x2B8BB0: tvfscanf(_IO_FILE*, char const*, __va_list_tag*) (scanutils.cpp:486)
==32503==    by 0x2B825A: tfscanf(_IO_FILE*, char const*, ...) (scanutils.cpp:234)
==32503==    by 0x272B01: read_unlv_file(STRING, int, int, BLOCK_LIST*) (blread.cpp:54)
==32503==    by 0x1753CD: tesseract::Tesseract::SegmentPage(STRING const*, BLOCK_LIST*, tesseract::Tesseract*, OSResults*) (pagesegmain.cpp:115)
==32503==    by 0x1363CD: tesseract::TessBaseAPI::FindLines() (baseapi.cpp:2291)
==32503==    by 0x130CF1: tesseract::TessBaseAPI::Recognize(ETEXT_DESC*) (baseapi.cpp:802)
==32503==    by 0x1322D3: tesseract::TessBaseAPI::ProcessPage(Pix*, int, char const*, char const*, int, tesseract::TessResultRenderer*) (baseapi.cpp:1176)
==32503==    by 0x131A84: tesseract::TessBaseAPI::ProcessPagesMultipageTiff(unsigned char const*, unsigned long, char const*, char const*, int, tesseract::TessResultRenderer*, int) (baseapi.cpp:1013)
==32503==    by 0x132052: tesseract::TessBaseAPI::ProcessPagesInternal(char const*, char const*, int, tesseract::TessResultRenderer*) (baseapi.cpp:1129)
==32503==    by 0x131B1E: tesseract::TessBaseAPI::ProcessPages(char const*, char const*, int, tesseract::TessResultRenderer*) (baseapi.cpp:1032)
==32503==    by 0x12E00C: main (tesseractmain.cpp:537)
==32503==  Uninitialised value was created by a stack allocation
==32503==    at 0x272A60: read_unlv_file(STRING, int, int, BLOCK_LIST*) (blread.cpp:41)

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-08 15:28:30 +02:00
zdenop
d47cebcdc8
Merge pull request #1641 from stweil/fix
training: Add missing linefeed to error message
2018-06-06 22:13:26 +02:00
Stefan Weil
0215d91f45 training: Add missing linefeed to error message
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-06 21:32:16 +02:00
zdenop
ee2ab73224
Merge pull request #1637 from paulk124/master
Reserve extra byte in LoadDataFromFile() in case caller wants to appe…
2018-06-05 16:57:40 +02:00
Paul Kitchen
805fb7699d Reserve extra byte in LoadDataFromFile() in case caller wants to append '\0' 2018-06-05 08:19:41 -06:00
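
The effect of reserving one extra byte can be illustrated with std::vector. The sketch below copies from memory rather than from a file so that it stays short; CopyWithSpareByte is a hypothetical name and is not the signature of LoadDataFromFile().

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a loader (not Tesseract code): copies `size`
// bytes into *data and keeps capacity for one more byte.
void CopyWithSpareByte(const char* src, std::size_t size,
                       std::vector<char>* data) {
  assert(data != nullptr);
  assert(src != nullptr || size == 0);
  data->clear();
  data->reserve(size + 1);  // one spare byte for a terminator added later
  data->resize(size);       // fits in the reserved capacity
  if (size > 0) {
    std::memcpy(data->data(), src, size);
  }
}
```

Because the capacity already covers size + 1 bytes, a caller can append '\0' with push_back without triggering a reallocation and a second copy of the data.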
Stefan Weil
52fddc3ca9 TFile: Relax assertion and allow FRead, FWrite with count == 0
The assertions introduced by commit 8bea6bcc12
were too strict. The first one failed in osd_test, the second one failed
in `tesseract IMAGE BASE --psm 13 lstm.train`.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-04 22:42:19 +02:00
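
The relaxed assertions can be pictured with the sketch below, which guards the size * count multiplication against overflow while still accepting count == 0. CopyItems is a hypothetical stand-in and does not reproduce the real TFile::FRead or TFile::FWrite code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <limits>

// Hypothetical stand-in for a buffered read (not Tesseract code): copies
// `count` items of `size` bytes each and returns the number of items copied.
std::size_t CopyItems(void* dest, const void* src, std::size_t size,
                      std::size_t count) {
  assert(size > 0);  // typically sizeof(some_datatype)
  // count == 0 is allowed. Only a product that would overflow size_t
  // is rejected.
  assert(count <= std::numeric_limits<std::size_t>::max() / size);
  const std::size_t bytes = size * count;
  if (bytes > 0) {
    std::memcpy(dest, src, bytes);
  }
  return count;
}
```

Using size_t for both arguments avoids the signed overflow that an int product could produce, and the division-based check runs before the multiplication.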
Egor Pugin
83ae900549
Merge pull request #1629 from stweil/bool
src/training: Replace more proprietary BOOL8 by standard bool data type
2018-06-04 18:54:31 +03:00
Stefan Weil
4f3b266efe src/training: Replace more proprietary BOOL8 by standard bool data type
Also update callers of the modified functions to use
false / true instead of 0 / 1.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-04 16:08:03 +02:00
Stefan Weil
b292013bdc cntraining: Replace proprietary BOOL8 by standard bool data type
Also add the "static" attribute to local functions and remove an old comment.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-04 16:08:03 +02:00
Stefan Weil
8bea6bcc12 TFile: Improve handling of potential integer overflow
Raise an assertion for unexpected arguments and use size_t instead of int
for the size argument which is typically sizeof(some_datatype).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-04 13:53:36 +02:00
Stefan Weil
f2698c256d src/training: Replace proprietary BOOL8 by standard bool data type
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-03 21:13:40 +02:00
Stefan Weil
629ded223c tesseractmain: Allow combinations of the different help options
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-02 09:03:56 +02:00
Stefan Weil
724a72a278 tesseractmain: Always use EXIT_SUCCESS and EXIT_FAILURE macros for exit status
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-02 09:03:56 +02:00
Stefan Weil
b5ac8502bc tesseractmain: EXIT_FAILURE if tesseract is called without arguments
When Tesseract is called without any argument, the help message is still
printed, but the exit status no longer indicates success (EXIT_SUCCESS).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-02 09:03:56 +02:00
Stefan Weil
6dba34dd8c tesseractmain: No command line options between image and outputbase
The image name and the outputbase should not be separated by
command line options.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-02 09:03:56 +02:00
zdenop
e313ed1bb9
Merge pull request #1614 from j-kubik/master
Recognition progress in C API
2018-06-02 08:54:21 +02:00
Stefan Weil
6f7206f574 tesseractmain: Remove unneeded duplicate code
The --list-langs option is already handled by other code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-01 20:45:53 +02:00
Stefan Weil
d4ed0f841a tesseractmain: Fail if bad command line option is given
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-01 20:04:35 +02:00
Jaroslaw Kubik
e6c9967b83 Fixed a typo in progress monitor C API
TessMonitorcDelete -> TessMonitorDelete
2018-06-01 19:42:28 +02:00