542 Commits

Author SHA1 Message Date
Stefan Weil
790b410fd6 Remove unused API function TessBaseAPIDetectOS
It was not declared in capi.h, so external users could not use it anyway.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-05 14:49:48 +02:00
Stefan Weil
f107f116d9 Fix compiler warnings [-Wconditional-uninitialized]
clang warnings:

src/ccstruct/coutln.cpp:231:15: warning:
 variable 'destindex' may be uninitialized when used here [-Wconditional-uninitialized]
src/wordrec/language_model.cpp:1170:27: warning:
 variable 'expected_gap' may be uninitialized when used here [-Wconditional-uninitialized]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-05 12:07:04 +02:00
Stefan Weil
a74d467e90 Fix compiler warnings [-Wcomma]
clang warnings:

src/api/baseapi.cpp:1642:18: warning:
 possible misuse of comma operator here [-Wcomma]
src/api/baseapi.cpp:1642:31: warning:
 possible misuse of comma operator here [-Wcomma]
src/api/baseapi.cpp:1642:45: warning:
 possible misuse of comma operator here [-Wcomma]
src/api/baseapi.cpp:1652:16: warning:
 possible misuse of comma operator here [-Wcomma]
src/api/baseapi.cpp:1652:30: warning:
 possible misuse of comma operator here [-Wcomma]
src/api/baseapi.cpp:1662:17: warning:
 possible misuse of comma operator here [-Wcomma]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-05 12:07:04 +02:00
Stefan Weil
296a836f4e Fix compiler warnings [-Wunused-const-variable]
clang warnings:

src/classify/trainingsampleset.cpp:39:11: warning:
 unused variable 'kMinOutlierSamples' [-Wunused-const-variable]
src/lstm/lstmrecognizer.cpp:45:11: warning:
 unused variable 'kMaxChoices' [-Wunused-const-variable]
src/training/dawg2wordlist.cpp:28:11: warning:
 unused variable 'kDictDebugLevel' [-Wunused-const-variable]
src/training/stringrenderer.cpp:50:21: warning:
 unused variable 'kWordJoiner' [-Wunused-const-variable]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-05 12:07:04 +02:00
Stefan Weil
787bde5630 Fix syntax errors introduced by last commit (regression)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-05 07:34:04 +02:00
Stefan Weil
d960a50c12 Fix compiler warning [-Wshadow-field-in-constructor]
clang warning:

src/ccstruct/polyblk.cpp:48:36: warning:
 constructor parameter 'box' shadows the field 'box' of 'POLY_BLOCK'
 [-Wshadow-field-in-constructor]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-04 21:58:33 +02:00
Stefan Weil
c1be1024be Fix compiler warning [-Wtautological-undefined-compare]
clang warning:

src/lstm/networkio.cpp:56:15: warning:
 'this' pointer cannot be null in well-defined C++ code;
 comparison may be assumed to always evaluate to true [-Wtautological-undefined-compare]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-04 21:58:33 +02:00
Stefan Weil
52d392da50 Fix compiler warning [-Wunused-function]
clang warning:

src/lstm/lstmrecognizer.cpp:411:13: warning:
 unused function 'NullIsBest' [-Wunused-function]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-04 21:58:33 +02:00
Stefan Weil
6cc35646f8 Fix compiler warning [-Wunreachable-code-break]
clang warning:

src/lstm/network.cpp:249:7:
 warning: 'break' will never be executed [-Wunreachable-code-break]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-04 21:58:33 +02:00
Stefan Weil
bdf09f40b1 Fix compiler warnings [-Wzero-as-null-pointer-constant]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-04 20:40:56 +02:00
Stefan Weil
60fcff5ed9 Fix build with legacy engine disabled (part 2)
The functions TessBaseAPIInitLangMod, TessBaseAPIClearAdaptiveClassifier
and TessBaseAPIDetectOrientationScript need conditional compilation.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-04 17:56:42 +02:00
Stefan Weil
081793ff48 Fix build with legacy engine disabled
Instead of defining the DISABLED_LEGACY_ENGINE macro in config_auto.h
(which is not included by all source files), define it as a preprocessor
option for those parts of the code which require it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-04 17:56:42 +02:00
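
The approach in the commit above (passing the macro on the compiler command line, so that files which do not include config_auto.h still see it) might look roughly like this. DISABLED_LEGACY_ENGINE is the macro named in the commit; the two functions and the compiler invocations in the comments are hypothetical and only illustrate the pattern.

```cpp
#include <cassert>

// Build with the legacy code:     c++ -c example.cpp
// Build without the legacy code:  c++ -DDISABLED_LEGACY_ENGINE -c example.cpp
// Because the macro comes from the command line, every translation unit sees
// the same definition, whether or not it includes a generated config header.

#ifndef DISABLED_LEGACY_ENGINE
// Hypothetical stand-in for code that belongs to the legacy engine.
static int LegacyOnlyFeatureCount() { return 3; }
#endif

// Hypothetical helper: reports how many legacy features were compiled in.
int CompiledLegacyFeatures() {
#ifndef DISABLED_LEGACY_ENGINE
  return LegacyOnlyFeatureCount();
#else
  return 0;
#endif
}
```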
zdenop
20e53b119a
Merge pull request #1742 from stweil/casts
Remove unneeded type casts
2018-07-04 15:35:49 +02:00
Stefan Weil
c8b5a29ce9 Remove unneeded type casts
This removes unneeded type casts to (char*) and (const char*).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-04 14:23:55 +02:00
Amit D
62c7b796da
Merge branch 'master' into disable-legacy 2018-07-04 11:14:33 +03:00
amitdo
15fb491be4 Add missing #ifdef in tesseractmain.cpp 2018-07-04 09:57:12 +03:00
amitdo
134779f758 Fix duplicate #ifndef in blobclass.cpp 2018-07-04 08:49:58 +03:00
amitdo
aa9f4b4861 Add an option to compile tesseract without the code of the legacy OCR engine 2018-07-03 18:49:42 +03:00
Stefan Weil
6d170a15ec Replace tabs by blanks in source code
blobs.cpp had many tabs and was formatted with clang-format.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 16:29:14 +02:00
Stefan Weil
626a229cac Remove nwmain.h
The macro DECLARE_MAIN is not used by the current Tesseract code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 15:54:41 +02:00
Stefan Weil
f8684cb0fd Fix syntax error (regression)
It was introduced in commit bb7bb1f0b8.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 15:53:42 +02:00
zdenop
4b26b8d9a9
Merge pull request #1735 from stweil/pdblock
Remove blckerr.h
2018-07-03 15:24:09 +02:00
Stefan Weil
bb7bb1f0b8 Remove old comments for exceptions
Exceptions are no longer used.

Remove also some history comments and fix several comments.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 14:53:00 +02:00
Stefan Weil
889f7eaa1b Remove blckerr.h
Move the two ERRCODE constants which are still in use to pdblock.cpp.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 14:08:57 +02:00
Stefan Weil
872813245d Replace function DoError and remove danerror.cpp, danerror.h
This allows also removing all error trap macros.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 13:21:17 +02:00
Stefan Weil
6a553f9f28 Clean up cutil.h
* Remove unrelated include statements from cutil.h.
* Remove macros FALSE, TRUE.
* Move macro CHARS_PER_LINE from cutil.h to dict.h.
* Remove unneeded macro _ARGS.
* Remove unused typedef statements.
* Remove macro new_line (only used once).
* Remove unused macro print_string.
* Update include statements for other source files.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 11:31:41 +02:00
zdenop
a0ed0b4987
Merge pull request #1732 from stweil/headerfiles
Remove unused include files
2018-07-03 07:57:15 +02:00
zdenop
66ea6c4470
Merge pull request #1730 from stweil/pi
Remove unneeded macro definition for M_PI
2018-07-03 07:26:59 +02:00
Stefan Weil
9325fbe322 Remove unused include files
ccstruct/hpdsizes.h was not used at all.
cutil/const.h was included, but not needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 07:25:38 +02:00
Stefan Weil
2cd2d3200f Remove functions open_file, exists_file
cutil.cpp is now no longer needed and removed, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 06:45:34 +02:00
Stefan Weil
cbd7b15788 Remove unneeded macro definition for M_PI
There is already one in platform.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 21:59:16 +02:00
Stefan Weil
f7b61891bc Replace macro PI by macro M_PI
One definition for pi is sufficient.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 21:26:53 +02:00
zdenop
c323312c17
Merge pull request #1725 from stweil/doerror
Replace Efopen by fopen and remove efio.cpp, efio.h
2018-07-02 20:53:28 +02:00
Stefan Weil
3840a769d6 Remove unused function long_rand
Remove also some old comments.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 20:11:42 +02:00
Stefan Weil
b57afc7c78 Replace Efopen by fopen and remove efio.cpp, efio.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 17:46:28 +02:00
Stefan Weil
faae87beaa Replace FLOAT32 by float data type
On most systems float is the IEEE 754 single-precision binary
floating-point format (32 bits). Tesseract does not support other systems.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 13:29:39 +02:00
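
The assumption stated for the FLOAT32 and FLOAT64 replacements in this log (float and double are the IEEE 754 32-bit and 64-bit formats) can be turned into a compile-time check. This is a sketch, not code from the commits; FloatBits is a hypothetical helper.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Compile-time checks that fail the build on a platform where float or double
// is not the IEEE 754 format the commit messages describe.
static_assert(std::numeric_limits<float>::is_iec559, "float must be IEEE 754");
static_assert(sizeof(float) * CHAR_BIT == 32, "float must be 32 bits wide");
static_assert(std::numeric_limits<double>::is_iec559, "double must be IEEE 754");
static_assert(sizeof(double) * CHAR_BIT == 64, "double must be 64 bits wide");

// Hypothetical helper: width of float in bits, for a run-time report.
int FloatBits() { return static_cast<int>(sizeof(float) * CHAR_BIT); }
```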
Stefan Weil
f6c3c8cf4d Replace MAX_FLOAT32 by standard FLT_MAX and remove unused MIN_FLOAT32
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 13:29:39 +02:00
Stefan Weil
919901eb19 Replace FLOAT64 by double data type
On most systems double is the IEEE 754 double-precision binary
floating-point format (64 bits). Tesseract does not support other systems.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 08:07:37 +02:00
Stefan Weil
abbd78a053 Fix CID 1340271, 1340272, 1340273, 1340274 (Use after free)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-01 20:18:39 +02:00
Stefan Weil
52b44c5ebf Fix CID 1164530 (Logically dead code)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-01 20:01:56 +02:00
Stefan Weil
57970443b4 Fix CID 1393661 (Arguments in wrong order)
It did not cause a problem as both arguments were 0.

Update also the function prototype of HistogramRectOCL to
accept a void pointer which allows removing a type cast.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-01 19:40:44 +02:00
Stefan Weil
09da044a77 Fix CID 1164553 (Division or modulo by float zero)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-01 19:27:01 +02:00
Stefan Weil
1b303e5d37 Fix CID 1393662 Resource leak
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-01 19:27:01 +02:00
Stefan Weil
d3c4642d8f Fix CID 1393662 (Resource leak) 2018-07-01 19:27:01 +02:00
Stefan Weil
98758fb300 opencl: Use std::vector and clean code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-01 19:27:01 +02:00
Stefan Weil
53795a88b5 Fix CID 1158180 Argument cannot be negative
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-01 19:18:32 +02:00
Stefan Weil
6801085376 pdfrenderer: Fix ClipBaseline and optimize code
The division was made with integers, giving a wrong result.

* Avoid division and use pure integer operations.
* Add missing "static" attribute.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-01 08:33:56 +02:00
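
The kind of bug described above (an integer division giving a wrong result) and the "pure integer operations" remedy can be illustrated with a generic slope test. This is not the ClipBaseline code; both functions are hypothetical and assume run > 0 and values small enough that 4 * rise does not overflow.

```cpp
#include <cassert>

// Truncating version: 1 / 4 is 0 in integer arithmetic, and rise / run is
// never negative for non-negative rise, so this is never true for such inputs.
bool SlopeBelowQuarterTruncating(int rise, int run) {
  assert(run > 0);
  return rise / run < 1 / 4;
}

// Pure integer version: multiply both sides by 4 * run (which is positive)
// instead of dividing, so nothing is truncated.
bool SlopeBelowQuarter(int rise, int run) {
  assert(run > 0);
  return 4 * rise < run;
}
```

For rise = 1 and run = 10 the slope is 0.1, which is below 0.25, yet only the second function reports that.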
Stefan Weil
e8e94d372c Fix CID 1340287 (Unchecked return value)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-01 07:54:11 +02:00
Stefan Weil
a49b8f1d21 Fix CID 1297960 (Dereference after null check)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-01 07:54:11 +02:00
Stefan Weil
86eb4dfcdc Fix CID 1164646 (Uninitialized pointer field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-01 07:54:11 +02:00
Stefan Weil
de072cc01e Format OpenCL code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-30 18:32:53 +02:00
Stefan Weil
740a821c76 Fix CID 1393673 (Ignoring number of bytes read)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-30 15:25:09 +02:00
Stefan Weil
075dc984e9 Fix CID 1393671 (Uninitialized scalar variable)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-30 15:10:25 +02:00
Stefan Weil
8f33d10bfb Fix CID 1393670 (Resource leak)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-30 15:10:16 +02:00
Stefan Weil
12a601fffa Fix CID 1393669 (Resource leak)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-30 15:07:21 +02:00
Stefan Weil
1de55c8604 Fix CID 1393668 (Uninitialized scalar variable)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-30 15:07:21 +02:00
Stefan Weil
1e1f35cd5c Fix CID 1393667 (Copy into fixed size buffer)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-30 15:07:21 +02:00
Stefan Weil
85794ca188 Fix CID 1393666 (Big parameter passed by value)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-30 15:07:21 +02:00
Stefan Weil
3d2f73503e Fix CID 1393665 (Uninitialized scalar variable)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-30 15:07:21 +02:00
Stefan Weil
a95917a6a4 Fix CID 1393664 (Uninitialized scalar variable)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-30 15:07:21 +02:00
Stefan Weil
c9737c7f93 Fix CID 1393663 (Big parameter passed by value)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-30 15:07:20 +02:00
Stefan Weil
53596f7837 Fix CID 1393662 (Resource leak)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-30 15:07:19 +02:00
Stefan Weil
fcff2f2ce2 Fix CID 1242849 (Unused value)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-30 11:46:19 +02:00
Stefan Weil
eabd10d8f2 Fix CID 1158180 (Argument cannot be negative) and clean code a bit
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-30 11:41:41 +02:00
Stefan Weil
4cc103cd42 Fix CID 1157757 (Logically dead code)
deviceNameStart cannot be NULL here.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-30 10:36:45 +02:00
Stefan Weil
36c985b715 Fix CID 1164746 (Big parameter passed by value)
Use std::vector instead of GenericVector.

Fix also several signed / unsigned compiler warnings.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-29 22:24:00 +02:00
Stefan Weil
20cd6d2328 dotproductsse: Fix include statements
The changes are based on an analysis done with include-what-you-use.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-25 19:15:37 +02:00
Stefan Weil
9bb5a87760 Remove stderr.h and its include statements
MEMORY_OUT is no longer used.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-25 16:14:20 +02:00
Stefan Weil
db7f2009d9 Remove memry.cpp, memry.h
The proprietary memory allocators alloc_string, alloc_mem
are no longer used.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-25 16:13:59 +02:00
Stefan Weil
cda04b1d6d tordmain: Replace alloc_mem, free_mem by C++ std::vector
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-25 16:13:59 +02:00
Stefan Weil
3032b65b48 pithsync: Replace alloc_mem, free_mem by C++ std::vector
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-25 16:13:59 +02:00
Stefan Weil
cb9eec355b oldbasel: Replace alloc_mem, free_mem by C++ new, delete, std::vector
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-25 16:13:59 +02:00
Stefan Weil
77db9b4390 makerow: Replace alloc_mem, free_mem by C++ new, delete, std::vector
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-25 15:17:02 +02:00
Stefan Weil
556a1c1e28 qspline: Replace alloc_mem, free_mem by C++ new, delete
Remove unneeded assignments and a wrong comment in the destructor.
Fix wrong data type for local variable xstarts.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-25 14:55:36 +02:00
Stefan Weil
52218c3d99 pitsync1: Remove unneeded include statement
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-25 14:35:46 +02:00
Stefan Weil
9b2dc5c25a gap_map: Replace alloc_mem, free_mem by C++ new, delete
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-25 14:33:46 +02:00
Stefan Weil
20e243d5c9 strngs: Replace alloc_mem, free_mem by standard functions
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-25 14:31:48 +02:00
Stefan Weil
8953f4149f qspline: Remove unneeded include statement
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-25 14:30:23 +02:00
Stefan Weil
b282d2cb16 adaptions: Remove unneeded include statement
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-25 14:28:04 +02:00
Stefan Weil
f99be62c4c coutln: Replace alloc_mem, free_mem by standard functions
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-25 14:27:35 +02:00
Stefan Weil
7768f9b336 Clean more include files and include statements
The changes are based on an analysis done with include-what-you-use.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-24 19:45:12 +02:00
Stefan Weil
a32d24fa65 Remove empty tessbox.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-24 19:45:12 +02:00
Stefan Weil
91522dfba5 Remove memry.h from public API
It is no longer needed by genericvector.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-23 21:15:54 +02:00
Stefan Weil
1a151781ea Clean some include statements
The changes are based on an analysis done with include-what-you-use.

Replace also some standard header files by the corresponding
standard C++ header files.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-23 21:15:54 +02:00
Egor Pugin
15f64e0232 Remove recursive header. 2018-06-23 17:32:42 +03:00
Stefan Weil
484a1be98a Remove unneeded include statements for scanutils.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-22 19:16:08 +02:00
Stefan Weil
11f2b12fda Remove arch header files from public API
The arch header files are only used in the Tesseract code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 21:46:48 +02:00
Stefan Weil
2bafff4c64 Remove LSTM header files from public API
The LSTM header files are only used in the Tesseract code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 21:46:48 +02:00
Stefan Weil
1371980f9f Replace string.h by standard C++ cstring
Remove the unneeded include statement in platform.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 20:40:26 +02:00
Stefan Weil
112aeb9826 Clean usage of assert.h
Remove unneeded include statements, remove conditional statements and
replace the remaining assert.h by their standard C++ variant cassert.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 19:31:05 +02:00
Stefan Weil
a9e2574eff Remove public API file ndminx.h
It is not needed for the Tesseract code, and the Tesseract API
should not provide MIN / MAX macros.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 08:33:30 +02:00
Stefan Weil
0cb128d56b Remove errcode.h from public API
It is no longer needed by genericvector.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 06:20:26 +02:00
Stefan Weil
44450094c3 Replace ASSERT_HOST in genericvector.h
genericvector.h used a mix of assert and ASSERT_HOST.

By using assert only, it no longer depends on errcode.h
which defines the ASSERT_HOST macro.

Other files which still use ASSERT_HOST now need an explicit
include statement for errcode.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 22:32:17 +02:00
Stefan Weil
2a5a092469 Fix CID 1393241 (Dereference null return value)
Add also some error handling if fopen fails.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 21:17:02 +02:00
Stefan Weil
09976e6125 Fix CID 1393238 (Dereference null return value)
Add also some error handling if fopen fails.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 21:17:02 +02:00
Stefan Weil
27a5908a55 Fix CID 1393239 (Dereference null return value)
Add also some error handling if fopen fails.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 21:17:02 +02:00
Stefan Weil
f482ebdca1 Fix CID 1393243 (Uninitialized scalar field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 20:06:28 +02:00
Stefan Weil
2ceb200186 Fix CID 1393244 and CID 1393244 (Uninitialized scalar variable)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 19:28:04 +02:00
Stefan Weil
d6391ee811 Fix CID 1393540 (Explicit null dereferenced)
Coverity Scan does not like incrementing of a null pointer,
so increment an index value instead of a pointer.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 17:32:02 +02:00
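
The index-instead-of-pointer change described above can be sketched with a function that takes an optional array. WeightedSum and its parameters are hypothetical; the point is only that an index is advanced while the possibly-null pointer stays untouched.

```cpp
#include <cassert>
#include <cstddef>

// Sums values[i] * weights[i]; a null weights pointer means "all weights are 1".
// Advancing an index keeps the possibly-null pointer untouched, whereas
// writing ++weights on a null pointer is what a static analyzer objects to.
int WeightedSum(const int* values, const int* weights, std::size_t count) {
  assert(values != nullptr || count == 0);
  int sum = 0;
  for (std::size_t i = 0; i < count; ++i) {
    const int weight = (weights != nullptr) ? weights[i] : 1;
    sum += values[i] * weight;
  }
  return sum;
}
```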
Stefan Weil
e87e8967d7 Remove more header files from public API
Install only those headers which are needed by third party applications.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 11:54:38 +02:00
Stefan Weil
c1c87d73ee Require tesseract/ for API header files (fixes potential name conflicts)
The tesseract/ subdirectory is no longer automatically added to the
include path of the compiler. Therefore old code which used code like

    #include "capi.h"

must now change that to

    #include "tesseract/capi.h"

This avoids name conflicts with header files from other projects.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-17 22:01:19 +02:00
Amit D
6f85de22bc
WordFontAttributes: Check that word != nullptr earlier. Fix #1665 2018-06-13 23:38:27 +03:00
Egor Pugin
8b64602a86
Merge pull request #1660 from Shreeshrii/master
Change default width for images output by text2image
2018-06-11 14:23:22 +03:00
Shreeshrii
a27e91c4f9
Update tesstrain_utils.sh 2018-06-11 09:35:14 +05:30
Shreeshrii
fdc243b363
Change default width for images output by text2image
Fixes
Image too large to learn!! Size = 2594x48
Image not trainable

See https://github.com/tesseract-ocr/tesseract/issues/590#issuecomment-271244655
for related discussion
2018-06-11 09:34:07 +05:30
Stefan Weil
fcdcba70f4 Remove some header files from public API
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-10 16:19:58 +02:00
Stefan Weil
5812972775 block_edges: Add assertions for block coordinates
Check whether the top right point of the block is inside of the
thresholded image t_pix. Otherwise the following code would make
illegal memory accesses.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-09 14:06:33 +02:00
Egor Pugin
cd58a861d9
Merge pull request #1653 from stweil/typo
scanutils: Fix typos in comments
2018-06-09 11:00:22 +03:00
Stefan Weil
a709018e94 capi: Fix regression caused by use of bool data type
Commit 87d33b6c9e added code which uses bool.
Therefore stdbool.h must be included for compilations with a C compiler.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-09 08:45:45 +02:00
Stefan Weil
02277bed34 scanutils: Fix typos in comments
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-09 07:53:20 +02:00
zdenop
e7c1e0739c
Merge pull request #1649 from stweil/locale
Test for correct locale settings
2018-06-08 19:02:38 +02:00
Stefan Weil
3292484f67 Test for correct locale settings
Normal C++ programs like those which are built for tesseract automatically
set the locale "C".

There can be different locale settings if the tesseract library is used
in other software.

A wrong locale can cause wrong results from sscanf which is used at
different places in the tesseract code, so make sure that we have the
right locale settings and fail if that is not the case.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-08 17:40:10 +02:00
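
A locale test of the kind described above could be written with std::setlocale in query mode. This is a sketch under assumptions: the function names are hypothetical, and the choice of categories (LC_CTYPE and LC_NUMERIC) is an illustration, not necessarily what the commit checks.

```cpp
#include <cassert>
#include <clocale>
#include <cstring>

// Returns true if the given locale category is currently set to "C".
bool CategoryIsC(int category) {
  const char* name = std::setlocale(category, nullptr);  // query only, no change
  return name != nullptr && std::strcmp(name, "C") == 0;
}

// Hypothetical guard to call before any sscanf-based parsing: a locale with a
// decimal comma, for example, would make "%f" stop at the '.' in "1.5".
bool LocaleIsSafeForParsing() {
  return CategoryIsC(LC_CTYPE) && CategoryIsC(LC_NUMERIC);
}
```

A program that never calls setlocale starts in the "C" locale, so the guard only fails when a host application has changed it.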
Stefan Weil
280db06bbf scanutils: Fix illegal memory access
Format strings which contain "%*s" show this error in Valgrind:

==32503== Conditional jump or move depends on uninitialised value(s)
==32503==    at 0x2B8BB0: tvfscanf(_IO_FILE*, char const*, __va_list_tag*) (scanutils.cpp:486)
==32503==    by 0x2B825A: tfscanf(_IO_FILE*, char const*, ...) (scanutils.cpp:234)
==32503==    by 0x272B01: read_unlv_file(STRING, int, int, BLOCK_LIST*) (blread.cpp:54)
==32503==    by 0x1753CD: tesseract::Tesseract::SegmentPage(STRING const*, BLOCK_LIST*, tesseract::Tesseract*, OSResults*) (pagesegmain.cpp:115)
==32503==    by 0x1363CD: tesseract::TessBaseAPI::FindLines() (baseapi.cpp:2291)
==32503==    by 0x130CF1: tesseract::TessBaseAPI::Recognize(ETEXT_DESC*) (baseapi.cpp:802)
==32503==    by 0x1322D3: tesseract::TessBaseAPI::ProcessPage(Pix*, int, char const*, char const*, int, tesseract::TessResultRenderer*) (baseapi.cpp:1176)
==32503==    by 0x131A84: tesseract::TessBaseAPI::ProcessPagesMultipageTiff(unsigned char const*, unsigned long, char const*, char const*, int, tesseract::TessResultRenderer*, int) (baseapi.cpp:1013)
==32503==    by 0x132052: tesseract::TessBaseAPI::ProcessPagesInternal(char const*, char const*, int, tesseract::TessResultRenderer*) (baseapi.cpp:1129)
==32503==    by 0x131B1E: tesseract::TessBaseAPI::ProcessPages(char const*, char const*, int, tesseract::TessResultRenderer*) (baseapi.cpp:1032)
==32503==    by 0x12E00C: main (tesseractmain.cpp:537)
==32503==  Uninitialised value was created by a stack allocation
==32503==    at 0x272A60: read_unlv_file(STRING, int, int, BLOCK_LIST*) (blread.cpp:41)

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-08 15:28:30 +02:00
zdenop
d47cebcdc8
Merge pull request #1641 from stweil/fix
training: Add missing linefeed to error message
2018-06-06 22:13:26 +02:00
Stefan Weil
0215d91f45 training: Add missing linefeed to error message
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-06 21:32:16 +02:00
zdenop
ee2ab73224
Merge pull request #1637 from paulk124/master
Reserve extra byte in LoadDataFromFile() in case caller wants to appe…
2018-06-05 16:57:40 +02:00
Paul Kitchen
805fb7699d Reserve extra byte in LoadDataFromFile() in case caller wants to append '\0' 2018-06-05 08:19:41 -06:00
Stefan Weil
52fddc3ca9 TFile: Relax assertion and allow FRead, FWrite with count == 0
The assertions introduced by commit 8bea6bcc12
were too strict. The first one failed in osd_test, the second one failed
in `tesseract IMAGE BASE --psm 13 lstm.train`.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-04 22:42:19 +02:00
Egor Pugin
83ae900549
Merge pull request #1629 from stweil/bool
src/training: Replace more proprietary BOOL8 by standard bool data type
2018-06-04 18:54:31 +03:00
Stefan Weil
4f3b266efe src/training: Replace more proprietary BOOL8 by standard bool data type
Update also callers of the modified functions to use
false / true instead of 0 / 1.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-04 16:08:03 +02:00
Stefan Weil
b292013bdc cntraining: Replace proprietary BOOL8 by standard bool data type
Add also "static" attribute to local functions and remove an old comment.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-04 16:08:03 +02:00
Stefan Weil
8bea6bcc12 TFile: Improve handling of potential integer overflow
Raise an assertion for unexpected arguments and use size_t instead of int
for the size argument which is typically sizeof(some_datatype).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-04 13:53:36 +02:00
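
The overflow concern above (a size multiplied by a count) can be checked before the multiplication. CheckedByteCount is a hypothetical helper, not the TFile code; it accepts count == 0, in line with the later commit 52fddc3ca9 that relaxes the assertions.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Computes size * count into *bytes. Returns false, leaving *bytes alone, if
// count is negative or the product does not fit in size_t.
bool CheckedByteCount(std::size_t size, int count, std::size_t* bytes) {
  assert(bytes != nullptr);
  if (count < 0) return false;
  const std::size_t n = static_cast<std::size_t>(count);
  if (size != 0 && n > std::numeric_limits<std::size_t>::max() / size) {
    return false;  // size * n would wrap around
  }
  *bytes = size * n;
  return true;
}
```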
Stefan Weil
f2698c256d src/training: Replace proprietary BOOL8 by standard bool data type
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-03 21:13:40 +02:00
Stefan Weil
629ded223c tesseractmain: Allow combinations of the different help options
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-02 09:03:56 +02:00
Stefan Weil
724a72a278 tesseractmain: Always use EXIT_SUCCESS and EXIT_FAILURE macros for exit status
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-02 09:03:56 +02:00
Stefan Weil
b5ac8502bc tesseractmain: EXIT_FAILURE if tesseract is called without arguments
When Tesseract is called without any argument, the help message is still
printed, but the exit status no longer indicates success (EXIT_SUCCESS).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-02 09:03:56 +02:00
Stefan Weil
6dba34dd8c tesseractmain: No command line options between image and outputbase
The image name and the outputbase should not be separated by
command line options.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-02 09:03:56 +02:00
zdenop
e313ed1bb9
Merge pull request #1614 from j-kubik/master
Recognition progress in C API
2018-06-02 08:54:21 +02:00
Stefan Weil
6f7206f574 tesseractmain: Remove unneeded duplicate code
The --list-langs option is already handled by other code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-01 20:45:53 +02:00
Stefan Weil
d4ed0f841a tesseractmain: Fail if bad command line option is given
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-01 20:04:35 +02:00
Jaroslaw Kubik
e6c9967b83 Fixed a typo in progress monitor C API
TessMonitorcDelete -> TessMonitorDelete
2018-06-01 19:42:28 +02:00
Jaroslaw Kubik
e254c9fa38 Return a value from default progress report function
The progress reporting function returns a boolean. The returned
value is never used by Tesseract and its meaning is not
documented, which renders the value meaningless. Still, lack of
return should not be permitted.
2018-05-30 10:44:13 +02:00
Jaroslaw Kubik
8f6242fd4f Fixed a typo in the C API progress monitor 2018-05-30 00:22:06 +02:00
Jaroslaw Kubik
87d33b6c9e Add progress monitoring C API
The C API is missing the ability to monitor the progress of the
recognition. This patch adds C wrappers to the progress monitor
that allow monitoring the progress and canceling the recognition
process early.
2018-05-29 23:26:41 +02:00
Jaroslaw Kubik
217e5e5881 Add a context-aware progress monitor pointer
The progress_callback field in the ETEXT_DESC monitor type does not
take any 'context' parameter, which may make implementing callback
functions difficult and may require use of global variables.
The new function receives the ETEXT_DESC pointer as an argument.
This makes it possible to share the cancel_this field as context
carrier if required.
The change is backwards-compatible: the old pointer remains as a
member of the class, and the default value for the new pointer is
a function calling the classic progress notifier. This way the code
unaware of the new member will continue to work as before.
2018-05-29 21:48:51 +02:00
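
The design described above (an old context-free callback kept for compatibility, a new callback that receives the monitor pointer, and a default that forwards to the old one) can be sketched as follows. Monitor is a hypothetical stand-in for ETEXT_DESC; progress_callback and cancel_this are named in the commit, while progress_callback2, the simplified one-argument signatures, and all function names are assumptions made for illustration.

```cpp
#include <cassert>

struct Monitor;  // hypothetical stand-in for the monitor type

using LegacyProgressFn = bool (*)(int progress);
using ContextProgressFn = bool (*)(Monitor* monitor, int progress);

// Default for the new pointer: forward to the legacy callback if one is set.
bool ForwardToLegacy(Monitor* monitor, int progress);

struct Monitor {
  LegacyProgressFn progress_callback = nullptr;            // old member, kept
  ContextProgressFn progress_callback2 = ForwardToLegacy;  // new member (assumed name)
  void* cancel_this = nullptr;                             // doubles as context
};

bool ForwardToLegacy(Monitor* monitor, int progress) {
  if (monitor->progress_callback != nullptr) {
    return monitor->progress_callback(progress);
  }
  return false;
}

// A context-aware callback: it finds its state through the monitor pointer,
// so no global variable is needed.
bool CountCalls(Monitor* monitor, int /*progress*/) {
  int* calls = static_cast<int*>(monitor->cancel_this);
  ++*calls;
  return false;
}

// What the reporting side would do: always go through the new pointer.
bool ReportProgress(Monitor* monitor, int progress) {
  assert(monitor != nullptr);
  return monitor->progress_callback2(monitor, progress);
}
```

Code that only sets progress_callback keeps working, because the default value of the new pointer forwards to it.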
Egor Pugin
0c50ae3a9c Fix windows build. 2018-05-29 19:15:01 +03:00
Stefan Weil
509a6f0ce0 Fix some typos (most found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-27 18:49:43 +02:00
Alexander
a2e72f258a
Remove unused variable 2018-05-26 22:20:45 +03:00
Alexander
69fbb52930
Merge branch 'master' into fix_smart_pointers 2018-05-26 00:37:07 +03:00
Egor Pugin
5a56d0c291
Merge pull request #1588 from ZaMaZaN4iK/fix_own_bool
Use standard bool instead of BOOL8.
2018-05-23 16:53:12 +03:00
Egor Pugin
78857cab8b
Merge pull request #1591 from ZaMaZaN4iK/fix_default_two
Use default keyword instead of empty ctors/dtors.
2018-05-22 21:59:48 +03:00
Alexander Zaitsev
58e8538138 Add more std::unique_ptr 2018-05-22 17:55:45 +03:00
Alexander Zaitsev
df49d470ca Use std::unique_ptr instead of manual memory management. 2018-05-22 14:36:37 +03:00
Stefan Weil
cdf035d9b1 Fix compilation with g++-5
Commit 0248c7ff9d replaced math.h by cmath.
Therefore isinf and isnan are no longer declared.
Replace them by their C++ 11 variant.

Signed-off-by: Stefan Weil <stweil@ub-blade-02.bib.uni-mannheim.de>
2018-05-21 22:09:48 +02:00
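
The replacement described above amounts to calling the C++11 functions from namespace std once cmath is included. A minimal sketch, with IsUsable as a hypothetical helper:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// With <cmath>, the classification functions are reached through namespace
// std; the unqualified isnan/isinf from <math.h> may not be declared.
bool IsUsable(double value) {
  return !std::isnan(value) && !std::isinf(value);
}
```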
Alexander Zaitsev
d14a7ca043 Use default keyword instead of empty ctors/dtors. Add more default. 2018-05-21 14:11:03 +03:00
Alexander Zaitsev
785b5e8134 Use default keyword instead of empty ctors/dtors. 2018-05-21 13:35:46 +03:00
Alexander Zaitsev
abca191293 Add missing file change. 2018-05-21 00:43:22 +03:00
Alexander Zaitsev
6ff0b56597 More fixes BOOL8 -> bool 2018-05-21 00:40:58 +03:00
Stefan Weil
fedae91482 ColPartition: Add missing initialisation for median_left, median_right
The following code caused a crash when Tesseract was compiled with -ftrapv:

1259	  int width = right - left;

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff665c231 in __GI_abort () at abort.c:79
#2  0x00007ffff69e34d8 in __subvsi3 () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#3  0x000055555560c1c5 in tesseract::ColPartitionGrid::FindVPartitionPartners (this=0x55555717e3c0, to_the_left=true, part=0x5555571fa380)
    at ../../../src/textord/colpartitiongrid.cpp:1259
#4  0x000055555560bda0 in tesseract::ColPartitionGrid::FindPartitionPartners (this=0x55555717e3c0) at ../../../src/textord/colpartitiongrid.cpp:1196
#5  0x00005555555f52b6 in tesseract::ColumnFinder::FindBlocks (this=0x55555717e280, pageseg_mode=tesseract::PSM_AUTO, scaled_color=0x0, scaled_factor=-1,
    input_block=0x555555f91390, photo_mask_pix=0x555555f73300, thresholds_pix=0x555555f76290, grey_pix=0x555555f762e0, pixa_debug=0x7ffff7fc88d8, blocks=0x7fffffffd250,
    diacritic_blobs=0x7fffffffd330, to_blocks=0x7fffffffd328) at ../../../src/textord/colfind.cpp:431
#6  0x00005555555c240d in tesseract::Tesseract::AutoPageSeg (this=0x7ffff7fa5010, pageseg_mode=tesseract::PSM_AUTO, blocks=0x555555f761d0, to_blocks=0x7fffffffd328,
    diacritic_blobs=0x7fffffffd330, osd_tess=0x0, osr=0x7fffffffd6d0) at ../../../src/ccmain/pagesegmain.cpp:229
#7  0x00005555555c1ffd in tesseract::Tesseract::SegmentPage (this=0x7ffff7fa5010, input_file=0x555555f7bd90, blocks=0x555555f761d0, osd_tess=0x0, osr=0x7fffffffd6d0)
    at ../../../src/ccmain/pagesegmain.cpp:141
#8  0x0000555555582540 in tesseract::TessBaseAPI::FindLines (this=0x555555a9a580 <main::api>) at ../../../src/api/baseapi.cpp:2291
#9  0x000055555557ce42 in tesseract::TessBaseAPI::Recognize (this=0x555555a9a580 <main::api>, monitor=0x0) at ../../../src/api/baseapi.cpp:802

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-20 23:39:18 +02:00
Alexander Zaitsev
6f580bad77 Add missed changes to bool fixes. 2018-05-20 23:04:03 +03:00
Alexander Zaitsev
a040bc2da5 Use standard bool instead of BOOL8. 2018-05-20 22:46:46 +03:00
Stefan Weil
bb34351fdb Remove remaining deprecated "register" keyword
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-20 20:49:08 +02:00
Alexander Zaitsev
6049225d01 Merge remote-tracking branch 'my_repo/small_fixes' into small_fixes 2018-05-20 18:48:30 +03:00
Alexander Zaitsev
d54d7486b4 Use std::max/std::min instead of MAX/MIN macros. 2018-05-20 17:49:48 +03:00
Alexander Zaitsev
14ae0b8727 Use std::max/std::min instead of MAX/MIN macros. 2018-05-20 16:18:07 +03:00
Alexander Zaitsev
c34e145b1a Use numeric_limits instead of INT_MAX. 2018-05-20 14:49:35 +03:00
Alexander Zaitsev
7d08e117d8 Added more const. 2018-05-20 14:21:07 +03:00
Alexander Zaitsev
e7e8e20119 Remove the 'register' keyword (deprecated in C++11, removed in C++17). 2018-05-20 01:49:26 +03:00
Alexander Zaitsev
a50b966af5 Added more const and one small fix. 2018-05-20 01:43:43 +03:00
Alexander Zaitsev
0697235bb2 Use using instead of typedef. Reason: https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rt-using 2018-05-20 01:31:03 +03:00
Alexander Zaitsev
0248c7ff9d Rename all C-style headers (e.g. <stdio.h>) to C++ style (<cstdio>). 2018-05-20 00:52:04 +03:00
Alexander Zaitsev
96f8f853c8 Small enhancements (adding const, etc.) 2018-05-19 23:07:28 +03:00
zdenop
6f4e195489
Merge pull request #1581 from Shreeshrii/patch-1
Copy .box and .tif files along with .lstmf files from /tmp
2018-05-19 13:17:49 +02:00
Robert Clayton
663b4d2d4b
Update Makefile.am 2018-05-19 01:43:31 -05:00
Robert Clayton
684e875612
Update Makefile.am 2018-05-19 01:42:57 -05:00
Stefan Weil
c8f8f6365c TabFind: Change order of initialization code
This fixes a compiler warning:

warning: ‘tesseract::TabFind::v_it_’ will be initialized after [-Wreorder]
warning:   ‘ICOORD tesseract::TabFind::image_origin_’ [-Wreorder]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-18 12:08:22 +02:00
Stefan Weil
6436a69677 BLOCK: Change order of initialization code
This fixes a compiler warning:

warning: ‘BLOCK::filename’ will be initialized after [-Wreorder]
warning:   ‘PDBLK BLOCK::pdblk’ [-Wreorder]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-18 12:08:10 +02:00
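The two -Wreorder fixes above rest on a C++ rule: members are initialized in the order they are declared in the class, not in the order they appear in the constructor's initializer list. A minimal sketch, assuming a hypothetical class `OrderSketch` that is not taken from the Tesseract sources:

```cpp
#include <cassert>

// Hypothetical class for illustration; not part of the Tesseract sources.
class OrderSketch {
 public:
  // Members are always initialized in declaration order (first_, then
  // second_), whatever order the initializer list uses. Writing the list in
  // declaration order silences -Wreorder and matches the real behavior.
  explicit OrderSketch(int value) : first_(value), second_(first_ + 1) {}

  int first() const { return first_; }
  int second() const { return second_; }

 private:
  int first_;   // declared first, so initialized first
  int second_;  // may safely depend on first_
};
```

If `second_` were declared before `first_`, the same initializer list would read `first_` before it was set, which is the kind of mistake the warning is meant to expose.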
Shreeshrii
6c08ec02e4
Copy .box and .tif files along with .lstmf files from /tmp 2018-05-17 22:45:22 +05:30
zdenop
45a6546324
Merge pull request #1569 from noahmetzger/winfix
Added downward compatibility for older APIs
2018-05-08 21:43:46 +02:00
Noah Metzger
43d47f3583 Added downward compatibility for older APIs
Commit effa574 (2017-01-20) added the bool parameter textonly to the constructor of TessPDFRenderer. To stay compatible with older API users that still pass only two parameters, textonly now has a default value.

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-05-08 17:22:06 +02:00
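The compatibility fix described above can be illustrated with a default argument. This is a minimal sketch under stated assumptions: `PdfRendererSketch` and its members are hypothetical names, not the real TessPDFRenderer declaration, and the default of `false` is assumed here for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical renderer class for illustration only; it is not the real
// TessPDFRenderer declaration.
class PdfRendererSketch {
 public:
  // The third parameter was added later. Giving it a default value keeps
  // existing two-argument call sites compiling unchanged.
  PdfRendererSketch(const std::string& outputbase, const std::string& datadir,
                    bool textonly = false)  // default value is an assumption
      : outputbase_(outputbase), datadir_(datadir), textonly_(textonly) {}

  bool textonly() const { return textonly_; }

 private:
  std::string outputbase_;
  std::string datadir_;
  bool textonly_;
};
```

Both `PdfRendererSketch("out", "data")` and `PdfRendererSketch("out", "data", true)` compile against this declaration. A default argument restores source compatibility only; callers built against the old two-parameter constructor still need to be recompiled.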
Stefan Weil
932a108b4d Revert "fixes #388 by using raw bytes utf8 encoding"
This reverts commit 941e1c4c84. It is no
longer needed since commit f54800f14b.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-07 06:06:42 +02:00
Stefan Weil
7cf7d62929 Fix CID 1390821 (Uninitialized variable)
It was introduced by my latest commit 21d5ce5717.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-04 17:57:37 +02:00
Stefan Weil
11609f9509 Fix CID 1386109 (Logically dead code)
The else statement is never executed.

Remove also an unused element from the names array
and add the "static" attribute.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-03 18:32:42 +02:00
zdenop
c3ed6f0360
Merge pull request #1556 from noahmetzger/winfix
Fixed CID 1164537 (possible division by zero)
2018-05-03 17:45:24 +02:00
Noah Metzger
2193f81702 Fixed CID 1164537 (possible division by zero)
If height_count stays zero, the maximum error calculation contains a division by zero.

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-05-03 14:55:41 +02:00
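The usual shape of such a fix is to check the divisor before dividing. A minimal sketch, assuming a hypothetical helper `MeanHeightSketch` and an assumed fallback value of 0.0; it is not the code from the commit:

```cpp
#include <cassert>

// Hypothetical helper for illustration; the name, the parameters other than
// height_count, and the fallback value are assumptions.
double MeanHeightSketch(double height_sum, int height_count) {
  // Guard the divisor: with no samples there is no meaningful quotient.
  if (height_count == 0) {
    return 0.0;
  }
  return height_sum / height_count;
}
```

Whether to return a fallback, skip the calculation, or report an error depends on the caller; the sketch picks a fallback only to keep the example short.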
Stefan Weil
c9b585cfc5 Don't disable compiler warnings for Visual Studio
It's still possible to set the warning level in the project settings,
but single source files should normally not disable compiler warnings.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-03 14:26:20 +02:00
zdenop
9ae97508ae
Merge pull request #1551 from stweil/bigendian
Fix Tesseract for big endian machines
2018-05-03 08:22:32 +02:00
Stefan Weil
dc3d28ccd7 Use more override specifiers
Now all methods which override Network methods use the override specifier.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-03 08:06:00 +02:00
Stefan Weil
21d5ce5717 Fix issue with big endian handling
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-03 07:19:59 +02:00
Stefan Weil
9c1fe092f1 Add assertion to detect wrong endianness handling
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-03 07:18:55 +02:00
Noah Metzger
a7d1402e5d Fixed access to uninitialized variable
Coverity ID 1386084: the set_font method accessed resolution_ before it was initialized by the set_resolution method.

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-05-02 16:11:35 +02:00
Stefan Weil
0efc528684 Remove unneeded include statements for string / strings.h
Tesseract code does not use strings.h (strngs.h was once called strings.h),
so that dependency can also be removed from cmake and cppan configuration.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-30 18:16:34 +02:00
Stefan Weil
950469e645 Remove old hack for Visual Studio
It should not be needed with newer versions of Visual Studio.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-30 15:21:04 +02:00
Sebastian Buchwald
4ac3063cbf Add missing override specifiers 2018-04-27 22:59:19 +02:00
Stefan Weil
fbeb55cd4e Fix CID 1164526 (Resource leak in object)
stream_ was allocated in the constructor,
but the destructor did not free it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-26 18:21:12 +02:00
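The leak pattern described above (allocation in the constructor, no release in the destructor) can be sketched as follows. `StreamSketch` and `OwnerSketch` are hypothetical types made up for this illustration, with a live-object counter added only so the effect is observable:

```cpp
#include <cassert>

// Hypothetical stream type that counts live instances.
struct StreamSketch {
  static int live;
  StreamSketch() { ++live; }
  ~StreamSketch() { --live; }
};
int StreamSketch::live = 0;

// Hypothetical owner class; not taken from the Tesseract sources.
class OwnerSketch {
 public:
  OwnerSketch() : stream_(new StreamSketch) {}
  // Without this delete, every OwnerSketch would leak its stream.
  ~OwnerSketch() { delete stream_; }

  // Copying would lead to a double delete, so it is disabled.
  OwnerSketch(const OwnerSketch&) = delete;
  OwnerSketch& operator=(const OwnerSketch&) = delete;

 private:
  StreamSketch* stream_;
};
```

Holding the member in a `std::unique_ptr` would give the same guarantee without a hand-written destructor; the raw pointer is kept here because it mirrors the situation the commit message describes.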
Stefan Weil
b87fc523ca Fix CID 1386084 (Uninitialized scalar variable)
The set_font method used the uninitialized member variable resolution_.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-26 18:02:43 +02:00
Stefan Weil
e2135de022 Fix CID 1385633 (Dereference before null check)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-26 17:18:15 +02:00
Stefan Weil
4f9493c409 Partial fix for autotools configuration after source tree reorganisation
This should fix "make" and "make training".

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-25 21:33:28 +02:00
Stefan Weil
dabf3c299f Fix file endings
Text files should end with a single LF, without additional empty lines.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-25 19:35:33 +02:00
Stefan Weil
9ceb0c6430 Fix line endings
Replace DOS line endings (CRLF) with standard line endings (LF only).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-25 19:04:50 +02:00
Egor Pugin
104fe7931c Move training to src. 2018-04-25 11:35:26 +03:00
Egor Pugin
e95ff1159e Move sources into src dir. Update build scripts. 2018-04-25 11:02:54 +03:00