Commit Graph

56 Commits

Author SHA1 Message Date
Noah Metzger
91c7504a35 Added a feature to enrich the hOCR output with glyph confidences
By using the parameter -c glyph_confidences=true the user is able to enrich
the hOCR output with additional information. Tesseract then lists additionally
the timesteps with all glyphs that were considered with their confidence
for every timestep of the LSTM.

The format of the hOCR output is slightly changed: There is now a linebreak
after every word for better readability by humans.

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-07-25 18:18:58 +02:00
Stefan Weil
8bd9567355 Fix some comments
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-07 21:19:01 +02:00
Stefan Weil
609edd4600 Add missing include statement
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-07 21:19:00 +02:00
Stefan Weil
55f0ca5842 Add missing include statements and clean some include statements
The changes are based on an analysis done with include-what-you-use.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-07 16:24:53 +02:00
Egor Pugin
a078ce02bb
Merge pull request #1756 from stweil/cov
Fix two issues reported by Coverity Scan
2018-07-05 23:10:39 +03:00
Stefan Weil
036c72ca2f Fix CID 1164733 (Resource leak)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-05 22:07:36 +02:00
Stefan Weil
8d60a1849c Remove unused iterator
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-05 21:20:27 +02:00
Stefan Weil
d2febafdcd Fix compiler warnings [-Wmissing-prototypes]
Add missing include statements, add missing "static" qualifiers or
remove functions which are not used at all.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-05 16:03:02 +02:00
Stefan Weil
bdf09f40b1 Fix compiler warnings [-Wzero-as-null-pointer-constant]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-04 20:40:56 +02:00
Stefan Weil
081793ff48 Fix build with legacy engine disabled
Instead of defining the DISABLED_LEGACY_ENGINE macro in config_auto.h
(which is not included by all source files), define it as a preprocessor
option for those parts of the code which require it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-04 17:56:42 +02:00
Amit D
62c7b796da
Merge branch 'master' into disable-legacy 2018-07-04 11:14:33 +03:00
amitdo
aa9f4b4861 Add an option to compile tesseract without the code of the legacy OCR engine 2018-07-03 18:49:42 +03:00
Stefan Weil
6d170a15ec Replace tabs by blanks in source code
blobs.cpp had many tabs and was formatted with clang-format.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 16:29:14 +02:00
Stefan Weil
626a229cac Remove nwmain.h
The macro DECLARE_MAIN is not used by the current Tesseract code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 15:54:41 +02:00
Stefan Weil
872813245d Replace function DoError and remove danerror.cpp, danerror.h
This allows also removing all error trap macros.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 13:21:17 +02:00
Stefan Weil
2cd2d3200f Remove functions open_file, exists_file
cutil.cpp is now no longer needed and removed, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-03 06:45:34 +02:00
Stefan Weil
b57afc7c78 Replace Efopen by fopen and remove efio.cpp, efio.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 17:46:28 +02:00
Stefan Weil
abbd78a053 Fix CID 1340271, 1340272, 1340273, 1340274 (Use after free)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-01 20:18:39 +02:00
Stefan Weil
52b44c5ebf Fix CID 1164530 (Logically dead code)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-01 20:01:56 +02:00
Stefan Weil
09da044a77 Fix CID 1164553 (Division or modulo by float zero)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-01 19:27:01 +02:00
Stefan Weil
9bb5a87760 Remove stderr.h and its include statements
MEMORY_OUT is no longer used.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-25 16:14:20 +02:00
Stefan Weil
b282d2cb16 adaptions: Remove unneeded include statement
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-25 14:28:04 +02:00
Stefan Weil
7768f9b336 Clean more include files and include statements
The changes are based on an analysis done with include-what-you-use.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-24 19:45:12 +02:00
Stefan Weil
a32d24fa65 Remove empty tessbox.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-24 19:45:12 +02:00
Stefan Weil
1a151781ea Clean some include statements
The changes are based on an analysis done with include-what-you-use.

Replace also some standard header files by the corresponding
standard C++ header files.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-23 21:15:54 +02:00
Stefan Weil
484a1be98a Remove unneeded include statements for scanutils.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-22 19:16:08 +02:00
Stefan Weil
1371980f9f Replace string.h by standard C++ cstring
Remove the unneeded include statement in platform.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 20:40:26 +02:00
Stefan Weil
112aeb9826 Clean usage of assert.h
Remove unneeded include statements, remove conditional statements and
replace the remaining assert.h by their standard C++ variant cassert.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 19:31:05 +02:00
Stefan Weil
a9e2574eff Remove public API file ndminx.h
It is not needed for the Tesseract code, and the Tesseract API
should not provide MIN / MAX macros.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 08:33:30 +02:00
Stefan Weil
09976e6125 Fix CID 1393238 (Dereference null return value)
Add also some error handling if fopen fails.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 21:17:02 +02:00
Stefan Weil
e87e8967d7 Remove more header files from public API
Install only those headers which are needed by third party applications.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 11:54:38 +02:00
Stefan Weil
c1c87d73ee Require tesseract/ for API header files (fixes potential name conflicts)
The tesseract/ subdirectory is no longer automatically added to the
include path of the compiler. Therefore old code which used code like

    #include "capi.h"

must now change that to

    #include "tesseract/capi.h"

This avoids name conflicts with header files from other projects.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-17 22:01:19 +02:00
Amit D
6f85de22bc
WordFontAttributes: Check that word != nullptr earlier. Fix #1665 2018-06-13 23:38:27 +03:00
Jaroslaw Kubik
217e5e5881 Add a context-aware progress monitor pointer
The progress_callback field in the ETEXT_DESC monitor type does not
take any 'context' parameter, which may make implementing callback
functions difficult and may require use of global variables.
The new function receives the ETEXT_DESC pointer as an argument.
This makes it possible to share the cancel_this field as context
carrier if required.
The change is backwards-compatible: the old pointer remains as a
member of the class, and the default value for the new pointer is
a function calling the classic progress notifier. This way the code
unaware of the new member will continue to work as before.
2018-05-29 21:48:51 +02:00
Stefan Weil
509a6f0ce0 Fix some typos (most found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-27 18:49:43 +02:00
Alexander
69fbb52930
Merge branch 'master' into fix_smart_pointers 2018-05-26 00:37:07 +03:00
Egor Pugin
5a56d0c291
Merge pull request #1588 from ZaMaZaN4iK/fix_own_bool
Use standard bool instead of BOOL8.
2018-05-23 16:53:12 +03:00
Alexander Zaitsev
df49d470ca Use std::unique_ptr instead of manual memory management. 2018-05-22 14:36:37 +03:00
Alexander Zaitsev
d14a7ca043 Use default keyword instead of empty ctors/dtors. Add more default. 2018-05-21 14:11:03 +03:00
Alexander Zaitsev
785b5e8134 Use default keyword instead of empty ctors/dtors. 2018-05-21 13:35:46 +03:00
Alexander Zaitsev
abca191293 Add missing file change. 2018-05-21 00:43:22 +03:00
Alexander Zaitsev
6ff0b56597 More fixes BOOL8 -> bool 2018-05-21 00:40:58 +03:00
Alexander Zaitsev
6f580bad77 Add miss changes to bool fixes. 2018-05-20 23:04:03 +03:00
Alexander Zaitsev
a040bc2da5 Use standard bool instead of BOOL8. 2018-05-20 22:46:46 +03:00
Alexander Zaitsev
d54d7486b4 Use std::max/std::min instead of MAX/MIN macros. 2018-05-20 17:49:48 +03:00
Alexander Zaitsev
c34e145b1a Use numeric_limits instead of INT_MAX. 2018-05-20 14:49:35 +03:00
Alexander Zaitsev
7d08e117d8 Added more const. 2018-05-20 14:21:07 +03:00
Alexander Zaitsev
0697235bb2 Use using instead of typedef. Reason: https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rt-using 2018-05-20 01:31:03 +03:00
Alexander Zaitsev
0248c7ff9d Rename all C-style headers (e.g. <stdio.h>) to C++ style (<cstdio>). 2018-05-20 00:52:04 +03:00
Stefan Weil
c9b585cfc5 Don't disable compiler warnings for Visual Studio
It's still possible to set the warning level in the project settings,
but single source files should normally not disable compiler warnings.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-05-03 14:26:20 +02:00