28 Commits

Author SHA1 Message Date
Stefan Weil
d75ef80f12 Get sorted list of available languages
TessBaseAPI::GetAvailableLanguagesAsVector returned the list of languages
without sorting, so the order was arbitrary and not user-friendly.

Now `tesseract --list-langs` shows the available languages and scripts
in alphabetical order.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-22 14:07:03 +02:00
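The effect of this change can be sketched as follows. This is a minimal illustration, not the committed code: `SortedLanguages` is a hypothetical helper, and `std::vector<std::string>` is a stand-in for the container type the API actually uses.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not part of the Tesseract API): returns the
// language names in ascending order so that listings are reproducible.
std::vector<std::string> SortedLanguages(std::vector<std::string> langs) {
  // std::string comparison is byte-wise, so upper-case names sort
  // before lower-case ones.
  std::sort(langs.begin(), langs.end());
  return langs;
}
```

Without the sort, the order would simply be whatever order the entries were discovered in, which can differ between systems.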
Stefan Weil
e232114089 Fix use of undefined macro USE_DEVICE_SELECTION
This fixes compiler warnings.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-20 13:58:12 +02:00
Stefan Weil
bb181ec8d3 Rename API function from GetBestLSTMChoices to GetBestLSTMSymbolChoices
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-19 10:50:38 +02:00
Stefan Weil
df7d1e1f97 Rename API function for getting LSTM choices
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-19 10:50:38 +02:00
Noah Metzger
c13371d6e0 Renamed GetGlyphConfidences() to GetChoices() and glyph_confidences to lstm_choice_mode
Renamed the global attribute glyph_confidences to lstm_choice_mode and the method GetGlyphConfidences() to GetChoices(). All variables and comments contained in related methods were renamed as well.

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-10-17 16:43:39 +02:00
Stefan Weil
d86d520fd0 Remove tab character in source files
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-12 11:31:10 +02:00
Stefan Weil
67bf9062df Rework check for readable input file
This reverts commit 1a096441d0 and
implements an alternate check which allows input from stdin.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-06 22:33:02 +02:00
zdenop
a0564fd4ec Allow user to specify dpi for input image
2018-09-28 20:28:52 +02:00
zdenop
5fe1390748 remove alpha channel from png: issue #1914
2018-09-27 19:40:15 +02:00
Zdenko Podobný
5d22fdfeed replace deprecated C++ headers (reported by clang-tidy) - partially supersedes PR #1605
2018-09-18 18:51:11 +02:00
Stefan Weil
be1393b1e8 Replace macro MINGW by __MINGW32__
MINGW is no longer used and now removed from configure.ac.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-04 16:05:27 +02:00
Noah Metzger
663be426f6 Added the option for character-accumulated glyph confidences.
The parameter glyph_confidences is changed from bool to int.
With value 1, the hOCR output is enriched with glyph confidences for every
timestep, as before. With value 2, the timesteps are accumulated over the
recognized characters.

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-08-20 10:43:58 +02:00
Noah Metzger
91c7504a35 Added a feature to enrich the hOCR output with glyph confidences
By using the parameter -c glyph_confidences=true the user is able to enrich
the hOCR output with additional information. Tesseract then additionally lists,
for every timestep of the LSTM, all glyphs that were considered together with
their confidence.

The format of the hOCR output is slightly changed: There is now a line break
after every word for better readability by humans.

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-07-25 18:18:58 +02:00
Stefan Weil
55f0ca5842 Add missing include statements and clean some include statements
The changes are based on an analysis done with include-what-you-use.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-07 16:24:53 +02:00
Stefan Weil
d2febafdcd Fix compiler warnings [-Wmissing-prototypes]
Add missing include statements, add missing "static" qualifiers or
remove functions which are not used at all.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-05 16:03:02 +02:00
Stefan Weil
a74d467e90 Fix compiler warnings [-Wcomma]
clang warnings:

src/api/baseapi.cpp:1642:18: warning:
 possible misuse of comma operator here [-Wcomma]
src/api/baseapi.cpp:1642:31: warning:
 possible misuse of comma operator here [-Wcomma]
src/api/baseapi.cpp:1642:45: warning:
 possible misuse of comma operator here [-Wcomma]
src/api/baseapi.cpp:1652:16: warning:
 possible misuse of comma operator here [-Wcomma]
src/api/baseapi.cpp:1652:30: warning:
 possible misuse of comma operator here [-Wcomma]
src/api/baseapi.cpp:1662:17: warning:
 possible misuse of comma operator here [-Wcomma]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-05 12:07:04 +02:00
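For readers unfamiliar with the warning, the kind of construct that clang's -Wcomma reports, and the usual fix, can be sketched like this. The `Box` type and `MakeBox` function are hypothetical and do not reproduce the flagged lines of src/api/baseapi.cpp.

```cpp
// Hypothetical type used only for this illustration.
struct Box {
  int left;
  int top;
  int right;
};

// Before (flagged): several assignments chained into one statement with
// the comma operator, e.g.
//   box.left = x, box.top = y, box.right = x + w;
//
// After: one assignment per statement; same behaviour, no warning.
Box MakeBox(int x, int y, int w) {
  Box box;
  box.left = x;
  box.top = y;
  box.right = x + w;
  return box;
}
```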
Amit D
62c7b796da Merge branch 'master' into disable-legacy
2018-07-04 11:14:33 +03:00
amitdo
aa9f4b4861 Add an option to compile tesseract without the code of the legacy OCR engine
2018-07-03 18:49:42 +03:00
Stefan Weil
f7b61891bc Replace macro PI by macro M_PI
One definition for pi is sufficient.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-02 21:26:53 +02:00
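A minimal sketch of code that relies on M_PI alone, assuming a toolchain whose <cmath> provides it: M_PI is a POSIX extension rather than standard C++, and MSVC only exposes it when _USE_MATH_DEFINES is defined first. `DegreesToRadians` is a hypothetical helper, not a Tesseract function.

```cpp
// Needed on MSVC so that <cmath> defines M_PI; harmless elsewhere.
#define _USE_MATH_DEFINES
#include <cmath>

// Hypothetical helper: uses the library-provided constant instead of a
// project-specific PI macro.
double DegreesToRadians(double degrees) {
  return degrees * M_PI / 180.0;
}
```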
Stefan Weil
e8e94d372c Fix CID 1340287 (Unchecked return value)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-01 07:54:11 +02:00
Stefan Weil
a49b8f1d21 Fix CID 1297960 (Dereference after null check)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-01 07:54:11 +02:00
Stefan Weil
86eb4dfcdc Fix CID 1164646 (Uninitialized pointer field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-01 07:54:11 +02:00
Stefan Weil
a32d24fa65 Remove empty tessbox.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-24 19:45:12 +02:00
Stefan Weil
1371980f9f Replace string.h by standard C++ cstring
Remove the unneeded include statement in platform.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-21 20:40:26 +02:00
Stefan Weil
27a5908a55 Fix CID 1393239 (Dereference null return value)
Also add some error handling if fopen fails.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-20 21:17:02 +02:00
Stefan Weil
3292484f67 Test for correct locale settings
Normal C++ programs like those which are built for tesseract automatically
set the locale "C".

There can be different locale settings if the tesseract library is used
in other software.

A wrong locale can cause wrong results from sscanf, which is used in
several places in the tesseract code, so make sure that we have the
right locale settings and fail if that is not the case.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-08 17:40:10 +02:00
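A check of this kind could look like the sketch below. `LocaleIsC` is a hypothetical helper and the choice of categories is an assumption; the commit's own check may differ in detail.

```cpp
#include <clocale>
#include <cstring>

// Hypothetical helper: returns true if the locale categories that can
// influence sscanf-style number parsing are all set to "C".
bool LocaleIsC() {
  const int categories[] = {LC_ALL, LC_CTYPE, LC_NUMERIC};
  for (int category : categories) {
    // Passing nullptr only queries the current setting.
    const char* name = std::setlocale(category, nullptr);
    if (name == nullptr || std::strcmp(name, "C") != 0) {
      return false;
    }
  }
  return true;
}
```

A library entry point could call such a helper and refuse to continue when it returns false, since in some locales the decimal separator is a comma and sscanf would then misread values such as "1.5".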
Alexander Zaitsev
d54d7486b4 Use std::max/std::min instead of MAX/MIN macros.
2018-05-20 17:49:48 +03:00
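One common reason for preferring std::max/std::min over function-like macros is that a macro may evaluate an argument twice. The sketch below illustrates this. `MAX_MACRO` is a hypothetical stand-in for the kind of macro that was removed, not a copy of it, and the helper functions exist only for this illustration.

```cpp
#include <algorithm>

// Hypothetical stand-in for a classic MAX macro.
#define MAX_MACRO(a, b) (((a) > (b)) ? (a) : (b))

// Returns 5 and counts how often it has been called.
static int NextValue(int* calls) {
  ++*calls;
  return 5;
}

// The macro expands its first argument twice: once in the comparison
// and once more as the selected result.
int CallsWithMacro() {
  int calls = 0;
  int result = MAX_MACRO(NextValue(&calls), 3);
  (void)result;
  return calls;
}

// std::max is an ordinary function template, so each argument is
// evaluated exactly once.
int CallsWithStdMax() {
  int calls = 0;
  int result = std::max(NextValue(&calls), 3);
  (void)result;
  return calls;
}
```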
Egor Pugin
e95ff1159e Move sources into src dir. Update build scripts.
2018-04-25 11:02:54 +03:00