Commit Graph

1110 Commits

Author SHA1 Message Date
Zdenko Podobný
8daef71a83 added row attributes to hocr output 2016-02-05 11:20:01 +01:00
Hamid Safdari
0968896fc6 Correct minor syntax errors in language-specific.sh 2016-02-05 11:19:19 +01:00
Stefan Weil
f684a77529 Fix compiler warnings (remove unused constants)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:19:07 +01:00
Amit Dovev
3379973063 Update Makefile.am 2016-02-05 11:18:59 +01:00
Stefan Weil
369f5b472c Get tessdata prefix from executable path (only for Windows)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:18:48 +01:00
amitdo
fe5ee13229 Add missing ')' to make the code compile 2016-02-05 11:18:40 +01:00
amitdo
270214e667 If there is no explicit renderer(s), default to TessTextRenderer
Revert fd429c32, 43834da7, 05de195e.

See #49, #59.

The code in this commit solves the issue in a more elegant way, IMHO.

Now you can use:
  * `tesseract eurotext.tif eurotext txt pdf`
  * `tesseract eurotext.tif eurotext txt hocr`
  * `tesseract eurotext.tif eurotext txt hocr pdf`

NOTE:
  With `tesseract eurotext.tif eurotext`
  or `tesseract eurotext.tif eurotext txt`
  the psm will be set to '3', but...
  With `tesseract eurotext.tif eurotext txt pdf`
  or `tesseract eurotext.tif eurotext txt hocr`
  the psm will be set to '1'.
2016-02-05 11:18:34 +01:00
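Note on 270214e667: the fallback described above can be sketched roughly as follows. This is only an illustration under stated assumptions. `SelectRenderers` is a hypothetical helper, not the code in tesseractmain.cpp, and it does not model the page segmentation mode difference mentioned in the NOTE.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not the actual tesseractmain.cpp code): given the
// output config names found on the command line, return the renderers to run.
std::vector<std::string> SelectRenderers(
    const std::vector<std::string>& requested) {
  std::vector<std::string> renderers;
  for (const std::string& name : requested) {
    // Only these three names are treated as renderers in this sketch.
    if (name == "txt" || name == "hocr" || name == "pdf") {
      renderers.push_back(name);
    }
  }
  // No explicit renderer was requested: fall back to plain text output.
  if (renderers.empty()) {
    renderers.push_back("txt");
  }
  assert(!renderers.empty());
  return renderers;
}
```

With an empty request the sketch returns only `txt`, and with `txt pdf` it returns both names in the order given.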
Stefan Weil
d7b6c9655f Fix grammar in license file
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:17:30 +01:00
Stefan Weil
4a7cf319fc tesseractmain: Prettify help message
Commit 99110df757 improved the help text
in several aspects, but also introduced new inconsistencies which this
patch tries to fix.

* Align columns (this needed replacing tabs by spaces).
* Start explaining text with uppercase.
* Replace "the stdout" by "stdout".
* Small changes in help text for page segmentation modes.
* Split options in OCR options and single options
  (partially revert commit 99110df757).

In addition, whitespace characters at end of lines were removed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:17:08 +01:00
Nick White
8a08da7d35 Use shell quoting rather than pluses to separate font arguments in tesstrain.sh
The way tesstrain.sh handled font names was really weird, using '+'
signs as a delimiter. However quoting arguments is a much more
straightforward, standard and sensible way to do things.

So whereas previously one would have used this:
  --fontlist Times New Roman + Arial Black
Now they should be specified like this:
  --fontlist "Times New Roman" "Arial Black"
2016-02-05 11:16:56 +01:00
Nick White
796188072a Set default exposure settings for grc training 2016-02-05 11:16:45 +01:00
Nick White
af9c969818 Remove NUMBER_DAWG_FACTOR and WORD_DAWG_FACTOR from grc rules
These aren't used anywhere, and are difficult to calculate for grc,
so leave them as the default.
2016-02-05 11:16:38 +01:00
Nick White
890ec2876b Use different font list for grc training
This font list contains a selection of fonts produced by the Greek Font
Society <http://greekfontsociety.gr>, and is the result of testing
with a large corpus of a variety of scanned works.
2016-02-05 11:16:22 +01:00
Stefan Weil
b848caa151 Fix free of buffer which was not allocated
Coverity bug report: CID 1270420 "Free of address-of expression"

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:16:08 +01:00
Stefan Weil
613140a1ac pdfrenderer: Fix uninitialized local variables
Coverity bug reports:

CID 1270405: Uninitialized scalar variable
CID 1270408: Uninitialized scalar variable
CID 1270409: Uninitialized scalar variable
CID 1270410: Uninitialized scalar variable

Those variables are set conditionally in the while loop
and must keep their values in following iterations, so
they must be declared outside of the loop.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:15:54 +01:00
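Note on 613140a1ac: the reasoning above (a value set on some iterations must survive into later ones) can be illustrated with a small sketch. `CarryForward` and its convention that a negative entry means "no update" are hypothetical and unrelated to the real pdfrenderer code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: a negative entry in `updates` means "no new value",
// so the previous value has to be reused for that iteration.
std::vector<int> CarryForward(const std::vector<int>& updates, int initial) {
  std::vector<int> result;
  int current = initial;  // declared and initialized once, before the loop
  for (int update : updates) {
    if (update >= 0) {
      current = update;  // assigned only on some iterations
    }
    result.push_back(current);  // later iterations still see the last value
  }
  assert(result.size() == updates.size());
  return result;
}
```

Had `current` been declared inside the loop body without an initializer, every iteration that skips the assignment would read an indeterminate value, which is the kind of defect the Coverity reports describe.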
amitdo
d36ee9c4d0 tesseractmain.cpp: Split huge main() to sub functions
Add these functions to api/tesseractmain.cpp:
PrintVersionInfo()
PrintUsage()
PrintHelpForPSM()
PrintHelpMessage()
SetVariablesFromCLArgs()
PrintLangsList()
FixPageSegMode()
ParseArgs()
PreloadRenderers()
2016-02-05 11:15:38 +01:00
Stefan Weil
9bdaa0ad5a Fix duplicate fclose
Coverity bug report: CID 1270401 (#1 of 1): Use after free

As the comment (which was also fixed) says, ReadNextBox() already
calls fclose(box_file), so don't call it a 2nd time.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:15:16 +01:00
Stefan Weil
8c4b027292 tesseractmain: Fix unterminated string
Coverity bug report: CID 1270421 "Buffer not null terminated".

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:15:06 +01:00
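Note on 8c4b027292: "Buffer not null terminated" reports are commonly raised for `strncpy`, which leaves the destination unterminated when the source fills it completely. Whether the fixed code used `strncpy` is an assumption here. `CopyTerminated` is a hypothetical helper showing one usual remedy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies src into dst (capacity dst_size bytes) and
// always leaves dst null-terminated, truncating src if necessary.
void CopyTerminated(char* dst, std::size_t dst_size, const char* src) {
  assert(dst != nullptr && src != nullptr && dst_size > 0);
  // strncpy does not write a terminator when src has dst_size - 1 or more
  // characters, so reserve the last byte for it.
  std::strncpy(dst, src, dst_size - 1);
  dst[dst_size - 1] = '\0';
}
```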
Stefan Weil
af9212c459 ccmain: Remove unused private class member
This fixes a warning from clang.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:14:55 +01:00
Stefan Weil
56c2347e98 Remove checks for this == NULL
This fixes warnings from clang.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:14:41 +01:00
Stefan Weil
c6b758b11d Remove register attribute for local variables
This fixes clang compiler warnings like this one:

wordrec/gradechop.cpp:52:3: warning:
 'register' storage class specifier is deprecated [-Wdeprecated-register]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:14:29 +01:00
Stefan Weil
c8114811a5 Fix compiler warnings for copy constructors
gcc reports these warnings with -Wextra:

ccstruct/pageres.h:330:3: warning:
 base class 'class ELIST_LINK' should be explicitly initialized
 in the copy constructor [-Wextra]
ccstruct/ratngs.cpp:115:1: warning:
 base class 'class ELIST_LINK' should be explicitly initialized
 in the copy constructor [-Wextra]
ccstruct/ratngs.h:291:3: warning:
 base class 'class ELIST_LINK' should be explicitly initialized
 in the copy constructor [-Wextra]
ccutil/genericvector.h:435:3: warning:
 base class 'class GenericVector<WERD_RES*>' should be explicitly initialized
 in the copy constructor [-Wextra]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:13:51 +01:00
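Note on c8114811a5: gcc emits this warning when a user-written copy constructor does not name its base class in the initializer list. A minimal sketch of the pattern follows. `Link` and `Item` are hypothetical stand-ins for `ELIST_LINK` and a derived element, and which base constructor the actual commit calls is not shown in the message.

```cpp
#include <cassert>

// Hypothetical stand-in for an intrusive list link such as ELIST_LINK.
class Link {
 public:
  Link() : next_(nullptr) {}
  // A copied element starts outside any list, so the link is not copied.
  Link(const Link&) : next_(nullptr) {}
  Link* next() const { return next_; }
  void set_next(Link* n) { next_ = n; }

 private:
  Link* next_;
};

class Item : public Link {
 public:
  explicit Item(int value) : Link(), value_(value) {}
  // Naming the base class in the initializer list makes the choice explicit
  // and avoids the -Wextra warning.
  Item(const Item& other) : Link(other), value_(other.value_) {}
  int value() const { return value_; }

 private:
  int value_;
};

// Copies an item and checks that the copy is not linked into any list.
Item CopyDetached(const Item& source) {
  Item copy(source);
  assert(copy.next() == nullptr);
  return copy;
}
```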
Stefan Weil
18939a725f ccstruct: Fix compiler warning (disable buggy code)
gcc reports a potential bad array access:

ccstruct/mod128.cpp:98:20: warning:
 array subscript has type 'char' [-Wchar-subscripts]

dir is of type 'char'. Most compilers use signed char by default.
Then the value of dir is in the range -128 ... 127 and cannot be
used to access an array with 256 elements.

Don't fix that but disable the buggy code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:13:41 +01:00
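Note on 18939a725f: the commit disables the code rather than fixing it. For comparison, a common way to index a 256-entry table with a `char` is to convert it to `unsigned char` first. The helpers below are hypothetical and are not taken from ccstruct/mod128.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: maps any char value to an index in 0..255, whether
// plain char is signed or unsigned on the target platform.
std::size_t ByteIndex(char c) {
  return static_cast<unsigned char>(c);
}

// Looks up a 256-entry table by byte value without a negative subscript.
int LookupByByte(const std::vector<int>& table, char c) {
  assert(table.size() == 256);
  return table[ByteIndex(c)];
}
```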
Stefan Weil
cd946dc30d api: Fix printing of a size_t value
size_t is not always the same as long, especially not for 64 bit Windows:

api/pdfrenderer.cpp:549:31: warning:
 format '%ld' expects argument of type 'long int',
 but argument 4 has type 'size_t {aka long long unsigned int}' [-Wformat=]

size_t normally requires a format string "%zu", but this is unsupported
by Visual Studio, so use a type cast.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:13:34 +01:00
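Note on cd946dc30d: a cast makes the argument type match the format specifier on every platform. The sketch below casts to `unsigned long long` and uses `"%llu"`. That choice is an assumption for illustration, since the message does not say which type the commit casts to, and `FormatSize` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a size_t without relying on "%zu".
std::string FormatSize(std::size_t n) {
  char buffer[32];
  // The explicit cast guarantees the argument matches "%llu" even where
  // size_t is not the same type as unsigned long long.
  const int written = std::snprintf(buffer, sizeof(buffer), "%llu",
                                    static_cast<unsigned long long>(n));
  assert(written > 0 && written < static_cast<int>(sizeof(buffer)));
  return std::string(buffer);
}
```

Casting to a narrower type such as `long` would also silence the warning, but it could truncate large values on 64-bit Windows, where `long` is 32 bits wide.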
Stefan Weil
c0f4e86ef5 Fix case of include file name
Windows.h works on Windows, but not for cross builds on Linux hosts
with case sensitive file systems which only provide windows.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:13:23 +01:00
Stefan Weil
f7368ecb14 Don't use NULL for integer arguments
This fixes compiler warnings:

api/baseapi.cpp:1422:49: warning:
 passing NULL to non-pointer argument 6 of
 'int MultiByteToWideChar(UINT, DWORD, LPCCH, int, LPWSTR, int)'
 [-Wconversion-null]
api/baseapi.cpp:1427:54:
 warning: passing NULL to non-pointer argument 6 of
 'int WideCharToMultiByte(UINT, DWORD, LPCWCH, int, LPSTR, int, LPCCH, LPBOOL)'
 [-Wconversion-null]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:13:15 +01:00
Stefan Weil
1f4c8d0567 Remove unneeded const qualifiers
This fixes compiler warnings like this one:

api/baseapi.h:739:32: warning:
 type qualifiers ignored on function return type [-Wignored-qualifiers]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:13:00 +01:00
Stefan Weil
03a6e516ca viewer: Fix typos in comments
All of them were found by codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:12:46 +01:00
Stefan Weil
9cbda9238e training: Fix typos in comments and strings
All of them were found by codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:12:39 +01:00
Stefan Weil
9daf61f4d9 textord: Fix typos in comments and strings
All of them were found by codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:12:29 +01:00
Stefan Weil
40dc71676b testing: Fix typo in comment (found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:12:21 +01:00
Stefan Weil
02a071c593 opencl: Fix typos in comments and strings
All of them were found by codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:12:02 +01:00
Stefan Weil
32d179e0a6 Fix more typos in comments (found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:11:54 +01:00
Zdenko Podobný
1890ba5f2a autotools: fail if g++ or clang++ compiler is not found; Fixes #130 (commit 34f34ead) 2016-02-05 11:11:39 +01:00
Stefan Weil
7e9a7827c1 viewer: Fix format string
Variable port is an int, so "%d" is needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:03:10 +01:00
Stefan Weil
3de9dd91f5 cube: Use local variable which was reported as unused
The local variable first_lower was assigned a value which was not used.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:03:02 +01:00
Stefan Weil
7a14c0114f ccmain: Remove unused local variables
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:02:53 +01:00
Zdenko Podobný
90b5947b5f Detect presence of 'off_t' by configure test (partial cherry-pick from 87c21aaa5c) 2016-02-05 11:02:15 +01:00
Felix Janda
24fe797734 viewer/svutils.cpp: Include <sys/select.h> for FD_SET, ... 2016-02-05 10:58:40 +01:00
amitdo
0bb5a7d6f0 Added osd renderer for psm 0.
Works for single page and multi-page.
2016-02-05 10:58:29 +01:00
ws233
bceb532a2f Fix type mismatch on 64-bit platforms 2016-02-05 10:58:11 +01:00
amitdo
79ed9a30c7 OSD: Print script name instead of meaningless script id 2016-02-05 10:57:45 +01:00
John Slade
0fdfa98c0f training/unicharset_extractor.cpp: Print whether WCTYPE is included
Character properties are autogenerated only if wctype is found on the
system.  However, it is not possible to know if a version of
unicharset_extractor was compiled with this support (especially if it
was installed as a pre-compiled binary).

This commit adds a line to the usage output that states whether the binary
was compiled with wctype support.
2016-02-05 10:56:26 +01:00
John Slade
85c404e582 configure.ac: Detect wchar_t using wchar.h header
The wchar_t type is defined in `wchar.h` and if this header is not
included by autoconf the detection of the type will fail.  This type is
required by `unicharset_extractor` to autogenerate the character
properties.

This problem was detected when running under Fedora 21.
2016-02-05 10:56:18 +01:00
Pepe Bawagan
a153a51f39 adds sudo to "make install" command
for consistency with instructions that show up while installing
2016-02-05 10:55:52 +01:00
Stefan Weil
67c7d4a2cb wordrec: Fix typos in comments
All of them were found by codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 10:55:32 +01:00
Stefan Weil
bb2e239989 Java: Fix typos in comments and strings
All of them were found by codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 10:55:19 +01:00
Stefan Weil
4a85f4b6bd Doxyfile: Fix typo in comment (found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 10:55:11 +01:00
Stefan Weil
e9cf8cf95e dict: Fix typos in comments and strings
All of them were found by codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 10:55:03 +01:00
Stefan Weil
72ee298e4d cutil: Fix typos in comments
All of them were found by codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 10:54:51 +01:00