1126 Commits

Author SHA1 Message Date
Tom Morris
c3ad0de69b Fix varsize array for Microsoft compiler 2016-02-18 09:05:37 +01:00
Amit Dovev
473c30fa87 Update README.md 2016-02-18 09:05:04 +01:00
Zdenko Podobný
255c31fe31 update release date 2016-02-16 22:27:01 +01:00
Tom Morris
134ebc3df3 INCOMPATIBLE fix to hOCR line height information - fixes #225.
This fixes the duplicate line IDs caused by inserting height information
into the middle of the ID and it moves the line height info into
the title attribute like everything else, rather than using non-standard
HTML attributes (which won't validate).

This change may break consumers of the HTML output, but 3.04 has only
been in the wild for 6 months and the current HTML is invalid, so I
believe the benefit outweighs the cost for the fix.
2016-02-16 22:26:12 +01:00
Zdenko Podobný
8473e5a262 update autotools files 2016-02-13 00:06:11 +01:00
Zdenko Podobný
56c1a4a21f add option "make training-uninstall" 2016-02-13 00:04:59 +01:00
Zdenko Podobný
ebadb00e4d fix version number => 3.04.01 2016-02-12 23:28:40 +01:00
Amit Dovev
8bdfb5bfc4 Update README.md 2016-02-12 09:59:57 +01:00
James R. Barlow
9cec609bdf Replace pdf.ttf with sharp2.ttf, keep name the same
As discussed at length in issue #182, the existing pdf.ttf causes difficulties
for certain PDF viewers, in part because the old file had zero advance width.

With testing, sharp2.ttf seems to be the best available compromise, although
it's not perfect and causes some visual difficulties in Evince.  It does
seem to fix Kindle and OS X Preview.
2016-02-12 09:53:35 +01:00
Zdenko Podobný
0d791d6de7 Fix cygwin build (partial fix from da3852d) 2016-02-05 11:32:37 +01:00
Dennis Schridde
3003411a91 Compatibility with Leptonica 1.73
http://www.leptonica.org/source/version-notes.html:
       Naming changes (to avoid collisions):
         #defines MALLOC --> LEPT_MALLOC, CALLOC --> LEPT_CALLOC, etc.
         ByteBuffer --> L_ByteBuffer

Introduction of the TESSERACT_LIBLEPT_PREREQ macro allows backward compatibility with Leptonica <1.73.
2016-02-05 11:21:55 +01:00
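A rough sketch of how a version-prerequisite macro of the kind named in 3003411a91 can gate renamed identifiers. This is not the definition from the commit: every DEMO_* name and the hard-coded version numbers are invented for illustration, and a real build would take the version from the library's own headers.

```cpp
#include <string>

// Hypothetical version numbers standing in for the ones a library header
// would define; all DEMO_* names are invented for this sketch.
#define DEMO_LEPT_MAJOR 1
#define DEMO_LEPT_MINOR 73

// True when the library version is at least maj.min.
#define DEMO_LEPT_PREREQ(maj, min) \
  (DEMO_LEPT_MAJOR > (maj) ||      \
   (DEMO_LEPT_MAJOR == (maj) && DEMO_LEPT_MINOR >= (min)))

// Report which allocator macro name matches the library version.
std::string AllocMacroName() {
#if DEMO_LEPT_PREREQ(1, 73)
  return "LEPT_MALLOC";  // prefixed names, 1.73 and later
#else
  return "MALLOC";       // unprefixed names, before 1.73
#endif
}
```

Because the test is resolved by the preprocessor, only the branch matching the installed version is compiled, so one source tree can build against both old and new headers.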
amitdo
337a9b52c4 Fix #64. Make box training work
This commit is better than 06fc0533c. Hopefully, this is the last fix to the box training issue.
2016-02-05 11:21:42 +01:00
Ryan Baumann
7ffc77b3a2 Add Junicode to neo-Latin fonts 2016-02-05 11:21:20 +01:00
Ryan Baumann
52d9ed4d69 Use different font list and exposures for "lat" language training 2016-02-05 11:21:11 +01:00
amitdo
b60bb806bf Fix #184. Training should work now 2016-02-05 11:20:57 +01:00
Stefan Weil
ac6b17e918 Remove unneeded definition for NULL
NULL is already defined in stddef.h,
so a local definition is not needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:20:36 +01:00
Zdenko Podobný
8daef71a83 added row attributes to hocr output 2016-02-05 11:20:01 +01:00
Hamid Safdari
0968896fc6 correct minor syntax errors language-specific.sh 2016-02-05 11:19:19 +01:00
Stefan Weil
f684a77529 Fix compiler warnings (remove unused constants)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:19:07 +01:00
Amit Dovev
3379973063 Update Makefile.am 2016-02-05 11:18:59 +01:00
Stefan Weil
369f5b472c Get tessdata prefix from executable path (only for Windows)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:18:48 +01:00
amitdo
fe5ee13229 Add missing ')' to make the code compile 2016-02-05 11:18:40 +01:00
amitdo
270214e667 If there is no explicit renderer(s), default to TessTextRenderer
Revert fd429c32, 43834da7, 05de195e.

See #49, #59.

The code in this commit solves the issue in a more elegant way, IMHO.

Now you can use:
  * `tesseract eurotext.tif eurotext txt pdf`
  * `tesseract eurotext.tif eurotext txt hocr`
  * `tesseract eurotext.tif eurotext txt hocr pdf`

NOTE:
  With `tesseract eurotext.tif eurotext`
  or `tesseract eurotext.tif eurotext txt`
  the psm will be set to '3', but...
  With `tesseract eurotext.tif eurotext txt pdf`
  or `tesseract eurotext.tif eurotext txt hocr`
  the psm will be set to '1'.
2016-02-05 11:18:34 +01:00
Stefan Weil
d7b6c9655f Fix grammar in license file
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:17:30 +01:00
Stefan Weil
4a7cf319fc tesseractmain: Prettify help message
Commit 99110df757 improved the help text
in several aspects, but also introduced new inconsistencies which this
patch tries to fix.

* Align columns (this needed replacing tabs by spaces).
* Start explaining text with uppercase.
* Replace "the stdout" by "stdout".
* Small changes in help text for page segmentation modes.
* Split options into OCR options and single options
  (partially revert commit 99110df757).

In addition, whitespace characters at end of lines were removed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:17:08 +01:00
Nick White
8a08da7d35 Use shell quoting rather than pluses to separate font arguments in tesstrain.sh
The way tesstrain.sh handled font names was really weird, using '+'
signs as a delimiter. However quoting arguments is a much more
straightforward, standard and sensible way to do things.

So whereas previously one would have used this:
  --fontlist Times New Roman + Arial Black
Now they should be specified like this:
  --fontlist "Times New Roman" "Arial Black"
2016-02-05 11:16:56 +01:00
Nick White
796188072a Set default exposure settings for grc training 2016-02-05 11:16:45 +01:00
Nick White
af9c969818 Remove NUMBER_DAWG_FACTOR and WORD_DAWG_FACTOR from grc rules
These aren't used anywhere, and are difficult to calculate for grc,
so leave them as the default.
2016-02-05 11:16:38 +01:00
Nick White
890ec2876b Use different font list for grc training
This font list contains a selection of fonts produced by the Greek Font
Society <http://greekfontsociety.gr>, and is the result of testing
with a large corpus of a variety of scanned works.
2016-02-05 11:16:22 +01:00
Stefan Weil
b848caa151 Fix free of buffer which was not allocated
Coverity bug report: CID 1270420 "Free of address-of expression"

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:16:08 +01:00
Stefan Weil
613140a1ac pdfrenderer: Fix uninitialized local variables
Coverity bug reports:

CID 1270405: Uninitialized scalar variable
CID 1270408: Uninitialized scalar variable
CID 1270409: Uninitialized scalar variable
CID 1270410: Uninitialized scalar variable

Those variables are set conditionally in the while loop
and must keep their values in following iterations, so
they must be declared outside of the loop.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:15:54 +01:00
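The reasoning in 613140a1ac (a value set conditionally inside a loop must live outside it to survive into later iterations) can be illustrated with a small stand-alone sketch. The function and the "size=N" token format are hypothetical and unrelated to the actual pdfrenderer code.

```cpp
#include <string>
#include <vector>

// Hypothetical illustration: each token either sets a new current size
// ("size=N") or is a word that is recorded with the size set earlier.
std::vector<int> SizesForWords(const std::vector<std::string>& tokens) {
  std::vector<int> sizes;
  // Declared and initialized outside the loop: it is assigned only on some
  // iterations and must keep its value for the iterations that follow.
  int current_size = 0;
  for (const std::string& token : tokens) {
    if (token.compare(0, 5, "size=") == 0) {
      current_size = std::stoi(token.substr(5));
    } else {
      sizes.push_back(current_size);
    }
  }
  return sizes;
}
```

Had current_size been declared inside the loop body without an initializer, every word token would read an indeterminate value, which is the class of defect the Coverity reports describe.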
amitdo
d36ee9c4d0 tesseractmain.cpp: Split huge main() to sub functions
Add these functions to api/tesseractmain.cpp:
PrintVersionInfo()
PrintUsage()
PrintHelpForPSM()
PrintHelpMessage()
SetVariablesFromCLArgs()
PrintLangsList()
FixPageSegMode()
ParseArgs()
PreloadRenderers()
2016-02-05 11:15:38 +01:00
Stefan Weil
9bdaa0ad5a Fix duplicate fclose
Coverity bug report: CID 1270401 (#1 of 1): Use after free

As the comment (which was also fixed) says, ReadNextBox() already
calls fclose(box_file), so don't call it a 2nd time.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:15:16 +01:00
Stefan Weil
8c4b027292 tesseractmain: Fix unterminated string
Coverity bug report: CID 1270421 "Buffer not null terminated".

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:15:06 +01:00
Stefan Weil
af9212c459 ccmain: Remove unused private class member
This fixes a warning from clang.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:14:55 +01:00
Stefan Weil
56c2347e98 Remove checks for this == NULL
This fixes warnings from clang.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:14:41 +01:00
Stefan Weil
c6b758b11d Remove register attribute for local variables
This fixes clang compiler warnings like this one:

wordrec/gradechop.cpp:52:3: warning:
 'register' storage class specifier is deprecated [-Wdeprecated-register]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:14:29 +01:00
Stefan Weil
c8114811a5 Fix compiler warnings for copy constructors
gcc reports these warnings with -Wextra:

ccstruct/pageres.h:330:3: warning:
 base class 'class ELIST_LINK' should be explicitly initialized
 in the copy constructor [-Wextra]
ccstruct/ratngs.cpp:115:1: warning:
 base class 'class ELIST_LINK' should be explicitly initialized
 in the copy constructor [-Wextra]
ccstruct/ratngs.h:291:3: warning:
 base class 'class ELIST_LINK' should be explicitly initialized
 in the copy constructor [-Wextra]
ccutil/genericvector.h:435:3: warning:
 base class 'class GenericVector<WERD_RES*>' should be explicitly initialized
 in the copy constructor [-Wextra]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:13:51 +01:00
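A minimal sketch of the pattern behind the -Wextra warnings quoted in c8114811a5: a derived class's copy constructor names its base class in the initializer list. Link and Item are invented stand-ins, not the ELIST_LINK or WERD_RES classes, and the real patch may initialize the base differently.

```cpp
// Hypothetical intrusive-list base class: a copy never shares the
// original's list linkage.
class Link {
 public:
  Link() : next_(nullptr) {}
  Link(const Link&) : next_(nullptr) {}
  const Link* next() const { return next_; }

 private:
  Link* next_;
};

class Item : public Link {
 public:
  explicit Item(int value) : Link(), value_(value) {}
  // The base class is initialized explicitly, so the compiler has nothing
  // to warn about and the intent is visible to the reader.
  Item(const Item& other) : Link(other), value_(other.value_) {}
  int value() const { return value_; }

 private:
  int value_;
};
```

Without the explicit `Link(other)`, a user-written copy constructor default-constructs the base, which is easy to overlook and is what the warning points at.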
Stefan Weil
18939a725f ccstruct: Fix compiler warning (disable buggy code)
gcc reports a potential bad array access:

ccstruct/mod128.cpp:98:20: warning:
 array subscript has type 'char' [-Wchar-subscripts]

dir is of type 'char'. Most compilers use signed char by default.
Then the value of dir is in the range -128 ... 127 and cannot be
used to access an array with 256 elements.

Don't fix that but disable the buggy code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:13:41 +01:00
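For context on the -Wchar-subscripts warning in 18939a725f: the commit itself only disables the affected code, but the usual remedy for indexing a 256-entry table with a plain char looks roughly like the sketch below. LookupByByte is a hypothetical helper, not a function from mod128.cpp.

```cpp
// Hypothetical helper: index a 256-entry table by a byte held in a char.
int LookupByByte(const int (&table)[256], char c) {
  // Plain char may be signed (range -128..127). Converting through
  // unsigned char yields an index in 0..255 on every platform.
  return table[static_cast<unsigned char>(c)];
}
```

With a signed char, a byte value such as 200 would otherwise become a negative index and read outside the array.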
Stefan Weil
cd946dc30d api: Fix printing of a size_t value
size_t is not always the same as long, especially not for 64 bit Windows:

api/pdfrenderer.cpp:549:31: warning:
 format '%ld' expects argument of type 'long int',
 but argument 4 has type 'size_t {aka long long unsigned int}' [-Wformat=]

size_t normally requires a format string "%zu", but this is unsupported
by Visual Studio, so use a type cast.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:13:34 +01:00
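A minimal sketch of the cast-based workaround described in cd946dc30d. The commit message does not say which type the value is cast to; unsigned long with "%lu" is an assumption here, and FormatOffset is a hypothetical helper rather than code from pdfrenderer.cpp.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format a size_t without relying on "%zu".
std::string FormatOffset(size_t offset) {
  char buf[32];
  // The cast makes the argument type match the conversion specifier on
  // every platform, which silences -Wformat.
  std::snprintf(buf, sizeof(buf), "%lu", static_cast<unsigned long>(offset));
  return buf;
}
```

One trade-off of this particular choice: unsigned long is 32 bits wide on 64-bit Windows, so values above 4 GiB would be truncated by the cast.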
Stefan Weil
c0f4e86ef5 Fix case of include file name
Windows.h works on Windows, but not for cross builds on Linux hosts
with case sensitive file systems which only provide windows.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:13:23 +01:00
Stefan Weil
f7368ecb14 Don't use NULL for integer arguments
This fixes compiler warnings:

api/baseapi.cpp:1422:49: warning:
 passing NULL to non-pointer argument 6 of
 'int MultiByteToWideChar(UINT, DWORD, LPCCH, int, LPWSTR, int)'
 [-Wconversion-null]
api/baseapi.cpp:1427:54:
 warning: passing NULL to non-pointer argument 6 of
 'int WideCharToMultiByte(UINT, DWORD, LPCWCH, int, LPSTR, int, LPCCH, LPBOOL)'
 [-Wconversion-null]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:13:15 +01:00
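The warnings in f7368ecb14 come from passing NULL where an int is expected. The sketch below shows the distinction with a hypothetical stand-in function that mimics a "pass size 0 to query the required length" convention. It is not the Windows API and not the code from baseapi.cpp.

```cpp
#include <cstring>

// Hypothetical stand-in for an API that reports the required buffer size
// when called with dst_size == 0.
int CopyOrMeasure(const char* src, char* dst, int dst_size) {
  const int needed = static_cast<int>(std::strlen(src)) + 1;
  if (dst_size == 0) return needed;  // query mode, dst is not touched
  if (dst_size < needed) return 0;   // buffer too small
  std::memcpy(dst, src, needed);
  return needed;
}

int RequiredSize(const char* src) {
  // The pointer parameter gets a null pointer; the integer parameter gets
  // a plain 0 rather than NULL, which avoids -Wconversion-null.
  return CopyOrMeasure(src, nullptr, 0);
}
```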
Stefan Weil
1f4c8d0567 Remove unneeded const qualifiers
This fixes compiler warnings like this one:

api/baseapi.h:739:32: warning:
 type qualifiers ignored on function return type [-Wignored-qualifiers]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:13:00 +01:00
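A small sketch of the kind of declaration that triggers the -Wignored-qualifiers warning quoted in 1f4c8d0567, and its fix. Box is an invented class, not one from baseapi.h.

```cpp
// Hypothetical class illustrating a by-value scalar return type.
class Box {
 public:
  explicit Box(int width) : width_(width) {}
  // A declaration such as `const int width() const;` draws the warning:
  // the leading const on a scalar returned by value has no effect.
  // The trailing const, which makes the member callable on const objects,
  // is meaningful and stays.
  int width() const { return width_; }

 private:
  int width_;
};
```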
Stefan Weil
03a6e516ca viewer: Fix typos in comments
All of them were found by codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:12:46 +01:00
Stefan Weil
9cbda9238e training: Fix typos in comments and strings
All of them were found by codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:12:39 +01:00
Stefan Weil
9daf61f4d9 textord: Fix typos in comments and strings
All of them were found by codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:12:29 +01:00
Stefan Weil
40dc71676b testing: Fix typo in comment (found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:12:21 +01:00
Stefan Weil
02a071c593 opencl: Fix typos in comments and strings
All of them were found by codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:12:02 +01:00
Stefan Weil
32d179e0a6 Fix more typos in comments (found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:11:54 +01:00
Zdenko Podobný
1890ba5f2a autotools: fail if g++ or clang++ compiler is not found; Fixes #130 (commit 34f34ead) 2016-02-05 11:11:39 +01:00