1143 Commits

Author SHA1 Message Date
Stefan Weil
3bad589431 Add missing argument for tprintf
The format string expects 3 int arguments.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-08-21 20:32:05 +02:00
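The bug class fixed by this commit is a printf-style call whose format string names more conversions than there are arguments. A minimal sketch follows, assuming `std::snprintf` as a stand-in for `tprintf` (whose real signature is not shown in this log); the function name `FormatCounts` and its parameters are hypothetical.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not Tesseract code: std::snprintf stands in for tprintf.
// The format string contains three %d conversions, so exactly three int
// arguments must follow it. Omitting one is undefined behavior.
std::string FormatCounts(int page, int line, int word) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "page %d, line %d, word %d", page, line, word);
  return std::string(buf);
}
```

GCC and Clang can flag this kind of mismatch at compile time via `-Wformat` (enabled by `-Wall`) when the format string is a literal.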
Stefan Weil
2b8f137c8c Print version to stdout instead of stderr
Most command line programs print the version to stdout.
This seems to be reasonable for Tesseract, too.

Now a shell statement like "VERSION=$(tesseract --version)" works
without I/O redirection.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-08-21 20:31:46 +02:00
Edward Carroll
47bb8141ec Fix other instance of VS2015 compiler problem
As with 0c492cb: in VC14 the snprintf function is provided by the standard library, so defining it as a macro triggers the error "snprintf: Do not define snprintf as a macro. Macro definition of snprintf conflicts with Standard Library function declaration".
2016-08-21 20:31:29 +02:00
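One common way to handle this compiler difference is to guard the macro by compiler version. The sketch below is an assumption about the general technique, not the code from this commit: it maps `snprintf` to `_snprintf` only for Microsoft compilers older than VC14 (`_MSC_VER < 1900`). `FormatVersion` is a hypothetical helper used only to exercise the call.

```cpp
#include <stdio.h>
#include <string>

// Assumption: Microsoft compilers before VC14 (_MSC_VER < 1900) lack
// snprintf, so a macro maps it to _snprintf. VC14 and later provide a real
// snprintf and reject a macro of that name, so the macro must be guarded.
#if defined(_MSC_VER) && _MSC_VER < 1900
#define snprintf _snprintf
#endif

// Hypothetical helper that uses snprintf the same way on every compiler.
std::string FormatVersion(int major, int minor) {
  char buf[32];
  snprintf(buf, sizeof(buf), "%d.%02d", major, minor);
  return std::string(buf);
}
```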
Zdenko Podobný
feeec2232f update ChangeLog;
remove ReleaseNotes (the relevant information is in the ChangeLog file and there are release notes on the online wiki)
2016-08-21 20:31:12 +02:00
Zdenko Podobný
07e8022b72 check for pdf support in leptonica 2016-08-21 20:30:57 +02:00
Philip Rinn
f00ff67c17 Fix ABI break introduced in 3.04.00, fixes #254 2016-03-08 17:37:00 +01:00
amitdo
6f4dca803f Don't display tesseract's banner when quiet mode is active 2016-03-08 17:36:48 +01:00
Zdenko Podobný
65a42bccb3 update Release Notes (fixes #250) 2016-03-08 17:36:33 +01:00
Zdenko Podobný
285c3fba6a solve segfault for box.train; fixes #57 2016-03-08 17:36:22 +01:00
Zdenko Podobný
6385577b4c improve tesseract.pc.in - fixes #241 2016-03-08 17:36:09 +01:00
Zdenko Podobný
52e60a320e move new&delete histogramAllChannels inside the #ifdef USE_OPENCL; fixes #248 2016-03-08 17:35:48 +01:00
Tom Morris
2eec8e8747 Document hocr_font_info in config 2016-03-08 17:32:14 +01:00
Stefan Weil
a8e086d58b Fix compiler warning (signed / unsigned mismatch)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-18 09:08:00 +01:00
Clemens Eisserer
8c1c20e008 Initialize output parameters of word_char_quality() to zero before early exit 2016-02-18 09:07:49 +01:00
Zdenko Podobný
e4711bfcd5 increase version number in 3.04 branch 2016-02-18 09:07:34 +01:00
Tom Morris
4ef68a036c Emit fewer "lang" attributes
Add "lang" attribute to paragraph markup and only include
word lang attribute if it's different from the paragraph's value.
2016-02-18 09:05:54 +01:00
Tom Morris
381b3a56c6 Only generate dir for HOCR when needed - fixes #208
Takes advantage of inheritance and dir="ltr" default to:
 - only generate paragraph dirs which are not ltr
 - only generate word dirs which don't match enclosing paragraph

Tested against LTR, RTL, and mixed direction files. Files for the
latter two cases are in a separate commit on the ltr-test-files branch.
2016-02-18 09:05:46 +01:00
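The two hOCR commits above share one rule: emit an attribute only when its value differs from the one the element would inherit. A minimal sketch of that rule follows. `AttrIfDifferent` is a hypothetical helper, and the quoting style of the generated markup is an assumption rather than Tesseract's actual output code.

```cpp
#include <string>

// Hypothetical helper: returns the attribute text to append to an element,
// or an empty string when the inherited value already applies.
std::string AttrIfDifferent(const std::string& name,
                            const std::string& value,
                            const std::string& inherited) {
  if (value == inherited) return "";
  return " " + name + "='" + value + "'";
}
```

For `dir`, a paragraph would be compared against the `ltr` default and a word against its enclosing paragraph. For `lang`, a word would be compared against its paragraph's language.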
Tom Morris
c3ad0de69b Fix varsize array for Microsoft compiler 2016-02-18 09:05:37 +01:00
Amit Dovev
473c30fa87 Update README.md 2016-02-18 09:05:04 +01:00
Zdenko Podobný
255c31fe31 update release date 2016-02-16 22:27:01 +01:00
Tom Morris
134ebc3df3 INCOMPATIBLE fix to hOCR line height information - fixes #225.
This fixes the duplicate line IDs caused by inserting height information
into the middle of the ID and it moves the line height info into
the title attribute like everything else, rather than using non-standard
HTML attributes (which won't validate).

This change may break consumers of the HTML output, but 3.04 has only
been in the wild for 6 months and the current HTML is invalid, so I
believe the benefit outweighs the cost for the fix.
2016-02-16 22:26:12 +01:00
Zdenko Podobný
8473e5a262 update autotools files 2016-02-13 00:06:11 +01:00
Zdenko Podobný
56c1a4a21f add option "make training-uninstall" 2016-02-13 00:04:59 +01:00
Zdenko Podobný
ebadb00e4d fix version number => 3.04.01 2016-02-12 23:28:40 +01:00
Amit Dovev
8bdfb5bfc4 Update README.md 2016-02-12 09:59:57 +01:00
James R. Barlow
9cec609bdf Replace pdf.ttf with sharp2.ttf, keep name the same
As discussed at length in issue #182, the existing pdf.ttf causes difficulties
for certain PDF viewers, in part because the old file had zero advance width.

With testing, sharp2.ttf seems to be the best available compromise, although
it's not perfect and causes some visual difficulties in Evince.  It does
seem to fix Kindle and OS X Preview.
2016-02-12 09:53:35 +01:00
Zdenko Podobný
0d791d6de7 Fix cygwin build (partial fix from da3852d) 2016-02-05 11:32:37 +01:00
Dennis Schridde
3003411a91 Compatibility with Leptonica 1.73
http://www.leptonica.org/source/version-notes.html:
       Naming changes (to avoid collisions):
         #defines MALLOC --> LEPT_MALLOC, CALLOC --> LEPT_CALLOC, etc.
         ByteBuffer --> L_ByteBuffer

Introduction of the TESSERACT_LIBLEPT_PREREQ macro allows backward compatibility with Leptonica <1.73.
2016-02-05 11:21:55 +01:00
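A version-prerequisite macro of this kind is typically a compile-time comparison of major and minor version numbers. The sketch below is an assumption about how such a macro could look. The commit message only names `TESSERACT_LIBLEPT_PREREQ`, and the fallback version values exist solely to keep the example self-contained without Leptonica's headers.

```cpp
// Assumption: Leptonica's headers define LIBLEPT_MAJOR_VERSION and
// LIBLEPT_MINOR_VERSION. The fallback values below (1.73) are placeholders
// so that this sketch compiles on its own.
#ifndef LIBLEPT_MAJOR_VERSION
#define LIBLEPT_MAJOR_VERSION 1
#define LIBLEPT_MINOR_VERSION 73
#endif

// True when the Leptonica version is at least maj.min.
#define TESSERACT_LIBLEPT_PREREQ(maj, min)   \
  ((LIBLEPT_MAJOR_VERSION) > (maj) ||        \
   ((LIBLEPT_MAJOR_VERSION) == (maj) &&      \
    (LIBLEPT_MINOR_VERSION) >= (min)))

// Example use: select the renamed allocator macros only when available.
#if TESSERACT_LIBLEPT_PREREQ(1, 73)
static const bool kUsesLeptPrefixedNames = true;   // LEPT_MALLOC, L_ByteBuffer
#else
static const bool kUsesLeptPrefixedNames = false;  // MALLOC, ByteBuffer
#endif
```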
amitdo
337a9b52c4 Fix #64. Make box training work
This commit is better than 06fc0533c. Hopefully, this is the last fix for the box training issue.
2016-02-05 11:21:42 +01:00
Ryan Baumann
7ffc77b3a2 Add Junicode to neo-Latin fonts 2016-02-05 11:21:20 +01:00
Ryan Baumann
52d9ed4d69 Use different font list and exposures for "lat" language training 2016-02-05 11:21:11 +01:00
amitdo
b60bb806bf Fix #184. Training should work now 2016-02-05 11:20:57 +01:00
Stefan Weil
ac6b17e918 Remove unneeded definition for NULL
NULL is already defined in stddef.h,
so a local definition is not needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:20:36 +01:00
Zdenko Podobný
8daef71a83 added row attributes to hocr output 2016-02-05 11:20:01 +01:00
Hamid Safdari
0968896fc6 correct minor syntax errors in language-specific.sh 2016-02-05 11:19:19 +01:00
Stefan Weil
f684a77529 Fix compiler warnings (remove unused constants)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:19:07 +01:00
Amit Dovev
3379973063 Update Makefile.am 2016-02-05 11:18:59 +01:00
Stefan Weil
369f5b472c Get tessdata prefix from executable path (only for Windows)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:18:48 +01:00
amitdo
fe5ee13229 Add missing ')' to make the code compile 2016-02-05 11:18:40 +01:00
amitdo
270214e667 If there is no explicit renderer(s), default to TessTextRenderer
Revert fd429c32, 43834da7, 05de195e.

See #49, #59.

The code in this commit solves the issue in a more elegant way, IMHO.

Now you can use:
  * `tesseract eurotext.tif eurotext txt pdf`
  * `tesseract eurotext.tif eurotext txt hocr`
  * `tesseract eurotext.tif eurotext txt hocr pdf`

NOTE:
  With `tesseract eurotext.tif eurotext`
  or `tesseract eurotext.tif eurotext txt`
  the psm will be set to '3', but...
  With `tesseract eurotext.tif eurotext txt pdf`
  or `tesseract eurotext.tif eurotext txt hocr`
  the psm will be set to '1'.
2016-02-05 11:18:34 +01:00
Stefan Weil
d7b6c9655f Fix grammar in license file
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:17:30 +01:00
Stefan Weil
4a7cf319fc tesseractmain: Prettify help message
Commit 99110df757 improved the help text
in several aspects, but also introduced new inconsistencies which this
patch tries to fix.

* Align columns (this needed replacing tabs by spaces).
* Start explaining text with uppercase.
* Replace "the stdout" by "stdout".
* Small changes in help text for page segmentation modes.
* Split options in OCR options and single options
  (partially revert commit 99110df757).

In addition, whitespace characters at end of lines were removed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:17:08 +01:00
Nick White
8a08da7d35 Use shell quoting rather than pluses to separate font arguments in tesstrain.sh
The way tesstrain.sh handled font names was really weird, using '+'
signs as a delimiter. However quoting arguments is a much more
straightforward, standard and sensible way to do things.

So whereas previously one would have used this:
  --fontlist Times New Roman + Arial Black
Now they should be specified like this:
  --fontlist "Times New Roman" "Arial Black"
2016-02-05 11:16:56 +01:00
Nick White
796188072a Set default exposure settings for grc training 2016-02-05 11:16:45 +01:00
Nick White
af9c969818 Remove NUMBER_DAWG_FACTOR and WORD_DAWG_FACTOR from grc rules
These aren't used anywhere, and are difficult to calculate for grc,
so leave them as the default.
2016-02-05 11:16:38 +01:00
Nick White
890ec2876b Use different font list for grc training
This font list contains a selection of fonts produced by the Greek Font
Society <http://greekfontsociety.gr>, and is the result of testing
with a large corpus of a variety of scanned works.
2016-02-05 11:16:22 +01:00
Stefan Weil
b848caa151 Fix free of buffer which was not allocated
Coverity bug report: CID 1270420 "Free of address-of expression"

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:16:08 +01:00
Stefan Weil
613140a1ac pdfrenderer: Fix uninitialized local variables
Coverity bug reports:

CID 1270405: Uninitialized scalar variable
CID 1270408: Uninitialized scalar variable
CID 1270409: Uninitialized scalar variable
CID 1270410: Uninitialized scalar variable

Those variables are set conditionally in the while loop
and must keep their values in following iterations, so
they must be declared outside of the loop.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:15:54 +01:00
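The reasoning in this commit message can be illustrated with a small loop in which a value is assigned only on some iterations and reused on the others. This is a sketch under stated assumptions, not the pdfrenderer code: `Token` and `BaselinePerToken` are hypothetical names.

```cpp
#include <vector>

// Hypothetical input: only tokens that start a line carry a new baseline.
struct Token {
  bool starts_line;
  int baseline;
};

std::vector<int> BaselinePerToken(const std::vector<Token>& tokens) {
  std::vector<int> out;
  // Declared and initialized outside the loop: if this lived inside the loop
  // body, it would be re-created on every pass and read uninitialized
  // whenever the condition below is false.
  int baseline = 0;
  for (const Token& t : tokens) {
    if (t.starts_line) baseline = t.baseline;  // set conditionally
    out.push_back(baseline);                   // value carried across iterations
  }
  return out;
}
```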
amitdo
d36ee9c4d0 tesseractmain.cpp: Split huge main() to sub functions
Add these functions to api/tesseractmain.cpp:
PrintVersionInfo()
PrintUsage()
PrintHelpForPSM()
PrintHelpMessage()
SetVariablesFromCLArgs()
PrintLangsList()
FixPageSegMode()
ParseArgs()
PreloadRenderers()
2016-02-05 11:15:38 +01:00
Stefan Weil
9bdaa0ad5a Fix duplicate fclose
Coverity bug report: CID 1270401 (#1 of 1): Use after free

As the comment (which was also fixed) says, ReadNextBox() already
calls fclose(box_file), so don't call it a 2nd time.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-05 11:15:16 +01:00