3822 Commits

Author SHA1 Message Date
Egor Pugin
f1c2e6eaa9 Merge branch 'master' of github.com-egorpugin:tesseract-ocr/tesseract 2016-06-30 00:30:10 +03:00
Egor Pugin
57605d99e9 Implement CPPAN support for easy Windows building. 2016-06-30 00:29:55 +03:00
Nick White
382e15bec0 Merge branch 'master' into hocrcharboxes
Conflicts:
	configure.ac
2016-06-29 09:32:38 +01:00
Nick White
78ae2cc073 Fix bug with linebreaking in hOCR
The hOCR output could incorrectly close span, p, and div tags
early. Oops, my bad.
2016-06-29 09:25:44 +01:00
zdenop
647b88daf0 Merge pull request #359 from StefRe/tsv-fix
Fix TSV bounding box width/height calculation (addition to #358)
2016-06-27 22:19:22 +02:00
Steffen Rehberg
c0fcce2f8f Fix text box width/height calculation (addition)
This occurrence should have been included in commit 29d971e
but was overlooked by mistake.
2016-06-27 21:58:29 +02:00
zdenop
828f8528a8 Merge pull request #358 from StefRe/tsv-fix
Fix TSV bounding box width/height calculation
2016-06-27 09:09:12 +02:00
Steffen Rehberg
29d971eb0c Fix text box width/height calculation
In Tesseract's coordinate system, width is just right - left, cf. slide #2 of
github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/2ArchitectureAndDataStructures.pdf
2016-06-25 12:40:28 +02:00
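
The convention stated in 29d971eb0c can be sketched as follows. This is only an illustration: `Box`, `BoxWidth` and `BoxHeight` are hypothetical names, not Tesseract's own types, and the height rule is assumed by analogy with the width rule quoted in the commit message.

```cpp
// Hypothetical box type for illustration only; not Tesseract's TBOX class.
struct Box {
  int left;
  int top;
  int right;
  int bottom;
};

// Width is just right - left, with no "+ 1" adjustment.
inline int BoxWidth(const Box& b) { return b.right - b.left; }

// Assumed by analogy: height is bottom - top in image coordinates.
inline int BoxHeight(const Box& b) { return b.bottom - b.top; }
```

For a box with left = 10 and right = 110, `BoxWidth` yields 100 rather than 101.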
zdenop
5ca73cca26 Merge pull request #355 from amitdo/pango-name-is-empty
Check that pango's suggested font name is not an empty string
2016-06-20 10:26:11 +02:00
zdenop
ba2ea39caa Merge pull request #356 from stweil/cygwin
Fix Cygwin compatibility – part III
2016-06-20 10:24:41 +02:00
Stefan Weil
ed053aab94 Fix Cygwin compatibility – part III
Commit 65504c8cd2 misplaced the #endif.
The definition of _GNU_SOURCE is only needed for Cygwin.

Defining _GNU_SOURCE on Linux results in compiler warnings because this
macro is already defined by the compiler.

Fix this by moving the #endif to the right place. In addition the code
for Cygwin is made more robust: If a future Cygwin compiler defines
_GNU_SOURCE, too, the code will still work.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-06-19 22:38:03 +02:00
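
A minimal sketch of the guard pattern that ed053aab94 describes, assuming the standard `__CYGWIN__` predefined macro is used to detect Cygwin. The helper `GnuSourceIsDefined` is hypothetical and exists only to make the effect observable; the actual placement in the Tesseract sources may differ.

```cpp
// Define _GNU_SOURCE only on Cygwin, and only when the compiler has not
// already defined it. Other platforms are left untouched, which avoids
// redefinition warnings where the compiler predefines the macro.
#if defined(__CYGWIN__) && !defined(_GNU_SOURCE)
#define _GNU_SOURCE
#endif

#include <cstdio>

// Hypothetical helper: reports whether _GNU_SOURCE is visible here.
inline bool GnuSourceIsDefined() {
#ifdef _GNU_SOURCE
  return true;
#else
  return false;
#endif
}
```

The `!defined(_GNU_SOURCE)` test is what keeps the code working if a future Cygwin compiler predefines the macro itself.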
amitdo
724fb894ac Check that pango's suggested font name is not an empty string
On msys2, pango seems to always return an empty string for the suggested
font. It's a good idea to check that the string is not empty before
printing it, on all platforms.
2016-06-19 13:40:17 +03:00
Amit
96720c785d Merge pull request #351 from amitdo/cygwin-compat
Fix Cygwin compatibility
2016-06-19 12:43:35 +03:00
Stefan Weil
65504c8cd2 Fix Cygwin compatibility - Part II 2016-06-19 11:59:58 +03:00
Amit
91fe9ef51a Merge pull request #354 from stweil/makefile
Makefile: Fix phony training target
2016-06-19 09:25:51 +03:00
Stefan Weil
c2574609e4 Makefile: Fix phony training target
This fixes wrong behaviour of "make training" when dependencies for
training were incomplete.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-06-19 08:20:53 +02:00
Amit Dovev
13d789d4df Merge pull request #288 from nickjwhite/opentypeligatures
Enable all ligatures available in a font for text2image rendering
2016-06-19 03:33:32 +03:00
Amit Dovev
ad66dd2912 Merge pull request #282 from ianblenke/master
Dockerify using travis build script
2016-06-19 03:29:00 +03:00
Philipp Nordhus
c05ff3456e Remove duplicate destructor
Destructor of base class GenericVector calls base class clear()
method, deallocating the memory.
2016-06-17 23:20:03 +02:00
Philipp Nordhus
907de5995f Do not allocate in GenericVector default ctor 2016-06-17 22:38:41 +02:00
Philipp Nordhus
b6db68f083 Remove indirection in LanguageModelDawgInfo 2016-06-17 22:38:38 +02:00
Marco Atzeri
b1c921b59e Fix Cygwin compatibility 2016-06-17 15:52:01 +03:00
Amit Dovev
034d666e7a Replace use of TLOG_FATAL() with tprintf() and exit(1) (#349)
Asserts should not be used for missing or invalid input in the command
line! This leads to a bad UX.
2016-06-16 12:10:53 +03:00
Amit Dovev
32d5ef6e53 Merge pull request #345 from amitdo/training-noasserts
Replace asserts with tprintf() and exit(1)
2016-06-15 10:49:35 +03:00
Shreeshrii
c3a7fab349 Replace asserts with tprintf() and exit(1)
Asserts should not be used for missing or invalid input in the command
line! This leads to a bad UX.
2016-06-14 14:35:05 +03:00
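
The pattern behind c3a7fab349 might look like the sketch below. It uses `std::fprintf` as a stand-in for Tesseract's `tprintf()`, and `CheckInputFile` and `RequireInputFile` are hypothetical names introduced for illustration.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Returns true when the (hypothetical) input file argument is usable.
// On failure it prints a readable message instead of tripping an assert.
inline bool CheckInputFile(const std::string& path) {
  if (path.empty()) {
    std::fprintf(stderr, "Error: no input file given.\n");
    return false;
  }
  return true;
}

// Exits with status 1 on invalid input: message first, then exit(1).
inline void RequireInputFile(const std::string& path) {
  if (!CheckInputFile(path)) std::exit(1);
}
```

Unlike an assert, this path stays active in release builds and tells the user what was wrong with the command line.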
Amit Dovev
86acff5a03 Merge pull request #340 from amitdo/fix-292
Bypass Leptonica error message with pixGenHalftoneMask()
2016-06-07 12:36:24 +03:00
scottb89
3dcb5c2488 Bypass Leptonica error message with pixGenHalftoneMask()
Fixes #292
2016-06-05 17:38:43 +03:00
Amit Dovev
1958b3bfc3 Merge pull request #336 from amitdo/amitdo-contributing
Create CONTRIBUTING.md
2016-05-29 15:17:07 +03:00
Amit Dovev
99832f306b CONTRIBUTING.md: Fix a typo 2016-05-29 13:27:33 +03:00
Amit Dovev
d19c522e0c Create CONTRIBUTING.md 2016-05-28 22:43:44 +03:00
Amit Dovev
98e87d2e22 Merge pull request #335 from stweil/configure
configure: Enclose most macro arguments in []
2016-05-27 17:39:16 +03:00
Stefan Weil
4cbe9622d1 configure: Enclose most macro arguments in []
This is not strictly necessary, but recommended in the GNU autoconf manual.
No [] was added to arguments like true or false.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-05-27 15:30:17 +02:00
zdenop
d8c04e82d1 Merge pull request #332 from hoiqs/fix_add_str_double
Fix of add_str_double
2016-05-25 16:53:13 +02:00
Heiko Oberdiek
dec38db7ce Fix for constant kMaxDoubleSize (from 15 to 16),
which is used by method STRING::add_str_double.
2016-05-25 16:26:41 +02:00
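
One way to see why 15 bytes can be too few, under the assumption (not stated in the commit message) that the number is formatted with `"%.8g"`: the longest output has 15 characters, so the terminating NUL needs a 16th byte. The names below are hypothetical.

```cpp
#include <cstdio>

// Assumed format: up to 8 significant digits ("%.8g").
// Worst case such as -1.2345678e-300:
//   sign + digit + '.' + 7 digits + 'e' + sign + 3 exponent digits = 15 chars.
constexpr int kWorstCaseChars = 15;
constexpr int kBufferSize = kWorstCaseChars + 1;  // room for the NUL

// Returns the number of characters the formatted value needs,
// not counting the terminating NUL.
inline int FormattedLength(double value) {
  char buffer[kBufferSize];
  return std::snprintf(buffer, sizeof(buffer), "%.8g", value);
}
```

With a 15-byte buffer the same call would truncate the last character of such a value.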
zdenop
daa8a53e5f Merge pull request #331 from stweil/master
configure: Fix check for dependencies needed for training
2016-05-24 13:15:49 +02:00
zdenop
d9f465926c Merge pull request #299 from mmcco/rm-off_t
Remove conditional definition of off_t
2016-05-24 13:14:17 +02:00
Stefan Weil
1b9d0688fa configure: Fix check for dependencies needed for training
The different checks had set ENABLE_TRAINING unconditionally,
thus overwriting the value from the preceding checks.

So if pango and cairo were available, but icu was missing,
users would still be offered to build the training tools.

The changes for icu and has_cpp11 are not strictly necessary,
but are made here to have uniform code patterns.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-05-23 22:44:17 +02:00
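
The logic error described in 1b9d0688fa can be mirrored in C++ as below. The real fix lives in the autoconf script; `Deps`, `EnableTrainingBuggy` and `EnableTrainingFixed` are hypothetical names, and the order of the checks is an assumption.

```cpp
// Hypothetical record of which training dependencies were found.
struct Deps {
  bool pango;
  bool cairo;
  bool icu;
};

// Buggy pattern: every check assigns unconditionally, so the last
// check overwrites the results of the preceding ones.
inline bool EnableTrainingBuggy(const Deps& d) {
  bool enable = true;
  enable = d.icu;
  enable = d.pango;
  enable = d.cairo;  // last write wins
  return enable;
}

// Fixed pattern: a check may only switch the flag off, never back on.
inline bool EnableTrainingFixed(const Deps& d) {
  bool enable = true;
  if (!d.icu) enable = false;
  if (!d.pango) enable = false;
  if (!d.cairo) enable = false;
  return enable;
}
```

With pango and cairo present but icu missing, the buggy version still reports that training can be built, while the fixed version does not.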
zdenop
99ccebadfc Merge pull request #330 from amitdo/training-argv1
Training tools: Print help message when (argc == 1)
2016-05-22 13:34:07 +02:00
amitdo
cd1a14450c Training tools: Print help message when (argc == 1) 2016-05-22 11:16:42 +03:00
zdenop
7ffa2a01c2 Merge pull request #329 from amincheloh/patch-1
Fix invalid release year for V3.04.01
2016-05-21 20:29:00 +02:00
Amin Cheloh
c4d273d33c fix invalid release year for V3.04.01 2016-05-21 17:51:04 +07:00
Zdenko Podobný
cab6de1740 remove unused GlyphLessFont files 2016-05-20 21:19:00 +02:00
zdenop
d946ae27d0 Merge pull request #311 from robbertkl/patch-1
Fix incompatibility with some C++11 implementations
2016-05-17 08:44:15 +02:00
zdenop
c5e7dc6642 Merge pull request #323 from stweil/stdout
Print normal user messages to stdout instead of stderr
2016-05-17 08:42:37 +02:00
zdenop
d5fd184302 Merge pull request #321 from stweil/build
configure: Fix cross compiler flags for cairo and pango
2016-05-17 08:42:18 +02:00
Stefan Weil
e59be55bcc Print list of languages to stdout instead of stderr
It is common practice for command line programs to print
user requested information on stdout.

This seems to be reasonable for Tesseract, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-05-16 17:59:48 +02:00
Stefan Weil
7e98c33432 Print help text to stdout instead of stderr
It is common practice for command line programs to show help text
on stdout. This seems to be reasonable for Tesseract, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-05-16 17:59:48 +02:00
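
A small sketch of the convention from 7e98c33432: the output stream is passed in, so help the user asked for goes to stdout and the same text can go to stderr after a usage error. `PrintHelp` and the usage string are hypothetical, not Tesseract's actual help output.

```cpp
#include <cstdio>

// Hypothetical help printer. The caller chooses the stream:
// stdout for requested help, stderr when reporting a usage error.
// Returns the number of characters written, or a negative value on error.
inline int PrintHelp(std::FILE* out, const char* program) {
  return std::fprintf(out, "Usage: %s imagename outputbase\n", program);
}
```

Sending requested help to stdout lets the user pipe it to a pager or `grep` without redirecting stderr.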
Stefan Weil
ee5e1e972a configure: Fix cross compiler flags for cairo and pango
Calling pkg-config directly is a bad idea because it returns
the compiler flags for native builds.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-05-15 19:13:11 +02:00
Nick White
d71133a769 Use ocrx_cinfo to hold character box and confidence information
With hocr_char_boxes enabled in hocr output, each grapheme now gets
its own span tag, which holds the character confidence and box
coordinates. Using x_bboxes at the ocrx_word level was
inappropriate, as it was impossible to find which grapheme was
represented by each bounding box.
2016-05-06 13:06:46 +01:00
Nick White
06b7a7b188 Add option to include character bounding boxes in hocr output
Add the 'hocr_char_boxes' configuration option (off by default),
which enables printing the bounding boxes of each character in the
x_bboxes property of an ocrx_word element in hocr output.
2016-04-29 15:37:46 +01:00