Commit Graph

19 Commits

Author SHA1 Message Date
Stefan Weil
d40b28f47d textord: Remove unused constants
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-09-06 21:49:26 +02:00
Stefan Weil
64f9190575 textord: Fix typos in comments and strings
All of them were found by codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-04 21:58:42 +01:00
Ray Smith
78b5e1a77d Fixed occurrence of small rotated blocks in loosely spaced text 2015-06-12 11:05:00 -07:00
Ray Smith
0e868ef377 Major change to improve layout analysis for heavily diacritic languages:
Tha, Vie, Kan, Tel etc.
There is a new overlap detector that detects when diacritics
cause a big increase in textline overlap. In such cases, diacritics from
overlap regions are kept separate from layout analysis completely, allowing
textline formation to happen without them. The diacritics are then assigned
to 0, 1 or 2 close words at the end of layout analysis, using and modifying
an old noise detection data path.
The stored diacritics are used or not during recognition according to the
character classifier's liking for them.
2015-05-12 16:47:02 -07:00
Ray Smith
55d11ad3c2 Moved params from global in page layout to tesseractclass, improved single column layout analysis 2014-10-07 09:31:00 -07:00
theraysmith@gmail.com
2624cac7d5 Fixed issue 1103
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1070 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-04-24 00:14:33 +00:00
theraysmith@gmail.com
2ad63776e5 Fixed issue 979
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1034 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-30 02:20:59 +00:00
theraysmith@gmail.com
2c909702c9 Generalized feature extractor to allow fx from greyscale
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@877 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-23 15:22:37 +00:00
theraysmith@gmail.com
64c739c8af Added sparse text mode, also fixed issue 653.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@820 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-01-03 19:06:41 +00:00
theraysmith@gmail.com
59d244b06e More fixes for GRAPHICS_DISABLED from Zdenko and Ray
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@757 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-22 00:59:31 +00:00
zdenop@gmail.com
49c4ce3183 fix for GRAPHICS_DISABLED build
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@686 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-01 22:43:51 +00:00
theraysmith@gmail.com
6e3d810c1d Major improvements to layout analysis for better image detection, diacritic detection, better textline finding, better tabstop finding
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@648 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 02:53:04 +00:00
zdenop@gmail.com
da41b96f7f removed check for libtiff - leptonica is required; cleanup #ifdef/#ifndef HAVE_LIBLEPT
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@624 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-30 06:34:41 +00:00
theraysmith
e0b21cbd34 Various fixes, including memory leak in fixspace, font labels on output, removed some annoying debug output, fixes to initialization of parameters, general cleanup, and added Hindi
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@572 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-21 21:48:17 +00:00
zdenop@gmail.com
4523ce9f7d 3.01 code from http://github.com/jimregan/tesseract-ocr with addaptions related to Linux and Windows (VC2008) compile process
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-23 18:34:14 +00:00
joregan
f2506871f9 move include of config_auto.h to not conflict with local types. Not finished
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@490 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-09-30 15:53:40 +00:00
joregan
5c8ad7ee72 add config_auto.h anywhere #ifndef GRAPHICS_DISABLED is used
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@384 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-28 12:03:45 +00:00
joregan
38a6b18a5f disable MSVC warning C4244 in a number of places to cut down the noise
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@363 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-26 10:22:27 +00:00
theraysmith
0a9ad20d1c Changes to textord for 3.00
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@301 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-07-11 02:39:56 +00:00