15 Commits

Author SHA1 Message Date
Stefan Weil
ef1d9600b1 Use standard macros for format strings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 19:32:51 +02:00
Ray Smith
0e868ef377 Major change to improve layout analysis for heavily diacritic languages:
Tha, Vie, Kan, Tel etc.
There is a new overlap detector that detects when diacritics
cause a big increase in textline overlap. In such cases, diacritics from
overlap regions are kept separate from layout analysis completely, allowing
textline formation to happen without them. The diacritics are then assigned
to 0, 1 or 2 close words at the end of layout analysis, using and modifying
an old noise detection data path.
The stored diacritics are used or not during recognition according to the
character classifier's liking for them.
2015-05-12 16:47:02 -07:00
theraysmith@gmail.com
ec026cadfe Generalized feature extractor to allow fx from greyscale
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@876 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-23 15:21:37 +00:00
zdenop@gmail.com
7e14ade10d print error/warning messages to stderr/debug file instead of stdout (fix issue 911)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@843 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-05-16 20:31:37 +00:00
theraysmith@gmail.com
59d244b06e More fixes for GRAPHICS_DISABLED from Zdenko and Ray
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@757 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-22 00:59:31 +00:00
theraysmith@gmail.com
9206e92b0d Added simultaneous multi-language capability, Refactored top-level word recognition module, Blamer module added for error analysis, Tidied up constraints on control parameters, Added UNICHARSET to WERD_CHOICE to make multi-language handling easier, Added word bigram correction
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@655 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 03:06:39 +00:00
theraysmith
5a779704da Deleted lots of dead code, including PBLOB
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@557 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-18 21:52:38 +00:00
theraysmith
137f4806b6 Added sub/superscript, small/dropcap detection
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@547 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-12-09 01:32:20 +00:00
theraysmith
47dc322437 Removed serialise and NEWDELETE macro
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@531 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-30 00:56:39 +00:00
zdenop@gmail.com
4523ce9f7d 3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to Linux and Windows (VC2008) compile process
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-23 18:34:14 +00:00
joregan
5410018c45 fix issue 350
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@478 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-09-30 00:08:04 +00:00
theraysmith
903a4ffe9d Changes to ccstruct for 3.00
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@289 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-07-11 02:14:57 +00:00
theraysmith
c4f4840fbe Fixed name collision with jpeg library
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@163 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-04-22 00:41:37 +00:00
theraysmith
ac4e0cffa2 Updated graphics output for new java-based display
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@138 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-02-01 00:36:18 +00:00
tmbdev
425d593ebe top-skimming import from sf.net
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk/trunk@2 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-03-07 20:03:40 +00:00