Commit Graph

7 Commits

Author SHA1 Message Date
Ray Smith
0e868ef377 Major change to improve layout analysis for heavily diacritic languages:
Tha, Vie, Kan, Tel etc.
There is a new overlap detector that detects when diacritics
cause a big increase in textline overlap. In such cases, diacritics from
overlap regions are kept separate from layout analysis completely, allowing
textline formation to happen without them. The diacritics are then assigned
to 0, 1 or 2 close words at the end of layout analysis, using and modifying
an old noise detection data path.
The stored diacritics are used or not during recognition according to the
character classifier's liking for them.
2015-05-12 16:47:02 -07:00
theraysmith@gmail.com
dbf6197471 Major refactor of control.cpp to enable line recognition
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1147 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-11 23:23:06 +00:00
theraysmith@gmail.com
4d514d5a60 Major refactor of beam search, elimination of dead code, misc bug fixes, updates to Makefile.am, Changelog etc.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@878 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-23 15:26:50 +00:00
theraysmith@gmail.com
64c739c8af Added sparse text mode, also fixed issue 653.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@820 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-01-03 19:06:41 +00:00
theraysmith@gmail.com
59d244b06e More fixes for GRAPHICS_DISABLED from Zdenko and Ray
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@757 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-22 00:59:31 +00:00
theraysmith@gmail.com
6e3d810c1d Major improvements to layout analysis for better image detection, diacritic detection, better textline finding, better tabstop finding
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@648 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 02:53:04 +00:00
zdenop@gmail.com
4523ce9f7d 3.01 code from http://github.com/jimregan/tesseract-ocr with addaptions related to Linux and Windows (VC2008) compile process
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-23 18:34:14 +00:00