Zdenko Podobný
c943fc1a33
sets justification for ParagraphInfo; fixes #429
2016-09-18 20:31:45 +02:00
Ray Smith
0e868ef377
Major change to improve layout analysis for heavily diacritic languages: Tha, Vie, Kan, Tel etc.
There is a new overlap detector that detects when diacritics
cause a big increase in textline overlap. In such cases, diacritics from
overlap regions are kept separate from layout analysis completely, allowing
textline formation to happen without them. The diacritics are then assigned
to 0, 1 or 2 close words at the end of layout analysis, using and modifying
an old noise detection data path.
The stored diacritics are used or not during recognition according to the
character classifier's liking for them.
2015-05-12 16:47:02 -07:00
theraysmith@gmail.com
97080412fd
Bunch of minor bug fixes/cleanups
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1106 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-21 15:48:48 +00:00
theraysmith@gmail.com
ad149844f0
Added polygonal block outline output to PageIterator
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1025 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-29 02:23:28 +00:00
theraysmith@gmail.com
4d514d5a60
Major refactor of beam search, elimination of dead code, misc bug fixes, updates to Makefile.am, Changelog etc.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@878 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-23 15:26:50 +00:00
zdenop@gmail.com
cd8de9157c
change comments to doxygen block comments (api)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@716 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-30 21:24:12 +00:00
david.eger@gmail.com
75a9a8fae7
Address "RIL_PARA doesn't work" comment in issue 622.
http://code.google.com/p/tesseract-ocr/issues/detail?id=622
The core of the problem is that in PSM_SINGLE_BLOCK mode, Tesseract
doesn't run paragraph detection, so no paragraphs get generated. Here,
we make sure that even if run in a mode where no paragraphs get
generated, we treat each block as its own paragraph.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@696 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-06 20:02:57 +00:00
theraysmith@gmail.com
ef786ad29b
Moved ResultIterator/PageIterator to ccmain
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@645 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 02:47:59 +00:00