Ray Smith
0e868ef377
Major change to improve layout analysis for heavily diacritic languages: Tha, Vie, Kan, Tel etc.
There is a new overlap detector that detects when diacritics
cause a big increase in textline overlap. In such cases, diacritics from
overlap regions are kept separate from layout analysis completely, allowing
textline formation to happen without them. The diacritics are then assigned
to 0, 1 or 2 close words at the end of layout analysis, using and modifying
an old noise detection data path.
The stored diacritics are used or not during recognition according to the
character classifier's liking for them.
2015-05-12 16:47:02 -07:00
Ray Smith
55d11ad3c2
Moved params from global in page layout to tesseractclass, improved single column layout analysis
2014-10-07 09:31:00 -07:00
theraysmith@gmail.com
df80e9dc59
Fixed problems with OSD that were exposed by fix to issue 979. Fixes issue 979 properly.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1043 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-03 19:16:42 +00:00
theraysmith@gmail.com
2ad63776e5
Fixed issue 979
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1034 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-30 02:20:59 +00:00
theraysmith@gmail.com
6a10aa7985
More cleanup changes from patches
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1024 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-29 02:22:14 +00:00
zdenop@gmail.com
71ae509354
fix for mingw32/g++ 4.8.1
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@998 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-22 08:10:15 +00:00
theraysmith@gmail.com
b0d67f1b5f
Removed dependence on IMAGE class
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@959 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-09 17:48:26 +00:00
theraysmith@gmail.com
b0fb616299
Generalized feature extractor to allow fx from greyscale
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@875 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-23 15:19:50 +00:00
theraysmith@gmail.com
64c739c8af
Added sparse text mode, also fixed issue 653.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@820 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-01-03 19:06:41 +00:00
zdenop@gmail.com
e216adab43
fix configure.ac; unify identifiers (WIN32 vs _WIN32)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@688 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-02 17:31:24 +00:00
theraysmith@gmail.com
3a998fe7ac
Added Right-to-left/Bidi capability in the output iterators for Hebrew/Arabic, Added paragraph detection in layout analysis/post OCR, Fixed inconsistent xheight during training and over-chopping, Added simultaneous multi-language capability, Refactored top-level word recognition module, Fixed problems with internally scaled images
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@651 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 02:59:49 +00:00
zdenop@gmail.com
da41b96f7f
removed check for libtiff - leptonica is required; cleanup #ifdef/#ifndef HAVE_LIBLEPT
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@624 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-30 06:34:41 +00:00
zdenop@gmail.com
9b7375edd6
MinGW portability solved + some code cleanup (based on cpplint)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@605 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-15 19:28:25 +00:00
theraysmith
3e8c0bc228
Various fixes, including memory leak in fixspace, font labels on output, removed some annoying debug output, fixes to initialization of parameters, general cleanup, and added Hindi
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@567 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-21 21:44:05 +00:00
theraysmith
7121e51422
Deleted lots of dead code, including PBLOB
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@556 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-18 21:52:08 +00:00
zdenop@gmail.com
4523ce9f7d
3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to Linux and Windows (VC2008) compile process
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-23 18:34:14 +00:00