tesseract/textord
Ray Smith 0e868ef377 Major change to improve layout analysis for heavily diacritic languages:
Tha, Vie, Kan, Tel etc.
There is a new overlap detector that detects when diacritics
cause a big increase in textline overlap. In such cases, diacritics from
overlap regions are kept separate from layout analysis completely, allowing
textline formation to happen without them. The diacritics are then assigned
to 0, 1 or 2 close words at the end of layout analysis, using and modifying
an old noise detection data path.
The stored diacritics are used or not during recognition according to the
character classifier's liking for them.
2015-05-12 16:47:02 -07:00
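The last step described in the commit message above, assigning each stored diacritic to 0, 1 or 2 close words, can be illustrated with a minimal sketch. This is not the code from the commit: the `Box` struct, the function names, and the `max_gap` threshold are all hypothetical, and the rule used here (horizontal overlap plus a small vertical gap, nearest first) is only an assumed stand-in for whatever proximity test textord actually applies. Coordinates are assumed to have y increasing upward.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical axis-aligned bounding box (not Tesseract's TBOX).
// Assumes left <= right and bottom <= top, with y increasing upward.
struct Box {
  int left, bottom, right, top;
};

// Width of the horizontal overlap between two boxes, or 0 if there is none.
int HorizontalOverlap(const Box& a, const Box& b) {
  return std::max(0, std::min(a.right, b.right) - std::max(a.left, b.left));
}

// Vertical distance between two boxes, or 0 if they overlap vertically.
int VerticalGap(const Box& a, const Box& b) {
  return std::max(0, std::max(a.bottom, b.bottom) - std::min(a.top, b.top));
}

// Returns the indices of at most two words that a stored diacritic could be
// attached to: words that overlap it horizontally and lie within max_gap
// vertically, ordered nearest first. An empty result means the diacritic
// stays unassigned (the "0 words" case).
std::vector<std::size_t> CandidateWords(const Box& diacritic,
                                        const std::vector<Box>& words,
                                        int max_gap) {
  assert(max_gap >= 0);
  std::vector<std::pair<int, std::size_t>> scored;
  for (std::size_t i = 0; i < words.size(); ++i) {
    if (HorizontalOverlap(diacritic, words[i]) <= 0) continue;
    const int gap = VerticalGap(diacritic, words[i]);
    if (gap <= max_gap) scored.emplace_back(gap, i);
  }
  // Sort by gap, then by index, so ties resolve deterministically.
  std::sort(scored.begin(), scored.end());
  if (scored.size() > 2) scored.resize(2);

  std::vector<std::size_t> result;
  for (const auto& entry : scored) result.push_back(entry.second);
  return result;
}
```

A diacritic lying between two text lines would typically yield two candidates, one from the line above and one from the line below. Per the commit message, the final choice would then be left to the character classifier during recognition.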
alignedblob.cpp fix build with -DGRAPHICS_DISABLED 2014-01-11 23:08:54 +00:00
alignedblob.h Major improvements to layout analysis for better image detection, diacritic detection, better textline finding, better tabstop finding 2012-02-02 02:53:04 +00:00
baselinedetect.cpp Fix to baselinedetect from issue 1205 2014-08-12 16:14:19 -07:00
baselinedetect.h Fixed a lot of compiler/clang warnings 2014-01-25 02:28:51 +00:00
bbgrid.cpp Major improvements to layout analysis for better image detection, diacritic detection, better textline finding, better tabstop finding 2012-02-02 02:53:04 +00:00
bbgrid.h stl cleanup 2014-01-09 17:38:35 +00:00
blkocc.cpp remove unused code (Windows related) 2013-07-08 18:21:10 +00:00
blkocc.h remove unused code (Windows related) 2013-07-08 18:21:10 +00:00
blobgrid.cpp Major improvements to layout analysis for better image detection, diacritic detection, better textline finding, better tabstop finding 2012-02-02 02:53:04 +00:00
blobgrid.h Major improvements to layout analysis for better image detection, diacritic detection, better textline finding, better tabstop finding 2012-02-02 02:53:04 +00:00
ccnontextdetect.cpp fix build with -DGRAPHICS_DISABLED 2014-01-11 23:08:54 +00:00
ccnontextdetect.h Major improvements to layout analysis for better image detection, diacritic detection, better textline finding, better tabstop finding 2012-02-02 02:53:04 +00:00
cjkpitch.cpp Bunch of minor bug fixes/cleanups 2014-05-21 15:48:48 +00:00
cjkpitch.h remove unused code (Windows related) 2013-07-08 18:21:10 +00:00
colfind.cpp Major change to improve layout analysis for heavily diacritic languages: 2015-05-12 16:47:02 -07:00
colfind.h Major change to improve layout analysis for heavily diacritic languages: 2015-05-12 16:47:02 -07:00
colpartition.cpp Major change to improve layout analysis for heavily diacritic languages: 2015-05-12 16:47:02 -07:00
colpartition.h Major change to improve layout analysis for heavily diacritic languages: 2015-05-12 16:47:02 -07:00
colpartitiongrid.cpp Major change to improve layout analysis for heavily diacritic languages: 2015-05-12 16:47:02 -07:00
colpartitiongrid.h Major change to improve layout analysis for heavily diacritic languages: 2015-05-12 16:47:02 -07:00
colpartitionset.cpp Moved params from global in page layout to tesseractclass, improved single column layout analysis 2014-10-07 09:31:00 -07:00
colpartitionset.h Moved params from global in page layout to tesseractclass, improved single column layout analysis 2014-10-07 09:31:00 -07:00
devanagari_processing.cpp Fixed a lot of compiler/clang warnings 2014-01-25 02:28:51 +00:00
devanagari_processing.h remove 'class IMAGE;' 2014-02-03 23:32:23 +00:00
drawedg.cpp remove unused code (Windows related) 2013-07-08 18:21:10 +00:00
drawedg.h Add ability to build under android (without cube or scrollview). 2015-05-12 15:41:15 -07:00
drawtord.cpp fix build with -DGRAPHICS_DISABLED 2014-01-11 23:08:54 +00:00
drawtord.h Major refactor of beam search, elimination of dead code, misc bug fixes, updates to Makefile.am, Changelog etc. 2013-09-23 15:26:50 +00:00
edgblob.cpp Removed dependence on IMAGE class 2014-01-09 17:36:42 +00:00
edgblob.h Removed dependence on IMAGE class 2014-01-09 17:36:42 +00:00
edgloop.cpp Generalized feature extractor to allow fx from greyscale 2013-09-23 15:22:37 +00:00
edgloop.h Removed dependence on IMAGE class 2014-01-09 17:36:42 +00:00
equationdetectbase.cpp Added experimental equation detector 2012-02-02 02:50:01 +00:00
equationdetectbase.h Added experimental equation detector 2012-02-02 02:50:01 +00:00
fpchop.cpp Fixed a lot of compiler/clang warnings 2014-01-25 02:28:51 +00:00
fpchop.h Major refactor of beam search, elimination of dead code, misc bug fixes, updates to Makefile.am, Changelog etc. 2013-09-23 15:26:50 +00:00
gap_map.cpp Major refactor of beam search, elimination of dead code, misc bug fixes, updates to Makefile.am, Changelog etc. 2013-09-23 15:26:50 +00:00
gap_map.h remove unused code (Windows related) 2013-07-08 18:21:10 +00:00
imagefind.cpp fix build with -DGRAPHICS_DISABLED 2014-01-11 23:08:54 +00:00
imagefind.h Renamed RGB to ComposeRGB to fix windows macro problem 2012-02-03 16:52:25 +00:00
linefind.cpp fix build with -DGRAPHICS_DISABLED 2014-01-11 23:08:54 +00:00
linefind.h Major improvements to layout analysis for better image detection, diacritic detection, better textline finding, better tabstop finding 2012-02-02 02:53:04 +00:00
Makefile.am Removed dependence on IMAGE class 2014-01-09 17:36:42 +00:00
makerow.cpp Major refactor of control.cpp to enable line recognition 2014-08-11 23:23:06 +00:00
makerow.h Major refactor of control.cpp to enable line recognition 2014-08-11 23:23:06 +00:00
oldbasel.cpp Fixed a lot of compiler/clang warnings 2014-01-25 02:28:51 +00:00
oldbasel.h remove unused code (Windows related) 2013-07-08 18:21:10 +00:00
pithsync.cpp remove unused code (Windows related) 2013-07-08 18:21:10 +00:00
pithsync.h remove unused code (Windows related) 2013-07-08 18:21:10 +00:00
pitsync1.cpp remove unused code (Windows related) 2013-07-08 18:21:10 +00:00
pitsync1.h remove unused code (Windows related) 2013-07-08 18:21:10 +00:00
scanedg.cpp Removed dependence on IMAGE class 2014-01-09 17:36:42 +00:00
scanedg.h Removed dependence on IMAGE class 2014-01-09 17:36:42 +00:00
sortflts.cpp remove unused code (Windows related) 2013-07-08 18:21:10 +00:00
sortflts.h remove unused code (Windows related) 2013-07-08 18:21:10 +00:00
strokewidth.cpp Major change to improve layout analysis for heavily diacritic languages: 2015-05-12 16:47:02 -07:00
strokewidth.h Major change to improve layout analysis for heavily diacritic languages: 2015-05-12 16:47:02 -07:00
tabfind.cpp Moved params from global in page layout to tesseractclass, improved single column layout analysis 2014-10-07 09:31:00 -07:00
tabfind.h Moved params from global in page layout to tesseractclass, improved single column layout analysis 2014-10-07 09:31:00 -07:00
tablefind.cpp Major change to improve layout analysis for heavily diacritic languages: 2015-05-12 16:47:02 -07:00
tablefind.h 3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to Linux and Windows (VC2008) compile process 2010-11-23 18:34:14 +00:00
tablerecog.cpp Fixed issues 1081-1090 2014-02-04 02:23:18 +00:00
tablerecog.h 3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to Linux and Windows (VC2008) compile process 2010-11-23 18:34:14 +00:00
tabvector.cpp Fixed issue 1304 2014-10-07 09:24:24 -07:00
tabvector.h Major improvements to layout analysis for better image detection, diacritic detection, better textline finding, better tabstop finding 2012-02-02 02:53:04 +00:00
textlineprojection.cpp fix build with -DGRAPHICS_DISABLED 2014-01-11 23:08:54 +00:00
textlineprojection.h Major improvements to layout analysis for better image detection, diacritic detection, better textline finding, better tabstop finding 2012-02-02 02:53:04 +00:00
textord.cpp Major change to improve layout analysis for heavily diacritic languages: 2015-05-12 16:47:02 -07:00
textord.h Major change to improve layout analysis for heavily diacritic languages: 2015-05-12 16:47:02 -07:00
topitch.cpp Major change to improve layout analysis for heavily diacritic languages: 2015-05-12 16:47:02 -07:00
topitch.h remove unused code (Windows related) 2013-07-08 18:21:10 +00:00
tordmain.cpp Major change to improve layout analysis for heavily diacritic languages: 2015-05-12 16:47:02 -07:00
tordmain.h Major change to improve layout analysis for heavily diacritic languages: 2015-05-12 16:47:02 -07:00
tospace.cpp Bunch of minor bug fixes/cleanups 2014-05-21 15:48:48 +00:00
tovars.cpp remove unused code (Windows related) 2013-07-08 18:21:10 +00:00
tovars.h remove unused code (Windows related) 2013-07-08 18:21:10 +00:00
underlin.cpp Major refactor of beam search, elimination of dead code, misc bug fixes, updates to Makefile.am, Changelog etc. 2013-09-23 15:26:50 +00:00
underlin.h remove unused code (Windows related) 2013-07-08 18:21:10 +00:00
wordseg.cpp NOP changes from static analysis in issue 1205 2014-08-12 16:09:12 -07:00
wordseg.h remove unused code (Windows related) 2013-07-08 18:21:10 +00:00
workingpartset.cpp Major refactor of beam search, elimination of dead code, misc bug fixes, updates to Makefile.am, Changelog etc. 2013-09-23 15:26:50 +00:00
workingpartset.h Changes to textord for 3.00 2009-07-11 02:39:56 +00:00