Commit Graph

11 Commits

Author SHA1 Message Date
Stefan Weil
00a4e06be9 wordrec: Fix typos in comments
All of them were found by codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-09-14 22:23:48 +02:00
Ray Smith
25d0968d09 Major refactor to improve speed on difficult images, especially when running
a heap checker.
SEAM and SPLIT have been begging for a refactor for a *LONG* time.
This change does most of the work of turning them into proper classes:
  Moved relevant code into SEAM/SPLIT/TBLOB/EDGEPT etc from global helper functions.
  Made the splits full data members of SEAM in an array instead of 3 separate pointers.
    This greatly reduces the amount of new/delete happening in the chopper, which is the main goal.
  Deleted redundant files: olutil.*, makechop.*
  Brought other code into SEAM in order to keep its data members private with only priority having accessors.
2015-05-12 14:59:14 -07:00
theraysmith@gmail.com
dbf6197471 Major refactor of control.cpp to enable line recognition
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1147 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-11 23:23:06 +00:00
theraysmith@gmail.com
da20cff7ae Fixed issue 1056
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@975 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-10 17:24:07 +00:00
theraysmith@gmail.com
4d514d5a60 Major refactor of beam search, elimination of dead code, misc bug fixes, updates to Makefile.am, Changelog etc.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@878 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-23 15:26:50 +00:00
david.eger@gmail.com
0aadbd0169 Save BLOB_CHOICEs for alternate choices saved during segmentation
search so we have them when trying to replace words with alternates in
the bigram correction pass.

git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@739 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-01 00:33:46 +00:00
david.eger@gmail.com
4ddb3e5941 Good moming, Good aftemoon.
During our initial chopping for each word, pay attention to whether a
dangerous ambiguity (like rn <-> m) would lead us to a dictionary word.
If so, make sure that blob gets chopped so that we can evaluate said
dictionary word during the segmentation search.

Large accuracy improvement, especially on English printed books (~9%).

git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@713 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-28 21:02:54 +00:00
theraysmith@gmail.com
01026af5a2 Refactored top-level word recognition module, Blamer module added for error analysis, Added word bigram correction
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@652 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 03:01:38 +00:00
theraysmith
23b29fbe9a Impact of DENORM rewrite + removal of NEWDELETE
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@535 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-30 01:02:28 +00:00
zdenop@gmail.com
4523ce9f7d 3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to Linux and Windows (VC2008) compile process
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-23 18:34:14 +00:00
theraysmith
ff17d40071 More Changes to wordrec for 3.00
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@307 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-07-11 02:51:34 +00:00