tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-12 07:29:07 +08:00

Author	SHA1	Message	Date
Ray Smith	84920b92b3	Font and classifier output structure cleanup. Font recognition was poor because it forced a 1st and 2nd choice at the character level, whereas the total score for the correct font is often correct at the word level. The full set of fonts and scores is now propagated to the word recognizer, which can decide word-level fonts using the scores instead of simple votes. The change precipitated a cleanup of the output data structures for classifier results, eliminating ScoredClass and INT_RESULT_STRUCT, with a few extra elements going into UnicharRating, which is now used wherever possible. That added the extra complexity of 1-rating, due to a flip between "0 is good" and "0 is bad" for the internal classifier scores before they are converted to rating and certainty.	2015-05-12 17:24:34 -07:00
theraysmith@gmail.com	67f9af58b8	Removed dependence on IMAGE class git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@944 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2014-01-09 17:31:29 +00:00
theraysmith@gmail.com	4c3475ad2e	Fixed fmemopen portability problem git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@890 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2013-10-10 02:07:26 +00:00
theraysmith@gmail.com	4d514d5a60	Major refactor of beam search, elimination of dead code, misc bug fixes, updates to Makefile.am, Changelog etc. git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@878 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2013-09-23 15:26:50 +00:00
david.eger@gmail.com	0aadbd0169	Save BLOB_CHOICEs for alternate choices saved during segmentation search so we have them when trying to replace words with alternates in the bigram correction pass. git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@739 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2012-09-01 00:33:46 +00:00
david.eger@gmail.com	4f0ff358a7	Missing close bracket. git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@714 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2012-03-29 06:15:33 +00:00
david.eger@gmail.com	4ddb3e5941	Good moming, Good aftemoon. During our initial chopping for each word, pay attention to whether a dangerous ambiguity (like rn <-> m) would lead us to a dictionary word. If so, make sure that blob gets chopped so that we can evaluate said dictionary word during the segmentation search. Large accuracy improvement, especially on English printed books (~9%). git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@713 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2012-03-28 21:02:54 +00:00
david.eger@gmail.com	0d5e8b5cb6	Recording segmentation state for a choice at LogNewChoice() time was a bad idea: a VIABLE_CHOICE's Blob->NumChunks can be modified as we go by a call from Dict::LogNewSplit(). Relying on the auxiliary segmentation_state makes alt choices sometimes reference the wrong blobs. git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@711 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2012-03-28 20:11:57 +00:00
david.eger@gmail.com	018f192fc2	Abolish populate_unichars(), fixing seg fault reported in Debian: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=658634 git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@675 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2012-02-15 01:37:00 +00:00
theraysmith@gmail.com	fdd4ffe85e	Fixed endian bug in dawg reader; added word bigram correction. git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@649 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2012-02-02 02:56:18 +00:00
zdenop@gmail.com	4523ce9f7d	3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to the Linux and Windows (VC2008) compile process git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2010-11-23 18:34:14 +00:00
joregan	edf7e7694c	silence more useless warnings git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@432 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2010-07-21 15:11:19 +00:00
theraysmith	3a13d80d24	Changes to dict for 3.00 git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@293 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2009-07-11 02:20:33 +00:00
theraysmith	bea5e04b76	Fixed compilation with GRAPHICS_DISABLED git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@250 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2009-06-03 17:24:08 +00:00
theraysmith	520077bd41	Fixed name collision with jpeg library git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@164 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2008-04-22 00:42:51 +00:00
theraysmith	2a678305c6	Major internationalization improvements git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@133 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2008-02-01 00:21:49 +00:00
theraysmith	570af48b8b	Remaining changes for Unicodeization project git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@87 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2007-07-18 01:15:07 +00:00
theraysmith	bc769e29b2	Preparations for unicodization git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@32 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2007-05-16 00:44:44 +00:00
tmbdev	425d593ebe	top-skimming import from sf.net git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk/trunk@2 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2007-03-07 20:03:40 +00:00

19 Commits
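
The message of commit 84920b92b3 contrasts two ways of choosing a word-level font: counting per-character votes, and using the full set of per-character font scores. The following is a minimal sketch of that difference only, not the code from the commit. The type `FontScores` and the functions `BestKey`, `WordFontByVotes` and `WordFontByScores` are hypothetical names invented for illustration, and the sketch assumes integer scores where a higher value means a better match.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical type: one character's score for each candidate font.
// Assumption: a higher score means a better match.
using FontScores = std::map<std::string, int>;

// Returns the key with the largest value, or "" if the map is empty.
// Ties go to the alphabetically first key (std::map iterates in key order).
std::string BestKey(const std::map<std::string, int>& totals) {
  std::string best;
  bool found = false;
  int best_value = 0;
  for (const auto& entry : totals) {
    if (!found || entry.second > best_value) {
      best = entry.first;
      best_value = entry.second;
      found = true;
    }
  }
  return best;
}

// Character-level voting: each character votes for its top font only,
// so the margin by which a font won or lost is discarded.
std::string WordFontByVotes(const std::vector<FontScores>& chars) {
  std::map<std::string, int> votes;
  for (const FontScores& scores : chars) {
    const std::string top = BestKey(scores);
    if (!top.empty()) ++votes[top];
  }
  return BestKey(votes);
}

// Word-level scoring: sum every font's score across all characters,
// then pick the font with the largest total.
std::string WordFontByScores(const std::vector<FontScores>& chars) {
  std::map<std::string, int> totals;
  for (const FontScores& scores : chars) {
    for (const auto& entry : scores) totals[entry.first] += entry.second;
  }
  return BestKey(totals);
}
```

For a three-character word scored as {sans 49, serif 51}, {sans 48, serif 52}, {sans 90, serif 10}, voting picks "serif" (two narrow wins against one), while summing picks "sans" (187 against 113). This is the kind of case where a word-level total can disagree with character-level votes.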