Ray Smith
84920b92b3
Font and classifier output structure cleanup.
...
Font recognition was poor because a 1st and 2nd choice were forced at
the character level, whereas the correct font often wins on total score
at the word level. The full set of fonts and scores is now propagated
to the word recognizer, which can decide word-level fonts using the
scores instead of simple votes.
This change precipitated a cleanup of the output data structures for
classifier results, eliminating ScoredClass and INT_RESULT_STRUCT, with a
few extra elements going into UnicharRating, which is now used wherever
possible. That added the extra complexity of 1-rating, because the
internal classifier scores flip between "0 is good" and "0 is bad"
before they are converted to rating and certainty.
2015-05-12 17:24:34 -07:00
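The difference between per-character votes and word-level score totals described in the commit above can be illustrated with a small sketch. This is not Tesseract's code: the function names, the `(font, score)` pair layout, and the "higher is better" score convention are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical input layout: a word is a list of characters, and each
# character is a list of (font_name, score) pairs. Higher score = better.
# None of these names come from Tesseract itself.


def word_font_by_votes(chars):
    """Each character votes for its single best font; the majority wins."""
    votes = Counter(
        max(fonts, key=lambda pair: pair[1])[0] for fonts in chars if fonts
    )
    return votes.most_common(1)[0][0] if votes else None


def word_font_by_scores(chars):
    """Sum every font's score across the whole word; the largest total wins."""
    totals = defaultdict(float)
    for fonts in chars:
        for font, score in fonts:
            totals[font] += score
    return max(totals, key=totals.get) if totals else None
```

With two characters that narrowly prefer one font and a third that strongly prefers another, the two functions disagree, which is the kind of case where keeping the full score set until the word level can matter.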
Ray Smith
3c21c14949
Fixed issue 1245
2014-08-13 18:51:28 -07:00
Ray Smith
736d327473
NOP changes from static analysis in issue 1205
2014-08-12 16:09:12 -07:00
zdenop
ee73e3b107
fix issue 123: user-words (and user-patterns) file specified by command line
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1093 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-04 21:11:00 +00:00
theraysmith@gmail.com
07ca24aeaf
Removed upper limit on trie size, fixing issue 1020.
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1044 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-03 19:18:23 +00:00
theraysmith@gmail.com
d11dc049e3
Fixed a lot of compiler/clang warnings
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1015 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-25 02:28:51 +00:00
theraysmith@gmail.com
60b4f8bc88
Fixed issue 743
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@978 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-10 18:25:46 +00:00
theraysmith@gmail.com
67f9af58b8
Removed dependence on IMAGE class
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@944 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-09 17:31:29 +00:00
theraysmith@gmail.com
7ec4fd7a56
Refactored control functions to enable parallel blob classification
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@904 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-11-08 20:30:56 +00:00
zdenop@gmail.com
53a3e0f88a
fix issue 755; add example config files from tesseract manpage
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@894 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-10-20 20:20:10 +00:00
theraysmith@gmail.com
4c3475ad2e
Fixed fmemopen portability problem
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@890 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-10-10 02:07:26 +00:00
theraysmith@gmail.com
4d514d5a60
Major refactor of beam search, elimination of dead code, misc bug fixes, updates to Makefile.am, Changelog etc.
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@878 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-23 15:26:50 +00:00
david.eger@gmail.com
0aadbd0169
Save BLOB_CHOICE s for alternate choices saved during segmentation
...
search so we have them when trying to replace words with alternates in
the bigram correction pass.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@739 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-01 00:33:46 +00:00
david.eger@gmail.com
4f0ff358a7
Missing close bracket.
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@714 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-29 06:15:33 +00:00
david.eger@gmail.com
4ddb3e5941
Good moming, Good aftemoon.
...
During our initial chopping for each word, pay attention to whether a
dangerous ambiguity (like rn <-> m) would lead us to a dictionary word.
If so, make sure that blob gets chopped so that we can evaluate said
dictionary word during the segmentation search.
Large accuracy improvement, especially on English printed books (~9%).
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@713 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-28 21:02:54 +00:00
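The check described in the commit above (does a dangerous ambiguity such as rn <-> m lead to a dictionary word?) can be sketched at the string level. Tesseract itself works on blobs during chopping, so this is only an analogy; the ambiguity table, the function name, and the set-based dictionary are assumptions made for the example.

```python
# Hypothetical table of visually confusable sequences (not Tesseract's
# actual ambiguity data).
AMBIGUITIES = [("rn", "m"), ("m", "rn")]


def ambiguity_hits(word, dictionary, ambiguities=AMBIGUITIES):
    """Find single substitutions that turn `word` into a dictionary word.

    Returns a list of (index, source, replacement, candidate) tuples, where
    `index` is the position in `word` at which `source` was replaced.
    """
    hits = []
    for src, dst in ambiguities:
        start = word.find(src)
        while start != -1:
            candidate = word[:start] + dst + word[start + len(src):]
            if candidate in dictionary:
                hits.append((start, src, dst, candidate))
            start = word.find(src, start + 1)
    return hits
```

A non-empty result marks the position that deserves a closer look; in the commit's terms, that is the blob to chop so the dictionary word can be evaluated during the segmentation search.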
david.eger@gmail.com
0d5e8b5cb6
Recording segmentation state for a choice at LogNewChoice() time was a
...
bad idea -- a VIABLE_CHOICE's Blob->NumChunks can be modified as we go
by a call from Dict::LogNewSplit(). Relying on the auxiliary
segmentation_state makes alt choices sometimes reference the wrong
blobs.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@711 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-28 20:11:57 +00:00
zdenop@gmail.com
d4d4b8aad8
improve autotools system (mingw+msys fix); implementation of --disable-tessdata-prefix
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@708 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-22 20:01:33 +00:00
zdenop@gmail.com
1009a6e2f0
fopen() should use binary mode (issue 70)
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@704 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-11 12:41:17 +00:00
zdenop@gmail.com
97e19443a3
install only necessary headers, fix uninstall
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@692 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-03 13:22:51 +00:00
zdenop@gmail.com
30a70142a0
visibility - autotools part (./configure --enable-visibility)
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@690 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-02 23:51:33 +00:00
zdenop@gmail.com
6ccab83bd6
fixing issue 628 (replacing __MSW32__ with _WIN32) and issue 614 (reverting "class DLLSYM STRING" to "class CCUTIL_API STRING")
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@677 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-19 21:48:45 +00:00
david.eger@gmail.com
018f192fc2
Abolish populate_unichars(), fixing seg fault reported in Debian:
...
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=658634
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@675 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-15 01:37:00 +00:00
theraysmith@gmail.com
fdd4ffe85e
Fixed endian bug in dawg reader, Added word bigram correction,
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@649 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 02:56:18 +00:00
zdenop@gmail.com
67f47008c7
fixed "one lib" build on linux; runautoconf renamed to autogen.sh;
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@631 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-10-16 19:39:54 +00:00
joregan@gmail.com
bf4a09d72a
make single/multiple libraries optional -- this needs testing!!!
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@623 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-29 21:28:28 +00:00
theraysmith@gmail.com
d5d15f32d7
Deleted Makefile.in from svn
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@606 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-18 16:32:44 +00:00
zdenop@gmail.com
9b7375edd6
MinGW portability solved + some code cleanup (based on cpplint)
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@605 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-15 19:28:25 +00:00
theraysmith
664b84b3c8
Various fixes, including memory leak in fixspace, font labels on output, removed some annoying debug output, fixes to initialization of parameters, general cleanup, and added Hindi
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@571 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-21 21:46:35 +00:00
theraysmith
96ca745384
Deleted lots of dead code, including PBLOB
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@565 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-18 22:14:53 +00:00
theraysmith
7cd3c74419
Deleted lots of dead code, including PBLOB
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@560 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-18 21:53:35 +00:00
theraysmith
b98c922391
Fixed problem with empty dawgs
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@537 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-30 01:04:02 +00:00
zdenop@gmail.com
4523ce9f7d
3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to the Linux and Windows (VC2008) compile process
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-23 18:34:14 +00:00
zdenop@gmail.com
282aa13975
*.vcproj moved to vs2008/ (bin/ and bin.dbg/ will be in vs2008/)
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@506 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-10-06 21:38:19 +00:00
zdenop@gmail.com
3964660093
update of VC++ project file to recent changes
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@495 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-09-30 21:20:00 +00:00
joregan
e0b07948fc
disabling gettext checks - not currently used, and something about disabling is causing subsequent autoconf checks to not run
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@492 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-09-30 16:27:39 +00:00
joregan
9c53d54fe3
max.markin's patch for issue 345
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@477 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-09-29 23:54:18 +00:00
joregan
69f39d4bf5
fix for issue 341, thanks to max.markin
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@454 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-08-19 19:17:06 +00:00
joregan
75676cd644
doxygen
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@449 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-08-02 00:05:57 +00:00
joregan
d7924dd824
http://groups.google.com/group/tesseract-ocr/msg/16597e4f7725dfe1
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@448 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-08-01 16:24:31 +00:00
joregan
a18816f839
partial merge of doxygen branch (stuff without conflicts, basically)
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@441 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-27 13:23:23 +00:00
joregan
7e8bd73aea
some casts to get rid of persistent warnings
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@435 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-21 21:19:53 +00:00
joregan
cd96d8ede5
more warnings
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@434 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-21 18:11:00 +00:00
joregan
edf7e7694c
silence more useless warnings
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@432 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-21 15:11:19 +00:00
joregan
69d6d35f28
patch for issue 304 from max.markin
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@422 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-19 02:32:21 +00:00
joregan
a301f9a5c7
start of i18n
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@418 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-19 01:59:13 +00:00
joregan
ddcb98565a
update generated autoconf/make stuff
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@369 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-26 14:21:37 +00:00
joregan
34d8258049
use libtool
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@368 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-26 14:20:20 +00:00
theraysmith
aea5be1995
Fixed issue 272
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@335 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-19 18:48:59 +00:00
theraysmith
f01a33ae96
Fixed issue 260
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@326 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-17 21:19:34 +00:00
theraysmith
3a13d80d24
Changes to dict for 3.00
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@293 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-07-11 02:20:33 +00:00