Commit Graph

106 Commits

Author SHA1 Message Date
theraysmith@gmail.com
4d514d5a60 Major refactor of beam search, elimination of dead code, misc bug fixes, updates to Makefile.am, Changelog etc.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@878 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-23 15:26:50 +00:00
zdenop@gmail.com
10c1169d98 remove unused code (Windows related)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@860 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-07-08 18:21:10 +00:00
zdenop@gmail.com
e4c00773de fix typo (issue 908)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@844 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-05-16 20:42:02 +00:00
theraysmith@gmail.com
59d244b06e More fixes for GRAPHICS_DISABLED from Zdenko and Ray
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@757 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-22 00:59:31 +00:00
david.eger@gmail.com
0aadbd0169 Save BLOB_CHOICE s for alternate choices saved during segmentation
search so we have them when trying to replace words with alternates in
the bigram correction pass.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@739 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-01 00:33:46 +00:00
david.eger@gmail.com
56403c6dc3 Fix an issue where we sometimes leave a dangling outline->loop pointer
during chopping.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@721 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-04-17 00:02:52 +00:00
david.eger@gmail.com
4ddb3e5941 Good moming, Good aftemoon.
During our initial chopping for each word, pay attention to whether a
dangerous ambiguity (like rn <-> m) would lead us to a dictionary word.
If so, make sure that blob gets chopped so that we can evaluate said
dictionary word during the segmentation search.

Large accuracy improvement, especially on English printed books (~9%).
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@713 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-28 21:02:54 +00:00
david.eger@gmail.com
0d5e8b5cb6 Recording segmentation state for a choice at LogNewChoice() time was a
bad idea -- a VIABLE_CHOICE's Blob->NumChunks can be modified as we go
by a call from Dict::LogNewSplit().  Relying on the auxiliary
segmentation_state makes alt choices sometimes reference the wrong
blobs.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@711 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-28 20:11:57 +00:00
zdenop@gmail.com
d4d4b8aad8 improve autotools system (mingw+msys fix); implementation of --disable-tessdata-prefix
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@708 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-22 20:01:33 +00:00
zdenop@gmail.com
97e19443a3 install only necessary headers, fix uninstall
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@692 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-03 13:22:51 +00:00
zdenop@gmail.com
30a70142a0 visibility - autotools part (./configure --enable-visibility)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@690 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-02 23:51:33 +00:00
david.eger@gmail.com
018f192fc2 Abolish populate_unichars(), fixing seg fault reported in Debian:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=658634
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@675 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-15 01:37:00 +00:00
theraysmith@gmail.com
01026af5a2 Refactored top-level word recognition module, Blamer module added for error analysis, Added word bigram correction
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@652 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 03:01:38 +00:00
zdenop@gmail.com
67f47008c7 fixed "one lib" build on linux; runautoconf renamed to autogen.sh;
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@631 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-10-16 19:39:54 +00:00
max.markin@gmail.com
bf3ae643e5 Fixed some warnings to make the VC2010 compiler happy:
C4355: 'this' : used in base member initializer list
C4099: type name first seen using 'class' now seen using 'struct'
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@630 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-10-15 22:26:34 +00:00
joregan@gmail.com
bf4a09d72a make single/multiple libraries optional -- this needs testing!!!
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@623 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-29 21:28:28 +00:00
theraysmith@gmail.com
030aae9896 Removed debugwin.cpp, fixing issue 448
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@612 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-18 16:45:01 +00:00
theraysmith@gmail.com
d5d15f32d7 Deleted Makefile.in from svn
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@606 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-18 16:32:44 +00:00
zdenop@gmail.com
7ec3dca968 show page 0 for multipage tiff;
Windows: use binary mode for fopen (issue 70);
autotools: fixed cutil/Makefile.am, improved tessdata/Makefile.am;
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@604 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-11 21:42:13 +00:00
theraysmith
f7445867f9 Various fixes, including memory leak in fixspace, font labels on output, removed some annoying debug output, fixes to initialization of parameters, general cleanup, and added Hindi
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@575 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-21 21:49:31 +00:00
theraysmith
ec39052274 Deleted lots of dead code, including PBLOB
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@563 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-18 21:54:54 +00:00
theraysmith
23b29fbe9a Impact of DENORM rewrite + removal of NEWDELETE
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@535 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-30 01:02:28 +00:00
zdenop@gmail.com
4523ce9f7d 3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to Linux and Windows (VC2008) compile process
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-23 18:34:14 +00:00
zdenop@gmail.com
da06ed4075 addition to Revision: 506 ;-)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@507 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-10-06 21:48:42 +00:00
joregan
e0b07948fc disabling gettext checks - not currently used, and something about disabling is causing subsequent autoconf checks to not run
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@492 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-09-30 16:27:39 +00:00
joregan
f2506871f9 move include of config_auto.h to not conflict with local types. Not finished
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@490 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-09-30 15:53:40 +00:00
joregan
a18816f839 partial merge of doxygen branch (stuff without conflicts, basically)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@441 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-27 13:23:23 +00:00
joregan
69d6d35f28 patch for issue 304 from max.markin
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@422 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-19 02:32:21 +00:00
joregan
a301f9a5c7 start of i18n
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@418 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-19 01:59:13 +00:00
joregan
5279e34296 GRAPHICS_ENABLED means ScrollView, but the correct #define was not being set
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@407 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-06-27 16:03:29 +00:00
joregan
00f6c5d371 more
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@405 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-06-27 15:29:01 +00:00
joregan
5c8ad7ee72 add config_auto.h anywhere #ifndef GRAPHICS_DISABLED is used
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@384 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-28 12:03:45 +00:00
joregan
ddcb98565a update generated autoconf/make stuff
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@369 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-26 14:21:37 +00:00
joregan
34d8258049 use libtool
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@368 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-26 14:20:20 +00:00
theraysmith
fad96e60b1 Fixed issue 237: compilability on other linux variant
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@317 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-08-20 22:40:04 +00:00
theraysmith
fea38ee706 Misc root changes for 3.00
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@309 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-07-11 03:05:57 +00:00
theraysmith
ff17d40071 More Changes to wordrec for 3.00
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@307 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-07-11 02:51:34 +00:00
theraysmith
b47efd2cc4 Changes to wordrec for 3.00
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@304 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-07-11 02:46:01 +00:00
theraysmith
bea5e04b76 Fixed compilation with GRAPHICS_DISABLED
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@250 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-06-03 17:24:08 +00:00
theraysmith
f3060abf71 Automake changes for potential RC of 2.04
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@248 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-06-03 02:50:54 +00:00
theraysmith
74c3f2d4af Fixed type of bit vector
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@234 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-06-02 21:57:23 +00:00
tmbdev
a978ccb68f changed runautoconf instructions
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@183 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-08-18 20:18:21 +00:00
theraysmith
f04ff6145c Fixed name collision with jpeg library
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@159 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-04-22 00:35:16 +00:00
theraysmith
1faba52350 Misc fixes including safe init/end
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@151 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-02-01 00:52:42 +00:00
theraysmith
0e6f803ebe Updated graphics output for new java-based display
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@150 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-02-01 00:51:47 +00:00
theraysmith
166c867d84 Removed some compiler warnings on operator precedence
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@129 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-02-01 00:05:57 +00:00
theraysmith
6ae6c0a042 Made some preliminary changes for improving xheights
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@107 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-08-30 18:20:10 +00:00
theraysmith
f382fb56f5 Fixed various internationalization issues, mostly for training
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@106 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-08-30 18:18:35 +00:00
theraysmith
570af48b8b Remaining changes for Unicodeization project
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@87 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-07-18 01:15:07 +00:00
theraysmith
2f4a43b419 Improved consistency of results from floating point calculations
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@79 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-07-18 00:55:02 +00:00
theraysmith
02d760759f Making release 1.04
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@62 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-05-17 00:48:27 +00:00
theraysmith
bfd79a970e Fixed name collisions mostly with stl
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@37 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-05-16 01:23:42 +00:00
tmbdev
6da5fdb8d0 Added Makefile.in files back in to permit building from Subversion without installed autoconf/automake tools.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@29 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-04-10 23:15:48 +00:00
tmbdev
7fa676659b changed configuration to install header files in $(includedir)/tesseract
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@18 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-03-31 00:37:26 +00:00
tmbdev
9f2b3b7154 changed autoconf/automake system to use standard install paths; removed auto-generated files from repository (use runautoconf instead)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@16 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-03-30 23:53:34 +00:00
tmbdev
425d593ebe top-skimming import from sf.net
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk/trunk@2 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-03-07 20:03:40 +00:00