Commit Graph

139 Commits

Author SHA1 Message Date
theraysmith@gmail.com
3a998fe7ac Added Right-to-left/Bidi capability in the output iterators for Hebrew/Arabic, Added paragraph detection in layout analysis/post OCR, Fixed inconsistent xheight during training and over-chopping, Added simultaneous multi-language capability, Refactored top-level word recognition module, Fixed problems with internally scaled images
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@651 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 02:59:49 +00:00
theraysmith@gmail.com
ac014eb27a Added experimental equation detector
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@646 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 02:50:01 +00:00
theraysmith@gmail.com
ef786ad29b Moved ResultIterator/PageIterator to ccmain
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@645 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 02:47:59 +00:00
zdenop@gmail.com
67f47008c7 fixed "one lib" build on linux; runautoconf renamed to autogen.sh;
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@631 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-10-16 19:39:54 +00:00
zdenop@gmail.com
da41b96f7f removed check for libtiff - leptonica is required; cleanup #ifdef/#ifndef HAVE_LIBLEPT
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@624 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-30 06:34:41 +00:00
joregan@gmail.com
bf4a09d72a make single/multiple libraries optional -- this needs testing!!!
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@623 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-29 21:28:28 +00:00
theraysmith@gmail.com
4575c52ff5 Removed debugwin.cpp, fixing issue 448
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@613 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-18 16:45:59 +00:00
theraysmith@gmail.com
d5d15f32d7 Deleted Makefile.in from svn
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@606 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-18 16:32:44 +00:00
zdenop@gmail.com
9b7375edd6 MinGW portability solved + some code cleanup (based on cpplint)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@605 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-15 19:28:25 +00:00
zdenop@gmail.com
7ec3dca968 show page 0 for multipage tiff;
Windows: use binary mode for fopen (issue 70);
autotools: fixed cutil/Makefile.am, improved tessdata/Makefile.am;

git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@604 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-11 21:42:13 +00:00
zdenop@gmail.com
4abdfdb8fe moved ccstruct/callcpp.cpp to cutil (to header file - see issue 414); moved vs2008/include/stdint.h to vs2008/port/stdint.h so we can use vs2008/include also for mingw; removed unused tessembedded.*
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@603 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-11 14:04:20 +00:00
theraysmith
3e8c0bc228 Various fixes, including memory leak in fixspace, font labels on output, removed some annoying debug output, fixes to initialization of parameters, general cleanup, and added Hindi
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@567 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-21 21:44:05 +00:00
theraysmith
7121e51422 Deleted lots of dead code, including PBLOB
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@556 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-18 21:52:08 +00:00
theraysmith
137f4806b6 Added sub/superscript, small/dropcap detection
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@547 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-12-09 01:32:20 +00:00
theraysmith
c8465252e4 Rewrite of DENORM
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@538 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-30 01:05:48 +00:00
zdenop@gmail.com
4523ce9f7d 3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to Linux and Windows (VC2008) compile process
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-23 18:34:14 +00:00
zdenop@gmail.com
282aa13975 *.vcproj moved to vs2008/ (bin/ and bin.dbg/ will be in vs2008/)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@506 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-10-06 21:38:19 +00:00
joregan
e0b07948fc disabling gettext checks - not currently used, and something about disabling is causing subsequent autoconf checks to not run
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@492 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-09-30 16:27:39 +00:00
joregan
f2506871f9 move include of config_auto.h to not conflict with local types. Not finished
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@490 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-09-30 15:53:40 +00:00
joregan
9943e96163 fix issue 359 - patch from yukihiro.nakadaira
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@481 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-09-30 01:02:56 +00:00
zdenop@gmail.com
8e2018d9ec
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@473 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-09-29 21:49:36 +00:00
joregan
2d7821506d small tweaks to doxygen
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@451 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-08-12 18:55:59 +00:00
joregan
08defee46e more doxygen
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@450 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-08-10 19:20:11 +00:00
joregan
575b2de48a doxygen
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@446 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-28 00:38:09 +00:00
joregan
b6e3cbea5a more doxygen
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@445 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-27 16:39:45 +00:00
joregan
924f231808 more doxygen
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@442 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-27 14:58:33 +00:00
joregan
a18816f839 partial merge of doxygen branch (stuff without conflicts, basically)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@441 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-27 13:23:23 +00:00
joregan
4acaabdb62 make some static
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@440 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-26 18:21:10 +00:00
joregan
7e8bd73aea some casts to get rid of persistent warnings
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@435 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-21 21:19:53 +00:00
joregan
cd96d8ede5 more warnings
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@434 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-21 18:11:00 +00:00
joregan
edf7e7694c silence more useless warnings
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@432 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-21 15:11:19 +00:00
joregan
522a8ccfc4 fix issue 332
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@429 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-20 10:31:49 +00:00
joregan
54e610e7c0 mark 2 functions static (start to cut down on the export bloat)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@428 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-19 23:29:17 +00:00
joregan
7fee1ed025 this code was so illegible that I *must* replace it *now*
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@427 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-19 22:38:39 +00:00
joregan
69d6d35f28 patch for issue 304 from max.markin
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@422 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-19 02:32:21 +00:00
joregan
a301f9a5c7 start of i18n
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@418 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-19 01:59:13 +00:00
joregan
5279e34296 GRAPHICS_ENABLED means ScrollView, but the correct #define was not being set
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@407 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-06-27 16:03:29 +00:00
joregan
00f6c5d371 more
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@405 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-06-27 15:29:01 +00:00
joregan
95db341728 update comment about format
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@398 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-06-05 11:52:17 +00:00
joregan
cfcd9a1b5a make cppcheck happy
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@388 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-30 03:16:54 +00:00
joregan
5c8ad7ee72 add config_auto.h anywhere #ifndef GRAPHICS_DISABLED is used
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@384 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-28 12:03:45 +00:00
joregan
ddcb98565a update generated autoconf/make stuff
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@369 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-26 14:21:37 +00:00
joregan
34d8258049 use libtool
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@368 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-26 14:20:20 +00:00
joregan
38a6b18a5f disable MSVC warning C4244 in a number of places to cut down the noise
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@363 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-26 10:22:27 +00:00
theraysmith
8d654e7476 Fixed issue 243, ungraded helpers, genericvector
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@340 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-19 22:35:35 +00:00
theraysmith
57d669ff84 Fixed issue 229: lack of bits per sample
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@316 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-08-20 22:30:21 +00:00
theraysmith
9e67cb0773 More accidental files
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@290 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-07-11 02:15:47 +00:00
theraysmith
eb0ab3ed02 Deleting files from ccstruct added by mistake to ccmain
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@288 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-07-11 02:12:51 +00:00
theraysmith
96e8b51feb More changes to ccmain for 3.00
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@287 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-07-11 02:07:25 +00:00
theraysmith
109d1c8f21 Some changes in ccmain for 3.00
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@286 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-07-11 02:03:51 +00:00
theraysmith
2ac934453f Improved box accuracy on failed blobs
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@270 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-06-30 01:48:21 +00:00
theraysmith
bea5e04b76 Fixed compilation with GRAPHICS_DISABLED
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@250 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-06-03 17:24:08 +00:00
theraysmith
f3060abf71 Automake changes for potential RC of 2.04
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@248 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-06-03 02:50:54 +00:00
theraysmith
e4b9281726 Fixed output of tprintf for windows
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@235 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-06-02 21:59:39 +00:00
theraysmith
51ed03368d Fixes to lists so an empty constructor is not needed + reenable debugging
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@207 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-12-30 18:15:44 +00:00
theraysmith
cb3b9b492f Fixed tiffio problems with 32 bit images, issue 160 and duplicates
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@204 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-12-24 01:02:14 +00:00
tmbdev
a978ccb68f changed runautoconf instructions
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@183 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-08-18 20:18:21 +00:00
mezhirov
3f218cd158 Bugfix (usage of bounding_union() changed)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@169 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-04-29 16:54:43 +00:00
theraysmith
f3e67dd89b Improved autoconf to find leptonica headers if present
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@168 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-04-22 17:34:42 +00:00
theraysmith
3cf46f21d4 Fixed stupid crash error in 2.02
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@167 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-04-22 15:42:11 +00:00
mezhirov
a4d75230fc Converted 8 spaces to tabs in two Makefile.am-s.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@166 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-04-22 14:49:14 +00:00
theraysmith
7870d67c21 Fixed name collision with jpeg library
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@157 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-04-22 00:32:14 +00:00
theraysmith
10265fb9cc Updated graphics output for new java-based display
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@136 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-02-01 00:33:18 +00:00
theraysmith
d543e8c2bc added leptonica support and additional interfaces
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@135 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-02-01 00:28:18 +00:00
theraysmith
830a2f54b9 Removed some compiler warnings on operator precedence
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@131 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-02-01 00:13:28 +00:00
theraysmith
6b5e0c4046 Made some major classifier and clustering improvements
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@130 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-02-01 00:07:59 +00:00
theraysmith
dd18aea052 Added multi-page tiff capability
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@128 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-02-01 00:00:46 +00:00
mezhirov
7bb68d2f20 got rid of home-made bbox functions
bug(?) fix

git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@124 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-10-08 12:58:53 +00:00
tmbdev
46123802d1 added potential new APIs for communicating page segmentation information and performing line recognition with baseline data; Ray will think about implementing these
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@122 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-08-31 23:13:00 +00:00
theraysmith
981407d6fa Fixed the code added for ocropus
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@111 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-08-30 18:25:50 +00:00
theraysmith
b60c6065e3 Autoconf changes for 2.01
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@110 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-08-30 18:25:18 +00:00
theraysmith
6ae6c0a042 Made some preliminary changes for improving xheights
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@107 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-08-30 18:20:10 +00:00
theraysmith
f382fb56f5 Fixed various internationalization issues, mostly for training
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@106 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-08-30 18:18:35 +00:00
mezhirov
024c9f49c0 This is the first draft of Tesseract API that is used by Ocropus.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@103 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-08-22 13:17:45 +00:00
theraysmith
6f6a5f9767 More new files for v2.00
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@89 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-07-18 01:30:21 +00:00
theraysmith
570af48b8b Remaining changes for Unicodeization project
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@87 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-07-18 01:15:07 +00:00
theraysmith
627368df42 API/output changes to produce unlv-style latin-1 output and test scripts
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@86 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-07-18 01:11:18 +00:00
theraysmith
4df1016692 Automake changes for version 2.00.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@84 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-07-18 01:04:56 +00:00
theraysmith
0d9fa6a040 Fixed portability problems with VC++ 6 and VC++ express.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@83 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-07-18 01:01:50 +00:00
theraysmith
1943de9aa9 Fixed the extern C mismatches properly.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@82 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-07-18 01:00:54 +00:00
theraysmith
02d760759f Making release 1.04
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@62 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-05-17 00:48:27 +00:00
theraysmith
974bfda143 Misc improvements
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@54 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-05-16 01:44:02 +00:00
theraysmith
0a53f8c5bf Preparations for unicodization
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@34 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-05-16 01:18:59 +00:00
theraysmith
4dffd5442c Added windows dll from Jetsoft
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@33 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-05-16 01:18:28 +00:00
mezhirov
a9045a20e2 Fixed c/c++ linking (patch by Aaron Digulla)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@30 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-04-13 17:43:37 +00:00
tmbdev
6da5fdb8d0 Added Makefile.in files back in to permit building from Subversion without installed autoconf/automake tools.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@29 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-04-10 23:15:48 +00:00
tmbdev
7fa676659b changed configuration to install header files in $(includedir)/tesseract
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@18 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-03-31 00:37:26 +00:00
tmbdev
9f2b3b7154 changed autoconf/automake system to use standard install paths; removed auto-generated files from repository (use runautoconf instead)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@16 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-03-30 23:53:34 +00:00
tmbdev
425d593ebe top-skimming import from sf.net
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk/trunk@2 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-03-07 20:03:40 +00:00