Commit Graph

151 Commits

Author SHA1 Message Date
zdenop@gmail.com
b1fd75ccf9 amend r:862
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@863 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-07-14 14:11:16 +00:00
zdenop@gmail.com
c45bb08a6e check inputformat before getting number of pages
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@862 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-07-14 13:58:23 +00:00
zdenop@gmail.com
b5d3d66a68 remove unused code(gettext)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@859 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-07-07 16:39:13 +00:00
zdenop@gmail.com
4c16ff6a1f use leptonica for getting number of pages instead of own code
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@858 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-07-05 16:07:25 +00:00
zdenop@gmail.com
e5628e5e1a fix hOCR output - do not print empty words: issue 903
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@854 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-06-23 15:10:24 +00:00
zdenop@gmail.com
a6bee550e8 Add lang and dir attributes to each word in hOCR output (fix issue 878);
Unify usage of single quote in hOCR output 


git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@832 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-03-28 21:37:55 +00:00
zdenop@gmail.com
db52047420 fix issue 809: invalid hOCR output file on windows when input filename has non ascii chars.
Add release date to vs2008/doc/versions.html

git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@828 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-02-23 15:01:21 +00:00
zdenop@gmail.com
9b2906c67e fix issue 800: Get rid of glob() for searching available languages
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@810 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-11-30 22:11:22 +00:00
zdenop@gmail.com
5d9fd5fb72 add word confidence info (x_wconf) to hocr output/fix issue 748
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@806 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-11-06 21:18:35 +00:00
zdenop@gmail.com
23f1d16037 fix fox issue 346 / GetAvailableLanguagesAsVector
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@760 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-24 05:20:23 +00:00
theraysmith@gmail.com
fbf7968490 Fixed problem with blank pages
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@750 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-21 15:27:25 +00:00
zdenop@gmail.com
306a8216e1 fix creating box file from empty image (issue 516)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@737 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-08-03 22:32:17 +00:00
zdenop@gmail.com
c8eedb25a6 added ocr-capabilities for hocr conformity; XHTML 1.0 Transitional conformity; improved hocr output readability
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@729 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-05-28 20:44:23 +00:00
david.eger@gmail.com
6a9a3ddcb2 Zdeno pointed out that ocr_line (though not ocr_word) is actually in the hocr spec.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@728 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-05-27 23:58:09 +00:00
david.eger@gmail.com
d9d70919bb Conform to the hocr spec: hocr doesn't have ocr_word, but instead has ocrx_word.
Tested with ExactImage's hocr2pdf. 
$ tesseract phototest.tif phototest hocr
$ hocr2pdf -i phototest.tif -o ./phototest.pdf < ./phototest.hocr 
$ evince phototest.pdf 

See: https://docs.google.com/document/preview?id=1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0 



git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@726 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-05-25 17:36:25 +00:00
david.eger@gmail.com
eeeb4f513c Provide better paragraph segmentation without having to run fully
automatic layout analysis.



git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@725 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-05-10 00:03:34 +00:00
zdenop@gmail.com
cd8de9157c change comments to doxygen block comments (api)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@716 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-30 21:24:12 +00:00
zdenop@gmail.com
d4d4b8aad8 improve autools system (mingw+msys fix); implementation of --disable-tessdata-prefix
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@708 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-22 20:01:33 +00:00
david.eger@gmail.com
c2e84c4606 Fix two issues with GetHOCRText():
+ make it not seg-fault if called without calling SetInputName().
+ make it not leak memory (thank you valgrind)

http://code.google.com/p/tesseract-ocr/issues/detail?id=463



git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@699 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-06 21:18:16 +00:00
zdenop@gmail.com
3b326532cc fix --enable-multiple-libraries; implement quite mode (issue 580)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@691 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-03 11:48:59 +00:00
zdenop@gmail.com
e216adab43 fix configure.ac; unify identifiers (WIN32 vs _WIN32)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@688 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-02 17:31:24 +00:00
zdenop@gmail.com
49c4ce3183 fix for GRAPHICS_DISABLED build
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@686 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-01 22:43:51 +00:00
zdenop@gmail.com
df1cbdd7d3 fix for issue 463 (GetHOCRText segfaults unless SetInputName has been called first); removed declaration of GetLastInitLanguage
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@684 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-27 17:19:20 +00:00
zdenop@gmail.com
6ccab83bd6 fixing issue 628 (replacing __MSW32__ with _WIN32) and issue 614 (reverting "class DLLSYM STRING" to "class CCUTIL_API STRING")
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@677 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-19 21:48:45 +00:00
theraysmith@gmail.com
23dfabcab1 Cleaned up externally used namespace by removing includes from baseapi.h
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@657 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 03:14:16 +00:00
zdenop@gmail.com
67f47008c7 fixed "one lib" build on linux; runautoconf renamed to autogen.sh;
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@631 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-10-16 19:39:54 +00:00
zdenop@gmail.com
da41b96f7f removed check for libtiff - leptonica is required; cleanup #ifdef/#ifndef HAVE_LIBLEPT
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@624 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-30 06:34:41 +00:00
zdenop@gmail.com
7ec3dca968 show page 0 for multipage tiff;
Windows: use binary mode for fopen (issue 70);
autotools: fixed cutil/Makefile.am, improved tessdata/Makefile.am;

git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@604 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-11 21:42:13 +00:00
zdenop@gmail.com
1ad70ea8ff fixing issues 518 and 521
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@596 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-07-27 20:56:40 +00:00
zdenop@gmail.com
505c8dbece changed "xocr_word" to "ocrx_word" according hOCR spec
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@585 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-05-24 20:53:58 +00:00
theraysmith
c81483f714 Various fixes, including memory leak in fixspace, font labels on output, removed some annoying debug output, fixes to initialization of parameters, general cleanup, and added Hindi
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@566 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-21 21:43:04 +00:00
theraysmith
a3f30eb5c7 Deleted lots of dead code, including PBLOB
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@555 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-18 21:51:34 +00:00
theraysmith
f040994f51 Fixed closing meta element in hocr output
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@549 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-12-09 06:25:20 +00:00
theraysmith
a7db6dada9 Fix for linking with leptonica on Linux.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@548 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-12-09 01:40:39 +00:00
theraysmith
137f4806b6 Added sub/superscript, small/dropcap detection
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@547 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-12-09 01:32:20 +00:00
zdenop@gmail.com
c707b26d5f fixed VC++2008 Express build after last changes
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@543 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-30 12:46:41 +00:00
theraysmith
ef59841ebe Moved multipage code to BaseAPI and tidied up command line handling
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@532 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-30 00:58:30 +00:00
zdenop@gmail.com
4523ce9f7d 3.01 code from http://github.com/jimregan/tesseract-ocr with addaptions related to Linux and Windows (VC2008) compile process
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-23 18:34:14 +00:00
zdenop@gmail.com
7511d76315 fixed hocr to produce valid document (acording http://validator.w3.org/) - issue http://code.google.com/p/tesseract-ocr/issues/detail?id=401
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@525 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-17 20:03:58 +00:00
zdenop@gmail.com
fa4d4589cb fixed hocr (escape special special characters; thank to aizvorski) + hocr config)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@515 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-10-29 19:03:06 +00:00
zdenop@gmail.com
346da8c1e5 missing returns in nonvoid functions (thanks to rusnakp) issue 389;
corrected windows installation script - tesseract should be not run as start-up application;

git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@514 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-10-26 13:35:02 +00:00
joregan
e0b07948fc disabling gettext checks - not currently used, and something about disabling is causing subsequent autoconf checks to not run
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@492 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-09-30 16:27:39 +00:00
joregan
f2506871f9 move include of config_auto.h to not conflict with local types. Not finished
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@490 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-09-30 15:53:40 +00:00
joregan
5279e34296 GRAPHICS_ENABLED means ScrollView, but the correct #define was not being set
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@407 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-06-27 16:03:29 +00:00
joregan
c4118eb6cb change define
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@404 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-06-27 15:27:10 +00:00
joregan
8cd185d49f float casts within fabs() - partial patch from issue 304
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@374 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-27 01:45:47 +00:00
theraysmith
a5b4570180 Added page numbers to box files
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@352 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-20 23:06:35 +00:00
theraysmith
9815506429 Fixed issue 299
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@346 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-20 15:34:31 +00:00
theraysmith
e3e78b076b Fixed issue 263 with modified patch.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@333 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-19 18:35:40 +00:00
theraysmith
5d00a436d1 Fixed issue 257
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@325 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-17 21:15:14 +00:00
theraysmith
d745f3fd7d New files in api for 3.00
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@285 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-07-11 02:03:09 +00:00