78 Commits

Author SHA1 Message Date
zdenop
fad9de4e1b fix issue 1217: GetThresholdedImage accesses possibly NULL thresholder_
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1113 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-31 21:21:37 +00:00
zdenop
36f3f76d64 fix tiff issue on windows
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1111 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-31 07:27:54 +00:00
zdenop@gmail.com
84cdcb32cc fixed windows build
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1110 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-26 06:48:58 +00:00
zdenop
ffe52737d5 check if input file exists
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1108 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-25 19:58:00 +00:00
theraysmith@gmail.com
25a8c7b720 Enabled streaming input and output of multi-page documents
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1105 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-21 15:46:21 +00:00
zdenop
44b0d0e28e addition to r1100
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1101 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-11 21:24:54 +00:00
zdenop
6051e40212 fix issue 1197
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1100 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-11 21:20:38 +00:00
zdenop
bdb912c186 escape input_file name in hOCR output - fix issue 1154
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1098 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-09 22:19:30 +00:00
theraysmith@gmail.com
45e106820f Fixed issue 1116
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1074 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-04-24 00:50:27 +00:00
theraysmith@gmail.com
2fcea93846 Fixed issues 1081-1090
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1046 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-04 02:23:18 +00:00
theraysmith@gmail.com
d11dc049e3 Fixed a lot of compiler/clang warnings
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1015 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-25 02:28:51 +00:00
theraysmith@gmail.com
1a487252f4 Fixed slow-down that was caused by upping MAX_NUM_CLASSES
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1013 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-24 21:12:35 +00:00
zdenop@gmail.com
71ae509354 fix for mingw32/g++ 4.8.1
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@998 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-22 08:10:15 +00:00
zdenop@gmail.com
ef3b1d936e fix mingw build issues
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@995 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-18 09:00:54 +00:00
zdenop@gmail.com
94d08567e1 fix vs2010 (and maybe vs2008) build
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@983 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-12 20:13:55 +00:00
theraysmith@gmail.com
91d2265429 More minor fixes from issues and cleanup
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@974 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-10 01:38:00 +00:00
theraysmith@gmail.com
f2ec85d1e1 Added PDF renderer
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@962 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-09 17:58:55 +00:00
zdenop
11f7eea7e1 fix tiff identification
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@934 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-06 13:25:42 +00:00
zdenop
fced05f419 identify all supported tiff version by leptonica
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@931 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-12-22 21:47:07 +00:00
zdenop
9de80e0a06 fix resource leaks - issues 1034, 1038, 1040. Thanks to Martin Ettl
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@920 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-12-13 22:13:52 +00:00
rajesh.katikam@gmail.com
b8d7a1d139 Fixed all the crashes observed on 24 bit and 8 bit images.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@919 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-12-10 10:52:54 +00:00
rajesh.katikam@gmail.com
983aaabaae Initial version of OpenCL support added.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@909 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-11-11 17:43:13 +00:00
zdenop@gmail.com
c7ba981e04 fix validity of hocr output of multipage image
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@908 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-11-10 22:00:54 +00:00
zdenop@gmail.com
e66d433907 fix issue 938: change tessdata-dir/datadir rules; implement --tessdata-dir option
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@907 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-11-10 20:59:11 +00:00
zdenop@gmail.com
77c1b41e4e fix svn:executable attribute, trailing spaces, version include
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@903 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-11-03 17:24:00 +00:00
theraysmith@gmail.com
88ea81c89e Added renderer to API
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@869 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-20 19:39:59 +00:00
zdenop@gmail.com
b5e16669e1 fix issue 946/reopen issue 903
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@865 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-07-25 15:54:30 +00:00
zdenop@gmail.com
b1fd75ccf9 amend r:862
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@863 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-07-14 14:11:16 +00:00
zdenop@gmail.com
c45bb08a6e check inputformat before getting number of pages
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@862 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-07-14 13:58:23 +00:00
zdenop@gmail.com
b5d3d66a68 remove unused code(gettext)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@859 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-07-07 16:39:13 +00:00
zdenop@gmail.com
4c16ff6a1f use leptonica for getting number of pages instead of own code
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@858 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-07-05 16:07:25 +00:00
zdenop@gmail.com
e5628e5e1a fix hOCR output - do not print empty words: issue 903
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@854 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-06-23 15:10:24 +00:00
zdenop@gmail.com
a6bee550e8 Add lang and dir attributes to each word in hOCR output (fix issue 878);
Unify usage of single quote in hOCR output
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@832 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-03-28 21:37:55 +00:00
zdenop@gmail.com
db52047420 fix issue 809: invalid hOCR output file on windows when input filename has non ascii chars.
Add release date to vs2008/doc/versions.html
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@828 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-02-23 15:01:21 +00:00
zdenop@gmail.com
9b2906c67e fix issue 800: Get rid of glob() for searching available languages
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@810 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-11-30 22:11:22 +00:00
zdenop@gmail.com
5d9fd5fb72 add word confidence info (x_wconf) to hocr output/fix issue 748
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@806 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-11-06 21:18:35 +00:00
zdenop@gmail.com
23f1d16037 fix for issue 346 / GetAvailableLanguagesAsVector
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@760 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-24 05:20:23 +00:00
theraysmith@gmail.com
fbf7968490 Fixed problem with blank pages
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@750 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-21 15:27:25 +00:00
zdenop@gmail.com
306a8216e1 fix creating box file from empty image (issue 516)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@737 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-08-03 22:32:17 +00:00
zdenop@gmail.com
c8eedb25a6 added ocr-capabilities for hocr conformity; XHTML 1.0 Transitional conformity; improved hocr output readability
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@729 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-05-28 20:44:23 +00:00
david.eger@gmail.com
6a9a3ddcb2 Zdeno pointed out that ocr_line (though not ocr_word) is actually in the hocr spec.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@728 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-05-27 23:58:09 +00:00
david.eger@gmail.com
d9d70919bb Conform to the hocr spec: hocr doesn't have ocr_word, but instead has ocrx_word.
Tested with ExactImage's hocr2pdf:
$ tesseract phototest.tif phototest hocr
$ hocr2pdf -i phototest.tif -o ./phototest.pdf < ./phototest.hocr
$ evince phototest.pdf

See: https://docs.google.com/document/preview?id=1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@726 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-05-25 17:36:25 +00:00
david.eger@gmail.com
eeeb4f513c Provide better paragraph segmentation without having to run fully automatic layout analysis.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@725 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-05-10 00:03:34 +00:00
zdenop@gmail.com
cd8de9157c change comments to doxygen block comments (api)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@716 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-30 21:24:12 +00:00
zdenop@gmail.com
d4d4b8aad8 improve autotools system (mingw+msys fix); implementation of --disable-tessdata-prefix
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@708 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-22 20:01:33 +00:00
david.eger@gmail.com
c2e84c4606 Fix two issues with GetHOCRText():
+ make it not seg-fault if called without calling SetInputName().
+ make it not leak memory (thank you valgrind)

http://code.google.com/p/tesseract-ocr/issues/detail?id=463
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@699 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-06 21:18:16 +00:00
zdenop@gmail.com
3b326532cc fix --enable-multiple-libraries; implement quiet mode (issue 580)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@691 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-03 11:48:59 +00:00
zdenop@gmail.com
e216adab43 fix configure.ac; unify identifiers (WIN32 vs _WIN32)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@688 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-02 17:31:24 +00:00
zdenop@gmail.com
49c4ce3183 fix for GRAPHICS_DISABLED build
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@686 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-01 22:43:51 +00:00
zdenop@gmail.com
df1cbdd7d3 fix for issue 463 (GetHOCRText segfaults unless SetInputName has been called first); removed declaration of GetLastInitLanguage
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@684 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-27 17:19:20 +00:00