Commit Graph

89 Commits

Author SHA1 Message Date
theraysmith@gmail.com
605fd7488b Fixed relative-to-executable tessdata location, while allowing for addition of terminating /
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@774 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-10-09 00:41:08 +00:00
zdenop@gmail.com
ceff3288d7 fix issue 764...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@768 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-27 08:43:55 +00:00
zdenop@gmail.com
fb91759cdc fix issue 764 and clean tabulators, trim trailing spaces...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@767 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-27 08:24:46 +00:00
zdenop@gmail.com
23f1d16037 fix for issue 346 / GetAvailableLanguagesAsVector
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@760 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-24 05:20:23 +00:00
zdenop@gmail.com
dc8bd4682b C-API (fix issue 362)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@759 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-24 05:14:11 +00:00
theraysmith@gmail.com
fbf7968490 Fixed problem with blank pages
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@750 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-21 15:27:25 +00:00
zdenop@gmail.com
2a57976c41 - fix msys build (missing -lws2_32 for library)
- remove old debian leptonica package

git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@738 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-08-25 19:53:41 +00:00
zdenop@gmail.com
306a8216e1 fix creating box file from empty image (issue 516)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@737 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-08-03 22:32:17 +00:00
zdenop@gmail.com
c8eedb25a6 added ocr-capabilities for hocr conformity; XHTML 1.0 Transitional conformity; improved hocr output readability
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@729 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-05-28 20:44:23 +00:00
david.eger@gmail.com
6a9a3ddcb2 Zdeno pointed out that ocr_line (though not ocr_word) is actually in the hocr spec.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@728 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-05-27 23:58:09 +00:00
david.eger@gmail.com
d9d70919bb Conform to the hocr spec: hocr doesn't have ocr_word, but instead has ocrx_word.
Tested with ExactImage's hocr2pdf.
$ tesseract phototest.tif phototest hocr
$ hocr2pdf -i phototest.tif -o ./phototest.pdf < ./phototest.hocr
$ evince phototest.pdf

See: https://docs.google.com/document/preview?id=1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0

git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@726 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-05-25 17:36:25 +00:00
david.eger@gmail.com
eeeb4f513c Provide better paragraph segmentation without having to run fully
automatic layout analysis.

git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@725 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-05-10 00:03:34 +00:00
zdenop@gmail.com
aa14e8b212 fix Mingw shared build
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@718 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-04-02 12:14:37 +00:00
zdenop@gmail.com
cd8de9157c change comments to doxygen block comments (api)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@716 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-30 21:24:12 +00:00
zdenop@gmail.com
ee44165d3d improve doxygen config; fix doxygen warnings for baseapi.h
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@712 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-28 20:38:14 +00:00
zdenop@gmail.com
3115fbfdcb another fix MinGW+MSYS
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@709 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-24 10:14:47 +00:00
zdenop@gmail.com
d4d4b8aad8 improve autotools system (mingw+msys fix); implementation of --disable-tessdata-prefix
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@708 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-22 20:01:33 +00:00
zdenop@gmail.com
2f1c112640 +Remove visibility from protected members of tesseract::TessBaseAPI class by applying TESS_LOCAL macro;
+Make PageIterator & ResultIterator classes visible by applying TESS_API macro;
+Fix api/Makefile.am & training/Makefile.am to allow Parallel Build Trees;
patch from Tom Powers (https://groups.google.com/group/tesseract-dev/msg/9d00579540e44055)

git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@701 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-07 22:04:46 +00:00
david.eger@gmail.com
c2e84c4606 Fix two issues with GetHOCRText():
+ make it not seg-fault if called without calling SetInputName().
+ make it not leak memory (thank you valgrind)

http://code.google.com/p/tesseract-ocr/issues/detail?id=463

git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@699 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-06 21:18:16 +00:00
zdenop@gmail.com
765832d449 fixes issue 573 where boolean was being compared to float;
tesseract prints full version info when -v arg;
removes extra includes from tesseractmain.h;
removes extra DLLEXPORT & DLLIMPORT from hosts.h;
remove CCUTIL_IMPORTS & CCUTIL_EXPORTS from vs2008 *.vcproj;

git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@694 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-04 22:27:16 +00:00
zdenop@gmail.com
97e19443a3 install only necessary headers, fix uninstall
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@692 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-03 13:22:51 +00:00
zdenop@gmail.com
3b326532cc fix --enable-multiple-libraries; implement quiet mode (issue 580)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@691 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-03 11:48:59 +00:00
zdenop@gmail.com
30a70142a0 visibility - autotools part (./configure --enable-visibility)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@690 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-02 23:51:33 +00:00
zdenop@gmail.com
a776e0be85 TP: visibility trial - code & windows build changes (without autotools changes)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@689 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-02 17:48:45 +00:00
zdenop@gmail.com
e216adab43 fix configure.ac; unify identifiers (WIN32 vs _WIN32)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@688 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-02 17:31:24 +00:00
zdenop@gmail.com
49c4ce3183 fix for GRAPHICS_DISABLED build
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@686 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-01 22:43:51 +00:00
zdenop@gmail.com
df1cbdd7d3 fix for issue 463 (GetHOCRText segfaults unless SetInputName has been called first); removed declaration of GetLastInitLanguage
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@684 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-27 17:19:20 +00:00
zdenop@gmail.com
492f9119c2 check return code of API init (issue 593)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@680 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-26 14:48:35 +00:00
zdenop@gmail.com
6ccab83bd6 fixing issue 628 (replacing __MSW32__ with _WIN32) and issue 614 (reverting "class DLLSYM STRING" to "class CCUTIL_API STRING")
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@677 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-19 21:48:45 +00:00
theraysmith@gmail.com
23dfabcab1 Cleaned up externally used namespace by removing includes from baseapi.h
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@657 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 03:14:16 +00:00
theraysmith@gmail.com
ef786ad29b Moved ResultIterator/PageIterator to ccmain
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@645 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 02:47:59 +00:00
zdenop@gmail.com
67f47008c7 fixed "one lib" build on linux; runautoconf renamed to autogen.sh;
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@631 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-10-16 19:39:54 +00:00
max.markin@gmail.com
0fef845950 VC2010: add support for dynamic linking
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@629 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-10-15 22:17:19 +00:00
zdenop@gmail.com
da41b96f7f removed check for libtiff - leptonica is required; cleanup #ifdef/#ifndef HAVE_LIBLEPT
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@624 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-30 06:34:41 +00:00
joregan@gmail.com
bf4a09d72a make single/multiple libraries optional -- this needs testing!!!
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@623 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-29 21:28:28 +00:00
theraysmith@gmail.com
0d969b7b3a Fixed problem of config file vs command line for pageseg mode
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@611 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-18 16:41:51 +00:00
theraysmith@gmail.com
7ab0a97180 Fixed comment re bln_numericmode
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@610 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-18 16:41:03 +00:00
theraysmith@gmail.com
d5d15f32d7 Deleted Makefile.in from svn
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@606 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-18 16:32:44 +00:00
zdenop@gmail.com
7ec3dca968 show page 0 for multipage tiff;
Windows: use binary mode for fopen (issue 70);
autotools: fixed cutil/Makefile.am, improved tessdata/Makefile.am;

git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@604 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-11 21:42:13 +00:00
zdenop@gmail.com
9b9efa8e4c man pages included to install script, improved windows installer script (issue 425), output format for "tesseract -v" changed to "3.00 version", README cleanup.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@601 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-08 20:33:18 +00:00
zdenop@gmail.com
411e074b4d fix for issues 479, 524 + tests for input image (there are no leptonica error messages on Windows console)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@597 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-07-29 21:55:49 +00:00
zdenop@gmail.com
1ad70ea8ff fixing issues 518 and 521
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@596 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-07-27 20:56:40 +00:00
zdenop@gmail.com
505c8dbece changed "xocr_word" to "ocrx_word" according to the hOCR spec
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@585 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-05-24 20:53:58 +00:00
zdenop@gmail.com
b54eee99ac configure script requires liblept;
add '--version' option for tesseract as alternative to '-v'

git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@584 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-05-24 20:17:28 +00:00
theraysmith
c81483f714 Various fixes, including memory leak in fixspace, font labels on output, removed some annoying debug output, fixes to initialization of parameters, general cleanup, and added Hindi
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@566 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-21 21:43:04 +00:00
theraysmith
a3f30eb5c7 Deleted lots of dead code, including PBLOB
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@555 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-18 21:51:34 +00:00
theraysmith
0d81f4b649 Fixed problem that was preventing pagesegmode from being set by config file
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@554 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-18 21:43:38 +00:00
theraysmith
f040994f51 Fixed closing meta element in hocr output
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@549 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-12-09 06:25:20 +00:00
theraysmith
a7db6dada9 Fix for linking with leptonica on Linux.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@548 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-12-09 01:40:39 +00:00
theraysmith
137f4806b6 Added sub/superscript, small/dropcap detection
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@547 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-12-09 01:32:20 +00:00