david.eger@gmail.com
0aadbd0169
Save BLOB_CHOICE s for alternate choices saved during segmentation
...
search so we have them when trying to replace words with alternates in
the bigram correction pass.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@739 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-01 00:33:46 +00:00
zdenop@gmail.com
2a57976c41
- fix msys build (missing -lws2_32 for library)
...
- remove old debian leptonica package
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@738 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-08-25 19:53:41 +00:00
zdenop@gmail.com
306a8216e1
fix creating box file from empty image (issue 516)
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@737 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-08-03 22:32:17 +00:00
zdenop@gmail.com
60b0d10e16
fix for issue 690
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@736 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-08-01 21:57:49 +00:00
zdenop@gmail.com
b064cf511d
revert back tesseract.sln from r734
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@735 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-07-31 09:32:29 +00:00
zdenop
937aab009f
fix issue 636
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@734 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-07-31 09:22:26 +00:00
zdenop@gmail.com
eaf9d63626
Provide pkgconfig file (issue 451), improve configure.ac and INSTALL.SVN
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@733 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-07-28 21:17:20 +00:00
zdenop@gmail.com
8708102883
implement '--enable-debug' for ./configure; small clean up autogen.sh and configure.ac
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@732 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-07-22 21:36:20 +00:00
zdenop
1131e5dd2f
addition to Issue 724
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@731 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-07-04 15:35:26 +00:00
zdenop@gmail.com
d72a318c5c
fix Issue 724: DESTDIR not supported with make install-langs
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@730 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-07-03 20:33:28 +00:00
zdenop@gmail.com
c8eedb25a6
added ocr-capabilities for hocr conformity; XHTML 1.0 Transitional conformity; improved hocr output readability
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@729 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-05-28 20:44:23 +00:00
david.eger@gmail.com
6a9a3ddcb2
Zdeno pointed out that ocr_line (though not ocr_word) is actually in the hocr spec.
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@728 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-05-27 23:58:09 +00:00
david.eger@gmail.com
d9d70919bb
Conform to the hocr spec: hocr doesn't have ocr_word, but instead has ocrx_word.
...
Tested with ExactImage's hocr2pdf.
$ tesseract phototest.tif phototest hocr
$ hocr2pdf -i phototest.tif -o ./phototest.pdf < ./phototest.hocr
$ evince phototest.pdf
See: https://docs.google.com/document/preview?id=1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@726 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-05-25 17:36:25 +00:00
david.eger@gmail.com
eeeb4f513c
Provide better paragraph segmentation without having to run fully
...
automatic layout analysis.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@725 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-05-10 00:03:34 +00:00
zdenop@gmail.com
e606c311f5
fix Issue 684: show correct line in failure message "Couldn't find a matching blob"
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@723 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-04-22 20:51:00 +00:00
zdenop@gmail.com
d39cb38ab8
Fix Issue 678
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@722 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-04-17 17:32:42 +00:00
david.eger@gmail.com
56403c6dc3
Fix an issue where we sometimes leave a dangling outline->loop pointer
...
during chopping.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@721 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-04-17 00:02:52 +00:00
david.eger@gmail.com
71b3200625
Fix a shapetable serialization issue -- sizeof(bool) is not portable.
...
See http://code.google.com/p/tesseract-ocr/issues/detail?id=669
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@720 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-04-17 00:00:26 +00:00
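The portability point in the shapetable commit above can be illustrated with a minimal sketch. It is not Tesseract's actual serialization code: `AppendBool` and `ReadBool` are hypothetical helper names, and the in-memory byte buffer stands in for a file. The idea is to store each bool as exactly one byte, so the stored layout does not depend on `sizeof(bool)`, which is implementation-defined.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration only (not part of Tesseract).

// Append a bool to the buffer as exactly one byte (0 or 1), regardless of
// what sizeof(bool) happens to be on the compiling platform.
void AppendBool(bool value, std::vector<std::uint8_t>* out) {
  out->push_back(value ? 1 : 0);
}

// Read one byte at *pos and interpret any non-zero value as true.
// Returns false (leaving *value untouched) if the buffer is exhausted.
bool ReadBool(const std::vector<std::uint8_t>& in, std::size_t* pos,
              bool* value) {
  if (*pos >= in.size()) return false;
  *value = (in[*pos] != 0);
  ++*pos;
  return true;
}
```

Writing the raw object with `fwrite(&flag, sizeof(bool), 1, fp)` instead would tie the file format to whichever compiler produced it.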
david.eger@gmail.com
a253ea224a
Add some documentation on how to use config files and user dictionaries.
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@719 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-04-09 19:41:06 +00:00
zdenop@gmail.com
aa14e8b212
fix Mingw shared build
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@718 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-04-02 12:14:37 +00:00
zdenop@gmail.com
c2d5616a7e
add Doxyfile (doxygen config) to distribution
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@717 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-04-02 10:52:13 +00:00
zdenop@gmail.com
cd8de9157c
change comments to doxygen block comments (api)
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@716 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-30 21:24:12 +00:00
zdenop@gmail.com
5958f01f5f
fix doxygen warnings
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@715 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-30 15:42:06 +00:00
david.eger@gmail.com
4f0ff358a7
Missing close bracket.
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@714 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-29 06:15:33 +00:00
david.eger@gmail.com
4ddb3e5941
Good moming, Good aftemoon.
...
During our initial chopping for each word, pay attention to whether a
dangerous ambiguity (like rn <-> m) would lead us to a dictionary word.
If so, make sure that blob gets chopped so that we can evaluate said
dictionary word during the segmentation search.
Large accuracy improvement, especially on English printed books (~9%).
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@713 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-28 21:02:54 +00:00
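The check described in the "Good moming" commit above can be sketched roughly as follows. This is a simplified, string-level illustration and not the blob-level logic in Tesseract: the function name, the `std::set` dictionary, and the single substitution pair are all assumptions made for the example. It asks whether swapping one occurrence of an ambiguous sequence (such as `m` for `rn`) turns the current reading into a dictionary word, which is the signal the commit uses to force a chop.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper for illustration only (not part of Tesseract).
// Returns true if replacing a single occurrence of `from` with `to` in
// `word` produces a string that is present in `dict`.
bool AmbiguityLeadsToDictWord(const std::string& word,
                              const std::string& from,
                              const std::string& to,
                              const std::set<std::string>& dict) {
  if (from.empty()) return false;
  for (std::size_t pos = word.find(from); pos != std::string::npos;
       pos = word.find(from, pos + 1)) {
    std::string candidate = word;
    candidate.replace(pos, from.size(), to);
    if (dict.count(candidate) > 0) return true;
  }
  return false;
}
```

With a dictionary containing "morning", the reading "moming" qualifies when `from` is "m" and `to` is "rn", so the blob read as "m" would be a candidate for chopping.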
zdenop@gmail.com
ee44165d3d
improve doxygen config; fix doxygen warnings for baseapi.h
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@712 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-28 20:38:14 +00:00
david.eger@gmail.com
0d5e8b5cb6
Recording segmentation state for a choice at LogNewChoice() time was a
...
bad idea -- a VIABLE_CHOICE's Blob->NumChunks can be modified as we go
by a call from Dict::LogNewSplit(). Relying on the auxiliary
segmentation_state makes alt choices sometimes reference the wrong
blobs.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@711 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-28 20:11:57 +00:00
zdenop@gmail.com
3f9032ef0c
fix 'make dist' for MinGW+MSYS
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@710 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-24 16:33:11 +00:00
zdenop@gmail.com
3115fbfdcb
another fix MinGW+MSYS
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@709 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-24 10:14:47 +00:00
zdenop@gmail.com
d4d4b8aad8
improve autotools system (mingw+msys fix); implementation of --disable-tessdata-prefix
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@708 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-22 20:01:33 +00:00
david.eger@gmail.com
c0cd2cd605
Restore VC++ compatibility for paragraphs.cpp.
...
Missed a __func__ addition in the last merge.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@707 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-21 16:41:27 +00:00
david.eger@gmail.com
a91778397b
Fix Issue 645, a char signed/unsigned issue in paragraphs.cpp.
...
When constructing our debug strings, our simple UTF-8 processing should skip all non-ASCII chars.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@706 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-20 20:19:00 +00:00
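The signed/unsigned pitfall behind the Issue 645 fix above can be shown with a short sketch. This is not the code from paragraphs.cpp; `AsciiOnly` is a hypothetical helper. Plain `char` may be signed, so bytes of 0x80 and above can compare as negative; casting to `unsigned char` before the range test makes the non-ASCII check behave the same on every platform.

```cpp
#include <string>

// Hypothetical helper for illustration only (not part of Tesseract).
// Returns a copy of `utf8` with every non-ASCII byte dropped.
std::string AsciiOnly(const std::string& utf8) {
  std::string out;
  for (char c : utf8) {
    // Cast first: with a signed char, bytes >= 0x80 would be negative and
    // a naive range comparison on `c` could misclassify them.
    const unsigned char byte = static_cast<unsigned char>(c);
    if (byte < 0x80) out.push_back(c);
  }
  return out;
}
```

Because every byte of a multi-byte UTF-8 sequence has its high bit set, this drops whole non-ASCII characters rather than leaving partial sequences behind.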
zdenop@gmail.com
1563c01565
fixed build in java directory; create documentation package with 'make doc-pack'
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@705 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-15 21:05:12 +00:00
zdenop@gmail.com
1009a6e2f0
fopen() should use binary mode (issue 70)
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@704 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-11 12:41:17 +00:00
tomp2010@gmail.com
87e03edb3a
Fix dawg2wordlist crash on Windows caused by fopening dawg file in "r" instead of "rb" mode.
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@703 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-10 08:09:11 +00:00
zdenop@gmail.com
2972cc426b
+ fix VS2008 warning about "non dll-interface class tesseract::LTRResultIterator used as base for dll-interface class tesseract::ResultIterator" by making LTRResultIterator also visible.
...
+ Changed Project preprocessor definition of WINDLLNAME, because stringizing operator doesn't seem to work when initializing tessedit_module_name in ccutil/ccutil.cpp (which was omitted in previous fixes).
+ Update vs2008/tesshelper.py for new public header files.
patch from Tom Powers (https://groups.google.com/group/tesseract-dev/msg/6da2799cd2cb9844 )
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@702 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-08 21:15:13 +00:00
zdenop@gmail.com
2f1c112640
+Remove visibility from protected members of tesseract::TessBaseAPI class by applying TESS_LOCAL macro;
...
+Make PageIterator & ResultIterator classes visible by applying TESS_API macro;
+Fix api/Makefile.am & training/Makefile.am to allow Parallel Build Trees;
patch from Tom Powers (https://groups.google.com/group/tesseract-dev/msg/9d00579540e44055 )
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@701 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-07 22:04:46 +00:00
zdenop@gmail.com
1455bf5610
set tessedit_module_name for windows;
...
implement 'make install LANG="eng ara deu"';
more headers need to be installed: https://groups.google.com/group/tesseract-dev/msg/a4f7424377993b2e
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@700 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-06 22:41:43 +00:00
david.eger@gmail.com
c2e84c4606
Fix two issues with GetHOCRText():
...
+ make it not seg-fault if called without calling SetInputName().
+ make it not leak memory (thank you valgrind)
http://code.google.com/p/tesseract-ocr/issues/detail?id=463
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@699 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-06 21:18:16 +00:00
david.eger@gmail.com
75a9a8fae7
Address "RIL_PARA doesn't work" comment in issue 622.
...
http://code.google.com/p/tesseract-ocr/issues/detail?id=622
The core of the problem is that in PSM_SINGLE_BLOCK mode, Tesseract
doesn't run paragraph detection, so no paragraphs get generated. Here,
we make sure that even if run in a mode where no paragraphs get
generated, we treat each block as its own paragraph.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@696 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-06 20:02:57 +00:00
zdenop@gmail.com
8cc34e85f1
'make install' does not require language data; language data is installed by 'make install-langs'
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@695 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-05 00:11:38 +00:00
zdenop@gmail.com
765832d449
fixes issue 573 where boolean was being compared to float;
...
tesseract prints full version info when -v arg;
removes extra includes from tesseractmain.h;
removes extra DLLEXPORT & DLLIMPORT from hosts.h;
remove CCUTIL_IMPORTS & CCUTIL_EXPORTS from vs2008 *.vcproj;
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@694 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-04 22:27:16 +00:00
zdenop@gmail.com
5761bc5736
fix visibility build; + tprintf visible
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@693 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-03 21:13:42 +00:00
zdenop@gmail.com
97e19443a3
install only necessary headers, fix uninstall
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@692 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-03 13:22:51 +00:00
zdenop@gmail.com
3b326532cc
fix --enable-multiple-libraries; implement quiet mode (issue 580)
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@691 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-03 11:48:59 +00:00
zdenop@gmail.com
30a70142a0
visibility - autotools part (./configure --enable-visibility)
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@690 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-02 23:51:33 +00:00
zdenop@gmail.com
a776e0be85
TP: visibility trial - code & windows build changes (without autotools changes)
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@689 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-02 17:48:45 +00:00
zdenop@gmail.com
e216adab43
fix configure.ac; unify identifiers (WIN32 vs _WIN32)
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@688 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-02 17:31:24 +00:00
zdenop@gmail.com
657722aeca
added missing changes for r686
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@687 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-01 23:19:35 +00:00
zdenop@gmail.com
49c4ce3183
fix for GRAPHICS_DISABLED build
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@686 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-01 22:43:51 +00:00