david.eger@gmail.com
4ddb3e5941
Good moming, Good aftemoon.
During our initial chopping for each word, pay attention to whether a
dangerous ambiguity (like rn <-> m) would lead us to a dictionary word.
If so, make sure that blob gets chopped so that we can evaluate said
dictionary word during the segmentation search.
Large accuracy improvement, especially on English printed books (~9%).
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@713 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-28 21:02:54 +00:00
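The subject line of r713 shows the confusion it targets: "moming" is "morning" with "rn" misread as "m", and "aftemoon" is "afternoon" the same way. Below is a minimal sketch of the check, under some assumptions:

- A word is a plain std::string and the dictionary is a std::unordered_set.
- Tesseract itself works on blobs and its own dictionary structures.
- BlobsToChop and Ambiguity are hypothetical names.

The sketch finds each position where one recognized character could really be a longer ambiguous sequence. It substitutes that sequence and checks whether the result is a dictionary word. Those positions mark the blobs that would need chopping so the segmentation search can reach the dictionary word.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical illustration, not Tesseract's code: each pair maps a character
// the classifier produced (e.g. "m") to a sequence it may really be (e.g. "rn").
using Ambiguity = std::pair<std::string, std::string>;

// Returns the offsets in `word` whose blob should be chopped because
// substituting the ambiguous replacement there yields a dictionary word.
std::vector<std::size_t> BlobsToChop(const std::string& word,
                                     const std::vector<Ambiguity>& ambigs,
                                     const std::unordered_set<std::string>& dict) {
  std::vector<std::size_t> offsets;
  for (std::size_t i = 0; i < word.size(); ++i) {
    for (const Ambiguity& a : ambigs) {
      // Does the recognized text at offset i match the ambiguous character?
      if (word.compare(i, a.first.size(), a.first) != 0) continue;
      std::string candidate =
          word.substr(0, i) + a.second + word.substr(i + a.first.size());
      if (dict.count(candidate) > 0) {
        offsets.push_back(i);
        break;  // one reason to chop this position is enough
      }
    }
  }
  return offsets;
}
```

With a dictionary containing "morning", BlobsToChop("moming", {{"m", "rn"}}, dict) reports offset 2. That is the second "m", the only one whose split produces a real word.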
zdenop@gmail.com
ee44165d3d
improve doxygen config; fix doxygen warnings for baseapi.h
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@712 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-28 20:38:14 +00:00
david.eger@gmail.com
0d5e8b5cb6
Recording segmentation state for a choice at LogNewChoice() time was a
bad idea -- a VIABLE_CHOICE's Blob->NumChunks can be modified as we go
by a call from Dict::LogNewSplit(). Relying on the auxiliary
segmentation_state makes alt choices sometimes reference the wrong
blobs.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@711 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-28 20:11:57 +00:00
zdenop@gmail.com
3f9032ef0c
fix 'make dist' for MinGW+MSYS
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@710 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-24 16:33:11 +00:00
zdenop@gmail.com
3115fbfdcb
another fix for MinGW+MSYS
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@709 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-24 10:14:47 +00:00
zdenop@gmail.com
d4d4b8aad8
improve autotools system (mingw+msys fix); implementation of --disable-tessdata-prefix
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@708 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-22 20:01:33 +00:00
david.eger@gmail.com
c0cd2cd605
Restore VC++ compatibility for paragraphs.cpp.
Missed a __func__ addition in the last merge.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@707 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-21 16:41:27 +00:00
david.eger@gmail.com
a91778397b
Fix Issue 645, a char signed/unsigned issue in paragraphs.cpp.
When constructing our debug strings, our simple UTF-8 processing should skip all non-ASCII chars.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@706 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-20 20:19:00 +00:00
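Below is a small sketch of what "skip all non-ASCII chars" can look like when building a debug string. It assumes the input is UTF-8 held in a std::string. AsciiOnlyDebugString is a hypothetical name, not the function in paragraphs.cpp. The signed/unsigned part of issue 645 is the comparison. Plain char is signed on many compilers, so a UTF-8 byte such as 0xC3 reads as a negative number unless it is converted to unsigned char first.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: keep only 7-bit ASCII bytes when building a debug
// string. Each byte is read as unsigned char, so UTF-8 lead and continuation
// bytes (0x80-0xFF) compare as >= 0x80 instead of as negative values on
// platforms where plain char is signed.
std::string AsciiOnlyDebugString(const std::string& utf8) {
  std::string out;
  for (char c : utf8) {
    unsigned char uc = static_cast<unsigned char>(c);
    if (uc < 0x80) out.push_back(c);  // drop every byte of a multi-byte char
  }
  return out;
}
```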
zdenop@gmail.com
1563c01565
fixed build in java directory; create documentation package with 'make doc-pack'
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@705 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-15 21:05:12 +00:00
zdenop@gmail.com
1009a6e2f0
fopen() should use binary mode (issue 70)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@704 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-11 12:41:17 +00:00
tomp2010@gmail.com
87e03edb3a
Fix dawg2wordlist crash on Windows caused by fopening dawg file in "r" instead of "rb" mode.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@703 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-10 08:09:11 +00:00
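The r703 and r704 fixes come down to the fopen() mode string. On Windows, text mode ("r") translates "\r\n" to "\n" on input and treats byte 0x1A (Ctrl-Z) as end of file. Either behavior corrupts binary data such as a .dawg file, whereas "rb" reads the bytes unchanged.

Below is a minimal sketch of reading a whole binary file. ReadBinaryFile is a hypothetical name, not a Tesseract function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: read an entire binary file into `data`.
// Opening with "rb" keeps Windows from applying text-mode translations.
// Returns false if the file cannot be opened or a read error occurs.
bool ReadBinaryFile(const char* path, std::vector<unsigned char>* data) {
  std::FILE* fp = std::fopen(path, "rb");
  if (fp == nullptr) return false;
  data->clear();
  unsigned char buf[4096];
  std::size_t n;
  while ((n = std::fread(buf, 1, sizeof(buf), fp)) > 0) {
    data->insert(data->end(), buf, buf + n);
  }
  bool ok = !std::ferror(fp);
  std::fclose(fp);
  return ok;
}
```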
zdenop@gmail.com
2972cc426b
+ fix VS2008 warning about "non dll-interface class tesseract::LTRResultIterator used as base for dll-interface class tesseract::ResultIterator" by making LTRResultIterator also visible.
+ Changed Project preprocessor definition of WINDLLNAME, because stringizing operator doesn't seem to work when initializing tessedit_module_name in ccutil/ccutil.cpp (which was omitted in previous fixes).
+ Update vs2008/tesshelper.py for new public header files.
patch from Tom Powers (https://groups.google.com/group/tesseract-dev/msg/6da2799cd2cb9844 )
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@702 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-08 21:15:13 +00:00
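The r702 note says the stringizing operator "doesn't seem to work" with WINDLLNAME. The log does not say why. One common cause is that # stringizes its argument as written, without macro-expanding it first. Stringizing a macro name directly therefore yields the name, not its value. The usual remedy is a second macro level.

The sketch below shows that idiom. Both the macro value example_dll and the STRINGIFY names are hypothetical, and this may or may not be what affected the Tesseract build.

```cpp
#include <cassert>
#include <cstring>  // std::strcmp, used when checking the results

// # turns its operand into a string literal *without* expanding it first.
#define STRINGIFY_RAW(x) #x
// The extra level lets `x` be macro-expanded before STRINGIFY_RAW sees it.
#define STRINGIFY(x) STRINGIFY_RAW(x)

// Hypothetical value; a real build would get it from the project's
// preprocessor definitions (e.g. a /D option on the compiler command line).
#define WINDLLNAME example_dll

const char* const kNameWrong = STRINGIFY_RAW(WINDLLNAME);  // "WINDLLNAME"
const char* const kNameRight = STRINGIFY(WINDLLNAME);      // "example_dll"
```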
zdenop@gmail.com
2f1c112640
+Remove visibility from protected members of tesseract::TessBaseAPI class by applying TESS_LOCAL macro;
+Make PageIterator & ResultIterator classes visible by applying TESS_API macro;
+Fix api/Makefile.am & training/Makefile.am to allow Parallel Build Trees;
patch from Tom Powers (https://groups.google.com/group/tesseract-dev/msg/9d00579540e44055 )
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@701 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-07 22:04:46 +00:00
zdenop@gmail.com
1455bf5610
set tessedit_module_name for windows;
implement 'make install LANG="eng ara deu"';
more headers need to be installed: https://groups.google.com/group/tesseract-dev/msg/a4f7424377993b2e
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@700 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-06 22:41:43 +00:00
david.eger@gmail.com
c2e84c4606
Fix two issues with GetHOCRText():
+ make it not seg-fault if called without calling SetInputName().
+ make it not leak memory (thank you valgrind)
http://code.google.com/p/tesseract-ocr/issues/detail?id=463
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@699 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-06 21:18:16 +00:00
david.eger@gmail.com
75a9a8fae7
Address "RIL_PARA doesn't work" comment in issue 622.
http://code.google.com/p/tesseract-ocr/issues/detail?id=622
The core of the problem is that in PSM_SINGLE_BLOCK mode, Tesseract
doesn't run paragraph detection, so no paragraphs get generated. Here,
we make sure that even if run in a mode where no paragraphs get
generated, we treat each block as its own paragraph.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@696 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-06 20:02:57 +00:00
zdenop@gmail.com
8cc34e85f1
'make install' does not require language data; language data are installed by 'make install-langs'
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@695 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-05 00:11:38 +00:00
zdenop@gmail.com
765832d449
fixes issue 573, where a boolean was being compared to a float;
tesseract prints full version info when -v arg;
removes extra includes from tesseractmain.h;
removes extra DLLEXPORT & DLLIMPORT from hosts.h;
remove CCUTIL_IMPORTS & CCUTIL_EXPORTS from vs2008 *.vcproj;
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@694 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-04 22:27:16 +00:00
zdenop@gmail.com
5761bc5736
fix visibility build; + tprintf visible
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@693 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-03 21:13:42 +00:00
zdenop@gmail.com
97e19443a3
install only necessary headers, fix uninstall
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@692 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-03 13:22:51 +00:00
zdenop@gmail.com
3b326532cc
fix --enable-multiple-libraries; implement quiet mode (issue 580)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@691 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-03 11:48:59 +00:00
zdenop@gmail.com
30a70142a0
visibility - autotools part (./configure --enable-visibility)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@690 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-02 23:51:33 +00:00
zdenop@gmail.com
a776e0be85
TP: visibility trial - code & windows build changes (without autotools changes)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@689 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-02 17:48:45 +00:00
zdenop@gmail.com
e216adab43
fix configure.ac; unify identifiers (WIN32 vs _WIN32)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@688 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-02 17:31:24 +00:00
zdenop@gmail.com
657722aeca
added missing changes for r686
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@687 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-01 23:19:35 +00:00
zdenop@gmail.com
49c4ce3183
fix for GRAPHICS_DISABLED build
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@686 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-01 22:43:51 +00:00
zdenop
06b2156a99
fixed makemoredists; add --enable-embedded to configure
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@685 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-01 12:40:04 +00:00
zdenop@gmail.com
df1cbdd7d3
fix for issue 463 (GetHOCRText segfaults unless SetInputName has been called first); removed declaration of GetLastInitLanguage
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@684 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-27 17:19:20 +00:00
zdenop@gmail.com
bf7ca288ac
fixed issue 635 (strngs.h has unnecessary include of genericvector.h)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@682 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-26 16:39:01 +00:00
zdenop@gmail.com
da121f013c
vs2008 and vs2010 replaced with Tom Powers' solution
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@681 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-26 15:30:05 +00:00
zdenop@gmail.com
492f9119c2
check return code of API init (issue 593)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@680 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-26 14:48:35 +00:00
zdenop@gmail.com
132909a607
fix for issue 631: gettimeofday() on windows based on leptonica l_getCurrentTime()
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@679 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-21 21:38:45 +00:00
zdenop@gmail.com
95168ef064
fix missing ";" in VS2008 project files + fix VS2010
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@678 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-20 13:12:45 +00:00
zdenop@gmail.com
6ccab83bd6
fixing issue 628 (replacing __MSW32__ with _WIN32) and issue 614 (reverting "class DLLSYM STRING" to "class CCUTIL_API STRING")
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@677 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-19 21:48:45 +00:00
zdenop@gmail.com
61611c1990
removed unnecessary conditional
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@676 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-18 09:18:06 +00:00
david.eger@gmail.com
018f192fc2
Abolish populate_unichars(), fixing seg fault reported in Debian:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=658634
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@675 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-15 01:37:00 +00:00
zdenop@gmail.com
53d133d83a
fixed cntraining thanks to Wil Hadden; fixed installation of new manpages
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@674 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-12 16:03:05 +00:00
zdenop@gmail.com
3c4fd30bb5
Fix isinf for VC++
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@673 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-12 14:51:28 +00:00
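Older Visual C++ releases did not ship an isinf(), which is what r673 works around. The log does not show the actual fix. Below is a sketch of one compiler-neutral test, not Tesseract's code; IsInfinite is a hypothetical name. Where C++11 std::isinf from <cmath> is available, prefer it.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Hypothetical helper: true for +inf or -inf, using only comparisons.
// NaN fails x == x, and only the infinities exceed DBL_MAX in magnitude.
// (Fast-math compiler modes that ignore NaN semantics can break the x == x
// test, so this assumes standard IEEE floating-point behavior.)
inline bool IsInfinite(double x) {
  return x == x && (x > DBL_MAX || x < -DBL_MAX);
}
```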
david.eger@gmail.com
22331c03ec
Fix issue 613: assert() fail on Windows isspace() when given non-ASCII.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@671 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-10 01:44:36 +00:00
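The isspace() failure in issue 613 comes from passing a plain char that holds a non-ASCII byte. Where char is signed, such a byte is negative, and isspace() is only defined for EOF and values representable as unsigned char. The Windows debug runtime checks this with an assertion.

The standard remedy is to convert to unsigned char first, sketched below. SafeIsSpace and RstripSpace are hypothetical names, not Tesseract functions.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Converting to unsigned char keeps the argument in the range isspace()
// accepts, so UTF-8 bytes such as 0xC3 no longer trigger the assertion.
inline bool SafeIsSpace(char c) {
  return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Example use: strip trailing whitespace from a UTF-8 string without ever
// handing isspace() a negative value.
std::string RstripSpace(std::string s) {
  while (!s.empty() && SafeIsSpace(s.back())) s.pop_back();
  return s;
}
```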
david.eger@gmail.com
58e06c8c45
Update man pages for Tesseract 3.02.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@670 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-09 22:55:47 +00:00
david.eger@gmail.com
78a8356a76
Put one last bigram correction debug statement behind a debug flag.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@669 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-09 20:08:17 +00:00
zdenop@gmail.com
1355cabe7e
VS2008 - fix include path for release*
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@668 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-07 13:52:33 +00:00
zdenop@gmail.com
425c2b8205
install data files; small fix of INSTALL, README; removed ABOUT-NLS (NLS not used at the moment)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@667 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-05 16:25:40 +00:00
zdenop@gmail.com
0a50c9ca5c
More VS2008 fixes
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@666 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-04 22:06:40 +00:00
zdenop@gmail.com
d0c2631ec8
VC++2008 build fix for 3.02 version
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@665 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-03 22:23:12 +00:00
david.eger@gmail.com
56bc885721
Fix some debug messaging about bigram correction -- the two lists of
alternates are not independent.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@664 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-03 19:43:25 +00:00
theraysmith@gmail.com
09e41d32c2
Renamed RGB to ComposeRGB to fix windows macro problem
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@663 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-03 16:52:25 +00:00
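The r663 rename avoids a preprocessor clash. On Windows, <windows.h> (through wingdi.h) defines RGB as a function-like macro. Any function named RGB that is declared or called after that header is rewritten by the preprocessor, usually into a compile error.

The sketch below makes several assumptions:

- It defines a simplified stand-in for that macro only when none exists, packing like the Windows COLORREF value 0x00BBGGRR.
- ComposeRGB's signature and its 0x00RRGGBB packing are hypothetical, not necessarily those of Tesseract's ComposeRGB.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the Windows RGB macro, defined only if no RGB macro exists.
#ifndef RGB
#define RGB(r, g, b) ((r) | ((g) << 8) | ((b) << 16))
#endif

// A name no system header claims as a macro is safe regardless of which
// headers come first. Hypothetical packing: 8-bit channels as 0x00RRGGBB.
inline std::uint32_t ComposeRGB(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
  return (static_cast<std::uint32_t>(r) << 16) |
         (static_cast<std::uint32_t>(g) << 8) | b;
}
```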
theraysmith@gmail.com
d581ab7e12
New config for testing bigram correction.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@661 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 18:46:19 +00:00
david.eger@gmail.com
ad53f34e7c
Added a missing header file for the 3.02 release.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@659 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 18:01:17 +00:00
theraysmith@gmail.com
e0d735b122
Remaining misc changes for 3.02
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@658 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 03:14:43 +00:00