Ray Smith
da03e4e910
Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion
2017-07-14 09:30:14 -07:00
Raf Schietekat
190584fec7
RAII: PB_LINE_IT::get_line(): was leaked inside POLY_BLOCK::fill()
2017-05-11 02:02:37 +02:00
Ray Smith
0e868ef377
Major change to improve layout analysis for heavily diacritic languages: Tha, Vie, Kan, Tel etc.
There is a new overlap detector that detects when diacritics
cause a big increase in textline overlap. In such cases, diacritics from
overlap regions are kept separate from layout analysis completely, allowing
textline formation to happen without them. The diacritics are then assigned
to 0, 1 or 2 close words at the end of layout analysis, using and modifying
an old noise detection data path.
The stored diacritics are used or not during recognition according to the
character classifier's liking for them.
2015-05-12 16:47:02 -07:00
theraysmith@gmail.com
4d514d5a60
Major refactor of beam search, elimination of dead code, misc bug fixes, updates to Makefile.am, Changelog etc.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@878 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-23 15:26:50 +00:00
zdenop@gmail.com
10c1169d98
remove unused code (Windows related)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@860 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-07-08 18:21:10 +00:00
theraysmith@gmail.com
9206e92b0d
Added simultaneous multi-language capability, Refactored top-level word recognition module, Blamer module added for error analysis, Tidied up constraints on control parameters, Added UNICHARSET to WERD_CHOICE to make multi-language handling easier, Added word bigram correction
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@655 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 03:06:39 +00:00
theraysmith
47dc322437
Removed serialise and NEWDELETE macro
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@531 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-30 00:56:39 +00:00
zdenop@gmail.com
4523ce9f7d
3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to Linux and Windows (VC2008) compile process
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-23 18:34:14 +00:00
joregan
575b2de48a
doxygen
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@446 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-28 00:38:09 +00:00
theraysmith
903a4ffe9d
Changes to ccstruct for 3.00
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@289 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-07-11 02:14:57 +00:00
theraysmith
5c964ea6da
More harmless improvements from 3.00 in 2.04
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@217 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-12-30 21:31:01 +00:00
theraysmith
c4f4840fbe
Fixed name collision with jpeg library
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@163 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-04-22 00:41:37 +00:00
tmbdev
425d593ebe
top-skimming import from sf.net
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk/trunk@2 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-03-07 20:03:40 +00:00