Ray Smith
da03e4e910
Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion
2017-07-14 09:30:14 -07:00
Raf Schietekat
3983d2f76a
Reviewed uses of reinterpret_cast
2017-05-11 01:58:40 +02:00
Stefan Weil
4897796d57
Replace reserved identifiers used in #define guards header files
...
Use macro names as suggested by the Google C++ Style Guide
(https://google.github.io/styleguide/cppguide.html#The__define_Guard ).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 15:43:03 +01:00
Ray Smith
c1c1e426b3
Added new LSTM-based neural network line recognizer
2016-11-07 15:38:07 -08:00
Stefan Weil
64f9190575
textord: Fix typos in comments and strings
...
All of them were found by codespell.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-04 21:58:42 +01:00
Stefan Weil
38f3db8ca5
Fix more typos in comments (found by codespell)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-04 21:58:42 +01:00
Ray Smith
0e868ef377
Major change to improve layout analysis for heavily diacritic languages:
...
Tha, Vie, Kan, Tel etc.
There is a new overlap detector that detects when diacritics
cause a big increase in textline overlap. In such cases, diacritics from
overlap regions are kept separate from layout analysis completely, allowing
textline formation to happen without them. The diacritics are then assigned
to 0, 1 or 2 close words at the end of layout analysis, using and modifying
an old noise detection data path.
The stored diacritics are used or not during recognition according to the
character classifier's liking for them.
2015-05-12 16:47:02 -07:00
theraysmith@gmail.com
64c739c8af
Added sparse text mode, also fixed issue 653.
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@820 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-01-03 19:06:41 +00:00
theraysmith@gmail.com
59d244b06e
More fixes for GRAPHICS_DISABLED from Zdenko and Ray
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@757 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-22 00:59:31 +00:00
theraysmith@gmail.com
6e3d810c1d
Major improvements to layout analysis for better image detection, diacritic detection, better textline finding, better tabstop finding
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@648 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 02:53:04 +00:00
zdenop@gmail.com
4523ce9f7d
3.01 code from http://github.com/jimregan/tesseract-ocr with addaptions related to Linux and Windows (VC2008) compile process
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-23 18:34:14 +00:00
joregan
08defee46e
more doxygen
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@450 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-08-10 19:20:11 +00:00
theraysmith
0a9ad20d1c
Changes to textord for 3.00
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@301 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-07-11 02:39:56 +00:00