Ray Smith
da03e4e910
Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion
2017-07-14 09:30:14 -07:00
Stefan Weil
ef1d9600b1
Use standard macros for format strings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 19:32:51 +02:00
Stefan Weil
85e37798cb
Simplify delete operations
...
It is not necessary to check for null pointers.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-24 17:59:13 +01:00
Ray Smith
a303ab9d00
Misc fixes, mostly clang formatting, but some bug fixes in matrix, werd, and tesstrain_utils. Also updates unicharset to match traineddata files.
2015-07-09 14:28:20 -07:00
Ray Smith
0e868ef377
Major change to improve layout analysis for heavily diacritic languages:
...
Tha, Vie, Kan, Tel etc.
There is a new overlap detector that detects when diacritics
cause a big increase in textline overlap. In such cases, diacritics from
overlap regions are kept separate from layout analysis completely, allowing
textline formation to happen without them. The diacritics are then assigned
to 0, 1 or 2 close words at the end of layout analysis, using and modifying
an old noise detection data path.
The stored diacritics are used or not during recognition according to the
character classifier's liking for them.
2015-05-12 16:47:02 -07:00
theraysmith@gmail.com
dbf6197471
Major refactor of control.cpp to enable line recognition
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1147 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-11 23:23:06 +00:00
theraysmith@gmail.com
2fcea93846
Fixed issues 1081-1090
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1046 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-04 02:23:18 +00:00
theraysmith@gmail.com
d2ae81d99b
Removed dependence on IMAGE class
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@955 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-09 17:45:57 +00:00
theraysmith@gmail.com
7ec4fd7a56
Refactorerd control functions to enable parallel blob classification
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@904 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-11-08 20:30:56 +00:00
zdenop@gmail.com
77c1b41e4e
fix svn:executable atribute, trailing spaces, version include
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@903 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-11-03 17:24:00 +00:00
theraysmith@gmail.com
b0fb616299
Generalized feature extractor to allow fx from greyscale
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@875 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-23 15:19:50 +00:00
zdenop@gmail.com
49c4ce3183
fix for GRAPHICS_DISABLED build
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@686 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-01 22:43:51 +00:00
theraysmith@gmail.com
3a998fe7ac
Added Right-to-left/Bidi capability in the output iterators for Hebrew/Arabic, Added paragraph detection in layout analysis/post OCR, Fixed inconsistent xheight during training and over-chopping, Added simultaneous multi-language capability, Refactored top-level word recognition module, Fixed problems with internally scaled images
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@651 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 02:59:49 +00:00
theraysmith@gmail.com
4575c52ff5
Removed debugwin.cpp, fixing issue 448
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@613 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-18 16:45:59 +00:00
theraysmith
3e8c0bc228
Various fixes, including memory leak in fixspace, font labels on output, removed some annoying debug output, fixes to initialization of parameters, general cleanup, and added Hindi
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@567 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-21 21:44:05 +00:00
theraysmith
7121e51422
Deleted lots of dead code, including PBLOB
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@556 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-18 21:52:08 +00:00
theraysmith
137f4806b6
Added sub/superscript, small/dropcap detection
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@547 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-12-09 01:32:20 +00:00
zdenop@gmail.com
4523ce9f7d
3.01 code from http://github.com/jimregan/tesseract-ocr with addaptions related to Linux and Windows (VC2008) compile process
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-23 18:34:14 +00:00
joregan
f2506871f9
move include of config_auto.h to not conflict with local types. Not finished
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@490 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-09-30 15:53:40 +00:00
joregan
924f231808
more doxygen
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@442 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-27 14:58:33 +00:00
joregan
7e8bd73aea
some casts to get rid of persistent warnings
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@435 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-21 21:19:53 +00:00
joregan
5c8ad7ee72
add config_auto.h anywhere #ifndef GRAPHICS_DISABLED is used
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@384 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-28 12:03:45 +00:00
theraysmith
109d1c8f21
Some changes in ccmain for 3.00
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@286 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-07-11 02:03:51 +00:00
theraysmith
bea5e04b76
Fixed compilation with GRAPHICS_DISABLED
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@250 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-06-03 17:24:08 +00:00
theraysmith
7870d67c21
Fixed name collision with jpeg library
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@157 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-04-22 00:32:14 +00:00
theraysmith
10265fb9cc
Updated graphics output for new java-based display
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@136 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-02-01 00:33:18 +00:00