Ray Smith
03f3c9dc88
Misc fixes missed from previous commits
2015-05-12 18:13:15 -07:00
Ray Smith
2924d3ae15
Changes missed from diacritic fix edit
2015-05-12 17:28:56 -07:00
Ray Smith
84920b92b3
Font and classifier output structure cleanup.
Font recognition was poor because a 1st and 2nd choice were forced at
the character level, even though the correct font often wins on total
score at the word level. The full set of fonts and scores is now
propagated to the word recognizer, which can decide word-level fonts
using the scores instead of simple votes.
This change precipitated a cleanup of the output data structures for
classifier results, eliminating ScoredClass and INT_RESULT_STRUCT, with
a few extra elements going into UnicharRating, which is now used
wherever possible.
That added the extra complexity of 1-rating, due to a flip between
"0 is good" and "0 is bad" for the internal classifier scores before
they are converted to rating and certainty.
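The difference between voting and score summation can be illustrated with a minimal sketch. It is not the code from this commit: FontScores, WordFontByScore and WordFontByVote are hypothetical names, and the sketch assumes non-negative scores where higher is better.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical per-character result: font id -> score (higher is better).
typedef std::map<int, double> FontScores;

// Picks the word-level font by summing each font's score over all characters.
// Returns -1 if no character reported any font.
int WordFontByScore(const std::vector<FontScores>& chars) {
  std::map<int, double> totals;
  for (const FontScores& scores : chars) {
    for (const auto& kv : scores) {
      assert(kv.second >= 0.0);  // assumption of this sketch only
      totals[kv.first] += kv.second;
    }
  }
  int best = -1;
  double best_total = 0.0;
  for (const auto& kv : totals) {
    if (best < 0 || kv.second > best_total) {
      best = kv.first;
      best_total = kv.second;
    }
  }
  return best;
}

// For comparison: each character votes only for its top-scoring font.
int WordFontByVote(const std::vector<FontScores>& chars) {
  std::map<int, int> votes;
  for (const FontScores& scores : chars) {
    int top = -1;
    double top_score = 0.0;
    for (const auto& kv : scores) {
      if (top < 0 || kv.second > top_score) {
        top = kv.first;
        top_score = kv.second;
      }
    }
    if (top >= 0) ++votes[top];
  }
  int best = -1;
  int best_votes = 0;
  for (const auto& kv : votes) {
    if (kv.second > best_votes) {
      best = kv.first;
      best_votes = kv.second;
    }
  }
  return best;
}
```

With two characters that narrowly prefer font 1 and one character that strongly prefers font 2, the vote picks font 1 while the summed scores pick font 2.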
2015-05-12 17:24:34 -07:00
Ray Smith
0e868ef377
Major change to improve layout analysis for heavily diacritic languages: Tha, Vie, Kan, Tel etc.
There is a new overlap detector that detects when diacritics
cause a big increase in textline overlap. In such cases, diacritics from
overlap regions are kept separate from layout analysis completely, allowing
textline formation to happen without them. The diacritics are then assigned
to 0, 1 or 2 close words at the end of layout analysis, using and modifying
an old noise detection data path.
The stored diacritics are used or not during recognition according to the
character classifier's liking for them.
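The "0, 1 or 2 close words" assignment can be sketched roughly as follows. This is an illustration only, not the logic in this commit: Span, Gap, CloseWords and the max_gap threshold are all hypothetical, and only horizontal distance is considered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical horizontal extent of a diacritic or a word.
struct Span {
  int left;
  int right;
};

// Horizontal gap between two spans; 0 when they touch or overlap.
int Gap(const Span& a, const Span& b) {
  return std::max(0, std::max(a.left, b.left) - std::min(a.right, b.right));
}

// Returns the indices of at most two words lying within max_gap of the
// diacritic, nearest first. An empty result means the diacritic stays
// unassigned.
std::vector<int> CloseWords(const Span& diacritic,
                            const std::vector<Span>& words, int max_gap) {
  std::vector<std::pair<int, int> > candidates;  // (gap, word index)
  for (std::size_t i = 0; i < words.size(); ++i) {
    int gap = Gap(diacritic, words[i]);
    if (gap <= max_gap) {
      candidates.push_back(std::make_pair(gap, static_cast<int>(i)));
    }
  }
  std::sort(candidates.begin(), candidates.end());
  std::vector<int> result;
  for (std::size_t i = 0; i < candidates.size() && i < 2; ++i) {
    result.push_back(candidates[i].second);
  }
  return result;
}
```

Recognition could then try each candidate word with and without the diacritic and keep whichever the classifier scores better, which is the part the message describes as the classifier's "liking".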
2015-05-12 16:47:02 -07:00
Ray Smith
b6d0184806
Fixed problems with shifted baselines so recognition can recover from layout analysis errors.
2015-05-12 15:53:45 -07:00
Ray Smith
4a3caefd92
Add ability to build under android (without cube or scrollview).
2015-05-12 15:41:15 -07:00
Ray Smith
25d0968d09
Major refactor to improve speed on difficult images, especially when running a heap checker.
SEAM and SPLIT have been begging for a refactor for a *LONG* time.
This change does most of the work of turning them into proper classes:
Moved relevant code into SEAM/SPLIT/TBLOB/EDGEPT etc from global helper functions.
Made the splits full data members of SEAM in an array instead of 3 separate pointers.
This greatly reduces the amount of new/delete happening in the chopper, which is the main goal.
Deleted redundant files: olutil.*, makechop.*
Brought other code into SEAM in order to keep its data members private with only priority having accessors.
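A minimal sketch of the storage change described above, assuming a fixed maximum of three splits per seam (inferred from the "3 separate pointers" wording). Split and Seam here are simplified, hypothetical stand-ins and not the real SPLIT and SEAM classes.

```cpp
#include <cassert>

// Hypothetical stand-in for a split; the real class refers to outline points.
struct Split {
  int from = 0;
  int to = 0;
};

class Seam {
 public:
  static const int kMaxSplits = 3;  // assumption taken from "3 separate pointers"

  explicit Seam(float priority) : priority_(priority), num_splits_(0) {}

  // Priority is the one member the message says keeps accessors.
  float priority() const { return priority_; }
  void set_priority(float priority) { priority_ = priority; }

  // Copies the split into inline storage, so no new/delete is needed.
  // Returns false when the seam is already full.
  bool AddSplit(const Split& split) {
    if (num_splits_ >= kMaxSplits) return false;
    splits_[num_splits_++] = split;
    return true;
  }

  // Present only so the sketch can be exercised.
  int num_splits() const { return num_splits_; }

 private:
  float priority_;
  int num_splits_;
  Split splits_[kMaxSplits];  // splits live inside the seam itself
};
```

Because the splits are stored by value, creating or destroying a seam involves a single object rather than up to three separately allocated ones, which is where the reduction in allocator traffic would come from.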
2015-05-12 14:59:14 -07:00
Ray Smith
2f197cd653
Fixed issues 899/1220/1246 (mixed eng+ara)
2014-09-17 18:27:49 -07:00
Ray Smith
736d327473
NOP changes from static analysis in issue 1205
2014-08-12 16:09:12 -07:00
theraysmith@gmail.com
dbf6197471
Major refactor of control.cpp to enable line recognition
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1147 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-11 23:23:06 +00:00
theraysmith@gmail.com
d52231cff3
Started TFile conversion to remove fmemopen
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1138 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-11 23:08:46 +00:00
zdenop
c3b6ac7f32
skip imagedata build to fix issue 1150 on Mac OS X
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1096 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-07 21:04:42 +00:00
theraysmith@gmail.com
0dc7926f24
Fixed issue 1122
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1077 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-04-24 01:02:45 +00:00
theraysmith@gmail.com
a9f483cffc
Applied patch to fix issue 1098
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1066 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-04-23 23:28:01 +00:00
theraysmith@gmail.com
3a5f699013
Applied patch to refix issue 331
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1064 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-04-23 23:12:53 +00:00
theraysmith@gmail.com
fec775400d
Added ImageData class
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1061 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-04-23 22:53:16 +00:00
theraysmith@gmail.com
8364f24f4b
Added ability for box files to store spaces and newlines
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1060 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-04-23 22:52:05 +00:00
theraysmith@gmail.com
7f5e5264d3
Fixed issues 1093-1097
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1048 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-04 23:36:24 +00:00
theraysmith@gmail.com
2fcea93846
Fixed issues 1081-1090
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1046 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-04 02:23:18 +00:00
theraysmith@gmail.com
d11dc049e3
Fixed a lot of compiler/clang warnings
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1015 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-25 02:28:51 +00:00
theraysmith@gmail.com
0d93bb7cfa
More code cleanup from patches and fixing warnings
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1011 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-24 21:09:59 +00:00
zdenop@gmail.com
71ae509354
fix for mingw32/g++ 4.8.1
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@998 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-22 08:10:15 +00:00
theraysmith@gmail.com
5857bebdc8
Minor formatting changes
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@992 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-17 18:54:16 +00:00
zdenop
3d1e1cc23d
fix opencl build
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@986 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-13 22:41:52 +00:00
zdenop
aeba7a7ace
amend r:983
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@985 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-12 21:38:11 +00:00
zdenop
a6d23c63c5
remove empty file
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@984 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-12 20:55:00 +00:00
zdenop@gmail.com
94d08567e1
fix vs2010 (and maybe vs2008) build
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@983 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-12 20:13:55 +00:00
zdenop
9cf08ca8d3
fix build with -DGRAPHICS_DISABLED
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@981 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-11 23:08:54 +00:00
theraysmith@gmail.com
91d2265429
More minor fixes from issues and cleanup
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@974 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-10 01:38:00 +00:00
theraysmith@gmail.com
69dac05e1c
Removed dependence on IMAGE class
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@943 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-09 17:30:23 +00:00
rajesh.katikam@gmail.com
b8d7a1d139
Fixed all the crashes observed on 24 bit and 8 bit images.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@919 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-12-10 10:52:54 +00:00
zdenop
38b25b5777
fix issue 1018, 1031
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@918 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-12-06 22:07:46 +00:00
rajesh.katikam@gmail.com
bf0a83907b
Cleaned up configure.ac and Makefile.am in multiple folders to use OPENCL paths
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@910 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-11-12 10:40:40 +00:00
rajesh.katikam@gmail.com
983aaabaae
Initial version of OpenCL support added.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@909 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-11-11 17:43:13 +00:00
theraysmith@gmail.com
7ec4fd7a56
Refactored control functions to enable parallel blob classification
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@904 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-11-08 20:30:56 +00:00
zdenop@gmail.com
73df602707
fix VC++ build
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@898 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-10-31 13:02:58 +00:00
theraysmith@gmail.com
4c3475ad2e
Fixed fmemopen portability problem
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@890 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-10-10 02:07:26 +00:00
zdenop@gmail.com
af319b4d90
fix for windows build - part 1
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@883 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-25 09:56:49 +00:00
theraysmith@gmail.com
4d514d5a60
Major refactor of beam search, elimination of dead code, misc bug fixes, updates to Makefile.am, Changelog etc.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@878 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-23 15:26:50 +00:00
theraysmith@gmail.com
ec026cadfe
Generalized feature extractor to allow fx from greyscale
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@876 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-23 15:21:37 +00:00
theraysmith@gmail.com
dfc1a92628
Refactored classifier to make it easier to add new ones
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@874 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-23 15:16:01 +00:00
theraysmith@gmail.com
42144b9698
Improved baseline fit
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@870 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-20 19:43:47 +00:00
zdenop@gmail.com
10c1169d98
remove unused code (Windows related)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@860 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-07-08 18:21:10 +00:00
zdenop@gmail.com
7e14ade10d
print error/warning messages to stderr/debug file instead of stdout (fix issue 911)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@843 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-05-16 20:31:37 +00:00
theraysmith@gmail.com
64c739c8af
Added sparse text mode, also fixed issue 653.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@820 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-01-03 19:06:41 +00:00
theraysmith@gmail.com
59d244b06e
More fixes for GRAPHICS_DISABLED from Zdenko and Ray
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@757 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-22 00:59:31 +00:00
theraysmith@gmail.com
da1047f020
Fixed typos and improved comments
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@753 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-21 15:31:20 +00:00
theraysmith@gmail.com
f23460bec4
Removed config_auto.h from .h files
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@748 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-21 15:26:10 +00:00
zdenop@gmail.com
cd8de9157c
change comments to doxygen block comments (api)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@716 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-30 21:24:12 +00:00
zdenop@gmail.com
5958f01f5f
fix doxygen warnings
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@715 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-30 15:42:06 +00:00