Commit Graph

110 Commits

Author SHA1 Message Date
Ray Smith
03f3c9dc88 Misc fixes missed from previous commits 2015-05-12 18:13:15 -07:00
Ray Smith
2924d3ae15 Changes missed from diacritic fix edit 2015-05-12 17:28:56 -07:00
Ray Smith
84920b92b3 Font and classifier output structure cleanup.
Font recognition was poor, due to forcing a 1st and 2nd choice at
a character level, when the total score for the correct font is often
correct at the word level, so allowed the propagation of a full set
of fonts and scores to the word recognizer, which can now decide word
level fonts using the scores instead of simple votes.

Change precipitated a cleanup of output data structures for classifier
results, eliminating ScoredClass and INT_RESULT_STRUCT, with a few
extra elements going in UnicharRating, and using that wherever possible.
That added the extra complexity of 1-rating due to a flip between 0 is
good and 0 is bad for the internal classifier scores before they are
converted to rating and certainty.
2015-05-12 17:24:34 -07:00
Ray Smith
0e868ef377 Major change to improve layout analysis for heavily diacritic languages:
Tha, Vie, Kan, Tel etc.
There is a new overlap detector that detects when diacritics
cause a big increase in textline overlap. In such cases, diacritics from
overlap regions are kept separate from layout analysis completely, allowing
textline formation to happen without them. The diacritics are then assigned
to 0, 1 or 2 close words at the end of layout analysis, using and modifying
an old noise detection data path.
The stored diacritics are used or not during recognition according to the
character classifier's liking for them.
2015-05-12 16:47:02 -07:00
Ray Smith
b6d0184806 Fixed problems with shifted baselines so recognition can recover from layout analysis errors. 2015-05-12 15:53:45 -07:00
Ray Smith
4a3caefd92 Add ability to build under android (without cube or scrollview). 2015-05-12 15:41:15 -07:00
Ray Smith
25d0968d09 Major refactor to improve speed on difficut images, especially when running
a heap checker.
SEAM and SPLIT have been begging for a refactor for a *LONG* time.
This change does most of the work of turning them into proper classes:
  Moved relevant code into SEAM/SPLIT/TBLOB/EDGEPT etc from global helper functions.
  Made the splits full data members of SEAM in an array instead of 3 separate pointers.
    This greatly reduces the amount of new/delete happening in the chopper, which is the main goal.
  Deleted redundant files: olutil.*,  makechop.*
  Brought other code into SEAM in order to keep its data members private with only priority having accessors.
2015-05-12 14:59:14 -07:00
Ray Smith
2f197cd653 Fixed issues 899/1220/1246 (mixed eng+ara) 2014-09-17 18:27:49 -07:00
Ray Smith
736d327473 NOP changes from static analysis in issue 1205 2014-08-12 16:09:12 -07:00
theraysmith@gmail.com
dbf6197471 Major refactor of control.cpp to enable line recognition
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1147 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-11 23:23:06 +00:00
theraysmith@gmail.com
d52231cff3 Started TFile conversion to remove fmemopen
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1138 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-11 23:08:46 +00:00
zdenop
c3b6ac7f32 skip imagedata build to fix issue 1150 on Mac OS X
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1096 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-07 21:04:42 +00:00
theraysmith@gmail.com
0dc7926f24 Fixed issue 1122
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1077 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-04-24 01:02:45 +00:00
theraysmith@gmail.com
a9f483cffc Applied patch to fix issue 1098
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1066 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-04-23 23:28:01 +00:00
theraysmith@gmail.com
3a5f699013 Applied patch to refix issue 331
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1064 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-04-23 23:12:53 +00:00
theraysmith@gmail.com
fec775400d Added ImageData class
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1061 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-04-23 22:53:16 +00:00
theraysmith@gmail.com
8364f24f4b Added ability for box files to store spaces and newlines
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1060 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-04-23 22:52:05 +00:00
theraysmith@gmail.com
7f5e5264d3 Fixed issues 1093-1097
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1048 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-04 23:36:24 +00:00
theraysmith@gmail.com
2fcea93846 Fixed issues 1081-1090
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1046 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-04 02:23:18 +00:00
theraysmith@gmail.com
d11dc049e3 Fixed a lot of compiler/clang warnings
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1015 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-25 02:28:51 +00:00
theraysmith@gmail.com
0d93bb7cfa More code cleanup from patches and fixing warnings
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1011 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-24 21:09:59 +00:00
zdenop@gmail.com
71ae509354 fix for mingw32/g++ 4.8.1
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@998 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-22 08:10:15 +00:00
theraysmith@gmail.com
5857bebdc8 Minor formatting changes
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@992 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-17 18:54:16 +00:00
zdenop
3d1e1cc23d fix opencl build
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@986 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-13 22:41:52 +00:00
zdenop
aeba7a7ace amend r:983
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@985 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-12 21:38:11 +00:00
zdenop
a6d23c63c5 remove empty file
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@984 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-12 20:55:00 +00:00
zdenop@gmail.com
94d08567e1 fix vs2010 (and maybe vs2008) build
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@983 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-12 20:13:55 +00:00
zdenop
9cf08ca8d3 fix build with -DGRAPHICS_DISABLED
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@981 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-11 23:08:54 +00:00
theraysmith@gmail.com
91d2265429 More minor fixes from issues and cleanup
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@974 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-10 01:38:00 +00:00
theraysmith@gmail.com
69dac05e1c Removed dependence on IMAGE class
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@943 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-09 17:30:23 +00:00
rajesh.katikam@gmail.com
b8d7a1d139 Fixed all the crashes observed on 24 bit and 8 bit images.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@919 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-12-10 10:52:54 +00:00
zdenop
38b25b5777 fix issue 1018, 1031
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@918 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-12-06 22:07:46 +00:00
rajesh.katikam@gmail.com
bf0a83907b Cleaned up configure.ac and Makefile.am in multiple folder to use OPENCL paths
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@910 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-11-12 10:40:40 +00:00
rajesh.katikam@gmail.com
983aaabaae Initial version of OpenCL support added.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@909 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-11-11 17:43:13 +00:00
theraysmith@gmail.com
7ec4fd7a56 Refactorerd control functions to enable parallel blob classification
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@904 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-11-08 20:30:56 +00:00
zdenop@gmail.com
73df602707 fix VC++ build
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@898 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-10-31 13:02:58 +00:00
theraysmith@gmail.com
4c3475ad2e Fixed fmemopen portability problem
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@890 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-10-10 02:07:26 +00:00
zdenop@gmail.com
af319b4d90 fix for windows build - part 1
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@883 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-25 09:56:49 +00:00
theraysmith@gmail.com
4d514d5a60 Major refactor of beam search, elimination of dead code, misc bug fixes, updates to Makefile.am, Changelog etc.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@878 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-23 15:26:50 +00:00
theraysmith@gmail.com
ec026cadfe Generalized feature extractor to allow fx from greyscale
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@876 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-23 15:21:37 +00:00
theraysmith@gmail.com
dfc1a92628 Refactored classifier to make it easier to add new ones
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@874 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-23 15:16:01 +00:00
theraysmith@gmail.com
42144b9698 Improved baseline fit
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@870 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-20 19:43:47 +00:00
zdenop@gmail.com
10c1169d98 remove unused code (Windows related)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@860 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-07-08 18:21:10 +00:00
zdenop@gmail.com
7e14ade10d print error/warning messages to stderr/debug file instead of stdout (fix issue 911)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@843 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-05-16 20:31:37 +00:00
theraysmith@gmail.com
64c739c8af Added sparse text mode, also fixed issue 653.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@820 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-01-03 19:06:41 +00:00
theraysmith@gmail.com
59d244b06e More fixes for GRAPHICS_DISABLED from Zdenko and Ray
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@757 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-22 00:59:31 +00:00
theraysmith@gmail.com
da1047f020 Fixed typos and improved comments
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@753 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-21 15:31:20 +00:00
theraysmith@gmail.com
f23460bec4 Removed config_auto.h from .h files
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@748 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-21 15:26:10 +00:00
zdenop@gmail.com
cd8de9157c change comments to doxygen block comments (api)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@716 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-30 21:24:12 +00:00
zdenop@gmail.com
5958f01f5f fix doxygen warnings
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@715 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-30 15:42:06 +00:00