Ray Smith
da03e4e910
Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion
2017-07-14 09:30:14 -07:00
Raf Schietekat
3983d2f76a
Reviewed uses of reinterpret_cast
2017-05-11 01:58:40 +02:00
Ray Smith
8e79297dce
Final part of endian improvement. Adds big-endian support to lstm and fixes issue 518
2017-05-03 16:09:44 -07:00
Stefan Weil
f8fba59804
Replace alloc_struct, free_struct
...
Both functions simply call malloc, free.
Remove also unneeded null pointer checks and use calloc where possible.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-04-30 09:25:04 +02:00
Ray Smith
1cc511188d
Added extra Init that takes a memory buffer or a filereader function pointer to enable read of traineddata from memory or foreign file systems. Updated existing readers to use TFile API instead of FILE. This does not yet add big-endian capability to LSTM, but it is very easy from here.
2017-04-27 15:48:23 -07:00
Stefan Weil
f6eed019ab
Fix typo in comment
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-02-05 12:51:09 +01:00
Ray Smith
185a264f52
Fixed the memory leak/double free cleanly
2016-11-28 09:39:17 -08:00
Ray Smith
51368c8eb4
Fixed failed merge of memory leak
2016-11-22 10:41:43 -08:00
Ray Smith
9c7e99b041
Merged with commit 4ca6ba985b
2016-11-21 08:27:02 -08:00
Ray Smith
2c837dffc3
Result of clang tidy on recent merge
2016-11-07 10:46:33 -08:00
Stefan Weil
1e60a8d71c
Fix crash caused by undefined value of local variable
...
Commit b1f03cb697 added a call of function FreeFeatureSet to fix a memory
leak, but introduced a new bug because the local variable FloatFeatures was
not always assigned a value.
Now FloatFeatures is always assigned a value, and we only need a single
place where FreeFeatureSet is called.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-07 12:31:20 +01:00
Stefan Weil
b1f03cb697
classify/adaptmatch: Fix memory leak
...
Coverity report:
CID 1340280 (#1 of 1): Resource leak (RESOURCE_LEAK)
7. leaked_storage: Variable FloatFeatures going out of scope leaks the storage it points to.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-10-24 16:12:31 +02:00
Stefan Weil
963b935e80
classify/adaptmatch: Fix memory leak
...
Coverity report:
CID 1164738 (#1 of 1): Resource leak (RESOURCE_LEAK)
7. leaked_storage: Variable sample going out of scope leaks the storage it points to.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-10-24 13:42:18 +02:00
Stefan Weil
55fde61a8f
classify: Fix typos in comments and strings
...
All of them were found by codespell.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-09-14 22:12:06 +02:00
Jim O'Regan
524a61452d
Doxygen
...
Squashed commit from https://github.com/tesseract-ocr/tesseract/tree/more-doxygen
closes #14
Commits:
6317305
doxygen
9f42f69
doxygen
0fc4d52
doxygen
37b4b55
fix typo
bded8f1
some more doxy
020eb00
slight tweak
524666d
doxygenify
2a36a3e
doxygenify
229d218
doxygenify
7fd28ae
doxygenify
a8c64bc
doxygenify
f5d21b6
fix
5d8ede8
doxygenify
a58a4e0
language_model.cpp
fa85709
lm_pain_points.cpp lm_state.cpp
6418da3
merge
06190ba
Merge branch 'old_doxygen_merge' into more-doxygen
84acf08
Merge branch 'master' into more-doxygen
50fe1ff
pagewalk.cpp cube_reco_context.cpp
2982583
change to relative
192a24a
applybox.cpp, take one
8eeb053
delete docs for obsolete params
52e4c77
modernise classify/ocrfeatures.cpp
2a1cba6
modernise cutil/emalloc.cpp
773e006
silence doxygen warning
aeb1731
silence doxygen warning
f18387f
silence doxygen; new params are unused?
15ad6bd
doxygenify cutil/efio.cpp
c8b5dad
doxygenify cutil/danerror.cpp
784450f
the globals and exceptions parts are obsolete; remove
8bca324
doxygen classify/normfeat.cpp
9bcbe16
doxygen classify/normmatch.cpp
aa9a971
doxygen ccmain/cube_control.cpp
c083ff2
doxygen ccmain/cube_reco_context.cpp
f842850
params changed
5c94f12
doxygen ccmain/cubeclassifier.cpp
15ba750
case sensitive
f5c71d4
case sensitive
f85655b
doxygen classify/intproto.cpp
4bbc7aa
partial doxygen classify/mfx.cpp
dbb6041
partial doxygen classify/intproto.cpp
2aa72db
finish doxygen classify/intproto.cpp
0b8de99
doxygen training/mftraining.cpp
0b5b35c
partial doxygen ccstruct/coutln.cpp
b81c766
partial doxygen ccstruct/coutln.cpp
40fc415
finished? doxygen ccstruct/coutln.cpp
6e4165c
doxygen classify/clusttool.cpp
0267dec
doxygen classify/cutoffs.cpp
7f0c70c
doxygen classify/fpoint.cpp
512f3bd
ignore ~ files
5668a52
doxygen classify/intmatcher.cpp
84788d4
doxygen classify/kdtree.cpp
29f36ca
doxygen classify/mfoutline.cpp
40b94b1
silence doxygen warnings
6c511b9
doxygen classify/mfx.cpp
f9b4080
doxygen classify/outfeat.cpp
aa1df05
doxygen classify/picofeat.cpp
cc5f466
doxygen training/cntraining.cpp
cce044f
doxygen training/commontraining.cpp
167e216
missing param
9498383
renamed params
37eeac2
renamed param
d87b5dd
case
c8ee174
renamed params
b858db8
typo
4c2a838
h2 context?
81a2c0c
fix some param names; add some missing params, no docs
bcf8a4c
add some missing params, no docs
af77f86
add some missing params, no docs; fix some param names
01df24e
fix some params
6161056
fix some params
68508b6
fix some params
285aeb6
doxygen complains here no matter what
529bcfa
rm some missing params, typos
cd21226
rm some missing params, add some new ones
48a4bc2
fix params
c844628
missing param
312ce37
missing param; rename one
ec2fdec
missing param
05e15e0
missing params
d515858
change "<" to "&lt;" to make doxygen happy
b476a28
wrong place
2015-07-20 18:48:00 +01:00
Zdenko Podobný
cdc84a5dd7
fix VS2010 build
2015-07-11 07:38:57 +02:00
Ray Smith
b1d99dfe23
Added a backup adaptive classifier to take over from primary when it fills on a large document
2015-06-12 11:10:53 -07:00
Zdenko Podobný
1d6de86150
fix VS2010 linking error
2015-05-14 14:24:55 +02:00
Ray Smith
5bb0d89291
Improved debug of class pruner
2015-05-13 17:07:11 -07:00
Ray Smith
03f3c9dc88
Misc fixes missed from previous commits
2015-05-12 18:13:15 -07:00
Ray Smith
84920b92b3
Font and classifier output structure cleanup.
...
Font recognition was poor, due to forcing a 1st and 2nd choice at
a character level, when the total score for the correct font is often
correct at the word level, so allowed the propagation of a full set
of fonts and scores to the word recognizer, which can now decide word
level fonts using the scores instead of simple votes.
Change precipitated a cleanup of output data structures for classifier
results, eliminating ScoredClass and INT_RESULT_STRUCT, with a few
extra elements going in UnicharRating, and using that wherever possible.
That added the extra complexity of 1-rating due to a flip between 0 is
good and 0 is bad for the internal classifier scores before they are
converted to rating and certainty.
2015-05-12 17:24:34 -07:00
Ray Smith
53fc4456cc
Fixed issue 1252: Refactored LearnBlob and its call hierarchy to make it a member of Classify.
...
Eliminated the flexfx scheme for calling global feature extractor functions
through an array of function pointers.
Deleted dead code I found as a by-product.
This CL does not change BlobToTrainingSample or ExtractFeatures to be full
members of Classify (the eventual goal) as that would make it even bigger,
since there are a lot of callers to these functions.
When ExtractFeatures and BlobToTrainingSample are members of Classify they
will be able to access control parameters in Classify, which will greatly
simplify developing variations to the feature extraction process.
2015-05-12 15:22:34 -07:00
Ray Smith
25d0968d09
Major refactor to improve speed on difficult images, especially when running
...
a heap checker.
SEAM and SPLIT have been begging for a refactor for a *LONG* time.
This change does most of the work of turning them into proper classes:
Moved relevant code into SEAM/SPLIT/TBLOB/EDGEPT etc from global helper functions.
Made the splits full data members of SEAM in an array instead of 3 separate pointers.
This greatly reduces the amount of new/delete happening in the chopper, which is the main goal.
Deleted redundant files: olutil.*, makechop.*
Brought other code into SEAM in order to keep its data members private with only priority having accessors.
2015-05-12 14:59:14 -07:00
theraysmith@gmail.com
7f5e5264d3
Fixed issues 1093-1097
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1048 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-04 23:36:24 +00:00
theraysmith@gmail.com
2fcea93846
Fixed issues 1081-1090
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1046 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-04 02:23:18 +00:00
theraysmith@gmail.com
1a487252f4
Fixed slow-down that was caused by upping MAX_NUM_CLASSES
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1013 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-24 21:12:35 +00:00
zdenop
9cf08ca8d3
fix build with -DGRAPHICS_DISABLED
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@981 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-11 23:08:54 +00:00
zdenop
9041990be5
fix issue 1036
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@930 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-12-22 21:40:23 +00:00
zdenop
38b25b5777
fix issue 1018, 1031
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@918 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-12-06 22:07:46 +00:00
theraysmith@gmail.com
7ec4fd7a56
Refactored control functions to enable parallel blob classification
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@904 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-11-08 20:30:56 +00:00
theraysmith@gmail.com
99edf4ccbd
Refactored classifier to make it easier to add new ones and generalized feature extractor to allow fx from grey
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@873 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-23 15:15:06 +00:00
theraysmith@gmail.com
da1047f020
Fixed typos and improved comments
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@753 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-09-21 15:31:20 +00:00
zdenop@gmail.com
5958f01f5f
fix doxygen warnings
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@715 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-30 15:42:06 +00:00
zdenop@gmail.com
49c4ce3183
fix for GRAPHICS_DISABLED build
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@686 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-03-01 22:43:51 +00:00
theraysmith@gmail.com
5bc5e2a0b4
Added simultaneous multi-language capability, Added support for ShapeTable in classifier and training, Refactored class pruner, Added new uniform classifier API, Added new training error counter
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@650 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 02:57:42 +00:00
theraysmith
c86a0f6892
Various fixes, including memory leak in fixspace, font labels on output, removed some annoying debug output, fixes to initialization of parameters, general cleanup, and added Hindi
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@570 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-21 21:45:36 +00:00
theraysmith
eba04e7c5b
Fixed debug display, training on fragments
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@533 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-30 01:00:17 +00:00
zdenop@gmail.com
4523ce9f7d
3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to Linux and Windows (VC2008) compile process
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-23 18:34:14 +00:00
joregan
f2506871f9
move include of config_auto.h to not conflict with local types. Not finished
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@490 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-09-30 15:53:40 +00:00
joregan
08defee46e
more doxygen
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@450 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-08-10 19:20:11 +00:00
joregan
a18816f839
partial merge of doxygen branch (stuff without conflicts, basically)
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@441 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-07-27 13:23:23 +00:00
joregan
5c8ad7ee72
add config_auto.h anywhere #ifndef GRAPHICS_DISABLED is used
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@384 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-28 12:03:45 +00:00
theraysmith
694d3f2c20
Changes to classify for 3.00
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@291 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-07-11 02:17:36 +00:00
theraysmith
bea5e04b76
Fixed compilation with GRAPHICS_DISABLED
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@250 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-06-03 17:24:08 +00:00
theraysmith
2186613963
Fixed excessive stack use
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@241 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-06-02 23:34:20 +00:00
theraysmith
ff3d550c05
Removed obfuscatory 'access' macros: see issue#76
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@219 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-03-10 19:03:06 +00:00
theraysmith
a64f8d02dd
Fixed problem with preadapted templates
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@181 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-08-14 22:51:39 +00:00
theraysmith
9cd87f0ec5
Fixed name collision with jpeg library
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@155 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-04-22 00:23:04 +00:00
theraysmith
6b5e0c4046
Made some major classifier and clustering improvements
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@130 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2008-02-01 00:07:59 +00:00
theraysmith
2f19f0c269
A minor accuracy improvement on punctuation
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@109 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2007-08-30 18:23:00 +00:00