Commit Graph

37 Commits

Author SHA1 Message Date
Ray Smith
da03e4e910 Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion 2017-07-14 09:30:14 -07:00
Stefan Weil
0ba202f6ed Remove unneeded null pointer check
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-16 22:58:10 +02:00
Stefan Weil
46ca83071e genericvector: Add overloaded LoadDataFromFile
Several code locations call that method with a normal C string,
so overload it to accept that without a conversion to a STRING
object. This saves unneeded new / memcpy / delete operations.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-16 22:57:46 +02:00
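
The overload described in commit 46ca83071e can be sketched roughly as follows. This is a minimal illustration, not the Tesseract code: `std::string` and `std::vector<char>` stand in for the project's own STRING and GenericVector types, and both function bodies are assumptions. It also treats an empty file as a successful read, which is the situation the assertion fix in bb2348bbbe concerns.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for the loader: reads a whole file into *data.
// It takes a plain C string, so callers that already hold one do not
// have to build a temporary string object first.
bool LoadDataFromFile(const char* filename, std::vector<char>* data) {
  assert(data != nullptr);
  std::FILE* fp = std::fopen(filename, "rb");
  if (fp == nullptr) return false;
  bool ok = false;
  if (std::fseek(fp, 0, SEEK_END) == 0) {
    const long size = std::ftell(fp);
    if (size >= 0 && std::fseek(fp, 0, SEEK_SET) == 0) {
      data->resize(static_cast<std::size_t>(size));
      // An empty file is a valid result, not an error.
      ok = size == 0 ||
           std::fread(data->data(), 1, data->size(), fp) == data->size();
    }
  }
  std::fclose(fp);
  return ok;
}

// Convenience overload for callers that hold a string object.
// It forwards to the C-string version without copying the name.
inline bool LoadDataFromFile(const std::string& filename,
                             std::vector<char>* data) {
  return LoadDataFromFile(filename.c_str(), data);
}
```

A call with a string literal selects the `const char*` version directly, so no temporary string is constructed.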
Stefan Weil
bb2348bbbe genericvector: Fix and optimize function LoadDataFromFile
It's not necessary to initialize the vector with 0,
because the initial values are read from file.

Fix also an assertion when trying to read an empty file.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-12 14:15:54 +02:00
Raf Schietekat
c335508e84 Fewer g++ -Wsign-compare warnings 2017-05-11 23:14:52 +02:00
Raf Schietekat
3983d2f76a Reviewed uses of reinterpret_cast 2017-05-11 01:58:40 +02:00
Ray Smith
8e79297dce Final part of endian improvement. Adds big-endian support to lstm and fixes issue 518 2017-05-03 16:09:44 -07:00
Stefan Weil
46c887b77e genericvector: Fix minimum size
Commit 907de5995f tried to improve
GenericVector, but missed a case where vectors with fewer than
kDefaultVectorSize elements were allocated. This resulted in additional
alloc / free operations.

Commit a28b2a033d (before memory optimization)
oem 0: total heap usage: 739,238 allocs, 739,237 frees, 161,699,214 bytes allocated
oem 1: total heap usage: 690,182 allocs, 690,175 frees, 144,470,400 bytes allocated
oem 2: total heap usage: 728,213 allocs, 728,206 frees, 182,885,824 bytes allocated

Commit fd3f8f9b2d without genericvector change
oem 0: total heap usage: 738,980 allocs, 738,979 frees, 161,697,150 bytes allocated
oem 1: total heap usage: 690,182 allocs, 690,175 frees, 144,470,400 bytes allocated
oem 2: total heap usage: 728,213 allocs, 728,206 frees, 182,885,824 bytes allocated
=> Improvements for oem 0, no change for oem 1 and oem 2.

Commit fd3f8f9b2d
oem 0: total heap usage: 772,648 allocs, 772,647 frees, 160,083,901 bytes allocated
oem 1: total heap usage: 748,591 allocs, 748,584 frees, 143,581,672 bytes allocated
oem 2: total heap usage: 764,796 allocs, 764,789 frees, 181,212,197 bytes allocated
=> Fewer bytes allocated, but more allocs / frees = bad for performance.

Commit fd3f8f9b2d with this patch
oem 0: total heap usage: 677,537 allocs, 677,536 frees, 160,444,634 bytes allocated
oem 1: total heap usage: 653,812 allocs, 653,805 frees, 143,423,008 bytes allocated
oem 2: total heap usage: 670,029 allocs, 670,022 frees, 181,517,760 bytes allocated
=> Improvements for all three cases.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-03 09:49:23 +02:00
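
The idea behind commit 46c887b77e, combined with the non-allocating default constructor from 907de5995f, can be sketched as below. This is an assumption-laden illustration rather than the real GenericVector: the class `TinyVector`, its allocation counter, and the value chosen for `kDefaultVectorSize` are all hypothetical. The point is that a request smaller than the default size is rounded up, so a vector that grows by a few elements does not reallocate repeatedly.

```cpp
#include <algorithm>
#include <cassert>

constexpr int kDefaultVectorSize = 4;  // assumed value, for illustration

// Hypothetical minimal vector of int that counts its own allocations.
class TinyVector {
 public:
  TinyVector() = default;  // no allocation until storage is needed
  ~TinyVector() { delete[] data_; }
  TinyVector(const TinyVector&) = delete;
  TinyVector& operator=(const TinyVector&) = delete;

  void reserve(int size) {
    if (size <= capacity_) return;
    // Clamp small requests to the minimum size to avoid a chain of
    // tiny allocations as the vector grows.
    size = std::max(size, kDefaultVectorSize);
    int* grown = new int[size];
    std::copy(data_, data_ + used_, grown);
    delete[] data_;
    data_ = grown;
    capacity_ = size;
    ++allocations_;
  }

  void push_back(int value) {
    if (used_ == capacity_) {
      reserve(capacity_ == 0 ? kDefaultVectorSize : 2 * capacity_);
    }
    data_[used_++] = value;
  }

  int operator[](int index) const {
    assert(index >= 0 && index < used_);
    return data_[index];
  }
  int size() const { return used_; }
  int capacity() const { return capacity_; }
  int allocations() const { return allocations_; }

 private:
  int* data_ = nullptr;
  int used_ = 0;
  int capacity_ = 0;
  int allocations_ = 0;
};
```

With this clamp, `reserve(1)` followed by four `push_back` calls performs a single allocation; without it, the same sequence would allocate more than once.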
zdenop
fd3f8f9b2d Merge pull request #352 from pnordhus/reduce_mallocs
Avoid unnecessary memory allocations
2017-04-30 17:39:31 +02:00
Ray Smith
1cc511188d Added extra Init that takes a memory buffer or a filereader function pointer to enable read of traineddata from memory or foreign file systems. Updated existing readers to use TFile API instead of FILE. This does not yet add big-endian capability to LSTM, but it is very easy from here. 2017-04-27 15:48:23 -07:00
Ray Smith
ce76d1c569 Fixes to training process to allow incremental training from a recognition model 2016-11-30 15:51:17 -08:00
Ray Smith
c1c1e426b3 Added new LSTM-based neural network line recognizer 2016-11-07 15:38:07 -08:00
Philipp Nordhus
907de5995f Do not allocate in GenericVector default ctor 2016-06-17 22:38:41 +02:00
Stefan Weil
4a92ff5862 Fix compiler warnings for copy constructors
gcc reports these warnings with -Wextra:

ccstruct/pageres.h:330:3: warning:
 base class 'class ELIST_LINK' should be explicitly initialized
 in the copy constructor [-Wextra]
ccstruct/ratngs.cpp:115:1: warning:
 base class 'class ELIST_LINK' should be explicitly initialized
 in the copy constructor [-Wextra]
ccstruct/ratngs.h:291:3: warning:
 base class 'class ELIST_LINK' should be explicitly initialized
 in the copy constructor [-Wextra]
ccutil/genericvector.h:435:3: warning:
 base class 'class GenericVector<WERD_RES*>' should be explicitly initialized
 in the copy constructor [-Wextra]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-05 09:19:37 +01:00
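
The kind of fix that silences the warnings quoted in commit 4a92ff5862 can be sketched as follows. The classes `Link` and `Item` are hypothetical stand-ins, not the Tesseract types, and the behaviour of `Link`'s copy constructor is an assumption made for the example. The relevant part is that the derived copy constructor names its base class in the initializer list.

```cpp
// Hypothetical base class standing in for a list-link type.
class Link {
 public:
  Link() = default;
  Link(const Link&) : next_(nullptr) {}  // in this sketch, a copy starts unlinked
  const Link* next() const { return next_; }

 private:
  Link* next_ = nullptr;
};

class Item : public Link {
 public:
  explicit Item(int value) : value_(value) {}
  // Naming Link(src) explicitly is what the warning asks for. If it is
  // omitted, the base is default-constructed instead of copy-constructed.
  Item(const Item& src) : Link(src), value_(src.value_) {}
  int value() const { return value_; }

 private:
  int value_;
};
```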
Stefan Weil
38f3db8ca5 Fix more typos in comments (found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-04 21:58:42 +01:00
zdenop
b882590491 Merge pull request #65 from ws233/master
Type mismatch on 64bit platforms
2015-10-28 20:02:20 +01:00
Stefan Weil
539b7fbbab ccutil: Fix typos in comments and strings
Most of them were found by codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-09-14 22:09:18 +02:00
ws233
0146185c04 Type mismatch on 64-bit platforms has been fixed. 2015-07-26 11:37:15 +03:00
Ray Smith
84920b92b3 Font and classifier output structure cleanup.
Font recognition was poor, due to forcing a 1st and 2nd choice at
a character level, when the total score for the correct font is often
correct at the word level, so allowed the propagation of a full set
of fonts and scores to the word recognizer, which can now decide word
level fonts using the scores instead of simple votes.

Change precipitated a cleanup of output data structures for classifier
results, eliminating ScoredClass and INT_RESULT_STRUCT, with a few
extra elements going in UnicharRating, and using that wherever possible.
That added the extra complexity of 1-rating due to a flip between 0 is
good and 0 is bad for the internal classifier scores before they are
converted to rating and certainty.
2015-05-12 17:24:34 -07:00
theraysmith@gmail.com
c86fe22a62 Started TFile conversion to remove fmemopen
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1139 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-11 23:09:25 +00:00
theraysmith@gmail.com
7f5e5264d3 Fixed issues 1093-1097
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1048 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-04 23:36:24 +00:00
theraysmith@gmail.com
d11dc049e3 Fixed a lot of compiler/clang warnings
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1015 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-25 02:28:51 +00:00
zdenop@gmail.com
adfac4144b amend r995
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@996 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-18 09:04:35 +00:00
zdenop@gmail.com
ef3b1d936e fix mingw build issues
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@995 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-18 09:00:54 +00:00
zdenop
26f8f58042 fix android issues
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@990 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-15 22:47:37 +00:00
zdenop@gmail.com
244731fd51 revert dll-interface for class 'GenericVector<T>'
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@988 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-14 09:25:45 +00:00
zdenop@gmail.com
94d08567e1 fix vs2010 (and maybe vs2008) build
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@983 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-12 20:13:55 +00:00
theraysmith@gmail.com
fdb1669cda Fixed srand cast
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@892 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-10-11 04:57:54 +00:00
theraysmith@gmail.com
4c3475ad2e Fixed fmemopen portability problem
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@890 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-10-10 02:07:26 +00:00
zdenop@gmail.com
af319b4d90 fix for windows build - part 1
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@883 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-25 09:56:49 +00:00
theraysmith@gmail.com
4d514d5a60 Major refactor of beam search, elimination of dead code, misc bug fixes, updates to Makefile.am, Changelog etc.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@878 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-23 15:26:50 +00:00
theraysmith@gmail.com
e0d735b122 Remaining misc changes for 3.02
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@658 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 03:14:43 +00:00
max.markin@gmail.com
7c4461316a fixed comment
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@626 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-09-18 05:12:37 +00:00
theraysmith
ba9f73f04b Various fixes, including memory leak in fixspace, font labels on output, removed some annoying debug output, fixes to initialization of parameters, general cleanup, and added Hindi
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@569 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-21 21:45:12 +00:00
zdenop@gmail.com
4523ce9f7d 3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to Linux and Windows (VC2008) compile process
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-23 18:34:14 +00:00
theraysmith
8d654e7476 Fixed issue 243, ungraded helpers, genericvector
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@340 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-05-19 22:35:35 +00:00
theraysmith
d8b1456dd5 Changes to ccutil for 3.00
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@305 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-07-11 02:50:24 +00:00