Commit Graph

122 Commits

Author SHA1 Message Date
Stefan Weil
68450d8680 Fix CID 1164709 (Copy into fixed size buffer)
Using STRING removes the limitation for the filename length.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-23 08:58:07 +02:00
Stefan Weil
16e00b59fe dict: Replace NULL by nullptr
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-22 17:42:35 +02:00
Stefan Weil
bf0bddb9ca dict: Remove deprecated parameters (#1421)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-25 17:32:35 +02:00
Stefan Weil
023e1b340e Use POSIX data types and macros (#878)
* api: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* cutil: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* training: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract data types by POSIX data types

Now all Tesseract data types which are no longer needed can be removed
from ccutil/host.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* lstm: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Remove the macros which are now unused from ccutil/host.h.
Remove also the obsolete history comments.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix build error caused by ambiguous ClipToRange

Error message vom Appveyor CI:

    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2672: 'ClipToRange': no matching overloaded function found [C:\projects\tesseract\build\libtesseract.vcxproj]
    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2782: 'T ClipToRange(const T &,const T &,const T &)': template parameter 'T' is ambiguous [C:\projects\tesseract\build\libtesseract.vcxproj]
      c:\projects\tesseract\ccutil\helpers.h(122): note: see declaration of 'ClipToRange'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: could be 'char'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: or       'int'

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* unittest: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* arch: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 21:36:30 +01:00
Stefan Weil
7972b13e3a Remove macro USE_STD_NAMESPACE (#1360)
The related code in training/util.h now uses the GOOGLE_TESSERACT macro
to enable Google specific code to disable heap checking.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-04 14:43:28 +01:00
amitdo
a905548ed6 Autotools build: Remove the option 'USING_MULTIPLELIBS'
Libtool's convenience libraries should never be installed. Fixes #985.
2017-09-11 15:03:53 +03:00
Stefan Weil
6ac5d0ba8f dawg: Fix typos and file information in file header
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-08-30 21:48:06 +02:00
Stefan Weil
b016c48d06 Add missing spaces in help text
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-08-23 19:12:41 +02:00
Ray Smith
2633fef0b6 Part 2 of separating out the unicharset from the LSTM model, fixing command line for training 2017-08-02 13:29:23 -07:00
Ray Smith
aee910a7bf Fixed build broken by previous commits that added use of string in low-level code 2017-07-14 10:33:55 -07:00
Ray Smith
da03e4e910 Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion 2017-07-14 09:30:14 -07:00
Justin Hotchkiss Palermo
f057938069 fix filenames in comments 2017-07-02 17:35:47 -04:00
Raf Schietekat
9a5ed19cf6 Issue #529: cleanup 2017-05-13 18:01:45 +02:00
Stefan Weil
121a7c6489 ccstruct: Fix non portable and wrong format strings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 19:32:51 +02:00
Raf Schietekat
3983d2f76a Reviewed uses of reinterpret_cast 2017-05-11 01:58:40 +02:00
Ray Smith
8e79297dce Final part of endian improvement. Adds big-endian support to lstm and fixes issue 518 2017-05-03 16:09:44 -07:00
Stefan Weil
300841f9a7 Replace memalloc / memfree by C++ new / delete
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-01 17:26:23 +02:00
Stefan Weil
445befd3cb Remove unused include statements for freelist.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-01 17:12:43 +02:00
Ray Smith
1cc511188d Added extra Init that takes a memory buffer or a filereader function pointer to enable read of traineddata from memory or foreign file systems. Updated existing readers to use TFile API instead of FILE. This does not yet add big-endian capability to LSTM, but it is very easy from here. 2017-04-27 15:48:23 -07:00
Egor Pugin
442b5b731a Fix building of training tools in shared configuration. 2016-12-17 16:19:35 +03:00
zdenop
3c0c54f059 Merge pull request #353 from pnordhus/remove_dawgpositionvector_dtor
Remove redundant destructor
2016-12-08 13:04:58 +01:00
Ray Smith
5deebe6c27 Fixed multilang for LSTM, pushed cube to one side without actually deleting it 2016-12-05 14:41:43 -08:00
Stefan Weil
cefc420ddb Remove extra semicolons after member function definitions
clang++ report:
api/baseapi.h:852:4: warning:
 extra ';' after member function definition [-Wextra-semi]
[...]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 14:54:52 +01:00
zdenop
690616279c Merge pull request #514 from stweil/free
Simplify new operations
2016-12-01 20:36:24 +01:00
Ray Smith
53003f9074 Formatting changes from clang_tidy on latest pull 2016-11-30 15:44:25 -08:00
Stefan Weil
78d91701bd Simplify new operations
It is not necessary to check for null pointers after new.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-30 20:24:38 +01:00
Stefan Weil
85e37798cb Simplify delete operations
It is not necessary to check for null pointers.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-24 17:59:13 +01:00
zdenop
ac3b40de2f Merge pull request #478 from stweil/w
Fix some compiler warnings
2016-11-22 08:30:57 +01:00
Stefan Weil
7e90200d26 Fix some compiler warnings (unused function parameters)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-13 17:20:06 +01:00
Ray Smith
c1c1e426b3 Added new LSTM-based neural network line recognizer 2016-11-07 15:38:07 -08:00
Ray Smith
2c837dffc3 Result of clang tidy on recent merge 2016-11-07 10:46:33 -08:00
Stefan Weil
6fad5fc0a9 dict/dict: Fix memory leaks at program termination
Avoid dynamic memory allocation for the static variable 'cache'.
Now the destructor for that variable is called automatically
when Tesseract terminates and releases all associated memory.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-10-25 17:25:55 +02:00
zdenop
a1a4575f4b Merge pull request #6 from jimregan/gcode_issue1316
Issue 1316: The traineddata file must be closed after it was opened
2016-08-29 17:31:48 +02:00
Philipp Nordhus
c05ff3456e Remove duplicate destructor
Destructor of base class GenericVector calls base class clear()
method, deallocating the memory.
2016-06-17 23:20:03 +02:00
Stefan Weil
e6c0d263db Add missing argument for tprintf
The format string expects an int arguments.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-03-17 09:30:25 +01:00
Stefan Weil
edf765b952 Remove unneeded const qualifiers
This fixes compiler warnings like this one:

api/baseapi.h:739:32: warning:
 type qualifiers ignored on function return type [-Wignored-qualifiers]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-05 06:36:42 +01:00
Stefan Weil
97d47a406d dict: Fix typos in comments and strings
All of them were found by codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-09-14 22:16:42 +02:00
Zdenko Podobný
66a76a9477 Revert "temporary add config/*, configure and Makefile.in for release"
This reverts commits ec9581d8f2, 1afe382c4e, 4b2cfabcc1
2015-07-31 21:44:43 +02:00
Jim O'Regan
524a61452d Doxygen
Squashed commit from https://github.com/tesseract-ocr/tesseract/tree/more-doxygen
closes #14

Commits:
6317305  doxygen
9f42f69  doxygen
0fc4d52  doxygen
37b4b55  fix typo
bded8f1  some more doxy
020eb00  slight tweak
524666d  doxygenify
2a36a3e  doxygenify
229d218  doxygenify
7fd28ae  doxygenify
a8c64bc  doxygenify
f5d21b6  fix
5d8ede8  doxygenify
a58a4e0  language_model.cpp
fa85709  lm_pain_points.cpp lm_state.cpp
6418da3  merge
06190ba  Merge branch 'old_doxygen_merge' into more-doxygen
84acf08  Merge branch 'master' into more-doxygen
50fe1ff  pagewalk.cpp cube_reco_context.cpp
2982583  change to relative
192a24a  applybox.cpp, take one
8eeb053  delete docs for obsolete params
52e4c77  modernise classify/ocrfeatures.cpp
2a1cba6  modernise cutil/emalloc.cpp
773e006  silence doxygen warning
aeb1731  silence doxygen warning
f18387f  silence doxygen; new params are unused?
15ad6bd  doxygenify cutil/efio.cpp
c8b5dad  doxygenify cutil/danerror.cpp
784450f  the globals and exceptions parts are obsolete; remove
8bca324  doxygen classify/normfeat.cpp
9bcbe16  doxygen classify/normmatch.cpp
aa9a971  doxygen ccmain/cube_control.cpp
c083ff2  doxygen ccmain/cube_reco_context.cpp
f842850  params changed
5c94f12  doxygen ccmain/cubeclassifier.cpp
15ba750  case sensitive
f5c71d4  case sensitive
f85655b  doxygen classify/intproto.cpp
4bbc7aa  partial doxygen classify/mfx.cpp
dbb6041  partial doxygen classify/intproto.cpp
2aa72db  finish doxygen classify/intproto.cpp
0b8de99  doxygen training/mftraining.cpp
0b5b35c  partial doxygen ccstruct/coutln.cpp
b81c766  partial doxygen ccstruct/coutln.cpp
40fc415  finished? doxygen ccstruct/coutln.cpp
6e4165c  doxygen classify/clusttool.cpp
0267dec  doxygen classify/cutoffs.cpp
7f0c70c  doxygen classify/fpoint.cpp
512f3bd  ignore ~ files
5668a52  doxygen classify/intmatcher.cpp
84788d4  doxygen classify/kdtree.cpp
29f36ca  doxygen classify/mfoutline.cpp
40b94b1  silence doxygen warnings
6c511b9  doxygen classify/mfx.cpp
f9b4080  doxygen classify/outfeat.cpp
aa1df05  doxygen classify/picofeat.cpp
cc5f466  doxygen training/cntraining.cpp
cce044f  doxygen training/commontraining.cpp
167e216  missing param
9498383  renamed params
37eeac2  renamed param
d87b5dd  case
c8ee174  renamed params
b858db8  typo
4c2a838  h2 context?
81a2c0c  fix some param names; add some missing params, no docs
bcf8a4c  add some missing params, no docs
af77f86  add some missing params, no docs; fix some param names
01df24e  fix some params
6161056  fix some params
68508b6  fix some params
285aeb6  doxygen complains here no matter what
529bcfa  rm some missing params, typos
cd21226  rm some missing params, add some new ones
48a4bc2  fix params
c844628  missing param
312ce37  missing param; rename one
ec2fdec  missing param
05e15e0  missing params
d515858  change "<" to &lt; to make doxygen happy
b476a28  wrong place
2015-07-20 18:48:00 +01:00
Zdenko Podobný
ec9581d8f2 temporary add configure and Makefile.in for release 2015-07-11 09:42:43 +02:00
Jim O'Regan
a94943cc1f remove unneeded comment from commit 2015-05-13 14:59:02 +01:00
oriahulrich@microvu.com
d3252f926e Issue 1316: The traineddata file must be closed after it was opened 2015-05-13 14:53:37 +01:00
Ray Smith
84920b92b3 Font and classifier output structure cleanup.
Font recognition was poor, due to forcing a 1st and 2nd choice at
a character level, when the total score for the correct font is often
correct at the word level, so allowed the propagation of a full set
of fonts and scores to the word recognizer, which can now decide word
level fonts using the scores instead of simple votes.

Change precipitated a cleanup of output data structures for classifier
results, eliminating ScoredClass and INT_RESULT_STRUCT, with a few
extra elements going in UnicharRating, and using that wherever possible.
That added the extra complexity of 1-rating due to a flip between 0 is
good and 0 is bad for the internal classifier scores before they are
converted to rating and certainty.
2015-05-12 17:24:34 -07:00
Ray Smith
3c21c14949 Fixed issue 1245 2014-08-13 18:51:28 -07:00
Ray Smith
736d327473 NOP changes from static analysis in issue 1205 2014-08-12 16:09:12 -07:00
zdenop
ee73e3b107 fix issue 123: user-words (and user-patterns) file specified by command line
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1093 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-04 21:11:00 +00:00
theraysmith@gmail.com
07ca24aeaf Removed upper limit on trie size, fixing issue 1020.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1044 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-03 19:18:23 +00:00
theraysmith@gmail.com
d11dc049e3 Fixed a lot of compiler/clang warnings
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1015 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-25 02:28:51 +00:00
theraysmith@gmail.com
60b4f8bc88 Fixed issue 743
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@978 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-10 18:25:46 +00:00
theraysmith@gmail.com
67f9af58b8 Removed dependence on IMAGE class
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@944 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-09 17:31:29 +00:00