242 Commits

Author SHA1 Message Date
Stefan Weil
3cccae69e5 Fix wrong format string
The local variable intval is of type int.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 09:06:02 +02:00
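The kind of mismatch fixed by 3cccae69e5 can be illustrated with a minimal sketch. The helper name `FormatIntValue` is hypothetical, and the actual call site is not shown in this log. The point is only that an `int` argument needs an `int` conversion specifier.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not taken from the commit.
// "%d" matches an int argument. Passing an int for a specifier such as
// "%ld" is undefined behaviour, which is what compilers warn about.
std::string FormatIntValue(int intval) {
  char buffer[32];
  std::snprintf(buffer, sizeof(buffer), "%d", intval);
  return std::string(buffer);
}
```

Calling `FormatIntValue(42)` yields the string `"42"`.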
Ray Smith
8e79297dce Final part of endian improvement. Adds big-endian support to lstm and fixes issue 518 2017-05-03 16:09:44 -07:00
Stefan Weil
46c887b77e genericvector: Fix minimum size
Commit 907de5995f tried to improve
GenericVector, but missed a case where vectors with fewer than
kDefaultVectorSize elements were allocated. This resulted in additional
alloc / free operations.

Commit a28b2a033d (before memory optimization)
oem 0: total heap usage: 739,238 allocs, 739,237 frees, 161,699,214 bytes allocated
oem 1: total heap usage: 690,182 allocs, 690,175 frees, 144,470,400 bytes allocated
oem 2: total heap usage: 728,213 allocs, 728,206 frees, 182,885,824 bytes allocated

Commit fd3f8f9b2d without genericvector change
oem 0: total heap usage: 738,980 allocs, 738,979 frees, 161,697,150 bytes allocated
oem 1: total heap usage: 690,182 allocs, 690,175 frees, 144,470,400 bytes allocated
oem 2: total heap usage: 728,213 allocs, 728,206 frees, 182,885,824 bytes allocated
=> Improvements for oem 0, no change for oem 1 and oem 2.

Commit fd3f8f9b2d
oem 0: total heap usage: 772,648 allocs, 772,647 frees, 160,083,901 bytes allocated
oem 1: total heap usage: 748,591 allocs, 748,584 frees, 143,581,672 bytes allocated
oem 2: total heap usage: 764,796 allocs, 764,789 frees, 181,212,197 bytes allocated
=> Fewer bytes allocated, but more allocs / frees = bad for performance.

Commit fd3f8f9b2d with this patch
oem 0: total heap usage: 677,537 allocs, 677,536 frees, 160,444,634 bytes allocated
oem 1: total heap usage: 653,812 allocs, 653,805 frees, 143,423,008 bytes allocated
oem 2: total heap usage: 670,029 allocs, 670,022 frees, 181,517,760 bytes allocated
=> Improvements for all three cases.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-03 09:49:23 +02:00
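The idea behind 46c887b77e, as described in its message, is to avoid allocating vectors smaller than a default size. A minimal sketch of such a clamp follows. The value of `kDefaultVectorSize` and the function `ClampedReserveSize` are assumptions made for illustration, not the code from the patch.

```cpp
#include <algorithm>
#include <cstddef>

// Assumed value for illustration only; the real constant is defined
// alongside GenericVector and may differ.
constexpr std::size_t kDefaultVectorSize = 4;

// Hypothetical stand-in for the capacity decision made before allocating.
// Raising small requests to the default size lets the first few appended
// elements share one buffer instead of forcing repeated reallocations.
std::size_t ClampedReserveSize(std::size_t requested) {
  return std::max(requested, kDefaultVectorSize);
}
```

With this clamp, a request for 1 element reserves room for 4, while a request for 10 is left unchanged.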
Stefan Weil
048cf9d06a Remove unused local variables
This fixes some compiler warnings.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-02 09:43:29 +02:00
zdenop
fd3f8f9b2d Merge pull request #352 from pnordhus/reduce_mallocs
Avoid unnecessary memory allocations
2017-04-30 17:39:31 +02:00
Stefan Weil
f8fba59804 Replace alloc_struct, free_struct
Both functions simply call malloc, free.

Also remove unneeded null pointer checks and use calloc where possible.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-04-30 09:25:04 +02:00
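The two simplifications named in f8fba59804 can be sketched as follows. The `Blob` struct and both function names are hypothetical; only the standard library behaviour is relied upon: `calloc` returns zero-filled storage, and `free` accepts a null pointer.

```cpp
#include <cstdlib>

// Hypothetical plain struct used only for this sketch.
struct Blob {
  int left;
  int right;
};

// calloc returns zero-initialized storage, so no separate memset is needed.
Blob* NewBlob() {
  return static_cast<Blob*>(std::calloc(1, sizeof(Blob)));
}

// free(nullptr) is defined to do nothing, so no null check is required.
void DeleteBlob(Blob* blob) {
  std::free(blob);
}
```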
Ray Smith
7a116ce8bb More formatting fixes from clang tidy 2017-04-28 13:38:32 -07:00
Ray Smith
1cc511188d Added extra Init that takes a memory buffer or a filereader function pointer to enable read of traineddata from memory or foreign file systems. Updated existing readers to use TFile API instead of FILE. This does not yet add big-endian capability to LSTM, but it is very easy from here. 2017-04-27 15:48:23 -07:00
Stefan Weil
8f8651b6ce Fix typo
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-04-15 17:27:56 +02:00
Stefan Weil
363f13157b ccutil: Remove unused variable
This fixes a compiler warning:

ccutil/scanutils.cpp:284:7: warning:
 variable 'sign' set but not used [-Wunused-but-set-variable]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-03-08 07:38:59 +01:00
Mikhail Solomennik
ba4b60374d Correct reading config files with \r\n 2017-03-01 14:41:17 +03:00
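The log does not say how ba4b60374d handles Windows line endings. One common approach, shown here as a sketch with a hypothetical helper name, is to strip any trailing `\r` and `\n` from each line before parsing it.

```cpp
#include <string>

// Hypothetical helper: removes trailing carriage returns and newlines so
// that "key value\r\n" and "key value\n" parse identically.
std::string StripLineEnding(std::string line) {
  while (!line.empty() && (line.back() == '\n' || line.back() == '\r')) {
    line.pop_back();
  }
  return line;
}
```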
Ray Smith
f566a45b30 clang-tidy changes from sync 2017-01-25 16:20:19 -08:00
Egor Pugin
9b604b1eb9 Fix possible warning when WIN32_LEAN_AND_MEAN is already defined. 2017-01-24 00:22:36 +03:00
amitdo
5d627aacae Remove code that is no longer needed
The code in ccutil/hashfn.h was needed for some old compilers. Now that we support MSVC >= 2010 and compilers that have good support for C++11, we can drop this code.

As a result of this file removal, we now use:
  std::unordered_map
  std::unordered_set
  std::unique_ptr
directly in the codebase with '#include' for the needed headers.
2017-01-16 01:49:17 +02:00
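Direct use of the three standard types listed in 5d627aacae might look like the sketch below. `WordStats` and `CountWords` are hypothetical names, and the code is restricted to C++11 features.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical aggregate used only for this sketch.
struct WordStats {
  std::unordered_map<std::string, int> counts;  // occurrences per word
  std::unordered_set<std::string> repeated;     // words seen more than once
};

// Counts words and records which ones repeat. The result is owned by a
// std::unique_ptr, so it is released automatically.
std::unique_ptr<WordStats> CountWords(const std::vector<std::string>& words) {
  std::unique_ptr<WordStats> stats(new WordStats);
  for (const std::string& word : words) {
    if (++stats->counts[word] > 1) stats->repeated.insert(word);
  }
  return stats;
}
```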
Egor Pugin
442b5b731a Fix building of training tools in shared configuration. 2016-12-17 16:19:35 +03:00
zdenop
da4c064c2e Merge pull request #531 from stweil/guards
Fix header file guards and replace reserved identifiers
2016-12-15 08:29:32 +01:00
Ray Smith
13e46ae1c4 Made LSTM the default engine, pushed cube out 2016-12-13 14:37:40 -08:00
Ray Smith
d55f462c9c More clang-tidy from previous commits 2016-12-06 13:45:49 -08:00
Stefan Weil
533399e335 Remove unused macro _TESS_FILE_BASENAME
This fixes a compiler warning from clang:

ccutil/platform.h:88:13: warning:
 macro name is a reserved identifier [-Wreserved-id-macro]
    #define _TESS_FILE_BASENAME_                                            \

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 15:43:03 +01:00
Stefan Weil
70c6f1624c Fix #define guards in header files
Some guards were missing, others were not the first statement.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 15:43:03 +01:00
Stefan Weil
4897796d57 Replace reserved identifiers used in #define guards of header files
Use macro names as suggested by the Google C++ Style Guide
(https://google.github.io/styleguide/cppguide.html#The__define_Guard).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 15:43:03 +01:00
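The guard style referred to in 4897796d57 and 70c6f1624c can be sketched for a hypothetical header `foo/bar/widget.h` in a project called `myproject`. The guard is the first statement in the file, and its name neither begins with an underscore followed by a capital letter nor contains a double underscore, both of which are reserved for the implementation.

```cpp
// Hypothetical header foo/bar/widget.h in a project named "myproject".
#ifndef MYPROJECT_FOO_BAR_WIDGET_H_
#define MYPROJECT_FOO_BAR_WIDGET_H_

// Hypothetical content, present only so the header declares something.
inline int WidgetVersion() { return 1; }

#endif  // MYPROJECT_FOO_BAR_WIDGET_H_
```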
Stefan Weil
cefc420ddb Remove extra semicolons after member function definitions
clang++ report:
api/baseapi.h:852:4: warning:
 extra ';' after member function definition [-Wextra-semi]
[...]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 14:54:52 +01:00
Ray Smith
ce76d1c569 Fixes to training process to allow incremental training from a recognition model 2016-11-30 15:51:17 -08:00
Ray Smith
53003f9074 Formatting changes from clang_tidy on latest pull 2016-11-30 15:44:25 -08:00
Stefan Weil
faea44cbc7 mingw-w64: Fix compiler warnings caused by macro redefinition
GNU compiler report (cross build for Windows on Debian):

In file included from ../ccutil/host.h:63:0,
                 from ../arch/dotproductsse.h:22,
                 from ../arch/dotproductsse.cpp:43:
../ccutil/platform.h:27:0: warning: "NOMINMAX" redefined
 #define NOMINMAX

In file included from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/c++/i686-w64-mingw32/bits/c++config.h:495:0,
                 from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/c++/cstdlib:41,
                 from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/c++/stdlib.h:36,
                 from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/mm_malloc.h:27,
                 from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/xmmintrin.h:34,
                 from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/emmintrin.h:31,
                 from ../arch/dotproductsse.cpp:40:
/usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/c++/i686-w64-mingw32/bits/os_defines.h:45:0:
 note: this is the location of the previous definition
 #define NOMINMAX 1

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-29 14:37:10 +01:00
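The log does not show how faea44cbc7 silences the redefinition warning. A common way to do so, given here as an assumption, is to define the macro only when no earlier header has defined it. `LargerOf` is a hypothetical function added so the sketch has something to call.

```cpp
// Define NOMINMAX only if a system header has not already done so,
// which avoids the "NOMINMAX redefined" warning.
#ifndef NOMINMAX
#define NOMINMAX
#endif

#include <algorithm>

// With NOMINMAX set, Windows headers do not define min/max macros, so
// std::max can be used without interference.
int LargerOf(int a, int b) { return std::max(a, b); }
```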
Stefan Weil
85e37798cb Simplify delete operations
It is not necessary to check for null pointers.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-24 17:59:13 +01:00
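The simplification in 85e37798cb relies on the language rule that `delete` applied to a null pointer has no effect. A minimal sketch, using the hypothetical types `Node` and `Owner`:

```cpp
// Hypothetical types used only for this sketch.
struct Node {
  int value;
};

class Owner {
 public:
  Owner() : node_(nullptr) {}
  Owner(const Owner&) = delete;
  Owner& operator=(const Owner&) = delete;
  // No "if (node_ != nullptr)" guard: deleting a null pointer is a no-op.
  ~Owner() { delete node_; }
  void Reset(int value) {
    delete node_;  // also safe on the first call, when node_ is null
    node_ = new Node{value};
  }
  bool has_node() const { return node_ != nullptr; }
  int value() const { return node_->value; }

 private:
  Node* node_;
};
```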
Egor Pugin
644469595c Fix windows build. 2016-11-24 17:32:23 +03:00
zdenop
64159c7fbb Merge pull request #177 from stweil/posix
Introduce POSIX data types
2016-11-24 14:25:47 +01:00
Ray Smith
0169969b6f Merge branch 'opt' of https://github.com/stweil/tesseract into stweil-opt
Testing before pulling.
2016-11-22 09:55:41 -08:00
Ray Smith
5913d7344f Added missing license headers 2016-11-18 15:53:11 -08:00
Stefan Weil
94be4be4be ccutil/ambigs: Optimize tesseract::UnicharIdArrayUtils::compare
The compare method is called very often, so even small improvements
are important.

The new code avoids one comparison in each loop iteration.
This results in smaller code (60 bytes for x86_64, gcc).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-12 19:21:57 +01:00
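One way to save a comparison per loop iteration when comparing terminator-delimited ID arrays, as 94be4be4be describes, is to test for inequality first and check for the terminator only afterwards. The sketch below is written under that assumption and is not copied from the commit. The type alias, the terminator value and the function name are all illustrative.

```cpp
typedef int UNICHAR_ID;                    // assumed alias for this sketch
const UNICHAR_ID INVALID_UNICHAR_ID = -1;  // assumed terminator value

// Compares two arrays that each end with INVALID_UNICHAR_ID.
// Returns -1, 0 or 1. A shorter array that is a prefix of the other
// sorts first.
int CompareTerminated(const UNICHAR_ID* a, const UNICHAR_ID* b) {
  for (;;) {
    const UNICHAR_ID va = *a++;
    const UNICHAR_ID vb = *b++;
    if (va != vb) {
      if (va == INVALID_UNICHAR_ID) return -1;
      if (vb == INVALID_UNICHAR_ID) return 1;
      return va < vb ? -1 : 1;
    }
    // Values are equal here, so one terminator test covers both arrays.
    if (va == INVALID_UNICHAR_ID) return 0;
  }
}
```

On the common path, where both elements are equal and not the terminator, each iteration performs only two comparisons.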
Ray Smith
c1c1e426b3 Added new LSTM-based neural network line recognizer 2016-11-07 15:38:07 -08:00
Ray Smith
a987e6d87c Major bug fixes to pango renderer and resolved issue of hash_map vs unordered_map 2016-11-07 11:35:45 -08:00
Ray Smith
2c837dffc3 Result of clang tidy on recent merge 2016-11-07 10:46:33 -08:00
Stefan Weil
49c5a5754f Introduce POSIX data types
POSIX provides portable data types for signed and unsigned integer values
of different size.

This patch maps those POSIX data types to the Tesseract specific types.
In a next step, the Tesseract data types can be eliminated by replacing
them with the POSIX data types.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-10-26 09:03:58 +02:00
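The mapping described in 49c5a5754f can be sketched as a set of typedefs onto the fixed-width types from `<stdint.h>`. The project-specific names below are modelled on the legacy naming and are an assumption. The log itself does not list them.

```cpp
#include <stdint.h>

// Assumed legacy names, each mapped onto a fixed-width standard type.
typedef int8_t inT8;
typedef uint8_t uinT8;
typedef int16_t inT16;
typedef uint16_t uinT16;
typedef int32_t inT32;
typedef uint32_t uinT32;
typedef int64_t inT64;
typedef uint64_t uinT64;

// The widths are now guaranteed on every platform that provides the types.
static_assert(sizeof(inT32) == 4, "inT32 must be exactly 32 bits wide");
static_assert(sizeof(uinT64) == 8, "uinT64 must be exactly 64 bits wide");
```

Once every use site refers to the standard names, the aliases can be deleted, which is the next step the commit message mentions.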
zdenop
da89ff9ece Merge pull request #447 from stweil/leak
Fix some memory leaks
2016-10-24 20:45:57 +02:00
Stefan Weil
53c572b47a ccutil/params: Fix memory leak for static variable global_params
It is possible to avoid the dynamic memory allocation here.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-10-24 20:20:24 +02:00
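Avoiding the dynamic allocation, as 53c572b47a describes, can be done with a function-local static object. The sketch below uses the hypothetical names `ParamsStore` and `GlobalParamsStore` and is not the code from the commit.

```cpp
#include <vector>

// Hypothetical container standing in for the real parameter storage.
struct ParamsStore {
  std::vector<int> int_params;
};

// A leaking variant would use "static ParamsStore* store = new ParamsStore;"
// and never delete it. A function-local static object is constructed on
// first use and destroyed at program exit, so nothing is leaked.
ParamsStore* GlobalParamsStore() {
  static ParamsStore store;
  return &store;
}
```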
Stefan Weil
a351dae29b ccutil/tessdatamanager: Fix resource leak
Coverity report:

CID 1340278 (#1 of 1): Resource leak (RESOURCE_LEAK)
11. leaked_storage: Variable output_file going out of scope leaks the storage it points to.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-10-24 16:00:57 +02:00
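The leak reported by Coverity in a351dae29b is the classic case of a file handle that is not closed on some exit path. A minimal sketch of closing the handle on every path follows. The function `WriteAll` is hypothetical and only reuses the variable name `output_file` from the report.

```cpp
#include <cstdio>

// Hypothetical helper: writes size bytes to path and returns true on
// success. The file is closed on both the success and the failure path.
bool WriteAll(const char* path, const char* data, std::size_t size) {
  std::FILE* output_file = std::fopen(path, "wb");
  if (output_file == nullptr) return false;  // nothing to close yet
  const bool written = std::fwrite(data, 1, size, output_file) == size;
  // fclose is the left operand, so it always runs, even if fwrite failed.
  return std::fclose(output_file) == 0 && written;
}
```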
Stefan Weil
1274874e90 ccutil: Fix and simplify implementation of variadic macro
The implementation for MS C did not pass the variable arguments to
tprintf.

The standard is supported since C99 / C++11, so one implementation
is sufficient.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-08-29 08:15:00 +02:00
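A single standard variadic macro, as 1274874e90 describes, forwards all arguments through `__VA_ARGS__`. In the sketch below, `FormatLog` and `LOG_MESSAGE` are hypothetical stand-ins for the real logging function and macro.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical sink standing in for the real logging function. It returns
// the formatted text so the result can be inspected.
std::string FormatLog(const char* format, ...) {
  char buffer[256];
  va_list args;
  va_start(args, format);
  std::vsnprintf(buffer, sizeof(buffer), format, args);
  va_end(args);
  return std::string(buffer);
}

// One definition for all compilers: C99 and C++11 both support
// __VA_ARGS__, so every argument is passed through unchanged.
#define LOG_MESSAGE(...) FormatLog(__VA_ARGS__)
```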
Philipp Nordhus
907de5995f Do not allocate in GenericVector default ctor 2016-06-17 22:38:41 +02:00
Marco Atzeri
b1c921b59e Fix Cygwin compatibility 2016-06-17 15:52:01 +03:00
Heiko Oberdiek
dec38db7ce Fix for constant kMaxDoubleSize (from 15 to 16),
which is used by method STRING::add_str_double.
2016-05-25 16:26:41 +02:00
Michael McConville
eb00574c4a Remove conditional definition of off_t
As pointed out by Stefan Weil, conditionally defining off_t using a
macro isn't a valid approach. off_t does not have a fixed size and is
used in ABI definitions (e.g. syscalls), so silently guessing its size
risks breaking the build. Additionally, all sane and modern platforms
will have off_t.
2016-04-13 15:15:56 -04:00
zdenop
6f6953a972 Merge pull request #180 from stweil/master
Remove unneeded definition for NULL
2016-01-05 17:22:57 +01:00
Zdenko Podobný
1db94823a9 Add info for progress monitor, make it visible in doxygen doc; remove commented code 2016-01-05 17:21:53 +01:00
zdenop
c53add706e Merge pull request #27 from tesseract-ocr/monitor
Monitor
2016-01-05 16:28:42 +01:00
Stefan Weil
7334572c4c Remove unneeded definition for NULL
NULL is already defined in stddef.h,
so a local definition is not needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-12-25 12:25:54 +01:00
Stefan Weil
450efa68cd Get tessdata prefix from executable path (only for Windows)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-12-11 10:06:21 +01:00
Stefan Weil
4fdf272ffa Remove checks for this == NULL
This fixes warnings from clang.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-07 13:09:53 +01:00
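Calling a member function through a null pointer is undefined behaviour, so a check for `this == NULL` inside the member can be optimized away, which is why clang warns about it. The log does not show what replaced the checks in 4fdf272ffa. One option, sketched here with hypothetical names, is to test the pointer at the call site.

```cpp
// Hypothetical type used only for this sketch.
struct Word {
  int length;
  // No "if (this == NULL)" test here: a valid call always has a
  // non-null this pointer.
  int Length() const { return length; }
};

// The null test is done by the caller, before the member call.
int SafeLength(const Word* word) {
  return word == nullptr ? 0 : word->Length();
}
```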
Stefan Weil
4a92ff5862 Fix compiler warnings for copy constructors
gcc reports these warnings with -Wextra:

ccstruct/pageres.h:330:3: warning:
 base class 'class ELIST_LINK' should be explicitly initialized
 in the copy constructor [-Wextra]
ccstruct/ratngs.cpp:115:1: warning:
 base class 'class ELIST_LINK' should be explicitly initialized
 in the copy constructor [-Wextra]
ccstruct/ratngs.h:291:3: warning:
 base class 'class ELIST_LINK' should be explicitly initialized
 in the copy constructor [-Wextra]
ccutil/genericvector.h:435:3: warning:
 base class 'class GenericVector<WERD_RES*>' should be explicitly initialized
 in the copy constructor [-Wextra]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-05 09:19:37 +01:00
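The gcc warning quoted in 4a92ff5862 disappears once the copy constructor names the base class in its initializer list. The sketch below uses the hypothetical classes `Link` and `Item` in place of the real list classes.

```cpp
// Hypothetical base class: a copy starts out unlinked.
class Link {
 public:
  Link() : next_(nullptr) {}
  Link(const Link&) : next_(nullptr) {}
  const Link* next() const { return next_; }

 private:
  Link* next_;
};

class Item : public Link {
 public:
  explicit Item(int value) : Link(), value_(value) {}
  // Naming Link(other) explicitly documents which base constructor runs
  // and silences the -Wextra warning.
  Item(const Item& other) : Link(other), value_(other.value_) {}
  int value() const { return value_; }

 private:
  int value_;
};
```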