Commit Graph

256 Commits

Author SHA1 Message Date
Stefan Weil
079d6b9161 Improve robustness of TessdataManager
Tesseract crashes with an unhandled exception (std::bad_alloc) if it gets
a bad tessdata file where the numEntries data field is very large (also
after swapping), for example 0x77777777.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-14 21:33:56 +02:00
Stefan Weil
db8750e94e Remove unused method TessdataManager::LoadFileLater
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-13 13:14:47 +02:00
Stefan Weil
65b839e1aa Remove unused method TessdataManager::OverwriteEntry
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-13 13:14:47 +02:00
zdenop
6bebe71749 Merge pull request #910 from stweil/opt
Fix GenericVector and optimize some code which used GenericVector::init_to_size
2017-05-13 12:53:40 +02:00
Stefan Weil
69296f8d18 Clean method UNICHARSET::add_script
It increased the script_table too early, so the last element was never
used.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-13 11:53:43 +02:00
Stefan Weil
3a67ff930e Optimize code by replacing init_to_size with resize_no_init
There is no need to initialize memory with a fixed value which is
overwritten in the next step.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-12 14:34:55 +02:00
Stefan Weil
bb2348bbbe genericvector: Fix and optimize function LoadDataFromFile
It's not necessary to initialize the vector with 0,
because the initial values are read from file.

Fix also an assertion when trying to read an empty file.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-12 14:15:54 +02:00
Stefan Weil
80f51c3758 ccutil: Remove unneeded include statement
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-12 14:11:21 +02:00
Raf Schietekat
c335508e84 Fewer g++ -Wsign-compare warnings 2017-05-11 23:14:52 +02:00
Stefan Weil
7831a35dbb ccutil: Simplify code (removes type cast)
There is no need for an intermediate variable char_buffer.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 20:10:17 +02:00
Stefan Weil
9266f01857 Remove macros which are no longer needed
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 19:32:51 +02:00
Stefan Weil
f2252fdadc Introduce standard macros for format specifiers
There exist standard macro definitions for the printf format specifiers.
MS Visual Studio does not support that standard (at least not in older
versions), so local definitions are needed there.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 19:30:49 +02:00
zdenop
64994a2707 Merge pull request #900 from rfschtkt/cast
Reviewed uses of reinterpret_cast
2017-05-11 16:08:12 +02:00
Stefan Weil
3cccae69e5 Fix wrong format string
The local variable intval is of type int.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 09:06:02 +02:00
Raf Schietekat
3983d2f76a Reviewed uses of reinterpret_cast 2017-05-11 01:58:40 +02:00
Ray Smith
8e79297dce Final part of endian improvement. Adds big-endian support to lstm and fixes issue 518 2017-05-03 16:09:44 -07:00
Stefan Weil
46c887b77e genericvector: Fix minimum size
Commit 907de5995f tried to improve
GenericVector, but missed a case where vectors with less than
kDefaultVectorSize were allocated. This resulted in additional
alloc / free operations.

Commit a28b2a033d (before memory optimization)
oem 0: total heap usage: 739,238 allocs, 739,237 frees, 161,699,214 bytes allocated
oem 1: total heap usage: 690,182 allocs, 690,175 frees, 144,470,400 bytes allocated
oem 2: total heap usage: 728,213 allocs, 728,206 frees, 182,885,824 bytes allocated

Commit fd3f8f9b2d without genericvector change
oem 0: total heap usage: 738,980 allocs, 738,979 frees, 161,697,150 bytes allocated
oem 1: total heap usage: 690,182 allocs, 690,175 frees, 144,470,400 bytes allocated
oem 2: total heap usage: 728,213 allocs, 728,206 frees, 182,885,824 bytes allocated
=> Improvements for oem 0, no change for oem 1 and oem 2.

Commit fd3f8f9b2d
oem 0: total heap usage: 772,648 allocs, 772,647 frees, 160,083,901 bytes allocated
oem 1: total heap usage: 748,591 allocs, 748,584 frees, 143,581,672 bytes allocated
oem 2: total heap usage: 764,796 allocs, 764,789 frees, 181,212,197 bytes allocated
=> Less bytes allocated, but more allocs / frees = bad for performance.

Commit fd3f8f9b2d with this patch
oem 0: total heap usage: 677,537 allocs, 677,536 frees, 160,444,634 bytes allocated
oem 1: total heap usage: 653,812 allocs, 653,805 frees, 143,423,008 bytes allocated
oem 2: total heap usage: 670,029 allocs, 670,022 frees, 181,517,760 bytes allocated
=> Improvements for all three cases.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-03 09:49:23 +02:00
Stefan Weil
048cf9d06a Remove unused local variables
This fixes some compiler warnings.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-02 09:43:29 +02:00
zdenop
fd3f8f9b2d Merge pull request #352 from pnordhus/reduce_mallocs
Avoid unnecessary memory allocations
2017-04-30 17:39:31 +02:00
Stefan Weil
f8fba59804 Replace alloc_struct, free_struct
Both functions simply call malloc, free.

Remove also unneeded null pointer checks and use calloc where possible.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-04-30 09:25:04 +02:00
Ray Smith
7a116ce8bb More formatting fixes from clang tidy 2017-04-28 13:38:32 -07:00
Ray Smith
1cc511188d Added extra Init that takes a memory buffer or a filereader function pointer to enable read of traineddata from memory or foreign file systems. Updated existing readers to use TFile API instead of FILE. This does not yet add big-endian capability to LSTM, but it is very easy from here. 2017-04-27 15:48:23 -07:00
Stefan Weil
8f8651b6ce Fix typo
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-04-15 17:27:56 +02:00
Stefan Weil
363f13157b ccutil: Remove unused variable
This fixes a compiler warning:

ccutil/scanutils.cpp:284:7: warning:
 variable 'sign' set but not used [-Wunused-but-set-variable]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-03-08 07:38:59 +01:00
Mikhail Solomennik
ba4b60374d Correct reading config files with \r\n 2017-03-01 14:41:17 +03:00
Ray Smith
f566a45b30 clang-tidy changes from sync 2017-01-25 16:20:19 -08:00
Egor Pugin
9b604b1eb9 Fix possible warning when WIN32_LEAN_AND_MEAN is already defined. 2017-01-24 00:22:36 +03:00
amitdo
5d627aacae Remove code that is no longer needed
The code in ccutil/hashfn.h was needed for some old compilers. Now that we support MSVC >= 2010 and compilers that has good support for C++11, we can drop this code.

As a result of this file removal, we now use:
  std::unordered_map
  std::unordered_set
  std::unique_ptr
directly in the codebase with '#include' for the needed headers.
2017-01-16 01:49:17 +02:00
Egor Pugin
442b5b731a Fix building of training tools in shared configuration. 2016-12-17 16:19:35 +03:00
zdenop
da4c064c2e Merge pull request #531 from stweil/guards
Fix header file guards and replace reserved identifiers
2016-12-15 08:29:32 +01:00
Ray Smith
13e46ae1c4 Made LSTM the default engine, pushed cube out 2016-12-13 14:37:40 -08:00
Ray Smith
d55f462c9c More clang-tidy from previous commits 2016-12-06 13:45:49 -08:00
Stefan Weil
533399e335 Remove unused macro _TESS_FILE_BASENAME
This fixes a compiler warning from clang:

ccutil/platform.h:88:13: warning:
 macro name is a reserved identifier [-Wreserved-id-macro]
    #define _TESS_FILE_BASENAME_                                            \

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 15:43:03 +01:00
Stefan Weil
70c6f1624c Fix #define guards in header files
Some guards were missing, others were not the first statement.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 15:43:03 +01:00
Stefan Weil
4897796d57 Replace reserved identifiers used in #define guards header files
Use macro names as suggested by the Google C++ Style Guide
(https://google.github.io/styleguide/cppguide.html#The__define_Guard).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 15:43:03 +01:00
Stefan Weil
cefc420ddb Remove extra semicolons after member function definitions
clang++ report:
api/baseapi.h:852:4: warning:
 extra ';' after member function definition [-Wextra-semi]
[...]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 14:54:52 +01:00
Ray Smith
ce76d1c569 Fixes to training process to allow incremental training from a recognition model 2016-11-30 15:51:17 -08:00
Ray Smith
53003f9074 Formatting changes from clang_tidy on latest pull 2016-11-30 15:44:25 -08:00
Stefan Weil
faea44cbc7 mingw-w64: Fix compiler warnings caused by macro redefinition
GNU compiler report (cross build for Windows on Debian):

In file included from ../ccutil/host.h:63:0,
                 from ../arch/dotproductsse.h:22,
                 from ../arch/dotproductsse.cpp:43:
../ccutil/platform.h:27:0: warning: "NOMINMAX" redefined
 #define NOMINMAX

In file included from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/c++/i686-w64-mingw32/bits/c++config.h:495:0,
                 from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/c++/cstdlib:41,
                 from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/c++/stdlib.h:36,
                 from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/mm_malloc.h:27,
                 from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/xmmintrin.h:34,
                 from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/emmintrin.h:31,
                 from ../arch/dotproductsse.cpp:40:
/usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/c++/i686-w64-mingw32/bits/os_defines.h:45:0:
 note: this is the location of the previous definition
 #define NOMINMAX 1

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-29 14:37:10 +01:00
Stefan Weil
85e37798cb Simplify delete operations
It is not necessary to check for null pointers.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-24 17:59:13 +01:00
Egor Pugin
644469595c Fix windows build. 2016-11-24 17:32:23 +03:00
zdenop
64159c7fbb Merge pull request #177 from stweil/posix
Introduce POSIX data types
2016-11-24 14:25:47 +01:00
Ray Smith
0169969b6f Merge branch 'opt' of https://github.com/stweil/tesseract into stweil-opt
Testing before pulling.
2016-11-22 09:55:41 -08:00
Ray Smith
5913d7344f Added missing license headers 2016-11-18 15:53:11 -08:00
Stefan Weil
94be4be4be ccutil/ambigs: Optimize tesseract::UnicharIdArrayUtils::compare
The compare method is called very often, so even small improvements
are important.

The new code avoids one comparison in each loop iteration.
This results in smaller code (60 bytes for x86_64, gcc).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-12 19:21:57 +01:00
Ray Smith
c1c1e426b3 Added new LSTM-based neural network line recognizer 2016-11-07 15:38:07 -08:00
Ray Smith
a987e6d87c Major bug fixes to pango renderer and resolved issue of hash_map vs unordered_map 2016-11-07 11:35:45 -08:00
Ray Smith
2c837dffc3 Result of clang tidy on recent merge 2016-11-07 10:46:33 -08:00
Stefan Weil
49c5a5754f Introduce POSIX data types
POSIX provides portable data types for signed and unsigned integer values
of different size.

This patch maps those POSIX data types to the Tesseract specific types.
In a next step, the Tesseract data types can be eliminated by replacing
them with the POSIX data types.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-10-26 09:03:58 +02:00
zdenop
da89ff9ece Merge pull request #447 from stweil/leak
Fix some memory leaks
2016-10-24 20:45:57 +02:00