Ray Smith
df41eab6aa
Added script-specific validation and normalization for virama-using scripts and updated normalization for others
2017-07-14 10:05:05 -07:00
Ray Smith
da03e4e910
Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion
2017-07-14 09:30:14 -07:00
Justin Hotchkiss Palermo
f057938069
fix filenames in comments
2017-07-02 17:35:47 -04:00
Stefan Weil
5f8ecdb2b3
Remove local implementation of strtok_r
MS Visual Studio does not provide that function, but can use strtok_s
which does exactly the same.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-05 19:52:25 +02:00
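The replacement described in commit 5f8ecdb2b3 can be sketched as follows. This is a minimal illustration, not the code from the commit: the macro name portable_strtok and the helper SplitWords are hypothetical, and it assumes either MSVC (strtok_s) or a POSIX C library (strtok_r).

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical wrapper name, chosen for this sketch only.
// MSVC offers strtok_s with the same three-argument shape as POSIX strtok_r.
#if defined(_MSC_VER)
#define portable_strtok(str, delim, ctx) strtok_s(str, delim, ctx)
#else
#define portable_strtok(str, delim, ctx) strtok_r(str, delim, ctx)
#endif

// Splits text on spaces and tabs. The string is taken by value because the
// tokenizer writes terminators into the buffer it scans.
std::vector<std::string> SplitWords(std::string text) {
  std::vector<std::string> words;
  char* context = nullptr;  // per-call state, which makes the call reentrant
  for (char* token = portable_strtok(&text[0], " \t", &context);
       token != nullptr;
       token = portable_strtok(nullptr, " \t", &context)) {
    words.emplace_back(token);
  }
  return words;
}
```

Because both functions keep their state in the caller-supplied context pointer, no local reimplementation is needed on either platform.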
Stefan Weil
fb863c97a9
UNICHARSET: Add missing initialization
The member variable default_sid_ was used without being initialized.
Valgrind report for `tesseract --oem 1 hello.png hello`:
Conditional jump or move depends on uninitialised value(s)
at 0x14352E: BITS16::set_bit(unsigned char, unsigned char) (bits16.h:50)
by 0x143E27: WERD::set_flag(WERD_FLAGS, unsigned char) (werd.h:129)
by 0x27D053: WERD_RES::SetupWordScript(UNICHARSET const&) (pageres.cpp:381)
by 0x27CAFD: WERD_RES::SetupForRecognition(UNICHARSET const&, tesseract::Tesseract*, Pix*, int, TBOX const*, bool, bool, bool, ROW*, BLOCK const*) (pageres.cpp:316)
by 0x145903: tesseract::Tesseract::SetupWordPassN(int, tesseract::WordData*) (control.cpp:182)
by 0x145780: tesseract::Tesseract::SetupAllWordsPassN(int, TBOX const*, char const*, PAGE_RES*, GenericVector<tesseract::WordData>*) (control.cpp:168)
by 0x146293: tesseract::Tesseract::recog_all_words(PAGE_RES*, ETEXT_DESC*, TBOX const*, char const*, int) (control.cpp:336)
by 0x12F356: tesseract::TessBaseAPI::Recognize(ETEXT_DESC*) (baseapi.cpp:878)
by 0x13036D: tesseract::TessBaseAPI::ProcessPage(Pix*, int, char const*, char const*, int, tesseract::TessResultRenderer*) (baseapi.cpp:1184)
by 0x13014A: tesseract::TessBaseAPI::ProcessPagesInternal(char const*, char const*, int, tesseract::TessResultRenderer*) (baseapi.cpp:1140)
by 0x12FBCE: tesseract::TessBaseAPI::ProcessPages(char const*, char const*, int, tesseract::TessResultRenderer*) (baseapi.cpp:1040)
by 0x12C3DF: main (tesseractmain.cpp:515)
Uninitialised value was created by a heap allocation
at 0x4C2C21F: operator new(unsigned long) (vg_replace_malloc.c:334)
by 0x12D88B: tesseract::TessBaseAPI::Init(char const*, int, char const*, tesseract::OcrEngineMode, char**, int, GenericVector<STRING> const*, GenericVector<STRING> const*, bool, bool (*)(STRING const&, GenericVector<char>*)) (baseapi.cpp:320)
by 0x12D6DA: tesseract::TessBaseAPI::Init(char const*, char const*, tesseract::OcrEngineMode, char**, int, GenericVector<STRING> const*, GenericVector<STRING> const*, bool) (baseapi.cpp:284)
by 0x12C088: main (tesseractmain.cpp:440)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-19 20:57:39 +02:00
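The kind of fix described in commit fb863c97a9 can be illustrated with a small stand-in class. Only the member name default_sid_ comes from the commit message; the class name ScriptIds, the accessors and the initial value 0 are assumptions made for this sketch.

```cpp
// Hypothetical stand-in for a class that owns a script id member.
class ScriptIds {
 public:
  // The constructor gives default_sid_ a defined value, so later reads
  // never depend on whatever happened to be in the heap allocation.
  ScriptIds() : default_sid_(0) {}  // 0 is an assumed initial value

  void set_default_sid(int sid) { default_sid_ = sid; }
  int default_sid() const { return default_sid_; }

 private:
  int default_sid_;
};
```

With the initializer in place, a read that happens before any setter call returns a defined value, which is the condition Valgrind reported as violated.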
Stefan Weil
e05f4c677d
Remove obsolete comments and unused code from ccutil/host.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-17 11:55:00 +02:00
Stefan Weil
3a6a8d70fc
Replace Standard C library header files by C++ header files
Replacing inttypes.h by cinttypes fixes a problem with glibc < 2.18:
In older inttypes.h, the standard C format macros are only defined for
C++ when the macro __STDC_FORMAT_MACROS is set.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-17 11:49:43 +02:00
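A short sketch of what commit 3a6a8d70fc relies on: including cinttypes makes the standard format macros such as PRId64 available to C++ code without defining __STDC_FORMAT_MACROS first. The helper name FormatInt64 is hypothetical.

```cpp
#include <cinttypes>  // C++ header: PRId64 and friends, no extra macro needed
#include <cstdio>
#include <string>

// Formats a 64-bit integer portably. PRId64 expands to the right length
// modifier for the platform (for example "ld" or "lld").
std::string FormatInt64(std::int64_t value) {
  char buffer[32];  // large enough for any 64-bit decimal value plus sign
  std::snprintf(buffer, sizeof(buffer), "%" PRId64, value);
  return buffer;
}
```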
Stefan Weil
0ba202f6ed
Remove unneeded null pointer check
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-16 22:58:10 +02:00
Stefan Weil
46ca83071e
genericvector: Add overloaded LoadDataFromFile
Several code locations call that method with a normal C string,
so overload it to accept that without a conversion to a STRING
object. This saves unneeded new / memcpy / delete operations.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-16 22:57:46 +02:00
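The overload pattern from commit 46ca83071e might look roughly like this. It is a sketch using std::string and std::vector<char> in place of Tesseract's STRING and GenericVector<char>, so the signatures are assumptions and not the library's API.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Primary overload: takes a plain C string, so callers that already hold a
// const char* do not pay for a temporary string object.
bool LoadDataFromFile(const char* filename, std::vector<char>* data) {
  std::FILE* fp = std::fopen(filename, "rb");
  if (fp == nullptr) return false;
  std::fseek(fp, 0, SEEK_END);
  long size = std::ftell(fp);
  std::fseek(fp, 0, SEEK_SET);
  bool ok = size >= 0;
  if (ok) {
    data->resize(static_cast<std::size_t>(size));
    // An empty file is valid: nothing to read, and no read is attempted.
    ok = size == 0 ||
         std::fread(data->data(), 1, data->size(), fp) == data->size();
  }
  std::fclose(fp);
  return ok;
}

// Convenience overload: forwards to the C string version.
inline bool LoadDataFromFile(const std::string& filename,
                             std::vector<char>* data) {
  return LoadDataFromFile(filename.c_str(), data);
}
```

A string literal binds to the const char* overload directly, which is where the saved new / memcpy / delete operations come from.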
Stefan Weil
079d6b9161
Improve robustness of TessdataManager
Tesseract crashes with an unhandled exception (std::bad_alloc) if it gets
a bad tessdata file where the numEntries data field is very large (also
after swapping), for example 0x77777777.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-14 21:33:56 +02:00
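One way to guard against the crash described in commit 079d6b9161 is to range-check the entry count before allocating anything. The sketch below is an assumption about the general approach, not the actual TessdataManager code; kMaxEntries, ReverseBytes32 and ReadEntryCount are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical upper bound on the number of entries in a data file.
const std::uint32_t kMaxEntries = 64;

std::uint32_t ReverseBytes32(std::uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
         ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Reads the entry count from the start of a file header. A count that is
// out of range is retried with swapped byte order; if it is still out of
// range, the file is rejected instead of being used as an allocation size.
bool ReadEntryCount(const unsigned char* header, std::size_t size,
                    std::uint32_t* num_entries, bool* swapped) {
  std::uint32_t count;
  if (size < sizeof(count)) return false;
  std::memcpy(&count, header, sizeof(count));
  *swapped = count > kMaxEntries;
  if (*swapped) count = ReverseBytes32(count);
  if (count > kMaxEntries) return false;  // e.g. 0x77777777 in either order
  *num_entries = count;
  return true;
}
```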
Stefan Weil
db8750e94e
Remove unused method TessdataManager::LoadFileLater
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-13 13:14:47 +02:00
Stefan Weil
65b839e1aa
Remove unused method TessdataManager::OverwriteEntry
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-13 13:14:47 +02:00
zdenop
6bebe71749
Merge pull request #910 from stweil/opt
Fix GenericVector and optimize some code which used GenericVector::init_to_size
2017-05-13 12:53:40 +02:00
Stefan Weil
69296f8d18
Clean method UNICHARSET::add_script
It increased the script_table too early, so the last element was never
used.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-13 11:53:43 +02:00
Stefan Weil
3a67ff930e
Optimize code by replacing init_to_size with resize_no_init
There is no need to initialize memory with a fixed value which is
overwritten in the next step.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-12 14:34:55 +02:00
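The idea behind commit 3a67ff930e, skipping an initialization pass when every element is overwritten immediately, can be shown with standard C++ only. GenericVector's init_to_size and resize_no_init are not used here; the function MakeSquares is a hypothetical example.

```cpp
#include <cstddef>
#include <memory>

// Allocates n ints without a zero-fill pass and then writes every element.
// "new int[n]" default-initializes (leaves the values indeterminate), while
// "new int[n]()" would first set them all to zero.
std::unique_ptr<int[]> MakeSquares(std::size_t n) {
  std::unique_ptr<int[]> values(new int[n]);
  for (std::size_t i = 0; i < n; ++i) {
    values[i] = static_cast<int>(i * i);  // each slot is written before use
  }
  return values;
}
```

The optimization is only safe when no element is read before it has been written, as in the loop above.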
Stefan Weil
bb2348bbbe
genericvector: Fix and optimize function LoadDataFromFile
It's not necessary to initialize the vector with 0,
because the initial values are read from file.
Fix also an assertion when trying to read an empty file.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-12 14:15:54 +02:00
Stefan Weil
80f51c3758
ccutil: Remove unneeded include statement
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-12 14:11:21 +02:00
Raf Schietekat
c335508e84
Fewer g++ -Wsign-compare warnings
2017-05-11 23:14:52 +02:00
Stefan Weil
7831a35dbb
ccutil: Simplify code (removes type cast)
There is no need for an intermediate variable char_buffer.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 20:10:17 +02:00
Stefan Weil
9266f01857
Remove macros which are no longer needed
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 19:32:51 +02:00
Stefan Weil
f2252fdadc
Introduce standard macros for format specifiers
There exist standard macro definitions for the printf format specifiers.
MS Visual Studio does not support that standard (at least not in older
versions), so local definitions are needed there.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 19:30:49 +02:00
zdenop
64994a2707
Merge pull request #900 from rfschtkt/cast
Reviewed uses of reinterpret_cast
2017-05-11 16:08:12 +02:00
Stefan Weil
3cccae69e5
Fix wrong format string
The local variable intval is of type int.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 09:06:02 +02:00
Raf Schietekat
3983d2f76a
Reviewed uses of reinterpret_cast
2017-05-11 01:58:40 +02:00
Ray Smith
8e79297dce
Final part of endian improvement. Adds big-endian support to lstm and fixes issue 518
2017-05-03 16:09:44 -07:00
Stefan Weil
46c887b77e
genericvector: Fix minimum size
Commit 907de5995f tried to improve GenericVector, but missed a case
where vectors with less than kDefaultVectorSize were allocated. This
resulted in additional alloc / free operations.
Commit a28b2a033d (before memory optimization)
oem 0: total heap usage: 739,238 allocs, 739,237 frees, 161,699,214 bytes allocated
oem 1: total heap usage: 690,182 allocs, 690,175 frees, 144,470,400 bytes allocated
oem 2: total heap usage: 728,213 allocs, 728,206 frees, 182,885,824 bytes allocated
Commit fd3f8f9b2d without genericvector change
oem 0: total heap usage: 738,980 allocs, 738,979 frees, 161,697,150 bytes allocated
oem 1: total heap usage: 690,182 allocs, 690,175 frees, 144,470,400 bytes allocated
oem 2: total heap usage: 728,213 allocs, 728,206 frees, 182,885,824 bytes allocated
=> Improvements for oem 0, no change for oem 1 and oem 2.
Commit fd3f8f9b2d
oem 0: total heap usage: 772,648 allocs, 772,647 frees, 160,083,901 bytes allocated
oem 1: total heap usage: 748,591 allocs, 748,584 frees, 143,581,672 bytes allocated
oem 2: total heap usage: 764,796 allocs, 764,789 frees, 181,212,197 bytes allocated
=> Fewer bytes allocated, but more allocs / frees = bad for performance.
Commit fd3f8f9b2d with this patch
oem 0: total heap usage: 677,537 allocs, 677,536 frees, 160,444,634 bytes allocated
oem 1: total heap usage: 653,812 allocs, 653,805 frees, 143,423,008 bytes allocated
oem 2: total heap usage: 670,029 allocs, 670,022 frees, 181,517,760 bytes allocated
=> Improvements for all three cases.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-03 09:49:23 +02:00
Stefan Weil
048cf9d06a
Remove unused local variables
This fixes some compiler warnings.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-02 09:43:29 +02:00
zdenop
fd3f8f9b2d
Merge pull request #352 from pnordhus/reduce_mallocs
Avoid unnecessary memory allocations
2017-04-30 17:39:31 +02:00
Stefan Weil
f8fba59804
Replace alloc_struct, free_struct
Both functions simply call malloc, free.
Remove also unneeded null pointer checks and use calloc where possible.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-04-30 09:25:04 +02:00
Ray Smith
7a116ce8bb
More formatting fixes from clang tidy
2017-04-28 13:38:32 -07:00
Ray Smith
1cc511188d
Added extra Init that takes a memory buffer or a filereader function pointer to enable read of traineddata from memory or foreign file systems. Updated existing readers to use TFile API instead of FILE. This does not yet add big-endian capability to LSTM, but it is very easy from here.
2017-04-27 15:48:23 -07:00
Stefan Weil
8f8651b6ce
Fix typo
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-04-15 17:27:56 +02:00
Stefan Weil
363f13157b
ccutil: Remove unused variable
This fixes a compiler warning:
ccutil/scanutils.cpp:284:7: warning:
variable 'sign' set but not used [-Wunused-but-set-variable]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-03-08 07:38:59 +01:00
Mikhail Solomennik
ba4b60374d
Correct reading config files with \r\n
2017-03-01 14:41:17 +03:00
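Commit ba4b60374d does not say how the \r\n handling was corrected. A common approach, shown here purely as an assumption, is to strip any trailing carriage return and line feed from each line before parsing it. StripLineEnding is a hypothetical helper name.

```cpp
#include <string>

// Removes trailing '\n' and '\r' characters so that a line read from a
// file with Windows line endings parses the same as one with Unix endings.
void StripLineEnding(std::string* line) {
  while (!line->empty() &&
         (line->back() == '\n' || line->back() == '\r')) {
    line->pop_back();
  }
}
```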
Ray Smith
f566a45b30
clang-tidy changes from sync
2017-01-25 16:20:19 -08:00
Egor Pugin
9b604b1eb9
Fix possible warning when WIN32_LEAN_AND_MEAN is already defined.
2017-01-24 00:22:36 +03:00
amitdo
5d627aacae
Remove code that is no longer needed
The code in ccutil/hashfn.h was needed for some old compilers. Now that we support MSVC >= 2010 and compilers that have good support for C++11, we can drop this code.
As a result of this file removal, we now use:
std::unordered_map
std::unordered_set
std::unique_ptr
directly in the codebase with '#include' for the needed headers.
2017-01-16 01:49:17 +02:00
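Using the three standard types directly, as commit 5d627aacae describes, only requires the matching standard headers. The functions below are hypothetical examples and are not taken from the codebase.

```cpp
#include <cstddef>
#include <memory>         // std::unique_ptr
#include <string>
#include <unordered_map>  // std::unordered_map
#include <unordered_set>  // std::unordered_set
#include <vector>

// Counts how often each word occurs; the map is owned by a unique_ptr.
std::unique_ptr<std::unordered_map<std::string, int>> CountWords(
    const std::vector<std::string>& words) {
  std::unique_ptr<std::unordered_map<std::string, int>> counts(
      new std::unordered_map<std::string, int>());
  for (const std::string& word : words) ++(*counts)[word];
  return counts;
}

// Returns the number of distinct words.
std::size_t CountDistinct(const std::vector<std::string>& words) {
  std::unordered_set<std::string> distinct(words.begin(), words.end());
  return distinct.size();
}
```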
Egor Pugin
442b5b731a
Fix building of training tools in shared configuration.
2016-12-17 16:19:35 +03:00
zdenop
da4c064c2e
Merge pull request #531 from stweil/guards
Fix header file guards and replace reserved identifiers
2016-12-15 08:29:32 +01:00
Ray Smith
13e46ae1c4
Made LSTM the default engine, pushed cube out
2016-12-13 14:37:40 -08:00
Ray Smith
d55f462c9c
More clang-tidy from previous commits
2016-12-06 13:45:49 -08:00
Stefan Weil
533399e335
Remove unused macro _TESS_FILE_BASENAME
This fixes a compiler warning from clang:
ccutil/platform.h:88:13: warning:
macro name is a reserved identifier [-Wreserved-id-macro]
#define _TESS_FILE_BASENAME_ \
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 15:43:03 +01:00
Stefan Weil
70c6f1624c
Fix #define guards in header files
Some guards were missing, others were not the first statement.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 15:43:03 +01:00
Stefan Weil
4897796d57
Replace reserved identifiers used in #define guards of header files
Use macro names as suggested by the Google C++ Style Guide
(https://google.github.io/styleguide/cppguide.html#The__define_Guard ).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 15:43:03 +01:00
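A header guard in the style that commit 4897796d57 refers to could look like the sketch below. The file name ccutil/example.h and the function ExampleValue are hypothetical. Names that start with an underscore followed by an uppercase letter, or that contain a double underscore, are reserved for the implementation, which is why the guard starts with the project name instead.

```cpp
// Hypothetical header ccutil/example.h.
// Guard name follows the <PROJECT>_<PATH>_<FILE>_H_ pattern and is the
// first statement in the file.
#ifndef TESSERACT_CCUTIL_EXAMPLE_H_
#define TESSERACT_CCUTIL_EXAMPLE_H_

inline int ExampleValue() { return 42; }

#endif  // TESSERACT_CCUTIL_EXAMPLE_H_
```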
Stefan Weil
cefc420ddb
Remove extra semicolons after member function definitions
clang++ report:
api/baseapi.h:852:4: warning:
extra ';' after member function definition [-Wextra-semi]
[...]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 14:54:52 +01:00
Ray Smith
ce76d1c569
Fixes to training process to allow incremental training from a recognition model
2016-11-30 15:51:17 -08:00
Ray Smith
53003f9074
Formatting changes from clang_tidy on latest pull
2016-11-30 15:44:25 -08:00
Stefan Weil
faea44cbc7
mingw-w64: Fix compiler warnings caused by macro redefinition
GNU compiler report (cross build for Windows on Debian):
In file included from ../ccutil/host.h:63:0,
from ../arch/dotproductsse.h:22,
from ../arch/dotproductsse.cpp:43:
../ccutil/platform.h:27:0: warning: "NOMINMAX" redefined
#define NOMINMAX
In file included from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/c++/i686-w64-mingw32/bits/c++config.h:495:0,
from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/c++/cstdlib:41,
from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/c++/stdlib.h:36,
from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/mm_malloc.h:27,
from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/xmmintrin.h:34,
from /usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/emmintrin.h:31,
from ../arch/dotproductsse.cpp:40:
/usr/lib/gcc/i686-w64-mingw32/6.1-win32/include/c++/i686-w64-mingw32/bits/os_defines.h:45:0:
note: this is the location of the previous definition
#define NOMINMAX 1
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-29 14:37:10 +01:00
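The redefinition warning in commit faea44cbc7 goes away when the macro is defined only if it is not defined yet. Whether the commit used exactly this guard is an assumption. The first line below merely simulates the earlier definition from the system header, and NoMinMaxValue is a hypothetical helper used to show the result.

```cpp
// Simulates the definition that a system header (os_defines.h in the
// compiler report) has already made before platform.h is processed.
#define NOMINMAX 1

// Guarded definition: does nothing when the macro already exists, so the
// compiler sees no conflicting redefinition.
#ifndef NOMINMAX
#define NOMINMAX
#endif

// The earlier definition is left untouched.
inline int NoMinMaxValue() { return NOMINMAX; }
```

The same pattern applies to other Windows configuration macros such as WIN32_LEAN_AND_MEAN.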
Stefan Weil
85e37798cb
Simplify delete operations
It is not necessary to check for null pointers.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-24 17:59:13 +01:00
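The simplification in commit 85e37798cb rests on a language guarantee: applying delete to a null pointer is a no-op, so a preceding null check adds nothing. The type Node and the function Release are hypothetical and serve only to illustrate this.

```cpp
struct Node {
  int value;
};

// No "if (node != nullptr)" is needed: deleting a null pointer does nothing.
void Release(Node*& node) {
  delete node;
  node = nullptr;  // avoid leaving a dangling pointer behind
}
```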
Egor Pugin
644469595c
Fix windows build.
2016-11-24 17:32:23 +03:00