Commit Graph

63 Commits

Author SHA1 Message Date
Stefan Weil
023e1b340e Use POSIX data types and macros (#878)
* api: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* cutil: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* training: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract data types by POSIX data types

Now all Tesseract data types which are no longer needed can be removed
from ccutil/host.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* lstm: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Remove the macros which are now unused from ccutil/host.h.
Remove also the obsolete history comments.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix build error caused by ambiguous ClipToRange

Error message vom Appveyor CI:

    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2672: 'ClipToRange': no matching overloaded function found [C:\projects\tesseract\build\libtesseract.vcxproj]
    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2782: 'T ClipToRange(const T &,const T &,const T &)': template parameter 'T' is ambiguous [C:\projects\tesseract\build\libtesseract.vcxproj]
      c:\projects\tesseract\ccutil\helpers.h(122): note: see declaration of 'ClipToRange'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: could be 'char'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: or       'int'

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* unittest: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* arch: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 21:36:30 +01:00
Stefan Weil
47a326b02d Use POSIX data types for external interfaces (#1358)
Replace the Tesseract specific data types in header files which are
part of Debian package libtesseract-dev by POSIX data types.

Update also matching cpp files.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 19:01:40 +01:00
Stefan Weil
14ee911978 lstm: Use MS C intrinsic function for faster calculation of log2 (#1369)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-10 20:52:39 +01:00
Stefan Weil
7972b13e3a Remove macro USE_STD_NAMESPACE (#1360)
The related code in training/util.h now uses the GOOGLE_TESSERACT macro
to enable Google specific code to disable heap checking.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-04 14:43:28 +01:00
Stefan Weil
068d43d3d8 Remove old code for string class (no longer needed) (#1354)
* Remove old code for string class (no longer needed)

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Add std namespace to string class

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-03 14:36:28 +01:00
Hintz
cf66bc84c8 Fix syntax error. (#1355) 2018-03-03 14:33:39 +01:00
Stefan Weil
b9b08c7e50 Replace log2(n) by local functions (#1353)
* Replace log2(n) by faster local function

This also adds support for environments without a log2 function (Android).

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Provide local log2 function on platforms without log2 function

The existing implementation in wordrec/language_model.cpp is modified
to use a local inline function in the tesseract namespace and copied
to lstm/weightmatrix.cpp, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-02 06:54:43 +01:00
amitdo
a905548ed6 Autotools build: Remove the option 'USING_MULTIPLELIBS'
Libtool's convenience libraries should never be installed. Fixes #985.
2017-09-11 15:03:53 +03:00
Ray Smith
fc6a390c6c Added intsimdmatrix as a generic integer matrixdotvector function with AVX2 and SSE specializations 2017-09-08 15:06:19 +01:00
Ray Smith
4cf123e099 Added ability to randomly rotate images upside-down during training for training OSD 2017-09-08 12:42:57 +01:00
Ray Smith
bf774382e8 Updated comments on RemapOutputs 2017-09-08 10:24:00 +01:00
Ray Smith
77c44cdecd Added convert to int and directory listing to combine_tessdata 2017-08-02 14:53:07 -07:00
Ray Smith
39b168a0b6 Removed errors introduced by git merge 2017-08-02 14:12:45 -07:00
Ray Smith
4e9665debf Added ADAM optimizer, unless git screwed it up, cos there is no diff 2017-08-02 14:03:50 -07:00
Ray Smith
2633fef0b6 Part 2 of separating out the unicharset from the LSTM model, fixing command line for training 2017-08-02 13:29:23 -07:00
Ray Smith
b0ead95d64 Changed the way unicharsets are handled to allow support for the ™ character. Can find the issue where it was requested. 2017-07-24 11:45:57 -07:00
rays
45fb7dde49 Fixed regression of issue #644 again! 2017-07-15 23:36:58 -07:00
rays
f4f66f8fa9 Fixed regression of issue #644 2017-07-15 17:21:47 -07:00
Ray Smith
dc8745e6fd Move LSTM unicharset and recoder to traineddata with version string part1. Backwards compatible - maybe. 2017-07-14 11:14:23 -07:00
Ray Smith
3ec11bd37a Deleted some dead LSTM code, making everything use the recoder 2017-07-14 10:58:21 -07:00
Ray Smith
aee910a7bf Fixed build broken by previous commits that added use of string in low-level code 2017-07-14 10:33:55 -07:00
Justin Hotchkiss Palermo
f057938069 fix filenames in comments 2017-07-02 17:35:47 -04:00
Stefan Weil
34d1e7331d LSTMTrainer: Catch empty vectors
The new test in LSTMTrainer::UpdateErrorGraph fixes an assertion
(see issues #644, #792).

The new test in LSTMTrainer::ReadTrainingDump was added to improve
the robustness of the code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-04 18:18:16 +02:00
Stefan Weil
15b3596ec4 Optimize LSTM code for builds without OpenMP
The constant value kNumThreads is not only used to configure the number
of threads but also to allocate vectors used in those threads.

There is only a single thread without OpenMP, so it is sufficient to
allocate vectors with only one element in that case.

Replace also the upper limit in the for loops by the known vector size.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-22 10:13:53 +02:00
Stefan Weil
3a67ff930e Optimize code by replacing init_to_size with resize_no_init
There is no need to initialize memory with a fixed value which is
overwritten in the next step.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-12 14:34:55 +02:00
Raf Schietekat
c335508e84 Fewer g++ -Wsign-compare warnings 2017-05-11 23:14:52 +02:00
Raf Schietekat
3983d2f76a Reviewed uses of reinterpret_cast 2017-05-11 01:58:40 +02:00
Ray Smith
b86b4fa06b Better fix for re-enabling training 2017-05-08 14:26:09 -07:00
Ray Smith
d18931e86e Fixed int types for imported tf networks 2017-05-05 16:42:44 -07:00
Ray Smith
4fa463cd71 Corrected SetEnableTraining for recovery from a recognize-only model. 2017-05-05 16:39:43 -07:00
Ray Smith
8e79297dce Final part of endian improvement. Adds big-endian support to lstm and fixes issue 518 2017-05-03 16:09:44 -07:00
Ray Smith
1cc511188d Added extra Init that takes a memory buffer or a filereader function pointer to enable read of traineddata from memory or foreign file systems. Updated existing readers to use TFile API instead of FILE. This does not yet add big-endian capability to LSTM, but it is very easy from here. 2017-04-27 15:48:23 -07:00
Stefan Weil
dcc86664e2 lstm: Remove unused variable
This fixes a compiler warning:

lstm/input.cpp:141:7: warning: unused variable 'width' [-Wunused-variable]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-03-08 07:30:31 +01:00
Bernard Cafarelli
4c39775c2d
Fix Network declarations with GRAPHICS_DISABLED 2017-01-27 12:06:15 +01:00
Bernard Cafarelli
8aeb73e507
Partial fix for GRAPHICS_DISABLED build, issue #679
Include automatically generated configuration file if running autoconf
2017-01-26 11:40:35 +01:00
Ray Smith
b453f74e01 Fixed issue #633 (multi-language mode 2017-01-25 15:58:39 -08:00
Stefan Weil
465e2def7b lstm: Remove unused constant and unused local variables
This fixes three compiler warnings.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-01-21 12:32:06 +01:00
amitdo
5d627aacae Remove code that is no longer needed
The code in ccutil/hashfn.h was needed for some old compilers. Now that we support MSVC >= 2010 and compilers that has good support for C++11, we can drop this code.

As a result of this file removal, we now use:
  std::unordered_map
  std::unordered_set
  std::unique_ptr
directly in the codebase with '#include' for the needed headers.
2017-01-16 01:49:17 +02:00
Stefan Weil
19616b07ba lstm: Move class SIMDDetect to new source file and improve code
Modify also the code to use a singleton. This simplifies the code as
no locking is needed. It also slightly improves the performance because
no check whether the architecture was tested is needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-27 13:25:08 +01:00
Stefan Weil
5cfe0c700c lstm: Add AVX / SSE support for Windows
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-27 11:14:15 +01:00
Stefan Weil
b2a0262c59 lstm: Update AVX / SSE support
* Fix compiler warning (see below)

* Use Linux code for Mingw-w64, too

* Simplify conditional code by using X86_BUILD instead of NONX86_BUILD

* Remove unneeded call of __get_cpuid_max (already called by __get_cpuid)

* Remove unneeded #undef statement

gcc report:

lstm/weightmatrix.cpp: In static member function
 'static double tesseract::WeightMatrix::DotProduct(const double*, const double*, int)':
weightmatrix.cpp:67:29: warning:
 'ecx' may be used uninitialized in this function [-Wmaybe-uninitialized]
       avx_available_ = (ecx & 0x10000000) != 0;
                             ^
lstm/weightmatrix.cpp:64:30: note: 'ecx' was declared here
       unsigned int eax, ebx, ecx, edx;
                              ^
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-27 10:21:50 +01:00
Stefan Weil
9a4a32137c Fix build for non x86
cpuid.h is only available for x86 builds. There are lots of non x86
architectures, so simply checking for PowerPC is not enough.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-26 19:17:03 +01:00
Ray Smith
13e46ae1c4 Made LSTM the default engine, pushed cube out 2016-12-13 14:37:40 -08:00
zdenop
7f7cea1ee6 Merge pull request #532 from stweil/openmp
openmp: Fix build with clang++ and compilers without OpenMP support
2016-12-07 14:47:08 +01:00
Ray Smith
d55f462c9c More clang-tidy from previous commits 2016-12-06 13:45:49 -08:00
Ray Smith
5deebe6c27 Fixed multilang for LSTM, pushed cube to one side without actually deleting it 2016-12-05 14:41:43 -08:00
Stefan Weil
6140be6a55 openmp: Fix build with clang++ and compilers without OpenMP support
Builds without support for OpenMP failed with the old code. Fix this:

* Add OPENMP_CXXFLAGS for ccmain.
* Replace unconditional -fopenmp by OPENMP_CXXFLAGS for lstm.
* Always use _OPENMP for conditional compilation.
* Remove OPENMP as there is already _OPENMP.
* Include omp.h conditionally.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 18:44:03 +01:00
Stefan Weil
9e0da72818 lstm: Fix possible float division by zero
Coverity report:

CID 1366441 (#1 of 1): Division or modulo by float zero (DIVIDE_BY_ZERO)
5. divide_by_zero: In expression
 static_cast<double>(char_errors) / truth_size, division by expression
 truth_size which may be zero has undefined behavior.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-01 14:42:54 +01:00
Stefan Weil
dfd7082679 lstm: Fix explicit null dereferenced
Coverity report:

CID 1366443 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
3. var_deref_model: Passing null pointer this->sub_trainer_ to
 training_iteration, which dereferences it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-01 14:42:54 +01:00
Stefan Weil
f3e8895a6a lstm: Pass big parameter by reference (performance)
Coverity report:

CID 1366448 (#1 of 1): Big parameter passed by value (PASS_BY_VALUE)
pass_by_value: Passing parameter recoder of type
 tesseract::UnicharCompress const (size 240 bytes) by value.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-01 14:42:54 +01:00