Commit Graph

51 Commits

Author SHA1 Message Date
Ray Smith
39b168a0b6 Removed errors introduced by git merge 2017-08-02 14:12:45 -07:00
Ray Smith
4e9665debf Added ADAM optimizer, unless git screwed it up, cos there is no diff 2017-08-02 14:03:50 -07:00
Ray Smith
2633fef0b6 Part 2 of separating out the unicharset from the LSTM model, fixing command line for training 2017-08-02 13:29:23 -07:00
Ray Smith
b0ead95d64 Changed the way unicharsets are handled to allow support for the ™ character. Can find the issue where it was requested. 2017-07-24 11:45:57 -07:00
rays
45fb7dde49 Fixed regression of issue #644 again! 2017-07-15 23:36:58 -07:00
rays
f4f66f8fa9 Fixed regression of issue #644 2017-07-15 17:21:47 -07:00
Ray Smith
dc8745e6fd Move LSTM unicharset and recoder to traineddata with version string part1. Backwards compatible - maybe. 2017-07-14 11:14:23 -07:00
Ray Smith
3ec11bd37a Deleted some dead LSTM code, making everything use the recoder 2017-07-14 10:58:21 -07:00
Ray Smith
aee910a7bf Fixed build broken by previous commits that added use of string in low-level code 2017-07-14 10:33:55 -07:00
Justin Hotchkiss Palermo
f057938069 fix filenames in comments 2017-07-02 17:35:47 -04:00
Stefan Weil
34d1e7331d LSTMTrainer: Catch empty vectors
The new test in LSTMTrainer::UpdateErrorGraph fixes an assertion
(see issues #644, #792).

The new test in LSTMTrainer::ReadTrainingDump was added to improve
the robustness of the code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-04 18:18:16 +02:00
Stefan Weil
15b3596ec4 Optimize LSTM code for builds without OpenMP
The constant value kNumThreads is not only used to configure the number
of threads but also to allocate vectors used in those threads.

There is only a single thread without OpenMP, so it is sufficient to
allocate vectors with only one element in that case.

Also replace the upper limit in the for loops with the known vector size.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-22 10:13:53 +02:00
Stefan Weil
3a67ff930e Optimize code by replacing init_to_size with resize_no_init
There is no need to initialize memory with a fixed value which is
overwritten in the next step.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-12 14:34:55 +02:00
Raf Schietekat
c335508e84 Fewer g++ -Wsign-compare warnings 2017-05-11 23:14:52 +02:00
Raf Schietekat
3983d2f76a Reviewed uses of reinterpret_cast 2017-05-11 01:58:40 +02:00
Ray Smith
b86b4fa06b Better fix for re-enabling training 2017-05-08 14:26:09 -07:00
Ray Smith
d18931e86e Fixed int types for imported tf networks 2017-05-05 16:42:44 -07:00
Ray Smith
4fa463cd71 Corrected SetEnableTraining for recovery from a recognize-only model. 2017-05-05 16:39:43 -07:00
Ray Smith
8e79297dce Final part of endian improvement. Adds big-endian support to lstm and fixes issue 518 2017-05-03 16:09:44 -07:00
Ray Smith
1cc511188d Added extra Init that takes a memory buffer or a filereader function pointer to enable read of traineddata from memory or foreign file systems. Updated existing readers to use TFile API instead of FILE. This does not yet add big-endian capability to LSTM, but it is very easy from here. 2017-04-27 15:48:23 -07:00
Stefan Weil
dcc86664e2 lstm: Remove unused variable
This fixes a compiler warning:

lstm/input.cpp:141:7: warning: unused variable 'width' [-Wunused-variable]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-03-08 07:30:31 +01:00
Bernard Cafarelli
4c39775c2d Fix Network declarations with GRAPHICS_DISABLED 2017-01-27 12:06:15 +01:00
Bernard Cafarelli
8aeb73e507 Partial fix for GRAPHICS_DISABLED build, issue #679
Include automatically generated configuration file if running autoconf
2017-01-26 11:40:35 +01:00
Ray Smith
b453f74e01 Fixed issue #633 (multi-language mode) 2017-01-25 15:58:39 -08:00
Stefan Weil
465e2def7b lstm: Remove unused constant and unused local variables
This fixes three compiler warnings.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-01-21 12:32:06 +01:00
amitdo
5d627aacae Remove code that is no longer needed
The code in ccutil/hashfn.h was needed for some old compilers. Now that we support MSVC >= 2010 and compilers that have good support for C++11, we can drop this code.

As a result of this file removal, we now use:
  std::unordered_map
  std::unordered_set
  std::unique_ptr
directly in the codebase with '#include' for the needed headers.
2017-01-16 01:49:17 +02:00
Stefan Weil
19616b07ba lstm: Move class SIMDDetect to new source file and improve code
Also modify the code to use a singleton. This simplifies the code, as
no locking is needed. It also slightly improves performance, because
there is no longer any need to check whether the architecture was already tested.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-27 13:25:08 +01:00
Stefan Weil
5cfe0c700c lstm: Add AVX / SSE support for Windows
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-27 11:14:15 +01:00
Stefan Weil
b2a0262c59 lstm: Update AVX / SSE support
* Fix compiler warning (see below)

* Use Linux code for Mingw-w64, too

* Simplify conditional code by using X86_BUILD instead of NONX86_BUILD

* Remove unneeded call of __get_cpuid_max (already called by __get_cpuid)

* Remove unneeded #undef statement

gcc report:

lstm/weightmatrix.cpp: In static member function
 'static double tesseract::WeightMatrix::DotProduct(const double*, const double*, int)':
weightmatrix.cpp:67:29: warning:
 'ecx' may be used uninitialized in this function [-Wmaybe-uninitialized]
       avx_available_ = (ecx & 0x10000000) != 0;
                             ^
lstm/weightmatrix.cpp:64:30: note: 'ecx' was declared here
       unsigned int eax, ebx, ecx, edx;
                              ^

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-27 10:21:50 +01:00
Stefan Weil
9a4a32137c Fix build for non x86
cpuid.h is only available for x86 builds. There are lots of non x86
architectures, so simply checking for PowerPC is not enough.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-26 19:17:03 +01:00
Ray Smith
13e46ae1c4 Made LSTM the default engine, pushed cube out 2016-12-13 14:37:40 -08:00
zdenop
7f7cea1ee6 Merge pull request #532 from stweil/openmp
openmp: Fix build with clang++ and compilers without OpenMP support
2016-12-07 14:47:08 +01:00
Ray Smith
d55f462c9c More clang-tidy from previous commits 2016-12-06 13:45:49 -08:00
Ray Smith
5deebe6c27 Fixed multilang for LSTM, pushed cube to one side without actually deleting it 2016-12-05 14:41:43 -08:00
Stefan Weil
6140be6a55 openmp: Fix build with clang++ and compilers without OpenMP support
Builds without support for OpenMP failed with the old code. Fix this:

* Add OPENMP_CXXFLAGS for ccmain.
* Replace unconditional -fopenmp by OPENMP_CXXFLAGS for lstm.
* Always use _OPENMP for conditional compilation.
* Remove OPENMP as there is already _OPENMP.
* Include omp.h conditionally.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 18:44:03 +01:00
Stefan Weil
9e0da72818 lstm: Fix possible float division by zero
Coverity report:

CID 1366441 (#1 of 1): Division or modulo by float zero (DIVIDE_BY_ZERO)
5. divide_by_zero: In expression
 static_cast<double>(char_errors) / truth_size, division by expression
 truth_size which may be zero has undefined behavior.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-01 14:42:54 +01:00
Stefan Weil
dfd7082679 lstm: Fix explicit null dereferenced
Coverity report:

CID 1366443 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
3. var_deref_model: Passing null pointer this->sub_trainer_ to
 training_iteration, which dereferences it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-01 14:42:54 +01:00
Stefan Weil
f3e8895a6a lstm: Pass big parameter by reference (performance)
Coverity report:

CID 1366448 (#1 of 1): Big parameter passed by value (PASS_BY_VALUE)
pass_by_value: Passing parameter recoder of type
 tesseract::UnicharCompress const (size 240 bytes) by value.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-01 14:42:54 +01:00
Stefan Weil
bb6cfc1c75 lstm: Initialize member variable beam_size_
Coverity report:

CID 1366450 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
4. uninit_member: Non-static class member beam_size_ is not initialized
 in this constructor nor in any functions that it calls.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-01 14:42:54 +01:00
Stefan Weil
06b28a111d lstm: Initialize member variable input_width_
Coverity report:

CID 1366452 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
18. uninit_member: Non-static class member input_width_ is not initialized
 in this constructor nor in any functions that it calls.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-01 14:42:54 +01:00
Ray Smith
ce76d1c569 Fixes to training process to allow incremental training from a recognition model 2016-11-30 15:51:17 -08:00
Ray Smith
9d9056716f Added std:: to vector 2016-11-30 15:45:36 -08:00
Stefan Weil
b04879412e lstm: Remove several unused variables
This fixes compiler warnings.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-25 15:43:39 +01:00
Egor Pugin
a8f444112e Fix build with leptonica 1.73. 2016-11-24 18:42:49 +03:00
Egor Pugin
644469595c Fix windows build. 2016-11-24 17:32:23 +03:00
zdenop
a8cfc7e2ad Merge pull request #491 from stweil/lstm
lstm: Fix compilation (undeclared 'isnan')
2016-11-22 12:03:40 +01:00
Stefan Weil
beb564df82 lstm: Fix compilation (undeclared 'isnan')
gcc report:

lstm/lstmrecognizer.cpp:608:47: error: 'isnan' was not declared in this scope
     ASSERT_HOST(!isnan(output.f(t)[null_char_]));

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-22 11:31:01 +01:00
Stefan Weil
0c9235ebc2 Fix typos in new LSTM code
All of them were found and fixed by codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-22 08:25:43 +01:00
Ray Smith
5913d7344f Added missing license headers 2016-11-18 15:53:11 -08:00
Ray Smith
f24ef67df4 Limited max height to 48 even in variable height input, enabled neural nets via ocr engine mode 2016-11-08 14:01:04 -08:00