Commit Graph

27 Commits

Author SHA1 Message Date
Stefan Weil
8f7be2e72c lstm: Replace NULL by nullptr (#1415)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-25 17:19:27 +02:00
Stefan Weil
47a326b02d Use POSIX data types for external interfaces (#1358)
Replace the Tesseract specific data types in header files which are
part of Debian package libtesseract-dev by POSIX data types.

Update also matching cpp files.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 19:01:40 +01:00
Stefan Weil
068d43d3d8 Remove old code for string class (no longer needed) (#1354)
* Remove old code for string class (no longer needed)

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Add std namespace to string class

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-03 14:36:28 +01:00
Ray Smith
4cf123e099 Added ability to randomly rotate images upside-down during training for training OSD 2017-09-08 12:42:57 +01:00
Ray Smith
bf774382e8 Updated comments on RemapOutputs 2017-09-08 10:24:00 +01:00
Ray Smith
39b168a0b6 Removed errors introduced by git merge 2017-08-02 14:12:45 -07:00
Ray Smith
4e9665debf Added ADAM optimizer, unless git screwed it up, cos there is no diff 2017-08-02 14:03:50 -07:00
Ray Smith
2633fef0b6 Part 2 of separating out the unicharset from the LSTM model, fixing command line for training 2017-08-02 13:29:23 -07:00
Ray Smith
b0ead95d64 Changed the way unicharsets are handled to allow support for the ™ character. Can find the issue where it was requested. 2017-07-24 11:45:57 -07:00
rays
45fb7dde49 Fixed regression of issue #644 again! 2017-07-15 23:36:58 -07:00
Ray Smith
dc8745e6fd Move LSTM unicharset and recoder to traineddata with version string part1. Backwards compatible - maybe. 2017-07-14 11:14:23 -07:00
Ray Smith
3ec11bd37a Deleted some dead LSTM code, making everything use the recoder 2017-07-14 10:58:21 -07:00
Stefan Weil
34d1e7331d LSTMTrainer: Catch empty vectors
The new test in LSTMTrainer::UpdateErrorGraph fixes an assertion
(see issues #644, #792).

The new test in LSTMTrainer::ReadTrainingDump was added to improve
the robustness of the code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-04 18:18:16 +02:00
Ray Smith
8e79297dce Final part of endian improvement. Adds big-endian support to lstm and fixes issue 518 2017-05-03 16:09:44 -07:00
Ray Smith
1cc511188d Added extra Init that takes a memory buffer or a filereader function pointer to enable read of traineddata from memory or foreign file systems. Updated existing readers to use TFile API instead of FILE. This does not yet add big-endian capability to LSTM, but it is very easy from here. 2017-04-27 15:48:23 -07:00
Bernard Cafarelli
8aeb73e507
Partial fix for GRAPHICS_DISABLED build, issue #679
Include automatically generated configuration file if running autoconf
2017-01-26 11:40:35 +01:00
Stefan Weil
465e2def7b lstm: Remove unused constant and unused local variables
This fixes three compiler warnings.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-01-21 12:32:06 +01:00
amitdo
5d627aacae Remove code that is no longer needed
The code in ccutil/hashfn.h was needed for some old compilers. Now that we support MSVC >= 2010 and compilers that has good support for C++11, we can drop this code.

As a result of this file removal, we now use:
  std::unordered_map
  std::unordered_set
  std::unique_ptr
directly in the codebase with '#include' for the needed headers.
2017-01-16 01:49:17 +02:00
Ray Smith
d55f462c9c More clang-tidy from previous commits 2016-12-06 13:45:49 -08:00
Stefan Weil
9e0da72818 lstm: Fix possible float division by zero
Coverity report:

CID 1366441 (#1 of 1): Division or modulo by float zero (DIVIDE_BY_ZERO)
5. divide_by_zero: In expression
 static_cast<double>(char_errors) / truth_size, division by expression
 truth_size which may be zero has undefined behavior.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-01 14:42:54 +01:00
Stefan Weil
dfd7082679 lstm: Fix explicit null dereferenced
Coverity report:

CID 1366443 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
3. var_deref_model: Passing null pointer this->sub_trainer_ to
 training_iteration, which dereferences it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-01 14:42:54 +01:00
Stefan Weil
f3e8895a6a lstm: Pass big parameter by reference (performance)
Coverity report:

CID 1366448 (#1 of 1): Big parameter passed by value (PASS_BY_VALUE)
pass_by_value: Passing parameter recoder of type
 tesseract::UnicharCompress const (size 240 bytes) by value.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-01 14:42:54 +01:00
Ray Smith
ce76d1c569 Fixes to training process to allow incremental training from a recognition model 2016-11-30 15:51:17 -08:00
Stefan Weil
b04879412e lstm: Remove several unused variables
This fixes compiler warnings.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-25 15:43:39 +01:00
Ray Smith
5913d7344f Added missing license headers 2016-11-18 15:53:11 -08:00
Ray Smith
f24ef67df4 Limited max height to 48 even in variable height input, enabled neural nets via ocr engine mode 2016-11-08 14:01:04 -08:00
Ray Smith
c1c1e426b3 Added new LSTM-based neural network line recognizer 2016-11-07 15:38:07 -08:00