Commit Graph

2981 Commits

Author SHA1 Message Date
Ray Smith
4e9665debf Added ADAM optimizer, unless git screwed it up, cos there is no diff 2017-08-02 14:03:50 -07:00
Ray Smith
2633fef0b6 Part 2 of separating out the unicharset from the LSTM model, fixing command line for training 2017-08-02 13:29:23 -07:00
Egor Pugin
61adbdfa4b Merge pull request #1054 from tdhintz/master
std::max build fix.
2017-07-27 02:49:21 +03:00
Hintz
67314ea9bd Merge pull request #1 from tdhintz/tdhintz-stdmax-patch
Define std::max under VS2017 x64
2017-07-26 16:40:08 -05:00
Hintz
c5a861b229 Define std::max under VS2017 x64 2017-07-26 17:19:40 -04:00
Ray Smith
0e95e2ca87 Rewrote the recoder to use an encoding based on wubi instead of radical-stroke index, changed from normalized to unnormalized unichar representation 2017-07-25 09:40:44 -07:00
Ray Smith
b0ead95d64 Changed the way unicharsets are handled to allow support for the ™ character. Can find the issue where it was requested. 2017-07-24 11:45:57 -07:00
Stefan Weil
99755b0732 googletest: Add dummy test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-24 19:45:06 +02:00
Stefan Weil
796cd7ab56 cmake: Add googletest
The submodule is built automatically as soon as it exists.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-24 19:45:06 +02:00
Stefan Weil
f36dc34c4f Add googletest submodule
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-24 19:45:06 +02:00
Ray Smith
4efc539f51 clang tidy on previous pull 2017-07-19 17:04:49 -07:00
Ray Smith
4e8018d013 Important fix to RTL languages: saves the last space on each line, which was previously lost 2017-07-19 17:04:06 -07:00
Ray Smith
3f7735492f Removed unnecessary using statements and cleaned up google/non-google distinction 2017-07-19 16:42:48 -07:00
Ray Smith
cec1037260 Fixed BestPix to always return the highest resolution available, even if a lower bit depth than the original 2017-07-19 16:28:26 -07:00
Egor Pugin
66e686a0e6 Merge pull request #1041 from stweil/leptonica
Use lept_free to free memory allocated by Leptonica
2017-07-16 18:04:54 +03:00
Egor Pugin
900bf6076f Merge pull request #1040 from stweil/clean
Delete unused code in PangoFontInfo
2017-07-16 14:21:08 +03:00
rays
45fb7dde49 Fixed regression of issue #644 again! 2017-07-15 23:36:58 -07:00
Stefan Weil
ba95a686aa Use lept_free to free memory allocated by Leptonica
This fixes problems on Windows when Tesseract and Leptonica use different
C runtime libraries.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 08:34:18 +02:00
Stefan Weil
5a7b7ed7e1 PangoFontInfo: Remove unused method is_italic
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:22:05 +02:00
Stefan Weil
0cd71c67c9 PangoFontInfo: Remove unused method is_bold
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:21:59 +02:00
Stefan Weil
fbfbf67cf9 PangoFontInfo: Remove unused method is_smallcaps
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:21:49 +02:00
Stefan Weil
500f913b51 PangoFontInfo: Remove unused method is_monospace
Also remove some macros which are no longer needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:21:35 +02:00
Stefan Weil
059e30d4cb PangoFontInfo: Remove unused method is_fraktur
This restores commit 25e0c1accb and
partially reverts commit 4907a23fea,
which added the now unused Shlwapi library.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:16:38 +02:00
rays
f4f66f8fa9 Fixed regression of issue #644 2017-07-15 17:21:47 -07:00
zdenop
4b6f0b9538 Merge pull request #1039 from stweil/clean
Fix compiler warnings
2017-07-15 20:18:50 +02:00
Egor Pugin
4907a23fea Fix windows build. 2017-07-15 15:09:00 +03:00
Stefan Weil
9929587f36 Remove extra semicolons
This fixes these compiler warnings:

    ccmain/equationdetect.cpp:1519:2: warning: extra ‘;’ [-Wpedantic]
    ccstruct/blobs.cpp:65:17: warning: extra ‘;’ [-Wpedantic]
    ccstruct/blobs.h:178:18: warning: extra ‘;’ [-Wpedantic]
    ccstruct/ratngs.cpp:36:22: warning: extra ‘;’ [-Wpedantic]
    ccstruct/ratngs.cpp:37:22: warning: extra ‘;’ [-Wpedantic]
    ccutil/ambigs.cpp:46:20: warning: extra ‘;’ [-Wpedantic]
    ccutil/ambigs.h:137:21: warning: extra ‘;’ [-Wpedantic]
    cutil/structures.cpp:36:45: warning: extra ‘;’ [-Wpedantic]
    textord/equationdetectbase.cpp:65:2: warning: extra ‘;’ [-Wpedantic]
    textord/equationdetectbase.h:57:2: warning: extra ‘;’ [-Wpedantic]
    wordrec/lm_state.cpp:25:28: warning: extra ‘;’ [-Wpedantic]
    wordrec/lm_state.h:190:29: warning: extra ‘;’ [-Wpedantic]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-15 12:40:34 +02:00
Stefan Weil
fa9e43fdde Fix wrong data type in argument for sscanf
Compiler warning:

    ccutil/unicharcompress.cpp:76:76: warning: format ‘%x’ expects argument of type ‘unsigned int*’, but argument 3 has type ‘int*’ [-Wformat=]
    ccutil/unicharcompress.cpp:80:31: warning: format ‘%x’ expects argument of type ‘unsigned int*’, but argument 3 has type ‘int*’ [-Wformat=]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-15 09:30:31 +02:00
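The warning quoted in the commit above comes from passing an int* where the %x conversion expects an unsigned int*. A minimal sketch of the corrected pattern follows. ParseHex is a hypothetical helper written for illustration, not the code in ccutil/unicharcompress.cpp.

```cpp
#include <cstdio>

// Hypothetical helper for illustration; not the actual Tesseract code.
// %x requires a pointer to unsigned int, so the destination is declared
// unsigned instead of int, which is what silences -Wformat.
bool ParseHex(const char* text, unsigned int* value) {
  // sscanf returns the number of successful conversions; 1 means the
  // hexadecimal field was read.
  return std::sscanf(text, "%x", value) == 1;
}
```

A caller would declare the destination as unsigned int and convert to a signed type afterwards only if the surrounding code needs one.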
Stefan Weil
f527715933 Fix type of bit values (fixes compiler warning)
A single bit cannot represent a signed integer value, so it must be an
unsigned integer. This fixes a compiler warning:

    In file included from ../ccutil/clst.h:24:0,
                     from ../ccstruct/blobbox.h:23,
                     from workingpartset.h:24,
                     from workingpartset.cpp:21:
    ../ccstruct/blobbox.h: In member function ‘void BLOBNBOX::set_reduced_box(TBOX)’:
    ../ccutil/host.h:79:25: warning: overflow in conversion from ‘int’ to ‘signed char:1’ changes value from ‘1’ to ‘-1’ [-Woverflow]
     #define TRUE            1
                             ^
    ../ccstruct/blobbox.h:236:17: note: in expansion of macro ‘TRUE’
           reduced = TRUE;
                     ^~~~

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-15 09:30:31 +02:00
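The reasoning in the commit above (a one-bit signed field cannot hold the value 1) can be shown with a small sketch. The struct and function names are hypothetical and do not reproduce the BLOBNBOX declaration. The explicit casts are only there to keep the narrowing visible.

```cpp
// Hypothetical types for illustration only; not the real BLOBNBOX members.
struct SignedFlag {
  signed char reduced : 1;    // one bit, signed: cannot represent +1
};

struct UnsignedFlag {
  unsigned char reduced : 1;  // one bit, unsigned: holds 0 or 1
};

// Stores value in a one-bit signed field and reads it back.
// Storing 1 does not read back as 1; common compilers yield -1.
int StoreInSignedBit(int value) {
  SignedFlag f{};
  f.reduced = static_cast<signed char>(value);
  return f.reduced;
}

// Stores value in a one-bit unsigned field and reads it back.
// Storing 1 reads back as 1, which is what a boolean flag needs.
int StoreInUnsignedBit(unsigned value) {
  UnsignedFlag f{};
  f.reduced = static_cast<unsigned char>(value);
  return f.reduced;
}
```

With the signed variant, a later comparison such as reduced == TRUE would fail even after assigning TRUE, which is why an unsigned field is the safer choice for a flag.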
Ray Smith
dc8745e6fd Move LSTM unicharset and recoder to traineddata with version string part1. Backwards compatible - maybe. 2017-07-14 11:14:23 -07:00
Ray Smith
7588540296 Removed changes from last commit that didn't belong 2017-07-14 11:08:26 -07:00
Ray Smith
3ec11bd37a Deleted some dead LSTM code, making everything use the recoder 2017-07-14 10:58:21 -07:00
Ray Smith
aee910a7bf Fixed build broken by previous commits that added use of string in low-level code 2017-07-14 10:33:55 -07:00
Ray Smith
df41eab6aa Added script-specific validation and normalization for virama-using scripts and updated normalization for others 2017-07-14 10:05:05 -07:00
Ray Smith
da03e4e910 Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion 2017-07-14 09:30:14 -07:00
zdenop
f5c18f78c0 Merge pull request #1018 from hotchkiss87/fix_file_names
fix filenames in comments (trivial)
2017-07-03 08:22:57 +02:00
Justin Hotchkiss Palermo
f057938069 fix filenames in comments 2017-07-02 17:35:47 -04:00
zdenop
10779bd9e5 Merge pull request #1017 from hotchkiss87/correct_error_message_format
Add new line to a few error messages.
2017-07-01 17:47:58 +02:00
Justin Hotchkiss Palermo
1d862a54bd Add new line to a few error messages. 2017-07-01 08:40:57 -04:00
zdenop
a9303a18ce Merge pull request #1014 from elopio/patch-1
Download the leptonica source from github
2017-06-30 13:37:48 +02:00
Leo Arias
91afb5540f Download the leptonica source from github
1.74.2 is no longer available from the leptonica website. But anyway, it seems safer going forward to download it from github. It's https, and it won't disappear as easily. Also, this is the same source used by travis, so there's less chance of shipping something untested.
2017-06-29 16:29:29 -06:00
zdenop
2b854e3749 Merge pull request #978 from stweil/lstm
LSTMTrainer: Catch empty vectors
2017-06-12 19:32:45 +02:00
zdenop
8c29e6827f Merge pull request #980 from stweil/clean
Remove portability code which is no longer needed
2017-06-05 21:01:46 +02:00
Stefan Weil
1cf8fe51a0 Remove mathfix.h
It was only needed for MS Visual Studio 2012 and older.
Those compilers are not supported for Tesseract.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-05 20:26:25 +02:00
Stefan Weil
5f8ecdb2b3 Remove local implementation of strtok_r
MS Visual Studio does not provide that function, but can use strtok_s,
which does exactly the same.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-05 19:52:25 +02:00
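The substitution described in the commit above can be sketched as follows. This is an illustration under assumptions, not the change made in Tesseract: the macro name PORTABLE_STRTOK and the function SplitTokens are hypothetical. It relies on the Microsoft strtok_s taking the same three arguments as the POSIX strtok_r.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical macro: select the reentrant tokenizer the toolchain provides.
#if defined(_MSC_VER)
#define PORTABLE_STRTOK strtok_s  // MSVC: (char*, const char*, char**)
#else
#define PORTABLE_STRTOK strtok_r  // POSIX: (char*, const char*, char**)
#endif

// Splits text on any character in delims. The argument is taken by value
// because the tokenizer overwrites delimiters in the buffer it scans.
std::vector<std::string> SplitTokens(std::string text, const char* delims) {
  std::vector<std::string> tokens;
  char* context = nullptr;  // per-call state, which makes this reentrant
  for (char* tok = PORTABLE_STRTOK(&text[0], delims, &context);
       tok != nullptr;
       tok = PORTABLE_STRTOK(nullptr, delims, &context)) {
    tokens.emplace_back(tok);
  }
  return tokens;
}
```

For example, SplitTokens("a b  c", " ") yields the three tokens "a", "b" and "c", because consecutive delimiters are skipped.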
Egor Pugin
22bcf4d1a2 Merge pull request #979 from stweil/update
Update from Leptonica 1.74.1 to 1.74.2
2017-06-05 16:03:15 +03:00
Stefan Weil
a2404ae735 Fix Travis CI for Leptonica 1.74.2
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-05 13:30:49 +02:00
Stefan Weil
44a5e3da40 Update from Leptonica 1.74.1 to 1.74.2
The newer version contains fixes for the pixUnsharpMaskingGray*
functions which are relevant for Tesseract (used in ImageData::PreScale
which calls pixScale).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-05 10:31:53 +02:00
Stefan Weil
34d1e7331d LSTMTrainer: Catch empty vectors
The new test in LSTMTrainer::UpdateErrorGraph fixes an assertion
(see issues #644, #792).

The new test in LSTMTrainer::ReadTrainingDump was added to improve
the robustness of the code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-04 18:18:16 +02:00
zdenop
1e5522d321 Merge pull request #975 from stweil/ocl
Clean OpenCL code
2017-06-03 19:55:44 +02:00