
2643 Commits

Author SHA1 Message Date
Ray Smith
5f5e85e4a0 Fixed lack of error on non-existent traineddata 2017-08-07 09:58:43 -07:00
Ray Smith
0a91498195 Improved error message on missing optional config 2017-08-07 09:50:49 -07:00
Ray Smith
4b3c5f6c35 Added check for non-empty traineddata flag 2017-08-07 09:43:30 -07:00
Egor Pugin
c67c2e9f41 Add combine_lang_model to cmake and cppan builds. 2017-08-06 14:46:32 +03:00
zdenop
08ec5775a1 Merge pull request #1064 from stweil/win32
Fix broken build for Windows
2017-08-04 10:50:01 +02:00
Stefan Weil
cdec915e17 Fix broken build for Windows
Windows does not provide a mkdir function with two parameters.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-08-04 10:18:35 +02:00
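The portability gap this commit works around can be sketched as a small wrapper. POSIX declares `mkdir(const char*, mode_t)` in `<sys/stat.h>`. The Microsoft C runtime instead declares a one-argument `_mkdir(const char*)` in `<direct.h>`. The helper name `MakeDirectory` is hypothetical, and this sketch is not necessarily how the actual fix was written.

```cpp
#include <string>

#ifdef _WIN32
#include <direct.h>     // _mkdir(const char*): no mode argument on Windows
#else
#include <sys/stat.h>   // mkdir(const char*, mode_t) on POSIX systems
#include <sys/types.h>
#endif

// Hypothetical helper: creates a single directory and returns true on success.
// It returns false if the directory already exists or its parent is missing.
bool MakeDirectory(const std::string& path) {
#ifdef _WIN32
  return _mkdir(path.c_str()) == 0;
#else
  // As with any mkdir() call, the 0777 mode is filtered through the process umask.
  return mkdir(path.c_str(), 0777) == 0;
#endif
}
```

Keeping the `#ifdef` inside one helper means callers never need to know which signature the platform provides.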
Ray Smith
8e55e52be7 Harder unittest that uses file i/o and string manipulation 2017-08-03 15:51:18 -07:00
Ray Smith
4572940639 Portability fix to help tests compile with the same code in both Google and github 2017-08-03 15:42:26 -07:00
Ray Smith
2fbcba62e5 Initial push of one simple unittest 2017-08-02 17:35:29 -07:00
Ray Smith
77c44cdecd Added convert to int and directory listing to combine_tessdata 2017-08-02 14:53:07 -07:00
Ray Smith
2ef1aeaeb4 Added AVX2 and AVX512 detector 2017-08-02 14:15:50 -07:00
Ray Smith
39b168a0b6 Removed errors introduced by git merge 2017-08-02 14:12:45 -07:00
Ray Smith
4e9665debf Added ADAM optimizer, unless git screwed it up, cos there is no diff 2017-08-02 14:03:50 -07:00
Ray Smith
2633fef0b6 Part 2 of separating out the unicharset from the LSTM model, fixing command line for training 2017-08-02 13:29:23 -07:00
Egor Pugin
61adbdfa4b Merge pull request #1054 from tdhintz/master
std::max build fix.
2017-07-27 02:49:21 +03:00
Hintz
67314ea9bd Merge pull request #1 from tdhintz/tdhintz-stdmax-patch
Define std::max under VS2017 x64
2017-07-26 16:40:08 -05:00
Hintz
c5a861b229 Define std::max under VS2017 x64 2017-07-26 17:19:40 -04:00
Ray Smith
0e95e2ca87 Rewrote the recoder to use an encoding based on wubi instead of radical-stroke index, changed from normalized to unnormalized unichar representation 2017-07-25 09:40:44 -07:00
Ray Smith
b0ead95d64 Changed the way unicharsets are handled to allow support for the ™ character. Can find the issue where it was requested. 2017-07-24 11:45:57 -07:00
Stefan Weil
99755b0732 googletest: Add dummy test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-24 19:45:06 +02:00
Stefan Weil
796cd7ab56 cmake: Add googletest
The submodule is built automatically as soon as it exists.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-24 19:45:06 +02:00
Stefan Weil
f36dc34c4f Add googletest submodule
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-24 19:45:06 +02:00
Ray Smith
4efc539f51 clang tidy on previous pull 2017-07-19 17:04:49 -07:00
Ray Smith
4e8018d013 Important fix to RTL languages saves last space on each line, which was previously lost 2017-07-19 17:04:06 -07:00
Ray Smith
3f7735492f Removed unnecessary using statements and cleaned up google/non-google distinction 2017-07-19 16:42:48 -07:00
Ray Smith
cec1037260 Fixed BestPix to always return the highest resolution available, even if a lower bit depth than the original 2017-07-19 16:28:26 -07:00
Egor Pugin
66e686a0e6 Merge pull request #1041 from stweil/leptonica
Use lept_free to free memory allocated by Leptonica
2017-07-16 18:04:54 +03:00
Egor Pugin
900bf6076f Merge pull request #1040 from stweil/clean
Delete unused code in PangoFontInfo
2017-07-16 14:21:08 +03:00
rays
45fb7dde49 Fixed regression of issue #644 again! 2017-07-15 23:36:58 -07:00
Stefan Weil
ba95a686aa Use lept_free to free memory allocated by Leptonica
This fixes problems on Windows when Tesseract and Leptonica use different
C runtime libraries.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 08:34:18 +02:00
Stefan Weil
5a7b7ed7e1 PangoFontInfo: Remove unused method is_italic
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:22:05 +02:00
Stefan Weil
0cd71c67c9 PangoFontInfo: Remove unused method is_bold
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:21:59 +02:00
Stefan Weil
fbfbf67cf9 PangoFontInfo: Remove unused method is_smallcaps
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:21:49 +02:00
Stefan Weil
500f913b51 PangoFontInfo: Remove unused method is_monospace
Remove also some macros which are no longer needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:21:35 +02:00
Stefan Weil
059e30d4cb PangoFontInfo: Remove unused method is_fraktur
This restores commit 25e0c1accb and
partially reverts commit 4907a23fea,
which added the now unused Shlwapi library.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:16:38 +02:00
rays
f4f66f8fa9 Fixed regression of issue #644 2017-07-15 17:21:47 -07:00
zdenop
4b6f0b9538 Merge pull request #1039 from stweil/clean
Fix compiler warnings
2017-07-15 20:18:50 +02:00
Egor Pugin
4907a23fea Fix windows build. 2017-07-15 15:09:00 +03:00
Stefan Weil
9929587f36 Remove extra semicolons
This fixes these compiler warnings:

    ccmain/equationdetect.cpp:1519:2: warning: extra ‘;’ [-Wpedantic]
    ccstruct/blobs.cpp:65:17: warning: extra ‘;’ [-Wpedantic]
    ccstruct/blobs.h:178:18: warning: extra ‘;’ [-Wpedantic]
    ccstruct/ratngs.cpp:36:22: warning: extra ‘;’ [-Wpedantic]
    ccstruct/ratngs.cpp:37:22: warning: extra ‘;’ [-Wpedantic]
    ccutil/ambigs.cpp:46:20: warning: extra ‘;’ [-Wpedantic]
    ccutil/ambigs.h:137:21: warning: extra ‘;’ [-Wpedantic]
    cutil/structures.cpp:36:45: warning: extra ‘;’ [-Wpedantic]
    textord/equationdetectbase.cpp:65:2: warning: extra ‘;’ [-Wpedantic]
    textord/equationdetectbase.h:57:2: warning: extra ‘;’ [-Wpedantic]
    wordrec/lm_state.cpp:25:28: warning: extra ‘;’ [-Wpedantic]
    wordrec/lm_state.h:190:29: warning: extra ‘;’ [-Wpedantic]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-15 12:40:34 +02:00
Stefan Weil
fa9e43fdde Fix wrong data type in argument for sscanf
Compiler warning:

    ccutil/unicharcompress.cpp:76:76: warning: format ‘%x’ expects argument of type ‘unsigned int*’, but argument 3 has type ‘int*’ [-Wformat=]
    ccutil/unicharcompress.cpp:80:31: warning: format ‘%x’ expects argument of type ‘unsigned int*’, but argument 3 has type ‘int*’ [-Wformat=]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-15 09:30:31 +02:00
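The `%x` conversion stores into an `unsigned int`, so passing an `int*` is the type mismatch that `-Wformat` reports. Below is a minimal sketch of the corrected pattern. `ParseHexCode` is a hypothetical name, not the function in `unicharcompress.cpp`.

```cpp
#include <cstdio>

// Hypothetical helper: parses a hexadecimal code such as "4e00".
// %x expects an unsigned int*, so the destination is declared unsigned.
// Passing an int* instead is what triggers the -Wformat warning.
bool ParseHexCode(const char* text, unsigned int* code) {
  // sscanf returns the number of fields it converted successfully.
  return std::sscanf(text, "%x", code) == 1;
}
```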
Stefan Weil
f527715933 Fix type of bit values (fixes compiler warning)
A single bit cannot represent a signed integer value, so it must be an
unsigned integer. This fixes a compiler warning:

    In file included from ../ccutil/clst.h:24:0,
                     from ../ccstruct/blobbox.h:23,
                     from workingpartset.h:24,
                     from workingpartset.cpp:21:
    ../ccstruct/blobbox.h: In member function ‘void BLOBNBOX::set_reduced_box(TBOX)’:
    ../ccutil/host.h:79:25: warning: overflow in conversion from ‘int’ to ‘signed char:1’ changes value from ‘1’ to ‘-1’ [-Woverflow]
     #define TRUE            1
                             ^
    ../ccstruct/blobbox.h:236:17: note: in expansion of macro ‘TRUE’
           reduced = TRUE;
                     ^~~~

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-15 09:30:31 +02:00
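The warning can be reproduced in isolation. A 1-bit field of a signed type has room only for the sign bit, so it holds 0 or -1. Storing 1 (the value of `TRUE`) therefore reads back as -1 on GCC and Clang, which is the "changes value from '1' to '-1'" in the warning. The struct and function names below are hypothetical. They only contrast a signed and an unsigned 1-bit declaration and are not the actual `BLOBNBOX` code.

```cpp
// Hypothetical structs contrasting the two kinds of 1-bit field.
struct SignedFlag {
  signed char reduced : 1;    // representable values: 0 and -1
};

struct UnsignedFlag {
  unsigned char reduced : 1;  // representable values: 0 and 1
};

// Stores 1 and reads it back. GCC warns with -Woverflow here,
// and on GCC/Clang the value read back is -1.
int StoreTrueSigned() {
  SignedFlag f{};
  f.reduced = 1;
  return f.reduced;
}

// The unsigned field keeps the value 1 as intended.
int StoreTrueUnsigned() {
  UnsignedFlag f{};
  f.reduced = 1;
  return f.reduced;
}
```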
Ray Smith
dc8745e6fd Move LSTM unicharset and recoder to traineddata with version string part1. Backwards compatible - maybe. 2017-07-14 11:14:23 -07:00
Ray Smith
7588540296 Removed changes from last commit that didn't belong 2017-07-14 11:08:26 -07:00
Ray Smith
3ec11bd37a Deleted some dead LSTM code, making everything use the recoder 2017-07-14 10:58:21 -07:00
Ray Smith
aee910a7bf Fixed build broken by previous commits that added use of string in low-level code 2017-07-14 10:33:55 -07:00
Ray Smith
df41eab6aa Added script-specific validation and normalization for virama-using scripts and updated normalization for others 2017-07-14 10:05:05 -07:00
Ray Smith
da03e4e910 Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion 2017-07-14 09:30:14 -07:00
zdenop
f5c18f78c0 Merge pull request #1018 from hotchkiss87/fix_file_names
fix filenames in comments (trivial)
2017-07-03 08:22:57 +02:00
Justin Hotchkiss Palermo
f057938069 fix filenames in comments 2017-07-02 17:35:47 -04:00
zdenop
10779bd9e5 Merge pull request #1017 from hotchkiss87/correct_error_message_format
Add new line to a few error messages.
2017-07-01 17:47:58 +02:00