Commit Graph

3759 Commits

Author SHA1 Message Date
chrismamo1
6f281c36a7 fix a problem I introduced in a previous commit 2017-08-12 18:09:22 -05:00
chrismamo1
7111167497 fix a set-but-not-used warning and add casts for comparing signed+unsigned numbers 2017-08-12 17:53:28 -05:00
chrismamo1
b89bb09f9b fix a set but not used warning and cleanup some old code from 2007 2017-08-12 17:48:33 -05:00
chrismamo1
f9b51d7983 suppress a strict aliasing warning; the original author was very clear about the nature of the problematic code 2017-08-12 17:36:50 -05:00
chrismamo1
5fd3e22f74 move code around so that list-langs will work without an English traineddata file 2017-08-12 17:15:27 -05:00
Stefan Weil
cc0d87c5b8 List available languages recursively
Tesseract supports hierarchies of languages and uses them since
the new files best/*.traineddata were added.

Now `tesseract --list-langs` also shows any traineddata files in
subdirectories of the tessdata directory.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-08-10 18:55:38 +02:00
Egor Pugin
efa50daf5a Merge pull request #1070 from stweil/resolution
Change default resolution from 70 to 300 dpi
2017-08-08 23:05:14 +03:00
Stefan Weil
0720b3f38b Change default resolution from 70 to 300 dpi
The default resolution is used for images without an explicit resolution
or with an unreasonable resolution (smaller than 70 or larger than 2400).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-08-08 16:48:10 +02:00
Ray Smith
5f5e85e4a0 Fixed lack of error on non-existent traineddata 2017-08-07 09:58:43 -07:00
Ray Smith
0a91498195 Improved error message on missing optional config 2017-08-07 09:50:49 -07:00
Ray Smith
4b3c5f6c35 Added check for non-empty traineddata flag 2017-08-07 09:43:30 -07:00
Egor Pugin
c67c2e9f41 Add combine_lang_model to cmake and cppan builds. 2017-08-06 14:46:32 +03:00
zdenop
08ec5775a1 Merge pull request #1064 from stweil/win32
Fix broken build for Windows
2017-08-04 10:50:01 +02:00
Stefan Weil
cdec915e17 Fix broken build for Windows
Windows does not provide a mkdir function with two parameters.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-08-04 10:18:35 +02:00
Ray Smith
8e55e52be7 Harder unittest that uses file i/o and string manipulation 2017-08-03 15:51:18 -07:00
Ray Smith
4572940639 Portability fix to help tests compile with the same code in both Google and github 2017-08-03 15:42:26 -07:00
Ray Smith
2fbcba62e5 Initial push of one simple unittest 2017-08-02 17:35:29 -07:00
Ray Smith
77c44cdecd Added convert to int and directory listing to combine_tessdata 2017-08-02 14:53:07 -07:00
Ray Smith
2ef1aeaeb4 Added AVX2 and AVX512 detector 2017-08-02 14:15:50 -07:00
Ray Smith
39b168a0b6 Removed errors introduced by git merge 2017-08-02 14:12:45 -07:00
Ray Smith
4e9665debf Added ADAM optimizer, unless git screwed it up, cos there is no diff 2017-08-02 14:03:50 -07:00
Ray Smith
2633fef0b6 Part 2 of separating out the unicharset from the LSTM model, fixing command line for training 2017-08-02 13:29:23 -07:00
Egor Pugin
61adbdfa4b Merge pull request #1054 from tdhintz/master
std::max build fix.
2017-07-27 02:49:21 +03:00
Hintz
67314ea9bd Merge pull request #1 from tdhintz/tdhintz-stdmax-patch
Define std::max under VS2017 x64
2017-07-26 16:40:08 -05:00
Hintz
c5a861b229 Define std::max under VS2017 x64 2017-07-26 17:19:40 -04:00
Ray Smith
0e95e2ca87 Rewrote the recoder to use an encoding based on wubi instead of radical-stroke index, changed from normalized to unnormalized unichar representation 2017-07-25 09:40:44 -07:00
Ray Smith
b0ead95d64 Changed the way unicharsets are handled to allow support for the ™ character. Can find the issue where it was requested. 2017-07-24 11:45:57 -07:00
Stefan Weil
99755b0732 googletest: Add dummy test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-24 19:45:06 +02:00
Stefan Weil
796cd7ab56 cmake: Add googletest
The submodule is build automatically as soon as it exists.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-24 19:45:06 +02:00
Stefan Weil
f36dc34c4f Add googletest submodule
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-24 19:45:06 +02:00
Ray Smith
4efc539f51 clang tidy on previous pull 2017-07-19 17:04:49 -07:00
Ray Smith
4e8018d013 Important fix to RTL languages saves last space on each line, which was previously lost 2017-07-19 17:04:06 -07:00
Ray Smith
3f7735492f Removed unnecessary using statements and cleaned up google/non-google distinction 2017-07-19 16:42:48 -07:00
Ray Smith
cec1037260 Fixed BestPix to always return the highest resolution available, even if a lower bit depth than the original 2017-07-19 16:28:26 -07:00
Egor Pugin
66e686a0e6 Merge pull request #1041 from stweil/leptonica
Use lept_free to free memory allocated by Leptonica
2017-07-16 18:04:54 +03:00
Egor Pugin
900bf6076f Merge pull request #1040 from stweil/clean
Delete unused code in PangoFontInfo
2017-07-16 14:21:08 +03:00
rays
45fb7dde49 Fixed regression of issue #644 again! 2017-07-15 23:36:58 -07:00
Stefan Weil
ba95a686aa Use lept_free to free memory allocated by Leptonica
This fixes problems on Windows when Tesseract and Leptonica use different
C runtime libraries.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 08:34:18 +02:00
Stefan Weil
5a7b7ed7e1 PangoFontInfo: Remove unused method is_italic
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:22:05 +02:00
Stefan Weil
0cd71c67c9 PangoFontInfo: Remove unused method is_bold
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:21:59 +02:00
Stefan Weil
fbfbf67cf9 PangoFontInfo: Remove unused method is_smallcaps
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:21:49 +02:00
Stefan Weil
500f913b51 PangoFontInfo: Remove unused method is_monospace
Remove also some macros which are no longer needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:21:35 +02:00
Stefan Weil
059e30d4cb PangoFontInfo: Remove unused method is_fraktur
That restores commit 25e0c1accb and
partially revert commit 4907a23fea
which added the now unused Shlwapi library.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 07:16:38 +02:00
rays
f4f66f8fa9 Fixed regression of issue #644 2017-07-15 17:21:47 -07:00
zdenop
4b6f0b9538 Merge pull request #1039 from stweil/clean
Fix compiler warnings
2017-07-15 20:18:50 +02:00
Egor Pugin
4907a23fea Fix windows build. 2017-07-15 15:09:00 +03:00
Stefan Weil
9929587f36 Remove extra semicolons
This fixes these compiler warnings:

    ccmain/equationdetect.cpp:1519:2: warning: extra ‘;’ [-Wpedantic]
    ccstruct/blobs.cpp:65:17: warning: extra ‘;’ [-Wpedantic]
    ccstruct/blobs.h:178:18: warning: extra ‘;’ [-Wpedantic]
    ccstruct/ratngs.cpp:36:22: warning: extra ‘;’ [-Wpedantic]
    ccstruct/ratngs.cpp:37:22: warning: extra ‘;’ [-Wpedantic]
    ccutil/ambigs.cpp:46:20: warning: extra ‘;’ [-Wpedantic]
    ccutil/ambigs.h:137:21: warning: extra ‘;’ [-Wpedantic]
    cutil/structures.cpp:36:45: warning: extra ‘;’ [-Wpedantic]
    textord/equationdetectbase.cpp:65:2: warning: extra ‘;’ [-Wpedantic]
    textord/equationdetectbase.h:57:2: warning: extra ‘;’ [-Wpedantic]
    wordrec/lm_state.cpp:25:28: warning: extra ‘;’ [-Wpedantic]
    wordrec/lm_state.h:190:29: warning: extra ‘;’ [-Wpedantic]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-15 12:40:34 +02:00
Stefan Weil
fa9e43fdde Fix wrong data type in argument for sscanf
Compiler warning:

    ccutil/unicharcompress.cpp:76:76: warning: format ‘%x’ expects argument of type ‘unsigned int*’, but argument 3 has type ‘int*’ [-Wformat=]
    ccutil/unicharcompress.cpp:80:31: warning: format ‘%x’ expects argument of type ‘unsigned int*’, but argument 3 has type ‘int*’ [-Wformat=]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-15 09:30:31 +02:00
Stefan Weil
f527715933 Fix type of bit values (fixes compiler warning)
A single bit cannot represent a signed integer value, so it must be an
unsigned integer. This fixes a compiler warning:

    In file included from ../ccutil/clst.h:24:0,
                     from ../ccstruct/blobbox.h:23,
                     from workingpartset.h:24,
                     from workingpartset.cpp:21:
    ../ccstruct/blobbox.h: In member function ‘void BLOBNBOX::set_reduced_box(TBOX)’:
    ../ccutil/host.h:79:25: warning: overflow in conversion from ‘int’ to ‘signed char:1’ changes value from ‘1’ to ‘-1’ [-Woverflow]
     #define TRUE            1
                             ^
    ../ccstruct/blobbox.h:236:17: note: in expansion of macro ‘TRUE’
           reduced = TRUE;
                     ^~~~

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-15 09:30:31 +02:00
Ray Smith
dc8745e6fd Move LSTM unicharset and recoder to traineddata with version string part1. Backwards compatible - maybe. 2017-07-14 11:14:23 -07:00