Commit Graph

5372 Commits

Author SHA1 Message Date
zdenop
179c8b1295
Merge pull request #2617 from juliangilbey/fix-training-data-creation
fix #2616: allow building of training data
2019-09-12 14:36:45 +02:00
zdenop
598a37d717
Merge pull request #2645 from stweil/master
Modernize and optimize BLOBNBOX and remove BLOBNBOX::ConstructionInit
2019-09-12 14:28:26 +02:00
Stefan Weil
913cbe6eae Modernize and optimize BLOBNBOX and remove BLOBNBOX::ConstructionInit
The class no longer uses bit fields. Re-ordering the member variables
avoids holes and reduces the size of BLOBNBOX from 168 to 152 bytes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-12 09:07:48 +02:00
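The size reduction described in commit 913cbe6eae comes from member ordering: when small and large members alternate, the compiler usually inserts padding to satisfy alignment. A minimal sketch of the effect, using two hypothetical structs (`Interleaved` and `Reordered` are illustrative names and do not reflect the real BLOBNBOX members); exact sizes depend on the platform ABI:

```cpp
#include <cstddef>

// Hypothetical layouts for illustration only; these are NOT the real
// BLOBNBOX members.
struct Interleaved {
  bool flag_a;    // usually followed by padding so that `width` is aligned
  double width;
  bool flag_b;    // usually followed by padding again
  double height;
  bool flag_c;    // usually followed by tail padding
};

struct Reordered {
  double width;   // members with the largest alignment come first
  double height;
  bool flag_a;    // the small members now sit next to each other
  bool flag_b;
  bool flag_c;
};

// Number of bytes saved by re-ordering on the current platform.
constexpr std::size_t SavedBytes() {
  return sizeof(Interleaved) - sizeof(Reordered);
}
```

Comparing `sizeof(Interleaved)` with `sizeof(Reordered)` on the target platform shows how many holes the re-ordering removed.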
Stefan Weil
a922745d9a tfnetwork: Fix info text
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-11 19:10:25 +02:00
Stefan Weil
e903eaea59 Re-order commands in autogen.sh
This avoids an unnecessary reconfiguration when running

    ./autogen.sh && ./configure && make

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-11 15:50:23 +02:00
zdenop
4ed587a57d
Merge pull request #2643 from stweil/UndefinedBehaviorSanitizer
Fix several runtime errors detected by UndefinedBehaviorSanitizer
2019-09-10 18:18:48 +02:00
Stefan Weil
5fa09f184f RecodedCharIDHash: Fix runtime errors detected by UndefinedBehaviorSanitizer
Fix this runtime error in recodebeam_test and unicharcompress_test:

    src/ccutil/unicharcompress.h:84:27: runtime error:
      left shift of 267 by 28 places cannot be represented in type 'int'

code has up to kMaxCodeLen (9) values, so the highest possible value for
i is 8, and the shift value can reach 7 * 8 = 56.

That requires a uint64_t data type.
size_t would fit for 64-bit hosts, but would be too small for 32-bit hosts.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-10 15:56:32 +02:00
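The overflow described in commit 5fa09f184f can be sketched as follows. This is an illustration under stated assumptions, not the Tesseract code: `CombineCodes` is a hypothetical helper that shifts each value by 7 * i bits, and the point is only that the operand is widened to `uint64_t` before the shift.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the Tesseract function: combines up to nine
// small non-negative values by shifting value i to the left by 7 * i bits.
uint64_t CombineCodes(const std::vector<int>& codes) {
  uint64_t result = 0;
  for (std::size_t i = 0; i < codes.size() && i < 9; ++i) {
    // Widen before shifting. `codes[i] << (7 * i)` would be evaluated in
    // type int and is undefined once the result no longer fits.
    result ^= static_cast<uint64_t>(codes[i]) << (7 * i);
  }
  return result;
}
```

With the value 267 at index 4, the shift is by 28 places, which is the case reported by the sanitizer; in 64-bit arithmetic the result is representable.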
Stefan Weil
4a2d5a2e8d OSResults: Fix runtime errors detected by UndefinedBehaviorSanitizer
Fix this runtime error in osd_test and textlineprojection_test:

    src/ccmain/osdetect.cpp:109:14: runtime error: division by zero

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-10 15:56:32 +02:00
Stefan Weil
5c6fade555 BitVector: Fix runtime errors detected by UndefinedBehaviorSanitizer
Fix these runtime errors in mastertrainer_test:

    src/ccutil/bitvector.cpp:119:18: runtime error:
      null pointer passed as argument 2, which is declared to never be null
    src/ccutil/bitvector.cpp:124:10: runtime error:
      null pointer passed as argument 1, which is declared to never be null

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-10 15:56:32 +02:00
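Messages like the ones in commit 5c6fade555 typically come from library functions such as `memcpy` or `memset`, whose pointer parameters are declared non-null; the log above does not say which calls were involved. A minimal sketch of one common guard, assuming the null pointers belong to empty buffers (`CopyWords` is a hypothetical helper, not part of Tesseract):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies `count` 32-bit words and tolerates empty buffers.
void CopyWords(uint32_t* dest, const uint32_t* src, std::size_t count) {
  // UndefinedBehaviorSanitizer reports a null pointer passed to memcpy even
  // when the byte count is zero, so skip the call for empty input.
  if (count == 0 || dest == nullptr || src == nullptr) return;
  std::memcpy(dest, src, count * sizeof(*src));
}
```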
zdenop
98c7aaa343
Lstm choice ril (#2635)
Lstm choice ril
2019-09-06 19:12:00 +02:00
Stefan Weil
9f32032517 ccutil: Remove old comments
There is no CLIST2 in the current code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-05 17:52:42 +02:00
Stefan Weil
b6933a1082 Use type bool for boolean values in class BLOBNBOX
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-03 19:56:59 +02:00
Noah Metzger
c350077b96 Made the lstm_choice mode compatible with the hocr_char_boxes mode
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-09-02 11:09:54 +02:00
Noah Metzger
e8b9c10d07 Clean up lstm_choice_mode and cut it down to 2 modes instead of 4
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-09-02 11:09:53 +02:00
Stefan Weil
fdf4067296 Fix warnings from LGTM
This fixes three LGTM warnings:

    Multiplication result may overflow 'float' before it is converted to 'double'.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-30 22:04:24 +02:00
Stefan Weil
4a434809b0 fuzzer-api: Use optional macro LIB_FUZZING_ENGINE for build
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-30 15:32:33 +02:00
Stefan Weil
c460d19316 Add missing TensorFlow libraries for fuzzer-api
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-30 14:54:04 +02:00
Stefan Weil
dc90741f1b Fix crash when function lookup tables are accessed with NaN
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-30 13:42:09 +02:00
zdenop
d889a38f80
Merge pull request #2627 from stweil/master
capi: Add missing PSM_RAW_LINE to TessPageSegMode
2019-08-25 15:36:43 +02:00
Stefan Weil
7968f50fe6 capi: Add missing PSM_RAW_LINE to TessPageSegMode
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-25 09:08:09 +02:00
zdenop
0ded672067 fix typo 2019-08-18 18:47:32 +02:00
Egor Pugin
0a3a351cb3
Merge pull request #2620 from stweil/simd
simd: Check whether the OS supports FMA, AVX, ...
2019-08-17 08:31:54 +03:00
Stefan Weil
00cff79f7f simd: Check whether the OS supports FMA, AVX, ...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-16 22:51:17 +02:00
Stefan Weil
43b2e9513b lstmtrainer: Fix diagnostic message
Signed character values must be converted to unsigned integers for %x.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-15 14:31:32 +02:00
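The issue behind commit 43b2e9513b can be illustrated with a small sketch. `HexByte` is a hypothetical helper, not the lstmtrainer code; it shows why a character is converted to an unsigned type before it is printed with `%x`.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats one byte as two hexadecimal digits.
std::string HexByte(char c) {
  char buf[8];
  // Where char is signed, a byte >= 0x80 is negative and would be
  // sign-extended when promoted to int. Going through unsigned char
  // keeps only the low 8 bits.
  std::snprintf(buf, sizeof(buf), "%02x",
                static_cast<unsigned>(static_cast<unsigned char>(c)));
  return buf;
}
```

For example, `HexByte('\xe4')` yields `e4` regardless of whether plain `char` is signed on the platform.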
Stefan Weil
100d8cd29b lstmtester: Add missing space in log messages
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-14 14:12:47 +02:00
Stefan Weil
a86251c62b classify/Makefile: Fix inconsistent style
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-13 21:35:59 +02:00
Julian Gilbey
5a1978a4fc
fix #2616: allow building of training data
This fixes Issue #2616 by preventing an attempt to build the recognition engine when running tesstrain.sh.
2019-08-13 19:05:49 +01:00
Egor Pugin
423a188513 Export some classify vars. 2019-08-13 20:12:21 +03:00
Stefan Weil
46e2a0f106 Remove more code for builds with disabled legacy engine
Now the Tesseract library no longer includes unused code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-13 17:49:10 +02:00
Stefan Weil
f43ca88f29 [sw] Update build for commit e84cb24def
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-12 19:36:41 +02:00
Egor Pugin
f71e58c557 [sw] Try to fix build. 2019-08-12 19:50:22 +03:00
Egor Pugin
73f713519c
Merge pull request #2614 from stweil/training
Move source files which are used for training only to src/training
2019-08-12 19:35:50 +03:00
Egor Pugin
23afe833f9
Merge pull request #2613 from stweil/unused
Remove unused code
2019-08-12 19:33:52 +03:00
Stefan Weil
e84cb24def Move source files which are used for training only to src/training
They are moved from src/classify and src/lstm to src/training.

This reduces the size of the Tesseract library.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-12 17:08:08 +02:00
Stefan Weil
ba17bc8204 OpenCL: Add static attribute for kernel_src
It is only used in openclwrapper.cpp.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-12 15:13:45 +02:00
Stefan Weil
970622fbd1 Remove unused functions create_edges_window, draw_raw_edge
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-12 15:04:10 +02:00
Stefan Weil
23e605911f Remove unused function truncate_path and related files
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-12 14:48:56 +02:00
Stefan Weil
bce585286d Remove global array kPolyBlockNames from Tesseract library
It is only used in unittest/layout_test.cc after moving a test from
baseapi_test.cc to that file, so it can be made local.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-12 14:33:55 +02:00
Egor Pugin
c757b4ec19
Merge pull request #2612 from stweil/unicharset
Remove UNICHARSET::load_from_inmemory_file and related code
2019-08-12 14:50:28 +03:00
Stefan Weil
beec85e023 Remove UNICHARSET::load_from_inmemory_file and related code
The method was only used in unittest where it can be replaced by
UNICHARSET::load_from_file which also simplifies the code.

This allows removing the class InMemoryFilePointer and fixes a TODO.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-12 13:07:15 +02:00
Egor Pugin
ae020e7fbd [sw] Update build script. 2019-08-10 15:46:59 +03:00
Stefan Weil
315dd9df3f cmake: Don't link pthread on Windows
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-07 15:24:00 +02:00
Stefan Weil
ab953c1d51 unittest: Fix build and simplify build rules
Now more tests (those which use fileio) depend on the training build.
This is required since commit c5a50b93ce.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-07 13:58:12 +02:00
Stefan Weil
9786b7276e Fix linker error in Appveyor CI
This completes commit c5a50b93ce.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-07 10:24:57 +02:00
Stefan Weil
b8079d8ce1 universalambigs: Add hack to fix builds with Microsoft compiler
The MS compiler only accepts string constants up to 65535 characters,
so shorten the string for that compiler to fix the compilation.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-06 15:46:07 +02:00
Egor Pugin
cb99fe9b41 [sw] Use the latest pango again. 2019-08-06 15:04:32 +03:00
Zdenko Podobný
c5a50b93ce move fileio.cpp and fileio.h to training (this fixes the Android build) 2019-08-04 21:26:39 +02:00
zdenop
f1eb172cb6
Merge pull request #2602 from stweil/ambigs
Clean ambigs.h and replace octal characters by UTF-8 string in universalambigs
2019-08-04 20:08:03 +02:00
Stefan Weil
6acab45837 universalambigs: Replace octal characters by UTF-8 string
This improves readability and reduces the file size.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-04 19:21:59 +02:00
Stefan Weil
8127b4dd27 Clean ambigs.h
* Remove unused kUnigramAmbigsBufferSize and kAmbigNgramSeparator
* Move some declarations to ambigs.cpp

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-04 19:21:59 +02:00