Commit Graph

620 Commits

Author SHA1 Message Date
Stefan Weil
ddea230b1b Don't compute function tables at compile time with clang
The current code fails to compile with clang compilers on Linux and macOS.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-17 08:38:42 +01:00
zdenop
15f2a4b2c1
Merge pull request #2231 from Shreeshrii/wordstr
Add renderer to create WordStr box files from images
2019-02-16 13:48:06 +01:00
Stefan Weil
862322c18c Fix check for images which are too small to scale
Images with width == min_width are not too small.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-15 13:53:11 +01:00
Stefan Weil
c0523ee5a2 Fix compiler warning
g++ warning:

    src/lstm/functions.h:152:35: warning:
        unused parameter ‘x’ [-Wunused-parameter]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-14 10:29:39 +01:00
Stefan Weil
3556152412 Compute function tables at compile time
This requires C++ 14. Older compilers still use the old code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-14 10:29:39 +01:00
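
The compile-time approach named in this commit can be illustrated with a C++14 `constexpr` function that fills a table in a loop. This is only a sketch, not the project's code: `Table`, `MakeTable`, `kSize` and `SoftSign` are hypothetical names, and `SoftSign` stands in for the real math because `std::tanh` is not `constexpr` in standard C++14.

```cpp
#include <cstddef>

// Hypothetical table size, chosen for illustration only.
constexpr std::size_t kSize = 16;

// A plain aggregate is used because std::array's non-const operator[]
// is not constexpr before C++17.
struct Table {
  double values[kSize];
};

// Stand-in function (assumption): x / (1 + |x|), written without <cmath>
// so that it can be evaluated in a constant expression.
constexpr double SoftSign(double x) {
  return x / (1.0 + (x < 0.0 ? -x : x));
}

// C++14 relaxed constexpr allows loops and local mutation.
constexpr Table MakeTable() {
  Table t{};
  for (std::size_t i = 0; i < kSize; ++i) {
    t.values[i] = SoftSign(static_cast<double>(i) / 4.0);
  }
  return t;
}

// The whole table is computed by the compiler.
constexpr Table kTable = MakeTable();
static_assert(kTable.values[0] == 0.0, "SoftSign(0) is 0");
static_assert(kTable.values[4] == 0.5, "SoftSign(1) is 1 / (1 + 1)");
```

The `static_assert` lines only compile if the table really is a constant expression, which makes them a cheap check that nothing is left for program start.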
Stefan Weil
f491eb6188 Simplify tanh and logistic functions and precompute function tables
Both functions are called very often, so computing the table values
at program start should be faster than computing them on demand.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-12 12:04:08 +01:00
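
The idea of filling a function table once at program start and then only doing lookups might look roughly like the following. It is a minimal sketch under stated assumptions: the names `kTableSize`, `kScale`, `BuildTanhTable` and `FastTanh`, the table resolution, and the use of linear interpolation are all illustrative choices, not taken from the commit.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical table parameters, chosen for illustration only.
constexpr std::size_t kTableSize = 4096;
constexpr double kScale = 256.0;  // table entries per unit of x

// Computes tanh for every table slot; runs once during static initialization.
inline std::array<double, kTableSize> BuildTanhTable() {
  std::array<double, kTableSize> table{};
  for (std::size_t i = 0; i < kTableSize; ++i) {
    table[i] = std::tanh(static_cast<double>(i) / kScale);
  }
  return table;
}

static const std::array<double, kTableSize> kTanhTable = BuildTanhTable();

// Table lookup with linear interpolation between neighbouring entries.
// tanh is an odd function, so only x >= 0 is stored.
inline double FastTanh(double x) {
  if (std::isnan(x)) return x;
  if (x < 0.0) return -FastTanh(-x);
  const double scaled = x * kScale;
  // Beyond the table range tanh is 1 to within double precision.
  if (scaled >= static_cast<double>(kTableSize - 1)) return 1.0;
  const std::size_t index = static_cast<std::size_t>(scaled);
  const double offset = scaled - static_cast<double>(index);
  return kTanhTable[index] +
         offset * (kTanhTable[index + 1] - kTanhTable[index]);
}
```

With this resolution the interpolated value should differ from `std::tanh` by far less than `1e-4`; a coarser table trades accuracy for memory.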
Shree Devi Kumar
f3362a4b5b Add renderer to create WordStr box files from images 2019-02-10 19:59:17 +00:00
zdenop
2ae65b2493
Merge pull request #2216 from Shreeshrii/lstmbox
Lstmbox
2019-02-10 13:53:41 +01:00
Shree Devi Kumar
311053681c put common code in AddBoxToLSTM 2019-02-10 09:16:45 +00:00
zdenop
e51f1885e6
Merge pull request #2229 from stweil/warn
Fix some compiler warnings
2019-02-10 08:20:23 +01:00
Shree Devi Kumar
b51c1bf05a change to const char* as suggested by @stweil 2019-02-10 05:13:18 +00:00
Stefan Weil
0c9f7db536 Fix compiler warning (-Wimplicit-fallthrough)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-09 16:53:44 +01:00
Stefan Weil
d91c316ab1 FontInfo: Make sure that deleted member variables can no longer be used
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-09 16:32:20 +01:00
Stefan Weil
877e62db55 Fix compiler warning (-Wmaybe-uninitialized)
gcc warning:

    src/lstm/recodebeam.cpp:270:41: warning: ‘current_char’ may be used uninitialized in this function [-Wmaybe-uninitialized]

It's a false positive, but setting the variable to 0 satisfies the compiler.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-09 16:32:20 +01:00
Stefan Weil
33f6dc2a67 Fix compiler warnings (-Wformat-truncation=)
gcc warnings:

    src/viewer/scrollview.cpp:404:31: warning: ‘%s’ directive output may be
        truncated writing up to 4095 bytes into a region of size between 4084 and 4093 [-Wformat-truncation=]
    src/viewer/scrollview.cpp:572:31: warning: ‘%s’ directive output may be
        truncated writing up to 4095 bytes into a region of size between 4084 and 4093 [-Wformat-truncation=]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-09 16:32:20 +01:00
Stefan Weil
2a355ea103 Fix compiler warnings (-Wimplicit-fallthrough)
gcc warnings:

    src/ccmain/docqual.cpp:734:26: warning: this statement may fall through [-Wimplicit-fallthrough=]
    src/ccmain/docqual.cpp:764:26: warning: this statement may fall through [-Wimplicit-fallthrough=]
    src/ccmain/docqual.cpp:782:26: warning: this statement may fall through [-Wimplicit-fallthrough=]
    [...]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-09 16:32:20 +01:00
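
For readers unfamiliar with `-Wimplicit-fallthrough`: gcc warns when one `case` runs into the next without an explicit marker. The sketch below shows one way to state the intent, using the C++17 `[[fallthrough]]` attribute. The function `Score` is hypothetical, and the commit itself may have used a different remedy, such as adding `break` statements or a fallthrough comment.

```cpp
// Hypothetical example: kind 2 receives its own bonus plus the kind 1 bonus.
inline int Score(int kind) {
  int score = 0;
  switch (kind) {
    case 2:
      score += 10;
      [[fallthrough]];  // intentional: continue with the code for case 1
    case 1:
      score += 1;
      break;
    default:
      break;
  }
  return score;
}
```

If falling through was not intended, the fix is a `break` instead of the attribute.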
Stefan Weil
aa2dcca295 Fix compiler warnings (-Wstringop-truncation)
gcc warnings:

    src/api/tesseractmain.cpp:252:14: warning:
        ‘char* strncpy(char*, const char*, size_t)’ specified bound 255
        equals destination size [-Wstringop-truncation]
    src/ccutil/unicharset.h:66:12: warning:
        ‘char* strncpy(char*, const char*, size_t)’ output may be truncated copying 30 bytes from a string of length 30 [-Wstringop-truncation]
    src/ccutil/unicharset.cpp:806:12: warning:
        ‘char* strncpy(char*, const char*, size_t)’ specified bound 64 equals destination size [-Wstringop-truncation]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-09 16:32:09 +01:00
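
The `-Wstringop-truncation` warnings above concern `strncpy` calls whose bound equals the destination size, in which case the result is not NUL-terminated when the source is at least that long. A minimal sketch of a copy that always terminates follows; `CopyTerminated` is a hypothetical helper, not the change made in the commit.

```cpp
#include <cstddef>
#include <cstring>

// Copies src into dst (capacity dst_size bytes) and always NUL-terminates.
// Returns true if the whole string fit, false if it was truncated.
inline bool CopyTerminated(char* dst, std::size_t dst_size, const char* src) {
  if (dst_size == 0) return false;
  const std::size_t len = std::strlen(src);
  // Reserve one byte for the terminator.
  const std::size_t n = len < dst_size - 1 ? len : dst_size - 1;
  std::memcpy(dst, src, n);
  dst[n] = '\0';
  return n == len;
}
```

Returning a flag lets the caller distinguish a complete copy from a truncated one instead of silently losing data.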
Stefan Weil
d42413dd17 OpenCL: Remove PERF_COUNT framework
It was rarely used, but added a lot of code and an unconditional
dependency on openclwrapper.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-09 10:58:15 +01:00
Shree Devi Kumar
0f42fd8c69 change to use bbox coordinates for TEXTLINE for all characters
(cherry picked from commit 049db108b2d6cd3a7f52e480212320613117d50b)
2019-02-05 14:03:29 +00:00
Shree Devi Kumar
9c89cd51cf Add a new renderer to create box files from images for LSTM training
(cherry picked from commit 921da6be2bdbda2ddd64514f9b6bec40a336246a)

fix typo

(cherry picked from commit 7bd1a0c80393fce2f34e2845cb26760bcf3791cd)

Add lstmboxrenderer to CMakeLists

(cherry picked from commit cfef3a889aef830725921b5c0218d5e9c633b03e)

fix formatting

(cherry picked from commit 7ba2b01ede7940ed609a073364948ef8c838cd10)
2019-02-05 14:03:29 +00:00
Shreeshrii
c28a68115e
Merge branch 'master' into boxtiff 2019-02-02 23:42:39 +05:30
Shree Devi Kumar
d9590f8adf allow user specified box/tiff pairs with tesstrain.sh 2019-02-02 11:35:45 +00:00
Shree Devi Kumar
323361b902 allow user specified box/tiff pairs with tesstrain.sh 2019-02-02 11:33:32 +00:00
Shree Devi Kumar
ad223296af use --xsize instead of --x_size
(cherry picked from commit 94b8988b8cca3812137933db00750bd6e2e84e32)
2019-02-02 11:08:34 +00:00
Mikhail Akopov
7be04342cf Fix typo 2019-02-01 09:58:44 +01:00
Stefan Weil
b49806766e Fix AVX2 support for Windows builds with MSC
It was never detected, so the existing code for AVX2
was compiled but never used.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-30 11:40:17 +01:00
Shree Devi Kumar
4d9bc11fd3 add --xsize as parameter for tesstrain 2019-01-27 07:00:25 +00:00
zdenop
12c1abcb6b
Merge pull request #2189 from stweil/fix
Fix memory leak for PNG images
2019-01-24 07:59:55 +01:00
zdenop
059c50be8c
Merge pull request #2184 from stweil/tests
Fix and enable stringrenderer_test
2019-01-24 07:59:07 +01:00
Stefan Weil
9e6e3a0232 Fix memory leak for PNG images
Commit 5fe1390748 used an implementation
which created a new Pix object. That object was never destroyed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-23 20:05:10 +01:00
Diego de la Hera
1a398a5b5d removed reference to unbound variable 2019-01-23 15:04:16 -03:00
Stefan Weil
ecf73f5bc7 training: Don't terminate after processing 8 fonts or 8 images
tesstrain_utils.sh sets the shell flag -e, so it exits immediately
if a command exits with a non-zero status.

The following command returns a non-zero status as soon as counter is a
multiple of par_factor (par_factor=8, that means as soon as 8 fonts or
images are processed):

    let rem=counter%par_factor

The new code fixes this undesired exit.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-23 17:26:40 +01:00
Stefan Weil
32e9d7c8f5 training: Fix some compiler warnings (signed/unsigned)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-23 13:55:13 +01:00
Stefan Weil
e4b862d588 pango_font_info: Fix runtime error messages from Pango
pango_coverage_get and pango_coverage_unref should not be called
with coverage == nullptr.

pango_font_get_coverage should not be called with font == nullptr.

Otherwise Pango prints runtime error messages:

    (process:12657): Pango-CRITICAL **: pango_coverage_get: assertion 'coverage != NULL' failed
    (process:12657): Pango-CRITICAL **: pango_coverage_unref: assertion 'coverage != NULL' failed
    (process:12657): Pango-CRITICAL **: pango_font_get_coverage: assertion 'font != NULL' failed
    (process:12657): GLib-GObject-CRITICAL **: g_object_unref: assertion 'G_IS_OBJECT (object)' failed

Typically those errors occur if a required font is not installed,
so this can be a quite common error.

Fix also a potential resource leak in PangoFontInfo::CoversUTF8Text.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-23 13:55:13 +01:00
Shree Devi Kumar
77d0b6ce8e fix WORDLIST filename 2019-01-22 15:49:55 +01:00
Stefan Weil
564482db30 Fix selection of IntSimdMatrix method
Commit d36231e3e4 did not distinguish
between AVX and AVX2, so AVX2 code was enabled for IntSimdMatrix
even when only AVX was supported.

This resulted in an illegal instruction.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-20 22:13:04 +01:00
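
The distinction matters because AVX provides 256-bit floating-point instructions, while 256-bit integer instructions need AVX2. A selection routine therefore has to test the AVX2 flag itself. The sketch below is illustrative only: `SimdMethod`, `CpuFeatures` and `SelectIntMethod` are hypothetical names, and the feature flags are passed in rather than detected from the CPU.

```cpp
// Hypothetical set of integer matrix kernels.
enum class SimdMethod { kGeneric, kSSE, kAVX2 };

// Hypothetical feature flags, as a CPU detection step might report them.
struct CpuFeatures {
  bool sse4_1;
  bool avx;
  bool avx2;
};

// Picks the best integer kernel the CPU can actually execute.
// The avx flag is deliberately not used here: AVX alone is not enough
// for AVX2 integer code, so such a CPU falls back to the SSE kernel.
inline SimdMethod SelectIntMethod(const CpuFeatures& cpu) {
  if (cpu.avx2) return SimdMethod::kAVX2;
  if (cpu.sse4_1) return SimdMethod::kSSE;
  return SimdMethod::kGeneric;
}
```

Treating "AVX or AVX2" as one condition is the kind of check that would run AVX2 code on an AVX-only CPU and end in an illegal instruction.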
Stefan Weil
66e31bfd8c OpenCL: Fix alloc-dealloc mismatch
Bug message from AddressSanitizer:

    ==7153==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs free) on 0x602000072cb0
        #0 0x7ffff70c6a10 in free (/usr/lib/x86_64-linux-gnu/libasan.so.3+0xc1a10)
        #1 0x555557188638 in writeProfileToFile ../../../../../src/opencl/openclwrapper.cpp:541

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-19 08:06:26 +01:00
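
The AddressSanitizer report describes memory obtained with `operator new[]` being released with `free`. Memory from `new[]` must be released with `delete[]`. One way to make the pairing automatic is `std::unique_ptr<char[]>`, sketched below with a hypothetical helper `MakeBuffer`; the commit may simply have changed the deallocation call instead.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Allocates a copy of text with new[]; the unique_ptr<char[]> releases it
// with delete[] when it goes out of scope, so free() is never involved.
inline std::unique_ptr<char[]> MakeBuffer(const char* text) {
  const std::size_t size = std::strlen(text) + 1;  // include the NUL
  std::unique_ptr<char[]> buffer(new char[size]);
  std::memcpy(buffer.get(), text, size);
  return buffer;
}
```

The same rule applies in the other direction: memory from `malloc` goes to `free`, never to `delete` or `delete[]`.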
Stefan Weil
ad19183b92 OpenCL: Fix heap buffer overflow
Bug message from AddressSanitizer:

    ==6158==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fffe774b7fc at pc 0x555557086b54 bp 0x7fffffffcee0 sp 0x7fffffffced8
    READ of size 1 at 0x7fffe774b7fc thread T0
        #0 0x555557086b53 in tesseract::HistogramRect(Pix*, int, int, int, int, int, int*) ../../../../../src/ccstruct/otsuthr.cpp:163

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-19 07:58:16 +01:00
Stefan Weil
502bb624c2 More optimisations for IntSimdMatrix
* Move IntDotProductSSE. That allows inlining of the code.
* Improve IntDotProductSSE by moving some instructions.
* Remove unused num_input_groups_ from IntSimdMatrix.
* Re-order elements in IntSimdMatrix to avoid padding.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-14 21:34:37 +01:00
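
The last bullet, re-ordering elements to avoid padding, relies on how compilers align struct members. A small sketch with hypothetical structs (not the actual `IntSimdMatrix` layout) shows the effect: placing the widest member first lets the small members share one aligned slot.

```cpp
#include <cstdint>

// Hypothetical layout: each 1-byte member is followed by padding so that
// the 4-byte member stays aligned.
struct Padded {
  std::int8_t a;
  std::int32_t b;
  std::int8_t c;
};

// Same members, widest first: the two 1-byte members sit next to each other.
struct Reordered {
  std::int32_t b;
  std::int8_t a;
  std::int8_t c;
};

// Exact sizes are implementation-defined; on common ABIs they are
// 12 and 8 bytes. The re-ordered struct is never expected to be larger.
static_assert(sizeof(Reordered) <= sizeof(Padded),
              "re-ordering should not increase the size");
```

The saving per object is small, but it adds up when many instances are stored or copied.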
Stefan Weil
95606398f5 Clean code for IntSimdMatrix
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-14 21:34:37 +01:00
Stefan Weil
7fc7d28dd0 Compile files for AVX, AVX2 or SSE only when needed
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-14 21:34:37 +01:00
Stefan Weil
a9a1035e55 Move IntSimdMatrixNative from IntSimdMatrix to unittest
It is only used for the unit test.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-14 21:34:37 +01:00
Stefan Weil
d36231e3e4 Set best or user selected IntSimdMatrix
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-14 21:34:37 +01:00
Stefan Weil
605b4d66c7 Replace dynamically allocated IntSimdMatrix instances by constants
Two header files are no longer needed and could be removed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-14 21:34:37 +01:00
Stefan Weil
26be7c5d2e Use constructor with parameters for IntSimdMatrix
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-14 21:34:37 +01:00
Stefan Weil
e237a38405 Add const attributes to IntSimdMatrix multiplier
IntSimdMatrix no longer contains variable members.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-14 21:34:37 +01:00
Stefan Weil
7c70147701 Move shaped weights from IntSimdMatrix to WeightMatrix
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-14 21:34:37 +01:00
Stefan Weil
ea4d0d354b Format comment
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-14 21:34:37 +01:00
Stefan Weil
c79d613b65 Replace ASSERT_HOST by assert
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-14 21:34:37 +01:00
zdenop
f75b2c1948
Merge pull request #310 from nickjwhite/hocrcharboxes
Character boxes in hOCR output
2019-01-14 19:19:04 +01:00