Commit Graph

1275 Commits

Author SHA1 Message Date
Egor Pugin
c3e04abe1e Inherit STRING from std::string. 2020-12-26 03:48:35 +03:00
Egor Pugin
4fc467a922 Inherit GenericVector from std::vector. Inherit kdpairs from std::pair. Rewrite some move ctors to modern C++ style. 2020-12-26 03:23:09 +03:00
Egor Pugin
04d3cfcf2f Merge branch 'master' of github.com-egorpugin:tesseract-ocr/tesseract 2020-12-26 00:55:37 +03:00
Egor Pugin
79a86f2582 Move all tesseract symbols into tesseract namespace. Fix include order in many places. 2020-12-26 00:55:30 +03:00
zdenop
ceadc4ddb8 remove inline declaration 2020-12-25 16:28:00 +01:00
Egor Pugin
14d52a79ba Remove .rc files. No need to add them into dll/exe. 2020-12-25 18:06:35 +03:00
zdenop
044921267f embed pdf.ttf to tesseract library #2551 2020-12-25 13:20:36 +01:00
Stefan Weil
cc133aa394 Fix text for fonts_dir parameter
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 21:32:05 +01:00
Stefan Weil
34abba8698 Add terminating linefeed to fonts.conf
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 21:32:05 +01:00
Stefan Weil
17a64eef1e Simplify code for PangoFontInfo::HardInitFontConfig
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 21:32:05 +01:00
Stefan Weil
707ee70966 Use deprecated pango_fc_font_get_glyph for old Pango versions
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 12:02:37 +01:00
Stefan Weil
f759142c95 Remove buggy Windows implementation for getting glyph from font
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 09:07:09 +01:00
Stefan Weil
7669d36a37 Use HarfBuzz instead of deprecated pango_fc_font_get_glyph
This fixes the crash on MacOS with M1.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 09:03:05 +01:00
Stefan Weil
8c859a7329 Fix type cast from PangoFont to PangoFcFont
The original code crashes in pango_fc_font_get_glyph on MacOS with M1.

Replacing the type cast with the macro made for that conversion
gives at least an error message before crashing:

    (process:12546): GLib-GObject-WARNING **: 08:38:02.472: invalid cast from 'PangoCairoCoreTextFont' to 'PangoFcFont'
    zsh: segmentation fault  ./pango_font_info_test

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 08:45:11 +01:00
Stefan Weil
3efedabda3 automake: Flat build for src/training
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-19 15:25:21 +01:00
Stefan Weil
6fcf8d23bc Use more compiler and linker flags from pkg-config
This fixes some build issues with Homebrew on MacOS.

Signed-off-by: Stefan Weil <stefan@Sabines-Mac-mini.fritz.box>
2020-12-13 13:24:46 +01:00
Stefan Weil
490bd3ec8f Fix build with enabled TensorFlow
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-04 10:56:23 +01:00
Stefan Weil
ac116d1b28 Fix regression in Network::Serialize (fix issue #3167)
The regression was caused by a wrong string serialization in
commit 4613738a5e.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-03 19:36:58 +01:00
zdenop
279b0b2e37
Merge pull request #3160 from stweil/string2
Replace more occurrences of STRING by std::string or char*
2020-11-27 18:24:17 +01:00
Stefan Weil
65b11a1e12 Pack class SVMenuNode
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-11-26 17:17:27 +01:00
Stefan Weil
a1849bc65c Pack struct CLASS_STRUCT
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-11-26 17:17:27 +01:00
Stefan Weil
0bb46ac2e0 Pack struct BlamerBundle
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-11-26 17:17:27 +01:00
Stefan Weil
bf3774cc91 Use more const char*
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-11-26 17:01:17 +01:00
Stefan Weil
4613738a5e Use const char* for filename and network_spec parameters
This replaces the proprietary STRING data type
(764 instead of 838 lines remaining).

It also removes STRING from osdetect.h and serialis.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-11-26 17:01:17 +01:00
Stefan Weil
fbc4c809d9 Replace STRING by std::string
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-10-31 14:08:39 +01:00
Stefan Weil
92b6c652f3 Use std::vector for scales_
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-10-29 08:00:11 +01:00
Stefan Weil
c15dd26b84 Don't pass scales_ to IntSimdMatrix::Init
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-10-28 20:35:53 +01:00
Stefan Weil
fe76142a3d Remove GenericVector::scale() again
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-10-28 16:24:59 +01:00
Stefan Weil
eaf72ace31 Prefer result from inverted image if the mean confidence is better
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-10-26 20:37:47 +01:00
Stefan Weil
cfb1fb2540 Try OCR on inverted line only if mean confidence is below 50 %
The old code looked at the minimum confidence, which very often
triggered a second OCR pass without improving the result.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-10-26 09:32:09 +01:00
Robin Watts
436008bd37 Tweak SIMDDetect for ANDROID Neon.
cpufeatures.h should be cpu-features.h, with the latest NDK
at least. The #if 0'd section is not required because armv8
always includes NEON.
2020-10-19 12:04:29 +01:00
Robin Watts
db10c7b577 intsimdmatrixneon.cpp: Do biasing in SIMD. 2020-10-12 04:30:46 -07:00
Robin Watts
d1e49d6dd2 intsimdmatrixavx2: Do biasing in SIMD.
We also move to relying on both scales and output having been
padded to accommodate us writing more results than are actually
needed here. This was allowed for a few commits back.
2020-10-12 04:30:46 -07:00
Robin Watts
872816897a Rejig intsimdmatrix to reduce FP ops.
Avoid 1) floating point division by 127, 2) conversion of
bias to double, 3) FP addition, in favour of 1) integer
multiplication by 127, and 2) integer addition.

(Also costs extra work in the serialisation/deserialisation of
the scale values, and conversion of weights to int formats, but
these are all one offs).
2020-10-12 04:30:46 -07:00
Robin Watts
aba1800f69 Round output buffers for intSimdMatrix.
In order to allow intSimdMatrix implementations to 'overwrite'
their outputs, ensure that the output buffers are always padded
to the next block size.

This doesn't make any difference yet, but it enables optimisations
further down the line, especially when the biasing is pulled into
the SIMD.
2020-10-12 11:47:16 +01:00
Robin Watts
9dfdac51c6 Tweak scales array for intSimdMatrix case.
Currently, the size of the scales array is not rounded up
in the same way as the weights are. This blocks us pushing
the scale calculations into the SIMD, as when we "overread"
the end of the scale array, we potentially get errors.

Here, we adjust the intSimdMatrix stuff to ensure that the
scales array reserves enough entries to allow such overreads
to work.

This doesn't make any difference for now, but opens the way
for future optimisations.
2020-10-12 11:47:16 +01:00
amitdo
958f23453e Improve disabled legacy engine build 2020-10-12 11:47:16 +01:00
Stefan Weil
ac14ab32c6 Remove dummy functions from globaloc.cpp and related code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-10-04 12:24:26 +02:00
Stefan Weil
7c4ef88dab Remove unused functions FontUtils::GetAllRenderableCharacters
They used the function pango_coverage_max which does nothing and
which has been deprecated since pango version 1.44.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-10-03 12:04:40 +02:00
Le Duc Nam
eb8f1674bf Correct "NoImages" in debug pdf file
Issues:
  The debug information for "NoImages" is just the binary image;
  it doesn't show the developer the result of photo_mask_pix.

Fix:
  Subtract photo_mask_pix from the binary image; the result
  is the "NoImages" binary pix.
2020-09-06 23:31:30 +07:00
Robert Sachunsky
640c14e080 AutoPageSeg/FindBlocks/GridRemoveUnderlinePartitions: avoid self-deletion
When checking horizontal line partitions for
possible interpretation as underline formatting,
avoid confusing the hline partition itself with
an overlapping neighbour (which would delete it).
2020-08-24 19:13:48 +02:00
Robert Sachunsky
65a077d3e9 FindAndRemoveLines/FindVerticalAlignment: decrease fixed vline min length
When detecting vertical separators, the blob aligner is used to glue
line segments (often segmented due to artificial cracks).
But (unlike LineFinder) it has many parameters that are not
relative to pixel density/resolution.
This change decreases the minimum absolute length in pixels
for vertical separators.
2020-08-24 19:13:36 +02:00
Robert Sachunsky
0228d93684 textord debugging: invert default top/bottom boundaries, improve description 2020-08-24 19:13:25 +02:00
Stefan Weil
d33edbc4b1
Merge pull request #3066 from robinwatts/pushback14
Remove unused char constant that causes a warning.
2020-07-17 15:55:51 +02:00
Robin Watts
578462109b Remove unused char constant that causes a warning.
The kDictWildcard is never actually used, so removing it makes
no difference. It causes warnings in MSVC builds as MSVC doesn't
know how to pack a unicode value into chars.
2020-07-17 14:22:37 +01:00
Robin Watts
150e2e54fe Squash some warnings in MSVC build.
In particular, "defined but not used" (caused by GRAPHICS_DISABLED),
double constants being truncated to floats, and implicit casts.
2020-07-16 10:08:40 +01:00
zdenop
7fa200bfb7
Merge pull request #3064 from robinwatts/pushback12
Fix Memory leak when using TESSERACT_IMAGEDATA_AS_PIX
2020-07-15 19:08:58 +02:00
Robin Watts
7f45b719d1 Fix Memory leak when using TESSERACT_IMAGEDATA_AS_PIX
If building with TESSERACT_IMAGEDATA_AS_PIX, then tesseract
doesn't compress/decompress images, but rather holds the
data as internal Pix structures. Unfortunately, I forgot to
make the ImageData destructor free these, so memory leaked
during use. Fixed here.
2020-07-15 12:35:35 +01:00
zdenop
135c8a49b5
Merge pull request #3061 from stweil/neon
Always use NEON by default for ARMv8
2020-07-11 09:11:54 +02:00
zdenop
875bd48bd5
Merge pull request #3058 from stweil/scrollview
Disable more code and data with GRAPHICS_DISABLED
2020-07-11 09:11:27 +02:00