
5801 Commits

Author SHA1 Message Date
Egor Pugin
feb32ecbe5 Merge branch 'master' of github.com-egorpugin:tesseract-ocr/tesseract
2021-08-18 18:15:05 +03:00
Egor Pugin
6056c84977 [image] Mark PIX** cast explicit to prevent implicit bool checks in ternary operators.
2021-08-18 18:14:47 +03:00
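The commit subject does not include the change itself. The following is a minimal sketch, under assumptions, of why a pointer-to-pointer conversion on an image wrapper is safer as `explicit`. The `Image` and `Pix` types and the `WidthOrZero` helper are simplified hypothetical stand-ins, not Tesseract's or Leptonica's actual definitions.

```cpp
#include <cassert>

// Hypothetical stand-in for Leptonica's Pix struct.
struct Pix {
  int w;
};

// Simplified, hypothetical wrapper around a Pix*; not Tesseract's real class.
class Image {
public:
  Image() = default;
  explicit Image(Pix *pix) : pix_(pix) {}

  // Implicit conversion: an Image can be used where a Pix* is expected,
  // and `img ? a : b` tests whether the wrapped pointer is null.
  operator Pix *() const {
    return pix_;
  }

  // Explicit conversion: callers must write static_cast<Pix **>(img).
  // If it were implicit, overload resolution could prefer it for a
  // non-const Image in `img ? a : b` (as a non-const member it binds
  // better than the const Pix* conversion), and because &pix_ is never
  // null, such a condition would always be true.
  explicit operator Pix **() {
    return &pix_;
  }

private:
  Pix *pix_ = nullptr;
};

// Hypothetical helper: returns the image width, or 0 if no Pix is held.
int WidthOrZero(Image img) {
  Pix *pix = img;           // uses operator Pix *()
  return img ? pix->w : 0;  // the boolean check also uses operator Pix *()
}
```

Code that really needs the address of the wrapped pointer still gets it, but only by asking for it with `static_cast<Pix **>(img)`.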
Stefan Weil
547164edae Create new pre-release 5.0.0-beta-20210815
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-15 17:07:11 +02:00
Egor Pugin
536112ce6f [sw] Fix build.
2021-08-12 22:46:45 +03:00
Stefan Weil
59271470b4 Remove unneeded type cast
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-12 20:55:14 +02:00
Stefan Weil
aaec341449 Avoid call of ColumnFinder::DisplayBlocks (small optimization)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-12 15:23:44 +02:00
Stefan Weil
6da7d6fcda Optimize check for non-empty string and fix code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-12 14:45:22 +02:00
Stefan Weil
92cae8f194 Optimize check for non-empty string
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-12 14:44:45 +02:00
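These two commits do not show the diff. One common form of this optimization, assumed here, replaces a `strlen()` call, which scans the whole string, with a test of the first character. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// O(n): strlen() walks the whole string only to compare its length with 0.
bool IsNonEmptySlow(const char *s) {
  return s != nullptr && std::strlen(s) > 0;
}

// O(1): a C string is non-empty exactly when its first char is not '\0'.
bool IsNonEmptyFast(const char *s) {
  return s != nullptr && s[0] != '\0';
}

// For std::string, empty() states the intent directly.
bool IsNonEmpty(const std::string &s) {
  return !s.empty();
}
```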
Stefan Weil
63c12a9ee5 unittest: Enable more code for tatweel_test without requiring TensorFlow
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-12 14:12:53 +02:00
Egor Pugin
c1180a8bc0 Merge pull request #3520 from stweil/unused
Remove some unused code
2021-08-10 23:36:34 +03:00
Stefan Weil
3ef403c345 Compile LSTM::PrintW and LSTM::PrintDW conditionally
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-10 22:04:57 +02:00
Stefan Weil
5d99041f5d Remove unused function Wordrec::merge_fragments
Also remove more functions which are now unused.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-10 22:04:57 +02:00
Stefan Weil
f1c8df0ce9 Remove unused global variable fx_debug
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-10 22:04:57 +02:00
Egor Pugin
3178c49729 Merge pull request #3517 from stweil/alto
Write image filename in ALTO output and reduce size of renderer classes
2021-08-08 00:17:31 +03:00
Stefan Weil
16fd1439fa Write image filename in ALTO output
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-07 22:14:03 +02:00
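In ALTO XML, the source image is recorded under `Description/sourceImageInformation/fileName`. The following is a hedged sketch of emitting that element with the file name escaped for XML. `XmlEscape` and `AltoDescription` are hypothetical helpers, not Tesseract's renderer code, and the exact whitespace and surrounding elements Tesseract writes may differ.

```cpp
#include <cassert>
#include <string>

// Escapes the five characters that are special in XML text and attributes.
std::string XmlEscape(const std::string &text) {
  std::string out;
  for (char c : text) {
    switch (c) {
      case '&': out += "&amp;"; break;
      case '<': out += "&lt;"; break;
      case '>': out += "&gt;"; break;
      case '"': out += "&quot;"; break;
      case '\'': out += "&apos;"; break;
      default: out += c;
    }
  }
  return out;
}

// Builds an ALTO <Description> element that names the source image.
// Hypothetical helper; the element names follow the ALTO schema.
std::string AltoDescription(const std::string &image_filename) {
  return "\t<Description>\n"
         "\t\t<MeasurementUnit>pixel</MeasurementUnit>\n"
         "\t\t<sourceImageInformation>\n"
         "\t\t\t<fileName>" + XmlEscape(image_filename) + "</fileName>\n"
         "\t\t</sourceImageInformation>\n"
         "\t</Description>\n";
}
```

Escaping matters here because file names may legitimately contain `&` or quotes, which would otherwise produce invalid XML.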
Stefan Weil
5f10fed5d9 Reduce size of TessResultRenderer
Changing the member order reduces the size from 72 to 64 bytes
on 64-bit Linux.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-07 22:14:03 +02:00
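The size saving comes from alignment padding: each member is placed at an offset that is a multiple of its alignment, so a small field placed before a pointer wastes bytes. The following is a sketch with hypothetical members (not the real layout of `TessResultRenderer`). The offsets in the comments assume a typical LP64 ABI such as x86-64 or AArch64 Linux.

```cpp
#include <cassert>

// Hypothetical members; not the real layout of TessResultRenderer.
// Offsets assume LP64: 8-byte, 8-byte-aligned pointers and 4-byte int.
struct Padded {
  bool flag;         // offset 0, followed by 7 bytes of padding
  const char *name;  // offset 8
  int count;         // offset 16, followed by 4 bytes of padding
  void *next;        // offset 24
};                   // sizeof == 32 on LP64

// Same members, largest alignment first: the small fields share the
// last 8-byte slot instead of each forcing its own padding.
struct Reordered {
  const char *name;  // offset 0
  void *next;        // offset 8
  int count;         // offset 16
  bool flag;         // offset 20, then 3 bytes of tail padding
};                   // sizeof == 24 on LP64

// On 32-bit targets both are 16 bytes; reordering never makes them larger.
static_assert(sizeof(Reordered) <= sizeof(Padded),
              "reordering should not grow the struct");
```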
Stefan Weil
95223cfaab cmake: Link tiff library only for Windows
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-07 19:57:24 +02:00
Stefan Weil
2215174951 unittest: Fix compiler warning for unused function
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-07 18:14:50 +02:00
Egor Pugin
3a68a80eed Merge pull request #3516 from stweil/abseil
Remove submodule abseil
2021-08-07 15:05:29 +03:00
Egor Pugin
33fcb99d3a [sw] Do not build arm neon file.
2021-08-07 13:40:47 +03:00
Stefan Weil
49f410ced3 unittest: Remove dependency on absl::StripAsciiWhitespace()
This removes the last dependency on Abseil, so the submodule
can now be removed completely.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-06 20:59:10 +02:00
Stefan Weil
87707bb8b0 unittest: Remove dependency on absl::StrSplit()
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-06 20:59:09 +02:00
Stefan Weil
f407345cbe unittest: Remove dependency on absl::StrJoin()
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-06 20:59:09 +02:00
Stefan Weil
61b8e301dd unittest: Remove dependency on absl::StrCat()
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-06 20:59:09 +02:00
Stefan Weil
8486f59493 unittest: Remove dependency on absl::StrFormat()
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-06 20:59:09 +02:00
Stefan Weil
fe5ca9dad9 unittest: Remove dependency on absl::GetCurrentTimeNanos()
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-06 20:59:09 +02:00
Stefan Weil
6b8b1f0007 unittest: Remove some dependencies on abseil
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-06 20:59:09 +02:00
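The commits above replace Abseil helpers in the unit tests. The following is a sketch of standard-library equivalents for four of them. These are illustrative stand-ins with hypothetical names, not the code Tesseract adopted. For `absl::StrCat()` and `absl::StrFormat()`, plain `std::string` concatenation with `std::to_string()`, or `std::snprintf()`, usually serves.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for absl::StripAsciiWhitespace(): trims both ends.
// In the default "C" locale, std::isspace matches ASCII whitespace.
std::string StripWhitespace(const std::string &s) {
  auto is_space = [](unsigned char c) { return std::isspace(c) != 0; };
  auto first = std::find_if_not(s.begin(), s.end(), is_space);
  auto last = std::find_if_not(s.rbegin(), s.rend(), is_space).base();
  return first < last ? std::string(first, last) : std::string();
}

// Stand-in for absl::StrSplit() with a single-char delimiter.
// Empty fields are kept, as with StrSplit's default behavior.
std::vector<std::string> SplitString(const std::string &s, char delim) {
  std::vector<std::string> parts;
  std::string::size_type start = 0;
  for (;;) {
    const auto pos = s.find(delim, start);
    parts.push_back(s.substr(start, pos - start));  // npos: take the rest
    if (pos == std::string::npos) break;
    start = pos + 1;
  }
  return parts;
}

// Stand-in for absl::StrJoin(parts, sep).
std::string JoinStrings(const std::vector<std::string> &parts,
                        const std::string &sep) {
  std::string out;
  for (std::size_t i = 0; i < parts.size(); ++i) {
    if (i > 0) out += sep;
    out += parts[i];
  }
  return out;
}

// Stand-in for absl::GetCurrentTimeNanos(): nanoseconds since the epoch.
std::int64_t NowNanos() {
  using namespace std::chrono;
  return duration_cast<nanoseconds>(system_clock::now().time_since_epoch())
      .count();
}
```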
Stefan Weil
d50baec7a7 cmake: Add dotproductneon.cpp
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-06 20:55:40 +02:00
zdenop
7975fec2fd Add new cmake option -DFAST_FLOAT=ON for faster LSTM with float (#3514)
Co-authored-by: Stefan Weil <sw@weilnetz.de>
2021-08-05 21:35:54 +02:00
Stefan Weil
4c8799ac40 codeql-analysis: Disable analysis of Python code
It should be re-enabled (together with the analysis for Java)
as soon as it no longer compiles all C++ code as well.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-03 11:53:51 +02:00
Stefan Weil
a73e7b97a4 Add float dotproduct implementation for NEON
Signed-off-by: Stefan Weil <stefan.weil@bib.uni-mannheim.de>
2021-08-03 10:35:22 +02:00
Stefan Weil
bb4a1219d7 Improve setting of dot product functions via environment variable
Apply the settings selected by the environment variable DOTPRODUCT
after the autodetection of the available SIMD hardware.

'accelerate', 'fma' and 'std::inner_product' no longer change
the setting for intSimdMatrix to 'generic' because they don't provide
their own implementation of it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-03 10:34:33 +02:00
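The ordering described in this commit message, where autodetection comes first and the `DOTPRODUCT` override is applied on top, can be sketched as follows. Only the variable name `DOTPRODUCT` and the option names come from the message. The function names, the `KernelChoice` struct, and the SIMD placeholder are hypothetical. The assumption that `generic` resets the integer matrix code as well is also unconfirmed.

```cpp
#include <cassert>
#include <cstdlib>
#include <numeric>
#include <string>

using DotProductFn = double (*)(const double *, const double *, int);

// Plain loop, always available.
double DotProductGeneric(const double *u, const double *v, int n) {
  double total = 0.0;
  for (int i = 0; i < n; ++i) total += u[i] * v[i];
  return total;
}

// Same result computed with std::inner_product.
double DotProductStdInnerProduct(const double *u, const double *v, int n) {
  return std::inner_product(u, u + n, v, 0.0);
}

// Hypothetical placeholder for a SIMD kernel chosen by autodetection.
double DotProductSimd(const double *u, const double *v, int n) {
  return DotProductGeneric(u, v, n);
}

struct KernelChoice {
  DotProductFn dot_product;
  std::string int_simd_matrix;  // name of the integer matrix code path
};

// Step 1: autodetect from the hardware. Step 2: apply the override.
KernelChoice SelectKernels(bool cpu_has_simd, const char *override_name) {
  KernelChoice choice{cpu_has_simd ? DotProductSimd : DotProductGeneric,
                      cpu_has_simd ? "simd" : "generic"};
  if (override_name == nullptr) return choice;  // nothing requested
  const std::string name = override_name;
  if (name == "generic") {
    // Assumed: "generic" switches both code paths to the generic versions.
    choice.dot_product = DotProductGeneric;
    choice.int_simd_matrix = "generic";
  } else if (name == "std::inner_product") {
    // Only the float dot product changes; this option has no integer
    // matrix implementation of its own, so the detected one is kept.
    choice.dot_product = DotProductStdInnerProduct;
  }
  return choice;
}

// Reads the override from the DOTPRODUCT environment variable.
KernelChoice SelectKernelsFromEnvironment(bool cpu_has_simd) {
  return SelectKernels(cpu_has_simd, std::getenv("DOTPRODUCT"));
}
```

Because the override runs last, setting `DOTPRODUCT=std::inner_product` changes only the float kernel. The autodetected integer matrix code stays in place.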
Stefan Weil
2786a887cd Update codeql-analysis.yml for Tesseract autotools build
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-01 13:23:28 +02:00
Stefan Weil
ebae27435a Update codeql-analysis.yml for Tesseract autotools build
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-01 13:17:45 +02:00
Stefan Weil
edcf4fcd3b Fix comment
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-01 13:17:45 +02:00
Stefan Weil
40d12d0945 Create codeql-analysis.yml
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-01 12:54:30 +02:00
Amit D
c8bb526afb Merge pull request #3510 from stweil/enable-float32
Add new configure option --enable-float32 for faster LSTM with float
2021-07-29 18:01:21 +03:00
Stefan Weil
0d0f203509 Add new configure option --enable-float32 for faster LSTM with float
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-29 06:49:08 +02:00
Stefan Weil
553ab64d8d Rename UnicityTable<T>::get_id to UnicityTable<T>::get_index
This prepares replacing UnicityTable<FontInfo> by FontInfoTable.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-26 07:59:58 +02:00
Stefan Weil
c9f42ce62b Add unittest for static TessBaseAPI object (#3509)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-25 14:34:43 +03:00
Stefan Weil
df1295ea6b Simplify *_VAR_H macros (#3508)
This avoids duplicate (and potentially inconsistent) code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-25 12:09:07 +03:00
Amit D
e538cd7152 Merge pull request #3486 from stweil/tfloat
Add TFloat data type for neural network
2021-07-25 00:03:56 +03:00
Ger Hobbelt
27597883db Implement DotProductSSE() for FAST_FLOAT
[sw] Formatted commit message
2021-07-24 15:14:17 +02:00
Ger Hobbelt
79e8b4f344 Fix bugs in the AVX2 Extract8+16 code
There are lines like `__m256d scale01234567 = _mm256_loadu_ps(scales)`,
i.e. loading float vectors into double vector types.

[sw] Formatted commit message
2021-07-24 15:14:17 +02:00
Ger Hobbelt
24a29b79e5 Fix FMA port to FAST_FLOAT
8 floats fit in a single 256-bit vector (8x32 bits),
in contrast to 4 doubles (4x64 bits).

[sw] Format commit message and use float instead of TFloat
2021-07-24 15:14:17 +02:00
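The lane arithmetic from the FMA fix determines a vectorized loop's stride: a 256-bit register holds 256 / 32 = 8 floats but only 256 / 64 = 4 doubles. The following is a scalar sketch of that blocking. It does not use real intrinsics, and `LanesPer` and `BlockedDotProduct` are hypothetical names. It shows how the stride follows from the element type. Code ported from double to float has to change it accordingly.

```cpp
#include <cassert>
#include <cstddef>

// Elements of type T per SIMD register of the given width in bits
// (256 for AVX/AVX2/FMA registers).
template <typename T>
constexpr std::size_t LanesPer(std::size_t register_bits) {
  return register_bits / (8 * sizeof(T));
}

static_assert(LanesPer<float>(256) == 8, "8 x 32-bit floats per register");
static_assert(LanesPer<double>(256) == 4, "4 x 64-bit doubles per register");

// Scalar model of a vectorized dot product: the main loop advances by one
// register's worth of elements per step, and a tail loop handles the rest.
template <typename T>
T BlockedDotProduct(const T *u, const T *v, std::size_t n) {
  constexpr std::size_t kLanes = LanesPer<T>(256);
  T lane_sums[kLanes] = {};
  std::size_t i = 0;
  for (; i + kLanes <= n; i += kLanes) {
    for (std::size_t k = 0; k < kLanes; ++k) {
      lane_sums[k] += u[i + k] * v[i + k];  // one "vector" multiply-add
    }
  }
  T total = 0;
  for (std::size_t k = 0; k < kLanes; ++k) total += lane_sums[k];
  for (; i < n; ++i) total += u[i] * v[i];  // leftover elements
  return total;
}
```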
Stefan Weil
472f5d9020 Add TFloat data type for neural network
Until now, Tesseract used double for training and recognition
with "best" models.

This commit replaces double by a new data type TFloat which
is double by default, but float if FAST_FLOAT is defined.

Ideally this should allow faster training.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-24 15:14:17 +02:00
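The commit describes a single alias whose precision depends on `FAST_FLOAT`. The following is a minimal sketch of that pattern, assuming a plain preprocessor switch. It is not copied from Tesseract's headers, and `DotProduct` here is a hypothetical helper written against the alias.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// One alias controls the precision of the network arithmetic:
// double by default, float when compiled with -DFAST_FLOAT.
#if defined(FAST_FLOAT)
using TFloat = float;
#else
using TFloat = double;
#endif

static_assert(std::is_floating_point<TFloat>::value,
              "TFloat must be a floating-point type");

// Hypothetical helper: code written against TFloat switches precision
// together with everything else, with no per-function changes.
TFloat DotProduct(const TFloat *u, const TFloat *v, std::size_t n) {
  TFloat total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    total += u[i] * v[i];
  }
  return total;
}
```

With float, each value takes half the memory and twice as many values fit in a SIMD register. That is where the hoped-for speedup comes from, at the cost of reduced precision.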
Stefan Weil
66b77e6639 Prepare using float instead of double for LSTM calculations
The new header file ccutils/tesstypes.h also prepares support
for larger images by introducing a new data type for image
size and coordinates (still unused).

FloatToDouble is now a local function.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-24 13:59:37 +02:00
Stefan Weil
c3fb050daa Remove TODO comment which is no longer open
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-24 11:20:29 +02:00
Stefan Weil
4df822a3fc Revert "Merge pull request #3330 from Sintun/master" (#3505)
This reverts commit 122daf1d64, reversing
changes made to 4cd56dc5f5.

Those changes caused two regressions which resulted in an assertion
failure or a segmentation fault.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-22 09:04:23 +03:00
Stefan Weil
e176169a90 Remove stray spaces at line endings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:59:15 +02:00