Commit Graph

1817 Commits

Author SHA1 Message Date
Stefan Weil
5d99041f5d Remove unused function Wordrec::merge_fragments
Also remove further functions which are now unused.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-10 22:04:57 +02:00
Stefan Weil
f1c8df0ce9 Remove unused global variable fx_debug
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-10 22:04:57 +02:00
Stefan Weil
16fd1439fa Write image filename in ALTO output
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-07 22:14:03 +02:00
Stefan Weil
5f10fed5d9 Reduce size of TessResultRenderer
Changing the order of the members reduces the size from 72 to 64 bytes
on 64-bit Linux.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-07 22:14:03 +02:00
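The saving comes from alignment padding: when small members sit between pointer-sized ones, the compiler pads each of them up to the next 8-byte boundary. The sketch below uses hypothetical member lists, not the real TessResultRenderer fields, and the sizes in the comments assume a 64-bit Linux ABI.

```cpp
// Hypothetical layouts, not TessResultRenderer's real members.
// Offsets and sizes in the comments assume 8-byte pointers with 8-byte alignment.
struct Padded {
  bool happy;         // offset 0, followed by 7 bytes of padding
  const char* title;  // offset 8
  bool begun;         // offset 16, followed by 7 bytes of padding
  void* next;         // offset 24
};                    // sizeof == 32

// Same members, pointers first: the two bools share one padded tail.
struct Reordered {
  const char* title;  // offset 0
  void* next;         // offset 8
  bool happy;         // offset 16
  bool begun;         // offset 17, then 6 bytes of tail padding
};                    // sizeof == 24
```

Sorting members by decreasing alignment is a simple rule of thumb that usually minimizes padding without changing any behavior.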
Stefan Weil
a73e7b97a4 Add float dotproduct implementation for NEON
Signed-off-by: Stefan Weil <stefan.weil@bib.uni-mannheim.de>
2021-08-03 10:35:22 +02:00
Stefan Weil
bb4a1219d7 Improve setting of dot product functions via environment variable
Apply the settings selected by the environment variable DOTPRODUCT
after the autodetection of the available SIMD hardware.

'accelerate', 'fma' and 'std::inner_product' no longer change
the setting for intSimdMatrix to 'generic' because they don't provide
their own implementation for it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-03 10:34:33 +02:00
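The ordering described above (autodetect first, then let DOTPRODUCT override) can be modeled as below. This is a minimal sketch, not Tesseract's SIMDDetect code: `SimdSelection`, `Autodetect`, `ApplyOverride` and the stubbed "avx2" detection are hypothetical, and the chosen implementations are represented by names instead of function pointers.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical model of the selection order; real code assigns function
// pointers, here the chosen implementations are just named.
struct SimdSelection {
  std::string dot_product = "generic";
  std::string int_simd_matrix = "generic";
};

// Step 1: hardware autodetection (stubbed: pretend AVX2 was found).
SimdSelection Autodetect() {
  SimdSelection s;
  s.dot_product = "avx2";
  s.int_simd_matrix = "avx2";
  return s;
}

// Step 2: apply the DOTPRODUCT value afterwards so that it wins.
// Unknown values are ignored in this sketch.
SimdSelection ApplyOverride(SimdSelection s, const char* value) {
  if (value == nullptr) return s;  // variable unset: keep autodetection
  const std::string v = value;
  if (v == "generic") {
    s.dot_product = "generic";
    s.int_simd_matrix = "generic";  // assumption: generic resets both
  } else if (v == "accelerate" || v == "fma" || v == "std::inner_product") {
    // These provide only a dot product, so intSimdMatrix is left alone.
    s.dot_product = v;
  }
  return s;
}

SimdSelection SelectFromEnvironment() {
  return ApplyOverride(Autodetect(), std::getenv("DOTPRODUCT"));
}
```

With `DOTPRODUCT=fma`, this model keeps the autodetected matrix implementation and only swaps the dot product.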
Stefan Weil
edcf4fcd3b Fix comment
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-01 13:17:45 +02:00
Stefan Weil
0d0f203509 Add new configure option --enable-float32 for faster LSTM with float
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-29 06:49:08 +02:00
Stefan Weil
553ab64d8d Rename UnicityTable<T>::get_id to UnicityTable<T>::get_index
This prepares replacing UnicityTable<FontInfo> by FontInfoTable.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-26 07:59:58 +02:00
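The new name reflects what the lookup returns: a position in the table, not a stable identifier. A minimal sketch of such a "unicity table" follows; `UniqueTable` is hypothetical and is not Tesseract's UnicityTable implementation.

```cpp
#include <cstddef>
#include <vector>

// Minimal sketch of a table that stores each distinct value once.
// Hypothetical, not Tesseract's UnicityTable.
template <typename T>
class UniqueTable {
 public:
  // Returns the position of `value` in the table, or -1 if absent.
  int get_index(const T& value) const {
    for (std::size_t i = 0; i < items_.size(); ++i) {
      if (items_[i] == value) return static_cast<int>(i);
    }
    return -1;
  }

  // Adds `value` only if it is not present; returns its index either way.
  int push_back(const T& value) {
    int index = get_index(value);
    if (index >= 0) return index;
    items_.push_back(value);
    return static_cast<int>(items_.size()) - 1;
  }

  int size() const { return static_cast<int>(items_.size()); }

 private:
  std::vector<T> items_;
};
```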
Stefan Weil
df1295ea6b Simplify *_VAR_H macros (#3508)
This avoids duplicate (and potentially inconsistent) code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-25 12:09:07 +03:00
Ger Hobbelt
27597883db Implement DotProductSSE() for FAST_FLOAT
[sw] Formatted commit message
2021-07-24 15:14:17 +02:00
Ger Hobbelt
79e8b4f344 bugfixing the AVX2 Extract8+16 codes
There's lines like `__m256d scale01234567 = _mm256_loadu_ps(scales)`,
i.e. loading float vectors into double vector types.

[sw] Formatted commit message
2021-07-24 15:14:17 +02:00
Ger Hobbelt
24a29b79e5 bugfix of FMA port to FAST_FLOAT
8 floats fit in a single 256-bit vector (8 x 32 bits),
compared with 4 doubles (4 x 64 bits).

[sw] Format commit message and use float instead of TFloat
2021-07-24 15:14:17 +02:00
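The lane count follows from the element width: 256 / 32 = 8 floats versus 256 / 64 = 4 doubles, so a loop ported from double to float must advance in chunks of 8, not 4. The portable sketch below makes the lane count a compile-time function of the element type and emulates the register with a plain array instead of AVX intrinsics; `LanesPer256Bits` and `ChunkedDotProduct` are hypothetical names.

```cpp
#include <cstddef>

// Number of elements of type T that fit in one 256-bit register.
template <typename T>
constexpr std::size_t LanesPer256Bits() {
  return 256 / (8 * sizeof(T));
}

static_assert(LanesPer256Bits<float>() == 8, "8 x 32-bit floats");
static_assert(LanesPer256Bits<double>() == 4, "4 x 64-bit doubles");

// Dot product processed in register-sized chunks (scalar emulation of the
// SIMD loop structure), followed by a scalar tail for leftover elements.
template <typename T>
T ChunkedDotProduct(const T* u, const T* v, std::size_t n) {
  constexpr std::size_t kLanes = LanesPer256Bits<T>();
  T lanes[kLanes] = {};  // one partial sum per vector lane
  std::size_t i = 0;
  for (; i + kLanes <= n; i += kLanes) {
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
      lanes[lane] += u[i + lane] * v[i + lane];
    }
  }
  T total = 0;
  for (std::size_t lane = 0; lane < kLanes; ++lane) total += lanes[lane];
  for (; i < n; ++i) total += u[i] * v[i];  // tail
  return total;
}
```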
Stefan Weil
472f5d9020 Add TFloat data type for neural network
Up to now Tesseract used double for training and recognition
with "best" models.

This commit replaces double by a new data type TFloat which
is double by default, but float if FAST_FLOAT is defined.

Ideally this should allow faster training.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-24 15:14:17 +02:00
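The build-time switch described above can be sketched as a type alias selected by the FAST_FLOAT macro (compile with `-DFAST_FLOAT` to get float). `WeightedSum` is a hypothetical function showing that code written against TFloat compiles unchanged in both modes.

```cpp
#include <type_traits>

// TFloat is double by default and float when FAST_FLOAT is defined.
#ifdef FAST_FLOAT
using TFloat = float;
#else
using TFloat = double;
#endif

static_assert(std::is_floating_point<TFloat>::value,
              "TFloat must be float or double");

// Hypothetical network helper written against TFloat only.
TFloat WeightedSum(const TFloat* weights, const TFloat* inputs, int n) {
  TFloat sum = 0;
  for (int i = 0; i < n; ++i) sum += weights[i] * inputs[i];
  return sum;
}
```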
Stefan Weil
66b77e6639 Prepare using float instead of double for LSTM calculations
The new header file ccutils/tesstypes.h also prepares support
for larger images by introducing a new data type for image
size and coordinates (still unused).

FloatToDouble is now a local function.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-24 13:59:37 +02:00
Stefan Weil
4df822a3fc Revert "Merge pull request #3330 from Sintun/master" (#3505)
This reverts commit 122daf1d64, reversing
changes made to 4cd56dc5f5.

Those changes caused two regressions which resulted in an assertion
or a segmentation fault.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-22 09:04:23 +03:00
Stefan Weil
e176169a90 Remove stray spaces at line endings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:59:15 +02:00
Ger Hobbelt
444fe14273 Fix a couple of 'shadowed local variables' compiler warnings
These warnings slipped through when I manually extracted the template work
from my mainline (they show up when running MSVC at warning level 4).

[sw]: Format commit message and use different fix for blamer.cpp

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:49:03 +02:00
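A "shadowed local variable" warning fires when an inner declaration reuses the name of an outer one (MSVC at /W4, GCC and Clang with `-Wshadow`). The usual fix is to rename the inner variable, as in this hypothetical example.

```cpp
#include <vector>

// Before (hypothetical): the inner `count` hides the outer one, which is
// easy to confuse and triggers a shadowing warning.
//
//   int count = 0;
//   for (const auto& row : rows) { auto count = row.size(); ... }
//
// After: the inner variable gets its own name.
int CountNonEmptyRows(const std::vector<std::vector<int>>& rows) {
  int count = 0;
  for (const auto& row : rows) {
    const auto row_size = row.size();  // was: a second `count`
    if (row_size > 0) ++count;
  }
  return count;
}
```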
Stefan Weil
0fc6d8d7f0 Add missing hint for dotproduct parameter value "fma"
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:44:29 +02:00
Ger Hobbelt
f72d4b1fe7 NEON arch: dead ref cycle fix
When neon_available_ is set, DotProduct was assigned DotProduct instead of
DotProductNative. Since DotProduct is the *target* global itself (see
simddetect.h), that part of the SetDotProduct() call was effectively
the no-op statement `DotProduct = DotProduct;`.

Also add the NEON check to the Update() API, alongside
the other checks (for AVX/SSE/etc.).

[sw: formatted commit message and merged into main branch]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:40:16 +02:00
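The bug reduces to assigning a global function pointer to itself. The hypothetical reduction below contrasts the no-op assignment with the fix; `DotProductUnset` stands in for whatever value the pointer held before and is not a Tesseract function.

```cpp
// Hypothetical reduction; DotProduct is the global function pointer that
// SetDotProduct() is meant to configure.
using DotProductFunction = double (*)(const double*, const double*, int);

double DotProductNative(const double* u, const double* v, int n) {
  double total = 0.0;
  for (int i = 0; i < n; ++i) total += u[i] * v[i];
  return total;
}

// Placeholder for the pointer's previous value (hypothetical).
double DotProductUnset(const double*, const double*, int) { return -1.0; }

DotProductFunction DotProduct = DotProductUnset;

void SetDotProductBuggy() {
  DotProduct = DotProduct;  // self-assignment: nothing changes
}

void SetDotProductFixed() {
  DotProduct = DotProductNative;  // points at an actual implementation
}
```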
Stefan Weil
dff7312aed Modernize code in SIMDDetect::Update
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:16:49 +02:00
Stefan Weil
3ab8dcbf72 Use Apple Accelerate framework for training and best models
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 19:27:54 +02:00
Johannes Künsebeck
3be11f12a9 Removed unused parameter declarations and definitions
2021-07-20 15:08:10 +02:00
zdenop
8dd7936475 Solve clang reporting unused variable in ExtractMicros function (#3501)
* Mark attribute as unused for the compiler
* Try the C++17 standard attribute: https://en.cppreference.com/w/cpp/language/attributes/maybe_unused
2021-07-18 01:59:49 +02:00
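One common way this warning appears is a variable that is only read inside `assert()`, which disappears in release builds (NDEBUG). The C++17 `[[maybe_unused]]` attribute silences the warning without changing behavior. `ClampedMicros` is a hypothetical function, not the actual ExtractMicros code.

```cpp
#include <cassert>

// In NDEBUG builds the assert vanishes and `in_range` would be reported as
// unused; [[maybe_unused]] tells the compiler this is intentional.
int ClampedMicros(int micros, int max_micros) {
  [[maybe_unused]] const bool in_range = micros >= 0;
  assert(in_range);
  return micros > max_micros ? max_micros : micros;
}
```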
nagadomi
7fe0624838 Fix spec string of convolution layer (#3499)
2021-07-16 18:21:52 +03:00
Stefan Weil
88d4028a5a Enable pragma for SIMD also when _OPENMP is defined
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-15 16:03:43 +02:00
Stefan Weil
f0fb6809e3 Use SIMD instructions for DotProductNative
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-14 19:13:01 +02:00
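A portable dot product can request vectorization with the OpenMP `simd` construct, emitted only when `_OPENMP` is defined (for example with `-fopenmp`). This is a minimal sketch; the actual guard in Tesseract may check further macros.

```cpp
// Dot product with an OpenMP SIMD hint. Without _OPENMP the compiler sees a
// plain loop, which it may still auto-vectorize.
double DotProductNative(const double* u, const double* v, int n) {
  double total = 0.0;
#if defined(_OPENMP)
#pragma omp simd reduction(+ : total)
#endif
  for (int k = 0; k < n; ++k) {
    total += u[k] * v[k];
  }
  return total;
}
```

The `reduction(+ : total)` clause lets each lane keep its own partial sum, so the summation order may differ slightly from the scalar loop.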
Tadahito Yao
12e0fb4e01 Fix deadlock in lstmtraining (#3488)
2021-07-10 10:59:10 +03:00
Stefan Weil
767fb5a177 Fix LSTMTrainerTest.BidiTest
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-04 18:41:19 +02:00
Stefan Weil
158c845228 Catch another FP division by 0 (fixes issue #3483)
Rewriting the code avoids FP operations (which potentially makes it faster)
and fixes the division by 0.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-03 15:37:24 +02:00
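The commit message does not show the rewritten code. As a generic illustration of the idea, with a hypothetical function, a ratio test like `part / total > num / den` can be replaced by integer cross-multiplication, which needs no floating-point division and so cannot divide by zero.

```cpp
#include <cstdint>

// Hypothetical example: does part / total exceed num / den (den > 0)?
// Floating-point version, which divides by zero when total == 0:
//   return static_cast<double>(part) / total > static_cast<double>(num) / den;
bool FractionExceeds(int part, int total, int num, int den) {
  if (total <= 0) return false;  // empty case handled explicitly
  // Cross-multiplication is valid because total > 0 and den > 0;
  // 64-bit products avoid overflow.
  return static_cast<std::int64_t>(part) * den >
         static_cast<std::int64_t>(num) * total;
}
```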
Stefan Weil
4b630a8813 Catch FP division by 0 (fixes issue #3483)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-02 15:04:31 +02:00
Stefan Weil
a701454ae5 Fix vector resize with init for all elements (issue #3473) (#3474)
Fixes: c8b8d266d6
Fixes: 9710bc0465
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-06-29 21:05:29 +03:00
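The title points at a standard-library pitfall: `std::vector::resize(n, value)` fills only the newly added elements with `value`, while existing elements keep their old contents. To set all `n` elements, `assign(n, value)` is needed. The helper names below are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// resize(n, value): existing elements are kept, only new ones get `value`.
std::vector<int> ResizeOnly(std::vector<int> v, std::size_t n, int value) {
  v.resize(n, value);
  return v;
}

// assign(n, value): every one of the n elements becomes `value`.
std::vector<int> ResizeAndInitAll(std::vector<int> v, std::size_t n, int value) {
  v.assign(n, value);
  return v;
}
```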
nagadomi
ff1062d39d Add --reset_learning_rate option to lstmtraining (#3470)
When the --reset_learning_rate option is specified,
it resets the learning rate stored in each layer of the network
loaded with --continue_from to the value specified by the --learning_rate option.
If a checkpoint is available, the option has no effect.
2021-06-28 11:48:07 +03:00
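The option's rule (reset every layer's stored rate, unless training resumes from a checkpoint) can be modeled as below. `Layer` and `MaybeResetLearningRates` are hypothetical stand-ins, not lstmtraining's code.

```cpp
#include <vector>

// Hypothetical layer holding its own learning rate.
struct Layer {
  float learning_rate;
};

// Overwrites every layer's learning rate with the --learning_rate value,
// but only when reset was requested and no checkpoint was loaded.
void MaybeResetLearningRates(std::vector<Layer>& layers, bool reset_requested,
                             bool checkpoint_loaded, float learning_rate) {
  if (!reset_requested || checkpoint_loaded) return;  // nothing to do
  for (Layer& layer : layers) layer.learning_rate = learning_rate;
}
```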
nagadomi
d8bd78f8e2 Fix missing reset of best_error_history_ in LSTMTrainer::InitIterations() (#3469)
2021-06-27 09:26:32 +03:00
nagadomi
b2fa77f8f0 Show layer specified learning rates with combine_tessdata -l (#3468)
2021-06-26 08:08:54 +03:00
MonkeybreadSoftware
75e6c3ea4c Null check for GetSourceYResolution (#3457)
* Null check for GetSourceYResolution

Added missing NULL check to avoid a crash when we read the property in our Tesseract wrapper.

* Added missing return value.

Return -1 if the resolution is undefined.
2021-06-16 16:35:24 +03:00
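The guard follows a simple pattern: check the pointer that may not exist yet and return a sentinel (-1) instead of dereferencing it. `Source` and this free-standing `GetSourceYResolution` are hypothetical stand-ins for the API object and its member.

```cpp
// Hypothetical object that may not exist yet (e.g. before an image is set).
struct Source {
  int y_resolution;
};

int GetSourceYResolution(const Source* source) {
  if (source == nullptr) {
    return -1;  // undefined: no source available, avoid the crash
  }
  return source->y_resolution;
}
```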
Amit Dovev
bf979c801a Remove unused variable
2021-05-21 20:34:09 +03:00
Egor Pugin
a72408fdef Merge pull request #3438 from amitdo/pango
Raise minimum required Pango version to 1.38.0
2021-05-21 20:09:27 +03:00
Amit Dovev
8615f65cc4 Raise minimum required Pango version to 1.38.0
2021-05-21 19:56:37 +03:00
Amit Dovev
c24538518c ThresholdMethod::TiledSauvola -> ThresholdMethod::Sauvola
The fact that this method uses tiles is an implementation detail. It does not change the result compared to Sauvola without tiles. Using tiles minimizes memory consumption.
2021-05-21 18:15:30 +03:00
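Sauvola's method thresholds each pixel using the mean m and standard deviation s of a local window, T = m * (1 + k * (s / R - 1)) with R = 128 for 8-bit images. Because T depends only on that window, tiling is indeed an implementation detail. The brute-force sketch below is illustrative only; the function name is hypothetical and Leptonica computes m and s much more efficiently.

```cpp
#include <cmath>
#include <vector>

// Sauvola threshold at (x, y) of an 8-bit grayscale image, using a
// (2*half+1)^2 window clipped to the image borders.
double SauvolaThreshold(const std::vector<unsigned char>& img, int width,
                        int height, int x, int y, int half, double k) {
  double sum = 0.0, sum_sq = 0.0;
  int count = 0;
  for (int yy = y - half; yy <= y + half; ++yy) {
    for (int xx = x - half; xx <= x + half; ++xx) {
      if (xx < 0 || yy < 0 || xx >= width || yy >= height) continue;
      const double p = img[yy * width + xx];
      sum += p;
      sum_sq += p * p;
      ++count;
    }
  }
  const double mean = sum / count;
  const double variance = sum_sq / count - mean * mean;
  const double stddev = std::sqrt(variance > 0.0 ? variance : 0.0);
  const double R = 128.0;  // dynamic range of the standard deviation
  return mean * (1.0 + k * (stddev / R - 1.0));
}
```

For a flat region (s = 0) the threshold reduces to m * (1 - k).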
Stefan Weil
93348a83a3 Remove scripts for training
They were replaced by Python3 scripts (part of the tesstrain repository).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-18 10:47:44 +02:00
nagadomi
42e4b91132 Refactor ObjectCache::DeleteUnusedObjects with reverse iterator
2021-05-17 14:50:30 +02:00
nagadomi
dc4a8a6ce0 Fix crash in ObjectCache::DeleteUnusedObjects
2021-05-17 10:25:17 +09:00
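The messages do not say what caused the crash; a frequent cause in this kind of loop is erasing from a container while iterating over it. Erasing through a reverse iterator stays valid if the iterator is rebuilt from the value returned by `erase()`. `Entry` and this `DeleteUnusedObjects` are hypothetical simplifications, not Tesseract's ObjectCache.

```cpp
#include <iterator>
#include <vector>

// Hypothetical cache entry: an owned object plus a reference count.
struct Entry {
  int* object;
  int count;
};

// Deletes all entries whose reference count dropped to zero.
void DeleteUnusedObjects(std::vector<Entry>& cache) {
  for (auto it = cache.rbegin(); it != cache.rend();) {
    if (it->count <= 0) {
      delete it->object;
      // std::next(it).base() is the forward iterator to *it. erase() returns
      // the element after the removed one; wrapped as a reverse iterator it
      // refers to the element before the removed one, the next to visit.
      it = std::make_reverse_iterator(cache.erase(std::next(it).base()));
    } else {
      ++it;
    }
  }
}
```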
Stefan Weil
0c4e2f1cb5 Fix comment in code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-16 07:47:19 +02:00
Stefan Weil
57b7974292 Remove an arbitrary limit for the image size
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-15 15:03:22 +02:00
Stefan Weil
a0cf117c5d Fix compiler warning in binarization code (uninitialized local variable)
Also simplify the code a little.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-15 15:03:22 +02:00
Stefan Weil
bf84fb9f2d Optimize code for binarization
Some code is only needed for Otsu, and some is not needed at all.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-15 15:03:22 +02:00
Stefan Weil
4b5dd25b84 Fix compiler warning
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-15 15:03:22 +02:00
Stefan Weil
12c29639fc Add conditional compilation with GRAPHICS_DISABLED
This fixes a compiler warning when GRAPHICS_DISABLED is defined.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-13 17:22:24 +02:00
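When GRAPHICS_DISABLED removes the only caller of a helper, the helper itself triggers an "unused function" warning unless it is guarded too. A hypothetical example of guarding both the helper and its use:

```cpp
#include <string>

// Hypothetical helper needed only when a debug window can be shown.
#ifndef GRAPHICS_DISABLED
static std::string DebugLabel(int id) { return "blob " + std::to_string(id); }
#endif

// Returns how many debug labels were produced (0 in builds without graphics).
int ShowDebugLabels(int count) {
  int shown = 0;
#ifndef GRAPHICS_DISABLED
  for (int i = 0; i < count; ++i) {
    if (!DebugLabel(i).empty()) ++shown;  // would be drawn in a window
  }
#else
  (void)count;  // parameter unused without graphics
#endif
  return shown;
}
```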
Nick White
ad7010a5eb lstmeval: Only print char and word error rates for verbosity 2/3
2021-05-11 13:15:35 +01:00