2089 Commits

Author SHA1 Message Date
zdenop
3ca273f914 cmake silent message about changed behaviour 2021-10-28 12:07:53 +02:00
Stefan Weil
5cc649e5f9 Remove code which is wrong in combination with NFC
See comments in https://github.com/tesseract-ocr/tesseract/pull/3420.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-27 18:52:03 +02:00
Stefan Weil
5cee9a0cec Merge remote-tracking branch 'nickjwhite/nfc'
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-27 18:40:02 +02:00
Stefan Weil
c602624012 Prepare support for image width and height larger than 32767 (continued)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-27 08:58:31 +02:00
Stefan Weil
59fbad0dd5 Prepare support for image width and height larger than 32767
Avoid using int16_t and use a new data type TDimension where needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-27 08:45:33 +02:00
Stefan Weil
56f54c24de Fix heap use after free (issue #3523)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-26 19:19:59 +02:00
Amit D
cea2a6015e Thresholding: Improve some debug messages 2021-10-26 19:09:06 +03:00
Stefan Weil
d6de055acf Set default language for tesseract only if required
When running with --list-langs, --print-parameters or --print-fonts-table
no default language is needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-26 11:05:06 +02:00
Stefan Weil
f5d22d0bcc Don't set a default language in TessBaseAPI::Init (API change)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-26 11:05:06 +02:00
zdenop
48c5d426ca Merge pull request #3609 from stweil/api
Remove TessBaseAPI::InitLangMod (API change)
2021-10-26 07:23:52 +02:00
Stefan Weil
255d7c9675 Fix CID 1400763 Using invalid iterator (fixes issue #2806)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-25 22:20:45 +02:00
Stefan Weil
c2df9ce57b Remove Tesseract::init_tesseract_lm which is no longer used
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-25 21:28:23 +02:00
Stefan Weil
5738c44d40 Remove TessBaseAPI::InitLangMod (API change)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-25 21:28:23 +02:00
Stefan Weil
cdd19d561b Remove old comment
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-25 21:26:32 +02:00
Amit Dovev
0aeb2e7913 Thresholding: Change smooth scaling logic
As suggested by @bertsky.
2021-10-15 19:34:39 +03:00
Amit D
9a1ad4333e Apply suggestions from code review
Extend help message for 2 parameters

Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
2021-10-15 18:14:49 +03:00
Amit D
0d2d6e3b2a Fix a mismatch between tprintf format string and args 2021-10-14 20:56:48 +03:00
Amit Dovev
a268c3092f Thresholding: Change the window and tile size parameters to relative numbers
They are relative to the pixel density of the image.
2021-10-14 20:21:28 +03:00
Amit D
0d5705fe50 ThresholdMethod enum: AdaptiveOtsu -> LeptonicaOtsu (#3593) 2021-10-13 15:03:39 +03:00
Amit D
7f349a47b6 Fix a bug in the thresholder 2021-10-11 19:29:39 +03:00
Stefan Weil
d935502b48 Fix two LGTM alerts (Comparison between i of type int16_t and wider type int32_t)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 21:37:04 +02:00
Stefan Weil
4a56136d34 Disable conditional which is currently always false (reported by LGTM)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 21:21:12 +02:00
Stefan Weil
cc085f6bd6 Fix format string (reported by LGTM)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 21:19:58 +02:00
Stefan Weil
988102c41d Disable incomplete code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:11:57 +02:00
Stefan Weil
842cca1d49 Fix more signed/unsigned compiler warnings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:11:57 +02:00
Stefan Weil
86d981eee6 wordrec: Fix some signed/unsigned compiler warnings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
cb10da06be training: Fix some signed/unsigned compiler warnings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
5cce7342e5 textord: Fix some signed/unsigned compiler warnings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
3bb8263b3e lstm: Fix some signed/unsigned compiler warnings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
a274f4a531 dict: Fix some signed/unsigned compiler warnings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
bcc71c675a classify: Fix some signed/unsigned compiler warnings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
e1d7a21559 ccutil: Fix some signed/unsigned compiler warnings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
97048fe3e4 ccstruct: Fix some signed/unsigned compiler warnings
Remove also a local buffer in function REJMAP::print.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
2e4bb8f5d7 genericvector: Change function size to return unsigned value
Sizes are generally unsigned in the C++ standard library,
and following this standard makes code changes easier.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
d040cce990 ccmain: Remove unused local variable
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
c8fd23d6dc ccmain: Fix more signed/unsigned compiler warnings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
3a4828bcf4 ccmain: Fix some signed/unsigned compiler warnings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
a9c3f6d87f ccmain/paragraphs: Make local function UnicodeFor and fix signed/unsigned
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:30 +02:00
Stefan Weil
4c36e2e29a Fix compiler warnings in TWERD::MergeBlobs and optimize code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:30 +02:00
Stefan Weil
0cdcd0f02b Remove unused code
Fixes: 766b7bd620 ("Don't drop words with low certainty")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:30 +02:00
Stefan Weil
ca0e68f046 Avoid implicit conversions from float to double
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:30 +02:00
Stefan Weil
9315d4c7e2 Change size and count arguments in TFile from int to size_t
This matches standard functions like fread, fwrite.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:30 +02:00
Stefan Weil
85cb6678fa Replace new / delete by std::unique_ptr and std::vector in class Classify
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 14:08:12 +02:00
Stefan Weil
5d903da1ce Replace new / delete by std::vector in class Wordrec
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 14:07:25 +02:00
Stefan Weil
467f24c0b6 Replace new / delete by std::vector in class Trie
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 14:06:48 +02:00
Stefan Weil
ed1100832c Replace new / delete by std::vector in class WERD_CHOICE
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 14:05:47 +02:00
Stefan Weil
0aad8b8619 Fix build with OpenCL and add namespace to OpenCL code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-06 07:51:03 +02:00
Amit D
0cb9c40528 Add configurable variables to control thresholding (#3577) 2021-09-29 23:17:22 +03:00
zdenop
ebb214c443 destroy temporary page_pix 2021-09-25 10:26:31 +02:00
Amit D
adaaef87a4 Fix wrong tiles parameters in Sauvola (#3570)
Thanks to Robert Sachunsky @bertsky that pointed out the issue.
2021-09-23 10:26:07 +03:00
Merlijn Wajer
ca177e72f3 hocrrenderer: write scan_res property to the ocr_page
This will make Tesseract emit the DPI of the document, if known at OCR
time. This is required to properly interpret the x_fsize (font size)
property of words, since Tesseract scales the font size to the DPI.

See issue #3326 (https://github.com/tesseract-ocr/tesseract/issues/3326)
2021-09-21 11:02:52 +02:00
Stefan Weil
638045133f Simplify function LoadTrainingData and fix mastertrainer_test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-17 08:24:50 +02:00
Stefan Weil
d87e08f266 Fix crash of shapeclustering (fixes #3564)
Fixes: 4415209fd6 ("Remove tessopt. This fixes mastertrainer test in shared build")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-16 22:31:09 +02:00
Stefan Weil
e5e12f2856 Disable HAVE_FRAMEWORK_ACCELERATE for compilers which fail to compile with it
g++-10 and g++-11 throw compiler errors in builds with the
Accelerate framework, so disable it for all GNU compilers
before version 12 (which still has to be tested).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-06 17:15:46 +02:00
Stefan Weil
ec87dd4d49 Abort LSTM training with integer model (fixes issue #1573)
Tesseract currently cannot continue LSTM training from an
integer (fast) model.

Report this to users who try it nevertheless instead of crashing
with an assertion.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-06 08:18:55 +02:00
Stefan Weil
a027dca007 Extend URI support for Tesseract with libcurl
libcurl not only supports HTTP and HTTPS, but also a lot of other protocols,
for example FTP and SFTP. Those protocols can also be useful for Tesseract.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-05 16:49:22 +02:00
Stefan Weil
7fc9a34f79 Rename processed TIFF output file and add page number if needed (fixes issue #3544)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-01 14:16:05 +02:00
Robert Pösel
40fdacd485 Add missing check for __ARM_NEON
This makes it consistent with intsimdmatrixneon.cpp file and allows having this file included in builds even for non-NEON platforms (simplifies build config).
2021-08-26 15:28:59 +02:00
Stefan Weil
4dcd8fa591 Fix handling of TESSDATA_PREFIX containing // (fixes issue #3527)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-24 20:05:54 +02:00
Stefan Weil
391e713ae8 Use model prefix also for submodels
Fix also a regression in the for loop which handles submodels.

Fixes: 0d91c700c0 ("Modernize code in Tesseract::init_tesseract")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-24 13:41:00 +02:00
Stefan Weil
0d91c700c0 Modernize code in Tesseract::init_tesseract
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-23 07:30:03 +02:00
Egor Pugin
1d3d1fbc62 Move member function bodies into class template. 2021-08-20 12:42:40 +03:00
Egor Pugin
c539328d7d Merge branch 'master' of github.com-egorpugin:tesseract-ocr/tesseract 2021-08-20 12:38:12 +03:00
Egor Pugin
407346246c [universalambigs] Use inline variables. 2021-08-20 12:38:03 +03:00
Stefan Weil
7acda5cb6c Fix cloning of Image with pix_ == nullptr (issue #537)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-18 19:22:23 +02:00
Egor Pugin
6056c84977 [image] Mark PIX** cast explicit to prevent implicit bool checks in ternary operators. 2021-08-18 18:14:47 +03:00
Stefan Weil
59271470b4 Remove unneeded type cast
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-12 20:55:14 +02:00
Stefan Weil
aaec341449 Avoid call of ColumnFinder::DisplayBlocks (small optimization)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-12 15:23:44 +02:00
Stefan Weil
6da7d6fcda Optimize check for non empty string and fix code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-12 14:45:22 +02:00
Stefan Weil
92cae8f194 Optimize check for non empty string
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-12 14:44:45 +02:00
Stefan Weil
3ef403c345 Compile LSTM::PrintW and LSTM::PrintDW conditionally
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-10 22:04:57 +02:00
Stefan Weil
5d99041f5d Remove unused function Wordrec::merge_fragments
Remove also more functions which are now also unused.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-10 22:04:57 +02:00
Stefan Weil
f1c8df0ce9 Remove unused global variable fx_debug
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-10 22:04:57 +02:00
Stefan Weil
16fd1439fa Write image filename in ALTO output
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-07 22:14:03 +02:00
Stefan Weil
5f10fed5d9 Reduce size of TessResultRenderer
Changing the order reduces the size from 72 to 64 bytes
on 64 bit Linux.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-07 22:14:03 +02:00
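The size reduction reported in the entry above comes from member ordering and alignment padding. A minimal sketch of the effect follows; the two structs and their members are hypothetical and are not the real TessResultRenderer layout. On a typical 64-bit Linux ABI the first struct occupies 32 bytes and the second 24, but the exact numbers depend on the platform.

```cpp
#include <cstddef>

// Hypothetical layout: small members alternate with pointers, so the
// compiler inserts padding after each bool to align the next pointer.
struct Interleaved {
  bool flag_a;
  void *ptr_a;
  bool flag_b;
  void *ptr_b;
};

// Same members, reordered: the pointers come first and the two bools
// share a single padded slot at the end.
struct Grouped {
  void *ptr_a;
  void *ptr_b;
  bool flag_a;
  bool flag_b;
};

// Number of bytes saved by the reordering on the current platform.
constexpr std::size_t BytesSaved() {
  return sizeof(Interleaved) - sizeof(Grouped);
}
```

Printing sizeof for both structs on the target platform shows the actual saving; no member is removed, only moved.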
Stefan Weil
a73e7b97a4 Add float dotproduct implementation for NEON
Signed-off-by: Stefan Weil <stefan.weil@bib.uni-mannheim.de>
2021-08-03 10:35:22 +02:00
Stefan Weil
bb4a1219d7 Improve setting of dot product functions via environment variable
Apply the settings which are selected by environment variable DOTPRODUCT
after the autodetection which detects the available SIMD hardware.

'accelerate', 'fma' and 'std::inner_product' now no longer change
the setting for intSimdMatrix to 'generic' because they don't provide
their own implementation for it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-03 10:34:33 +02:00
Stefan Weil
edcf4fcd3b Fix comment
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-01 13:17:45 +02:00
Stefan Weil
0d0f203509 Add new configure option --enable-float32 for faster LSTM with float
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-29 06:49:08 +02:00
Stefan Weil
553ab64d8d Rename UnicityTable<T>::get_id to UnicityTable<T>::get_index
This prepares replacing UnicityTable<FontInfo> by FontInfoTable.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-26 07:59:58 +02:00
Stefan Weil
df1295ea6b Simplify *_VAR_H macros (#3508)
This avoids duplicate (and potentially inconsistent) code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-25 12:09:07 +03:00
Ger Hobbelt
27597883db Implement DotProductSSE() for FAST_FLOAT
[sw] Formatted commit message
2021-07-24 15:14:17 +02:00
Ger Hobbelt
79e8b4f344 bugfixing the AVX2 Extract8+16 codes
There's lines like `__m256d scale01234567 = _mm256_loadu_ps(scales)`,
i.e. loading float vectors into double vector types.

[sw] Formatted commit message
2021-07-24 15:14:17 +02:00
Ger Hobbelt
24a29b79e5 bugfix of FMA port to FAST_FLOAT
8 float FPs fit in a single 256bit vector (8x32)
(contrasting 4 double FPs: 4*64)

[sw] Format commit message and use float instead of TFloat
2021-07-24 15:14:17 +02:00
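The lane arithmetic in the entry above (8 floats versus 4 doubles per 256-bit vector) can be written out as a compile-time computation. This is only an illustration: the helper name LanesPerVector is hypothetical, and it assumes the usual 32-bit float and 64-bit double.

```cpp
#include <climits>
#include <cstddef>

// Width of one vector register in bits, as stated in the commit message.
constexpr std::size_t kVectorBits = 256;

// Hypothetical helper: how many elements of type T fit in one vector.
template <typename T>
constexpr std::size_t LanesPerVector() {
  return kVectorBits / (sizeof(T) * CHAR_BIT);
}
```

A loop ported from double to float therefore has to step by LanesPerVector<float>() elements per iteration instead of LanesPerVector<double>().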
Stefan Weil
472f5d9020 Add TFloat data type for neural network
Up to now Tesseract used double for training and recognition
with "best" models.

This commit replaces double by a new data type TFloat which
is double by default, but float if FAST_FLOAT is defined.

Ideally this should allow faster training.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-24 15:14:17 +02:00
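The type switch described in the entry above can be sketched as follows. Only the names TFloat and FAST_FLOAT come from the commit message; the function DotProductSketch is a hypothetical example of code written against the new type, and the real header may be organized differently.

```cpp
// TFloat is double by default and float when FAST_FLOAT is defined.
#ifdef FAST_FLOAT
using TFloat = float;
#else
using TFloat = double;
#endif

// Hypothetical example: code written against TFloat compiles unchanged
// for either precision.
inline TFloat DotProductSketch(const TFloat *u, const TFloat *v, int n) {
  TFloat total = 0;
  for (int i = 0; i < n; ++i) {
    total += u[i] * v[i];
  }
  return total;
}
```

Building with -DFAST_FLOAT would select float without touching the function body.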
Stefan Weil
66b77e6639 Prepare using float instead of double for LSTM calculations
The new header file ccutils/tesstypes.h also prepares support
for larger images by introducing a new data type for image
size and coordinates (still unused).

FloatToDouble is now a local function.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-24 13:59:37 +02:00
Stefan Weil
4df822a3fc Revert "Merge pull request #3330 from Sintun/master" (#3505)
This reverts commit 122daf1d64, reversing
changes made to 4cd56dc5f5.

Those changes caused two regressions which resulted in an assertion
or a segmentation fault.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-22 09:04:23 +03:00
Stefan Weil
e176169a90 Remove stray spaces at line endings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:59:15 +02:00
Ger Hobbelt
444fe14273 Fix a couple of 'shadowed local variables' compiler warnings
These fixes got through while I manually extracted the template work
from my mainline (warnings due to running MSVC at Level 4)

[sw]: Format commit message and use different fix for blamer.cpp

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:49:03 +02:00
Stefan Weil
0fc6d8d7f0 Add missing hint for dotproduct parameter value "fma"
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:44:29 +02:00
Ger Hobbelt
f72d4b1fe7 NEON arch: dead ref cycle fix
When neon_available_ is ON, the DotProduct was set to point to DotProduct,
which should have been DotProductNative, as dotProduct is the *target* global itself:
see simddetect.h --> effectively making that part of the SetDotProduct() call
identical to this (no-op) statement: `DotProduct = DotProduct;`

Also added the Neon check in the Update() API, where it exists together
with the other checks (for AVX/SSE/etc.)

[sw: formatted commit message and merged into main branch]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:40:16 +02:00
Stefan Weil
dff7312aed Modernize code in SIMDDetect::Update
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:16:49 +02:00
Stefan Weil
3ab8dcbf72 Use Apple Accelerate framework for training and best models
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 19:27:54 +02:00
Johannes Künsebeck
3be11f12a9 Removed unused parameters declarations and definitions 2021-07-20 15:08:10 +02:00
zdenop
8dd7936475 Solve clang reporting unused variable in ExtractMicros function (#3501)
* mark attribute as unused for compiler
* try c++17 standard https://en.cppreference.com/w/cpp/language/attributes/maybe_unused
2021-07-18 01:59:49 +02:00
nagadomi
7fe0624838 Fix spec string of convolution layer (#3499) 2021-07-16 18:21:52 +03:00
Stefan Weil
88d4028a5a Enable pragma for SIMD also when _OPENMP is defined
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-15 16:03:43 +02:00
Stefan Weil
f0fb6809e3 Use SIMD instructions for DotProductNative
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-14 19:13:01 +02:00
Tadahito Yao
12e0fb4e01 Fix deadlock in lstmtraining. (#3488) 2021-07-10 10:59:10 +03:00
Stefan Weil
767fb5a177 Fix LSTMTrainerTest.BidiTest
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-04 18:41:19 +02:00
Stefan Weil
158c845228 Catch another FP division by 0 (fixes issue #3483)
Rewriting the code avoids FP operations (so makes it potentially faster)
and fixes the division by 0.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-03 15:37:24 +02:00
Stefan Weil
4b630a8813 Catch FP division by 0 (fixes issue #3483)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-02 15:04:31 +02:00
Stefan Weil
a701454ae5 Fix vector resize with init for all elements (issue #3473) (#3474)
Fixes: c8b8d266d6
Fixes: 9710bc0465
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-06-29 21:05:29 +03:00
nagadomi
ff1062d39d Add --reset_learning_rate option to lstmtraining (#3470)
When the --reset_learning_rate option is specified,
it resets the learning rate stored in each layer of the network
loaded with --continue_from to the value specified by the --learning_rate option.
If checkpoint is available, it does nothing.
2021-06-28 11:48:07 +03:00
nagadomi
d8bd78f8e2 Fix missing reset of best_error_history_ in LSTMTrainer::InitIterations() (#3469) 2021-06-27 09:26:32 +03:00
nagadomi
b2fa77f8f0 Show layer specified learning rates with combine_tessdata -l (#3468) 2021-06-26 08:08:54 +03:00
MonkeybreadSoftware
75e6c3ea4c Null check for GetSourceYResolution (#3457)
* Null check for GetSourceYResolution

Added missing NULL check to avoid crash when we read property in our tesseract wrapper.

* Added missing return value.

added -1 to return if undefined.
2021-06-16 16:35:24 +03:00
Amit Dovev
bf979c801a Remove unused variable 2021-05-21 20:34:09 +03:00
Egor Pugin
a72408fdef Merge pull request #3438 from amitdo/pango
Raise Minimum required Pango version to 1.38.0
2021-05-21 20:09:27 +03:00
Amit Dovev
8615f65cc4 Raise Minimum required Pango version to 1.38.0 2021-05-21 19:56:37 +03:00
Amit Dovev
c24538518c ThresholdMethod::TiledSauvola -> ThresholdMethod::Sauvola
The fact that this method uses tiles is an implementation detail. It does not change the result compared to Sauvola without tiles. The use of tiles minimizes memory consumption.
2021-05-21 18:15:30 +03:00
Stefan Weil
93348a83a3 Remove scripts for training
They were replaced by Python3 scripts (part of the tesstrain repository).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-18 10:47:44 +02:00
nagadomi
42e4b91132 Refactor ObjectCache::DeleteUnusedObjects with reverse iterator 2021-05-17 14:50:30 +02:00
nagadomi
dc4a8a6ce0 Fix crash in ObjectCache::DeleteUnusedObjects 2021-05-17 10:25:17 +09:00
Stefan Weil
0c4e2f1cb5 Fix comment in code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-16 07:47:19 +02:00
Stefan Weil
57b7974292 Remove an arbitrary limit for the image size
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-15 15:03:22 +02:00
Stefan Weil
a0cf117c5d Fix compiler warning in binarization code (uninitialized local variable)
Simplify the code also a little bit.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-15 15:03:22 +02:00
Stefan Weil
bf84fb9f2d Optimize code for binarization
Some code is only needed for Otsu or even not at all.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-15 15:03:22 +02:00
Stefan Weil
4b5dd25b84 Fix compiler warning
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-15 15:03:22 +02:00
Stefan Weil
12c29639fc Add conditional compilation with GRAPHICS_DISABLED
This fixes a compiler warning when GRAPHICS_DISABLED is defined.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-13 17:22:24 +02:00
Nick White
ad7010a5eb lstmeval: Only print char and word error rates for verbosity 2/3 2021-05-11 13:15:35 +01:00
Nick White
4787414d88 lstmeval: Print char and word error rates for each line tested 2021-05-11 10:54:34 +01:00
Nick White
9c82cc63c2 Switch to NFC normalisation everywhere 2021-05-11 10:18:06 +01:00
Egor Pugin
43747d6ea8 Postfix for #3418. 2021-05-10 15:06:27 +03:00
Egor Pugin
e7c01a6f15 Merge pull request #3418 from amitdo/thresholder
Add more binarization options
2021-05-10 14:45:03 +03:00
Amit Dovev
21e76c7a13 Convert enum ThreshMethod to enum class 2021-05-09 18:49:09 +03:00
Egor Pugin
176d0927bd Allow explicit casts of Image to Pix**. 2021-05-07 21:30:42 +03:00
Amit Dovev
11c73c9481 Add more binarization options
Use functions from Leptonica to provide more binarization options. The new options are: 1) Adaptive Otsu and 2) Sauvola (Tiled).
2021-05-07 16:48:26 +03:00
Egor Pugin
65118b2e3a [misc] Fix variable type. Fixes warning. 2021-05-04 16:12:40 +03:00
Egor Pugin
346b77c94e Remove unneeded header. 2021-05-04 16:10:52 +03:00
Egor Pugin
4fbe9f1de2 Revert d6cdc52. Fixes #3412. 2021-05-04 00:51:39 +03:00
Ger Hobbelt
bd8adff829 fix compile error: PrintFontsTable() is for legacy builds only
# Conflicts:
#	googletest
2021-04-29 23:27:20 +02:00
Lucas Cimon
b852d658cb Adding --print-fonts-table parameter & tessedit_font_id configuration option 2021-04-29 11:25:40 +02:00
Stefan Weil
2e2a5b3ef4 Improved fix for issue #3405
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-27 22:15:36 +02:00
Stefan Weil
0b7fc068d2 Revert "Fix double free. Closes #3405."
This reverts commit 3997cf54d2.
It will be replaced by a simpler fix.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-27 22:15:18 +02:00
Egor Pugin
3a195e5b05 Misc. 2021-04-27 22:08:29 +03:00
Egor Pugin
3997cf54d2 Fix double free. Closes #3405. 2021-04-27 22:08:06 +03:00
Egor Pugin
e3ac1835e0 Remove unneeded ctor. 2021-04-23 04:26:18 +03:00
Egor Pugin
a7f938d28e Make FontSet just a vector. 2021-04-23 04:25:45 +03:00
Egor Pugin
4ae5a7d6b5 Properly init font set. 2021-04-23 04:05:59 +03:00
Egor Pugin
048e63c02b Replace FontSet struct with vector. It may be improved further (remove pointer?). 2021-04-23 02:38:25 +03:00
Egor Pugin
d6cdc521e5 Remove unused headers. 2021-04-23 02:06:06 +03:00
Stefan Weil
740d10b61b Fix issue #3404 (empty page regression)
The regression was caused by a bug in commit 5db92b26aa.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-22 20:51:23 +02:00
Stefan Weil
66a963b50a Remove two assertions which are triggered by fuzzing
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-20 19:04:49 +02:00
Stefan Weil
26c21a6db4 Fix some compiler warnings with GRAPHICS_DISABLED
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-20 07:58:31 +02:00
Stefan Weil
6d0595b443 Fix memory leak (OSS-Fuzz issue 33220)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-19 20:59:18 +02:00
Robert Pösel
c74ff1259b Fix wrong parameter name and documentation
set_only_init_params -> set_only_non_debug_params
2021-04-19 16:55:01 +02:00
Stefan Weil
2dfa38a072 Fix old TODO for struct EDGEPT
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-17 18:08:27 +02:00
Fabrizio Di Vittorio
2be896d2b9 Add SVSemaphore destructor to avoid system objects leaks 2021-04-15 09:23:22 +02:00
Stefan Weil
e6e871bc73 Replace pointer by value for ScrollView mutex
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-15 06:30:05 +02:00
Stefan Weil
4daf781916 Fix NULL pointer access (issue #3394)
The regression was caused by commit 57c90eee02.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-12 22:10:12 +02:00
Stefan Weil
91b2b4f4a0 Fix OSS-Fuzz issue 32142 (container-overflow write)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-12 13:45:12 +02:00
Stefan Weil
f83f00496e Clean, format and optimize code in edgblob.cpp / edgblob.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-12 08:03:30 +02:00
Egor Pugin
a732565cad Fix headers. 2021-04-12 01:40:40 +03:00
Egor Pugin
4f6ff85123 Remove unneeded header. 2021-04-12 01:19:00 +03:00
Egor Pugin
57c90eee02 [edgblob] Replace unique ptr with vector. Fix possible index issues.
Closes #1921.
2021-04-12 01:17:57 +03:00
Stefan Weil
cca46e6b29 Fix another use-after-free (issue #3394)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-11 21:37:46 +02:00
Stefan Weil
33fa9d3223 Fix use-after-free (issue #3394)
This bug was introduced by commit f77b1c6881.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-11 19:10:44 +02:00
Egor Pugin
423f00c351 Merge pull request #3393 from eighttails/fix_zero_division
Fix division by zero during CJK training.
2021-04-11 15:38:28 +03:00
Tadahito Yao
8a8204e62a Reverted one of zero value checks. 2021-04-11 21:30:02 +09:00
Tadahito Yao
05eef742df Fix division by zero during CJK training. 2021-04-11 20:14:45 +09:00
Stefan Weil
0401b9470c Fix some typos (most found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-11 11:06:36 +02:00
Stefan Weil
f77b1c6881 Fix memory leak (OSS-Fuzz issue #32246)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-10 21:35:31 +02:00
Amit D
a4a84c4c92 lstmrecognizer.cpp: Call OutputStats() only when 'invert' is true (#3387)
Co-authored-by: Stefan Weil <sw@weilnetz.de>
2021-04-08 17:55:23 +02:00
Amit Dovev
e6ce048426 Change message from 'Found SSE' to 'Found SSE4.1' 2021-04-08 17:51:09 +02:00
Stefan Weil
63f4463028 Add const attribute to some functions (API change)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-08 10:43:21 +02:00
Stefan Weil
253751c331 Simplify class REJ by replacing two std::bitset<16> by one std::bitset<32>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-08 10:43:21 +02:00
Stefan Weil
2fbcca783b Make more functions in class REJ inline
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-08 10:43:21 +02:00
Stefan Weil
a74bbb6032 Remove bits16.h and BITS16 data type
Add also const attribute to some functions.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-08 10:43:21 +02:00
Stefan Weil
2fa96b765b Modernize and optimize list_rec a little bit
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-07 17:30:33 +02:00
Stefan Weil
7fd90498ca Modernize code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-07 17:30:33 +02:00
Egor Pugin
edfce72340 Refactor microfeatures a bit. 2021-04-07 17:29:46 +03:00
Egor Pugin
47715e576a Replace microfeatures from oldlist to std::forward_list. 2021-04-07 17:10:16 +03:00
Egor Pugin
2e17ee7327 Correct template args. 2021-04-07 13:28:57 +03:00
Stefan Weil
10255d013a Fix new / delete class mismatch
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-07 09:25:37 +02:00
Egor Pugin
b1731b6e73 Add missing TESS_API. 2021-04-07 00:59:36 +03:00
Egor Pugin
6e3259593a Reorder list templates. 2021-04-07 00:29:07 +03:00
Egor Pugin
409aa5296f Misc. 2021-04-07 00:17:04 +03:00
Egor Pugin
9d40512ade [elist2] Convert macros to template. Remove source file macro ELIST2IZE. 2021-04-07 00:15:01 +03:00
Egor Pugin
03435adca0 [elist] Rework macro into template and small macro. Move common iterator template into 'list_iterator.h'. 2021-04-07 00:04:30 +03:00
Egor Pugin
b9329e599f Misc. 2021-04-06 23:45:28 +03:00
Egor Pugin
746b87363b Remove unused methods. 2021-04-06 23:45:22 +03:00
Egor Pugin
29e75d0f51 [elist] Remove unused macros QUOTE_IT. 2021-04-06 23:40:56 +03:00
Egor Pugin
539f4b8255 [clist] Remove unused methods. 2021-04-06 23:40:35 +03:00
Egor Pugin
18e61d10ce Rework big clist macro into template and small macro. Remove unused macros QUOTE_IT and CLISTIZE (source file macro). 2021-04-06 23:37:14 +03:00
Raf Schietekat
6bbfef7c85 RAII: TessBaseAPI::GetIterator()
Co-authored-by: Stefan Weil <sw@weilnetz.de>
2021-04-06 17:57:23 +02:00
Raf Schietekat
d71413f4aa RAII: TessBaseAPI::AnalyseLayout()
Co-authored-by: Stefan Weil <sw@weilnetz.de>
2021-04-06 17:46:26 +02:00
Stefan Weil
897e59613d Clean code for hOCR renderer
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-06 16:36:23 +02:00
Stefan Weil
3705989c94 Optimize length method for ELIST, ELIST2
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-06 15:57:12 +02:00
Stefan Weil
4104876b08 Add const attribute to some methods of ELIST, ELIST2 and related classes
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-06 15:48:18 +02:00
Stefan Weil
fb904d2265 Remove redundant debug code for CLIST
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-06 15:26:04 +02:00
Stefan Weil
b47ce5643b Modernize CLIST code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-06 15:16:57 +02:00
Stefan Weil
fd187b0c18 Optimize CLIST
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-06 15:08:35 +02:00
Stefan Weil
4a628729b2 Delete assignment and copy constructor for ELIST
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-05 19:59:31 +02:00
Stefan Weil
b0b5600c30 Delete assignment and copy constructor for ELIST2, ELIST2_LINK
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-05 19:59:00 +02:00
Stefan Weil
24f91fab0b Delete assignment and copy constructor for CLIST, CLIST_LINK
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-05 19:42:01 +02:00
Stefan Weil
eeb67e8ae8 Replace find / insert by insert on unordered set to optimize GridSearch
Both find and insert can be slow for a large unordered set.

Instead of using both methods, it is sufficient to simply try only
the insert method which returns whether the insertion was possible
or not.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-05 18:11:33 +02:00
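The optimization described in the entry above relies on std::unordered_set::insert returning a pair whose second member tells whether the element was newly inserted. A minimal sketch, assuming a hypothetical helper MarkVisited that is not part of the Tesseract sources:

```cpp
#include <unordered_set>

// Hypothetical helper: returns true the first time `item` is seen and
// false on every later call with the same pointer.
inline bool MarkVisited(std::unordered_set<const void *> &seen,
                        const void *item) {
  // insert() returns {iterator, bool}. The bool is true only when the
  // element was not present before, so a preceding find() is redundant.
  return seen.insert(item).second;
}
```

This performs one hash lookup per element instead of two, which is where the saving for a large set comes from.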
Egor Pugin
50aec308b3 Remove unnecessary pointer hasher for uset. 2021-04-04 14:00:46 +03:00
Stefan Weil
0611c892b6 Disable more code with GRAPHICS_DISABLED
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-02 16:43:26 +02:00
Egor Pugin
7a73875bd1 Merge pull request #3375 from amitdo/viewer
Disable more code with GRAPHICS_DISABLED
2021-04-02 12:27:24 +03:00
Amit Dovev
6d94b22c80 Disable more code with GRAPHICS_DISABLED 2021-04-02 11:12:38 +03:00
Egor Pugin
34e0d017ab Add Image::operator&=(). 2021-04-01 19:15:58 +03:00
Egor Pugin
9e3da4a724 Add Image::operator|=(). 2021-04-01 19:10:48 +03:00
Egor Pugin
e077b7255d Remove arg from Image::copy(). 2021-04-01 19:08:47 +03:00
Egor Pugin
d5fb7f9843 Init variable. 2021-04-01 17:16:46 +03:00
Egor Pugin
fe02ba2363 Add Image::isZero(). 2021-04-01 17:15:48 +03:00
Egor Pugin
306d296979 Add Image::clone(). 2021-04-01 17:06:30 +03:00
Egor Pugin
2aca22439e Add Image::copy(). 2021-04-01 16:55:43 +03:00
Stefan Weil
5159f9aa12 Fix name conflict between class and function named Image
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-01 14:00:08 +02:00
Egor Pugin
e429b607ae [misc] Update header guard. 2021-04-01 01:36:22 +03:00
Egor Pugin
1628a9aae3 Revert 4fa05b9147. Make a note. 2021-04-01 01:35:50 +03:00
Egor Pugin
a792b67983 Basic usage of new Image class. Only pixDestroy is wrapped at the moment.
Add new methods to Image class and replace them in non-public code.
2021-03-31 22:39:43 +03:00
Egor Pugin
ce6e2f1821 Initial tesseract Image wrapper.
Provide basic Pix conversions.
Add destroy() method.

It can be extended later to 1) image owner (raii), 2) different image libraries.
2021-03-31 22:38:32 +03:00
Egor Pugin
4fa05b9147 Remove unused ifdef. 2021-03-31 21:54:12 +03:00
Stefan Weil
722767633e Partially fix issue #3374
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-31 19:23:07 +02:00
Stefan Weil
b7c6d971f3 Fix some compiler warnings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-31 07:08:53 +02:00
Stefan Weil
6684a727c1 Improve some structs further (fixes several CID issues)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-30 14:20:52 +02:00
Nick White
abea25ee2f lstm: Include missing header 2021-03-29 18:53:35 +02:00
Stefan Weil
2e349dbba5 Fix compilation for Tensorflow code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-29 16:19:06 +02:00
Stefan Weil
3c03d70e64 Fix some compiler warnings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-29 16:12:52 +02:00
Stefan Weil
f639500a81 Add missing TESS_API for sw builds
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-29 11:34:23 +02:00
Stefan Weil
5c4de14567 Replace strdup / free by std::string in SVSync::StartProcess
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-29 11:24:58 +02:00
Stefan Weil
3790413cc5 Replace remaining malloc / free in training code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-29 11:24:58 +02:00
Stefan Weil
7c1bea505a Replace strdup / free by std::string for StringRenderer::features_
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-29 11:24:58 +02:00
Stefan Weil
201686feb8 Use lept_free instead of free for memory which was allocated by Leptonica
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-29 10:55:33 +02:00
Stefan Weil
1b95eb1d19 Replace malloc / free by std::string for LABELEDLISTNODE
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-29 10:29:08 +02:00
Stefan Weil
1620daffcd Replace malloc / free by std::string in LABELEDLISTNODE and MERGE_CLASS_NODE
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-29 10:17:42 +02:00
Stefan Weil
0976e23387 Replace malloc / free by new / delete for KDTREE
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 23:19:46 +02:00
Stefan Weil
c05d849381 Replace malloc / free by new / delete for NORM_PROTOS
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 22:37:47 +02:00
Stefan Weil
174210c849 Replace malloc / free by new / delete for MFEDGEPT
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 22:24:51 +02:00
Stefan Weil
0c3d244238 Replace new / delete by std::vector for INT_CLASS_STRUCT::ProtoLengths
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 22:09:06 +02:00
Stefan Weil
486c257f42 Replace malloc / free by new / delete for MICROFEATURE
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 21:20:59 +02:00
Stefan Weil
30f44f333a Replace malloc / free by new / delete for KDNODE
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 21:11:22 +02:00
Stefan Weil
47a1fd7b45 Replace malloc / free by new / delete for INT_CLASS_STRUCT::ProtoLengths
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 20:41:37 +02:00
Stefan Weil
d6caae3793 Replace malloc / free by std::vector for BUCKETS
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 20:32:57 +02:00
Stefan Weil
78f8a47d05 Replace malloc / free by std::vector for PROTOTYPE::Distrib
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 20:31:35 +02:00
Stefan Weil
b8488dac7a Replace malloc / free for TEMPCLUSTER
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 20:31:35 +02:00
Stefan Weil
2a569c9cfb Replace malloc / free for FLOATUNION::Elliptical
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 20:31:35 +02:00
Stefan Weil
5bf1af257c Use std::vector<BIT_VECTOR> for CLASS_STRUCT::Configurations
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 20:31:35 +02:00
Stefan Weil
6f499f7fb5 Use std::vector<PROTO_STRUCT> for CLASS_STRUCT::Prototypes
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 20:31:35 +02:00
Stefan Weil
441f74c1e6 Replace malloc / free for STATISTICS
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 20:31:35 +02:00
Stefan Weil
57d3a1eb99 Replace malloc / free for CLUSTER::Mean and PROTOTYPE::Mean
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 20:31:32 +02:00
Stefan Weil
667eee2344 Replace malloc / free for CLIST
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 12:12:18 +02:00
Stefan Weil
0077bc46cf Replace malloc / free for ELIST2
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 12:12:18 +02:00
Stefan Weil
2c273c1b3b Replace malloc / free for ELIST
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 12:12:18 +02:00
Stefan Weil
582260a9bf Replace malloc / free for C_OUTLINE::steps
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 12:12:18 +02:00
Stefan Weil
b15b5d1de7 Replace malloc / free by new / delete for FEATURE_STRUCT, FEATURE_SET_STRUCT
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 12:12:18 +02:00
Stefan Weil
aa8dda89a3 Replace malloc / free by new / delete for CHAR_DESC_STRUCT
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-27 18:43:14 +01:00
Stefan Weil
0f90ccb9cd Replace malloc / free by new / delete for CHISTRUCT
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-27 16:45:14 +01:00
Stefan Weil
0a46866bcd Replace malloc / free by new / delete for PERM_CONFIG_STRUCT
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-27 16:19:40 +01:00