zdenop
3ca273f914
cmake: silence message about changed behaviour
2021-10-28 12:07:53 +02:00
Stefan Weil
5cc649e5f9
Remove code which is wrong in combination with NFC
...
See comments in https://github.com/tesseract-ocr/tesseract/pull/3420 .
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-27 18:52:03 +02:00
Stefan Weil
5cee9a0cec
Merge remote-tracking branch 'nickjwhite/nfc'
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-27 18:40:02 +02:00
Stefan Weil
c602624012
Prepare support for image width and height larger than 32767 (continued)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-27 08:58:31 +02:00
Stefan Weil
59fbad0dd5
Prepare support for image width and height larger than 32767
...
Avoid using int16_t and use a new data type TDimension where needed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-27 08:45:33 +02:00
Stefan Weil
56f54c24de
Fix heap use after free (issue #3523)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-26 19:19:59 +02:00
Amit D
cea2a6015e
Thresholding: Improve some debug messages
2021-10-26 19:09:06 +03:00
Stefan Weil
d6de055acf
Set default language for tesseract only if required
...
When running with --list-langs, --print-parameters or --print-fonts-table
no default language is needed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-26 11:05:06 +02:00
Stefan Weil
f5d22d0bcc
Don't set a default language in TessBaseAPI::Init (API change)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-26 11:05:06 +02:00
zdenop
48c5d426ca
Merge pull request #3609 from stweil/api
...
Remove TessBaseAPI::InitLangMod (API change)
2021-10-26 07:23:52 +02:00
Stefan Weil
255d7c9675
Fix CID 1400763 Using invalid iterator (fixes issue #2806)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-25 22:20:45 +02:00
Stefan Weil
c2df9ce57b
Remove Tesseract::init_tesseract_lm which is no longer used
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-25 21:28:23 +02:00
Stefan Weil
5738c44d40
Remove TessBaseAPI::InitLangMod (API change)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-25 21:28:23 +02:00
Stefan Weil
cdd19d561b
Remove old comment
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-25 21:26:32 +02:00
Amit Dovev
0aeb2e7913
Thresholding: Change smooth scaling logic
...
As suggested by @bertsky.
2021-10-15 19:34:39 +03:00
Amit D
9a1ad4333e
Apply suggestions from code review
...
Extend help message for 2 parameters
Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
2021-10-15 18:14:49 +03:00
Amit D
0d2d6e3b2a
Fix a mismatch between tprintf format string and args
2021-10-14 20:56:48 +03:00
Amit Dovev
a268c3092f
Thresholding: Change the window and tile size parameters to relative numbers
...
They are relative to the pixel density of the image.
2021-10-14 20:21:28 +03:00
Amit D
0d5705fe50
ThresholdMethod enum: AdaptiveOtsu -> LeptonicaOtsu (#3593)
2021-10-13 15:03:39 +03:00
Amit D
7f349a47b6
Fix a bug in the thresholder
2021-10-11 19:29:39 +03:00
Stefan Weil
d935502b48
Fix two LGTM alerts (Comparison between i of type int16_t and wider type int32_t)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 21:37:04 +02:00
Stefan Weil
4a56136d34
Disable conditional which is currently always false (reported by LGTM)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 21:21:12 +02:00
Stefan Weil
cc085f6bd6
Fix format string (reported by LGTM)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 21:19:58 +02:00
Stefan Weil
988102c41d
Disable incomplete code
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:11:57 +02:00
Stefan Weil
842cca1d49
Fix more signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:11:57 +02:00
Stefan Weil
86d981eee6
wordrec: Fix some signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
cb10da06be
training: Fix some signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
5cce7342e5
textord: Fix some signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
3bb8263b3e
lstm: Fix some signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
a274f4a531
dict: Fix some signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
bcc71c675a
classify: Fix some signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
e1d7a21559
ccutil: Fix some signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
97048fe3e4
ccstruct: Fix some signed/unsigned compiler warnings
...
Remove also a local buffer in function REJMAP::print.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
2e4bb8f5d7
genericvector: Change function size to return unsigned value
...
Sizes are generally unsigned in the C++ standard library,
and following this standard makes code changes easier.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
d040cce990
ccmain: Remove unused local variable
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
c8fd23d6dc
ccmain: Fix more signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
3a4828bcf4
ccmain: Fix some signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
a9c3f6d87f
ccmain/paragraphs: Make local function UnicodeFor and fix signed/unsigned
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:30 +02:00
Stefan Weil
4c36e2e29a
Fix compiler warnings in TWERD::MergeBlobs and optimize code
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:30 +02:00
Stefan Weil
0cdcd0f02b
Remove unused code
...
Fixes: 766b7bd620 ("Don't drop words with low certainty")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:30 +02:00
Stefan Weil
ca0e68f046
Avoid implicit conversions from float to double
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:30 +02:00
Stefan Weil
9315d4c7e2
Change size and count arguments in TFile from int to size_t
...
This matches standard functions like fread, fwrite.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:30 +02:00
Stefan Weil
85cb6678fa
Replace new / delete by std::unique_ptr and std::vector in class Classify
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 14:08:12 +02:00
Stefan Weil
5d903da1ce
Replace new / delete by std::vector in class Wordrec
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 14:07:25 +02:00
Stefan Weil
467f24c0b6
Replace new / delete by std::vector in class Trie
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 14:06:48 +02:00
Stefan Weil
ed1100832c
Replace new / delete by std::vector in class WERD_CHOICE
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 14:05:47 +02:00
Stefan Weil
0aad8b8619
Fix build with OpenCL and add namespace to OpenCL code
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-06 07:51:03 +02:00
Amit D
0cb9c40528
Add configurable variables to control thresholding (#3577)
2021-09-29 23:17:22 +03:00
zdenop
ebb214c443
destroy temporary page_pix
2021-09-25 10:26:31 +02:00
Amit D
adaaef87a4
Fix wrong tiles parameters in Sauvola (#3570)
...
Thanks to Robert Sachunsky (@bertsky), who pointed out the issue.
2021-09-23 10:26:07 +03:00
Merlijn Wajer
ca177e72f3
hocrrenderer: write scan_res property to the ocr_page
...
This will make Tesseract emit the DPI of the document, if known at OCR
time. This is required to properly interpret the x_fsize (font size)
property of words, since Tesseract scales the font size to the DPI.
See issue #3326 (https://github.com/tesseract-ocr/tesseract/issues/3326)
2021-09-21 11:02:52 +02:00
Stefan Weil
638045133f
Simplify function LoadTrainingData and fix mastertrainer_test
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-17 08:24:50 +02:00
Stefan Weil
d87e08f266
Fix crash of shapeclustering (fixes #3564)
...
Fixes: 4415209fd6 ("Remove tessopt. This fixes mastertrainer test in shared build")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-16 22:31:09 +02:00
Stefan Weil
e5e12f2856
Disable HAVE_FRAMEWORK_ACCELERATE for compilers which fail to compile with it
...
g++-10 and g++-11 throw compiler errors in builds with the
Accelerate framework, so disable it for all GNU compilers
before version 12 (which still has to be tested).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-06 17:15:46 +02:00
Stefan Weil
ec87dd4d49
Abort LSTM training with integer model (fixes issue #1573)
...
Tesseract currently cannot continue LSTM training from an
integer (fast) model.
Report this to users who try it nevertheless instead of crashing
with an assertion.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-06 08:18:55 +02:00
Stefan Weil
a027dca007
Extend URI support for Tesseract with libcurl
...
libcurl not only supports HTTP and HTTPS, but also a lot of other protocols,
for example FTP and SFTP. Those protocols can also be useful for Tesseract.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-05 16:49:22 +02:00
Stefan Weil
7fc9a34f79
Rename processed TIFF output file and add page number if needed (fixes issue #3544)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-01 14:16:05 +02:00
Robert Pösel
40fdacd485
Add missing check for __ARM_NEON
...
This makes it consistent with intsimdmatrixneon.cpp file and allows having this file included in builds even for non-NEON platforms (simplifies build config).
2021-08-26 15:28:59 +02:00
Stefan Weil
4dcd8fa591
Fix handling of TESSDATA_PREFIX containing // (fixes issue #3527)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-24 20:05:54 +02:00
Stefan Weil
391e713ae8
Use model prefix also for submodels
...
Fix also a regression in the for loop which handles submodels.
Fixes: 0d91c700c0 ("Modernize code in Tesseract::init_tesseract")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-24 13:41:00 +02:00
Stefan Weil
0d91c700c0
Modernize code in Tesseract::init_tesseract
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-23 07:30:03 +02:00
Egor Pugin
1d3d1fbc62
Move member function bodies into class template.
2021-08-20 12:42:40 +03:00
Egor Pugin
c539328d7d
Merge branch 'master' of github.com-egorpugin:tesseract-ocr/tesseract
2021-08-20 12:38:12 +03:00
Egor Pugin
407346246c
[universalambigs] Use inline variables.
2021-08-20 12:38:03 +03:00
Stefan Weil
7acda5cb6c
Fix cloning of Image with pix_ == nullptr (issue #537)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-18 19:22:23 +02:00
Egor Pugin
6056c84977
[image] Mark PIX** cast explicit to prevent implicit bool checks in ternary operators.
2021-08-18 18:14:47 +03:00
Stefan Weil
59271470b4
Remove unneeded type cast
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-12 20:55:14 +02:00
Stefan Weil
aaec341449
Avoid call of ColumnFinder::DisplayBlocks (small optimization)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-12 15:23:44 +02:00
Stefan Weil
6da7d6fcda
Optimize check for non empty string and fix code
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-12 14:45:22 +02:00
Stefan Weil
92cae8f194
Optimize check for non empty string
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-12 14:44:45 +02:00
Stefan Weil
3ef403c345
Compile LSTM::PrintW and LSTM::PrintDW conditionally
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-10 22:04:57 +02:00
Stefan Weil
5d99041f5d
Remove unused function Wordrec::merge_fragments
...
Remove also more functions which are now also unused.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-10 22:04:57 +02:00
Stefan Weil
f1c8df0ce9
Remove unused global variable fx_debug
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-10 22:04:57 +02:00
Stefan Weil
16fd1439fa
Write image filename in ALTO output
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-07 22:14:03 +02:00
Stefan Weil
5f10fed5d9
Reduce size of TessResultRenderer
...
Changing the order reduces the size from 72 to 64 bytes
on 64 bit Linux.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-07 22:14:03 +02:00
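
The size reduction described in this commit comes from member ordering and alignment padding. A minimal sketch of the effect, using two hypothetical structs (these are not the real members of TessResultRenderer): grouping the small members after the pointer-sized ones leaves less padding between them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout for illustration only. Each bool placed before a
// wider member forces padding up to that member's alignment.
struct Padded {
  bool flag_a;         // followed by padding up to pointer alignment
  void *ptr;
  bool flag_b;         // followed by padding up to int32 alignment
  std::int32_t count;
};

// Same members, widest first: the two bools share the tail of the struct.
struct Packed {
  void *ptr;
  std::int32_t count;
  bool flag_a;
  bool flag_b;
};

// Reordering never makes the struct larger on common ABIs.
static_assert(sizeof(Packed) <= sizeof(Padded),
              "reordered struct should not grow");
```

On a typical 64-bit Linux ABI the first struct usually occupies 24 bytes and the second 16, but the exact numbers depend on the platform.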
Stefan Weil
a73e7b97a4
Add float dotproduct implementation for NEON
...
Signed-off-by: Stefan Weil <stefan.weil@bib.uni-mannheim.de>
2021-08-03 10:35:22 +02:00
Stefan Weil
bb4a1219d7
Improve setting of dot product functions via environment variable
...
Apply the settings which are selected by environment variable DOTPRODUCT
after the autodetection which detects the available SIMD hardware.
'accelerate', 'fma' and 'std::inner_product' now no longer change
the setting for intSimdMatrix to 'generic' because they don't provide
their own implementation for it.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-03 10:34:33 +02:00
Stefan Weil
edcf4fcd3b
Fix comment
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-01 13:17:45 +02:00
Stefan Weil
0d0f203509
Add new configure option --enable-float32 for faster LSTM with float
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-29 06:49:08 +02:00
Stefan Weil
553ab64d8d
Rename UnicityTable<T>::get_id to UnicityTable<T>::get_index
...
This prepares replacing UnicityTable<FontInfo> by FontInfoTable.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-26 07:59:58 +02:00
Stefan Weil
df1295ea6b
Simplify *_VAR_H macros (#3508)
...
This avoids duplicate (and potentially inconsistent) code.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-25 12:09:07 +03:00
Ger Hobbelt
27597883db
Implement DotProductSSE() for FAST_FLOAT
...
[sw] Formatted commit message
2021-07-24 15:14:17 +02:00
Ger Hobbelt
79e8b4f344
bugfixing the AVX2 Extract8+16 codes
...
There are lines like `__m256d scale01234567 = _mm256_loadu_ps(scales)`,
i.e. loading float vectors into double vector types.
[sw] Formatted commit message
2021-07-24 15:14:17 +02:00
Ger Hobbelt
24a29b79e5
bugfix of FMA port to FAST_FLOAT
...
8 float FPs fit in a single 256bit vector (8x32)
(contrasting 4 double FPs: 4*64)
[sw] Format commit message and use float instead of TFloat
2021-07-24 15:14:17 +02:00
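
The lane arithmetic in this commit message (8 floats versus 4 doubles per 256-bit vector) can be checked at compile time. A small sketch; `kVectorBits` and `LanesPerVector` are hypothetical names introduced here, not Tesseract identifiers.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Width of one AVX/FMA register in bits.
constexpr std::size_t kVectorBits = 256;

// Number of elements of type T that fit into one 256-bit vector.
template <typename T>
constexpr std::size_t LanesPerVector() {
  return kVectorBits / (CHAR_BIT * sizeof(T));
}

// Assumes IEEE 754 single and double precision (32 and 64 bits).
static_assert(LanesPerVector<float>() == 8, "8 x 32 bit = 256 bit");
static_assert(LanesPerVector<double>() == 4, "4 x 64 bit = 256 bit");
```

A loop that was written for double and advances by 4 elements per iteration therefore has to advance by 8 when it is ported to float.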
Stefan Weil
472f5d9020
Add TFloat data type for neural network
...
Up to now Tesseract used double for training and recognition
with "best" models.
This commit replaces double by a new data type TFloat which
is double by default, but float if FAST_FLOAT is defined.
Ideally this should allow faster training.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-24 15:14:17 +02:00
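
A minimal sketch of the type switch this commit describes, assuming it is implemented as a plain alias selected by the FAST_FLOAT macro. The actual definition in the Tesseract sources may differ; `DotProductSketch` is a hypothetical function used only to show code written against the alias.

```cpp
#include <cassert>

// double by default, float if FAST_FLOAT is defined (as stated in the
// commit message; the exact form of the definition is an assumption).
#ifdef FAST_FLOAT
using TFloat = float;
#else
using TFloat = double;
#endif

// Hypothetical example: code written in terms of TFloat compiles
// unchanged for either precision.
TFloat DotProductSketch(const TFloat *u, const TFloat *v, int n) {
  TFloat total = 0;
  for (int i = 0; i < n; ++i) {
    total += u[i] * v[i];
  }
  return total;
}
```

Building with `-DFAST_FLOAT` would switch every use of `TFloat` to single precision without touching the calling code.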
Stefan Weil
66b77e6639
Prepare using float instead of double for LSTM calculations
...
The new header file ccutils/tesstypes.h also prepares support
for larger images by introducing a new data type for image
size and coordinates (still unused).
FloatToDouble is now a local function.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-24 13:59:37 +02:00
Stefan Weil
4df822a3fc
Revert "Merge pull request #3330 from Sintun/master" (#3505)
...
This reverts commit 122daf1d64, reversing changes made to 4cd56dc5f5.
Those changes caused two regressions which resulted in an assertion
or a segmentation fault.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-22 09:04:23 +03:00
Stefan Weil
e176169a90
Remove stray spaces at line endings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:59:15 +02:00
Ger Hobbelt
444fe14273
Fix a couple of 'shadowed local variables' compiler warnings
...
These fixes got through while I manually extracted the template work
from my mainline (warnings due to running MSVC at Level 4)
[sw]: Format commit message and use different fix for blamer.cpp
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:49:03 +02:00
Stefan Weil
0fc6d8d7f0
Add missing hint for dotproduct parameter value "fma"
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:44:29 +02:00
Ger Hobbelt
f72d4b1fe7
NEON arch: dead ref cycle fix
...
When neon_available_ is ON, DotProduct was set to point to DotProduct itself.
It should have been set to DotProductNative, because DotProduct is the *target* global
(see simddetect.h). That part of the SetDotProduct() call was therefore
identical to this (no-op) statement: `DotProduct = DotProduct;`
Also added the Neon check in the Update() API, where it exists together
with the other checks (for AVX/SSE/etc.)
[sw: formatted commit message and merged into main branch]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:40:16 +02:00
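
The self-assignment bug described in this commit can be sketched with a global function pointer. The names DotProduct and DotProductNative come from the commit message; the signature, the `SetDotProductFixed` helper and the function body are assumptions made for illustration.

```cpp
#include <cassert>

// Assumed signature; the real one is declared in simddetect.h.
using DotProductFunction = double (*)(const double *, const double *, int);

// Plain C++ fallback implementation.
double DotProductNative(const double *u, const double *v, int n) {
  double total = 0.0;
  for (int i = 0; i < n; ++i) {
    total += u[i] * v[i];
  }
  return total;
}

// The *target* global that callers invoke.
DotProductFunction DotProduct = nullptr;

// The buggy code effectively did `DotProduct = DotProduct;`, which leaves
// the pointer unchanged. The fix assigns an actual implementation.
void SetDotProductFixed() {
  DotProduct = DotProductNative;
}
```

Because the buggy assignment compiles cleanly on many compilers, the mistake only shows up at run time, when the pointer still holds whatever value it had before.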
Stefan Weil
dff7312aed
Modernize code in SIMDDetect::Update
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:16:49 +02:00
Stefan Weil
3ab8dcbf72
Use Apple Accelerate framework for training and best models
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 19:27:54 +02:00
Johannes Künsebeck
3be11f12a9
Removed unused parameters declarations and definitions
2021-07-20 15:08:10 +02:00
zdenop
8dd7936475
Solve clang reporting unused variable in ExtractMicros function (#3501)
...
* mark attribute as unused for compiler
* try c++17 standard https://en.cppreference.com/w/cpp/language/attributes/maybe_unused
2021-07-18 01:59:49 +02:00
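
A short sketch of the C++17 `[[maybe_unused]]` attribute referenced in this commit. `SumPositive` is a hypothetical function, not the ExtractMicros code: it has a local variable that is written but only read in debug builds, which is the kind of case where compilers report an unused variable.

```cpp
#include <cassert>

// Hypothetical example. `skipped` is only read inside the assert, which
// is compiled out when NDEBUG is defined, so the attribute tells the
// compiler that the variable is allowed to be unused.
int SumPositive(const int *values, int n) {
  [[maybe_unused]] int skipped = 0;
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    if (values[i] > 0) {
      sum += values[i];
    } else {
      ++skipped;
    }
  }
  assert(skipped <= n);
  return sum;
}
```

The attribute changes diagnostics only; it has no effect on the generated code.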
nagadomi
7fe0624838
Fix spec string of convolution layer (#3499)
2021-07-16 18:21:52 +03:00
Stefan Weil
88d4028a5a
Enable pragma for SIMD also when _OPENMP is defined
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-15 16:03:43 +02:00
Stefan Weil
f0fb6809e3
Use SIMD instructions for DotProductNative
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-14 19:13:01 +02:00
Tadahito Yao
12e0fb4e01
Fix deadlock in lstmtraining. (#3488)
2021-07-10 10:59:10 +03:00
Stefan Weil
767fb5a177
Fix LSTMTrainerTest.BidiTest
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-04 18:41:19 +02:00
Stefan Weil
158c845228
Catch another FP division by 0 (fixes issue #3483)
...
Rewriting the code avoids FP operations (so makes it potentially faster)
and fixes the division by 0.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-03 15:37:24 +02:00
Stefan Weil
4b630a8813
Catch FP division by 0 (fixes issue #3483)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-02 15:04:31 +02:00
Stefan Weil
a701454ae5
Fix vector resize with init for all elements (issue #3473) (#3474)
...
Fixes: c8b8d266d6
Fixes: 9710bc0465
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-06-29 21:05:29 +03:00
nagadomi
ff1062d39d
Add --reset_learning_rate option to lstmtraining (#3470)
...
When the --reset_learning_rate option is specified,
it resets the learning rate stored in each layer of the network
loaded with --continue_from to the value specified by the --learning_rate option.
If a checkpoint is available, it does nothing.
2021-06-28 11:48:07 +03:00
nagadomi
d8bd78f8e2
Fix missing reset of best_error_history_ in LSTMTrainer::InitIterations() (#3469)
2021-06-27 09:26:32 +03:00
nagadomi
b2fa77f8f0
Show layer specified learning rates with combine_tessdata -l (#3468)
2021-06-26 08:08:54 +03:00
MonkeybreadSoftware
75e6c3ea4c
Null check for GetSourceYResolution (#3457)
...
* Null check for GetSourceYResolution
Added missing NULL check to avoid a crash when we read the property in our Tesseract wrapper.
* Added missing return value.
Added -1 as the return value if undefined.
2021-06-16 16:35:24 +03:00
Amit Dovev
bf979c801a
Remove unused variable
2021-05-21 20:34:09 +03:00
Egor Pugin
a72408fdef
Merge pull request #3438 from amitdo/pango
...
Raise Minimum required Pango version to 1.38.0
2021-05-21 20:09:27 +03:00
Amit Dovev
8615f65cc4
Raise Minimum required Pango version to 1.38.0
2021-05-21 19:56:37 +03:00
Amit Dovev
c24538518c
ThresholdMethod::TiledSauvola -> ThresholdMethod::Sauvola
...
The fact that this method uses tiles is an implementation detail. It does not change the result compared to Sauvola without tiles. The use of tiles minimizes memory consumption.
2021-05-21 18:15:30 +03:00
Stefan Weil
93348a83a3
Remove scripts for training
...
They were replaced by Python3 scripts (part of the tesstrain repository).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-18 10:47:44 +02:00
nagadomi
42e4b91132
Refactor ObjectCache::DeleteUnusedObjects with reverse iterator
2021-05-17 14:50:30 +02:00
nagadomi
dc4a8a6ce0
Fix crash in ObjectCache::DeleteUnusedObjects
2021-05-17 10:25:17 +09:00
Stefan Weil
0c4e2f1cb5
Fix comment in code
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-16 07:47:19 +02:00
Stefan Weil
57b7974292
Remove an arbitrary limit for the image size
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-15 15:03:22 +02:00
Stefan Weil
a0cf117c5d
Fix compiler warning in binarization code (uninitialized local variable)
...
Simplify the code also a little bit.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-15 15:03:22 +02:00
Stefan Weil
bf84fb9f2d
Optimize code for binarization
...
Some code is only needed for Otsu or even not at all.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-15 15:03:22 +02:00
Stefan Weil
4b5dd25b84
Fix compiler warning
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-15 15:03:22 +02:00
Stefan Weil
12c29639fc
Add conditional compilation with GRAPHICS_DISABLED
...
This fixes a compiler warning when GRAPHICS_DISABLED is defined.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-13 17:22:24 +02:00
Nick White
ad7010a5eb
lstmeval: Only print char and word error rates for verbosity 2/3
2021-05-11 13:15:35 +01:00
Nick White
4787414d88
lstmeval: Print char and word error rates for each line tested
2021-05-11 10:54:34 +01:00
Nick White
9c82cc63c2
Switch to NFC normalisation everywhere
2021-05-11 10:18:06 +01:00
Egor Pugin
43747d6ea8
Postfix for #3418.
2021-05-10 15:06:27 +03:00
Egor Pugin
e7c01a6f15
Merge pull request #3418 from amitdo/thresholder
...
Add more binarization options
2021-05-10 14:45:03 +03:00
Amit Dovev
21e76c7a13
Convert enum ThreshMethod to enum class
2021-05-09 18:49:09 +03:00
Egor Pugin
176d0927bd
Allow explicit casts of Image to Pix**.
2021-05-07 21:30:42 +03:00
Amit Dovev
11c73c9481
Add more binarization options
...
Use functions from Leptonica to provide more binarization options. The new options are: 1) Adaptive Otsu and 2) Sauvola (Tiled).
2021-05-07 16:48:26 +03:00
Egor Pugin
65118b2e3a
[misc] Fix variable type. Fixes warning.
2021-05-04 16:12:40 +03:00
Egor Pugin
346b77c94e
Remove unneeded header.
2021-05-04 16:10:52 +03:00
Egor Pugin
4fbe9f1de2
Revert d6cdc52. Fixes #3412.
2021-05-04 00:51:39 +03:00
Ger Hobbelt
bd8adff829
fix compile error: PrintFontsTable() is for legacy builds only
...
# Conflicts:
# googletest
2021-04-29 23:27:20 +02:00
Lucas Cimon
b852d658cb
Adding --print-fonts-table parameter & tessedit_font_id configuration option
2021-04-29 11:25:40 +02:00
Stefan Weil
2e2a5b3ef4
Improved fix for issue #3405
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-27 22:15:36 +02:00
Stefan Weil
0b7fc068d2
Revert "Fix double free. Closes #3405."
...
This reverts commit 3997cf54d2.
It will be replaced by a simpler fix.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-27 22:15:18 +02:00
Egor Pugin
3a195e5b05
Misc.
2021-04-27 22:08:29 +03:00
Egor Pugin
3997cf54d2
Fix double free. Closes #3405.
2021-04-27 22:08:06 +03:00
Egor Pugin
e3ac1835e0
Remove unneeded ctor.
2021-04-23 04:26:18 +03:00
Egor Pugin
a7f938d28e
Make FontSet just a vector.
2021-04-23 04:25:45 +03:00
Egor Pugin
4ae5a7d6b5
Properly init font set.
2021-04-23 04:05:59 +03:00
Egor Pugin
048e63c02b
Replace FontSet struct with vector. It may be improved further (remove pointer?).
2021-04-23 02:38:25 +03:00
Egor Pugin
d6cdc521e5
Remove unused headers.
2021-04-23 02:06:06 +03:00
Stefan Weil
740d10b61b
Fix issue #3404 (empty page regression)
...
The regression was caused by a bug in commit 5db92b26aa.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-22 20:51:23 +02:00
Stefan Weil
66a963b50a
Remove two assertions which are triggered by fuzzing
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-20 19:04:49 +02:00
Stefan Weil
26c21a6db4
Fix some compiler warnings with GRAPHICS_DISABLED
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-20 07:58:31 +02:00
Stefan Weil
6d0595b443
Fix memory leak (OSS-Fuzz issue 33220)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-19 20:59:18 +02:00
Robert Pösel
c74ff1259b
Fix wrong parameter name and documentation
...
set_only_init_params -> set_only_non_debug_params
2021-04-19 16:55:01 +02:00
Stefan Weil
2dfa38a072
Fix old TODO for struct EDGEPT
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-17 18:08:27 +02:00
Fabrizio Di Vittorio
2be896d2b9
Add SVSemaphore destructor to avoid system objects leaks
2021-04-15 09:23:22 +02:00
Stefan Weil
e6e871bc73
Replace pointer by value for ScrollView mutex
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-15 06:30:05 +02:00
Stefan Weil
4daf781916
Fix NULL pointer access (issue #3394)
...
The regression was caused by commit 57c90eee02.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-12 22:10:12 +02:00
Stefan Weil
91b2b4f4a0
Fix OSS-Fuzz issue 32142 (container-overflow write)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-12 13:45:12 +02:00
Stefan Weil
f83f00496e
Clean, format and optimize code in edgblob.cpp / edgblob.h
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-12 08:03:30 +02:00
Egor Pugin
a732565cad
Fix headers.
2021-04-12 01:40:40 +03:00
Egor Pugin
4f6ff85123
Remove unneeded header.
2021-04-12 01:19:00 +03:00
Egor Pugin
57c90eee02
[edgblob] Replace unique ptr with vector. Fix possible index issues.
...
Closes #1921.
2021-04-12 01:17:57 +03:00
Stefan Weil
cca46e6b29
Fix another use-after-free (issue #3394)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-11 21:37:46 +02:00
Stefan Weil
33fa9d3223
Fix use-after-free (issue #3394)
...
This bug was introduced by commit f77b1c6881.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-11 19:10:44 +02:00
Egor Pugin
423f00c351
Merge pull request #3393 from eighttails/fix_zero_division
...
Fix division by zero during CJK training.
2021-04-11 15:38:28 +03:00
Tadahito Yao
8a8204e62a
Reverted one of zero value checks.
2021-04-11 21:30:02 +09:00
Tadahito Yao
05eef742df
Fix division by zero during CJK training.
2021-04-11 20:14:45 +09:00
Stefan Weil
0401b9470c
Fix some typos (most found by codespell)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-11 11:06:36 +02:00
Stefan Weil
f77b1c6881
Fix memory leak (OSS-Fuzz issue #32246)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-10 21:35:31 +02:00
Amit D
a4a84c4c92
lstmrecognizer.cpp: Call OutputStats() only when 'invert' is true (#3387)
...
Co-authored-by: Stefan Weil <sw@weilnetz.de>
2021-04-08 17:55:23 +02:00
Amit Dovev
e6ce048426
Change message from 'Found SSE' to 'Found SSE4.1'
2021-04-08 17:51:09 +02:00
Stefan Weil
63f4463028
Add const attribute to some functions (API change)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-08 10:43:21 +02:00
Stefan Weil
253751c331
Simplify class REJ by replacing two std::bitset<16> by one std::bitset<32>
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-08 10:43:21 +02:00
Stefan Weil
2fbcca783b
Make more functions in class REJ inline
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-08 10:43:21 +02:00
Stefan Weil
a74bbb6032
Remove bits16.h and BITS16 data type
...
Add also const attribute to some functions.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-08 10:43:21 +02:00
Stefan Weil
2fa96b765b
Modernize and optimize list_rec a little bit
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-07 17:30:33 +02:00
Stefan Weil
7fd90498ca
Modernize code
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-07 17:30:33 +02:00
Egor Pugin
edfce72340
Refactor microfeatures a bit.
2021-04-07 17:29:46 +03:00
Egor Pugin
47715e576a
Replace microfeatures from oldlist to std::forward_list.
2021-04-07 17:10:16 +03:00
Egor Pugin
2e17ee7327
Correct template args.
2021-04-07 13:28:57 +03:00
Stefan Weil
10255d013a
Fix new / delete class mismatch
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-07 09:25:37 +02:00
Egor Pugin
b1731b6e73
Add missing TESS_API.
2021-04-07 00:59:36 +03:00
Egor Pugin
6e3259593a
Reorder list templates.
2021-04-07 00:29:07 +03:00
Egor Pugin
409aa5296f
Misc.
2021-04-07 00:17:04 +03:00
Egor Pugin
9d40512ade
[elist2] Convert macros to template. Remove source file macro ELIST2IZE.
2021-04-07 00:15:01 +03:00
Egor Pugin
03435adca0
[elist] Rework macro into template and small macro. Move common iterator template into 'list_iterator.h'.
2021-04-07 00:04:30 +03:00
Egor Pugin
b9329e599f
Misc.
2021-04-06 23:45:28 +03:00
Egor Pugin
746b87363b
Remove unused methods.
2021-04-06 23:45:22 +03:00
Egor Pugin
29e75d0f51
[elist] Remove unused macros QUOTE_IT.
2021-04-06 23:40:56 +03:00
Egor Pugin
539f4b8255
[clist] Remove unused methods.
2021-04-06 23:40:35 +03:00
Egor Pugin
18e61d10ce
Rework big clist macro into template and small macro. Remove unused macros QUOTE_IT and CLISTIZE (source file macro).
2021-04-06 23:37:14 +03:00
Raf Schietekat
6bbfef7c85
RAII: TessBaseAPI::GetIterator()
...
Co-authored-by: Stefan Weil <sw@weilnetz.de>
2021-04-06 17:57:23 +02:00
Raf Schietekat
d71413f4aa
RAII: TessBaseAPI::AnalyseLayout()
...
Co-authored-by: Stefan Weil <sw@weilnetz.de>
2021-04-06 17:46:26 +02:00
Stefan Weil
897e59613d
Clean code for hOCR renderer
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-06 16:36:23 +02:00
Stefan Weil
3705989c94
Optimize length method for ELIST, ELIST2
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-06 15:57:12 +02:00
Stefan Weil
4104876b08
Add const attribute to some methods of ELIST, ELIST2 and related classes
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-06 15:48:18 +02:00
Stefan Weil
fb904d2265
Remove redundant debug code for CLIST
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-06 15:26:04 +02:00
Stefan Weil
b47ce5643b
Modernize CLIST code
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-06 15:16:57 +02:00
Stefan Weil
fd187b0c18
Optimize CLIST
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-06 15:08:35 +02:00
Stefan Weil
4a628729b2
Delete assignment and copy constructor for ELIST
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-05 19:59:31 +02:00
Stefan Weil
b0b5600c30
Delete assignment and copy constructor for ELIST2, ELIST2_LINK
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-05 19:59:00 +02:00
Stefan Weil
24f91fab0b
Delete assignment and copy constructor for CLIST, CLIST_LINK
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-05 19:42:01 +02:00
Stefan Weil
eeb67e8ae8
Replace find / insert by insert on unordered set to optimize GridSearch
...
Both find and insert can be slow for a large unordered set.
Instead of using both methods, it is sufficient to simply try only
the insert method which returns whether the insertion was possible
or not.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-05 18:11:33 +02:00
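
The optimization in this commit relies on the return value of `std::unordered_set::insert`. A minimal sketch of the pattern; `InsertIfNew` is a hypothetical helper, not the GridSearch code.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical helper: returns true if `value` was newly added and false
// if it was already present.
template <typename T>
bool InsertIfNew(std::unordered_set<T> &seen, const T &value) {
  // insert() returns a pair {iterator, bool}. The bool is true only when
  // the element was actually inserted, so a preceding find() is not needed.
  return seen.insert(value).second;
}
```

This replaces two lookups (find, then insert) with a single one while giving the caller the same "already seen?" answer.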
Egor Pugin
50aec308b3
Remove unnecessary pointer hasher for uset.
2021-04-04 14:00:46 +03:00
Stefan Weil
0611c892b6
Disable more code with GRAPHICS_DISABLED
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-02 16:43:26 +02:00
Egor Pugin
7a73875bd1
Merge pull request #3375 from amitdo/viewer
...
Disable more code with GRAPHICS_DISABLED
2021-04-02 12:27:24 +03:00
Amit Dovev
6d94b22c80
Disable more code with GRAPHICS_DISABLED
2021-04-02 11:12:38 +03:00
Egor Pugin
34e0d017ab
Add Image::operator&=().
2021-04-01 19:15:58 +03:00
Egor Pugin
9e3da4a724
Add Image::operator|=().
2021-04-01 19:10:48 +03:00
Egor Pugin
e077b7255d
Remove arg from Image::copy().
2021-04-01 19:08:47 +03:00
Egor Pugin
d5fb7f9843
Init variable.
2021-04-01 17:16:46 +03:00
Egor Pugin
fe02ba2363
Add Image::isZero().
2021-04-01 17:15:48 +03:00
Egor Pugin
306d296979
Add Image::clone().
2021-04-01 17:06:30 +03:00
Egor Pugin
2aca22439e
Add Image::copy().
2021-04-01 16:55:43 +03:00
Stefan Weil
5159f9aa12
Fix name conflict between class and function named Image
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-01 14:00:08 +02:00
Egor Pugin
e429b607ae
[misc] Update header guard.
2021-04-01 01:36:22 +03:00
Egor Pugin
1628a9aae3
Revert 4fa05b9147. Make a note.
2021-04-01 01:35:50 +03:00
Egor Pugin
a792b67983
Basic usage of new Image class. Only pixDestroy is wrapped at the moment.
...
Add new methods to Image class and replace them in non-public code.
2021-03-31 22:39:43 +03:00
Egor Pugin
ce6e2f1821
Initial tesseract Image wrapper.
...
Provide basic Pix conversions.
Add destroy() method.
It can be extended later to 1) image owner (raii), 2) different image libraries.
2021-03-31 22:38:32 +03:00
Egor Pugin
4fa05b9147
Remove unused ifdef.
2021-03-31 21:54:12 +03:00
Stefan Weil
722767633e
Partially fix issue #3374
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-31 19:23:07 +02:00
Stefan Weil
b7c6d971f3
Fix some compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-31 07:08:53 +02:00
Stefan Weil
6684a727c1
Improve some structs further (fixes several CID issues)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-30 14:20:52 +02:00
Nick White
abea25ee2f
lstm: Include missing header
2021-03-29 18:53:35 +02:00
Stefan Weil
2e349dbba5
Fix compilation for Tensorflow code
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-29 16:19:06 +02:00
Stefan Weil
3c03d70e64
Fix some compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-29 16:12:52 +02:00
Stefan Weil
f639500a81
Add missing TESS_API for sw builds
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-29 11:34:23 +02:00
Stefan Weil
5c4de14567
Replace strdup / free by std::string in SVSync::StartProcess
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-29 11:24:58 +02:00
Stefan Weil
3790413cc5
Replace remaining malloc / free in training code
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-29 11:24:58 +02:00
Stefan Weil
7c1bea505a
Replace strdup / free by std::string for StringRenderer::features_
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-29 11:24:58 +02:00
Stefan Weil
201686feb8
Use lept_free instead of free for memory which was allocated by Leptonica
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-29 10:55:33 +02:00
Stefan Weil
1b95eb1d19
Replace malloc / free by std::string for LABELEDLISTNODE
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-29 10:29:08 +02:00
Stefan Weil
1620daffcd
Replace malloc / free by std::string in LABELEDLISTNODE and MERGE_CLASS_NODE
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-29 10:17:42 +02:00
Stefan Weil
0976e23387
Replace malloc / free by new / delete for KDTREE
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 23:19:46 +02:00
Stefan Weil
c05d849381
Replace malloc / free by new / delete for NORM_PROTOS
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 22:37:47 +02:00
Stefan Weil
174210c849
Replace malloc / free by new / delete for MFEDGEPT
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 22:24:51 +02:00
Stefan Weil
0c3d244238
Replace new / delete by std::vector for INT_CLASS_STRUCT::ProtoLengths
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 22:09:06 +02:00
Stefan Weil
486c257f42
Replace malloc / free by new / delete for MICROFEATURE
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 21:20:59 +02:00
Stefan Weil
30f44f333a
Replace malloc / free by new / delete for KDNODE
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 21:11:22 +02:00
Stefan Weil
47a1fd7b45
Replace malloc / free by new / delete for INT_CLASS_STRUCT::ProtoLengths
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 20:41:37 +02:00
Stefan Weil
d6caae3793
Replace malloc / free by std::vector for BUCKETS
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 20:32:57 +02:00
Stefan Weil
78f8a47d05
Replace malloc / free by std::vector for PROTOTYPE::Distrib
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 20:31:35 +02:00
Stefan Weil
b8488dac7a
Replace malloc / free for TEMPCLUSTER
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 20:31:35 +02:00
Stefan Weil
2a569c9cfb
Replace malloc / free for FLOATUNION::Elliptical
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 20:31:35 +02:00
Stefan Weil
5bf1af257c
Use std::vector<BIT_VECTOR> for CLASS_STRUCT::Configurations
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 20:31:35 +02:00
Stefan Weil
6f499f7fb5
Use std::vector<PROTO_STRUCT> for CLASS_STRUCT::Prototypes
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 20:31:35 +02:00
Stefan Weil
441f74c1e6
Replace malloc / free for STATISTICS
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 20:31:35 +02:00
Stefan Weil
57d3a1eb99
Replace malloc / free for CLUSTER::Mean and PROTOTYPE::Mean
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 20:31:32 +02:00
Stefan Weil
667eee2344
Replace malloc / free for CLIST
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 12:12:18 +02:00
Stefan Weil
0077bc46cf
Replace malloc / free for ELIST2
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 12:12:18 +02:00
Stefan Weil
2c273c1b3b
Replace malloc / free for ELIST
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 12:12:18 +02:00
Stefan Weil
582260a9bf
Replace malloc / free for C_OUTLINE::steps
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 12:12:18 +02:00
Stefan Weil
b15b5d1de7
Replace malloc / free by new / delete for FEATURE_STRUCT, FEATURE_SET_STRUCT
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-28 12:12:18 +02:00
Stefan Weil
aa8dda89a3
Replace malloc / free by new / delete for CHAR_DESC_STRUCT
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-27 18:43:14 +01:00
Stefan Weil
0f90ccb9cd
Replace malloc / free by new / delete for CHISTRUCT
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-27 16:45:14 +01:00
Stefan Weil
0a46866bcd
Replace malloc / free by new / delete for PERM_CONFIG_STRUCT
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-27 16:19:40 +01:00