Stefan Weil
5f27310d22
Fix some compiler warnings with --disable-legacy
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-20 14:57:08 +01:00
Amit D
47abbaa48f
Training: Fix compiler warnings (#3650)
...
warning: format ‘%c’ expects argument of type ‘int’, but argument 2 has type ‘tesseract::Validator::CharClass’ [-Wformat=]
2021-11-19 21:01:04 +02:00
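The warning above comes from passing a scoped enum where printf's `%c` expects an `int`. A minimal sketch of the usual fix, using a hypothetical `CharClass` stand-in (the real enumerators of `tesseract::Validator::CharClass` are not shown in this log), converts the value explicitly before the call:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for tesseract::Validator::CharClass.
enum class CharClass { kConsonant = 'C', kVowel = 'V' };

// Writes the class letter into buf. A scoped enum is not promoted to int
// when passed through "...", so -Wformat reports a mismatch with %c;
// static_cast<int> makes the argument type match the format.
int FormatCharClass(char *buf, std::size_t size, CharClass cc) {
  return std::snprintf(buf, size, "%c", static_cast<int>(cc));
}
```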
Stefan Weil
455feb35f2
Replace char error by BCER in more training messages
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-18 21:34:16 +01:00
Stefan Weil
981c167f8c
Improve result message from lstmeval
...
Old message:
At iteration 0, stage 0, BCER eval=2.553356, BWER eval=5.586173
New message:
BCER eval=2.553356, BWER eval=5.586173
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-17 09:02:49 +01:00
Stefan Weil
c716ebdc42
Improve training messages (issue #3560) (#3644)
...
The old messages could wrongly be interpreted as CER / WER values,
but Tesseract training currently uses simple bag of characters /
bag of words error rates (see LSTMTrainer::ComputeCharError,
LSTMTrainer::ComputeWordError).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-17 09:39:23 +02:00
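To illustrate why a bag-of-characters error rate (BCER) differs from a classic CER, here is a minimal sketch. It compares character counts and ignores order. It works on bytes and clamps the result to [0, 1]; both are simplifying assumptions, and this is not the code of LSTMTrainer::ComputeCharError.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <map>
#include <string>

// Bag-of-characters error rate: sum of per-character count differences
// between truth and OCR text, divided by the truth length.
double BagOfCharsErrorRate(const std::string &truth, const std::string &ocr) {
  std::map<char, int> counts;
  for (char c : truth) ++counts[c];  // characters expected
  for (char c : ocr) --counts[c];    // characters produced
  int errors = 0;
  for (const auto &entry : counts) errors += std::abs(entry.second);
  if (truth.empty()) return errors == 0 ? 0.0 : 1.0;  // avoid division by 0
  const double rate = static_cast<double>(errors) / truth.size();
  return rate > 1.0 ? 1.0 : rate;  // clamp to [0, 1]
}
```

With this definition "abc" versus "cba" scores 0 (same bag of characters), while a single substitution such as "abc" versus "abd" counts twice (one missing and one extra character), giving 2/3 where an edit-distance CER would give 1/3.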
Stefan Weil
ef3bf98cc1
lstmtrainer: Fix comment
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-15 20:19:54 +01:00
Stefan Weil
83ad8a18de
Clean code with clang-tidy (performance-move-const-arg)
...
Command used:
clang-tidy --checks="-*,performance-move-const-arg"
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-15 20:18:29 +01:00
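A small illustration of one pattern that performance-move-const-arg reports: `std::move` on a `const` reference produces a `const` rvalue, which binds to the copy overload, so the intended move is really a copy. The function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// std::move(s) has type const std::string&&, which cannot bind to
// push_back(std::string&&); it binds to push_back(const std::string&)
// instead, so this silently copies. clang-tidy flags this call.
std::vector<std::string> StoreCopy(const std::string &s) {
  std::vector<std::string> v;
  v.push_back(std::move(s));
  return v;
}

// The fix for a const argument: drop the misleading std::move.
std::vector<std::string> StoreCopyFixed(const std::string &s) {
  std::vector<std::string> v;
  v.push_back(s);
  return v;
}
```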
Stefan Weil
f48620fffb
scrollview: Add const attributes
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-15 20:17:59 +01:00
Stefan Weil
f0b8c0254b
stepblob: Fix some warnings from clang-tidy
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-14 16:40:38 +01:00
Stefan Weil
25cdca6492
combine_tessdata: Print "Version:" instead of "Version string:"
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-14 16:38:52 +01:00
Stefan Weil
d8d63fd71b
Optimize performance with clang-tidy
...
The code was partially formatted with clang-format and optimized with
clang-tidy --checks="-*,perfor*" --fix src/*/*.cpp
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-14 15:54:04 +01:00
Stefan Weil
e5011c545a
Remove unused function ScrollView::AwaitEventAnyWindow
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-14 12:10:37 +01:00
Stefan Weil
37b33749da
ScrollView: Fix memory leak and modernize code
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-14 10:34:20 +01:00
Stefan Weil
371ee2232e
Remove spaces at line endings and empty last lines
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-13 22:45:47 +01:00
Stefan Weil
e18826cfab
Fix some compiler warnings and modernize code in class TrainingSampleSet
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-13 22:33:22 +01:00
Stefan Weil
6360e60877
Modernize code in TessBaseAPI::Init
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-13 21:43:46 +01:00
Stefan Weil
03f2cfdf02
Show tessdata directory when listing models
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-13 21:43:01 +01:00
Stefan Weil
c2ee0cd06f
Fix listing of languages
...
The last fix for OCR with more than one model introduced
a regression for `tesseract --list-langs`.
Fixes: 9091055783
("Fix loading of additional model files")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-13 21:34:29 +01:00
Stefan Weil
ebce8ab2eb
combine_tessdata: Support -dl and -ld options
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-13 11:33:10 +01:00
Stefan Weil
9091055783
Fix loading of additional model files (issue #3635)
...
Also modernize a for loop statement.
Fixes: d6de055acf
("Set default language for tesseract only if required")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-10 20:34:06 +01:00
Amit D
827900675b
Don't add a page separator for a single page image (#3632)
...
This change was requested in issue #3628.
2021-11-08 20:49:49 +01:00
Stefan Weil
2fbe4f54bb
Fix out-of-memory in fuzzer-api (oss-fuzz issue #39185)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-07 13:49:30 +01:00
Stefan Weil
183bb3f519
Use TDimension for arguments of make_edgept
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-06 10:01:22 +01:00
Stefan Weil
6c7cfe41cc
Remove some unneeded type casts
...
Those type casts were also wrong for large image support.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-06 10:01:22 +01:00
Amit D
8865fefdba
Improve the disable legacy build ( #3627 )
...
Undo API changes done in e9b8b840bf.
2021-11-04 18:26:15 +02:00
Amit D
e9b8b840bf
Improve the disable legacy build (#3624)
...
Disable more code related to equation detection and OSD.
2021-11-03 19:15:15 +01:00
Stefan Weil
62bfbf5aa4
Use bool instead of int8_t for boolean variable
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-03 11:22:14 +01:00
Stefan Weil
333f7bfc5c
Use bool instead of int for boolean variable
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-03 11:02:30 +01:00
Stefan Weil
87a5689f8d
Format code with clang-format
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-03 10:57:40 +01:00
Stefan Weil
a91ea10924
Optimize function ApproximateOutline
...
The compiler can now inline several functions which are
only used in this compilation unit.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-03 10:53:35 +01:00
Stefan Weil
17e795aaae
Add missing include statement for INT_MIN, INT_MAX
...
Fixes: c6b25f3b6e
("Add assertions in IntCastRounded")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-02 10:20:37 +01:00
Stefan Weil
c6b25f3b6e
Add assertions in IntCastRounded
...
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=39185 could be
caused by an integer overflow in IntCastRounded.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-02 07:52:31 +01:00
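A sketch of an assertion-guarded rounding cast in the spirit of this commit and its follow-up 17e795aaae (which added the include for INT_MIN and INT_MAX). The name `IntCastRoundedChecked` is hypothetical and the body is not Tesseract's: it rounds half away from zero with `std::round` and asserts that the result fits into `int` before converting, because converting an out-of-range double to `int` is undefined behaviour.

```cpp
#include <cassert>
#include <climits>  // INT_MIN, INT_MAX
#include <cmath>

// Round to the nearest int (halfway cases away from zero), asserting that
// the input is finite and the rounded result is representable as int.
inline int IntCastRoundedChecked(double x) {
  assert(std::isfinite(x));
  const double rounded = std::round(x);
  assert(rounded >= INT_MIN && rounded <= INT_MAX);
  return static_cast<int>(rounded);
}
```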
Stefan Weil
565d3912c6
Fix compiler warnings with -Wformat-security
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-01 22:58:56 +01:00
Stefan Weil
a5f2f90c8d
Fix legacy build
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-01 08:34:34 +01:00
Stefan Weil
104ef8f30e
Move src/api/tesseractmain.cpp to src/tesseract.cpp
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-31 21:43:30 +01:00
Stefan Weil
c0b529f2e1
Move declaration of ThresholdMethod from public API to thresholder.h
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-30 20:15:25 +02:00
Stefan Weil
97cd07f2a0
Add format attributes
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-30 19:55:27 +02:00
Stefan Weil
68017dbf2a
lstmtraining: Handle missing traineddata with error message (fix issue #1075)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-30 12:27:35 +02:00
Stefan Weil
ca9ea78494
Format code with clang-format
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-30 09:42:41 +02:00
Stefan Weil
57af712f2f
Fix some compiler warnings for unused parameters
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-30 09:39:05 +02:00
Stefan Weil
20203de8d9
Fix format strings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-30 09:37:30 +02:00
Stefan Weil
b4b2cacd40
Avoid segmentation fault with classify_enable_adaptive_matcher == false (issue #256)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-29 19:42:34 +02:00
Stefan Weil
612ff9b7e8
Fix sw build error by using TESS_API for global variable log_level
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-28 22:13:21 +02:00
Stefan Weil
b4e4e00653
Fix two memory leaks in LineFinder::FindAndRemoveLines
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-28 21:09:46 +02:00
Stefan Weil
1f8835d731
Fix compiler error in try / catch statement
...
Fixes: 1a6c298696
("Add new command line option --loglevel")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-28 20:55:46 +02:00
Stefan Weil
69e0a02399
Remove banner message completely
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-28 20:43:23 +02:00
Stefan Weil
491e60296c
Add missing include statement
...
Fixes: 1a6c298696
("Add new command line option --loglevel")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-28 15:14:55 +02:00
Amit D
fe16277fad
Disable music staff detection and removal
...
Change the default value of pageseg_apply_music_mask to false. See #1255.
2021-10-28 15:04:27 +02:00
Stefan Weil
73a1bfc4e8
Run ReCachePages synchronously during training (fix issue #3111)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-28 14:40:27 +02:00
Stefan Weil
1a6c298696
Add new command line option --loglevel
...
By default, some less important log messages are now suppressed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-28 14:26:48 +02:00
zdenop
3ca273f914
cmake: silence message about changed behaviour
2021-10-28 12:07:53 +02:00
Stefan Weil
5cc649e5f9
Remove code which is wrong in combination with NFC
...
See comments in https://github.com/tesseract-ocr/tesseract/pull/3420.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-27 18:52:03 +02:00
Stefan Weil
5cee9a0cec
Merge remote-tracking branch 'nickjwhite/nfc'
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-27 18:40:02 +02:00
Stefan Weil
c602624012
Prepare support for image width and height larger than 32767 (continued)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-27 08:58:31 +02:00
Stefan Weil
59fbad0dd5
Prepare support for image width and height larger than 32767
...
Avoid using int16_t and use a new data type TDimension where needed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-27 08:45:33 +02:00
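Why `int16_t` caps image sizes: its maximum is 32767, so a wider page cannot be represented. The sketch below assumes `TDimension` is a 32-bit signed type; this log does not show its actual definition, so the alias here is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical definition for illustration only.
using TDimension = int32_t;

static_assert(std::numeric_limits<int16_t>::max() == 32767,
              "int16_t cannot represent dimensions above 32767");
static_assert(std::numeric_limits<TDimension>::max() > 32767,
              "the wider type can");

// Narrowing 40000 into int16_t cannot preserve the value (the result is
// implementation-defined before C++20 and wraps on common compilers).
int16_t StoreWidthNarrow(int width) { return static_cast<int16_t>(width); }
TDimension StoreWidthWide(int width) { return static_cast<TDimension>(width); }
```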
Stefan Weil
56f54c24de
Fix heap use after free (issue #3523)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-26 19:19:59 +02:00
Amit D
cea2a6015e
Thresholding: Improve some debug messages
2021-10-26 19:09:06 +03:00
Stefan Weil
d6de055acf
Set default language for tesseract only if required
...
When running with --list-langs, --print-parameters or --print-fonts-table
no default language is needed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-26 11:05:06 +02:00
Stefan Weil
f5d22d0bcc
Don't set a default language in TessBaseAPI::Init (API change)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-26 11:05:06 +02:00
zdenop
48c5d426ca
Merge pull request #3609 from stweil/api
...
Remove TessBaseAPI::InitLangMod (API change)
2021-10-26 07:23:52 +02:00
Stefan Weil
255d7c9675
Fix CID 1400763 Using invalid iterator (fixes issue #2806)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-25 22:20:45 +02:00
Stefan Weil
c2df9ce57b
Remove Tesseract::init_tesseract_lm which is no longer used
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-25 21:28:23 +02:00
Stefan Weil
5738c44d40
Remove TessBaseAPI::InitLangMod (API change)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-25 21:28:23 +02:00
Stefan Weil
cdd19d561b
Remove old comment
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-25 21:26:32 +02:00
Amit Dovev
0aeb2e7913
Thresholding: Change smooth scaling logic
...
As suggested by @bertsky.
2021-10-15 19:34:39 +03:00
Amit D
9a1ad4333e
Apply suggestions from code review
...
Extend help message for 2 parameters
Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
2021-10-15 18:14:49 +03:00
Amit D
0d2d6e3b2a
Fix a mismatch between tprintf format string and args
2021-10-14 20:56:48 +03:00
Amit Dovev
a268c3092f
Thresholding: Change the window and tile size parameters to relative numbers
...
They are relative to the pixel density of the image.
2021-10-14 20:21:28 +03:00
Amit D
0d5705fe50
ThresholdMethod enum: AdaptiveOtsu -> LeptonicaOtsu (#3593)
2021-10-13 15:03:39 +03:00
Amit D
7f349a47b6
Fix a bug in the thresholder
2021-10-11 19:29:39 +03:00
Stefan Weil
d935502b48
Fix two LGTM alerts (Comparison between i of type int16_t and wider type int32_t)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 21:37:04 +02:00
Stefan Weil
4a56136d34
Disable conditional which is currently always false (reported by LGTM)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 21:21:12 +02:00
Stefan Weil
cc085f6bd6
Fix format string (reported by LGTM)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 21:19:58 +02:00
Stefan Weil
988102c41d
Disable incomplete code
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:11:57 +02:00
Stefan Weil
842cca1d49
Fix more signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:11:57 +02:00
Stefan Weil
86d981eee6
wordrec: Fix some signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
cb10da06be
training: Fix some signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
5cce7342e5
textord: Fix some signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
3bb8263b3e
lstm: Fix some signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
a274f4a531
dict: Fix some signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
bcc71c675a
classify: Fix some signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
e1d7a21559
ccutil: Fix some signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
97048fe3e4
ccstruct: Fix some signed/unsigned compiler warnings
...
Also remove a local buffer in function REJMAP::print.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
2e4bb8f5d7
genericvector: Change function size to return unsigned value
...
Sizes are generally unsigned in the C++ standard library,
and following this standard makes code changes easier.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
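Much of the signed/unsigned cleanup in this series follows one pattern. A minimal sketch with hypothetical functions, using `std::vector` in place of `GenericVector`: index with the container's unsigned size type so that `i < v.size()` no longer mixes signed and unsigned operands. The second function shows the hazard behind the warning: a negative `int` converted to an unsigned size becomes a huge value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// With `for (int i = 0; i < v.size(); ++i)` the comparison mixes int and
// size_t and triggers -Wsign-compare; an unsigned index avoids it.
long SumAll(const std::vector<int> &v) {
  long sum = 0;
  for (std::size_t i = 0; i < v.size(); ++i) {
    sum += v[i];
  }
  return sum;
}

// -1 becomes SIZE_MAX when converted to std::size_t, so it compares as
// larger than any container size.
bool MinusOneExceedsSize(const std::vector<int> &v) {
  return static_cast<std::size_t>(-1) > v.size();
}
```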
Stefan Weil
d040cce990
ccmain: Remove unused local variable
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
c8fd23d6dc
ccmain: Fix more signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
3a4828bcf4
ccmain: Fix some signed/unsigned compiler warnings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Stefan Weil
a9c3f6d87f
ccmain/paragraphs: Make local function UnicodeFor and fix signed/unsigned
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:30 +02:00
Stefan Weil
4c36e2e29a
Fix compiler warnings in TWERD::MergeBlobs and optimize code
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:30 +02:00
Stefan Weil
0cdcd0f02b
Remove unused code
...
Fixes: 766b7bd620
("Don't drop words with low certainty")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:30 +02:00
Stefan Weil
ca0e68f046
Avoid implicit conversions from float to double
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:30 +02:00
Stefan Weil
9315d4c7e2
Change size and count arguments in TFile from int to size_t
...
This matches standard functions like fread, fwrite.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:30 +02:00
Stefan Weil
85cb6678fa
Replace new / delete by std::unique_ptr and std::vector in class Classify
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 14:08:12 +02:00
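The ownership pattern behind these replacements, as a sketch with a hypothetical `ScoreTable` class (not one of the Tesseract classes named above): a `std::vector` member replaces a `new[]` / `delete[]` buffer and `std::unique_ptr` replaces a single owned `new`, so memory is released on every exit path without a hand-written destructor.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical class. Before such a change it might have held
//   int *scores_ = new int[n];   // plus delete[] scores_ in ~ScoreTable()
// Now the vector owns the storage and frees it automatically.
class ScoreTable {
 public:
  explicit ScoreTable(std::size_t n) : scores_(n, 0) {}
  int &at(std::size_t i) { return scores_.at(i); }
  std::size_t size() const { return scores_.size(); }

 private:
  std::vector<int> scores_;
};

// A single heap object: std::unique_ptr instead of raw new / delete.
std::unique_ptr<ScoreTable> MakeScoreTable(std::size_t n) {
  return std::make_unique<ScoreTable>(n);
}
```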
Stefan Weil
5d903da1ce
Replace new / delete by std::vector in class Wordrec
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 14:07:25 +02:00
Stefan Weil
467f24c0b6
Replace new / delete by std::vector in class Trie
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 14:06:48 +02:00
Stefan Weil
ed1100832c
Replace new / delete by std::vector in class WERD_CHOICE
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 14:05:47 +02:00
Stefan Weil
0aad8b8619
Fix build with OpenCL and add namespace to OpenCL code
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-06 07:51:03 +02:00
Amit D
0cb9c40528
Add configurable variables to control thresholding (#3577)
2021-09-29 23:17:22 +03:00
zdenop
ebb214c443
destroy temporary page_pix
2021-09-25 10:26:31 +02:00
Amit D
adaaef87a4
Fix wrong tiles parameters in Sauvola (#3570)
...
Thanks to Robert Sachunsky (@bertsky), who pointed out the issue.
2021-09-23 10:26:07 +03:00
Merlijn Wajer
ca177e72f3
hocrrenderer: write scan_res property to the ocr_page
...
This will make Tesseract emit the DPI of the document, if known at OCR
time. This is required to properly interpret the x_fsize (font size)
property of words, since Tesseract scales the font size to the DPI.
See issue #3326 (https://github.com/tesseract-ocr/tesseract/issues/3326).
2021-09-21 11:02:52 +02:00
Stefan Weil
638045133f
Simplify function LoadTrainingData and fix mastertrainer_test
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-17 08:24:50 +02:00
Stefan Weil
d87e08f266
Fix crash of shapeclustering (fixes #3564)
...
Fixes: 4415209fd6
("Remove tessopt. This fixes mastertrainer test in shared build")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-16 22:31:09 +02:00
Stefan Weil
e5e12f2856
Disable HAVE_FRAMEWORK_ACCELERATE for compilers which fail to compile with it
...
g++-10 and g++-11 throw compiler errors in builds with the
Accelerate framework, so disable it for all GNU compilers
before version 12 (which still has to be tested).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-06 17:15:46 +02:00
Stefan Weil
ec87dd4d49
Abort LSTM training with integer model (fixes issue #1573)
...
Tesseract currently cannot continue LSTM training from an
integer (fast) model.
Report this to users who try it anyway, instead of crashing
with an assertion.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-06 08:18:55 +02:00
Stefan Weil
a027dca007
Extend URI support for Tesseract with libcurl
...
libcurl not only supports HTTP and HTTPS, but also a lot of other protocols,
for example FTP and SFTP. Those protocols can also be useful for Tesseract.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-05 16:49:22 +02:00
Stefan Weil
7fc9a34f79
Rename processed TIFF output file and add page number if needed (fixes issue #3544)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-01 14:16:05 +02:00
Robert Pösel
40fdacd485
Add missing check for __ARM_NEON
...
This makes it consistent with the intsimdmatrixneon.cpp file and allows including this file in builds even for non-NEON platforms (which simplifies the build configuration).
2021-08-26 15:28:59 +02:00
Stefan Weil
4dcd8fa591
Fix handling of TESSDATA_PREFIX containing // (fixes issue #3527)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-24 20:05:54 +02:00
Stefan Weil
391e713ae8
Use model prefix also for submodels
...
Also fix a regression in the for loop which handles submodels.
Fixes: 0d91c700c0
("Modernize code in Tesseract::init_tesseract")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-24 13:41:00 +02:00
Stefan Weil
0d91c700c0
Modernize code in Tesseract::init_tesseract
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-23 07:30:03 +02:00
Egor Pugin
1d3d1fbc62
Move member function bodies into class template.
2021-08-20 12:42:40 +03:00
Egor Pugin
c539328d7d
Merge branch 'master' of github.com-egorpugin:tesseract-ocr/tesseract
2021-08-20 12:38:12 +03:00
Egor Pugin
407346246c
[universalambigs] Use inline variables.
2021-08-20 12:38:03 +03:00
Stefan Weil
7acda5cb6c
Fix cloning of Image with pix_ == nullptr (issue #537)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-18 19:22:23 +02:00
Egor Pugin
6056c84977
[image] Mark PIX** cast explicit to prevent implicit bool checks in ternary operators.
2021-08-18 18:14:47 +03:00
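A sketch of why an explicit conversion operator helps, using hypothetical stand-ins for `Image` and Leptonica's `PIX` (see also 176d0927bd, "Allow explicit casts of Image to Pix**"). With two implicit pointer conversions, a test such as `if (img)` is ambiguous, because both `Pix *` and `Pix **` convert to `bool`. Marking the `Pix **` conversion `explicit` leaves only the `Pix *` path in boolean contexts while still allowing `static_cast<Pix **>(img)`. The commit mentions ternary operators; the sketch shows the simpler `if` case.

```cpp
#include <cassert>

struct Pix {  // stand-in for Leptonica's PIX
  int w;
};

// Hypothetical wrapper holding a Pix pointer.
class Image {
 public:
  Image(Pix *pix = nullptr) : pix_(pix) {}
  // Implicit: lets an Image be used where a Pix * is expected; also the
  // only conversion considered for `if (img)` or `!img`.
  operator Pix *() const {
    return pix_;
  }
  // Explicit: reachable only via static_cast<Pix **>(img). If this were
  // implicit too, `if (img)` would be ambiguous.
  explicit operator Pix **() {
    return &pix_;
  }

 private:
  Pix *pix_;
};
```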
Stefan Weil
59271470b4
Remove unneeded type cast
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-12 20:55:14 +02:00
Stefan Weil
aaec341449
Avoid call of ColumnFinder::DisplayBlocks (small optimization)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-12 15:23:44 +02:00
Stefan Weil
6da7d6fcda
Optimize check for non empty string and fix code
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-12 14:45:22 +02:00
Stefan Weil
92cae8f194
Optimize check for non empty string
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-12 14:44:45 +02:00
Stefan Weil
3ef403c345
Compile LSTM::PrintW and LSTM::PrintDW conditionally
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-10 22:04:57 +02:00
Stefan Weil
5d99041f5d
Remove unused function Wordrec::merge_fragments
...
Also remove further functions which are now unused.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-10 22:04:57 +02:00
Stefan Weil
f1c8df0ce9
Remove unused global variable fx_debug
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-10 22:04:57 +02:00
Stefan Weil
16fd1439fa
Write image filename in ALTO output
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-07 22:14:03 +02:00
Stefan Weil
5f10fed5d9
Reduce size of TessResultRenderer
...
Changing the order of the members reduces the size from 72 to 64 bytes
on 64-bit Linux.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-07 22:14:03 +02:00
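The size reduction comes from padding. A hypothetical pair of structs (not the actual TessResultRenderer members) shows the effect: on a typical 64-bit ABI each `bool` placed before an 8-byte-aligned pointer costs a full 8-byte slot, while grouping the small members at the end lets them share one slot (32 versus 24 bytes here).

```cpp
#include <cassert>

// Hypothetical layouts, not the real TessResultRenderer members.
struct Unordered {
  bool flag1;         // 1 byte, then padding up to pointer alignment
  const char *name1;
  bool flag2;         // 1 byte, then padding again
  const char *name2;
};

struct Reordered {
  const char *name1;
  const char *name2;
  bool flag1;         // small members grouped at the end
  bool flag2;         // share a single padded slot
};
```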
Stefan Weil
a73e7b97a4
Add float dotproduct implementation for NEON
...
Signed-off-by: Stefan Weil <stefan.weil@bib.uni-mannheim.de>
2021-08-03 10:35:22 +02:00
Stefan Weil
bb4a1219d7
Improve setting of dot product functions via environment variable
...
Apply the settings which are selected by environment variable DOTPRODUCT
after the autodetection which detects the available SIMD hardware.
'accelerate', 'fma' and 'std::inner_product' now no longer change
the setting for intSimdMatrix to 'generic' because they don't provide
their own implementation for it.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-03 10:34:33 +02:00
Stefan Weil
edcf4fcd3b
Fix comment
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-08-01 13:17:45 +02:00
Stefan Weil
0d0f203509
Add new configure option --enable-float32 for faster LSTM with float
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-29 06:49:08 +02:00
Stefan Weil
553ab64d8d
Rename UnicityTable<T>::get_id to UnicityTable<T>::get_index
...
This prepares replacing UnicityTable<FontInfo> by FontInfoTable.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-26 07:59:58 +02:00
Stefan Weil
df1295ea6b
Simplify *_VAR_H macros (#3508)
...
This avoids duplicate (and potentially inconsistent) code.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-25 12:09:07 +03:00
Ger Hobbelt
27597883db
Implement DotProductSSE() for FAST_FLOAT
...
[sw] Formatted commit message
2021-07-24 15:14:17 +02:00
Ger Hobbelt
79e8b4f344
Fix bugs in the AVX2 Extract8+16 code
...
There are lines like `__m256d scale01234567 = _mm256_loadu_ps(scales)`,
i.e. loading float vectors into double vector types.
[sw] Formatted commit message
2021-07-24 15:14:17 +02:00
Ger Hobbelt
24a29b79e5
Fix bug in FMA port to FAST_FLOAT
...
A single 256-bit vector holds 8 floats (8 x 32 bits),
in contrast to 4 doubles (4 x 64 bits).
[sw] Format commit message and use float instead of TFloat
2021-07-24 15:14:17 +02:00
Stefan Weil
472f5d9020
Add TFloat data type for neural network
...
Up to now Tesseract used double for training and recognition
with "best" models.
This commit replaces double by a new data type TFloat which
is double by default, but float if FAST_FLOAT is defined.
Ideally this should allow faster training.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-24 15:14:17 +02:00
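The type switch described above can be sketched as one alias selected by the FAST_FLOAT macro; numerical code written against the alias then compiles unchanged for both precisions. This assumes a plain `#ifdef`; the actual definition may differ, and `DotProductSketch` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

#ifdef FAST_FLOAT
using TFloat = float;   // 4 bytes per value
#else
using TFloat = double;  // default: 8 bytes per value, as before
#endif

static_assert(std::is_floating_point<TFloat>::value,
              "TFloat must be a floating-point type");

// Written once against TFloat; precision follows the build configuration.
TFloat DotProductSketch(const std::vector<TFloat> &u,
                        const std::vector<TFloat> &v) {
  TFloat total = 0;
  for (std::size_t k = 0; k < u.size() && k < v.size(); ++k) {
    total += u[k] * v[k];
  }
  return total;
}
```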
Stefan Weil
66b77e6639
Prepare using float instead of double for LSTM calculations
...
The new header file ccutils/tesstypes.h also prepares support
for larger images by introducing a new data type for image
size and coordinates (still unused).
FloatToDouble is now a local function.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-24 13:59:37 +02:00
Stefan Weil
4df822a3fc
Revert "Merge pull request #3330 from Sintun/master" (#3505)
...
This reverts commit 122daf1d64, reversing
changes made to 4cd56dc5f5.
Those changes caused two regressions which resulted in an assertion
or a segmentation fault.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-22 09:04:23 +03:00
Stefan Weil
e176169a90
Remove stray spaces at line endings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:59:15 +02:00
Ger Hobbelt
444fe14273
Fix a couple of 'shadowed local variables' compiler warnings
...
These warnings slipped through when I manually extracted the template work
from my mainline (they show up when running MSVC at warning level 4).
[sw]: Format commit message and use different fix for blamer.cpp
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:49:03 +02:00
Stefan Weil
0fc6d8d7f0
Add missing hint for dotproduct parameter value "fma"
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:44:29 +02:00
Ger Hobbelt
f72d4b1fe7
NEON arch: dead ref cycle fix
...
When neon_available_ is ON, DotProduct was set to point to DotProduct,
which should have been DotProductNative, as DotProduct is the *target* global itself
(see simddetect.h). This effectively made that part of the SetDotProduct() call
identical to the no-op statement `DotProduct = DotProduct;`.
Also add the NEON check to the Update() API, next to
the other checks (for AVX/SSE/etc.).
[sw: formatted commit message and merged into main branch]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:40:16 +02:00
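The bug fixed here is a self-assignment of a global function pointer. A reduced sketch with hypothetical signatures (using `double` instead of Tesseract's own types) shows why it was a silent no-op: the dispatch pointer never changes, whereas assigning `DotProductNative` makes it callable.

```cpp
#include <cassert>

// Hypothetical signature; per the commit message the real global lives
// in simddetect.h.
using DotProductFunction = double (*)(const double *u, const double *v, int n);

// Portable fallback implementation.
double DotProductNative(const double *u, const double *v, int n) {
  double total = 0.0;
  for (int k = 0; k < n; ++k) {
    total += u[k] * v[k];
  }
  return total;
}

// Global dispatch pointer that CPU detection is supposed to set.
DotProductFunction DotProduct = nullptr;

// Bug pattern: the target global is assigned to itself and stays unset.
void SetDotProductBuggy() { DotProduct = DotProduct; }

// Intended behaviour: point the global at a concrete implementation.
void SetDotProductFixed() { DotProduct = DotProductNative; }
```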
Stefan Weil
dff7312aed
Modernize code in SIMDDetect::Update
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:16:49 +02:00
Stefan Weil
3ab8dcbf72
Use Apple Accelerate framework for training and best models
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 19:27:54 +02:00
Johannes Künsebeck
3be11f12a9
Removed unused parameters declarations and definitions
2021-07-20 15:08:10 +02:00
zdenop
8dd7936475
Solve clang reporting unused variable in ExtractMicros function (#3501)
...
* Mark attribute as unused for the compiler
* Try the C++17 standard attribute: https://en.cppreference.com/w/cpp/language/attributes/maybe_unused
2021-07-18 01:59:49 +02:00
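A minimal example of the C++17 attribute linked above, with a hypothetical function: `[[maybe_unused]]` suppresses unused warnings for a parameter, or for a variable that is only read inside `assert` (which disappears when NDEBUG is defined).

```cpp
#include <cassert>

int DoubleValue(int value, [[maybe_unused]] int debug_level) {
  // Only read by assert(), so unused in NDEBUG builds.
  [[maybe_unused]] const int original = value;
  value *= 2;
  assert(value == original * 2);
  return value;
}
```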
nagadomi
7fe0624838
Fix spec string of convolution layer (#3499)
2021-07-16 18:21:52 +03:00
Stefan Weil
88d4028a5a
Enable pragma for SIMD also when _OPENMP is defined
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-15 16:03:43 +02:00
Stefan Weil
f0fb6809e3
Use SIMD instructions for DotProductNative
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-14 19:13:01 +02:00
Tadahito Yao
12e0fb4e01
Fix deadlock in lstmtraining (#3488)
2021-07-10 10:59:10 +03:00
Stefan Weil
767fb5a177
Fix LSTMTrainerTest.BidiTest
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-04 18:41:19 +02:00
Stefan Weil
158c845228
Catch another FP division by 0 (fixes issue #3483)
...
Rewriting the code avoids FP operations (which potentially makes it faster)
and fixes the division by 0.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-03 15:37:24 +02:00
Stefan Weil
4b630a8813
Catch FP division by 0 (fixes issue #3483)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-02 15:04:31 +02:00
Stefan Weil
a701454ae5
Fix vector resize with init for all elements (issue #3473) (#3474)
...
Fixes: c8b8d266d6
Fixes: 9710bc0465
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-06-29 21:05:29 +03:00
nagadomi
ff1062d39d
Add --reset_learning_rate option to lstmtraining (#3470)
...
When the --reset_learning_rate option is specified,
it resets the learning rate stored in each layer of the network
loaded with --continue_from to the value specified by the --learning_rate option.
If a checkpoint is available, the option has no effect.
2021-06-28 11:48:07 +03:00
nagadomi
d8bd78f8e2
Fix missing reset of best_error_history_ in LSTMTrainer::InitIterations() (#3469)
2021-06-27 09:26:32 +03:00
nagadomi
b2fa77f8f0
Show layer specified learning rates with combine_tessdata -l (#3468)
2021-06-26 08:08:54 +03:00
MonkeybreadSoftware
75e6c3ea4c
Null check for GetSourceYResolution (#3457)
...
* Null check for GetSourceYResolution
Added missing NULL check to avoid a crash when we read the property in our Tesseract wrapper.
* Added missing return value.
Return -1 if undefined.
2021-06-16 16:35:24 +03:00
Amit Dovev
bf979c801a
Remove unused variable
2021-05-21 20:34:09 +03:00
Egor Pugin
a72408fdef
Merge pull request #3438 from amitdo/pango
...
Raise minimum required Pango version to 1.38.0
2021-05-21 20:09:27 +03:00
Amit Dovev
8615f65cc4
Raise minimum required Pango version to 1.38.0
2021-05-21 19:56:37 +03:00
Amit Dovev
c24538518c
ThresholdMethod::TiledSauvola -> ThresholdMethod::Sauvola
...
The fact that this method uses tiles is an implementation detail. It does not change the result compared to Sauvola without tiles; using tiles reduces memory consumption.
2021-05-21 18:15:30 +03:00
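For reference, Sauvola's method computes a local threshold from the mean m and standard deviation s of a window around each pixel: T = m * (1 + k * (s / R - 1)), with R = 128 for 8-bit images. Since T depends only on the window statistics, tiling only changes how much of the image is processed at once. The sketch evaluates the textbook formula for one window; it is not Leptonica's implementation, and the parameter names are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sauvola threshold for one window of 8-bit gray values:
//   T = m * (1 + k * (s / R - 1))
// m: window mean, s: window standard deviation, k: sensitivity factor,
// R: dynamic range of s (128 for 8-bit images).
double SauvolaThreshold(const std::vector<unsigned char> &window, double k,
                        double R = 128.0) {
  assert(!window.empty());
  double sum = 0.0;
  double sum_sq = 0.0;
  for (unsigned char p : window) {
    sum += p;
    sum_sq += static_cast<double>(p) * p;
  }
  const double n = static_cast<double>(window.size());
  const double mean = sum / n;
  const double variance = sum_sq / n - mean * mean;
  const double stddev = std::sqrt(variance > 0.0 ? variance : 0.0);
  return mean * (1.0 + k * (stddev / R - 1.0));
}
```

A pixel would then be classed as foreground (text) if its gray value is below T. In a flat window (s = 0) the threshold drops to m * (1 - k), so uniform background is not mistaken for text.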
Stefan Weil
93348a83a3
Remove scripts for training
...
They were replaced by Python3 scripts (part of the tesstrain repository).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-18 10:47:44 +02:00
nagadomi
42e4b91132
Refactor ObjectCache::DeleteUnusedObjects with reverse iterator
2021-05-17 14:50:30 +02:00
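A generic sketch of deleting elements while walking a `std::vector` backwards with a reverse iterator, using a hypothetical `Entry` type rather than the actual ObjectCache internals. `erase` needs a forward iterator, obtained with `std::next(it).base()`; the iterator it returns is wrapped back into a reverse iterator, which then points at the next element to visit.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

struct Entry {  // hypothetical cache entry
  int id;
  int count;    // number of current users
};

// Removes all entries whose use count is zero, visiting them back to front.
void DeleteUnused(std::vector<Entry> &entries) {
  for (auto it = entries.rbegin(); it != entries.rend();) {
    if (it->count == 0) {
      // std::next(it).base() refers to the same element as *it.
      auto next = entries.erase(std::next(it).base());
      // A reverse iterator built from `next` refers to the element before
      // it, i.e. the next one in reverse order.
      it = std::vector<Entry>::reverse_iterator(next);
    } else {
      ++it;
    }
  }
}
```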
nagadomi
dc4a8a6ce0
Fix crash in ObjectCache::DeleteUnusedObjects
2021-05-17 10:25:17 +09:00
Stefan Weil
0c4e2f1cb5
Fix comment in code
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-16 07:47:19 +02:00
Stefan Weil
57b7974292
Remove an arbitrary limit for the image size
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-15 15:03:22 +02:00
Stefan Weil
a0cf117c5d
Fix compiler warning in binarization code (uninitialized local variable)
...
Also simplify the code a little.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-15 15:03:22 +02:00
Stefan Weil
bf84fb9f2d
Optimize code for binarization
...
Some code is only needed for Otsu, and some is not needed at all.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-15 15:03:22 +02:00
Stefan Weil
4b5dd25b84
Fix compiler warning
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-15 15:03:22 +02:00
Stefan Weil
12c29639fc
Add conditional compilation with GRAPHICS_DISABLED
...
This fixes a compiler warning when GRAPHICS_DISABLED is defined.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-05-13 17:22:24 +02:00
Nick White
ad7010a5eb
lstmeval: Only print char and word error rates for verbosity 2/3
2021-05-11 13:15:35 +01:00
Nick White
4787414d88
lstmeval: Print char and word error rates for each line tested
2021-05-11 10:54:34 +01:00
Nick White
9c82cc63c2
Switch to NFC normalisation everywhere
2021-05-11 10:18:06 +01:00
Egor Pugin
43747d6ea8
Postfix for #3418.
2021-05-10 15:06:27 +03:00
Egor Pugin
e7c01a6f15
Merge pull request #3418 from amitdo/thresholder
...
Add more binarization options
2021-05-10 14:45:03 +03:00
Amit Dovev
21e76c7a13
Convert enum ThreshMethod to enum class
2021-05-09 18:49:09 +03:00
Egor Pugin
176d0927bd
Allow explicit casts of Image to Pix**.
2021-05-07 21:30:42 +03:00
Amit Dovev
11c73c9481
Add more binarization options
...
Use functions from Leptonica to provide more binarization options. The new options are 1) Adaptive Otsu and 2) Sauvola (Tiled).
2021-05-07 16:48:26 +03:00
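Otsu's method, the basis of the adaptive Otsu option, picks the threshold that maximizes the between-class variance w0 * w1 * (mu0 - mu1)^2 of the gray-level histogram. The sketch computes one global threshold from a 256-bin histogram; Leptonica's adaptive variant applies the idea per tile, so this illustrates the principle and is not the code Tesseract calls.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Global Otsu threshold for an 8-bit histogram. Returns t such that gray
// values <= t form one class and values > t the other.
int OtsuThreshold(const std::array<std::uint64_t, 256> &hist) {
  std::uint64_t total = 0;
  double sum_all = 0.0;
  for (int i = 0; i < 256; ++i) {
    total += hist[i];
    sum_all += static_cast<double>(i) * hist[i];
  }
  std::uint64_t w0 = 0;  // pixels at or below t
  double sum0 = 0.0;     // sum of their gray values
  double best = -1.0;
  int best_t = 0;
  for (int t = 0; t < 256; ++t) {
    w0 += hist[t];
    sum0 += static_cast<double>(t) * hist[t];
    if (w0 == 0) continue;
    const std::uint64_t w1 = total - w0;
    if (w1 == 0) break;
    const double mu0 = sum0 / w0;
    const double mu1 = (sum_all - sum0) / w1;
    // Between-class variance, scaled by the constant factor total^2.
    const double between =
        static_cast<double>(w0) * w1 * (mu0 - mu1) * (mu0 - mu1);
    if (between > best) {
      best = between;
      best_t = t;
    }
  }
  return best_t;
}
```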
Egor Pugin
65118b2e3a
[misc] Fix variable type. Fixes warning.
2021-05-04 16:12:40 +03:00
Egor Pugin
346b77c94e
Remove unneeded header.
2021-05-04 16:10:52 +03:00
Egor Pugin
4fbe9f1de2
Revert d6cdc52. Fixes #3412.
2021-05-04 00:51:39 +03:00
Ger Hobbelt
bd8adff829
fix compile error: PrintFontsTable() is for legacy builds only
2021-04-29 23:27:20 +02:00
Lucas Cimon
b852d658cb
Add --print-fonts-table parameter and tessedit_font_id configuration option
2021-04-29 11:25:40 +02:00
Stefan Weil
2e2a5b3ef4
Improved fix for issue #3405
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-27 22:15:36 +02:00
Stefan Weil
0b7fc068d2
Revert "Fix double free. Closes #3405."
...
This reverts commit 3997cf54d2.
It will be replaced by a simpler fix.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-27 22:15:18 +02:00
Egor Pugin
3a195e5b05
Misc.
2021-04-27 22:08:29 +03:00
Egor Pugin
3997cf54d2
Fix double free. Closes #3405.
2021-04-27 22:08:06 +03:00
Egor Pugin
e3ac1835e0
Remove unneeded ctor.
2021-04-23 04:26:18 +03:00
Egor Pugin
a7f938d28e
Make FontSet just a vector.
2021-04-23 04:25:45 +03:00
Egor Pugin
4ae5a7d6b5
Properly init font set.
2021-04-23 04:05:59 +03:00
Egor Pugin
048e63c02b
Replace FontSet struct with vector. It may be improved further (remove pointer?).
2021-04-23 02:38:25 +03:00
Egor Pugin
d6cdc521e5
Remove unused headers.
2021-04-23 02:06:06 +03:00
Stefan Weil
740d10b61b
Fix issue #3404 (empty page regression)
...
The regression was caused by a bug in commit 5db92b26aa.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-22 20:51:23 +02:00
Stefan Weil
66a963b50a
Remove two assertions which are triggered by fuzzing
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-20 19:04:49 +02:00
Stefan Weil
26c21a6db4
Fix some compiler warnings with GRAPHICS_DISABLED
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-20 07:58:31 +02:00
Stefan Weil
6d0595b443
Fix memory leak (OSS-Fuzz issue 33220)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-19 20:59:18 +02:00
Robert Pösel
c74ff1259b
Fix wrong parameter name and documentation
...
set_only_init_params -> set_only_non_debug_params
2021-04-19 16:55:01 +02:00
Stefan Weil
2dfa38a072
Fix old TODO for struct EDGEPT
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-17 18:08:27 +02:00
Fabrizio Di Vittorio
2be896d2b9
Add SVSemaphore destructor to avoid system objects leaks
2021-04-15 09:23:22 +02:00
Stefan Weil
e6e871bc73
Replace pointer by value for ScrollView mutex
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-15 06:30:05 +02:00