2089 Commits

Author SHA1 Message Date
Stefan Weil
f833491ddb Remove whitespace at line endings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-03-31 22:16:43 +02:00
Amit D
fa4d4449c5
Fix issue #4010 (#4041)
Enable some code blocks that were wrongly disabled when the legacy engine is disabled at compile time.
2023-03-28 18:05:57 +03:00
Stefan Weil
c7a55c1ec1 Fix some typos (partially found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-03-24 22:39:28 +01:00
Stefan Weil
1569e50808 textord: Catch empty rows in block iterator (fixes #4039)
When textord_blockndoc_fixed was set to 1, empty rows caused a segmentation
fault. Also test textord_blockndoc_fixed first because it is typically 0.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-03-24 15:51:40 +01:00
Ger Hobbelt
98e61a7e10
Improve the DebugDump output by slightly adjusting the format. (#4022)
* Improve the DebugDump output by slightly adjusting the format for the numeric columns, which was 3,3,3,3 and overflowing in our test runs, damaging the table layout. See rationale in the code comment:

------

  // The largest (positive and negative) numbers are reported for lindent & rindent.
  // While the column header has widths 5,4,4,5, it is therefore opportune to slightly
  // offset the widths in the format string here to allow ample space for lindent & rindent
  // while keeping the final table output nicely readable: 4,5,5,4.

# Conflicts:
#	src/ccmain/paragraphs.cpp

* comment fix, pointed out by @stweil
2023-03-06 15:42:43 +02:00
Zdenko Podobný
9bac701d5e cmake: fix gcc-7 fatal error: filesystem: No such file or directory 2023-02-10 09:51:59 +01:00
Stefan Weil
f1e3697dd4 Fix some typos in comments (found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-02-08 20:51:58 +01:00
Stefan Weil
1e04be842d Replace 'can not' by 'cannot'
Both forms are used in American English, but 'cannot' is more common
(also in Tesseract code), so use it always.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-02-08 17:34:22 +01:00
Khem Raj
2025b53de6 Fix build with gcc 13 by including <cstdint>
gcc 13 moved some includes around and as a result <cstdint> is
no longer transitively included [1]. Explicitly include it for
int32_t.

[1] https://gcc.gnu.org/gcc-13/porting_to.html#header-dep-changes

Signed-off-by: Khem Raj <raj.khem@gmail.com>
2023-01-30 11:28:24 -08:00
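The fix described in the commit above amounts to including the header that defines the fixed-width integer types directly instead of relying on another header to pull it in. A minimal sketch follows; the helper `clamp_to_int32` is hypothetical and not taken from Tesseract.

```cpp
#include <cstdint>  // defines std::int32_t; do not rely on transitive includes
#include <limits>

// Hypothetical helper, for illustration only: narrows a wide value to
// 32 bits, saturating at the limits of std::int32_t.
std::int32_t clamp_to_int32(long long value) {
  constexpr long long lo = std::numeric_limits<std::int32_t>::min();
  constexpr long long hi = std::numeric_limits<std::int32_t>::max();
  if (value < lo) return std::numeric_limits<std::int32_t>::min();
  if (value > hi) return std::numeric_limits<std::int32_t>::max();
  return static_cast<std::int32_t>(value);
}
```

Without the explicit `#include <cstdint>`, whether this compiles depends on which other standard headers happen to include it, which is what changed in gcc 13.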
Leander Schulten
680d1e231c Fix linkage of icu and pango 2023-01-28 04:19:45 +01:00
Stefan Weil
3bedea1bdd Fix FP division by zero in LanguageModel::ExtractFeaturesFromPath (issue #3995)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-01-20 16:45:09 +01:00
Stefan Weil
1852afe9f8 Remove unneeded type cast in LanguageModel::ExtractFeaturesFromPath
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-01-20 16:45:09 +01:00
zdenop
0ef192050a fix "cannot pass non-trivial object of type 'std::string'" 2023-01-08 19:13:48 +01:00
zdenop
804b63646f show output filename on successful creation of traineddata (combine_lang_model) 2023-01-08 18:30:31 +01:00
zdenop
005bfe4950 fix "cannot pass non-trivial object of type 'std::string'" 2023-01-06 18:34:16 +01:00
zdenop
8a26329623 unicharset_extractor:
- run ReadMemBoxes only for box files
- do not write unicharset in case of broken box file
2023-01-06 15:52:42 +01:00
Stefan Weil
6a21a74ecf Suppress compiler warning caused by very long string
Add pragmas which suppress this warning from gcc or clang:

    src/ccutil/universalambigs.h:26:5: warning:
     string literal of length 170929 exceeds maximum length 65536 that
     C++ compilers are required to support [-Woverlength-strings]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 13:34:01 +01:00
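The pragmas mentioned in the commit above are not shown in the message. A sketch of the usual pattern for silencing one warning around one declaration might look like this; `kLongData` and `long_data_size` are hypothetical names, and the short literal only stands in for a very long one.

```cpp
#include <cstddef>
#include <cstring>

// __GNUC__ is defined by gcc and also by clang, and both compilers
// understand the "GCC diagnostic" pragmas.
#if defined(__GNUC__)
#  pragma GCC diagnostic push
#  pragma GCC diagnostic ignored "-Woverlength-strings"
#endif

// Stand-in for a string literal that exceeds the guaranteed minimum length.
static const char kLongData[] = "stand-in";

#if defined(__GNUC__)
#  pragma GCC diagnostic pop  // restore the previous warning state
#endif

// Hypothetical accessor so that the data is actually used.
std::size_t long_data_size() { return std::strlen(kLongData); }
```

The push/pop pair limits the suppression to the one declaration, so the warning stays active for the rest of the translation unit.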
Stefan Weil
369b811c99 Replace at accessor by [] operator in function Classify::CreateIntTemplates
UnicityTable did not provide the [] operator, so add it for this change.

Suggested-by: Egor Pugin <egor.pugin@gmail.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
Stefan Weil
a806d21883 Fix function ReadTrainingSamples (issue #3925)
This fixes duplicate delete when running cntraining.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
Stefan Weil
23138ab88a Fix function Classify::WriteIntTemplates (issue #3925)
It crashed when running mftraining because unicharset_size in file
"inttemp" was written with 8 bytes instead of 4 bytes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
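The commit message above does not show the changed code. The sketch below only illustrates the general hazard, under the assumption that the size was held in a type that is wider than the 4-byte field which the file format expects; `AppendCount32` is a hypothetical helper, and it writes to a byte buffer instead of a FILE to stay self-contained.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical helper: appends `count` as exactly 4 bytes (native byte order),
// regardless of sizeof(std::size_t), which is 8 on typical 64-bit platforms.
bool AppendCount32(std::vector<unsigned char> &out, std::size_t count) {
  if (count > std::numeric_limits<std::uint32_t>::max()) {
    return false;  // value does not fit into the 4-byte field
  }
  const auto value = static_cast<std::uint32_t>(count);
  unsigned char bytes[sizeof(value)];
  std::memcpy(bytes, &value, sizeof(value));
  out.insert(out.end(), bytes, bytes + sizeof(value));
  return true;
}
```

Converting to a fixed-width type before writing keeps the on-disk layout independent of the platform's `size_t`.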
Stefan Weil
4fa046b1b3 Fix function tesseract::write_set (issue #3925)
It crashed when running mftraining with fs.size() == 0.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
Stefan Weil
1fd8f8165f Fix function UnicityTable::push_back (issue #3925)
mftraining crashed because the returned value was 1 instead of 0
for the first call of UnicityTable::push_back.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
Stefan Weil
1d3b410968 Fix function ComputeChiSquared (issue #3925)
mftraining crashed if the search did not find anything.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
Stefan Weil
5591bc04ef Remove assertion in function NewSimpleProto (issue #3925)
It was triggered by mftraining.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
Stefan Weil
f969ba9161 Fix function Classify::CreateIntTemplates (issue #3925)
The old code did not work correctly if FClass->font_set.size() was 0.
It created the FontSet fs with size 1 instead of 0.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
Stefan Weil
6b7cb1cbc6 Add missing serialization to FILE for vector of pointers (issue #3925)
It is required for mftraining which otherwise writes a wrong shapetable.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
Stefan Weil
90c09a3df3 Replace void_proc by kdwalk_proc with correct arguments
This allows removing a reinterpret_cast and fixes a runtime error
with sanitizers:

runtime error: call to function
tesseract::MakePotentialClusters(tesseract::ClusteringContext*, tesseract::CLUSTER*, int)
through pointer to incorrect function type 'void (*)(...)'

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
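The sanitizer error quoted in the commit above arises when a function is called through a pointer whose type does not match the function's real signature. A sketch of the corrected pattern, using hypothetical types and names (`Context`, `Node`, `walk_proc`, `Walk`) instead of Tesseract's own:

```cpp
#include <vector>

// Hypothetical stand-ins, for illustration only.
struct Context {
  int visited = 0;
  int depth_sum = 0;
};
struct Node {
  int value;
};

// The callback type spells out the real parameter list, so no
// reinterpret_cast is needed and the call is well-defined.
using walk_proc = void (*)(Context *context, Node *node, int level);

// A callback whose signature matches walk_proc exactly.
void CountNode(Context *context, Node *node, int level) {
  (void)node;  // unused in this sketch
  ++context->visited;
  context->depth_sum += level;
}

// Calls `action` once per node, passing the node's position as its level.
void Walk(std::vector<Node> &nodes, walk_proc action, Context *context) {
  int level = 0;
  for (Node &node : nodes) {
    action(context, &node, level++);
  }
}
```

With the exact type in the alias, passing a function with a different parameter list becomes a compile-time error instead of undefined behavior at run time.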
Stefan Weil
8c34b0de62 Modernize function ObjectCache::DeleteUnusedObjects (fix issue with sanitizers)
The old code did not work with compiler option `-fsanitize=address,undefined`
and caused apiexample_test to run forever with this error message:

Running main() from unittest/third_party/googletest/googletest/src/gtest_main.cc
[==========] Running 4 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 1 test from EuroText
[ RUN      ] EuroText.FastLatinOCR
/usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/debug/safe_iterator.h:608:
In function:
    _Safe_iterator<type-parameter-0-0, type-parameter-0-1,
    std::bidirectional_iterator_tag>
    &__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<tesseract::ObjectCache<tesseract::Dawg>::ReferenceCount
    *,
    std::__cxx1998::vector<tesseract::ObjectCache<tesseract::Dawg>::ReferenceCount,
    std::allocator<tesseract::ObjectCache<tesseract::Dawg>::ReferenceCount>>>,
[...]

That error message was followed by an endless sequence of newlines.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-12 15:00:18 +01:00
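The message above does not say how the function was modernized. One common way to remove entries from a vector without keeping iterators that the removal invalidates is the erase-remove idiom, sketched here with a hypothetical `Entry` type; this is not necessarily what the commit does.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical cache entry, for illustration only.
struct Entry {
  std::string id;
  int count;  // number of users still referencing the entry
};

// Removes all entries that are no longer referenced. No iterator into
// `cache` is used after the erase call, so debug-mode iterator checks
// have nothing to complain about.
void DeleteUnused(std::vector<Entry> &cache) {
  cache.erase(std::remove_if(cache.begin(), cache.end(),
                             [](const Entry &e) { return e.count <= 0; }),
              cache.end());
}
```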
zdenop
b37de16633 Revert "fix: index variable in OpenMP 'for' statement must have signed integral type"
This reverts commit bc7a7eea2f.
2022-12-11 19:49:54 +01:00
zdenop
d89ff4667b reformat code (files with tabs) 2022-12-10 20:33:35 +01:00
zdenop
f77c63d446 report missing or empty box file 2022-12-10 19:28:17 +01:00
zdenop
b7319c26f9 Merge branch 'main' of https://github.com/tesseract-ocr/tesseract 2022-12-04 18:56:40 +01:00
zdenop
bc7a7eea2f fix: index variable in OpenMP 'for' statement must have signed integral type 2022-12-04 18:56:30 +01:00
zdenop
51cf430899 fix typo (missing space) 2022-12-04 18:49:56 +01:00
Stefan Weil
a5292214b8
Fix function tesseract::WriteFeature (issue #3925) (#3972)
Fixes: 3b0759940c ("Replace more STRING by std::string")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-04 01:01:20 +02:00
Stefan Weil
af131241af
Fix training tools for legacy engine (issue #3925) (#3970)
Fixes: cac116dd11 ("Replace more PointerVector by std::vector [...]")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-11-30 09:39:45 +02:00
Stefan Weil
a9c1be658e Fix a number of performance issues (reported by Coverity Scan)
Coverity Scan reports "Unnecessary object copies can affect performance"
and suggests using the auto keyword with an &.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-11-20 19:00:51 +01:00
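The Coverity suggestion in the commit above concerns range-based for loops that copy each element. A minimal sketch with a hypothetical function `TotalLength`:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical example: sums the lengths of all strings.
std::size_t TotalLength(const std::vector<std::string> &words) {
  std::size_t total = 0;
  // `for (auto word : words)` would copy every string;
  // `const auto &` binds a reference to each element instead.
  for (const auto &word : words) {
    total += word.size();
  }
  return total;
}
```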
Stefan Weil
4c0051d533 Add const attribute to several compare operators
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-11-19 21:09:24 +01:00
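A comparison operator without the const qualifier cannot be called on const objects, which standard algorithms and const references require. A sketch with a hypothetical `Score` type, not taken from Tesseract:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical type, for illustration only.
struct Score {
  int value;
  // The trailing `const` allows these operators to be used on const objects.
  bool operator<(const Score &other) const { return value < other.value; }
  bool operator==(const Score &other) const { return value == other.value; }
};

// Both parameters are const, so this compiles only because the
// operators above are const-qualified.
bool IsLowest(const Score &candidate, const std::vector<Score> &all) {
  return !all.empty() &&
         candidate == *std::min_element(all.begin(), all.end());
}
```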
Robert Sachunsky
8f4aae70b8 lstm.train: allow .box from .raw.png too 2022-11-12 21:11:22 +01:00
Stefan Weil
fd83f3dc61
Merge pull request #3959 from amitdo/amitdo-pdf-Ignore-non-text-blocks
pdfrenderer.cpp: Ignore non-text blocks
2022-11-10 07:57:04 +01:00
Zdenko Podobný
7d073f24fb github action/cmake: fix macos icu linking 2022-11-09 12:40:48 +01:00
Amit Dovev
c1964560b6 pdfrenderer.cpp: Ignore non-text blocks
Fix #3957.
2022-11-08 08:02:09 +02:00
zdenop
490611e4c6 cmake: fix linux&mac build 2022-11-06 18:11:22 +01:00
zdenop
4ab09a63b2 fix typo in variable 2022-11-06 17:43:53 +01:00
zdenop
b593a57676 show dropped unrenderable words 2022-11-01 18:54:48 +01:00
zdenop
41c480d4f2 cmake: install common_training and unicharset_training libs 2022-11-01 18:54:37 +01:00
zdenop
954c5413c1 cmake: we can build training tools without PkgConfig 2022-11-01 18:54:19 +01:00
Stefan Weil
23613c5c24 Fix regression (broken unit tests)
Fixes: 95019a8c ("fix issue #3940 - remove colormap before thresholding")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-10-24 06:40:11 +02:00
Stefan Weil
1ae0ec9a82 Restore comment
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-10-24 06:25:21 +02:00
zdenop
bca0a7fe82 Fix memory leaks in ImageThresholder::ThresholdToPix 2022-10-23 20:19:54 +02:00