Commit Graph

6149 Commits

Author SHA1 Message Date
Amit D
7f5345d207
Update README.md
'Promote' @stweil ... :-)
2022-12-19 10:14:32 +02:00
Zdenko Podobný
f25196151b cmake - msvc/openmp: clean&document configuration 2022-12-15 13:26:56 +01:00
Zdenko Podobný
f2f37a8323 cmake - msvc: silence warning C4068: unknown pragma 'GCC' 2022-12-15 13:25:43 +01:00
Stefan Weil
86a7bc6c06 Create new release 5.3.0-rc1
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 17:37:35 +01:00
Stefan Weil
6e4de524d0 Replace MacOS -> macOS
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 17:37:35 +01:00
Stefan Weil
6a21a74ecf Suppress compiler warning caused by very long string
Add pragmas which suppress this warning from gcc or clang:

    src/ccutil/universalambigs.h:26:5: warning:
     string literal of length 170929 exceeds maximum length 65536 that
     C++ compilers are required to support [-Woverlength-strings]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 13:34:01 +01:00
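
The pragma approach described in commit 6a21a74ecf can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `kLongTable` and `LongTableSize` are hypothetical names, and a short string stands in for the very long literal in `universalambigs.h`. The `__GNUC__` guard is an assumption that keeps the pragmas away from compilers that do not understand `#pragma GCC`.

```cpp
#include <cstddef>
#include <cstring>

// gcc and clang both define __GNUC__ and both accept "#pragma GCC diagnostic".
#if defined(__GNUC__)
#  pragma GCC diagnostic push
#  pragma GCC diagnostic ignored "-Woverlength-strings"
#endif

// Hypothetical stand-in for a string literal that exceeds the minimum
// length which C++ compilers are required to support.
static const char kLongTable[] =
    "first line of a very long table\n"
    "second line of a very long table\n";

#if defined(__GNUC__)
#  pragma GCC diagnostic pop
#endif

// Hypothetical accessor so that the table is actually used.
std::size_t LongTableSize() {
  return std::strlen(kLongTable);
}
```

The push/pop pair limits the suppression to the one literal, so the warning stays active for the rest of the translation unit.
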
Stefan Weil
369b811c99 Replace at accessor by [] operator in function Classify::CreateIntTemplates
UnicityTable did not provide the [] operator, so add it for this change.

Suggested-by: Egor Pugin <egor.pugin@gmail.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
Stefan Weil
a806d21883 Fix function ReadTrainingSamples (issue #3925)
This fixes duplicate delete when running cntraining.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
Stefan Weil
23138ab88a Fix function Classify::WriteIntTemplates (issue #3925)
It crashed when running mftraining because unicharset_size in file
"inttemp" was written with 8 bytes instead of 4 bytes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
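
The 8-byte versus 4-byte mismatch in commit 23138ab88a is the kind of bug that appears when a `size_t` value is written directly on a 64-bit platform. A minimal sketch of writing a count with an explicit 4-byte width follows. `AppendCount32` is a hypothetical helper, not Tesseract's serialization API, and it writes into a byte buffer instead of a `FILE` to keep the example self-contained.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical helper: append `count` as exactly four bytes (host byte
// order), regardless of how wide std::size_t is on the platform.
// Returns false if the value does not fit into 32 bits.
bool AppendCount32(std::vector<unsigned char> &out, std::size_t count) {
  if (count > std::numeric_limits<std::uint32_t>::max()) {
    return false;
  }
  const auto value = static_cast<std::uint32_t>(count);
  unsigned char bytes[sizeof(value)];
  std::memcpy(bytes, &value, sizeof(value));
  out.insert(out.end(), bytes, bytes + sizeof(value));
  return true;
}
```

Converting to a fixed-width type before writing makes the on-disk size independent of the platform's `size_t`, which matters when a reader expects exactly 4 bytes.
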
Stefan Weil
4fa046b1b3 Fix function tesseract::write_set (issue #3925)
It crashed when running mftraining with fs.size() == 0.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
Stefan Weil
1fd8f8165f Fix function UnicityTable::push_back (issue #3925)
mftraining crashed because the returned value was 1 instead of 0
for the first call of UnicityTable::push_back.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
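
The expected behaviour from commit 1fd8f8165f (the first `push_back` returns 0) can be illustrated with a simplified table. `SmallUnicityTable` below is a hypothetical stand-in and not Tesseract's `UnicityTable`; the commit message does not say how the off-by-one arose, so the comment in the code only names one plausible cause.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified table that stores each distinct object once
// and identifies objects by their zero-based index.
template <typename T>
class SmallUnicityTable {
 public:
  // Returns the index of `object`, or -1 if it is not in the table.
  int get_index(const T &object) const {
    for (std::size_t i = 0; i < table_.size(); ++i) {
      if (table_[i] == object) {
        return static_cast<int>(i);
      }
    }
    return -1;
  }

  // Adds `object` if it is new and returns its index.
  // Returning table_.size() after the insertion (without the -1) would
  // give 1 instead of 0 for the first element.
  int push_back(const T &object) {
    int idx = get_index(object);
    if (idx == -1) {
      table_.push_back(object);
      idx = static_cast<int>(table_.size()) - 1;
    }
    return idx;
  }

  int size() const {
    return static_cast<int>(table_.size());
  }

 private:
  std::vector<T> table_;
};
```
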
Stefan Weil
1d3b410968 Fix function ComputeChiSquared (issue #3925)
mftraining crashed if the search did not find anything.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
Stefan Weil
5591bc04ef Remove assertion in function NewSimpleProto (issue #3925)
It was triggered by mftraining.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
Stefan Weil
f969ba9161 Fix function Classify::CreateIntTemplates (issue #3925)
The old code did not work correctly if FClass->font_set.size() was 0.
It created the FontSet fs with size 1 instead of 0.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
Stefan Weil
6b7cb1cbc6 Add missing serialization to FILE for vector of pointers (issue #3925)
It is required for mftraining which otherwise writes a wrong shapetable.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
Stefan Weil
90c09a3df3 Replace void_proc by kdwalk_proc with correct arguments
This allows removing a reinterpret_cast and fixes a runtime error
with sanitizers:

runtime error: call to function
tesseract::MakePotentialClusters(tesseract::ClusteringContext*, tesseract::CLUSTER*, int)
through pointer to incorrect function type 'void (*)(...)'

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-13 08:04:50 +01:00
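
The idea behind commit 90c09a3df3 is that a callback type should spell out the real parameter list, so no `reinterpret_cast` is needed and the call goes through a pointer of the correct type. A minimal sketch under that assumption follows; `Context`, `Node`, `walk_proc`, `CountNode` and `WalkAll` are hypothetical names and do not reproduce Tesseract's declarations.

```cpp
// Hypothetical stand-ins for the clustering context and cluster types.
struct Context {
  int visited = 0;
};

struct Node {
  int value = 0;
};

// The callback type names its exact parameters, so any function passed
// in must match, and the compiler checks this at the call site.
using walk_proc = void (*)(Context *context, Node *node, int level);

// A callback with the matching signature; no cast is required to pass it.
void CountNode(Context *context, Node * /*node*/, int /*level*/) {
  ++context->visited;
}

// Calls `action` once per node through the correctly typed pointer.
void WalkAll(Context *context, Node *nodes, int count, walk_proc action) {
  for (int i = 0; i < count; ++i) {
    action(context, &nodes[i], 0);
  }
}
```

Calling a function through a pointer of a different function type is undefined behaviour in C++, which is what the sanitizer message quoted above reports.
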
Zdenko Podobný
04551ce2a6 clang-format: use default value for line width (80) 2022-12-12 16:55:34 +01:00
Egor Pugin
0680ba870e
Merge pull request #3978 from stweil/sanfix
Modernize function ObjectCache::DeleteUnusedObjects (fix issue with s…
2022-12-12 17:56:42 +03:00
Stefan Weil
8c34b0de62 Modernize function ObjectCache::DeleteUnusedObjects (fix issue with sanitizers)
The old code did not work with compiler option `-fsanitize=address,undefined`
and caused apiexample_test to run forever with this error message:

Running main() from unittest/third_party/googletest/googletest/src/gtest_main.cc
[==========] Running 4 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 1 test from EuroText
[ RUN      ] EuroText.FastLatinOCR
/usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/debug/safe_iterator.h:608:
In function:
    _Safe_iterator<type-parameter-0-0, type-parameter-0-1,
    std::bidirectional_iterator_tag>
    &__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<tesseract::ObjectCache<tesseract::Dawg>::ReferenceCount
    *,
    std::__cxx1998::vector<tesseract::ObjectCache<tesseract::Dawg>::ReferenceCount,
    std::allocator<tesseract::ObjectCache<tesseract::Dawg>::ReferenceCount>>>,
[...]

That error message was followed by an endless sequence of newlines.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-12 15:00:18 +01:00
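
The commit message for 8c34b0de62 does not show the new code. One common modern way to drop unused entries from a vector without invalidating iterators mid-loop is the erase-remove idiom, sketched below. `CacheEntry` and `DropUnused` are hypothetical names, and this is not claimed to be the change made in `ObjectCache::DeleteUnusedObjects`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical cache entry with a reference count.
struct CacheEntry {
  std::string id;
  int count;
};

// Removes every entry whose reference count is zero or negative.
// std::remove_if moves the kept entries to the front and returns the new
// logical end; a single erase call then trims the tail.
void DropUnused(std::vector<CacheEntry> &cache) {
  cache.erase(std::remove_if(cache.begin(), cache.end(),
                             [](const CacheEntry &entry) {
                               return entry.count <= 0;
                             }),
              cache.end());
}
```
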
zdenop
b37de16633 Revert "fix: index variable in OpenMP 'for' statement must have signed integral type"
This reverts commit bc7a7eea2f.
2022-12-11 19:49:54 +01:00
zdenop
d89ff4667b reformat code (files with tabs) 2022-12-10 20:33:35 +01:00
zdenop
f77c63d446 report missing or empty box file 2022-12-10 19:28:17 +01:00
zdenop
4ebaa4bffb GA: use png 1.6.39 from cmake-win64 2022-12-08 20:04:10 +01:00
zdenop
b7319c26f9 Merge branch 'main' of https://github.com/tesseract-ocr/tesseract 2022-12-04 18:56:40 +01:00
zdenop
bc7a7eea2f fix: index variable in OpenMP 'for' statement must have signed integral type 2022-12-04 18:56:30 +01:00
zdenop
51cf430899 fix typo (missing space) 2022-12-04 18:49:56 +01:00
Stefan Weil
a5292214b8
Fix function tesseract::WriteFeature (issue #3925) (#3972)
Fixes: 3b0759940c ("Replace more STRING by std::string")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-12-04 01:01:20 +02:00
zdenop
c1a1d7e00c
Update cmake-win64.yml
start scheduling cmake-win64 GA
2022-11-30 15:43:52 +01:00
zdenop
cdf6b601ce
Update cmake-win64.yml 2022-11-30 14:37:32 +01:00
zdenop
9cd5012e89
Update cmake-win64.yml
remove unused features in GA test
2022-11-30 14:36:48 +01:00
Zdenko Podobný
7e51f0bac5 GA cmake-win64: uninstall strawberryperl to fix libtiff build 2022-11-30 11:34:10 +01:00
Zdenko Podobný
ac8ff2eae9 GA cmake-win64: fix getting version info 2022-11-30 10:42:39 +01:00
Stefan Weil
af131241af
Fix training tools for legacy engine (issue #3925) (#3970)
Fixes: cac116dd11 ("Replace more PointerVector by std::vector [...]")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-11-30 09:39:45 +02:00
zdenop
7221973c3b stop cron cmake-win64 build 2022-11-29 19:33:48 +01:00
zdenop
8fffed42ad
Update cmake-win64.yml 2022-11-28 07:45:20 +01:00
Egor Pugin
e7307fd6b4
[sw] Remove VS2019 builds. 2022-11-28 00:39:01 +03:00
zdenop
a94a9ef01a
Update cmake-win64.yml 2022-11-27 16:10:39 +01:00
zdenop
e30b36bb02
Update cmake-win64.yml
test cmake-win64 GA failure
2022-11-26 11:52:25 +01:00
zdenop
5f297dc0b8
Merge pull request #3967 from stweil/coverity
Fix a number of performance issues (reported by Coverity Scan)
2022-11-21 07:03:46 +01:00
Stefan Weil
a9c1be658e Fix a number of performance issues (reported by Coverity Scan)
Coverity Scan reports "Unnecessary object copies can affect performance"
and suggests using the auto keyword with an &.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-11-20 19:00:51 +01:00
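
The Coverity suggestion in commit a9c1be658e (use `auto` with `&`) can be illustrated with a range-based loop. The sketch below uses a hypothetical `Tracked` type that counts its own copies; it is not taken from the Tesseract sources.

```cpp
#include <vector>

// Hypothetical type which counts how often it is copy-constructed.
struct Tracked {
  int value = 0;
  Tracked() = default;
  Tracked(const Tracked &other) : value(other.value) { ++copies; }
  Tracked &operator=(const Tracked &) = default;
  static int copies;
};

int Tracked::copies = 0;

// "auto item" copies every element of the container into the loop variable.
int SumByValue(const std::vector<Tracked> &items) {
  int sum = 0;
  for (auto item : items) {
    sum += item.value;
  }
  return sum;
}

// "const auto &item" binds a reference to each element and copies nothing.
int SumByReference(const std::vector<Tracked> &items) {
  int sum = 0;
  for (const auto &item : items) {
    sum += item.value;
  }
  return sum;
}
```

For small trivially copyable types the difference is negligible, but for elements that own memory (strings, vectors) each avoided copy also avoids an allocation.
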
Stefan Weil
4c0051d533 Add const attribute to several compare operators
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-11-19 21:09:24 +01:00
Egor Pugin
1751fba623
[sw] Do a lightweight build during PRs. 2022-11-15 15:24:50 +03:00
Stefan Weil
adbefa8316 Fix AMD64 detection with autobuild on FreeBSD (#3964)
Tesseract for FreeBSD was built without support for SSE4.1, AVX,
AVX2 or FMA because it uses a different value for `host_cpu`.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-11-14 18:30:42 +01:00
Robert Sachunsky
8f4aae70b8 lstm.train: allow .box from .raw.png too 2022-11-12 21:11:22 +01:00
zdenop
78bcc0d84c
Update cmake.yml
github action cmake macos: add setting for compilers to find icu4c
2022-11-10 08:58:02 +01:00
Stefan Weil
fd83f3dc61
Merge pull request #3959 from amitdo/amitdo-pdf-Ignore-non-text-blocks
pdfrenderer.cpp: Ignore non-text blocks
2022-11-10 07:57:04 +01:00
Stefan Weil
c01ddc033c Remove remaining references to deprecated LGTM (fix for #3958)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-11-10 07:54:24 +01:00
Zdenko Podobný
7d073f24fb github action/cmake: fix macos icu linking 2022-11-09 12:40:48 +01:00
Amit Dovev
c1964560b6 pdfrenderer.cpp: Ignore non-text blocks
Fix #3957.
2022-11-08 08:02:09 +02:00
zdenop
490611e4c6 cmake: fix linux&mac build 2022-11-06 18:11:22 +01:00