Commit Graph

6420 Commits

Author SHA1 Message Date
Stefan Weil
a9ad3601b8 Avoid conversions from std::string to char* to std::string
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-05-09 18:30:19 +02:00
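
The commit title above can be illustrated with a small sketch. The function names below are hypothetical and not taken from the Tesseract sources. The sketch only shows the general pattern of a std::string that is turned into a char* and then implicitly converted back into a temporary std::string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical callee that takes its argument by const reference.
std::size_t name_length(const std::string &name) {
  return name.size();
}

// Round trip: c_str() yields a char*, from which a temporary std::string
// is constructed. That costs an extra copy and stops at the first NUL byte.
std::size_t length_via_roundtrip(const std::string &s) {
  return name_length(s.c_str());
}

// Direct: the existing std::string is passed through unchanged.
std::size_t length_direct(const std::string &s) {
  return name_length(s);
}
```

Besides the extra allocation, the round trip can change behavior for strings that contain an embedded NUL character, so passing the string directly is usually the safer choice.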
Stefan Weil
6cb82d8b1d Avoid 32 bit overflow in multiplication (fixes 3 CodeQL CI alerts)
The CodeQL CI reports "Multiplication result converted to larger type".

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-05-09 13:14:10 +02:00
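
The alert named in the commit message above typically fires when two 32-bit values are multiplied and only the result is stored in a wider type, so the overflow happens before the widening. A minimal sketch of the usual remedy follows. The function name and parameters are hypothetical and do not reproduce the actual change.

```cpp
#include <cstdint>

// Hypothetical helper, not taken from the Tesseract sources.
// Casting one operand first makes the multiplication itself 64-bit,
// so the product of two large 32-bit values cannot overflow.
std::int64_t pixel_count(std::int32_t width, std::int32_t height) {
  return static_cast<std::int64_t>(width) * height;
}
```

Without the cast, `width * height` would be evaluated in 32-bit arithmetic and converted afterwards, which is the pattern the alert describes.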
zdenop
c9895dbad4
Merge pull request #4066 from stweil/lstmtraining
Improve format of logging from lstmtraining
2023-05-08 19:33:38 +02:00
zdenop
62962e089b
Merge pull request #4068 from stweil/sprintf
Replace deprecated sprintf
2023-05-08 13:14:57 +02:00
Stefan Weil
6b4eb8cf92 Remove unused code in function fix_rep_char
This also fixes a compiler warning:

    src/ccmain/control.cpp:1694:9: warning: variable 'gap' set but not used [-Wunused-but-set-variable]

Fixes: dbf6197471 ("Major refactor of control.cpp to enable line recognition")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-05-08 11:09:21 +02:00
Stefan Weil
f2452a68ad Replace deprecated sprintf
Either use snprintf or std::stringstream.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-05-08 10:29:51 +02:00
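
The two replacements mentioned in the commit message above can be sketched as follows. The helper names and format strings are hypothetical illustrations, not code from the commit.

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical helper: snprintf receives the buffer size and never writes
// past it, which sprintf could not guarantee. Overlong output is truncated.
std::string label_with_snprintf(int page, double confidence) {
  char buffer[64];
  std::snprintf(buffer, sizeof(buffer), "page %d: %.2f", page, confidence);
  return buffer;
}

// Hypothetical helper: a string stream grows as needed, so no fixed-size
// buffer is involved at all.
std::string label_with_stream(int page, const std::string &language) {
  std::stringstream stream;
  stream << "page " << page << ": " << language;
  return stream.str();
}
```

snprintf is the smaller change where a printf-style format string already exists. The stream variant avoids choosing a buffer size.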
zdenop
7cc215c9c2
Merge pull request #4067 from stweil/misc
Replace bool array by more compact vector
2023-05-06 10:17:21 +02:00
Stefan Weil
38a49e45b4 Use less digits in filenames of checkpoints written by lstmtraining
lstmtraining had written checkpoints using names like ONB_68.852000_6368_6500.checkpoint.
Now the superfluous '000' is omitted and the name will be ONB_68.852_6368_6500.checkpoint.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-05-05 19:19:03 +02:00
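
One way to produce a name in the shorter form described above is to format the floating point part with a fixed precision of three digits. This is a sketch under assumptions: the function name, the parameter names, and the meaning given to the two integer fields are hypothetical, and the actual lstmtraining code may build the name differently.

```cpp
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper, not the lstmtraining implementation.
// first_count and second_count stand for the two integer fields of the name.
std::string checkpoint_name(const std::string &prefix, double error_rate,
                            int first_count, int second_count) {
  std::ostringstream name;
  // The classic locale keeps '.' as decimal separator and adds no digit grouping.
  name.imbue(std::locale::classic());
  name << prefix << '_' << std::fixed << std::setprecision(3) << error_rate
       << '_' << first_count << '_' << second_count << ".checkpoint";
  return name.str();
}
```

Called with the values from the commit message, this yields the shorter name without the trailing zeros that a six-digit default format would append.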
Stefan Weil
41c5db9074 Replace bool array by more compact vector
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-05-05 12:30:52 +02:00
Stefan Weil
0f56340151 Improve format of logging from lstmtraining
- always use C ("classic") locale
- limit output of floating point values to 3 digits
- remove unneeded linefeed after log message "wrote checkpoint"

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-05-05 12:28:24 +02:00
ben417
ed69e574a9
Support for Sgaw and W Pwo Karen languages in the Myanmar validator. (#4065)
1. Added 0x102c and 0x1062 in the tone mark section, in Karen these can
be tones too.

2. Added the optional 0x103a, 0x1037, and 0x1038 after the tones. Asat
is part of the Sgaw tone mark and dot below and visarga are used as
nasal marks following the Pwo tones.
2023-05-05 09:42:56 +03:00
Amit D
9422915eb7
issue-bug.yml: Windows versions 7, 8, 8.1 are not supported anymore 2023-04-04 18:16:53 +03:00
Stefan Weil
b48f08e20c snap: Update from leptonica 1.74.2 to latest 1.83.1
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-04-03 15:09:22 +02:00
林博仁(Buo-ren Lin)
7b05f9478e fix: Fix snap package building
This patch fixes the outdated snap package recipe to make the snap
buildable with the current Snapcraft release(7.3.1).

Signed-off-by: 林博仁(Buo-ren Lin) <Buo.Ren.Lin@gmail.com>
2023-04-03 15:05:26 +02:00
Stefan Weil
71af454299 Create new release 5.3.1
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-04-01 21:50:30 +02:00
Stefan Weil
f833491ddb Remove whitespace at line endings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-03-31 22:16:43 +02:00
Amit D
fa4d4449c5
Fix issue #4010 (#4041)
Enable some code blocks that were wrongly disabled when the legacy engine is disabled at compile time.
2023-03-28 18:05:57 +03:00
zdenop
bbc2dfcfe1 cmake: add missing HAVE_NEON to config_auto.h 2023-03-27 19:23:18 +02:00
zdenop
de6d99db7d Merge branch 'main' of https://github.com/tesseract-ocr/tesseract 2023-03-27 19:22:39 +02:00
zdenop
8045cbb7c9 cmake: adjust build to autotool settings 2023-03-27 19:22:28 +02:00
zdenop
4c59535e48 cmake: adjust build to autotool settings 2023-03-27 19:20:51 +02:00
zdenop
a0708eaff2 Merge branch 'main' of https://github.com/tesseract-ocr/tesseract 2023-03-27 19:20:16 +02:00
zdenop
426ed87c97 cmake: improve NEON build 2023-03-27 19:20:11 +02:00
Stefan Weil
c7a55c1ec1 Fix some typos (partially found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-03-24 22:39:28 +01:00
Stefan Weil
1569e50808 textord: Catch empty rows in block iterator (fixes #4039)
When textord_blockndoc_fixed was set to 1 empty rows caused a segmentation
fault. Test also textord_blockndoc_fixed first because it is typically 0.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-03-24 15:51:40 +01:00
zdenop
691de2b945 cmake: sync with autotools (OPENMP_SIMD, fast-math) 2023-03-23 20:21:34 +01:00
zdenop
484d427c67 cmake: improve style 2023-03-23 20:20:51 +01:00
Zdenko Podobný
f779c434b0 cmake: disable "-march=native" by default 2023-03-23 12:55:51 +01:00
Amit D
a6e0aa7f48
Update issue-bug.yml 2023-03-16 10:41:27 +02:00
Amit D
a7e51c2e33
Update issue-bug.yml 2023-03-16 10:13:29 +02:00
Amit D
19fe1a6785
autotools.yml: Update compilers 2023-03-10 13:56:36 +02:00
Amit D
c38471b90d
Update cmake.yml
G++ 8 is not installed by default on Ubuntu 20.04.
2023-03-10 13:22:36 +02:00
Amit D
3245322d3d
Update cmake.yml 2023-03-08 15:21:51 +02:00
Amit D
2aed93fa69
cmake.yml: Update compilers 2023-03-08 15:16:15 +02:00
Ger Hobbelt
98e61a7e10
Improve the DebugDump output by slightly adjusting the format. (#4022)
* Improve the DebugDump output by slightly adjusting the format for the numeric columns, which was 3,3,3,3 and overflowing in our test runs, damaging the table layout. See rationale in the code comment:

------

  // The largest (positive and negative) numbers are reported for lindent & rindent.
  // While the column header has widths 5,4,4,5, it is therefore opportune to slightly
  // offset the widths in the format string here to allow ample space for lindent & rindent
  // while keeping the final table output nicely readable: 4,5,5,4.

# Conflicts:
#	src/ccmain/paragraphs.cpp

* comment fix, pointed out by @stweil
2023-03-06 15:42:43 +02:00
tooomm
ae3bfec757
Link to list of supported languages in docs (#4027)
Addresses https://github.com/tesseract-ocr/tessdoc/issues/83
2023-03-06 11:25:42 +02:00
zdenop
0977ded2b3
Update autotools.yml
gcc 7 does not implement all of C++17
2023-03-05 14:28:53 +01:00
zdenop
066fc2e11c
Update cmake.yml
gcc 7 does not implement all of C++17
2023-03-03 18:54:23 +01:00
zdenop
79065a03a3
Update cmake.yml
fix cmake GA
2023-03-01 12:44:17 +01:00
Zdenko Podobný
9d71da7854 Merge branch 'main' of https://github.com/tesseract-ocr/tesseract 2023-02-10 12:13:26 +01:00
zdenop
392e56cd87
Update cmake.yml
libarchive is broken on macos: https://github.com/libarchive/libarchive/issues/1819
2023-02-10 12:12:38 +01:00
Zdenko Podobný
9bac701d5e cmake: fix gcc-7 fatal error: filesystem: No such file or directory 2023-02-10 09:51:59 +01:00
Stefan Weil
f1e3697dd4 Fix some typos in comments (found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-02-08 20:51:58 +01:00
Egor Pugin
0221094275
Merge pull request #4015 from stweil/spelling
Replace 'can not' by 'cannot'
2023-02-08 22:02:06 +03:00
Stefan Weil
1e04be842d Replace 'can not' by 'cannot'
Both forms are used in American English, but 'cannot' is more common
(also in Tesseract code), so use it always.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-02-08 17:34:22 +01:00
zdenop
7becbbd627
Update cmake-win64.yml 2023-02-07 15:11:00 +01:00
Egor Pugin
efa89c6dfa
Merge pull request #4013 from ferdnyc/patch-1
Fix libdir in tesseract.pc from CMake
2023-02-03 14:23:43 +03:00
Frank Dana
5e116fa5ca
Fix libdir in tesseract.pc from CMake
tesseract.pc.cmake was hardcoding libdir to
`{prefix}/lib`, which is wrong for systems that use
`/usr/lib64/` on 64-bit. `CMAKE_INSTALL_LIBDIR`
is already expected to contain the libdir path
relative to the install prefix.
2023-02-02 19:57:59 -05:00
autoantwort
1c09782354
msvc debug: fix wrong lib name in generated pkgconfig file (#4008) 2023-01-31 15:30:45 +01:00
Egor Pugin
e3fb0c657d
Merge pull request #4009 from kraj/gcc13
Fix build with gcc 13 by including <cstdint>
2023-01-30 23:11:06 +03:00
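
The kind of fix named in the merge title above usually amounts to including <cstdint> explicitly wherever fixed-width integer types are used, instead of relying on another standard header to pull it in. A minimal sketch with a hypothetical function:

```cpp
#include <cstdint>  // explicit include; do not rely on other headers providing it
#include <string>

// Hypothetical function, not taken from the Tesseract sources.
// It uses std::uint32_t, which is declared in <cstdint>.
std::uint32_t simple_hash(const std::string &text) {
  std::uint32_t sum = 0;
  for (unsigned char c : text) {
    sum = sum * 31u + c;
  }
  return sum;
}
```

With the explicit include the code does not depend on which headers a particular standard library version happens to include transitively.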