Commit Graph

6489 Commits

Author SHA1 Message Date
Stefan Weil
1e8640a02e Fix CID 1534938 (COPY_INSTEAD_OF_MOVE)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-10 11:22:49 +01:00
Stefan Weil
3fedc6cdfc Fix CID 1534939 (COPY_INSTEAD_OF_MOVE)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-10 11:21:04 +01:00
Stefan Weil
02409f578a Fix CID 1534945 (COPY_INSTEAD_OF_MOVE)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-10 11:19:34 +01:00
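Coverity's COPY_INSTEAD_OF_MOVE checker reports places where an object is copied even though the source object is not used afterwards, so it could have been moved. The three commits above do not show the changed code. The following is a generic sketch of the pattern such fixes usually apply; the class `TextBlock` and the function `make_block` are hypothetical and are not taken from the Tesseract sources.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical class for illustration, not part of Tesseract.
class TextBlock {
 public:
  // Sink parameter taken by value, then moved into the member: an lvalue
  // argument costs one copy, an rvalue argument costs only moves.
  explicit TextBlock(std::vector<std::string> lines)
      : lines_(std::move(lines)) {}

  const std::vector<std::string> &lines() const { return lines_; }

 private:
  std::vector<std::string> lines_;
};

TextBlock make_block() {
  std::vector<std::string> lines{"first line", "second line"};
  // `lines` is not used after this point. Passing it as a plain lvalue
  // would copy the whole vector, which is the pattern Coverity reports as
  // COPY_INSTEAD_OF_MOVE; std::move hands over its buffer instead.
  return TextBlock(std::move(lines));
}
```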
Stefan Weil
e83f78020e Fix stringToOEM and stringToPSM
Remove debug output and fix an out-of-bounds read for unsupported arguments.

Fixes: e8a9a56f9f ("Support symbolic values for --oem and --psm options")
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-10 10:16:37 +01:00
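The fix concerns how symbolic arguments to `--oem` and `--psm` are parsed. The code of `stringToOEM` and `stringToPSM` is not shown here, so the following is only a sketch of a lookup that cannot read out of bounds. It walks a fixed table and returns "no value" for unsupported arguments. The numeric values follow Tesseract's OCR engine modes (0 legacy only, 1 LSTM only, 2 combined, 3 default). The symbolic spellings and the name `lookup_oem` are assumptions for illustration.

```cpp
#include <array>
#include <optional>
#include <string_view>

// Hypothetical name table; the spellings are illustrative assumptions,
// only the numeric engine-mode values match Tesseract's documented modes.
struct NamedValue {
  std::string_view name;
  int value;
};

constexpr std::array<NamedValue, 4> kOemNames{{
    {"tesseract_only", 0},
    {"lstm_only", 1},
    {"tesseract_lstm_combined", 2},
    {"default", 3},
}};

// Returns the numeric value for a symbolic name, or std::nullopt if the
// name is unknown. The loop only visits existing table entries, so an
// unsupported argument never causes a read past the end of the table.
std::optional<int> lookup_oem(std::string_view arg) {
  for (const auto &entry : kOemNames) {
    if (entry.name == arg) {
      return entry.value;
    }
  }
  return std::nullopt;
}
```

The caller decides how to report an unsupported value (for example with an error message and a non-zero exit), rather than the lookup printing debug output itself.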
Stefan Weil
49cbe2b47d Fix compiler warning for argument of getaddrinfo
Fix this clang warning:

    src/viewer/svutil.cpp:277:51:
      warning: missing field 'ai_protocol' initializer [-Wmissing-field-initializers]

Replace also PF_INET by AF_INET which is the recommended value.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-09 12:05:03 +01:00
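The clang warning appears when a `struct addrinfo` is initialized with a brace list that names only some of its fields. A common warning-free alternative is to value-initialize the struct, which zeroes every member, and then set only the fields that matter, using `AF_INET`. The following is a minimal sketch assuming a POSIX system. The helper `resolve_numeric` and its arguments are hypothetical and are not the code in `src/viewer/svutil.cpp`.

```cpp
#include <netdb.h>
#include <sys/socket.h>

#include <string>

// Hypothetical helper for illustration. Resolves a numeric IPv4 address
// and port with getaddrinfo and returns true on success.
bool resolve_numeric(const std::string &host, const std::string &port) {
  addrinfo hints{};  // empty braces value-initialize all fields, so no
                     // -Wmissing-field-initializers warning is expected
  hints.ai_family = AF_INET;        // AF_INET rather than PF_INET
  hints.ai_socktype = SOCK_STREAM;  // stream (TCP) socket
  hints.ai_flags = AI_NUMERICHOST;  // host must be numeric, no DNS lookup

  addrinfo *result = nullptr;
  if (getaddrinfo(host.c_str(), port.c_str(), &hints, &result) != 0) {
    return false;
  }
  freeaddrinfo(result);  // release the list allocated by getaddrinfo
  return true;
}
```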
Stefan Weil
cdb7ff90e4 Update submodule googletest to release v1.15.2
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-08 20:30:36 +01:00
Stefan Weil
2a1ce80a42 Fix compilation of unittest/third_party/utf/rune.c
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-08 08:15:59 +01:00
sunyuechi
16fc9d90a4 Add RISC-V V support (#4346)
Convert riscv-v-spec-1.0.pdf into 111 PNG images,
then perform OCR on each one in sequence,
and measure the testing time on banana_f3:

old:        31m16.267s
new:        16m51.155s

Co-authored-by: sunyuechi <sunyuechi@iscas.ac.cn>
Co-authored-by: Stefan Weil <sw@weilnetz.de>
2024-11-08 08:09:01 +01:00
Stefan Weil
d7c0a05ffa Remove Tensorflow support
Tensorflow was never used because of missing models.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-07 13:40:43 +01:00
Stefan Weil
daaa902a5e Update documentation on history of development
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-07 11:39:00 +01:00
Amit D.
d1b89204ec Update README.md: Remove CI badges
2024-11-07 10:22:08 +02:00
Stefan Weil
e3ac3fce2d Run GitHub action sw less often
It is no longer run on push or pull requests.
The scheduled runs are reduced from daily to every 3rd day.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-07 08:57:51 +01:00
Stefan Weil
d2f311bf7c Get the right compiler DLL files for the Windows installer
libstdc++-6.dll and libgcc_s_seh-1.dll must be taken from the compiler
directory, not from the pacman DLLs.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-06 21:44:50 +01:00
Stefan Weil
4bd94c6147 Make sure that required packages are installed for build of Windows installer
The build process needs the packages curl, python3-venv and unzip
which are missing in the Docker image for Ubuntu.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-06 21:44:50 +01:00
Stefan Weil
708621a2ab Remove unneeded pkg-config-crosswrapper
The Debian package mingw-w64-tools already contains the required
/usr/bin/x86_64-w64-mingw32-pkg-config.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-06 21:44:50 +01:00
Stefan Weil
3ec34f1755 Don't install tesseract.exe twice in Tesseract installer for Windows
The pattern for the training tools *.exe also includes tesseract.exe,
so it must be excluded explicitly.

Add also a macro BINDIR which simplifies the NSIS rules.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-06 21:44:50 +01:00
Stefan Weil
914a9589aa Reduce size of Tesseract installer for Windows
Strip all installed executables and libraries.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-06 21:44:50 +01:00
Stefan Weil
eed339b3ba Replace some tprintf by tesserr stream (fixes Windows compiler warnings)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-03 17:53:45 +01:00
Stefan Weil
60ed299550 Make downloads with curl silent in build process
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-02 07:37:19 +01:00
Stefan Weil
b7c7540bd7 Fix download of jar files for scrollview
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-02 07:37:09 +01:00
Stefan Weil
e8a9a56f9f Support symbolic values for --oem and --psm options
This fixes issue #4332.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-02 07:00:59 +01:00
Stefan Weil
827a4e7c7f Add Python script which finds Windows dependencies
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-02 07:00:33 +01:00
Stefan Weil
d0d43dfbce Update NSIS installer
- Add manual pages in HTML format and helper for Tesseract command line
- Don't remove the installation directory recursively
- Add GitHub action for Tesseract installer for Windows
- Add docbook-xml to required packages (needed for doc)
- Use unicode for NSIS installer
- Optionally sign executables
- Add more file properties to installer
- Update configuration for use with pacman
- Build Windows installer only for 64 bit Windows

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-02 07:00:33 +01:00
Regina Retter
b7c5996248 Update installer for Windows
- Added a couple of languages that are available for the Linux version
- Add new section for script data
- Get data from tessdata_fast
  The data files are now in the "script" subdirectory.
- Update list of scripts and languages
- Update path for script trained data
- Add data for Han Simplified vertical script
- Fix names of tessdata (jpn_vert, kmr)
- Fix some path names for 64 bit version
- Remove testing files from installation
  Those files were moved from tesseract.git to test.git.
- Don't enforce admin mode, but use highest available
- Don't use a checkbox for the license
- Remove unused code for registry settings (PATH, TESSDATA)
- Don't show README.md (did not work)

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-02 07:00:33 +01:00
Stefan Weil
c886e3b639 Update NSIS configuration
- Move NSIS installer file to new location
- Support cross builds with NSIS
- Clean nsis configuration
- Fix typos in nsis configuration
- Add jar files needed for ScrollView.jar
- Move ScrollView.jar to a new section
- Add missing configurations to tessdata
- Registry settings are now disabled (problems with long PATH)
- Add menu sections for all languages
- Simplify language downloads
- Tune and improve nsis configuration
- Add sizes for language data
- Add missing translations to nsis configuration
- Don't show details in installer by default
- Initial code for 64 bit Tesseract installer
- Fix uninstall for TESSDATA_PREFIX registry key
- Remove cube code
- nsis: Add all training executables
- nsis: Disable registry settings

Trying to add to PATH fails if the old PATH is very long and
will result in an empty PATH.

Remove these settings as they were already disabled by default,
and both are not needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-02 07:00:33 +01:00
zdenop@gmail.com
678e427d8b add NSIS script for Windows installer
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@815 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2024-11-02 07:00:33 +01:00
Stefan Weil
7fd6d2388a Fix more typos in code comments and variable name
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-10-31 15:00:55 +01:00
zdenop
9f8e07cdf9 Merge pull request #4337 from stweil/typos
Fix some typos and grammar issues
2024-10-28 14:12:56 +01:00
Amit D.
3633e88b2a Update README.md: Fix OSS-Fuzz link
2024-10-28 14:32:09 +02:00
Stefan Weil
3400ce7662 Fix more typos in code comments
2024-10-23 15:05:58 +02:00
Stefan Weil
31e864b4a4 Fix Settup -> Setup in method names
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-10-23 15:03:44 +02:00
Stefan Weil
688f8283c5 Fix some code comments
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-10-23 14:59:15 +02:00
Stefan Weil
638868ed38 Modernize code for renderers and remove filename conversion for Windows (#4330)
Commit db52047420 added the filename conversion for the hOCR renderer,
but it was removed later for TSV in commit 6700edd8bc.

Tesseract does not use a filename conversion anywhere else, so remove it
for the other renderers, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-10-23 08:34:06 +03:00
Stefan Weil
3020c14a60 CI: Install libtool as required dependency for macOS build
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-10-23 07:07:09 +02:00
Stefan Weil
e9fc2af0b2 CI: Install curl and icu4c as required dependencies for macOS build
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-10-22 16:41:29 +02:00
Zdenko Podobný
2976eb1678 Revert "use variable instead of hardcoded name for pkg-config file"
This reverts commit b4a4f5c6cb.
2024-10-22 11:03:58 +02:00
Stefan Weil
b4adf2464b Replace deprecated runner macos-12 by macos-latest in GitHub actions
The macOS 12 runner image will be removed by December 3rd, 2024.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-10-17 18:39:46 +02:00
Zdenko Podobný
3ebed57878 Merge branch 'main' of https://github.com/tesseract-ocr/tesseract
2024-10-17 09:11:30 +02:00
Zdenko Podobný
b4a4f5c6cb use variable instead of hardcoded name for pkg-config file
2024-10-17 09:11:22 +02:00
zdenop
aacc9052b9 Update cmake.yml
Use macOS 15 as the macOS 12 runner image will be removed by 12/3/2024
2024-10-17 07:06:25 +02:00
Egor Pugin
61ed4d9f36 Do not export PDBs for static libraries. Fixes #4279.
2024-10-07 20:42:52 +03:00
zdenop
900c721f14 Merge pull request #4319 from Conan-Kudo/fix-soversion
cmake: Correctly set the soversion based on SemVer properties
2024-09-19 15:18:08 +02:00
Neal Gompa
280779c615 cmake: Correctly set the soversion based on SemVer properties
As this project follows Semantic Versioning, the shared object
version should match these semantics.

The two options that make sense here are to have the soversion
set to the version major (so only breaking changes are tracked)
or to set to version major and minor (so breaking and API additions
are tracked).

Since the Windows version of the library already uses version major
and version minor, let's just do this universally.

Fixes: 832926f5af ("Update library version handling for cmake")

Signed-off-by: Neal Gompa <neal@gompa.dev>
2024-09-18 07:44:29 -04:00
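The commit describes two ways to derive the soversion from a Semantic Versioning string: major only, or major and minor. The actual change is in the CMake configuration. The same rule is sketched here in C++ only to make the two options concrete. The function `soversion` is hypothetical, and the version strings in the test are arbitrary examples.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration. Derives a shared-object version
// from a "MAJOR.MINOR.PATCH" string. With include_minor == false, only
// breaking changes bump the soversion; with true, API additions do too.
std::string soversion(const std::string &semver, bool include_minor) {
  const std::size_t first_dot = semver.find('.');
  if (first_dot == std::string::npos) {
    return semver;  // no dot: the whole string is MAJOR
  }
  if (!include_minor) {
    return semver.substr(0, first_dot);  // "MAJOR"
  }
  const std::size_t second_dot = semver.find('.', first_dot + 1);
  // If there is no second dot, substr(0, npos) returns the whole string.
  return semver.substr(0, second_dot);  // "MAJOR.MINOR"
}
```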
Stefan Weil
4f43536335 Merge pull request #4314 from stweil/optimize
Add C++ stream for log messages and use it in two debug messages
2024-09-04 05:22:03 +02:00
Stefan Weil
37d1c6506d Add TESS_API in declaration for tesserr (fix sw build)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-09-03 17:31:38 +02:00
Stefan Weil
7ef8e3c7ee Print time for ErrorCounter::ComputeErrorRate in milliseconds
Optimize also the code, replace tprintf by C++ stream
and call clock() only when needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-09-03 16:26:50 +02:00
Stefan Weil
bd7b3571cc Print time for tessedit_timing_debug in milliseconds
Optimize also the code a little bit.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-09-03 16:26:50 +02:00
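The timing commits report durations in milliseconds, write to a C++ stream instead of `tprintf`, and call `clock()` only when timing output is requested. A minimal sketch of that pattern follows. `std::clock()` measures processor time, and dividing a tick difference by `CLOCKS_PER_SEC` gives seconds. The names `clock_ticks_to_ms` and `run_timed` are hypothetical, and the sketch writes to a caller-supplied `std::ostream` rather than reproducing Tesseract's `tesserr`.

```cpp
#include <ctime>
#include <iostream>

// Converts a difference of std::clock() readings to milliseconds of
// processor time.
double clock_ticks_to_ms(std::clock_t ticks) {
  return 1000.0 * static_cast<double>(ticks) / CLOCKS_PER_SEC;
}

// Hypothetical wrapper: runs `work` and, only if timing_debug is set,
// measures it with std::clock() and logs the result to `log`.
template <typename Work>
void run_timed(Work work, bool timing_debug, std::ostream &log) {
  const std::clock_t start = timing_debug ? std::clock() : 0;
  work();
  if (timing_debug) {
    log << "elapsed: " << clock_ticks_to_ms(std::clock() - start) << " ms\n";
  }
}
```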
Stefan Weil
33d673c46d tprintf: Add C++ stream for log messages
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-09-03 16:26:50 +02:00
Stefan Weil
a63e7ec2e6 tprintf: Modernize and simplify the code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-09-03 15:42:03 +02:00
Stefan Weil
3a4a013dfe tprintf: Remove unused macro and update comment
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-09-03 15:42:03 +02:00