Convert riscv-v-spec-1.0.pdf into 111 PNG images,
then perform OCR on each one in sequence,
and measure the testing time on banana_f3:
old: 31m16.267s
new: 16m51.155s
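
The speedup implied by the two timings above can be worked out with a minimal Python sketch. The helper name parse_time is hypothetical, and the sketch assumes the durations use the "XmY.YYYs" format printed by the shell's time command:

```python
import re


def parse_time(text: str) -> float:
    """Convert a duration such as '31m16.267s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unexpected duration format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


old = parse_time("31m16.267s")  # timing before the change
new = parse_time("16m51.155s")  # timing after the change
speedup = old / new

print(f"old: {old:.3f} s, new: {new:.3f} s, speedup: {speedup:.2f}x")
```

With these numbers the new build is roughly 1.86 times as fast as the old one on this workload.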
Co-authored-by: sunyuechi <sunyuechi@iscas.ac.cn>
Co-authored-by: Stefan Weil <sw@weilnetz.de>
- Move NSIS installer file to new location
- Support cross builds with NSIS
- Clean nsis configuration
- Fix typos in nsis configuration
- Add jar files needed for ScrollView.jar
- Move ScrollView.jar to a new section
- Add missing configurations to tessdata
- Registry settings are now disabled (problems with long PATH)
- Add menu sections for all languages
- Simplify language downloads
- Tune and improve nsis configuration
- Add sizes for language data
- Add missing translations to nsis configuration
- Don't show details in installer by default
- Initial code for 64 bit Tesseract installer
- Fix uninstall for TESSDATA_PREFIX registry key
- Remove cube code
- nsis: Add all training executables
- nsis: Disable registry settings
Adding an entry to PATH fails if the existing PATH is very long,
and results in an empty PATH.
Remove these settings as they were already disabled by default,
and neither is needed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
It was not used for all sources before. Therefore some parts of the
code (especially the code for training) used different compiler
options. For example NDEBUG was not defined, and so the training
code was built with debug assertions (resulting in potentially
slower execution).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Add PAGE XML export and documentation.
To generate PAGE XML output just add 'page' to the tesseract command.
The output is outputname + '.page.xml' to avoid conflicts with ALTO export.
The output can be customized with the flags:
tessedit_create_page_polygon and tessedit_create_page_wordlevel.
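
As a rough illustration of the naming rule above, the following Python sketch builds the command line and the expected output file name. The helper page_xml_command and the file names are hypothetical. Passing variables with -c NAME=VALUE is the usual way to set Tesseract parameters, but the value "1" used here is only an assumption:

```python
def page_xml_command(image, outputname, config_vars=None):
    """Return (command, expected_output) for a PAGE XML export.

    config_vars is an optional dict of Tesseract variables that are
    passed with -c NAME=VALUE; which values are accepted is not
    covered by this sketch.
    """
    command = ["tesseract", image, outputname]
    for name, value in (config_vars or {}).items():
        command += ["-c", f"{name}={value}"]
    # The 'page' config selects the PAGE XML renderer.
    command.append("page")
    # The output is outputname + '.page.xml'.
    return command, outputname + ".page.xml"


command, output = page_xml_command(
    "scan.png",  # hypothetical input image
    "scan",      # hypothetical output base name
    {"tessedit_create_page_wordlevel": "1"},  # value is an assumption
)
print(" ".join(command))
print(output)
```

The command list can be handed to subprocess.run on a system where tesseract is installed.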
Co-authored-by: Stefan Weil <sw@weilnetz.de>
Both forms are used in American English, but 'cannot' is more common
(also in Tesseract code), so use it always.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Also move its source code svpaint.cpp from src/viewer/ to src/,
so it is no longer included in libtesseract by the cmake build.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
MSYS2 clang64 uses the lld linker which does not support --as-needed.
The normal GNU ld uses that linker option with ELF targets but ignores
it for PE targets (.exe, .dll), so it can be removed.
Also remove the -Wl, prefix, which is only needed when linker options
are passed through the compiler, not when they are passed directly to
the linker.
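
The relationship between compiler-driver flags and direct linker options can be sketched in Python as follows. This is not the build system's logic. The function name and the example flags are hypothetical, and the sketch relies only on the convention that -Wl,a,b hands the comma-separated options a and b to the linker:

```python
def to_direct_linker_flags(flags, unsupported=("--as-needed",)):
    """Translate compiler-driver flags into direct linker options.

    '-Wl,opt1,opt2' becomes 'opt1', 'opt2'; options listed in
    'unsupported' are dropped; all other flags pass through unchanged.
    """
    result = []
    for flag in flags:
        if flag.startswith("-Wl,"):
            options = flag[len("-Wl,"):].split(",")
        else:
            options = [flag]
        result.extend(opt for opt in options if opt not in unsupported)
    return result


# Hypothetical flags as they would be passed to the compiler driver.
driver_flags = ["-Wl,--as-needed", "-Wl,--no-undefined,-rpath,/opt/lib"]
print(to_direct_linker_flags(driver_flags))
```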
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Using those flags is not acceptable for Linux distributions
because the resulting code then depends on the build
infrastructure, so the build result is not deterministic.
It is still possible to use those compiler flags by specifying
CXXFLAGS.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The old commit only silenced parts of the build,
while the new one silences the whole build.
Fixes: 47af1282f4
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The new header file ccutils/tesstypes.h also prepares support
for larger images by introducing a new data type for image
size and coordinates (still unused).
FloatToDouble is now a local function.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This reverts commit 122daf1d64, reversing
changes made to 4cd56dc5f5.
Those changes caused two regressions which resulted in an assertion
or a segmentation fault.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
On the latest macOS 11.3, the system header file "ostream" includes a file
named "version".
The macro DEFAULT_INCLUDES adds the source root to the list of include
directories by default. As macOS uses a case-insensitive file system,
the compiler finds and includes the file "VERSION" there, which causes
compiler errors and a failed build.
Setting an empty DEFAULT_INCLUDES fixes that, but requires moving
config_auto.h to another directory in the include search path.
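
The name clash described above can be modelled with a small Python sketch. It is a simplified model of include lookup, not the compiler's real algorithm, and the directory names and file listings are hypothetical:

```python
def find_include(name, search_dirs, case_sensitive):
    """Return 'directory/file' for the first match in search order."""
    for directory, files in search_dirs:
        for filename in files:
            if case_sensitive:
                same = filename == name
            else:
                same = filename.lower() == name.lower()
            if same:
                return f"{directory}/{filename}"
    return None


search_dirs = [
    ("source-root", ["VERSION"]),               # added by DEFAULT_INCLUDES
    ("system-headers", ["ostream", "version"]),
]

# Case-sensitive file system: the system header is found.
print(find_include("version", search_dirs, case_sensitive=True))
# Case-insensitive file system: the unrelated VERSION file wins.
print(find_include("version", search_dirs, case_sensitive=False))
# Empty DEFAULT_INCLUDES: the source root is no longer searched.
print(find_include("version", search_dirs[1:], case_sensitive=False))
```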
Signed-off-by: Stefan Weil <sw@weilnetz.de>