* Restore support for the legacy engine
It is still needed to get text attributes which are unsupported by the
LSTM engine, and it also has better recognition rates for some texts.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
* tesseractmain: Add missing 'static' attributes
Signed-off-by: Stefan Weil <sw@weilnetz.de>
* Support different help texts for normal and advanced users
The old option --help now shows a very basic help text.
The new option --help-extra shows the full help information.
It now also includes a hint that Tesseract supports lists of images.
Fix also the indentation in the PSM help and
use a more neutral text in the OEM help.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
* Add missing line feed in error message
Signed-off-by: Stefan Weil <sw@weilnetz.de>
When Tesseract terminates by calling the exit function,
the destructor of any local auto variable is not called.
Fix two cases by using static variables.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This information is not needed for normal runs, so it is sufficient
to show it on request (like versions and OpenCL information).
This also fixes a crash caused by undefined order of global constructors:
When the global variable SIMDDetect::detector is initialized before the
global variable debug_file, the first tprintf call in simddetect.cpp
crashes because of a NULL pointer in debug_file. This was only seen when
running with a shared library (libtesseract.so).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
It was introduced recently in commit f24ef67d, so there is no need
to support the old variant for compatibility reasons.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
When exit() is called from ParseArgs(), no destructors are executed
for the auto variables vars_vec and vars_values.
Making both variables static fixes the memory leaks, because now the
destructors are always executed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The previous commit added a dependency on tiffio.h, so enable the new
code only if that file is available.
The code which conditionally defines HAVE_TIFFIO_H was already there
although that macro was unused up to now.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Showing them in a window (default) is not acceptable for a console
application like Tesseract which must be able to work in batch mode.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
It is common practice for command line programs to print
user requested information on stdout.
This seems to be reasonable for Tesseract, too.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
It is common practice for command line programs to show help text
on stdout. This seems to be reasonable for Tesseract, too.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Most command line programs print the version to stdout.
This seams to be reasonable for Tesseract, too.
Now a shell statement like "VERSION=$(tesseract --version)" works
without I/O redirection.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Revert fd429c32, 43834da7, 05de195e.
See #49, #59.
The code in this commit solves the issue in a more elegant way, IMHO.
Now you can use:
* `tesseract eurotext.tif eurotext txt pdf`
* `tesseract eurotext.tif eurotext txt hocr`
* `tesseract eurotext.tif eurotext txt hocr pdf`
NOTE:
With `tesseract eurotext.tif eurotext`
or `tesseract eurotext.tif eurotext txt`
the psm will be set to '3', but...
With `tesseract eurotext.tif eurotext txt pdf`
or `tesseract eurotext.tif eurotext txt hocr`
the psm will be set to '1'.
Commit 99110df757 improved the help text
in several aspects, but also introduced new inconsistencies which this
patch tries to fix.
* Align columns (this needed replacing tabs by spaces).
* Start explaining text with uppercase.
* Replace "the stdout" by "stdout.
* Small changes in help text for page segmentation modes.
* Split options in OCR options and single options
(partially revert commit 99110df757).
In addition, whitespace characters at end of lines were removed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>