Merge pull request #1949 from stweil/manpage

Document some more config options for tesseract
zdenop 2018-10-05 16:38:06 +02:00 committed by GitHub
commit 551abb2114

@@ -34,7 +34,9 @@ IN/OUT ARGUMENTS
'outputbase'::
The basename of the output file (to which the appropriate extension
-will be appended). By default the output will be named 'outbase.txt'.
+will be appended). By default the output will be a text file
+with `.txt` added to the basename unless there are one or more
+'configfile' options which explicitly specify the desired output.
'stdout'::
Instruction to sent output data to standard output
@@ -88,8 +90,19 @@ OPTIONS
contains a list of variables and their values, one per line, with a
space separating variable from value. Interesting config files
include: +
-* hocr - Output in hOCR format instead of as a text file.
-* pdf - Output in pdf instead of a text file.
+* `hocr` - Output in hOCR format (file extension `.hocr`).
+* `pdf` - Output PDF (file extension `.pdf`).
+* `tsv` - Output TSV (file extension `.tsv`).
+* `txt` - Output plain text (file extension `.txt`).
+* `get.images` - Write images.
+* `logfile` - Write debug file `tesseract.log`.
+* `lstm.train` - Used for LSTM training.
+* `makebox` - Output box file.
+* `quiet` - Write debug file to /dev/null.
+It is possible to select several config files, for example
+`tesseract image.png demo hocr pdf txt` will create three output files
+`demo.hocr`, `demo.pdf` and `demo.txt` with the OCR results.
*Nota Bene:* The options `-l lang` and `--psm N` must occur
before any 'configfile'.
@@ -122,7 +135,7 @@ LANGUAGES
The currently available traineddata files for tesseract 4.0
for the following languages are in
-(in https://github.com/tesseract-ocr/tessdata_fast):
+https://github.com/tesseract-ocr/tessdata_fast:
*afr* (Afrikaans),
*amh* (Amharic),