mirror of
https://github.com/tesseract-ocr/tesseract.git
synced 2024-12-01 07:59:05 +08:00
Merge pull request #1949 from stweil/manpage
Document some more config options for tesseract
commit 551abb2114
@@ -34,7 +34,9 @@ IN/OUT ARGUMENTS
 
 'outputbase'::
 The basename of the output file (to which the appropriate extension
-will be appended). By default the output will be named 'outbase.txt'.
+will be appended). By default the output will be a text file
+with `.txt` added to the basename unless there are one or more
+'configfile' options which explicitly specify the desired output.
 
 'stdout'::
 Instruction to sent output data to standard output
@@ -88,8 +90,19 @@ OPTIONS
 contains a list of variables and their values, one per line, with a
 space separating variable from value. Interesting config files
 include: +
-* hocr - Output in hOCR format instead of as a text file.
-* pdf - Output in pdf instead of a text file.
+* `hocr` - Output in hOCR format (file extension `.hocr`).
+* `pdf` - Output PDF (file extension `.pdf`).
+* `tsv` - Output TSV (file extension `.tsv`).
+* `txt` - Output plain text (file extension `.txt`).
+* `get.images` - Write images.
+* `logfile` - Write debug file `tesseract.log`.
+* `lstm.train` - Used for LSTM training.
+* `makebox` - Output box file.
+* `quiet` - Write debug file to /dev/null.
+
+It is possible to select several config files, for example
+`tesseract image.png demo hocr pdf txt` will create three output files
+`demo.hocr`, `demo.pdf` and `demo.txt` with the OCR results.
 
 *Nota Bene:* The options `-l lang` and `--psm N` must occur
 before any 'configfile'.
@@ -122,7 +135,7 @@ LANGUAGES
 
 The currently available traineddata files for tesseract 4.0
 for the following languages are in
-(in https://github.com/tesseract-ocr/tessdata_fast):
+https://github.com/tesseract-ocr/tessdata_fast:
 
 *afr* (Afrikaans),
 *amh* (Amharic),
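
The argument ordering and output naming documented in this change can be sketched in a few lines of Python. This is only an illustration, not part of the commit: the helper names `build_command` and `expected_outputs` are hypothetical, the extension table covers only the four config files whose extensions are stated in the diff (`hocr`, `pdf`, `tsv`, `txt`), and the fallback to `.txt` is a simplification that ignores other output-producing config files such as `makebox`.

```python
# Extensions as listed in the updated man page text (hocr, pdf, tsv, txt).
# Other config files (makebox, logfile, ...) are not modelled here.
OUTPUT_EXTENSIONS = {
    "hocr": ".hocr",
    "pdf": ".pdf",
    "tsv": ".tsv",
    "txt": ".txt",
}


def build_command(image, outputbase, configs=(), lang=None, psm=None):
    """Build a tesseract argument list (hypothetical helper).

    The `-l lang` and `--psm N` options are placed before any config
    file, as the man page requires.
    """
    cmd = ["tesseract", image, outputbase]
    if lang is not None:
        cmd += ["-l", lang]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    cmd += list(configs)  # config files always come last
    return cmd


def expected_outputs(outputbase, configs=()):
    """Return the output file names implied by the selected config files.

    Simplifying assumption: if none of the four known output configs is
    given, a plain text file with `.txt` appended is expected.
    """
    extensions = [OUTPUT_EXTENSIONS[c] for c in configs if c in OUTPUT_EXTENSIONS]
    if not extensions:
        extensions = [".txt"]
    return [outputbase + ext for ext in extensions]


if __name__ == "__main__":
    configs = ["hocr", "pdf", "txt"]
    print(build_command("image.png", "demo", configs, lang="eng", psm=3))
    print(expected_outputs("demo", configs))
```

The resulting list could be handed to `subprocess.run` on a system where tesseract is installed; the sketch itself only assembles arguments and never invokes the binary.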