Document some more config options for tesseract

Clarify also the name(s) of the generated OCR result file(s):
Tesseract does not create a file named outbase.txt by default.

Fix also a sentence in the language section.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2025-01-18 06:30:14 +08:00 · 2018-10-05 15:45:45 +02:00 · 383dcf70b5
commit 383dcf70b5
parent e03ee932d2
1 changed file with 17 additions and 4 deletions
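
The naming rule documented by this patch can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Tesseract itself is implemented: `build_command` and `expected_outputs` are hypothetical helper names, only the four config files whose extensions the patch spells out (`hocr`, `pdf`, `tsv`, `txt`) are modelled, and all other config files are assumed not to change the set of result files. The sketch also follows the Nota Bene in the man page by placing `-l lang` and `--psm N` before any config file.

```python
# Extensions exactly as listed in the man page text added by this commit.
OUTPUT_EXTENSIONS = {"hocr": ".hocr", "pdf": ".pdf", "tsv": ".tsv", "txt": ".txt"}


def build_command(image, outputbase, configs=(), lang=None, psm=None):
    """Build a tesseract argument list (hypothetical helper).

    -l and --psm are placed before the config files, as the man page requires.
    """
    cmd = ["tesseract", image, outputbase]
    if lang is not None:
        cmd += ["-l", lang]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    cmd += list(configs)
    return cmd


def expected_outputs(outputbase, configs=()):
    """Predict the OCR result file names (hypothetical helper).

    Assumption: only configs found in OUTPUT_EXTENSIONS select an output
    format; if none is given, the default is plain text with `.txt`.
    """
    chosen = [name for name in configs if name in OUTPUT_EXTENSIONS]
    if not chosen:
        chosen = ["txt"]
    return [outputbase + OUTPUT_EXTENSIONS[name] for name in chosen]


if __name__ == "__main__":
    configs = ["hocr", "pdf", "txt"]
    print(" ".join(build_command("image.png", "demo", configs)))
    print(expected_outputs("demo", configs))
```

For the example from the patch, `tesseract image.png demo hocr pdf txt`, the sketch predicts `demo.hocr`, `demo.pdf` and `demo.txt`. With no config file at all it predicts only `demo.txt`.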
--- a/doc/tesseract.1.asc
+++ b/doc/tesseract.1.asc
@@ -34,7 +34,9 @@ IN/OUT ARGUMENTS

 'outputbase'::
 	The basename of the output file (to which the appropriate extension
-	will be appended).  By default the output will be named 'outbase.txt'.
+	will be appended).  By default the output will be a text file
+	with `.txt` added to the basename unless there are one or more
+	'configfile' options which explicitly specify the desired output.

 'stdout'::
 	Instruction to sent output data to standard output
@@ -88,8 +90,19 @@ OPTIONS
 	contains a list of variables and their values, one per line, with a
 	space separating variable from value.  Interesting config files
 	include: +
-  * hocr - Output in hOCR format instead of as a text file.
-  * pdf  - Output in pdf instead of a text file.
+  * `hocr` - Output in hOCR format (file extension `.hocr`).
+  * `pdf` - Output PDF (file extension `.pdf`).
+  * `tsv` - Output TSV (file extension `.tsv`).
+  * `txt` - Output plain text (file extension `.txt`).
+  * `get.images` - Write images.
+  * `logfile` - Write debug file `tesseract.log`.
+  * `lstm.train` - Used for LSTM training.
+  * `makebox` - Output box file.
+  * `quiet` - Write debug file to /dev/null.
+
+It is possible to select several config files, for example
+`tesseract image.png demo hocr pdf txt` will create three output files
+`demo.hocr`, `demo.pdf` and `demo.txt` with the OCR results.

 *Nota Bene:*   The options `-l lang` and `--psm N` must occur
 before any 'configfile'.
@@ -122,7 +135,7 @@ LANGUAGES

 The currently available traineddata files for tesseract 4.0
 for the following languages are in
-(in https://github.com/tesseract-ocr/tessdata_fast):
+https://github.com/tesseract-ocr/tessdata_fast:

 *afr* (Afrikaans),
 *amh* (Amharic),