mirror of
https://github.com/tesseract-ocr/tesseract.git
synced 2025-07-31 09:46:15 +08:00
Updated Command Line Usage (markdown)
parent
5b6fd39f8c
commit
94485516ed
@ -40,16 +40,43 @@ This page has not been (fully) updated for Tesseract 4.0.
--list-langs List available languages for tesseract engine.
--print-parameters Print tesseract parameters to stdout.

## OCR only first page of a multi-page tiff
--------------------------------------------

## Add page break in output

Pass the config variables on the command line: `-c include_page_breaks=1 -c page_separator="[PAGE SEPARATOR]"`

The default page separator is the form feed control character.

`tesseract -c include_page_breaks=1 input.tiff output`
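
When the command is launched from a script, it can help to assemble the argument list in one place. The following is a minimal sketch, not part of Tesseract: the helper name `page_break_command` and its parameters are hypothetical, and only the two config variables are taken from the text above.

```python
from typing import List, Optional


def page_break_command(image: str, outputbase: str,
                       separator: Optional[str] = None) -> List[str]:
    """Build a tesseract argv list that enables page breaks (hypothetical helper)."""
    cmd = ["tesseract", "-c", "include_page_breaks=1"]
    if separator is not None:
        # Passed as a single argv element, so no shell quoting is needed.
        cmd += ["-c", f"page_separator={separator}"]
    # Input image and output base follow the options, as in the command above.
    cmd += [image, outputbase]
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)`, assuming `tesseract` is on `PATH`.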

## OCR multiple images with one run of tesseract

Prepare a text file that has the path to each image:

```
path/to/1.png
path/to/2.png
path/to/3.tiff
```

Save it, and then give its name as input file to Tesseract.

`tesseract savedlist output`
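
The list file can also be generated rather than typed by hand. Here is a small sketch under stated assumptions: `write_image_list` is a hypothetical helper, and the default output base `output` simply mirrors the command above.

```python
from pathlib import Path
from typing import Iterable, List


def write_image_list(list_path: str, image_paths: Iterable[str],
                     outputbase: str = "output") -> List[str]:
    """Write one image path per line and return the tesseract argv (hypothetical helper)."""
    lines = [str(p) for p in image_paths]
    # One path per line, matching the list file format shown above.
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    # The list file takes the place of the input image on the command line.
    return ["tesseract", list_path, outputbase]
```

For example, `write_image_list("savedlist", ["path/to/1.png", "path/to/2.png"])` writes the file and returns the argument list for the command shown above.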

## OCR single page of a multi-page tiff

Pass the config variable on the command line: `-c tessedit_page_number=0`. Page numbers are zero-based, so `0` selects the first page.
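
To pick a page programmatically, the variable can be filled in from an index. This is only a sketch: `single_page_command` is a hypothetical helper, the file names are placeholders, and it assumes zero-based page numbers, as `tessedit_page_number=0` for the first page suggests.

```python
from typing import List


def single_page_command(image: str, outputbase: str, page_index: int = 0) -> List[str]:
    """Build a tesseract argv list that OCRs one page of a multi-page TIFF."""
    if page_index < 0:
        # This helper only accepts an explicit page; negative values are rejected here.
        raise ValueError("page_index must be zero or greater")
    return ["tesseract", "-c", f"tessedit_page_number={page_index}", image, outputbase]
```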

## Integrate original image file and detected text into searchable PDF
## Integrate original image file and detected text into PDF

Use the config variable `-c textonly_pdf=1` and merge your image-only and text-only PDF.

See https://github.com/tesseract-ocr/tesseract/issues/660#issuecomment-274213632 for details.
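
The first half of that workflow, producing the text-only PDF, might look like the sketch below. `text_only_pdf_command` is a hypothetical helper, and the trailing `pdf` config name is an assumption not stated on this page. The merge step is not shown; it is described in the linked issue.

```python
from typing import List


def text_only_pdf_command(image: str, outputbase: str) -> List[str]:
    """Build a tesseract argv list for a PDF holding only the recognized text layer."""
    # The trailing "pdf" config name (an assumption here) requests PDF output.
    return ["tesseract", "-c", "textonly_pdf=1", image, outputbase, "pdf"]
```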

---------------------------------------------

## Simplest Invocation to OCR an image

tesseract imagename outputbase