From 4c07ce872d86999d29a59115f9feeb064b1e035f Mon Sep 17 00:00:00 2001
From: Shreeshrii
Date: Sat, 30 Mar 2019 17:49:54 +0530
Subject: [PATCH] Delete command syntax - refer to updated `man` page

---
 Command-Line-Usage.md | 86 ++------------------------------------------------
 1 file changed, 3 insertions(+), 83 deletions(-)

diff --git a/Command-Line-Usage.md b/Command-Line-Usage.md
index 90fd90f..91c61b8 100644
--- a/Command-Line-Usage.md
+++ b/Command-Line-Usage.md
@@ -1,93 +1,13 @@
 ## [Tesseract 'man' page](https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc)
 
-Information updated for Tesseract-4.0.0-beta-1
+See the [man](https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc) page for command line syntax and other details.
 
-## tesseract --version
-```
-tesseract 4.0.0-beta.1-207-g984a
- leptonica-1.76.0
-  libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8 : libopenjp2 2.3.0
- Found AVX
- Found SSE
-```
-
-## tesseract --help
-
-```
-Usage:
-  tesseract --help | --help-extra | --version
-  tesseract --list-langs
-  tesseract imagename outputbase [options...] [configfile...]
-
-OCR options:
-  -l LANG[+LANG]        Specify language(s) used for OCR.
-NOTE: These options must occur before any configfile.
-
-Single options:
-  --help                Show this help message.
-  --help-extra          Show extra help for advanced users.
-  --version             Show version information.
-  --list-langs          List available languages for tesseract engine.
-```
-
-## tesseract --help-extra
-
-```
-Usage:
-  tesseract --help | --help-extra | --help-psm | --help-oem | --version
-  tesseract --list-langs [--tessdata-dir PATH]
-  tesseract --print-parameters [options...] [configfile...]
-  tesseract imagename|imagelist|stdin outputbase|stdout [options...] [configfile...]
-
-OCR options:
-  --tessdata-dir PATH   Specify the location of tessdata path.
-  --user-words PATH     Specify the location of user words file.
-  --user-patterns PATH  Specify the location of user patterns file.
-  -l LANG[+LANG]        Specify language(s) used for OCR.
-  -c VAR=VALUE          Set value for config variables.
-                        Multiple -c arguments are allowed.
-  --psm NUM             Specify page segmentation mode.
-  --oem NUM             Specify OCR Engine mode.
-NOTE: These options must occur before any configfile.
-
-Page segmentation modes:
-  0    Orientation and script detection (OSD) only.
-  1    Automatic page segmentation with OSD.
-  2    Automatic page segmentation, but no OSD, or OCR.
-  3    Fully automatic page segmentation, but no OSD. (Default)
-  4    Assume a single column of text of variable sizes.
-  5    Assume a single uniform block of vertically aligned text.
-  6    Assume a single uniform block of text.
-  7    Treat the image as a single text line.
-  8    Treat the image as a single word.
-  9    Treat the image as a single word in a circle.
- 10    Treat the image as a single character.
- 11    Sparse text. Find as much text as possible in no particular order.
- 12    Sparse text with OSD.
- 13    Raw line. Treat the image as a single text line,
-       bypassing hacks that are Tesseract-specific.
-
-OCR Engine modes:
-  0    Legacy engine only.
-  1    Neural nets LSTM engine only.
-  2    Legacy + LSTM engines.
-  3    Default, based on what is available.
-
-Single options:
-  -h, --help            Show minimal help message.
-  --help-extra          Show extra help for advanced users.
-  --help-psm            Show page segmentation modes.
-  --help-oem            Show OCR Engine modes.
-  -v, --version         Show version information.
-  --list-langs          List available languages for tesseract engine.
-  --print-parameters    Print tesseract parameters.
-```
 
 --------------------------------------------
 
-## Using LSTM Engine with Tesseract 4.0alpha
+## Using LSTM Engine with Tesseract 4
 
- Use --oem 1 for LSTM, --oem 0 for Legacy Tesseract
+ Use `--oem 1` for LSTM, `--oem 0` for Legacy Tesseract. Please note that Legacy Tesseract models are only included in traineddata files from [tessdata](https://github.com/tesseract-ocr/tessdata) repo.
 
 `tesseract input.tiff output --oem 1 -l eng`
 