From 757bcd1bfd8266149e39d10b27b3524b3fc09f5c Mon Sep 17 00:00:00 2001
From: Shreeshrii
Date: Thu, 10 Mar 2016 12:50:09 +0530
Subject: [PATCH] Updated Command Line Usage (markdown)

---
 Command-Line-Usage.md | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/Command-Line-Usage.md b/Command-Line-Usage.md
index e4c2583..4c5f92a 100644
--- a/Command-Line-Usage.md
+++ b/Command-Line-Usage.md
@@ -42,7 +42,9 @@ tesseract imagename outputbase
 
 
 
-This uses English as the default language and 3 as the Page Segmentation Mode. The default output format is text. osd.traineddata, for Orientation and Segmentation and eng.traineddata and other language data files for English should be in the tessdata directory. TESSDATA_PREFIX environment variable should be set to the parent directory of your "tessdata" directory.
+This uses **English** as the default language and 3 as the Page Segmentation Mode. The default output format is **text**.
+
+osd.traineddata (for Orientation and Script Detection), eng.traineddata, and any other English language data files should be in the "tessdata" directory. The TESSDATA_PREFIX environment variable should be set to the parent directory of the "tessdata" directory.
 The following command would give the same result as above, if eng.traineddata and osd.traineddata files are in /usr/share/tessdata directory.
 
 
@@ -50,14 +52,22 @@ The following command would give the same result as above, if eng.traineddata an
 
 ## Using One Language
 
-    tesseract --tessdata-dir /usr/share ./testing/phototest.tif ./testing/phototest -l eng -psm 3
+    tesseract --tessdata-dir /usr/share ./testing/phototest.tif ./testing/phototest -l eng
 
-![phototest.tif](https://github.com/tesseract-ocr/tesseract/blob/master/testing/phototest.tif?raw=true)
+    ![phototest.tif](https://github.com/tesseract-ocr/tesseract/blob/master/testing/phototest.tif?raw=true)
 
 ## Using Multiple Languages
 
+    tesseract --tessdata-dir /usr/share ./testing/eurotext.tif ./testing/eurotext-engdeu -l eng+deu
+
+The output can differ depending on the order of the languages, so -l eng+deu can give a different result from -l deu+eng.
+
 ## Using different Page Segmentation Modes
 
+    tesseract --tessdata-dir /usr/share testing/san002.tif testing/san002-psm3 -l san
+
+    tesseract --tessdata-dir /usr/share testing/san002.tif testing/san002-psm6 -l san -psm 6
+
 ## Searchable pdf ouptput
 
 ## HOCR output
@@ -67,4 +77,3 @@ The following command would give the same result as above, if eng.traineddata an
 
 
 
-
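A quick way to confirm the tessdata layout this patch describes is to check for the expected files before running tesseract. The sketch below assumes the Tesseract 3.x convention stated in the patch: both TESSDATA_PREFIX and the --tessdata-dir argument name the *parent* of the "tessdata" directory. `check_tessdata` is a hypothetical helper for illustration, not part of Tesseract.

```shell
# Hypothetical helper (not part of Tesseract): report any missing traineddata
# files for an -l language spec such as "eng" or "eng+deu".
# Assumes the 3.x layout: PARENT/tessdata/<lang>.traineddata, where PARENT is
# the value of TESSDATA_PREFIX or the --tessdata-dir argument.
check_tessdata() {
    parent=$1          # e.g. /usr/share or "$TESSDATA_PREFIX"
    langs=$2           # e.g. eng or eng+deu
    missing=0
    # osd.traineddata is listed in the patch alongside the language files;
    # split the "+"-joined spec into individual language codes.
    for lang in osd $(printf '%s\n' "$langs" | tr '+' ' '); do
        f="$parent/tessdata/$lang.traineddata"
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    # 0 when every file is present, 1 otherwise
    return "$missing"
}
```

For example, `check_tessdata /usr/share eng+deu` checks for osd, eng and deu data under /usr/share/tessdata before the multiple-language command is run.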
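To see on your own images whether the language order matters, as the multiple-languages note says it can, one option is to run both orders and compare the text outputs. `compare_lang_order` is a hypothetical helper. It uses only the --tessdata-dir and -l options shown in the patch, and it assumes tesseract writes OUTPUTBASE.txt for the default text output. Like `cmp`, it returns 0 when the outputs match, 1 when they differ, and 2 if a tesseract run fails.

```shell
# Hypothetical helper: run tesseract with -l A+B and -l B+A on one image and
# compare the two .txt outputs byte for byte.
# Arguments: IMAGE TESSDATA_PARENT LANG_A LANG_B OUTPUT_PREFIX
compare_lang_order() {
    img=$1; tessdir=$2; a=$3; b=$4; base=$5
    # One run per order; output bases end in "-AB" and "-BA" (e.g. -engdeu)
    tesseract --tessdata-dir "$tessdir" "$img" "$base-$a$b" -l "$a+$b" || return 2
    tesseract --tessdata-dir "$tessdir" "$img" "$base-$b$a" -l "$b+$a" || return 2
    # Default text output is written to OUTPUTBASE.txt
    if cmp -s "$base-$a$b.txt" "$base-$b$a.txt"; then
        echo "same output for $a+$b and $b+$a"
    else
        echo "different output for $a+$b and $b+$a"
        return 1
    fi
}
```

For example, `compare_lang_order ./testing/eurotext.tif /usr/share eng deu ./testing/eurotext` writes ./testing/eurotext-engdeu.txt (the same name the patch uses) and ./testing/eurotext-deueng.txt, then reports whether they differ.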