Updated Command Line Usage (markdown)

Shreeshrii 2016-03-18 15:43:21 +05:30
parent 9d31f9c41b
commit 59555e6b1a

@@ -49,31 +49,139 @@ osd.traineddata, for Orientation and Segmentation and eng.traineddata and other
The following command gives the same result as above if the eng.traineddata and osd.traineddata files are in the /usr/share/tessdata directory.
tesseract --tessdata-dir /usr/share imagename outputbase -l eng -psm 3
____________________________________
## Using One Language
tesseract --tessdata-dir /usr/share ./testing/phototest.tif ./testing/phototest -l eng
## Using Multiple Languages
tesseract --tessdata-dir /usr/share ./testing/eurotext.png ./testing/eurotext-engdeu -l eng+deu
The following examples use this image, which contains text in multiple languages.
![eurotext.png](http://dev.blog.fairway.ne.jp/wp-content/uploads/2014/04/eurotext.png)
## Using One Language
tesseract --tessdata-dir ./ ./testing/eurotext.png ./testing/eurotext-eng -l eng
Output
The (quick) [brown] {fox} jumps!
Over the $43,456.78 <lazy> #90 dog
& duck/goose, as 12.5% of E-mail
from aspammer@website.com is spam.
Der ,,schnelle” braune Fuchs springt
fiber den faulen Hund. Le renard brun
«rapide» saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra i] cane pigro. El zorro
marrén répido salta sobre el perro
perezoso. A raposa marrom répida
salta sobre 0 C50 preguieoso.
## Using Multiple Languages
tesseract --tessdata-dir ./ ./testing/eurotext.png ./testing/eurotext-engdeu -l eng+deu
Output
The (quick) [brown] {fox} jumps!
Over the $43,456.78 <lazy> #90 dog
& duck/goose, as 12.5% of E-mail
from aspammer@website.com is spam.
Der „schnelle” braune Fuchs springt
über den faulen Hund. Le renard brun
«rapide» saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra il cane pigro. El zorro
marrön räpido salta sobre el perro
perezoso. A raposa marrom räpida
salta sobre o cäo preguieoso.
The output can differ depending on the order of the languages, so -l eng+deu can give a different result than -l deu+eng.
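Since the order of languages can change the result, it may help to run both orders on the same image and compare the text files. The following is a minimal sketch, assuming `tesseract` is on the PATH and that the tessdata directory sits under the current directory, as in the commands above. The function names `outbase_for` and `compare_lang_order` are hypothetical helpers for this illustration and are not part of Tesseract.

```shell
#!/bin/sh
# Hypothetical helper: build an output base such as ./testing/eurotext-engdeu
# from a base name and a language spec such as eng+deu.
outbase_for() {
    printf '%s-%s\n' "$1" "$(printf '%s' "$2" | tr -d '+')"
}

# Hypothetical helper: OCR one image with both language orders and show
# the lines that differ. Tesseract appends .txt to the output base.
compare_lang_order() {
    image=$1
    base=$2
    first=$3
    second=$4
    out_a=$(outbase_for "$base" "$first+$second")
    out_b=$(outbase_for "$base" "$second+$first")
    tesseract --tessdata-dir ./ "$image" "$out_a" -l "$first+$second"
    tesseract --tessdata-dir ./ "$image" "$out_b" -l "$second+$first"
    # diff exits with status 1 when the two files differ.
    diff "$out_a.txt" "$out_b.txt"
}

# Example call (not executed here):
#   compare_lang_order ./testing/eurotext.png ./testing/eurotext eng deu
```

An empty diff would mean that, for this particular image, the order made no difference.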
## Using different Page Segmentation Modes
tesseract --tessdata-dir /usr/share testing/san002.tif testing/san002-psm3 -l san
The following examples use this image, which contains Sanskrit text in Devanagari script.
tesseract --tessdata-dir /usr/share testing/san002.tif testing/san002-psm6 -l san -psm 6
![san002.png](https://cloud.githubusercontent.com/assets/82178/13678011/81953684-e6ba-11e5-91e8-5c40518e94a6.png)
## Searchable pdf output
tesseract --tessdata-dir /usr/share testing/san002.png testing/san002-psm6 -l san -psm 6
Output
विर्व्य 16
ज्यालत्रुखीसह्स्रनामक्तोव्रम्- नामाकळिट्. 191
दुर्गासहस्रनामस्तीत्रम्- १ नामांक्ळिन्नू ॰213
द्रुर्गासहस्रनत्मस्तीन्रम्- २ नामावळिऽ 238
द्दुगसिद्द्स्रनत्मक्तोत्रम्दकाराद्दि(३) नामाव'ळिऽ 263
ट्टुगसिहस्रनामक्तोत्रम्- ४ नामावळिइं 300
पार्वतीं ह्यो) सहस्रनामातोत्रम्- नामावळिऽ’ 329
द्दुर्गानवाक्षरीन्निशतींनत्माव'क्ति 355
द्बुर्गाष्टोत्तरङ्प्तनत्मरतोव्रम्- नामावक्ति 360
र्व्यत्मामस्वोत्रम्- नामाक्ळिऽ 363
अन्नपूण्स्सिहस्रनत्मस्तीत्रम्- नामावक्ति 365
अन्नघूर्गाष्टोत्तस्यातनामस्तीन्रम्- नामावक्ति 394
क्रुलकुर्व्यसहस्रनत्मक्तोत्रम्- कवचम्… नामावळिथ् 397-
कुमारींसहृस्रनामक्तोन्नम्- नामावळिय् 432
गङ्ग’म्यासद्वृस्रनप्मक्तोव्रम्- नाम।वक्ति` 457
गङ्ग’म्याष्टोत्तराप्तनामप्तोत्रम्- नामावळिऽ 488
गङ्गादातनप्तास्तोत्रम्- नामावक्ति 491
यमुनासहस्रनामरतोव्रम्- नम्पावळिय् 493
'शिवगङ्गासद्दृस्रनत्माव'ळि 517
गम्पत्रीसह्स्रनत्मक्तोत्रम्- नाम।व'ळिऽ (१) 531
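To see how the page segmentation mode affects a result like the one above, the same image can be processed under several modes and the outputs compared. The following is a rough sketch that only prints the commands. The function name `psm_commands` and the `-psmN` output suffix are assumptions for illustration, mirroring the file names used in this section; the `-psm` flag is written as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: print one tesseract command per page segmentation
# mode, so the command lines can be reviewed before anything is run.
# Arguments: image, output base, language, then one or more mode numbers.
psm_commands() {
    image=$1
    base=$2
    lang=$3
    shift 3
    for mode in "$@"; do
        printf 'tesseract --tessdata-dir /usr/share %s %s-psm%s -l %s -psm %s\n' \
            "$image" "$base" "$mode" "$lang" "$mode"
    done
}

# Example call (prints two commands, for modes 3 and 6):
#   psm_commands testing/san002.png testing/san002 san 3 6
```

Printing rather than executing keeps the sketch safe to try. The printed lines can be pasted into a shell as long as the paths contain no spaces or special characters.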
## Searchable pdf output
tesseract --tessdata-dir ./ ./testing/eurotext.png ./testing/eurotext-eng -l eng pdf
## HOCR output
tesseract --tessdata-dir ./ ./testing/eurotext.png ./testing/eurotext-eng -l eng hocr
Output
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name='ocr-system' content='tesseract 3.05.00dev' />
<meta name='ocr-capabilities' content='ocr_page ocr_carea ocr_par ocr_line ocrx_word'/>
</head>
<body>
<div class='ocr_page' id='page_1' title='image "./testing/eurotext.png"; bbox 0 0 1024 800; ppageno 0'>
<div class='ocr_carea' id='block_1_1' title="bbox 98 66 918 661">
<p class='ocr_par' id='par_1_1' lang='eng' title="bbox 98 66 918 661">
<span class='ocr_line' id='line_1_1' title="bbox 105 66 823 113; baseline 0.015 -18; x_size 39; x_descenders 7; x_ascenders 9"><span class='ocrx_word' id='word_1_1' title='bbox 105 66 178 97; x_wconf 90'>The</span> <span class='ocrx_word' id='word_1_2' title='bbox 205 67 347 106; x_wconf 87'><strong>(quick)</strong></span> <span class='ocrx_word' id='word_1_3' title='bbox 376 69 528 109; x_wconf 89'>[brown]</span> <span class='ocrx_word' id='word_1_4' title='bbox 559 71 663 110; x_wconf 89'>{fox}</span> <span class='ocrx_word' id='word_1_5' title='bbox 687 73 823 113; x_wconf 89'>jumps!</span>
</span>
<span class='ocr_line' id='line_1_2' title="bbox 104 115 887 165; baseline 0.013 -19; x_size 42; x_descenders 9; x_ascenders 11"><span class='ocrx_word' id='word_1_6' title='bbox 104 115 199 147; x_wconf 91'>Over</span> <span class='ocrx_word' id='word_1_7' title='bbox 224 117 283 148; x_wconf 89'>the</span> <span class='ocrx_word' id='word_1_8' title='bbox 310 117 533 155; x_wconf 88'>$43,456.78</span> <span class='ocrx_word' id='word_1_9' title='bbox 561 121 696 162; x_wconf 92'>&lt;lazy&gt;</span> <span class='ocrx_word' id='word_1_10' title='bbox 722 123 791 154; x_wconf 92'>#90</span> <span class='ocrx_word' id='word_1_11' title='bbox 818 125 887 165; x_wconf 89'>dog</span>
</span>
<span class='ocr_line' id='line_1_3' title="bbox 103 165 835 206; baseline 0.014 -10; x_size 40; x_descenders 8; x_ascenders 9"><span class='ocrx_word' id='word_1_12' title='bbox 103 165 134 196; x_wconf 91'>&amp;</span> <span class='ocrx_word' id='word_1_13' title='bbox 160 166 396 206; x_wconf 88'>duck/goose,</span> <span class='ocrx_word' id='word_1_14' title='bbox 424 178 463 201; x_wconf 92'>as</span> <span class='ocrx_word' id='word_1_15' title='bbox 493 171 614 203; x_wconf 91'>12.5%</span> <span class='ocrx_word' id='word_1_16' title='bbox 638 172 680 204; x_wconf 89'>of</span> <span class='ocrx_word' id='word_1_17' title='bbox 700 174 835 206; x_wconf 91'>E-mail</span>
</span>
<span class='ocr_line' id='line_1_4' title="bbox 103 215 911 264; baseline 0.015 -19; x_size 39; x_descenders 8; x_ascenders 9"><span class='ocrx_word' id='word_1_18' title='bbox 103 215 194 247; x_wconf 89'>from</span> <span class='ocrx_word' id='word_1_19' title='bbox 220 219 716 260; x_wconf 87'>aspammer@website.com</span> <span class='ocrx_word' id='word_1_20' title='bbox 742 223 773 255; x_wconf 93'>is</span> <span class='ocrx_word' id='word_1_21' title='bbox 799 233 911 264; x_wconf 88'>spam.</span>
</span>
<span class='ocr_line' id='line_1_5' title="bbox 102 266 877 314; baseline 0.013 -18; x_size 39; x_descenders 8; x_ascenders 9"><span class='ocrx_word' id='word_1_22' title='bbox 102 266 173 297; x_wconf 89'>Der</span> <span class='ocrx_word' id='word_1_23' title='bbox 198 267 406 302; x_wconf 78'>,,schnelle”</span> <span class='ocrx_word' id='word_1_24' title='bbox 433 269 568 302; x_wconf 91'>braune</span> <span class='ocrx_word' id='word_1_25' title='bbox 594 272 709 304; x_wconf 91'>Fuchs</span> <span class='ocrx_word' id='word_1_26' title='bbox 735 274 877 314; x_wconf 83'>springt</span>
</span>
<span class='ocr_line' id='line_1_6' title="bbox 102 315 918 357; baseline 0.013 -11; x_size 39.311646; x_descenders 8.3116436; x_ascenders 9"><span class='ocrx_word' id='word_1_27' title='bbox 102 315 187 347; x_wconf 75'>fiber</span> <span class='ocrx_word' id='word_1_28' title='bbox 212 317 280 348; x_wconf 91'>den</span> <span class='ocrx_word' id='word_1_29' title='bbox 306 318 430 350; x_wconf 86'>faulen</span> <span class='ocrx_word' id='word_1_30' title='bbox 456 320 572 352; x_wconf 92'>Hund.</span> <span class='ocrx_word' id='word_1_31' title='bbox 601 322 648 354; x_wconf 87'>Le</span> <span class='ocrx_word' id='word_1_32' title='bbox 674 324 803 356; x_wconf 87'>renard</span> <span class='ocrx_word' id='word_1_33' title='bbox 827 325 918 357; x_wconf 90'>brun</span>
</span>
<span class='ocr_line' id='line_1_7' title="bbox 101 366 833 409; baseline 0.016 -15; x_size 39; x_descenders 8; x_ascenders 9"><span class='ocrx_word' id='word_1_34' title='bbox 101 366 274 405; x_wconf 88'>«rapide»</span> <span class='ocrx_word' id='word_1_35' title='bbox 302 373 403 400; x_wconf 88'>saute</span> <span class='ocrx_word' id='word_1_36' title='bbox 428 371 641 409; x_wconf 87'>par-dessus</span> <span class='ocrx_word' id='word_1_37' title='bbox 667 372 700 404; x_wconf 91'>le</span> <span class='ocrx_word' id='word_1_38' title='bbox 725 374 833 406; x_wconf 90'>chien</span>
</span>
<span class='ocr_line' id='line_1_8' title="bbox 100 419 859 464; baseline 0.013 -18; x_size 39; x_descenders 8; x_ascenders 8"><span class='ocrx_word' id='word_1_39' title='bbox 100 424 308 454; x_wconf 89'>paresseux.</span> <span class='ocrx_word' id='word_1_40' title='bbox 337 419 384 450; x_wconf 90'>La</span> <span class='ocrx_word' id='word_1_41' title='bbox 409 420 516 459; x_wconf 90'>volpe</span> <span class='ocrx_word' id='word_1_42' title='bbox 543 430 707 455; x_wconf 89'>marrone</span> <span class='ocrx_word' id='word_1_43' title='bbox 733 424 859 464; x_wconf 85'>rapida</span>
</span>
<span class='ocr_line' id='line_1_9' title="bbox 100 466 834 511; baseline 0.014 -15; x_size 40; x_descenders 9; x_ascenders 9"><span class='ocrx_word' id='word_1_44' title='bbox 100 466 192 497; x_wconf 90'>salta</span> <span class='ocrx_word' id='word_1_45' title='bbox 219 475 324 507; x_wconf 90'>sopra</span> <span class='ocrx_word' id='word_1_46' title='bbox 351 468 376 499; x_wconf 90'>i]</span> <span class='ocrx_word' id='word_1_47' title='bbox 403 478 491 501; x_wconf 90'>cane</span> <span class='ocrx_word' id='word_1_48' title='bbox 517 471 633 511; x_wconf 89'>pigro.</span> <span class='ocrx_word' id='word_1_49' title='bbox 662 473 703 504; x_wconf 96'>El</span> <span class='ocrx_word' id='word_1_50' title='bbox 729 482 834 506; x_wconf 88'>zorro</span>
</span>
<span class='ocr_line' id='line_1_10' title="bbox 99 516 833 563; baseline 0.014 -17; x_size 39; x_descenders 8; x_ascenders 9"><span class='ocrx_word' id='word_1_51' title='bbox 99 516 242 548; x_wconf 78'>marrén</span> <span class='ocrx_word' id='word_1_52' title='bbox 268 517 395 557; x_wconf 77'>répido</span> <span class='ocrx_word' id='word_1_53' title='bbox 421 520 513 552; x_wconf 90'>salta</span> <span class='ocrx_word' id='word_1_54' title='bbox 540 521 644 554; x_wconf 93'>sobre</span> <span class='ocrx_word' id='word_1_55' title='bbox 669 523 702 554; x_wconf 90'>el</span> <span class='ocrx_word' id='word_1_56' title='bbox 728 532 833 563; x_wconf 87'>perro</span>
</span>
<span class='ocr_line' id='line_1_11' title="bbox 98 568 829 613; baseline 0.014 -17; x_size 40; x_descenders 8; x_ascenders 10"><span class='ocrx_word' id='word_1_57' title='bbox 98 574 284 604; x_wconf 89'>perezoso.</span> <span class='ocrx_word' id='word_1_58' title='bbox 313 568 342 598; x_wconf 92'>A</span> <span class='ocrx_word' id='word_1_59' title='bbox 369 578 497 609; x_wconf 91'>raposa</span> <span class='ocrx_word' id='word_1_60' title='bbox 523 579 677 604; x_wconf 89'>marrom</span> <span class='ocrx_word' id='word_1_61' title='bbox 703 573 829 613; x_wconf 75'>répida</span>
</span>
<span class='ocr_line' id='line_1_12' title="bbox 98 616 710 661; baseline 0.013 -15; x_size 41; x_descenders 9; x_ascenders 9"><span class='ocrx_word' id='word_1_62' title='bbox 98 616 190 647; x_wconf 86'>salta</span> <span class='ocrx_word' id='word_1_63' title='bbox 217 617 320 649; x_wconf 90'>sobre</span> <span class='ocrx_word' id='word_1_64' title='bbox 346 627 366 650; x_wconf 89'><strong>0</strong></span> <span class='ocrx_word' id='word_1_65' title='bbox 391 621 456 651; x_wconf 72'>C50</span> <span class='ocrx_word' id='word_1_66' title='bbox 481 621 710 661; x_wconf 74'>preguieoso.</span>
</span>
</p>
</div>
</div>
</body>
</html>
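Each `ocrx_word` span in the hOCR above carries an `x_wconf` confidence value, and the misread words (`fiber`, `C50`, `répida`) tend to have lower values. The following is a minimal sketch for listing words below a chosen confidence, assuming the hOCR layout shown above (single-quoted `title` attribute ending in `x_wconf N`). The function name `low_conf_words` is hypothetical, and the text-based matching is a shortcut rather than a real XML parse.

```shell
#!/bin/sh
# Hypothetical helper: list "confidence word" pairs whose x_wconf value is
# below a threshold. Arguments: threshold, path to an hOCR file.
low_conf_words() {
    threshold=$1
    hocr_file=$2
    # Grab "x_wconf N'>" plus the word text, allowing an optional <strong>.
    grep -o "x_wconf [0-9]*'>\(<strong>\)*[^<]*" "$hocr_file" |
        # Reduce each match to "N word".
        sed -e "s/^x_wconf //" -e "s/'>/ /" -e "s/<strong>//" |
        # Keep only the words below the threshold.
        awk -v t="$threshold" '$1 < t { print $1, $2 }'
}

# Example call (the output file name depends on the Tesseract version):
#   low_conf_words 80 ./testing/eurotext-eng.hocr
```

Entities such as `&lt;` and `&amp;` are printed as they appear in the file; a proper XML tool would be a better fit if decoded text is needed.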
## TSV output (only available in 3.05-dev in master branch)