Update language list based on tessdata_fast; fix #1343

2024-11-24 02:59:07 +08:00 · 2018-02-23 11:19:18 +01:00 · 035325dfd0
commit 035325dfd0
parent 6f80c35b3f
1 changed file with 15 additions and 2 deletions
--- a/doc/tesseract.1.asc
+++ b/doc/tesseract.1.asc
@@ -115,8 +115,9 @@ SINGLE OPTIONS
 LANGUAGES
 ---------

-There are currently language packs available for the following languages
-(in https://github.com/tesseract-ocr/tessdata):
+The traineddata files currently available for tesseract 4.00
+cover the following languages
+(in https://github.com/tesseract-ocr/tessdata_fast):

 *afr* (Afrikaans)
 *amh* (Amharic)
@@ -176,26 +177,33 @@ There are currently language packs available for the following languages
 *khm* (Central Khmer)
 *kir* (Kirghiz; Kyrgyz)
 *kor* (Korean)
+*kor_vert* (Korean (vertical))
 *kur* (Kurdish)
+*kur_ara* (Kurdish (Arabic))
 *lao* (Lao)
 *lat* (Latin)
 *lav* (Latvian)
 *lit* (Lithuanian)
+*ltz* (Luxembourgish)
 *mal* (Malayalam)
 *mar* (Marathi)
 *mkd* (Macedonian)
 *mlt* (Maltese)
+*mon* (Mongolian)
+*mri* (Maori)
 *msa* (Malay)
 *mya* (Burmese)
 *nep* (Nepali)
 *nld* (Dutch; Flemish)
 *nor* (Norwegian)
+*oci* (Occitan (post 1500))
 *ori* (Oriya)
 *osd* (Orientation and script detection module)
 *pan* (Panjabi; Punjabi)
 *pol* (Polish)
 *por* (Portuguese)
 *pus* (Pushto; Pashto)
+*que* (Quechua)
 *ron* (Romanian; Moldavian; Moldovan)
 *rus* (Russian)
 *san* (Sanskrit)
@@ -203,20 +211,24 @@ There are currently language packs available for the following languages
 *slk* (Slovak)
 *slk_frak* (Slovak - Fraktur)
 *slv* (Slovenian)
+*snd* (Sindhi)
 *spa* (Spanish; Castilian)
 *spa_old* (Spanish; Castilian - Old)
 *sqi* (Albanian)
 *srp* (Serbian)
 *srp_latn* (Serbian - Latin)
+*sun* (Sundanese)
 *swa* (Swahili)
 *swe* (Swedish)
 *syr* (Syriac)
 *tam* (Tamil)
+*tat* (Tatar)
 *tel* (Telugu)
 *tgk* (Tajik)
 *tgl* (Tagalog)
 *tha* (Thai)
 *tir* (Tigrinya)
+*ton* (Tonga)
 *tur* (Turkish)
 *uig* (Uighur; Uyghur)
 *ukr* (Ukrainian)
@@ -225,6 +237,7 @@ There are currently language packs available for the following languages
 *uzb_cyrl* (Uzbek - Cyrillic)
 *vie* (Vietnamese)
 *yid* (Yiddish)
+*yor* (Yoruba)

 To use a non-standard language pack named *foo.traineddata*, set the
 *TESSDATA_PREFIX* environment variable so the file can be found at