Update language list based on tessdata_fast; fix #1343

Zdenko Podobný 2018-02-23 11:19:18 +01:00
parent 6f80c35b3f
commit 035325dfd0

@@ -115,8 +115,9 @@ SINGLE OPTIONS
 LANGUAGES
 ---------
-There are currently language packs available for the following languages
-(in https://github.com/tesseract-ocr/tessdata):
+The currently available traineddata files for tesseract 4.00
+for the following languages are in
+(in https://github.com/tesseract-ocr/tessdata_fast):
 *afr* (Afrikaans)
 *amh* (Amharic)
@@ -176,26 +177,33 @@ There are currently language packs available for the following languages
 *khm* (Central Khmer)
 *kir* (Kirghiz; Kyrgyz)
 *kor* (Korean)
+*kor_vert* (Korean (vertical))
 *kur* (Kurdish)
+*kur_ara* (Kurdish (Arabic))
 *lao* (Lao)
 *lat* (Latin)
 *lav* (Latvian)
 *lit* (Lithuanian)
+*ltz* (Luxembourgish)
 *mal* (Malayalam)
 *mar* (Marathi)
 *mkd* (Macedonian)
 *mlt* (Maltese)
+*mon* (Mongolian)
+*mri* (Maori)
 *msa* (Malay)
 *mya* (Burmese)
 *nep* (Nepali)
 *nld* (Dutch; Flemish)
 *nor* (Norwegian)
+*oci* (Occitan (post 1500))
 *ori* (Oriya)
 *osd* (Orientation and script detection module)
 *pan* (Panjabi; Punjabi)
 *pol* (Polish)
 *por* (Portuguese)
 *pus* (Pushto; Pashto)
+*que* (Quechua)
 *ron* (Romanian; Moldavian; Moldovan)
 *rus* (Russian)
 *san* (Sanskrit)
@@ -203,20 +211,24 @@ There are currently language packs available for the following languages
 *slk* (Slovak)
 *slk_frak* (Slovak - Fraktur)
 *slv* (Slovenian)
+*snd* (Sindhi)
 *spa* (Spanish; Castilian)
 *spa_old* (Spanish; Castilian - Old)
 *sqi* (Albanian)
 *srp* (Serbian)
 *srp_latn* (Serbian - Latin)
+*sun* (Sundanese)
 *swa* (Swahili)
 *swe* (Swedish)
 *syr* (Syriac)
 *tam* (Tamil)
+*tat* (Tatar)
 *tel* (Telugu)
 *tgk* (Tajik)
 *tgl* (Tagalog)
 *tha* (Thai)
 *tir* (Tigrinya)
+*ton* (Tonga)
 *tur* (Turkish)
 *uig* (Uighur; Uyghur)
 *ukr* (Ukrainian)
@@ -225,6 +237,7 @@ There are currently language packs available for the following languages
 *uzb_cyrl* (Uzbek - Cyrilic)
 *vie* (Vietnamese)
 *yid* (Yiddish)
+*yor* (Yoruba)
 To use a non-standard language pack named *foo.traineddata*, set the
 *TESSDATA_PREFIX* environment variable so the file can be found at