Updated Data Files in tessdata_fast (markdown)

Shreeshrii 2019-08-07 14:03:40 +05:30
parent 13c011b7e8
commit 2c1a8015e9

@ -1,6 +1,6 @@
## Traineddata Files for Version 4.00 +
We have three sets of .traineddata files for `tesseract` versions 4.00 and above on GitHub in three separate repositories.
We have three sets of official .traineddata files trained at Google, for `tesseract` versions 4.00 and above, in three separate repositories.
* [tessdata_fast](https://github.com/tesseract-ocr/tessdata_fast) (Sep 2017) best "value for money" in speed vs accuracy, Integer models
* [tessdata_best](https://github.com/tesseract-ocr/tessdata_best) (Sep 2017) best results on Google's eval data, slower, Float models. These are the only models that can be used as the base for fine-tune training.
@ -8,6 +8,11 @@ We have three sets of .traineddata files for `tesseract` versions 4.00 and above
When using the traineddata files from the **`tessdata_best`** and **`tessdata_fast`** repositories, only the new LSTM-based OCR engine is supported. The legacy tesseract engine is NOT supported with these files, so Tesseract's oem modes '0' and '2' won't work with them.
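
As a minimal sketch of the engine-mode restriction above, the following Python helper builds a `tesseract` command line that selects the LSTM engine (`--oem 1`) and refuses modes 0 and 2. The function name `build_tesseract_command`, the file names, and the `./tessdata_fast` directory are hypothetical and chosen only for illustration; `--tessdata-dir`, `-l` and `--oem` are standard `tesseract` command-line options.

```python
LSTM_ONLY_OEM = 1  # --oem 1 selects the LSTM engine only


def build_tesseract_command(image_path, output_base, tessdata_dir,
                            lang="eng", oem=LSTM_ONLY_OEM):
    """Build a tesseract CLI argument list for LSTM-only traineddata files.

    Hypothetical helper: it only assembles arguments and does not run tesseract.
    """
    # oem 0 (legacy only) and oem 2 (legacy + LSTM) need the legacy engine,
    # which is not supported with tessdata_fast and tessdata_best files.
    if oem in (0, 2):
        raise ValueError(
            "tessdata_fast/tessdata_best support only the LSTM engine; use --oem 1"
        )
    return [
        "tesseract", image_path, output_base,
        "--tessdata-dir", tessdata_dir,
        "-l", lang,
        "--oem", str(oem),
    ]


if __name__ == "__main__":
    # Example paths are placeholders, not files shipped with tesseract.
    print(" ".join(build_tesseract_command("page.png", "page", "./tessdata_fast")))
```

The resulting list can be passed to `subprocess.run` on a machine where `tesseract` and the chosen traineddata files are installed.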
Community contributed traineddata files can be found at:
* [tessdata_contrib](https://github.com/tesseract-ocr/tessdata_contrib) repo
* [Wiki page with links to external repos](https://github.com/tesseract-ocr/tesseract/wiki/Data-Files-Contributions)
## Information specific to tessdata_fast
First, `fast` is trained with a spec that produces a smaller net than `best`. Because the model is smaller, prediction is faster.