From d16006730807a31ca0727af4afe9f83887c5765b Mon Sep 17 00:00:00 2001
From: Shree Devi Kumar
Date: Thu, 4 Oct 2018 04:17:49 +0000
Subject: [PATCH] Update README about both OCR engines in tesseract 4

---
 README.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/README.md b/README.md
index eb57f12d..4ca775a9 100644
--- a/README.md
+++ b/README.md
@@ -12,6 +12,12 @@
 ## About
 
 This package contains an **OCR engine** - `libtesseract` and a **command line program** - `tesseract`.
+Tesseract 4 adds a new neural net (LSTM) based OCR engine which is focused
+on line recognition, but also still supports the legacy Tesseract OCR engine of
+Tesseract 3 which works by recognizing character patterns. Compatibility with
+Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0).
+It also needs traineddata files which support the legacy engine, for example
+those from the tessdata repository.
 
 The lead developer is Ray Smith. The maintainer is Zdenko Podobny.
 For a list of contributors see [AUTHORS](https://github.com/tesseract-ocr/tesseract/blob/master/AUTHORS)
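
The legacy mode described in the added README lines can be exercised from a script. The following is a minimal sketch, not part of the patch: it assumes a `tesseract` 4 binary is on `PATH` and that the directory passed as `tessdata_dir` holds traineddata files that include legacy models (for example a checkout of the tessdata repository). The helper names `build_legacy_command` and `run_legacy_ocr`, and all paths shown, are hypothetical.

```python
import subprocess


def build_legacy_command(image_path, output_base, tessdata_dir, lang="eng"):
    """Return the argv list for a tesseract run using the legacy engine.

    Hypothetical helper: the paths are placeholders supplied by the caller.
    """
    return [
        "tesseract",
        image_path,                      # input image
        output_base,                     # output file name without extension
        "--tessdata-dir", tessdata_dir,  # must contain legacy-capable traineddata
        "-l", lang,                      # language model to load
        "--oem", "0",                    # OCR engine mode 0: legacy engine only
    ]


def run_legacy_ocr(image_path, output_base, tessdata_dir, lang="eng"):
    """Run tesseract in legacy mode and return the expected text file path.

    Assumes the default text output, which tesseract writes to
    <output_base>.txt. Raises CalledProcessError if tesseract fails,
    for instance when the traineddata lacks legacy models.
    """
    cmd = build_legacy_command(image_path, output_base, tessdata_dir, lang)
    subprocess.run(cmd, check=True)
    return output_base + ".txt"
```

Building the argument list separately from running it keeps the command easy to inspect or log before any OCR is attempted.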