Merge pull request #1941 from Shreeshrii/master

Update man page and readme regarding the two OCR engines in Tesseract 4
2025-01-18 06:30:14 +08:00 · 2018-10-04 07:49:08 +02:00 · b15fbf1d0f
commit b15fbf1d0f
parent 1beeeee215 d160067308
2 changed files with 12 additions and 0 deletions
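
The text added in this commit says that Tesseract 3 compatibility is selected with `--oem 0` and that it needs traineddata files with legacy-engine support. As a rough illustration only, the sketch below assembles such a command line in Python. The helper name `build_tesseract_command`, the file names, and the tessdata path are hypothetical; `--oem`, `-l`, and `--tessdata-dir` are the usual `tesseract` command line options, and the command is only printed, not executed.

```python
import shlex

# OCR engine modes accepted by `tesseract --oem` in Tesseract 4.
OEM_LEGACY = 0  # legacy engine only (Tesseract 3 compatible)
OEM_LSTM = 1    # neural net (LSTM) engine only


def build_tesseract_command(image, output_base, oem=OEM_LSTM,
                            lang="eng", tessdata_dir=None):
    """Return the argv list for one tesseract run (hypothetical helper)."""
    if oem not in range(4):
        raise ValueError("oem must be 0, 1, 2 or 3")
    cmd = ["tesseract", image, output_base, "--oem", str(oem), "-l", lang]
    if tessdata_dir is not None:
        # Assumed location of traineddata files that include legacy models,
        # for example a checkout of the tessdata repository.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


# Build (but do not run) a legacy-engine invocation.
legacy_cmd = build_tesseract_command(
    "page.png", "page_legacy", oem=OEM_LEGACY,
    tessdata_dir="/path/to/tessdata",
)
print(shlex.join(legacy_cmd))
```

Returning an argument list rather than a single string keeps the sketch safe to hand to `subprocess.run` later without shell quoting concerns.
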
--- a/README.md
+++ b/README.md
@@ -12,6 +12,12 @@
 ## About

 This package contains an **OCR engine** - `libtesseract` and a **command line program** - `tesseract`.
+Tesseract 4 adds a new neural net (LSTM) based OCR engine that focuses on
+line recognition, and it still supports the legacy OCR engine of Tesseract 3,
+which works by recognizing character patterns. Compatibility with Tesseract 3
+is enabled by using the Legacy OCR Engine mode (--oem 0). Legacy mode also
+needs traineddata files that support the legacy engine, for example those
+from the tessdata repository.

 The lead developer is Ray Smith. The maintainer is Zdenko Podobny.
 For a list of contributors see [AUTHORS](https://github.com/tesseract-ocr/tesseract/blob/master/AUTHORS)
--- a/doc/tesseract.1.asc
+++ b/doc/tesseract.1.asc
@@ -17,6 +17,12 @@ between 1985 and 1995. In 1995, this engine was among the top 3 evaluated by
 UNLV. It was open-sourced by HP and UNLV in 2005, and has been developed
 at Google since then.

+Tesseract 4 adds a new neural net (LSTM) based OCR engine that focuses on
+line recognition, and it still supports the legacy OCR engine of Tesseract 3,
+which works by recognizing character patterns. Compatibility with Tesseract 3
+is enabled by --oem 0. Legacy mode also needs traineddata files that support
+the legacy engine, for example those from the tessdata repository.
+

 IN/OUT ARGUMENTS
 ----------------