Mirror of https://github.com/tesseract-ocr/tesseract.git (synced 2025-06-07 18:02:40 +08:00)
Update version in README and manpages (#1381)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
parent 8fb68746fb
commit bdf6629722
@@ -33,7 +33,7 @@ In 2005 Tesseract was open sourced by HP. Since 2006 it is developed by Google.

 The latest stable version is **[3.05.01](https://github.com/tesseract-ocr/tesseract/releases/tag/3.05.01)**, released on June 1, 2017. Latest source code for 3.05 is available from [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05).

-Source code for the new **[LSTM based 4.00.00alpha version](https://github.com/tesseract-ocr/tesseract)** is available from the master branch on GitHub. Please note this branch is under active development.
+Source code for the new **[LSTM based 4.0 version](https://github.com/tesseract-ocr/tesseract)** is available from the master branch on GitHub. Please note this branch is under active development.

 See **[Release Notes](https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes)** and **[Change Log](https://github.com/tesseract-ocr/tesseract/blob/master/ChangeLog)** for more details of the releases.

@@ -81,7 +81,7 @@ CAVEATS
 COMPONENTS
 ----------
 The components in a Tesseract lang.traineddata file as of
-Tesseract 4.00alpha are briefly described below; For more information on
+Tesseract 4.0 are briefly described below; For more information on
 many of these files, see
 <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract>
 and
@@ -89,7 +89,7 @@ and

 lang.config::
 (Optional) Language-specific overrides to default config variables.
-For 4.00alpha traineddata files, lang.config provides control parameters which
+For 4.0 traineddata files, lang.config provides control parameters which
 can affect layout analysis, and sub-languages.

 lang.unicharset::
@@ -148,34 +148,34 @@ lang.params-model::
 (Optional - 3.0x legacy tesseract) .

 lang.lstm::
-(Required - 4.00alpha LSTM) Neural net trained recognition model generated by lstmtraining.
+(Required - 4.0 LSTM) Neural net trained recognition model generated by lstmtraining.

 lang.lstm-punc-dawg::
-(Optional - 4.00alpha LSTM) A dawg made from punctuation patterns found around words.
+(Optional - 4.0 LSTM) A dawg made from punctuation patterns found around words.
 The "word" part is replaced by a single space. Uses lang.lstm-unicharset.

 lang.lstm-word-dawg::
-(Optional - 4.00alpha LSTM) A dawg made from dictionary words from the language.
+(Optional - 4.0 LSTM) A dawg made from dictionary words from the language.
 Uses lang.lstm-unicharset.

 lang.lstm-number-dawg::
-(Optional - 4.00alpha LSTM) A dawg made from tokens which originally contained digits.
+(Optional - 4.0 LSTM) A dawg made from tokens which originally contained digits.
 Each digit is replaced by a space character. Uses lang.lstm-unicharset.

 lang.lstm-unicharset::
-(Required - 4.00alpha LSTM) The unicode character set that Tesseract recognizes, with properties.
+(Required - 4.0 LSTM) The unicode character set that Tesseract recognizes, with properties.
 Same unicharset must be used to train the LSTM and build the lstm-*-dawgs files.

 lang.lstm-recoder::
-(Required - 4.00alpha LSTM) Unicharcompress, aka the recoder, which maps the unicharset
+(Required - 4.0 LSTM) Unicharcompress, aka the recoder, which maps the unicharset
 further to the codes actually used by the neural network recognizer. This is created as
 part of the starter traineddata by combine_lang_model.

 lang.version::
 (Optional) Version string for the traineddata file.
-First appeared in version 4.00alpha of Tesseract.
+First appeared in version 4.0 of Tesseract.
 Old version of traineddata files will report Version string:Pre-4.0.0.
-4.00alpha version of traineddata files may include the network spec
+4.0 version of traineddata files may include the network spec
 used for LSTM training as part of version string.

 HISTORY
@@ -115,7 +115,7 @@ SINGLE OPTIONS
 LANGUAGES
 ---------

-The currently available traineddata files for tesseract 4.00
+The currently available traineddata files for tesseract 4.0
 for the following languages are in
 (in https://github.com/tesseract-ocr/tessdata_fast):

@@ -244,7 +244,7 @@ argument '-l foo'.
 SCRIPTS
 -------

-The traineddata files for the following scripts for tesseract 4.00
+The traineddata files for the following scripts for tesseract 4.0
 are also in https://github.com/tesseract-ocr/tessdata_fast.

 In most cases, each of these contains all the languages that use that script PLUS English.