mirror of https://github.com/tesseract-ocr/tesseract.git, synced 2024-11-24 02:59:07 +08:00
Update documentation
- The wiki was moved to GitHub pages.
- The master branch was renamed and is now the main branch.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
parent 15aee49d0a
commit 9b688e6e12
@@ -14,4 +14,4 @@ RUN bundle install --gemfile ~/.travis/travis-build/Gemfile
 ADD . /tesseract
 WORKDIR /tesseract
 
-RUN travis compile | sed -e "s/--branch\\\=\\\'\\\'/--branch=master/g" | bash
+RUN travis compile | sed -e "s/--branch\\\=\\\'\\\'/--branch=4.1/g" | bash
30
README.md
@@ -19,18 +19,18 @@ It also needs traineddata files which support the legacy engine, for example
 those from the tessdata repository.
 
 The lead developer is Ray Smith. The maintainer is Zdenko Podobny.
-For a list of contributors see [AUTHORS](https://github.com/tesseract-ocr/tesseract/blob/master/AUTHORS)
+For a list of contributors see [AUTHORS](https://github.com/tesseract-ocr/tesseract/blob/main/AUTHORS)
 and GitHub's log of [contributors](https://github.com/tesseract-ocr/tesseract/graphs/contributors).
 
 Tesseract has **unicode (UTF-8) support**, and can **recognize more than 100 languages** "out of the box".
 
-Tesseract supports **various output formats**: plain text, hOCR (HTML), PDF, invisible-text-only PDF, TSV. The master branch also has experimental support for ALTO (XML) output.
+Tesseract supports **various output formats**: plain text, ALTO, hOCR (HTML), PDF, invisible-text-only PDF, TSV.
 
-You should note that in many cases, in order to get better OCR results, you'll need to **[improve the quality](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality) of the image** you are giving Tesseract.
+You should note that in many cases, in order to get better OCR results, you'll need to **[improve the quality](https://tesseract-ocr.github.io/tessdoc/ImproveQuality) of the image** you are giving Tesseract.
 
-This project **does not include a GUI application**. If you need one, please see the [3rdParty](https://github.com/tesseract-ocr/tesseract/wiki/User-Projects-%E2%80%93-3rdParty) wiki page.
+This project **does not include a GUI application**. If you need one, please see [3rdParty](https://tesseract-ocr.github.io/tessdoc/User-Projects-%E2%80%93-3rdParty).
 
-Tesseract **can be trained to recognize other languages**. See [Tesseract Training](https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract) for more information.
+Tesseract **can be trained to recognize other languages**. See [Tesseract Training](https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00) for more information.
 
 ## Brief history
 
@@ -39,15 +39,15 @@ at Hewlett-Packard Co, Greeley Colorado between 1985 and 1994, with some
 more changes made in 1996 to port to Windows, and some C++izing in 1998.
 In 2005 Tesseract was open sourced by HP. Since 2006 it is developed by Google.
 
-The latest (LSTM based) stable version is **[4.1.1](https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.1)**, released on December 26, 2019. Latest source code is available from [master branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/master). Open issues can be found in [issue tracker](https://github.com/tesseract-ocr/tesseract/issues), and [Planning wiki](https://github.com/tesseract-ocr/tesseract/wiki/Planning).
+The latest (LSTM based) stable version is **[4.1.2](https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.2)**, released on September 18, 2021. Latest source code is available from [main branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/main). Open issues can be found in [issue tracker](https://github.com/tesseract-ocr/tesseract/issues), and on the [Planning page](https://tesseract-ocr.github.io/tessdoc/Planning).
 
-The latest 3.0x version is **[3.05.02](https://github.com/tesseract-ocr/tesseract/releases/tag/3.05.02)**, released on June 19, 2018. Latest source code for 3.05 is available from [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05). There is no development for this version, but it can be used for special cases (e.g. see [Regression of features from 3.0x](https://github.com/tesseract-ocr/tesseract/wiki/Planning#regression-of-features-from-30x)).
+The latest 3.0x version is **[3.05.02](https://github.com/tesseract-ocr/tesseract/releases/tag/3.05.02)**, released on June 19, 2018. Latest source code for 3.05 is available from [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05). There is no development for this version, but it can be used for special cases.
 
-See **[Release Notes](https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes)** and **[Change Log](https://github.com/tesseract-ocr/tesseract/blob/master/ChangeLog)** for more details of the releases.
+See **[Release Notes](https://tesseract-ocr.github.io/tessdoc/ReleaseNotes)** and **[Change Log](https://github.com/tesseract-ocr/tesseract/blob/4.1/ChangeLog)** for more details of the releases.
 
 ## Installing Tesseract
 
-You can either [Install Tesseract via pre-built binary package](https://github.com/tesseract-ocr/tesseract/wiki) or [build it from source](https://github.com/tesseract-ocr/tesseract/wiki/Compiling).
+You can either [Install Tesseract via pre-built binary package](https://tesseract-ocr.github.io/tessdoc/) or [build it from source](https://tesseract-ocr.github.io/tessdoc/Compiling).
 
 Supported Compilers are:
 
@@ -59,25 +59,25 @@ Other compilers might work, but are not officially supported.
 
 ## Running Tesseract
 
-Basic **[command line usage](https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage)**:
+Basic **[command line usage](https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage)**:
 
 tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]
 
 For more information about the various command line options use `tesseract --help` or `man tesseract`.
 
-Examples can be found in the [wiki](https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#simplest-invocation-to-ocr-an-image).
+Examples can be found in the [documentation](https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage#simplest-invocation-to-ocr-an-image).
 
 ## For developers
 
-Developers can use `libtesseract` [C](https://github.com/tesseract-ocr/tesseract/blob/master/src/api/capi.h) or [C++](https://github.com/tesseract-ocr/tesseract/blob/master/src/api/baseapi.h) API to build their own application. If you need bindings to `libtesseract` for other programming languages, please see the [wrapper](https://github.com/tesseract-ocr/tesseract/wiki/AddOns#tesseract-wrappers) section on AddOns wiki page.
+Developers can use `libtesseract` [C](https://github.com/tesseract-ocr/tesseract/blob/main/src/api/capi.h) or [C++](https://github.com/tesseract-ocr/tesseract/blob/main/src/api/baseapi.h) API to build their own application. If you need bindings to `libtesseract` for other programming languages, please see the [wrapper section](https://tesseract-ocr.github.io/tessdoc/AddOns#tesseract-wrappers) on AddOns documentation page.
 
 Documentation of Tesseract generated from source code by doxygen can be found on [tesseract-ocr.github.io](https://tesseract-ocr.github.io/).
 
 ## Support
 
-Before you submit an issue, please review **[the guidelines for this repository](https://github.com/tesseract-ocr/tesseract/blob/master/CONTRIBUTING.md)**.
+Before you submit an issue, please review **[the guidelines for this repository](https://github.com/tesseract-ocr/tesseract/blob/main/CONTRIBUTING.md)**.
 
-For support, first read the [Wiki](https://github.com/tesseract-ocr/tesseract/wiki), particularly the [FAQ](https://github.com/tesseract-ocr/tesseract/wiki/FAQ) to see if your problem is addressed there. If not, search the [Tesseract user forum](https://groups.google.com/d/forum/tesseract-ocr), the [Tesseract developer forum](https://groups.google.com/d/forum/tesseract-dev) and [past issues](https://github.com/tesseract-ocr/tesseract/issues), and if you still can't find what you need, ask for support in the mailing-lists.
+For support, first read the [documentation](https://tesseract-ocr.github.io/tessdoc/), particularly the [FAQ](https://tesseract-ocr.github.io/tessdoc/FAQ) to see if your problem is addressed there. If not, search the [Tesseract user forum](https://groups.google.com/d/forum/tesseract-ocr), the [Tesseract developer forum](https://groups.google.com/d/forum/tesseract-dev) and [past issues](https://github.com/tesseract-ocr/tesseract/issues), and if you still can't find what you need, ask for support in the mailing-lists.
 
 Mailing-lists:
 * [tesseract-ocr](https://groups.google.com/d/forum/tesseract-ocr) - For tesseract users.
@@ -116,4 +116,4 @@ It is suggested to use leptonica with built-in support for [zlib](https://zlib.n
 
 For the latest online version of the README.md see:
 
-https://github.com/tesseract-ocr/tesseract/blob/master/README.md
+https://github.com/tesseract-ocr/tesseract/blob/main/README.md
@@ -427,10 +427,10 @@ Version 2.00 brought Unicode (UTF-8) support, six languages, and the ability
 to train Tesseract.
 
 Tesseract was included in UNLV's Fourth Annual Test of OCR Accuracy.
-See <https://github.com/tesseract-ocr/docs/blob/master/AT-1995.pdf>.
+See <https://github.com/tesseract-ocr/docs/blob/main/AT-1995.pdf>.
 Since Tesseract 2.00,
 scripts are now included to allow anyone to reproduce some of these tests.
-See <https://github.com/tesseract-ocr/tesseract/wiki/TestingTesseract> for more
+See <https://tesseract-ocr.github.io/tessdoc/TestingTesseract> for more
 details.
 
 Tesseract 3.00 added a number of new languages, including Chinese, Japanese,
@@ -447,16 +447,15 @@ Tesseract 3 is enabled by `--oem 0`. This also needs traineddata files which
 support the legacy engine, for example those from the tessdata repository
 (https://github.com/tesseract-ocr/tessdata).
 
-For further details, see the release notes in the Tesseract wiki
-(<https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes>).
+For further details, see the release notes in the Tesseract documentation
+(<https://tesseract-ocr.github.io/tessdoc/ReleaseNotes>).
 
 
 RESOURCES
 ---------
 Main web site: <https://github.com/tesseract-ocr> +
 User forum: <http://groups.google.com/group/tesseract-ocr> +
-Wiki: <https://github.com/tesseract-ocr/tesseract/wiki> +
-Information on training: <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract>
+Documentation: <https://tesseract-ocr.github.io/tessdoc/> +
 
 SEE ALSO
 --------
@@ -477,7 +476,7 @@ Romano, Ray Smith, Rika Antonova, Robert Moss, Samuel Charron, Sheelagh
 Lloyd, Shobhit Saxena, and Thomas Kielbus.
 
 For a list of contributors see
-<https://github.com/tesseract-ocr/tesseract/blob/master/AUTHORS>.
+<https://github.com/tesseract-ocr/tesseract/blob/4.1/AUTHORS>.
 
 COPYING
 -------