Update documentation

- The wiki was moved to GitHub pages.
- The master branch was renamed and is now the main branch.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Stefan Weil 2021-09-18 12:25:43 +02:00
parent 15aee49d0a
commit 9b688e6e12
3 changed files with 22 additions and 23 deletions
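Most of the link changes in this diff follow two mechanical patterns: wiki URLs move under `https://tesseract-ocr.github.io/tessdoc/`, and `blob/master` or `tree/master` paths switch to another branch name. A minimal sketch of that rewrite is shown here. The helper name `rewrite_links` and its `branch` parameter are hypothetical and not part of the commit, and the commit itself has exceptions that this sketch does not reproduce: some links point at the `4.1` branch rather than `main`, and the training page moved to a versioned path.

```python
import re

# Prefixes taken from the links changed in this commit.
WIKI_PREFIX = "https://github.com/tesseract-ocr/tesseract/wiki"
DOCS_PREFIX = "https://tesseract-ocr.github.io/tessdoc"


def rewrite_links(text: str, branch: str = "main") -> str:
    """Hypothetical helper: apply the two common link rewrites.

    This is an illustration of the pattern, not the tool used for the commit.
    """
    # Wiki root and wiki pages -> GitHub Pages documentation.
    # The optional slash is consumed so the result always has exactly one.
    text = re.sub(re.escape(WIKI_PREFIX) + r"/?", DOCS_PREFIX + "/", text)
    # blob/master or tree/master in any tesseract-ocr repository -> new branch.
    text = re.sub(
        r"(github\.com/tesseract-ocr/[\w.-]+/(?:blob|tree)/)master\b",
        r"\g<1>" + branch,
        text,
    )
    return text


if __name__ == "__main__":
    old = "See https://github.com/tesseract-ocr/tesseract/wiki/FAQ"
    print(rewrite_links(old))
```

Passing `branch="4.1"` covers the lines where the commit pins a link to the release branch instead of `main`.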

@@ -14,4 +14,4 @@ RUN bundle install --gemfile ~/.travis/travis-build/Gemfile
 ADD . /tesseract
 WORKDIR /tesseract
-RUN travis compile | sed -e "s/--branch\\\=\\\'\\\'/--branch=master/g" | bash
+RUN travis compile | sed -e "s/--branch\\\=\\\'\\\'/--branch=4.1/g" | bash

@@ -19,18 +19,18 @@ It also needs traineddata files which support the legacy engine, for example
 those from the tessdata repository.
 The lead developer is Ray Smith. The maintainer is Zdenko Podobny.
-For a list of contributors see [AUTHORS](https://github.com/tesseract-ocr/tesseract/blob/master/AUTHORS)
+For a list of contributors see [AUTHORS](https://github.com/tesseract-ocr/tesseract/blob/main/AUTHORS)
 and GitHub's log of [contributors](https://github.com/tesseract-ocr/tesseract/graphs/contributors).
 Tesseract has **unicode (UTF-8) support**, and can **recognize more than 100 languages** "out of the box".
-Tesseract supports **various output formats**: plain text, hOCR (HTML), PDF, invisible-text-only PDF, TSV. The master branch also has experimental support for ALTO (XML) output.
+Tesseract supports **various output formats**: plain text, ALTO, hOCR (HTML), PDF, invisible-text-only PDF, TSV.
-You should note that in many cases, in order to get better OCR results, you'll need to **[improve the quality](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality) of the image** you are giving Tesseract.
+You should note that in many cases, in order to get better OCR results, you'll need to **[improve the quality](https://tesseract-ocr.github.io/tessdoc/ImproveQuality) of the image** you are giving Tesseract.
-This project **does not include a GUI application**. If you need one, please see the [3rdParty](https://github.com/tesseract-ocr/tesseract/wiki/User-Projects-%E2%80%93-3rdParty) wiki page.
+This project **does not include a GUI application**. If you need one, please see [3rdParty](https://tesseract-ocr.github.io/tessdoc/User-Projects-%E2%80%93-3rdParty).
-Tesseract **can be trained to recognize other languages**. See [Tesseract Training](https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract) for more information.
+Tesseract **can be trained to recognize other languages**. See [Tesseract Training](https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00) for more information.
 ## Brief history
@@ -39,15 +39,15 @@ at Hewlett-Packard Co, Greeley Colorado between 1985 and 1994, with some
 more changes made in 1996 to port to Windows, and some C++izing in 1998.
 In 2005 Tesseract was open sourced by HP. Since 2006 it is developed by Google.
-The latest (LSTM based) stable version is **[4.1.1](https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.1)**, released on December 26, 2019. Latest source code is available from [master branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/master). Open issues can be found in [issue tracker](https://github.com/tesseract-ocr/tesseract/issues), and [Planning wiki](https://github.com/tesseract-ocr/tesseract/wiki/Planning).
+The latest (LSTM based) stable version is **[4.1.2](https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.2)**, released on September 18, 2021. Latest source code is available from [main branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/main). Open issues can be found in [issue tracker](https://github.com/tesseract-ocr/tesseract/issues), and on the [Planning page](https://tesseract-ocr.github.io/tessdoc/Planning).
-The latest 3.0x version is **[3.05.02](https://github.com/tesseract-ocr/tesseract/releases/tag/3.05.02)**, released on June 19, 2018. Latest source code for 3.05 is available from [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05). There is no development for this version, but it can be used for special cases (e.g. see [Regression of features from 3.0x](https://github.com/tesseract-ocr/tesseract/wiki/Planning#regression-of-features-from-30x)).
+The latest 3.0x version is **[3.05.02](https://github.com/tesseract-ocr/tesseract/releases/tag/3.05.02)**, released on June 19, 2018. Latest source code for 3.05 is available from [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05). There is no development for this version, but it can be used for special cases.
-See **[Release Notes](https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes)** and **[Change Log](https://github.com/tesseract-ocr/tesseract/blob/master/ChangeLog)** for more details of the releases.
+See **[Release Notes](https://tesseract-ocr.github.io/tessdoc/ReleaseNotes)** and **[Change Log](https://github.com/tesseract-ocr/tesseract/blob/4.1/ChangeLog)** for more details of the releases.
 ## Installing Tesseract
-You can either [Install Tesseract via pre-built binary package](https://github.com/tesseract-ocr/tesseract/wiki) or [build it from source](https://github.com/tesseract-ocr/tesseract/wiki/Compiling).
+You can either [Install Tesseract via pre-built binary package](https://tesseract-ocr.github.io/tessdoc/) or [build it from source](https://tesseract-ocr.github.io/tessdoc/Compiling).
 Supported Compilers are:
@@ -59,25 +59,25 @@ Other compilers might work, but are not officially supported.
 ## Running Tesseract
-Basic **[command line usage](https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage)**:
+Basic **[command line usage](https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage)**:
 tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]
 For more information about the various command line options use `tesseract --help` or `man tesseract`.
-Examples can be found in the [wiki](https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#simplest-invocation-to-ocr-an-image).
+Examples can be found in the [documentation](https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage#simplest-invocation-to-ocr-an-image).
 ## For developers
-Developers can use `libtesseract` [C](https://github.com/tesseract-ocr/tesseract/blob/master/src/api/capi.h) or [C++](https://github.com/tesseract-ocr/tesseract/blob/master/src/api/baseapi.h) API to build their own application. If you need bindings to `libtesseract` for other programming languages, please see the [wrapper](https://github.com/tesseract-ocr/tesseract/wiki/AddOns#tesseract-wrappers) section on AddOns wiki page.
+Developers can use `libtesseract` [C](https://github.com/tesseract-ocr/tesseract/blob/main/src/api/capi.h) or [C++](https://github.com/tesseract-ocr/tesseract/blob/main/src/api/baseapi.h) API to build their own application. If you need bindings to `libtesseract` for other programming languages, please see the [wrapper section](https://tesseract-ocr.github.io/tessdoc/AddOns#tesseract-wrappers) on AddOns documentation page.
 Documentation of Tesseract generated from source code by doxygen can be found on [tesseract-ocr.github.io](https://tesseract-ocr.github.io/).
 ## Support
-Before you submit an issue, please review **[the guidelines for this repository](https://github.com/tesseract-ocr/tesseract/blob/master/CONTRIBUTING.md)**.
+Before you submit an issue, please review **[the guidelines for this repository](https://github.com/tesseract-ocr/tesseract/blob/main/CONTRIBUTING.md)**.
-For support, first read the [Wiki](https://github.com/tesseract-ocr/tesseract/wiki), particularly the [FAQ](https://github.com/tesseract-ocr/tesseract/wiki/FAQ) to see if your problem is addressed there. If not, search the [Tesseract user forum](https://groups.google.com/d/forum/tesseract-ocr), the [Tesseract developer forum](https://groups.google.com/d/forum/tesseract-dev) and [past issues](https://github.com/tesseract-ocr/tesseract/issues), and if you still can't find what you need, ask for support in the mailing-lists.
+For support, first read the [documentation](https://tesseract-ocr.github.io/tessdoc/), particularly the [FAQ](https://tesseract-ocr.github.io/tessdoc/FAQ) to see if your problem is addressed there. If not, search the [Tesseract user forum](https://groups.google.com/d/forum/tesseract-ocr), the [Tesseract developer forum](https://groups.google.com/d/forum/tesseract-dev) and [past issues](https://github.com/tesseract-ocr/tesseract/issues), and if you still can't find what you need, ask for support in the mailing-lists.
 Mailing-lists:
 * [tesseract-ocr](https://groups.google.com/d/forum/tesseract-ocr) - For tesseract users.
@@ -116,4 +116,4 @@ It is suggested to use leptonica with built-in support for [zlib](https://zlib.n
 For the latest online version of the README.md see:
-https://github.com/tesseract-ocr/tesseract/blob/master/README.md
+https://github.com/tesseract-ocr/tesseract/blob/main/README.md

@@ -427,10 +427,10 @@ Version 2.00 brought Unicode (UTF-8) support, six languages, and the ability
 to train Tesseract.
 Tesseract was included in UNLV's Fourth Annual Test of OCR Accuracy.
-See <https://github.com/tesseract-ocr/docs/blob/master/AT-1995.pdf>.
+See <https://github.com/tesseract-ocr/docs/blob/main/AT-1995.pdf>.
 Since Tesseract 2.00,
 scripts are now included to allow anyone to reproduce some of these tests.
-See <https://github.com/tesseract-ocr/tesseract/wiki/TestingTesseract> for more
+See <https://tesseract-ocr.github.io/tessdoc/TestingTesseract> for more
 details.
 Tesseract 3.00 added a number of new languages, including Chinese, Japanese,
@@ -447,16 +447,15 @@ Tesseract 3 is enabled by `--oem 0`. This also needs traineddata files which
 support the legacy engine, for example those from the tessdata repository
 (https://github.com/tesseract-ocr/tessdata).
-For further details, see the release notes in the Tesseract wiki
-(<https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes>).
+For further details, see the release notes in the Tesseract documentation
+(<https://tesseract-ocr.github.io/tessdoc/ReleaseNotes>).
 RESOURCES
 ---------
 Main web site: <https://github.com/tesseract-ocr> +
 User forum: <http://groups.google.com/group/tesseract-ocr> +
-Wiki: <https://github.com/tesseract-ocr/tesseract/wiki> +
-Information on training: <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract>
+Documentation: <https://tesseract-ocr.github.io/tessdoc/> +
 SEE ALSO
 --------
@@ -477,7 +476,7 @@ Romano, Ray Smith, Rika Antonova, Robert Moss, Samuel Charron, Sheelagh
 Lloyd, Shobhit Saxena, and Thomas Kielbus.
 For a list of contributors see
-<https://github.com/tesseract-ocr/tesseract/blob/master/AUTHORS>.
+<https://github.com/tesseract-ocr/tesseract/blob/4.1/AUTHORS>.
 COPYING
 -------