mirror of
https://github.com/tesseract-ocr/tesseract.git
synced 2024-11-27 12:49:35 +08:00
Revert deletion of stub files
parent
df012104d0
commit
bfae500ee4
11
4.0-Accuracy-and-Performance.md
Normal file
11
4.0-Accuracy-and-Performance.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/4.0-Accuracy-and-Performance**
|
||||
|
11
4.0-Docker-Containers.md
Normal file
11
4.0-Docker-Containers.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/4.0-Docker-Containers**
|
||||
|
11
4.0-with-LSTM.md
Normal file
11
4.0-with-LSTM.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/4.0-with-LSTM**
|
||||
|
11
4.0x-Changelog.md
Normal file
11
4.0x-Changelog.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/4.0x-Changelog**
|
||||
|
11
4.0x-Common-Errors-and-Resolutions.md
Normal file
11
4.0x-Common-Errors-and-Resolutions.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/4.0x-Common-Errors-and-Resolutions**
|
||||
|
11
APIExample-user_patterns.md
Normal file
11
APIExample-user_patterns.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/APIExample-user_patterns**
|
||||
|
11
APIExample.md
Normal file
11
APIExample.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/APIExample**
|
||||
|
11
AddOns.md
Normal file
11
AddOns.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/AddOns**
|
||||
|
11
Command-Line-Usage.md
Normal file
11
Command-Line-Usage.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage**
|
||||
|
11
Compiling-–-GitInstallation.md
Normal file
11
Compiling-–-GitInstallation.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/Compiling-–-GitInstallation**
|
||||
|
11
Compiling.md
Normal file
11
Compiling.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/Compiling**
|
||||
|
11
ControlParams.md
Normal file
11
ControlParams.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/ControlParams**
|
||||
|
11
Data-Files-Contributions.md
Normal file
11
Data-Files-Contributions.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/Data-Files-Contributions**
|
||||
|
11
Data-Files-in-different-versions.md
Normal file
11
Data-Files-in-different-versions.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions**
|
||||
|
11
Data-Files-in-tessdata_fast.md
Normal file
11
Data-Files-in-tessdata_fast.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/Data-Files-in-tessdata_fast**
|
||||
|
11
Data-Files.md
Normal file
11
Data-Files.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/Data-Files**
|
||||
|
11
Documentation.md
Normal file
11
Documentation.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/Documentation**
|
||||
|
11
Downloads.md
Normal file
11
Downloads.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/Downloads**
|
||||
|
11
FAQ-Old.md
Normal file
11
FAQ-Old.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/FAQ-Old**
|
||||
|
11
FAQ.md
Normal file
11
FAQ.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/FAQ**
|
||||
|
11
Fonts.md
Normal file
11
Fonts.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/Fonts**
|
||||
|
298
Home.md
298
Home.md
@ -1,9 +1,299 @@
|
||||
# Tesseract
|
||||
|
||||
Tesseract is an open source [text recognition (OCR)](https://en.wikipedia.org/wiki/Optical_character_recognition) Engine, available under the [Apache 2.0 license.](http://www.apache.org/licenses/LICENSE-2.0) It can be used directly, or (for programmers) using an [API](https://github.com/tesseract-ocr/tesseract/blob/master/include/tesseract/baseapi.h) to extract printed text from images. It supports a wide variety of languages.
|
||||
|
||||
- - -
|
||||
**These wiki pages are no longer maintained. All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).**
|
||||
|
||||
**The latest documentation is available at https://tesseract-ocr.github.io/.**
|
||||
- - -
|
||||
|
||||
# Introduction
|
||||
|
||||
Tesseract is an open source [text recognition (OCR)](https://en.wikipedia.org/wiki/Optical_character_recognition) Engine, available under the [Apache 2.0 license.](http://www.apache.org/licenses/LICENSE-2.0) It can be used directly, or (for programmers) using an [API](https://github.com/tesseract-ocr/tesseract/blob/master/include/tesseract/baseapi.h) to extract printed text from images. It supports a wide variety of languages.
|
||||
|
||||
Tesseract doesn't have a built-in GUI, but there are several available from the [3rdParty](https://github.com/tesseract-ocr/tesseract/wiki/User-Projects-%E2%80%93-3rdParty) page.
|
||||
|
||||
# Installation
|
||||
|
||||
There are two parts to install, the engine itself, and the training data for a language.
|
||||
|
||||
## Linux
|
||||
|
||||
Tesseract is available directly from many Linux distributions. The package is generally called **'tesseract'** or **'tesseract-ocr'** - search your distribution's repositories to find it.
|
||||
Thus you can install Tesseract 4.x and its developer tools on Ubuntu 18.x bionic by simply running:
|
||||
```
|
||||
sudo apt install tesseract-ocr
|
||||
sudo apt install libtesseract-dev
|
||||
```
|
||||
|
||||
**Note for Ubuntu users**: In case ```apt``` is unable to find the package try adding ```universe``` entry to the ```sources.list``` file as shown below.
|
||||
```
|
||||
sudo vi /etc/apt/sources.list
|
||||
|
||||
Copy the first line "deb http://archive.ubuntu.com/ubuntu bionic main" and paste it as shown below on the next line.
|
||||
If you are using a different release of ubuntu, then replace bionic with the respective release name.
|
||||
|
||||
deb http://archive.ubuntu.com/ubuntu bionic universe
|
||||
```
|
||||
|
||||
|
||||
Packages for over 130 languages and over 35 scripts are also available directly from the Linux distributions. The language packages are called **'tesseract-ocr-langcode'** and **'tesseract-ocr-script-scriptcode'**, where langcode is three letter language code and scriptcode is four letter script code.
|
||||
|
||||
**Examples:** tesseract-ocr-eng (**English**), tesseract-ocr-ara (**Arabic**), tesseract-ocr-chi-sim (**Simplified Chinese**), tesseract-ocr-script-latn (**Latin Script**), tesseract-ocr-script-deva (**Devanagari script**), etc.
|
||||
|
||||
For distributions that are supported by snapd you may also run the following command to install the `tesseract` built binaries([Don't have snapd installed?](https://snapcraft.io/docs/core/install)):
|
||||
|
||||
sudo snap install --channel=edge tesseract
|
||||
|
||||
The traineddata is currently not shipped with the snap package and must be placed manually to `~/snap/tesseract/current`.
|
||||
|
||||
### Tesseract Development Version with LSTM engine and related traineddata
|
||||
|
||||
_**5.00 Alpha**_
|
||||
|
||||
#### Ubuntu PPA
|
||||
|
||||
* [Ubuntu Eoan 19.10](https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr-devel?field.series_filter=eoan)
|
||||
* [Ubuntu Disco 19.04](https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr-devel?field.series_filter=disco)
|
||||
* [Ubuntu Bionic 18.04](https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr-devel?field.series_filter=bionic)
|
||||
* [Ubuntu Xenial 16.04](https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr-devel?field.series_filter=xenial)
|
||||
* [Ubuntu Trusty 14.04](https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr-devel?field.series_filter=trusty)
|
||||
|
||||
#### Debian
|
||||
|
||||
https://notesalexp.org/tesseract-ocr/
|
||||
|
||||
### Tesseract 4 packages with LSTM engine and related traineddata
|
||||
|
||||
#### Debian
|
||||
|
||||
##### 4.1.x
|
||||
|
||||
* [Debian testing](https://packages.debian.org/testing/tesseract-ocr)
|
||||
* [Debian Sid (unstable)](https://packages.debian.org/sid/tesseract-ocr)
|
||||
|
||||
There are also 4.1.x packages for other versions of Debian, check it here [https://notesalexp.org/tesseract-ocr/](https://notesalexp.org/tesseract-ocr/)
|
||||
|
||||
##### 4.0.x
|
||||
|
||||
* [Debian 10 Buster (stable)](https://packages.debian.org/buster/tesseract-ocr)
|
||||
* [Debian 9 Stretch backports (oldstable)](https://packages.debian.org/stretch-backports/tesseract-ocr)
|
||||
|
||||
#### Ubuntu
|
||||
|
||||
* [Ubuntu Bionic 20.04](https://packages.ubuntu.com/focal/tesseract-ocr-all)
|
||||
* [Ubuntu Bionic 18.04](https://packages.ubuntu.com/bionic/tesseract-ocr-all)
|
||||
|
||||
#### Ubuntu PPA
|
||||
* [Ubuntu Eoan 19.10](https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr?field.series_filter=eoan)
|
||||
* [Ubuntu Disco 19.04](https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr?field.series_filter=disco)
|
||||
* [Ubuntu Bionic 18.04](https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr?field.series_filter=bionic)
|
||||
* [Ubuntu Xenial 16.04](https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr?field.series_filter=xenial)
|
||||
* [Ubuntu Trusty 14.04](https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr?field.series_filter=trusty)
|
||||
|
||||
#### Raspbian
|
||||
* [Raspbian Stretch(notesalexp.org)](https://notesalexp.org/tesseract-ocr/)
|
||||
* [Raspbian Buster](http://raspbian.org/RaspbianRepository)
|
||||
|
||||
#### RHEL/CentOS/Scientific Linux, Fedora, openSUSE packages
|
||||
|
||||
* [rpm package with tesseract-ocr](https://build.opensuse.org/project/show/home:Alexander_Pozdnyakov)
|
||||
|
||||
For example to install Tesseract with German language traineddata:
|
||||
|
||||
|
||||
**For CentOS 8 run the following as root:**
|
||||
```
|
||||
dnf config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_8/
|
||||
rpm --import https://build.opensuse.org/projects/home:Alexander_Pozdnyakov/public_key
|
||||
dnf install tesseract
|
||||
dnf install tesseract-langpack-deu
|
||||
```
|
||||
|
||||
**For RHEL 7 run the following as root:**
|
||||
|
||||
```
|
||||
yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/RHEL_7/
|
||||
yum update
|
||||
yum install tesseract
|
||||
yum install tesseract-langpack-deu
|
||||
```
|
||||
|
||||
**For CentOS 7 run the following as root:**
|
||||
```
|
||||
yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_7/
|
||||
sudo rpm --import https://build.opensuse.org/projects/home:Alexander_Pozdnyakov/public_key
|
||||
yum update
|
||||
yum install tesseract
|
||||
yum install tesseract-langpack-deu
|
||||
```
|
||||
|
||||
**For Scientific Linux 7 run the following as root:**
|
||||
```
|
||||
yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/ScientificLinux_7/
|
||||
yum update
|
||||
yum install tesseract
|
||||
yum install tesseract-langpack-deu
|
||||
```
|
||||
|
||||
**For Fedora 31 run the following as root:**
|
||||
```
|
||||
dnf config-manager --add-repo https://download.opensuse.org/repositories/home:Alexander_Pozdnyakov/Fedora_31/home:Alexander_Pozdnyakov.repo
|
||||
dnf install tesseract
|
||||
dnf install tesseract-langpack-deu
|
||||
```
|
||||
|
||||
**For Fedora 30 run the following as root:**
|
||||
```
|
||||
dnf config-manager --add-repo https://download.opensuse.org/repositories/home:Alexander_Pozdnyakov/Fedora_30/home:Alexander_Pozdnyakov.repo
|
||||
dnf install tesseract
|
||||
dnf install tesseract-langpack-deu
|
||||
```
|
||||
|
||||
**For openSUSE Tumbleweed run the following as root:**
|
||||
```
|
||||
zypper addrepo https://download.opensuse.org/repositories/home:Alexander_Pozdnyakov/openSUSE_Tumbleweed/home:Alexander_Pozdnyakov.repo
|
||||
zypper refresh
|
||||
zypper install tesseract-ocr
|
||||
zypper install tesseract-ocr-traineddata-german
|
||||
```
|
||||
|
||||
**For openSUSE Leap 15.0 run the following as root:**
|
||||
```
|
||||
zypper addrepo https://download.opensuse.org/repositories/home:Alexander_Pozdnyakov/openSUSE_Leap_15.0/home:Alexander_Pozdnyakov.repo
|
||||
zypper refresh
|
||||
zypper install tesseract-ocr
|
||||
zypper install tesseract-ocr-traineddata-german
|
||||
```
|
||||
|
||||
### FOR EXPERTS ONLY.
|
||||
|
||||
If you are experimenting with OCR Engine modes, you will need to manually install language training data beyond what is available in your Linux distribution.
|
||||
|
||||
[Various types of training data](https://github.com/tesseract-ocr/tesseract/wiki/Data-Files) can be found on [GitHub](https://github.com/tesseract-ocr/). Unpack and copy the .traineddata file into a 'tessdata' directory. The exact directory will depend both on the type of training data, and your Linux distribution. Possibilities are `/usr/share/tesseract-ocr/tessdata` or `/usr/share/tessdata` or `/usr/share/tesseract-ocr/4.00/tessdata`.
|
||||
|
||||
Training data for obsolete Tesseract versions [=< 3.02](https://sourceforge.net/projects/tesseract-ocr-alt/files/?source=navbar) reside in another location.
|
||||
|
||||
If Tesseract is not available for your distribution, or you want to use a newer version than they offer, you can [compile your own](Compiling).
|
||||
|
||||
## macOS
|
||||
|
||||
You can install Tesseract using either [MacPorts](https://www.macports.org/) or [Homebrew](http://brew.sh).
|
||||
|
||||
A macOS wrapper for the Tesseract API is also available at [Tesseract macOS](https://github.com/scott0123/Tesseract-macOS).
|
||||
|
||||
### MacPorts
|
||||
To install Tesseract run this command:
|
||||
```
|
||||
sudo port install tesseract
|
||||
```
|
||||
To install any language data, run:
|
||||
```
|
||||
sudo port install tesseract-<langcode>
|
||||
```
|
||||
List of available langcodes can be found on [MacPorts tesseract page](https://www.macports.org/ports.php?by=name&substr=tesseract-).
|
||||
|
||||
### Homebrew
|
||||
To install Tesseract run this command:
|
||||
```
|
||||
brew install tesseract
|
||||
```
|
||||
|
||||
Training directories can be found using `brew list tesseract`
|
||||
Possible location can be `/usr/local/Cellar/tesseract/3.05.02/share/tessdata/`
|
||||
|
||||
## Windows
|
||||
|
||||
Installer for Windows for Tesseract 3.05, Tesseract 4 and development version 5.00 Alpha are available from [Tesseract at UB Mannheim](https://github.com/UB-Mannheim/tesseract/wiki). These include the training tools. Both 32-bit and 64-bit installers are available.
|
||||
|
||||
An installer for the **OLD version 3.02** is available for Windows from our [download](Downloads) page. This includes the English training data. If you want to use another language, [download the appropriate training data](
|
||||
https://github.com/tesseract-ocr/tesseract/wiki/Data-Files), unpack it using [7-zip](http://www.7-zip.org), and copy the .traineddata file into the 'tessdata' directory, probably `C:\Program Files\Tesseract-OCR\tessdata`.
|
||||
|
||||
To access tesseract-OCR from any location you may have to add the directory where the tesseract-OCR binaries are located to the Path variables, probably `C:\Program Files\Tesseract-OCR`.
|
||||
|
||||
Experts can also get binaries build with Visual Studio from the build artifacts of the [Appveyor Continuous Integration](https://ci.appveyor.com/project/zdenop/tesseract/history).
|
||||
|
||||
### Cygwin
|
||||
|
||||
Released version >= 3.02 of tesseract-ocr [are part of ](https://mirrors.kernel.org/sourceware/cygwin/x86_64/release/tesseract-ocr/) [Cygwin](https://www.cygwin.com/)
|
||||
|
||||
The latest version available is 4.1.0. Please see [announcement](https://www.cygwin.com/ml/cygwin-announce/2019-07/msg00009.html).
|
||||
|
||||
### MSYS2
|
||||
|
||||
Install tesseract-OCR:
|
||||
|
||||
```
|
||||
pacman -S mingw-w64-{i686,x86_64}-tesseract-ocr
|
||||
```
|
||||
|
||||
and the data files:
|
||||
|
||||
```
|
||||
pacman -S mingw-w64-{i686,x86_64}-tesseract-data-eng
|
||||
```
|
||||
|
||||
In the above command, "eng" may be replaced with the [ISO 639 3-letter language code](http://www.loc.gov/standards/iso639-2/php/code_list.php) for supported languages. For a list of available language packages use:
|
||||
|
||||
```
|
||||
pacman -Ss tesseract-data
|
||||
```
|
||||
|
||||
## Other Platforms
|
||||
|
||||
Tesseract may work on more exotic platforms too. You can either try [compiling it yourself](Compiling), or take a look at the list of [other projects using Tesseract](https://github.com/tesseract-ocr/tesseract/wiki/User-Projects-%E2%80%93-3rdParty).
|
||||
|
||||
|
||||
# Running Tesseract
|
||||
|
||||
Tesseract is a command-line program, so first open a terminal or command prompt. The command is used like this:
|
||||
|
||||
```
|
||||
tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]
|
||||
```
|
||||
|
||||
So basic usage to do OCR on an image called 'myscan.png' and save the result to 'out.txt' would be:
|
||||
|
||||
```
|
||||
tesseract myscan.png out
|
||||
```
|
||||
|
||||
Or to do the same with German:
|
||||
|
||||
```
|
||||
tesseract myscan.png out -l deu
|
||||
```
|
||||
|
||||
It can even be used with multiple languages traineddata at a time eg. English and German:
|
||||
|
||||
```
|
||||
tesseract myscan.png out -l eng+deu
|
||||
```
|
||||
|
||||
Tesseract also includes a hOCR mode, which produces a special HTML file with the coordinates of each word. This can be used to create a searchable pdf, using a tool such as [Hocr2PDF](https://exactcode.com/opensource/exactimage/). To use it, use the 'hocr' config option, like this:
|
||||
|
||||
```
|
||||
tesseract myscan.png out hocr
|
||||
```
|
||||
|
||||
You can also create a searchable pdf directly from tesseract ( versions >=3.03):
|
||||
|
||||
```
|
||||
tesseract myscan.png out pdf
|
||||
```
|
||||
|
||||
More information about the various options is available in the [Tesseract manpage](https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc).
|
||||
|
||||
# Other Languages
|
||||
|
||||
Tesseract has been trained for [many languages](https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages), check for your language in the [Tessdata repository](https://github.com/tesseract-ocr/tessdata).
|
||||
|
||||
It can also be trained to support other languages and scripts; for more details see [TrainingTesseract](TrainingTesseract).
|
||||
|
||||
# Development
|
||||
|
||||
Tesseract can also be used in your own project, under the terms of the [Apache License 2.0.](http://www.apache.org/licenses/LICENSE-2.0) It has a fully featured API, and can be compiled for a variety of targets including Android and the iPhone. See the [3rdParty](https://github.com/tesseract-ocr/tesseract/wiki/User-Projects-%E2%80%93-3rdParty) page for a sample of what has been done with it. Note that as yet there are very few 3rdParty Tesseract OCR projects [being developed for Mac](https://machow2.com/ocr-for-mac-best-software/#Tesseract_Freesoftware/) (with the only one being [Tesseract macOS](https://github.com/scott0123/Tesseract-macOS)), although there are several online OCR services that can be used on Mac that may use Tesseract as their OCR engine.
|
||||
|
||||
Also, it is free software, so if you want to pitch in and help, please do!
|
||||
If you find a bug and fix it yourself, the best thing to do is to attach the patch to your bug report in the [Issues List](https://github.com/tesseract-ocr/tesseract/issues)
|
||||
|
||||
# Support
|
||||
|
||||
First read the [Wiki](https://github.com/tesseract-ocr/tesseract/wiki), particularly the [FAQ](FAQ) to see if your problem is addressed there. If not, search the [Tesseract user forum](http://groups.google.com/group/tesseract-ocr) or the [Tesseract developer forum](http://groups.google.com/group/tesseract-dev), and if you still can't find what you need, please ask us there.
|
||||
|
11
ImproveQuality.md
Normal file
11
ImproveQuality.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/ImproveQuality**
|
||||
|
3
Make-Box-Files.asciidoc
Normal file
3
Make-Box-Files.asciidoc
Normal file
@ -0,0 +1,3 @@
|
||||
* https://github.com/tesseract-ocr/tesseract/wiki/Making-Box-Files---4.0[Example of Box Tiff pair for LSTM 4.0]
|
||||
* https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-%E2%80%93-Make-Box-Files[Training - Make Box Files]
|
||||
* https://github.com/tesseract-ocr/tesseract/wiki/AddOns[Training - Box Editors]
|
11
Making-Box-Files---4.0.md
Normal file
11
Making-Box-Files---4.0.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/Making-Box-Files---4.0**
|
||||
|
11
NeuralNetsInTesseract4.00.md
Normal file
11
NeuralNetsInTesseract4.00.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/NeuralNetsInTesseract4.00**
|
||||
|
11
Planning.md
Normal file
11
Planning.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/Planning**
|
||||
|
11
ReadMe.md
Normal file
11
ReadMe.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
Please find README at **[README.md](../blob/master/README.md)**
|
||||
|
11
ReleaseNotes.md
Normal file
11
ReleaseNotes.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/ReleaseNotes**
|
||||
|
BIN
ScrollView.jar
Normal file
BIN
ScrollView.jar
Normal file
Binary file not shown.
11
Technical-Documentation.md
Normal file
11
Technical-Documentation.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/Technical-Documentation**
|
||||
|
11
TesseractOpenCL.md
Normal file
11
TesseractOpenCL.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/TesseractOpenCL**
|
||||
|
11
TestingTesseract.md
Normal file
11
TestingTesseract.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/TestingTesseract**
|
||||
|
11
The-Hallucination-Effect.md
Normal file
11
The-Hallucination-Effect.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/The-Hallucination-Effect**
|
||||
|
11
Training-Tesseract-3.00–3.02.md
Normal file
11
Training-Tesseract-3.00–3.02.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/Training-Tesseract-3.00–3.02**
|
||||
|
11
Training-Tesseract-3.03–3.05.md
Normal file
11
Training-Tesseract-3.03–3.05.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/Training-Tesseract-3.03–3.05**
|
||||
|
11
Training-Tesseract-–-Make-Box-Files.md
Normal file
11
Training-Tesseract-–-Make-Box-Files.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/Training-Tesseract-–-Make-Box-Files**
|
||||
|
11
Training-Tesseract-–-tesstrain.sh.md
Normal file
11
Training-Tesseract-–-tesstrain.sh.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/Training-Tesseract-–-tesstrain.sh**
|
||||
|
11
Training-Tesseract.md
Normal file
11
Training-Tesseract.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/Training-Tesseract**
|
||||
|
11
TrainingTesseract-4.00---Finetune.md
Normal file
11
TrainingTesseract-4.00---Finetune.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/TrainingTesseract-4.00---Finetune**
|
||||
|
11
TrainingTesseract-4.00.md
Normal file
11
TrainingTesseract-4.00.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/TrainingTesseract-4.00**
|
||||
|
4
TrainingTesseract.asciidoc
Normal file
4
TrainingTesseract.asciidoc
Normal file
@ -0,0 +1,4 @@
|
||||
* https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00[Training Tesseract 4.0]
|
||||
* https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.03%E2%80%933.05/[Training Tesseract 3.03, 3.04, 3.05]
|
||||
* https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.00%E2%80%933.02[Training Tesseract 3.00, 3.01, 3.02]
|
||||
* https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract2[Training Tesseract 2]
|
11
TrainingTesseract2.md
Normal file
11
TrainingTesseract2.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/TrainingTesseract2**
|
||||
|
11
UNLV-Testing-of-Tesseract.md
Normal file
11
UNLV-Testing-of-Tesseract.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/UNLV-Testing-of-Tesseract**
|
||||
|
11
User-App-Example.md
Normal file
11
User-App-Example.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/User-App-Example**
|
||||
|
11
User-Projects-–-3rdParty.md
Normal file
11
User-Projects-–-3rdParty.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/User-Projects-–-3rdParty**
|
||||
|
11
VGSLSpecs.md
Normal file
11
VGSLSpecs.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/VGSLSpecs**
|
||||
|
11
ViewerDebugging.md
Normal file
11
ViewerDebugging.md
Normal file
@ -0,0 +1,11 @@
|
||||
We've Moved!
|
||||
------------
|
||||
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).
|
||||
|
||||
The latest documentation is available at https://tesseract-ocr.github.io/.
|
||||
|
||||
**Please find this page in its new home: https://tesseract-ocr.github.io/tessdoc/ViewerDebugging**
|
||||
|
1
_Footer.md
Normal file
1
_Footer.md
Normal file
@ -0,0 +1 @@
|
||||
Old wiki - no longer maintained. The pages were [moved](https://github.com/tesseract-ocr/tessdoc), see the [new documentation](https://tesseract-ocr.github.io/).
|
8
_Sidebar.md
Normal file
8
_Sidebar.md
Normal file
@ -0,0 +1,8 @@
|
||||
## As of 02/02/2020
|
||||
- - -
|
||||
**These wiki pages are no longer maintained.**
|
||||
|
||||
**All pages were moved to [tesseract-ocr/tessdoc](https://github.com/tesseract-ocr/tessdoc).**
|
||||
|
||||
**The latest documentation is available at https://tesseract-ocr.github.io/.**
|
||||
- - -
|
BIN
api_abi_changes.png
Normal file
BIN
api_abi_changes.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 17 KiB |
BIN
binarisation.png
Normal file
BIN
binarisation.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 711 KiB |
BIN
borders.png
Normal file
BIN
borders.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 212 KiB |
BIN
noise.png
Normal file
BIN
noise.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 123 KiB |
BIN
skew-linedetection.png
Normal file
BIN
skew-linedetection.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 511 KiB |
BIN
tesseract_debug.png
Normal file
BIN
tesseract_debug.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 30 KiB |
Loading…
Reference in New Issue
Block a user