mirror of
https://github.com/tesseract-ocr/tesseract.git
synced 2024-11-30 23:49:05 +08:00
Revert 5a4098acc2d356aa7656eae2efb80384455dac3f...45f06157d6f09291cc233716a268ced8ae62b07b on 4.0 with LSTM
parent
96f31d4003
commit
92f4cdcca5
@ -1 +1,74 @@
## 4.0 +

Source code for Tesseract 4.0 **+** is available in the 'master' branch of the [repository](https://github.com/tesseract-ocr/tesseract). It adds a new OCR engine based on LSTM neural networks. It initially works (well) on x86/Linux. Model data for 101 languages is available in the [tessdata](https://github.com/tesseract-ocr/tessdata), [tessdata_best](https://github.com/tesseract-ocr/tessdata_best), and [tessdata_fast](https://github.com/tesseract-ocr/tessdata_fast) repositories.
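
As a rough illustration of selecting the new engine, the sketch below assembles a command line for the `tesseract` executable with `--oem 1`, which in the 4.x command-line tool requests the LSTM engine only. The helper names `build_lstm_command` and `run_ocr`, the file names, and the optional tessdata directory are hypothetical placeholders, and the sketch assumes a 4.x `tesseract` binary is on the PATH.

```python
import shlex
import subprocess


def build_lstm_command(image_path, output_base, lang="eng", tessdata_dir=None):
    """Assemble a tesseract 4.x command line that requests the LSTM engine.

    Hypothetical helper for illustration; it is not part of Tesseract.
    """
    # Usage pattern: tesseract <image> <outputbase> [options]
    cmd = ["tesseract", image_path, output_base, "-l", lang, "--oem", "1"]
    if tessdata_dir is not None:
        # Point tesseract at a directory holding the *.traineddata files.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


def run_ocr(image_path, output_base, **kwargs):
    """Run tesseract and return the path of the text file it writes."""
    cmd = build_lstm_command(image_path, output_base, **kwargs)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return output_base + ".txt"


if __name__ == "__main__":
    # Only prints the command; call run_ocr() to actually execute it.
    print(shlex.join(build_lstm_command("page.png", "page", lang="hin")))
```

Building the command as a list rather than a single string avoids shell quoting issues when image paths contain spaces.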

## Documentation
* [NeuralNetsInTesseract4.00](NeuralNetsInTesseract4.00)
* [VGSLSpecs](https://github.com/tesseract-ocr/tesseract/wiki/VGSLSpecs)
* [VGSLSpecs info from Tensorflow](https://github.com/mldbai/tensorflow-models/blob/master/street/g3doc/vgslspecs.md)
* [DAS 2016 tutorial slides](https://github.com/tesseract-ocr/docs/tree/master/das_tutorial2016)
  Slides [#2](https://github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/2ArchitectureAndDataStructures.pdf), [#6](https://github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/6ModernizationEfforts.pdf), [#7](https://github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/7Building%20a%20Multi-Lingual%20OCR%20Engine.pdf) have information about LSTM integration in Tesseract 4.0.

* [4.0 Accuracy and Performance](https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance)

## Training the Tesseract LSTM engine

* [TrainingTesseract 4.00](https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00)

* [tess4training - LSTM Training Tutorial for Tesseract 4](https://github.com/Shreeshrii/tess4training)

* [tesstrain (formerly ocrd-train)](https://github.com/tesseract-ocr/tesstrain)

## 4.x PPA

Ubuntu PPAs for Tesseract 4.x and Leptonica 1.7x:
* https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr

Leptonica 1.74.1 package for Debian:
* https://packages.debian.org/sid/libleptonica-dev

## 4.x for Windows

Unofficial experimental binaries of tesseract-ocr 4.x are available from the following links. Each one is built from a different commit on the master branch in early 2017. See the individual sites for more details:

* [Windows Installer made with MinGW-w64](http://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-4.00.00dev.exe) from [UB Mannheim](https://github.com/UB-Mannheim/tesseract/wiki)
* [zip file with cppan generated .dll and .exe files](https://www.dropbox.com/s/obiqvrt4m53pmoz/tesseract-4.0.0-alpha.zip?dl=1). You have to install the VC2015 x86 redist from microsoft.com in order to run them.
* [Win64 build of tesseract 4.0.0 alpha, leptonica 1.74.1, and charlesw/tesseract .Net wrapper](https://github.com/tdhintz/tesseract4win64) - built using CPPAN for Visual Studio 2017.

## 4.x with GUI frontend

### VietOCR
[Windows binaries of tesseract-ocr 4.0.0-alpha with GUI interface are available for VietOCR](https://sourceforge.net/projects/vietocr/files/vietocr/) from:

* [VietOCR5.0alpha](https://sourceforge.net/projects/vietocr/files/vietocr/5.0alpha/)

* [Visual C++ Redistributable for Visual Studio 2015 runtime - vc_redist.x86.exe](https://www.microsoft.com/en-us/download/details.aspx?id=48145) is REQUIRED for VietOCR to run correctly.

VietOCR can be used to download appropriate 4.0.0alpha traineddata for additional languages.

### gImageReader

[Windows binaries of tesseract-ocr 4.0.0-alpha with GUI interface are available for gImageReader](https://github.com/manisandro/gImageReader/releases) from:
* [gImageReader_3.2.1_qt5_i686_tesseract4.0.0.git2f10be5.exe](https://github.com/manisandro/gImageReader/releases/download/v3.2.1/gImageReader_3.2.1_qt5_i686_tesseract4.0.0.git2f10be5.exe)
* [gImageReader_3.2.1_qt5_x86_64_tesseract4.0.0.git2f10be5.exe](https://github.com/manisandro/gImageReader/releases/download/v3.2.1/gImageReader_3.2.1_qt5_x86_64_tesseract4.0.0.git2f10be5.exe)

To use the above, download 4.0.0alpha traineddata from the master branch of tessdata. For example, for Hindi, download the following file:

https://github.com/tesseract-ocr/tessdata/blob/master/hin.traineddata
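
When fetching a traineddata file from a script, note that a GitHub `blob` link like the one above opens an HTML page, while the same path with `raw` in place of `blob` serves the file itself. The following is a minimal sketch of that conversion and a download step; the function names `blob_to_raw_url` and `download_traineddata` and the destination directory are hypothetical, and it assumes network access to github.com.

```python
import os
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def blob_to_raw_url(blob_url):
    """Turn a github.com '/blob/' file link into the matching '/raw/' link."""
    parts = urlsplit(blob_url)
    segments = parts.path.split("/")
    # Expected path layout: /<owner>/<repo>/blob/<branch>/<file...>
    if parts.netloc != "github.com" or len(segments) < 6 or segments[3] != "blob":
        raise ValueError("not a GitHub blob URL: " + blob_url)
    segments[3] = "raw"
    return urlunsplit(parts._replace(path="/".join(segments)))


def download_traineddata(blob_url, dest_dir):
    """Download the file behind a blob link into dest_dir and return its path."""
    raw_url = blob_to_raw_url(blob_url)
    dest = os.path.join(dest_dir, os.path.basename(urlsplit(raw_url).path))
    urllib.request.urlretrieve(raw_url, dest)  # follows GitHub's redirect
    return dest


if __name__ == "__main__":
    url = "https://github.com/tesseract-ocr/tessdata/blob/master/hin.traineddata"
    print(blob_to_raw_url(url))
```

The downloaded file then goes into the tessdata directory that the GUI frontend or `tesseract` is configured to use.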

## 3.05

The [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05) can be used by those who want the bug fixes for the 3.05.01 release.

An installer for Tesseract 3.05 for Windows is available from [Tesseract at UB Mannheim](https://github.com/UB-Mannheim/tesseract/wiki). This includes the training tools.

## Current official release

The current official release is [4.1.0](https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.0).