Revert 5a4098acc2d356aa7656eae2efb80384455dac3f...45f06157d6f09291cc233716a268ced8ae62b07b on 4.0 with LSTM

Shreeshrii 2019-11-24 08:08:04 +05:30
parent 96f31d4003
commit 92f4cdcca5

@ -1 +1,74 @@
## 4.0 +
Tesseract 4.0 **+** source code is available in the 'master' branch of the [repository](https://github.com/tesseract-ocr/tesseract). It adds a new OCR engine based on LSTM neural networks. Initially, it works well on x86/Linux. Model data for 101 languages is available in the [tessdata](https://github.com/tesseract-ocr/tessdata), [tessdata_best](https://github.com/tesseract-ocr/tessdata_best), and [tessdata_fast](https://github.com/tesseract-ocr/tessdata_fast) repositories.
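
As a rough illustration of selecting the LSTM engine from a script, the sketch below builds a `tesseract` command line with `--oem 1` (the LSTM-only engine mode of the 4.x command-line tool) and runs it through Python's `subprocess`. The helper names `build_lstm_command` and `run_lstm_ocr`, the file names, and the default language `eng` are assumptions made for this example. It also assumes a `tesseract` 4.x binary is on the `PATH` and the matching traineddata file is installed.

```python
import subprocess


def build_lstm_command(image_path, output_base, lang="eng"):
    """Return the argument list for an LSTM-only tesseract run.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    return [
        "tesseract",
        image_path,
        output_base,      # tesseract appends ".txt" to this base name
        "-l", lang,       # language code of the traineddata to use
        "--oem", "1",     # 1 = neural nets LSTM engine only
    ]


def run_lstm_ocr(image_path, output_base, lang="eng"):
    """Run tesseract and return the recognized text (needs tesseract 4.x)."""
    subprocess.run(build_lstm_command(image_path, output_base, lang), check=True)
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read()
```

For example, `run_lstm_ocr("page.png", "page", lang="hin")` would recognize a Hindi page, provided `hin.traineddata` is present in the tessdata directory.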
## Documentation
* [NeuralNetsInTesseract4.00](NeuralNetsInTesseract4.00)
* [VGSLSpecs](https://github.com/tesseract-ocr/tesseract/wiki/VGSLSpecs)
* [VGSLSpecs info from Tensorflow](https://github.com/mldbai/tensorflow-models/blob/master/street/g3doc/vgslspecs.md)
* [DAS 2016 tutorial slides](https://github.com/tesseract-ocr/docs/tree/master/das_tutorial2016)
  Slides [#2](https://github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/2ArchitectureAndDataStructures.pdf), [#6](https://github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/6ModernizationEfforts.pdf), and [#7](https://github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/7Building%20a%20Multi-Lingual%20OCR%20Engine.pdf) have information about LSTM integration in Tesseract 4.0.
* [4.0 Accuracy and Performance](https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance)
## Training the Tesseract LSTM engine
* [TrainingTesseract 4.00](https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00)
* [tess4training - LSTM Training Tutorial for Tesseract 4](https://github.com/Shreeshrii/tess4training)
* [tesstrain - formerly ocrd-train](https://github.com/tesseract-ocr/tesstrain)
## 4.x PPA
Ubuntu PPAs for Tesseract 4.x & Leptonica 1.7x:
* https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr
Leptonica 1.74.1 package for Debian:
* https://packages.debian.org/sid/libleptonica-dev
## 4.x for Windows
Unofficial experimental binaries of tesseract-ocr 4.x are available from the following links. Each one was built from a different commit on the master branch in early 2017. See the individual sites for more details:
* [Windows Installer made with MinGW-w64](http://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-4.00.00dev.exe) from [UB Mannheim](https://github.com/UB-Mannheim/tesseract/wiki)
* [zip file with cppan generated .dll and .exe files](https://www.dropbox.com/s/obiqvrt4m53pmoz/tesseract-4.0.0-alpha.zip?dl=1). You must install the VC2015 x86 redistributable from microsoft.com to run them.
* [Win64 build of tesseract 4.0.0 alpha, leptonica 1.74.1, and charlesw/tesseract .Net wrapper](https://github.com/tdhintz/tesseract4win64) - built using CPPAN for Visual Studio 2017.
## 4.x with GUI frontend
### VietOCR
[Windows binaries of tesseract-ocr 4.0.0-alpha with GUI interface are available for VietOCR](https://sourceforge.net/projects/vietocr/files/vietocr/) from
* [VietOCR5.0alpha](https://sourceforge.net/projects/vietocr/files/vietocr/5.0alpha/)
* [Visual C++ Redistributable for Visual Studio 2015 runtime - vc_redist.x86.exe](https://www.microsoft.com/en-us/download/details.aspx?id=48145) is REQUIRED for VietOCR to run correctly.
VietOCR can be used to download appropriate 4.0.0alpha traineddata for additional languages.
### gImageReader
[Windows binaries of tesseract-ocr 4.0.0-alpha with GUI interface are available for gImageReader](https://github.com/manisandro/gImageReader/releases) from
* [gImageReader_3.2.1_qt5_i686_tesseract4.0.0.git2f10be5.exe](https://github.com/manisandro/gImageReader/releases/download/v3.2.1/gImageReader_3.2.1_qt5_i686_tesseract4.0.0.git2f10be5.exe)
* [gImageReader_3.2.1_qt5_x86_64_tesseract4.0.0.git2f10be5.exe](https://github.com/manisandro/gImageReader/releases/download/v3.2.1/gImageReader_3.2.1_qt5_x86_64_tesseract4.0.0.git2f10be5.exe)
To use the binaries above, download the 4.0.0alpha traineddata from the master branch of tessdata. For example, for Hindi download the following file:
https://github.com/tesseract-ocr/tessdata/blob/master/hin.traineddata
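
The download step can also be scripted. The sketch below is a minimal example and not part of any Tesseract tooling: the helper names `traineddata_url` and `download_traineddata` are hypothetical, and it assumes GitHub's `/raw/<branch>/<path>` URL form returns the file contents, whereas the `/blob/` link above opens an HTML page.

```python
import os
import urllib.request

# Repository named in the text above.
TESSDATA_REPO = "https://github.com/tesseract-ocr/tessdata"


def traineddata_url(lang, branch="master"):
    """Build the assumed raw-download URL for one language's traineddata."""
    return f"{TESSDATA_REPO}/raw/{branch}/{lang}.traineddata"


def download_traineddata(lang, dest_dir="."):
    """Download <lang>.traineddata into dest_dir and return the local path."""
    target = os.path.join(dest_dir, f"{lang}.traineddata")
    urllib.request.urlretrieve(traineddata_url(lang), target)
    return target
```

Calling `download_traineddata("hin", "tessdata")` would save `hin.traineddata` into an existing `tessdata` directory. The file must then be placed wherever the installed frontend looks for its language data.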
## 3.05
The [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05) can be used by those who want the bug fixes for the 3.05.01 release.
An installer for Tesseract 3.05 for Windows is available from [Tesseract at UB Mannheim](https://github.com/UB-Mannheim/tesseract/wiki). This includes the training tools.
## Current official release
The current official release is [4.1.0](https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.0).