mirror of
https://github.com/tesseract-ocr/tesseract.git
synced 2024-12-03 00:49:01 +08:00
Updated 4.0 with LSTM (markdown)
parent
585ae54e57
commit
10f69d90ba
@@ -1,10 +1,11 @@
-## 4.0
+## 4.0 +

-Tesseract 4.0 **rc** source code is available in the 'master' branch of the [repository](https://github.com/tesseract-ocr/tesseract). It adds a new OCR engine based on LSTM neural networks. It initially works (well) on x86/Linux. Model data for 101 languages is available in the [tessdata repository](https://github.com/tesseract-ocr/tessdata).
+Tesseract 4.0 **+** source code is available in the 'master' branch of the [repository](https://github.com/tesseract-ocr/tesseract). It adds a new OCR engine based on LSTM neural networks. It initially works (well) on x86/Linux. Model data for 101 languages is available in [tessdata](https://github.com/tesseract-ocr/tessdata), [tessdata_best](https://github.com/tesseract-ocr/tessdata_best), [tessdata_fast](https://github.com/tesseract-ocr/tessdata_fast) repositories.

 ## Documentation
 * [NeuralNetsInTesseract4.00](NeuralNetsInTesseract4.00)
 * [VGSLSpecs](https://github.com/tesseract-ocr/tesseract/wiki/VGSLSpecs)
+* [VGSLSpecs info from Tensorflow](https://github.com/mldbai/tensorflow-models/blob/master/street/g3doc/vgslspecs.md)
 * [DAS 2016 tutorial slides](https://github.com/tesseract-ocr/docs/tree/master/das_tutorial2016)
 Slides
 [#2](https://github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/2ArchitectureAndDataStructures.pdf),
@@ -18,11 +19,13 @@ have information about LSTM integration in Tesseract 4.0.

 * [TrainingTesseract 4.00](https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00)

-3.0 version of box files can be converted for use with LSTM training by adding a tab character at end of each line and boxes with space after each word. `Mark EOL` and `Mark EOL Bulk` functions under `Edit` in `Box Editor` tab of latest version of [jTessBoxEditor - jTessBoxEditor-2.0-Beta](https://sourceforge.net/projects/vietocr/files/jTessBoxEditor/) can be used to add the EOL tabs automatically. Insert mode can be used on last letter of each word to add a box with space. There is no automated way to do this.
-
-## 4.0.0-alpha ppa
-
-Unofficial Ubuntu PPAs for Tesseract 4.00 & Leptonica 1.74:
+* [tess4training - LSTM Training Tutorial for Tesseract 4](https://github.com/Shreeshrii/tess4training)
+
+* [tessttrain - formerly ocrd-train](https://github.com/tesseract-ocr/tesstrain)
+
+## 4.x ppa
+
+Ubuntu PPAs for Tesseract 4.x & Leptonica 1.7x:
 * https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr

 Leptonica 1.74.1 package for Debian: