Mirror of https://github.com/tesseract-ocr/tesseract.git, synced 2024-12-03 00:49:01 +08:00
Updated 4.0 with LSTM (markdown)
Parent: 585ae54e57
Commit: 10f69d90ba
@@ -1,10 +1,11 @@
## 4.0
## 4.0 +
Tesseract 4.0 **rc** source code is available in the 'master' branch of the [repository](https://github.com/tesseract-ocr/tesseract). It adds a new OCR engine based on LSTM neural networks. It initially works (well) on x86/Linux. Model data for 101 languages is available in the [tessdata repository](https://github.com/tesseract-ocr/tessdata).
Tesseract 4.0 **+** source code is available in the 'master' branch of the [repository](https://github.com/tesseract-ocr/tesseract). It adds a new OCR engine based on LSTM neural networks. It initially works (well) on x86/Linux. Model data for 101 languages is available in [tessdata](https://github.com/tesseract-ocr/tessdata), [tessdata_best](https://github.com/tesseract-ocr/tessdata_best), [tessdata_fast](https://github.com/tesseract-ocr/tessdata_fast) repositories.
## Documentation
* [NeuralNetsInTesseract4.00](NeuralNetsInTesseract4.00)
* [VGSLSpecs](https://github.com/tesseract-ocr/tesseract/wiki/VGSLSpecs)
* [VGSLSpecs info from TensorFlow](https://github.com/mldbai/tensorflow-models/blob/master/street/g3doc/vgslspecs.md)
* [DAS 2016 tutorial slides](https://github.com/tesseract-ocr/docs/tree/master/das_tutorial2016)
Slides
[#2](https://github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/2ArchitectureAndDataStructures.pdf),
@@ -18,11 +19,13 @@ have information about LSTM integration in Tesseract 4.0.
* [TrainingTesseract 4.00](https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00)
Tesseract 3.0-style box files can be converted for LSTM training by adding a box containing a tab character at the end of each text line and a box containing a space after each word. The `Mark EOL` and `Mark EOL Bulk` functions under `Edit` in the `Box Editor` tab of the latest version of [jTessBoxEditor - jTessBoxEditor-2.0-Beta](https://sourceforge.net/projects/vietocr/files/jTessBoxEditor/) add the EOL tab boxes automatically. Space boxes must be added by hand: use Insert mode on the last letter of each word to add a box containing a space. There is no automated way to do this in jTessBoxEditor.
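Once the glyphs have been grouped into words and text lines (information a 3.0 box file does not record, which is why the manual step above is needed), inserting the space and tab boxes is mechanical. The following is a minimal Python sketch under stated assumptions, not part of Tesseract or jTessBoxEditor: `Box`, `marker_after` and `to_lstm_box_lines` are hypothetical helpers; each entry is assumed to use the `<char> <left> <bottom> <right> <top> <page>` layout; a space box is placed between words and a tab box after the last word of each line; and the inserted boxes simply reuse the coordinates of the glyph they follow. Check what coordinates your training setup actually expects before relying on it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Hypothetical helper: one box-file entry (glyph, left, bottom, right, top, page)."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int = 0

    def to_line(self) -> str:
        # Assumed layout: "<char> <left> <bottom> <right> <top> <page>"
        return f"{self.char} {self.left} {self.bottom} {self.right} {self.top} {self.page}"


def marker_after(glyph: Box, marker: str) -> Box:
    # ASSUMPTION: the space/tab box reuses the coordinates of the glyph it follows.
    return Box(marker, glyph.left, glyph.bottom, glyph.right, glyph.top, glyph.page)


def to_lstm_box_lines(text_lines: List[List[List[Box]]]) -> List[str]:
    """Hypothetical helper: text lines -> words -> glyph boxes, grouping known in advance."""
    out: List[str] = []
    for words in text_lines:
        words = [w for w in words if w]  # ignore empty words
        if not words:
            continue  # nothing to mark on an empty line
        for i, word in enumerate(words):
            out.extend(glyph.to_line() for glyph in word)
            if i < len(words) - 1:
                # Space box between words (assumed: none right before the EOL tab).
                out.append(marker_after(word[-1], " ").to_line())
        # Tab box marks the end of the text line.
        out.append(marker_after(words[-1][-1], "\t").to_line())
    return out


if __name__ == "__main__":
    # One text line containing the two words "Hi" and "!".
    line = [
        [Box("H", 10, 10, 20, 30), Box("i", 22, 10, 26, 30)],
        [Box("!", 32, 10, 35, 30)],
    ]
    print("\n".join(to_lstm_box_lines([line])))
```

Keeping the word and line grouping as explicit input, rather than guessing it from glyph gaps, keeps the sketch honest about the one piece of information the old box file lacks.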
* [tess4training - LSTM Training Tutorial for Tesseract 4](https://github.com/Shreeshrii/tess4training)
## 4.0.0-alpha ppa
* [tesstrain - formerly ocrd-train](https://github.com/tesseract-ocr/tesstrain)
Unofficial Ubuntu PPAs for Tesseract 4.00 & Leptonica 1.74:
## 4.x ppa
Ubuntu PPAs for Tesseract 4.x & Leptonica 1.7x:
* https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr
Leptonica 1.74.1 package for Debian: