DAWG2WORDLIST(1)
================
:doctype: manpage

NAME
----
dawg2wordlist - convert a Tesseract DAWG to a wordlist

SYNOPSIS
--------
*dawg2wordlist* 'UNICHARSET' 'DAWG' 'WORDLIST'

DESCRIPTION
-----------
dawg2wordlist(1) converts a Tesseract Directed Acyclic Word
Graph (DAWG) to a list of words. The unicharset serves as the key
that translates the unichar IDs stored in the DAWG back into characters.

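Conceptually, the conversion walks every path through the word graph and looks up each edge's unichar ID in the unicharset. The Python sketch below illustrates only that idea. It uses a hypothetical in-memory graph and does not read Tesseract's binary DAWG or unicharset file formats. The names `Edge` and `dawg_to_words` and the toy data are assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple


class Edge(NamedTuple):
    """Hypothetical DAWG edge (not Tesseract's on-disk layout)."""
    unichar_id: int   # key into the unicharset
    next_node: int    # node reached by following this edge
    word_end: bool    # True if a word may end after this edge


def dawg_to_words(edges: Dict[int, List[Edge]],
                  unicharset: List[str],
                  root: int = 0) -> List[str]:
    """Enumerate every word in a toy DAWG by depth-first search.

    `edges` maps a node number to its outgoing edges; `unicharset`
    maps a unichar ID (list index) to its string.
    """
    words: List[str] = []

    def walk(node: int, prefix: str) -> None:
        for edge in edges.get(node, []):
            # Translate the edge's unichar ID into text via the unicharset.
            word = prefix + unicharset[edge.unichar_id]
            if edge.word_end:
                words.append(word)
            walk(edge.next_node, word)

    walk(root, "")
    return words


if __name__ == "__main__":
    # Hypothetical toy unicharset: list index = unichar ID (index 0 unused here).
    unicharset = ["", "c", "a", "t", "r"]
    # Graph for "cat" and "car"; the final node 3 is shared by both words.
    edges = {
        0: [Edge(1, 1, False)],                   # c
        1: [Edge(2, 2, False)],                   # a
        2: [Edge(3, 3, True), Edge(4, 3, True)],  # t or r, then shared node 3
    }
    # One word per line, like the WORDLIST output file.
    for word in dawg_to_words(edges, unicharset):
        print(word)
```

Run as a script, this prints `cat` and `car` on separate lines. Because the graph is acyclic, the walk needs no visited set. A shared node is revisited once for each prefix that reaches it, and that repetition is what expands the compressed graph back into a full word list.
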
OPTIONS
-------
'UNICHARSET'::
The unicharset of the language, as generated by mftraining(1).

'DAWG'::
The input DAWG, created by wordlist2dawg(1).

'WORDLIST'::
The output file: plain text in UTF-8, one word per line.

SEE ALSO
--------
tesseract(1), mftraining(1), wordlist2dawg(1), unicharset(5),
combine_tessdata(1)

<https://tesseract-ocr.github.io/tessdoc/Training-Tesseract.html>

COPYING
-------
Copyright \(C) 2012 Google, Inc.
Licensed under the Apache License, Version 2.0

AUTHOR
------
The Tesseract OCR engine was written by Ray Smith and his research groups
at Hewlett Packard (1985-1995) and Google (2006-2018).