WORDLIST2DAWG(1)
================
NAME
----
wordlist2dawg - convert a wordlist to a DAWG for Tesseract
SYNOPSIS
--------
*wordlist2dawg* 'WORDLIST' 'DAWG' 'lang.unicharset'
DESCRIPTION
-----------
wordlist2dawg(1) converts a wordlist to a Directed Acyclic
Word Graph (DAWG) for use with Tesseract.
Tesseract typically uses two wordlists per language: one containing
high-frequency words and one containing the rest. Run wordlist2dawg(1)
once for each list to produce a separate DAWG.
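
As a minimal sketch of preparing the two wordlists, the script below splits a
single list into a high-frequency part and a remainder. It assumes the input is
already ordered from most to least frequent; the file names
(`words_by_freq.txt`, `frequent_words.txt`, `other_words.txt`) and the cutoff
`top_n` are hypothetical and are not defined by wordlist2dawg(1).

```shell
#!/bin/sh
set -eu

# Hypothetical sample input: one UTF-8 word per line, most frequent first.
workdir=$(mktemp -d)
printf '%s\n' the of and tesseract dawg > "$workdir/words_by_freq.txt"

# Assumed cutoff: the first top_n words are treated as "high frequency".
top_n=2

# The first top_n lines become the high-frequency wordlist.
head -n "$top_n" "$workdir/words_by_freq.txt" > "$workdir/frequent_words.txt"

# Everything from line top_n + 1 onward becomes the second wordlist.
tail -n +"$((top_n + 1))" "$workdir/words_by_freq.txt" > "$workdir/other_words.txt"
```

Each resulting file can then be passed to wordlist2dawg(1) as 'WORDLIST'.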
OPTIONS
-------
'WORDLIST'
A plain text file encoded in UTF-8, with one word per line.
'DAWG'
The output DAWG file to write.
'lang.unicharset'
The unicharset file for the language, as generated by
mftraining(1).
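
A possible invocation, with the arguments in the order given in the SYNOPSIS,
might look like the sketch below. The file names (`frequent_words.txt`,
`eng.freq-dawg`, `eng.unicharset`) are assumptions chosen for illustration. The
script runs the command only if wordlist2dawg and both input files are present;
otherwise it just prints the command line.

```shell
#!/bin/sh
set -eu

# Hypothetical file names; substitute your own.
wordlist="frequent_words.txt"   # 'WORDLIST': UTF-8 text, one word per line
dawg="eng.freq-dawg"            # 'DAWG': output file to write
unicharset="eng.unicharset"     # 'lang.unicharset': from mftraining(1)

# Arguments in SYNOPSIS order: WORDLIST, DAWG, lang.unicharset.
cmd="wordlist2dawg $wordlist $dawg $unicharset"

if command -v wordlist2dawg >/dev/null 2>&1 \
    && [ -f "$wordlist" ] && [ -f "$unicharset" ]; then
    $cmd
else
    # Tool or inputs missing: show the command instead of running it.
    echo "would run: $cmd"
fi
```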
SEE ALSO
--------
tesseract(1), mftraining(1)
<http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3>
COPYING
-------
Copyright (c) 2006 Google, Inc.
Licensed under the Apache License, Version 2.0