SYNOPSIS
wordlist2dawg WORDLIST DAWG lang.unicharset
DESCRIPTION
wordlist2dawg(1) converts a wordlist to a Directed Acyclic Word Graph (DAWG) for use with Tesseract.
The words of a language are split across two wordlists: one containing the high-frequency words and one containing the rest. Each wordlist is converted to its own DAWG.
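The split into a high-frequency list and a remainder list might be scripted as in the minimal sketch below. The function name split_by_frequency, the (word, count) input format, and the min_count cutoff are all assumptions made for illustration; this page does not specify how the split is decided.

```python
from typing import Iterable, List, Tuple


def split_by_frequency(
    counts: Iterable[Tuple[str, int]], min_count: int
) -> Tuple[List[str], List[str]]:
    """Split (word, count) pairs into frequent words and the rest.

    min_count is a hypothetical cutoff chosen by the caller; it is not a
    value defined by Tesseract or by wordlist2dawg.
    """
    frequent: List[str] = []
    rest: List[str] = []
    for word, count in counts:
        # Words at or above the cutoff go to the high-frequency list.
        (frequent if count >= min_count else rest).append(word)
    return frequent, rest


# Illustrative counts only; real counts would come from a corpus.
frequent, rest = split_by_frequency(
    [("the", 500), ("of", 300), ("zebra", 2), ("quartz", 1)], min_count=100
)
```

Each resulting list would then be written to its own WORDLIST file and passed to wordlist2dawg separately.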
OPTIONS
WORDLIST A plain text file encoded in UTF-8, with one word per line.
DAWG The output DAWG file to write.
lang.unicharset The unicharset of the language, as generated by mftraining(1).
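As a hedged illustration of the three arguments, the sketch below writes a WORDLIST file in the expected form (UTF-8, one word per line) and assembles the argument list in the order shown in the SYNOPSIS. The helper names write_wordlist and build_command, the file names example.wordlist and example.dawg, and the choice to drop blank and duplicate entries are assumptions, not behaviour documented for wordlist2dawg. The command is only built here, not executed.

```python
import os
import tempfile
from typing import Iterable, List


def write_wordlist(words: Iterable[str], path: str) -> None:
    """Write words to path as UTF-8, one word per line.

    Blank entries and duplicates are skipped (an assumption of this sketch).
    """
    seen = set()
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        for word in words:
            word = word.strip()
            if word and word not in seen:
                seen.add(word)
                fh.write(word + "\n")


def build_command(wordlist: str, dawg: str, unicharset: str) -> List[str]:
    """Return the arguments in SYNOPSIS order: WORDLIST DAWG lang.unicharset."""
    return ["wordlist2dawg", wordlist, dawg, unicharset]


# Hypothetical file names used only for this example.
workdir = tempfile.mkdtemp()
wordlist_path = os.path.join(workdir, "example.wordlist")
write_wordlist(["naïve", "cat", "", "cat", "dog "], wordlist_path)
command = build_command(wordlist_path, "example.dawg", "lang.unicharset")
```

Passing command to subprocess.run(command, check=True) would invoke the tool, assuming wordlist2dawg is installed and lang.unicharset exists in the working directory.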
SEE ALSO
tesseract(1), mftraining(1)
COPYING
Copyright (c) 2006 Google, Inc. Licensed under the Apache License, Version 2.0.