SYNOPSIS

wordlist2dawg WORDLIST DAWG lang.unicharset

DESCRIPTION

wordlist2dawg(1) converts a wordlist to a Directed Acyclic Word Graph (DAWG) for use with Tesseract.

The wordlists for a language are split into two: one containing high-frequency words and one containing the rest. Each list is converted to its own DAWG.
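As a rough illustration of the split described above, the following Python sketch divides (word, count) pairs into a high-frequency list and a remainder list. The function name split_by_frequency, the min_count cutoff, and the input format are assumptions made for this example; they are not part of wordlist2dawg or Tesseract, and the real lists may be built by other means.

```python
from typing import Iterable, List, Tuple


def split_by_frequency(
    counts: Iterable[Tuple[str, int]], min_count: int
) -> Tuple[List[str], List[str]]:
    """Split (word, count) pairs into (frequent, rest) word lists.

    min_count is a hypothetical cutoff chosen by the caller.
    """
    frequent: List[str] = []
    rest: List[str] = []
    for word, count in counts:
        # Words seen at least min_count times go to the frequent list;
        # everything else goes to the remainder list.
        if count >= min_count:
            frequent.append(word)
        else:
            rest.append(word)
    return frequent, rest
```

Each returned list would then be written to its own WORDLIST file and converted separately.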

OPTIONS

WORDLIST A plain-text file encoded in UTF-8, with one word per line.

DAWG The output DAWG file to write.

lang.unicharset The unicharset of the language, as generated by mftraining(1).
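The three arguments above can be prepared from a script. The following Python sketch formats a word list in the expected shape (one word per line) and builds the argument vector in the order given in the SYNOPSIS. The helper names format_wordlist and build_command, and any file names passed to them, are hypothetical and chosen only for this example.

```python
from typing import Iterable, List


def format_wordlist(words: Iterable[str]) -> str:
    """Return the words as text with one word per line.

    Surrounding whitespace is stripped and empty entries are skipped.
    The caller is expected to write the result to a file as UTF-8.
    """
    lines = []
    for word in words:
        word = word.strip()
        if word:
            lines.append(word + "\n")
    return "".join(lines)


def build_command(wordlist: str, dawg: str, unicharset: str) -> List[str]:
    """Build the argument vector: wordlist2dawg WORDLIST DAWG lang.unicharset."""
    return ["wordlist2dawg", wordlist, dawg, unicharset]
```

After writing the formatted text to the WORDLIST file with encoding="utf-8", the list returned by build_command could be passed to subprocess.run(cmd, check=True), assuming wordlist2dawg is installed and on the PATH.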

SEE ALSO

tesseract(1), mftraining(1)

COPYING

Copyright (c) 2006 Google, Inc. Licensed under the Apache License, Version 2.0.