WORDLIST2DAWG(1)
================

NAME
----
wordlist2dawg - convert a wordlist to a DAWG for Tesseract

SYNOPSIS
--------
*wordlist2dawg* 'WORDLIST' 'DAWG' 'lang.unicharset'

DESCRIPTION
-----------
wordlist2dawg(1) converts a wordlist to a Directed Acyclic Word Graph
(DAWG) for use with Tesseract. The wordlists are split into two: one
with high-frequency words, and one with the rest.

OPTIONS
-------
'WORDLIST'
	A plain text file in UTF-8, one word per line.

'DAWG'
	The output DAWG to write.

'lang.unicharset'
	The unicharset of the language. This is the unicharset
	generated by mftraining(1).

SEE ALSO
--------
tesseract(1), mftraining(1)

COPYING
-------
Copyright (c) 2006 Google, Inc.
Licensed under the Apache License, Version 2.0
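
EXAMPLE
-------
The following is a minimal sketch, not part of the tool itself, of how a
'WORDLIST' file in the format described under OPTIONS (plain UTF-8 text, one
word per line) might be prepared, and how the three positional arguments from
the SYNOPSIS might be assembled. The helper names `write_wordlist` and
`build_command`, and the file names `eng.wordlist.txt`, `eng.word-dawg` and
`eng.unicharset`, are hypothetical and chosen only for illustration. Trimming
whitespace and skipping blank lines is an assumption made for tidiness, not a
documented requirement. The sketch only builds the command line and does not
run it.

```python
import tempfile
from pathlib import Path


def write_wordlist(words, path):
    """Write words to path as UTF-8 text, one word per line.

    Surrounding whitespace is trimmed and empty entries are skipped
    (an assumption for tidiness, not a documented requirement).
    """
    cleaned = [w.strip() for w in words if w.strip()]
    Path(path).write_text("".join(w + "\n" for w in cleaned), encoding="utf-8")
    return cleaned


def build_command(wordlist, dawg, unicharset):
    """Return the argument vector in SYNOPSIS order:
    WORDLIST, DAWG, lang.unicharset."""
    return ["wordlist2dawg", str(wordlist), str(dawg), str(unicharset)]


# Hypothetical file names inside a scratch directory.
workdir = Path(tempfile.mkdtemp())
wordlist_path = workdir / "eng.wordlist.txt"

words = write_wordlist(["the", " of ", "", "naïve"], wordlist_path)
cmd = build_command(wordlist_path, workdir / "eng.word-dawg",
                    workdir / "eng.unicharset")
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a
system where wordlist2dawg is installed and the unicharset file produced by
mftraining(1) already exists.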