WORDLIST2DAWG(1)
================
NAME
----
wordlist2dawg - convert a wordlist to a DAWG for Tesseract
SYNOPSIS
--------
*wordlist2dawg* 'WORDLIST' 'DAWG' 'lang.unicharset'
DESCRIPTION
-----------
wordlist2dawg(1) converts a wordlist to a Directed Acyclic
Word Graph (DAWG) for use with Tesseract.
Tesseract uses two wordlists per language: one containing
high-frequency words, and one containing the rest. Each list is
converted to its own DAWG by a separate run of wordlist2dawg(1).
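
How a word collection is divided into a high-frequency list and a remainder is not specified here. The following is a minimal Python sketch of one possible approach, assuming the split is made by raw occurrence count with a fixed cutoff. The function names and the `top_n` parameter are hypothetical and are not part of Tesseract.

```python
from collections import Counter


def split_wordlist(words, top_n):
    """Return (frequent, rest): the top_n most common distinct words,
    followed by all remaining distinct words."""
    counts = Counter(words)
    # most_common() sorts by descending count; words with equal counts
    # keep the order in which they were first seen.
    ranked = [word for word, _ in counts.most_common()]
    return ranked[:top_n], ranked[top_n:]


def write_wordlist(path, words):
    """Write words as UTF-8 plain text, one word per line."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for word in words:
            handle.write(word + "\n")
```

Each of the two resulting lists can be written with `write_wordlist` and then passed to wordlist2dawg(1) as its own 'WORDLIST' argument.
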
OPTIONS
-------
'WORDLIST'
A plain-text file encoded in UTF-8, with one word per line.
'DAWG'
The output DAWG file to write.
'lang.unicharset'
The unicharset of the language, as generated by mftraining(1).
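
The three arguments are positional and must appear in the order given in the SYNOPSIS. The following Python sketch shows one way a script might assemble and run the command. The helper names are hypothetical, and the sketch assumes that wordlist2dawg is installed and on the PATH.

```python
import shutil
import subprocess


def build_command(wordlist, dawg, unicharset):
    """Return the argument vector in SYNOPSIS order:
    WORDLIST, DAWG, lang.unicharset."""
    return ["wordlist2dawg", wordlist, dawg, unicharset]


def convert(wordlist, dawg, unicharset):
    """Run wordlist2dawg, raising an error if it is missing or fails."""
    command = build_command(wordlist, dawg, unicharset)
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("wordlist2dawg not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(command, check=True)
```

A call such as `convert("words.txt", "lang.word-dawg", "lang.unicharset")` uses placeholder file names chosen for illustration only.
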
SEE ALSO
--------
tesseract(1), mftraining(1)
<http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3>
COPYING
-------
Copyright (c) 2006 Google, Inc.
Licensed under the Apache License, Version 2.0