WORDLIST2DAWG(1)
================

NAME
----
wordlist2dawg - convert a wordlist to a DAWG for Tesseract

SYNOPSIS
--------
*wordlist2dawg* 'WORDLIST' 'DAWG' 'lang.unicharset'

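When the tool is called from a build script, the three positional arguments in the synopsis map directly onto an argument vector. The Python sketch below is a hypothetical helper, not part of Tesseract. It assumes the `wordlist2dawg` binary is on `PATH`, and it only checks that the two input files exist before running the tool. The function names are illustrative.

```python
import subprocess
from pathlib import Path


def wordlist2dawg_argv(wordlist: str, dawg: str, unicharset: str) -> list:
    """Argument vector for: wordlist2dawg WORDLIST DAWG lang.unicharset"""
    return ["wordlist2dawg", wordlist, dawg, unicharset]


def run_wordlist2dawg(wordlist: str, dawg: str, unicharset: str) -> None:
    """Hypothetical helper: check that the inputs exist, then run the tool.

    Assumes wordlist2dawg is on PATH. check=True makes subprocess.run
    raise CalledProcessError if the process exits with a non-zero status.
    """
    for path in (wordlist, unicharset):
        if not Path(path).is_file():
            raise FileNotFoundError(path)
    subprocess.run(wordlist2dawg_argv(wordlist, dawg, unicharset), check=True)
```

For example, `run_wordlist2dawg("words.txt", "eng.word-dawg", "eng.unicharset")` runs the tool once. The file names in this call are examples only. Checking the inputs in Python gives a clearer error than a failed subprocess call would.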
DESCRIPTION
-----------
wordlist2dawg(1) converts a wordlist to a Directed Acyclic
Word Graph (DAWG) for use with Tesseract.

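A DAWG behaves like a trie in which identical suffix subtrees are stored only once, so words with common endings share nodes. The Python sketch below builds such a minimal graph using the incremental sorted-input construction of Daciuk et al. (2000). It is an illustration only. It does not reproduce Tesseract's own construction code or its on-disk DAWG format, and `Node`, `build_dawg`, `contains` and `count_nodes` are hypothetical names.

```python
# Illustrative only: not Tesseract's DAWG code or on-disk format.


class Node:
    __slots__ = ("final", "edges")

    def __init__(self):
        self.final = False   # True if a word ends at this node
        self.edges = {}      # character -> Node

    def signature(self):
        # Equivalent nodes agree on finality and point, via the same labels,
        # to the same already-minimized children.
        return (self.final,
                tuple(sorted((ch, id(child)) for ch, child in self.edges.items())))


def build_dawg(words):
    """Build a minimal DAWG from `words` (sorted and deduplicated first)."""
    root = Node()
    register = {}    # signature -> canonical (minimized) node
    unchecked = []   # (parent, char, child) edges on the last word's path
    previous = ""

    def minimize(down_to):
        # Replace unchecked nodes, deepest first, by an equivalent registered node.
        while len(unchecked) > down_to:
            parent, ch, child = unchecked.pop()
            sig = child.signature()
            if sig in register:
                parent.edges[ch] = register[sig]
            else:
                register[sig] = child

    for word in sorted(set(words)):
        common = 0  # length of the prefix shared with the previous word
        while (common < min(len(word), len(previous))
               and word[common] == previous[common]):
            common += 1
        minimize(common)
        node = unchecked[-1][2] if unchecked else root
        for ch in word[common:]:
            child = Node()
            node.edges[ch] = child
            unchecked.append((node, ch, child))
            node = child
        node.final = True
        previous = word
    minimize(0)
    return root


def contains(root, word):
    """Follow one edge per character; the word is present if we end on a final node."""
    node = root
    for ch in word:
        node = node.edges.get(ch)
        if node is None:
            return False
    return node.final


def count_nodes(root):
    """Count distinct nodes reachable from the root."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.edges.values())
    return len(seen)
```

For the words "tap", "taps", "top" and "tops", the result has 5 nodes. A plain trie needs 8, but here the "ap"/"aps" and "op"/"ops" endings are shared.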
Tesseract uses two wordlists: one containing high-frequency
words and one containing the rest. Each list is converted to
its own DAWG by a separate run of wordlist2dawg.

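One possible way to produce the two lists assumes that per-word occurrence counts are available. The sketch below puts the N most common words in one list and all remaining words in the other. It then writes each list in the UTF-8, one-word-per-line format that 'WORDLIST' expects. The selection policy, the cutoff `n_frequent`, and the helpers `split_wordlist` and `write_wordlist` are hypothetical.

```python
from pathlib import Path


def split_wordlist(counts, n_frequent):
    """Split a word -> count mapping into (frequent, rest).

    Hypothetical policy: the n_frequent most common words are "frequent"
    and everything else is "rest", so the two lists never overlap. Ties
    are broken alphabetically so the split is deterministic.
    """
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    frequent = sorted(word for word, _ in ranked[:n_frequent])
    rest = sorted(word for word, _ in ranked[n_frequent:])
    return frequent, rest


def write_wordlist(path, words):
    """Write a WORDLIST file: plain text, UTF-8, one word per line."""
    Path(path).write_text("".join(word + "\n" for word in words), encoding="utf-8")
```

This sketch sorts its output to keep the files reproducible; sorting is not a documented requirement of wordlist2dawg. Each of the two files is then converted by its own wordlist2dawg run.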
OPTIONS
-------
'WORDLIST'::
	A plain text file in UTF-8, one word per line.

'DAWG'::
	The output DAWG file to write.

'lang.unicharset'::
	The unicharset of the language, as generated by
	mftraining(1).

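Before running the tool, it can help to confirm two things: that the 'WORDLIST' file is valid UTF-8, and that its words use only characters present in 'lang.unicharset'. The sketch below assumes the unicharset text layout used by Tesseract 3, with an entry count on the first line followed by one entry per line that begins with the unichar. Treat that layout, and the helper names `read_unicharset` and `check_wordlist`, as assumptions. The check compares single code points, so unicharset entries spanning several code points (for example, ligatures) are not matched exactly.

```python
def read_unicharset(path):
    """Return the set of unichars listed in a unicharset file.

    ASSUMPTION: line 1 holds the entry count, and every later line starts
    with the unichar, followed by whitespace-separated properties.
    """
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    count = int(lines[0])
    return {line.split()[0] for line in lines[1:count + 1] if line.strip()}


def check_wordlist(path, unichars):
    """Check a WORDLIST file (UTF-8, one word per line).

    Returns (line_number, word, missing_chars) for each word that uses
    characters not in `unichars`. Opening with encoding="utf-8" makes
    reading raise UnicodeDecodeError if the file is not valid UTF-8.
    """
    problems = []
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            word = line.rstrip("\n")
            missing = sorted(set(word) - unichars)
            if missing:
                problems.append((number, word, missing))
    return problems
```

An empty result means that every character in the wordlist appears as a single-code-point entry in the unicharset.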
SEE ALSO
--------
tesseract(1), mftraining(1)

<http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3>

COPYING
-------
Copyright (c) 2006 Google, Inc.
Licensed under the Apache License, Version 2.0