WORDLIST2DAWG(1)
================

NAME
----
wordlist2dawg - convert a wordlist to a DAWG for Tesseract

SYNOPSIS
--------
*wordlist2dawg* 'WORDLIST' 'DAWG' 'lang.unicharset'

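All three arguments are positional, so their order matters: the input word list comes first, then the output DAWG, then the unicharset. A minimal sketch of calling the tool from a Python build script follows. It assumes `wordlist2dawg` is installed on `PATH`. The helper names `wordlist2dawg_argv` and `run_wordlist2dawg` are hypothetical and not part of Tesseract.

```python
import subprocess
from typing import List


def wordlist2dawg_argv(wordlist: str, dawg: str, unicharset: str) -> List[str]:
    """Hypothetical helper: argument vector in SYNOPSIS order
    (input word list, output DAWG, language unicharset)."""
    return ["wordlist2dawg", wordlist, dawg, unicharset]


def run_wordlist2dawg(wordlist: str, dawg: str, unicharset: str) -> None:
    """Hypothetical helper: run the tool, assuming it is on PATH.

    check=True raises subprocess.CalledProcessError on a non-zero exit status."""
    subprocess.run(wordlist2dawg_argv(wordlist, dawg, unicharset), check=True)
```

Keeping the argument builder separate from the call lets a script check the argument order without running the tool.
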
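DESCRIPTION
-----------
wordlist2dawg(1) converts a wordlist to a Directed Acyclic
Word Graph (DAWG) for use with Tesseract. A DAWG stores the
words as a graph in which common prefixes and suffixes can be
shared, giving a compact and fast-to-search form of the list.
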
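The following Python sketch builds a DAWG from an in-memory list to show what a "directed acyclic word graph" is. It first inserts every word into a trie. It then merges nodes whose remaining suffixes are identical, so endings shared by many words are stored only once. This illustrates the data structure only. It is not Tesseract's implementation or its on-disk DAWG format: Tesseract labels edges with unichar IDs from 'lang.unicharset' rather than Python characters. The names `Node`, `build_dawg`, `contains` and `count_nodes` are hypothetical.

```python
from typing import Dict, List, Tuple


class Node:
    """One DAWG state: labelled outgoing edges plus an end-of-word flag."""

    def __init__(self) -> None:
        self.edges: Dict[str, "Node"] = {}
        self.final = False


def build_dawg(words: List[str]) -> Node:
    """Insert every word into a trie, then merge equivalent subtrees bottom-up."""
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.edges.setdefault(ch, Node())
        node.final = True

    # Maps a node's signature (final flag + edges to canonical children)
    # to the single canonical node kept for that signature.
    registry: Dict[Tuple, Node] = {}

    def minimize(node: Node) -> Node:
        # Canonicalize children first, so equal suffixes become the same object.
        for ch, child in list(node.edges.items()):
            node.edges[ch] = minimize(child)
        signature = (node.final,
                     tuple(sorted((ch, id(c)) for ch, c in node.edges.items())))
        # Reuse an existing equivalent node if there is one, else register this one.
        return registry.setdefault(signature, node)

    return minimize(root)


def contains(root: Node, word: str) -> bool:
    """Follow one edge per character; the word is present if we stop on a final node."""
    node = root
    for ch in word:
        if ch not in node.edges:
            return False
        node = node.edges[ch]
    return node.final


def count_nodes(root: Node) -> int:
    """Count distinct nodes reachable from root (shared nodes counted once)."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.edges.values())
    return len(seen)
```

For the words tap, taps, top and tops, the trie needs eight nodes. The merged graph needs only five, because the paths after "ta" and "to" lead into the same shared "p", "s" nodes.
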
The word list is split into two files, one with
high-frequency words and one with the rest, and
wordlist2dawg is run separately on each file to build
its own DAWG.

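This page does not say how the split is chosen. The sketch below produces the two input files under two assumptions: the words come from a tokenized text corpus, and "high frequency" means the `top_n` most common distinct words. Both that criterion and the helper names `split_by_frequency` and `write_wordlist` are hypothetical.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, List, Tuple


def split_by_frequency(words: Iterable[str], top_n: int) -> Tuple[List[str], List[str]]:
    """Return (frequent, rest): the top_n most common distinct words, then the others.

    top_n is a hypothetical cut-off; ties are broken alphabetically so the
    split is deterministic."""
    counts = Counter(words)
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:top_n], ranked[top_n:]


def write_wordlist(path: Path, words: Iterable[str]) -> None:
    """Write a WORDLIST file: UTF-8, one word per line.

    Writing bytes keeps the line ending as '\\n' on every platform."""
    path.write_bytes("".join(word + "\n" for word in words).encode("utf-8"))
```

Each resulting file is then passed to its own wordlist2dawg run. In the Tesseract 3 training workflow, the two outputs are typically named 'lang.freq-dawg' and 'lang.word-dawg'.
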
OPTIONS
-------
'WORDLIST'
A plain text file in UTF-8, one word per line.

'DAWG'
The output DAWG file to write.

'lang.unicharset'
The unicharset of the language, as generated by
mftraining(1).

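It can help to confirm that a 'WORDLIST' file matches the format described above before running wordlist2dawg. The sketch below is a hypothetical pre-flight check and is not part of Tesseract. It reports bytes that are not valid UTF-8, empty lines, and lines containing more than one whitespace-separated word. This page does not document how wordlist2dawg itself treats such lines, so these checks are conservative assumptions. The name `check_wordlist` is hypothetical.

```python
from typing import List, Tuple


def check_wordlist(data: bytes) -> Tuple[List[str], List[str]]:
    """Hypothetical checker: return (words, problems) for the raw bytes of a
    WORDLIST file, flagging invalid UTF-8, empty lines and multi-word lines."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that failed to decode.
        return [], [f"not valid UTF-8 at byte {err.start}"]
    words: List[str] = []
    problems: List[str] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            problems.append(f"line {lineno}: empty")
        elif len(stripped.split()) > 1:
            problems.append(f"line {lineno}: more than one word")
        else:
            words.append(stripped)
    return words, problems
```

A script might call it as `check_wordlist(Path("words_list").read_bytes())`, where the file name is illustrative. It would then stop if the problem list is non-empty.
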
SEE ALSO
--------
tesseract(1), mftraining(1)

<http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3>

COPYING
-------
Copyright (c) 2006 Google, Inc.
Licensed under the Apache License, Version 2.0.