<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<?asciidoc-toc?>
<?asciidoc-numbered?>
<refentry lang="en">
<refentryinfo>
    <title>DAWG2WORDLIST(1)</title>
</refentryinfo>
<refmeta>
<refentrytitle>dawg2wordlist</refentrytitle>
<manvolnum>1</manvolnum>
<refmiscinfo class="source">&#160;</refmiscinfo>
<refmiscinfo class="manual">&#160;</refmiscinfo>
</refmeta>
<refnamediv>
    <refname>dawg2wordlist</refname>
    <refpurpose>convert a Tesseract DAWG to a wordlist</refpurpose>
</refnamediv>
<refsynopsisdiv id="_synopsis">
<simpara><emphasis role="strong">dawg2wordlist</emphasis> <emphasis>UNICHARSET</emphasis> <emphasis>DAWG</emphasis> <emphasis>WORDLIST</emphasis></simpara>
</refsynopsisdiv>
<refsect1 id="_description">
<title>DESCRIPTION</title>
<simpara>dawg2wordlist(1) converts a Tesseract Directed Acyclic Word
Graph (DAWG) to a list of words, using a unicharset to map the unichar
IDs stored in the DAWG back to characters.</simpara>
</refsect1>
<refsect1 id="_options">
<title>OPTIONS</title>
<simpara><emphasis>UNICHARSET</emphasis>
        The unicharset of the language. This is the unicharset
        generated by mftraining(1).</simpara>
<simpara><emphasis>DAWG</emphasis>
        The input DAWG, created by wordlist2dawg(1).</simpara>
<simpara><emphasis>WORDLIST</emphasis>
        The output file: plain text in UTF-8, one word per line.</simpara>
</refsect1>
<refsect1 id="_see_also">
<title>SEE ALSO</title>
<simpara>tesseract(1), mftraining(1), wordlist2dawg(1), unicharset(5),
combine_tessdata(1)</simpara>
<simpara><ulink url="https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract">https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract</ulink></simpara>
</refsect1>
<refsect1 id="_copying">
<title>COPYING</title>
<simpara>Copyright (C) 2012 Google, Inc.
Licensed under the Apache License, Version 2.0.</simpara>
</refsect1>
<refsect1 id="_author">
<title>AUTHOR</title>
<simpara>The Tesseract OCR engine was written by Ray Smith and his research groups
at Hewlett Packard (1985-1995) and Google (2006-present).</simpara>
</refsect1>
</refentry>