<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<?asciidoc-toc?>
<?asciidoc-numbered?>
<refentry lang= "en" >
<refmeta >
<refentrytitle > mftraining</refentrytitle>
<manvolnum > 1</manvolnum>
<refmiscinfo class= "source" > </refmiscinfo>
<refmiscinfo class= "manual" > </refmiscinfo>
</refmeta>
<refnamediv >
<refname > mftraining</refname>
<refpurpose > feature training for Tesseract</refpurpose>
</refnamediv>
<refsynopsisdiv id= "_synopsis" >
<simpara > mftraining -U <emphasis > unicharset</emphasis> -O <emphasis > lang.unicharset</emphasis> <emphasis > FILE</emphasis> … </simpara>
</refsynopsisdiv>
<refsect1 id= "_description" >
<title > DESCRIPTION</title>
<simpara > mftraining takes a list of .tr files, from which it generates the
files <emphasis role= "strong" > inttemp</emphasis> (the shape prototypes), <emphasis role= "strong" > shapetable</emphasis>, and <emphasis role= "strong" > pffmtable</emphasis>
(the number of expected features for each character). (A fourth file
called Microfeat is also written by this program, but it is not used.)</simpara>
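<simpara>A typical invocation might look like the sketch below. The option letters are those documented on this page; the file names <emphasis>font_properties</emphasis>, <emphasis>lang.fontname.exp0.tr</emphasis> and <emphasis>lang.fontname.exp1.tr</emphasis> are placeholders assumed for illustration.</simpara>
<literallayout class="monospaced">mftraining -F font_properties -U unicharset -O lang.unicharset \
    lang.fontname.exp0.tr lang.fontname.exp1.tr</literallayout>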
</refsect1>
<refsect1 id= "_options" >
<title > OPTIONS</title>
<variablelist >
<varlistentry >
<term >
-U <emphasis > FILE</emphasis>
</term>
<listitem >
<simpara >
(Input) The unicharset generated by unicharset_extractor(1).
</simpara>
</listitem>
</varlistentry>
<varlistentry >
<term >
-F <emphasis > font_properties_file</emphasis>
</term>
<listitem >
<simpara >
(Input) The font properties file. Each line has the following form, where every field other than the font name is 0 or 1:
</simpara>
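<literallayout class= "monospaced" ><emphasis>font_name</emphasis> <emphasis>italic</emphasis> <emphasis>bold</emphasis> <emphasis>fixed_pitch</emphasis> <emphasis>serif</emphasis> <emphasis>fraktur</emphasis></literallayout>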
</listitem>
</varlistentry>
<varlistentry >
<term >
-X <emphasis > xheights_file</emphasis>
</term>
<listitem >
<simpara >
(Input) The x-heights file. Each line has the following form, where xheight is the x-height in pixels of a character drawn at 32pt at 300 dpi. [That is, if base x-height + ascenders + descenders = 133 pixels, how much of that is the x-height?]
</simpara>
<literallayout class= "monospaced" ><emphasis>font_name</emphasis> <emphasis>xheight</emphasis></literallayout>
</listitem>
</varlistentry>
<varlistentry >
<term >
-D <emphasis > dir</emphasis>
</term>
<listitem >
<simpara >
Directory to write output files to.
</simpara>
</listitem>
</varlistentry>
<varlistentry >
<term >
-O <emphasis > FILE</emphasis>
</term>
<listitem >
<simpara >
(Output) The output unicharset that will be given to combine_tessdata(1).
</simpara>
</listitem>
</varlistentry>
</variablelist>
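<simpara>As a rough illustration of the -F and -X inputs described above, the following shell sketch checks one font properties line against the stated format and computes the pixel height of 32pt text at 300 dpi. The sample line and the variable names are assumptions chosen for illustration, not values shipped with Tesseract.</simpara>
<literallayout class="monospaced"><![CDATA[# Illustrative font properties line: an italic serif font.
sample='timesitalic 1 0 0 1 0'

# Accept the line only if it has six fields and fields 2-6 are each 0 or 1.
if printf '%s\n' "$sample" | awk '
    NF != 6 { exit 1 }
    { for (i = 2; i <= NF; i++) if ($i !~ /^[01]$/) exit 1 }'
then
    props_ok=yes
else
    props_ok=no
fi

# 1pt is 1/72 inch, so 32pt at 300 dpi spans 32 * 300 / 72 pixels.
em_px=$(( 32 * 300 / 72 ))    # integer division of 9600 by 72

echo "font properties line valid: $props_ok"
echo "32pt at 300 dpi: $em_px pixels"]]></literallayout>
<simpara>The computed height is the 133 mentioned under -X; the xheight value recorded for a font is the share of that height taken up by the x-height alone, which depends on the font.</simpara>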
</refsect1>
<refsect1 id= "_see_also" >
<title > SEE ALSO</title>
<simpara > tesseract(1), cntraining(1), unicharset_extractor(1), combine_tessdata(1),
shapeclustering(1), unicharset(5)</simpara>
<simpara > <ulink url= "http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3" > http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3</ulink> </simpara>
</refsect1>
<refsect1 id= "_copying" >
<title > COPYING</title>
2012-02-10 06:55:47 +08:00
<simpara > Copyright (C) Hewlett-Packard Company, 1988
Licensed under the Apache License, Version 2.0</simpara>
</refsect1>
<refsect1 id= "_author" >
<title > AUTHOR</title>
<simpara > The Tesseract OCR engine was written by Ray Smith and his research groups
at Hewlett Packard (1985-1995) and Google (2006-present).</simpara>
</refsect1>
</refentry>