2010-09-28 10:11:19 +08:00
|
|
|
<?xml version="1.0" encoding="UTF-8"?>
|
|
|
|
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
|
2010-09-30 00:46:04 +08:00
|
|
|
<?asciidoc-toc?>
|
|
|
|
<?asciidoc-numbered?>
|
2010-09-28 10:11:19 +08:00
|
|
|
<refentry lang="en">
|
|
|
|
<refmeta>
|
|
|
|
<refentrytitle>combine_tessdata</refentrytitle>
|
|
|
|
<manvolnum>1</manvolnum>
|
2010-09-30 00:46:04 +08:00
|
|
|
<refmiscinfo class="source"> </refmiscinfo>
|
|
|
|
<refmiscinfo class="manual"> </refmiscinfo>
|
2010-09-28 10:11:19 +08:00
|
|
|
</refmeta>
|
|
|
|
<refnamediv>
|
|
|
|
<refname>combine_tessdata</refname>
|
|
|
|
<refpurpose>combine/extract/overwrite Tesseract data</refpurpose>
|
|
|
|
</refnamediv>
|
2010-09-30 00:46:04 +08:00
|
|
|
<refsynopsisdiv id="_synopsis">
|
2010-09-28 10:11:19 +08:00
|
|
|
<simpara><emphasis role="strong">combine_tessdata</emphasis> [<emphasis>OPTION</emphasis>] <emphasis>FILE</emphasis>…</simpara>
|
|
|
|
</refsynopsisdiv>
|
2010-09-30 00:46:04 +08:00
|
|
|
<refsect1 id="_description">
|
2010-09-28 10:11:19 +08:00
|
|
|
<title>DESCRIPTION</title>
|
|
|
|
<simpara>combine_tessdata(1) is the main program to combine/extract/overwrite
|
|
|
|
tessdata components in [lang].traineddata files.</simpara>
|
|
|
|
<simpara>To combine all the individual tessdata components (unicharset, DAWGs,
|
|
|
|
classifier templates, ambiguities, language configs) located at, say,
|
|
|
|
/home/$USER/temp/eng.* run:</simpara>
|
|
|
|
<literallayout class="monospaced">combine_tessdata /home/$USER/temp/eng.</literallayout>
|
|
|
|
<simpara>The result will be a combined tessdata file /home/$USER/temp/eng.traineddata</simpara>
|
|
|
|
<simpara>Specify option -e if you would like to extract individual components
|
|
|
|
from a combined traineddata file. For example, to extract language config
|
|
|
|
file and the unicharset from tessdata/eng.traineddata run:</simpara>
|
|
|
|
<literallayout class="monospaced">combine_tessdata -e tessdata/eng.traineddata
|
|
|
|
/home/$USER/temp/eng.config /home/$USER/temp/eng.unicharset</literallayout>
|
|
|
|
<simpara>The desired config file and unicharset will be written to
|
|
|
|
/home/$USER/temp/eng.config /home/$USER/temp/eng.unicharset</simpara>
|
|
|
|
<simpara>Specify option -o to overwrite individual components of the given
|
|
|
|
[lang].traineddata file. For example, to overwrite language config
|
|
|
|
and unichar ambiguities files in tessdata/eng.traineddata use:</simpara>
|
|
|
|
<literallayout class="monospaced">combine_tessdata -o tessdata/eng.traineddata
|
|
|
|
/home/$USER/temp/eng.config /home/$USER/temp/eng.unicharambigs</literallayout>
|
|
|
|
<simpara>As a result, tessdata/eng.traineddata will contain the new language config
|
|
|
|
and unichar ambigs, plus all the original DAWGs, classifier templates, etc.</simpara>
|
|
|
|
<simpara>Note: the file names of the files to extract to and to overwrite from should
|
|
|
|
have the appropriate file suffixes (extensions) indicating their tessdata
|
|
|
|
component type (.unicharset for the unicharset, .unicharambigs for unichar
|
|
|
|
ambigs, etc). See k*FileSuffix variable in ccutil/tessdatamanager.h.</simpara>
|
2010-09-30 00:46:04 +08:00
|
|
|
<simpara>Specify option -u to unpack all the components to the specified path:</simpara>
|
2010-09-28 10:11:19 +08:00
|
|
|
<literallayout class="monospaced">combine_tessdata -u tessdata/eng.traineddata /home/$USER/temp/eng.</literallayout>
|
|
|
|
<simpara>This will create /home/$USER/temp/eng.* files with individual tessdata
|
|
|
|
components from tessdata/eng.traineddata.</simpara>
|
|
|
|
</refsect1>
|
2010-09-30 00:46:04 +08:00
|
|
|
<refsect1 id="_options">
|
2010-09-28 10:11:19 +08:00
|
|
|
<title>OPTIONS</title>
|
2010-09-30 00:46:04 +08:00
|
|
|
<simpara><emphasis role="strong">-e</emphasis> <emphasis>.traineddata</emphasis> <emphasis>FILE</emphasis>…:
|
|
|
|
Extracts the specified components from the .traineddata file</simpara>
|
|
|
|
<simpara><emphasis role="strong">-o</emphasis> <emphasis>.traineddata</emphasis> <emphasis>FILE</emphasis>…:
|
|
|
|
Overwrites the specified components of the .traineddata file
|
|
|
|
with those provided on the comand line.</simpara>
|
2010-09-30 18:13:09 +08:00
|
|
|
<simpara><emphasis role="strong">-u</emphasis> <emphasis>.traineddata</emphasis> <emphasis>PATHPREFIX</emphasis>
|
2010-09-30 00:46:04 +08:00
|
|
|
Unpacks the .traineddata using the provided prefix.</simpara>
|
2010-09-28 10:11:19 +08:00
|
|
|
</refsect1>
|
2010-09-30 00:46:04 +08:00
|
|
|
<refsect1 id="_caveats">
|
2010-09-28 10:11:19 +08:00
|
|
|
<title>CAVEATS</title>
|
|
|
|
<simpara><emphasis>Prefix</emphasis> refers to the full file prefix, including period (.)</simpara>
|
|
|
|
</refsect1>
|
2010-09-30 00:46:04 +08:00
|
|
|
<refsect1 id="_history">
|
2010-09-28 10:11:19 +08:00
|
|
|
<title>HISTORY</title>
|
|
|
|
<simpara>combine_tessdata(1) first appeared in version 3.00 of Tesseract</simpara>
|
|
|
|
</refsect1>
|
2010-09-30 00:46:04 +08:00
|
|
|
<refsect1 id="_see_also">
|
2010-09-28 10:11:19 +08:00
|
|
|
<title>SEE ALSO</title>
|
|
|
|
<simpara>tesseract(1)</simpara>
|
|
|
|
</refsect1>
|
2010-09-30 00:46:04 +08:00
|
|
|
<refsect1 id="_copying">
|
2010-09-28 10:11:19 +08:00
|
|
|
<title>COPYING</title>
|
|
|
|
<simpara>Copyright (C) 2009, Google Inc.
|
|
|
|
Licensed under the Apache License, Version 2.0</simpara>
|
|
|
|
</refsect1>
|
|
|
|
</refentry>
|