'\" t
.\" Title: combine_tessdata
.\" Author: [FIXME: author] [see http://docbook.sf.net/el/author]
.\" Generator: DocBook XSL Stylesheets v1.75.2 <http://docbook.sf.net/>
.\" Date: 09/30/2010
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "COMBINE_TESSDATA" "1" "09/30/2010" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
combine_tessdata \- combine/extract/overwrite Tesseract data
.SH "SYNOPSIS"
.sp
\fBcombine_tessdata\fR [\fIOPTION\fR] \fIFILE\fR\&...
.SH "DESCRIPTION"
.sp
combine_tessdata(1) is the main program for combining, extracting, and overwriting tessdata components in [lang]\&.traineddata files\&.
.sp
To combine all the individual tessdata components (unicharset, DAWGs, classifier templates, ambiguities, language configs) located at, say, /home/$USER/temp/eng\&.* run:
.sp
.if n \{\
.RS 4
.\}
.nf
combine_tessdata /home/$USER/temp/eng\&.
.fi
.if n \{\
.RE
.\}
.sp
The result will be a combined tessdata file, /home/$USER/temp/eng\&.traineddata\&.
.sp
Specify option \-e to extract individual components from a combined traineddata file\&. For example, to extract the language config file and the unicharset from tessdata/eng\&.traineddata run:
.sp
.if n \{\
.RS 4
.\}
.nf
combine_tessdata \-e tessdata/eng\&.traineddata \e
  /home/$USER/temp/eng\&.config /home/$USER/temp/eng\&.unicharset
.fi
.if n \{\
.RE
.\}
.sp
The desired config file and unicharset will be written to /home/$USER/temp/eng\&.config and /home/$USER/temp/eng\&.unicharset\&.
.sp
Specify option \-o to overwrite individual components of the given [lang]\&.traineddata file\&. For example, to overwrite the language config and unichar ambiguities files in tessdata/eng\&.traineddata use:
.sp
.if n \{\
.RS 4
.\}
.nf
combine_tessdata \-o tessdata/eng\&.traineddata \e
  /home/$USER/temp/eng\&.config /home/$USER/temp/eng\&.unicharambigs
.fi
.if n \{\
.RE
.\}
.sp
As a result, tessdata/eng\&.traineddata will contain the new language config and unichar ambigs, plus all the original DAWGs, classifier templates, etc\&.
.sp
Note: the names of the files to extract to and to overwrite from must carry the file suffix (extension) that identifies their tessdata component type (\&.unicharset for the unicharset, \&.unicharambigs for unichar ambigs, etc\&.)\&. See the k*FileSuffix variables in ccutil/tessdatamanager\&.h\&.
.sp
Specify option \-u to unpack all the components to the specified path:
.sp
.if n \{\
.RS 4
.\}
.nf
combine_tessdata \-u tessdata/eng\&.traineddata /home/$USER/temp/eng\&.
.fi
.if n \{\
.RE
.\}
.sp
This will create /home/$USER/temp/eng\&.* files with individual tessdata components from tessdata/eng\&.traineddata\&.
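.sp
The unpack and combine modes described above can be chained to rebuild a traineddata file after editing one component by hand\&. The following is only a sketch: it reuses the paths from the earlier examples and assumes an editor is available as $EDITOR, which is a shell convention and not part of Tesseract:
.sp
.if n \{\
.RS 4
.\}
.nf
combine_tessdata \-u tessdata/eng\&.traineddata /home/$USER/temp/eng\&.
$EDITOR /home/$USER/temp/eng\&.config
combine_tessdata /home/$USER/temp/eng\&.
.fi
.if n \{\
.RE
.\}
.sp
As in the combine example above, the rebuilt file is written to /home/$USER/temp/eng\&.traineddata\&.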
.SH "OPTIONS"
.sp
\fB\-e\fR \fI\&.traineddata\fR \fIFILE\fR\&...: Extracts the specified components from the \&.traineddata file\&.
.sp
\fB\-o\fR \fI\&.traineddata\fR \fIFILE\fR\&...: Overwrites the specified components of the \&.traineddata file with those provided on the command line\&.
.sp
\fB\-u\fR \fI\&.traineddata\fR \fIPATHPREFIX\fR: Unpacks the \&.traineddata using the provided prefix\&.
.SH "CAVEATS"
.sp
\fIPrefix\fR refers to the full file prefix, including the period (\&.)\&.
.SH "HISTORY"
.sp
combine_tessdata(1) first appeared in version 3\&.00 of Tesseract\&.
.SH "SEE ALSO"
.sp
tesseract(1)
.SH "COPYING"
.sp
Copyright (C) 2009, Google Inc\&. Licensed under the Apache License, Version 2\&.0\&.