'\" t
.\"     Title: combine_tessdata
.\"    Author: [FIXME: author] [see http://docbook.sf.net/el/author]
.\" Generator: DocBook XSL Stylesheets v1.75.2
.\"      Date: 09/30/2010
.\"    Manual: \ \&
.\"    Source: \ \&
.\"  Language: English
.\"
.TH "COMBINE_TESSDATA" "1" "09/30/2010" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
combine_tessdata \- combine/extract/overwrite Tesseract data
.SH "SYNOPSIS"
.sp
\fBcombine_tessdata\fR [\fIOPTION\fR] \fIFILE\fR\&...
.SH "DESCRIPTION"
.sp
combine_tessdata(1) is the main program to combine/extract/overwrite tessdata components in [lang]\&.traineddata files\&.
.sp
To combine all the individual tessdata components (unicharset, DAWGs, classifier templates, ambiguities, language configs) located at, say, /home/$USER/temp/eng\&.* run:
.sp
.if n \{\
.RS 4
.\}
.nf
combine_tessdata /home/$USER/temp/eng\&.
.fi
.if n \{\
.RE
.\}
.sp
The result will be a combined tessdata file /home/$USER/temp/eng\&.traineddata
.sp
Specify option \-e if you would like to extract individual components from a combined traineddata file\&.
For example, to extract the language config file and the unicharset from tessdata/eng\&.traineddata, run:
.sp
.if n \{\
.RS 4
.\}
.nf
combine_tessdata \-e tessdata/eng\&.traineddata /home/$USER/temp/eng\&.config /home/$USER/temp/eng\&.unicharset
.fi
.if n \{\
.RE
.\}
.sp
The config file and unicharset will be written to /home/$USER/temp/eng\&.config and /home/$USER/temp/eng\&.unicharset, respectively\&.
.sp
Specify option \-o to overwrite individual components of the given [lang]\&.traineddata file\&. For example, to overwrite the language config and unichar ambiguities files in tessdata/eng\&.traineddata, use:
.sp
.if n \{\
.RS 4
.\}
.nf
combine_tessdata \-o tessdata/eng\&.traineddata /home/$USER/temp/eng\&.config /home/$USER/temp/eng\&.unicharambigs
.fi
.if n \{\
.RE
.\}
.sp
As a result, tessdata/eng\&.traineddata will contain the new language config and unichar ambigs, plus all the original DAWGs, classifier templates, etc\&.
.sp
Note: the names of the files to extract to, or to overwrite from, should carry the file suffix (extension) that indicates their tessdata component type (\&.unicharset for the unicharset, \&.unicharambigs for unichar ambigs, etc)\&. See the k*FileSuffix variables in ccutil/tessdatamanager\&.h\&.
.sp
Specify option \-u to unpack all the components to the specified path:
.sp
.if n \{\
.RS 4
.\}
.nf
combine_tessdata \-u tessdata/eng\&.traineddata /home/$USER/temp/eng\&.
.fi
.if n \{\
.RE
.\}
.sp
This will create /home/$USER/temp/eng\&.* files with individual tessdata components from tessdata/eng\&.traineddata\&.
.SH "OPTIONS"
.sp
\fB\-e\fR \fI\&.traineddata\fR \fIFILE\fR\&...: Extracts the specified components from the \&.traineddata file\&.
.sp
\fB\-o\fR \fI\&.traineddata\fR \fIFILE\fR\&...: Overwrites the specified components of the \&.traineddata file with those provided on the command line\&.
.sp
\fB\-u\fR \fI\&.traineddata\fR \fIPATHPREFIX\fR: Unpacks all the components of the \&.traineddata file using the provided prefix\&.
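.sp
The path prefix given to \fB\-u\fR is expected to end in a period already\&. As a minimal sketch, the shell helper below appends the period only when it is missing\&. The name with_trailing_period is hypothetical and chosen for illustration; it is not part of Tesseract\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```shell
# Hypothetical helper (not part of Tesseract): print the argument as a
# path prefix that ends in a period, appending one only if it is missing.
with_trailing_period() {
  case "$1" in
    *.) echo "$1" ;;
    *)  echo "$1." ;;
  esac
}

# Possible use, assuming combine_tessdata is installed:
#   combine_tessdata -u tessdata/eng.traineddata "$(with_trailing_period "$HOME/temp/eng")"
```
.fi
.if n \{\
.RE
.\}
.sp
A prefix that already ends in a period is printed unchanged, so the helper can be applied to any prefix before it is passed as \fIPATHPREFIX\fR\&.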
.SH "CAVEATS"
.sp
\fIPrefix\fR refers to the full file prefix, including the trailing period (\&.)\&.
.SH "HISTORY"
.sp
combine_tessdata(1) first appeared in version 3\&.00 of Tesseract\&.
.SH "SEE ALSO"
.sp
tesseract(1)
.SH "COPYING"
.sp
Copyright (C) 2009, Google Inc\&. Licensed under the Apache License, Version 2\&.0\&.