mirror of
https://github.com/tesseract-ocr/tesseract.git
synced 2025-01-07 10:17:50 +08:00
3f3f9931e2
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@487 d0cd1f9f-072b-0410-8dd7-cf729c803f20
86 lines
2.6 KiB
Plaintext
86 lines
2.6 KiB
Plaintext
COMBINE_TESSDATA(1)
|
|
===================
|
|
|
|
NAME
|
|
----
|
|
combine_tessdata - combine/extract/overwrite Tesseract data
|
|
|
|
SYNOPSIS
|
|
--------
|
|
*combine_tessdata* ['OPTION'] 'FILE'...
|
|
|
|
DESCRIPTION
|
|
-----------
|
|
combine_tessdata(1) is the main program to combine/extract/overwrite
|
|
tessdata components in [lang].traineddata files.
|
|
|
|
To combine all the individual tessdata components (unicharset, DAWGs,
|
|
classifier templates, ambiguities, language configs) located at, say,
|
|
/home/$USER/temp/eng.* run:
|
|
|
|
combine_tessdata /home/$USER/temp/eng.
|
|
|
|
The result will be a combined tessdata file /home/$USER/temp/eng.traineddata
|
|
|
|
Specify option -e if you would like to extract individual components
|
|
from a combined traineddata file. For example, to extract language config
|
|
file and the unicharset from tessdata/eng.traineddata run:
|
|
|
|
combine_tessdata -e tessdata/eng.traineddata
|
|
/home/$USER/temp/eng.config /home/$USER/temp/eng.unicharset
|
|
|
|
The desired config file and unicharset will be written to
|
|
/home/$USER/temp/eng.config /home/$USER/temp/eng.unicharset
|
|
|
|
Specify option -o to overwrite individual components of the given
|
|
[lang].traineddata file. For example, to overwrite language config
|
|
and unichar ambiguities files in tessdata/eng.traineddata use:
|
|
|
|
combine_tessdata -o tessdata/eng.traineddata
|
|
/home/$USER/temp/eng.config /home/$USER/temp/eng.unicharambigs
|
|
|
|
As a result, tessdata/eng.traineddata will contain the new language config
|
|
and unichar ambigs, plus all the original DAWGs, classifier templates, etc.
|
|
|
|
Note: the file names of the files to extract to and to overwrite from should
|
|
have the appropriate file suffixes (extensions) indicating their tessdata
|
|
component type (.unicharset for the unicharset, .unicharambigs for unichar
|
|
ambigs, etc). See k*FileSuffix variable in ccutil/tessdatamanager.h.
|
|
|
|
Specify option -u to unpack all the components to the specified path:
|
|
|
|
combine_tessdata -u tessdata/eng.traineddata /home/$USER/temp/eng.
|
|
|
|
This will create /home/$USER/temp/eng.* files with individual tessdata
|
|
components from tessdata/eng.traineddata.
|
|
|
|
OPTIONS
|
|
-------
|
|
*-e* '.traineddata' 'FILE'...:
|
|
Extracts the specified components from the .traineddata file
|
|
|
|
*-o* '.traineddata' 'FILE'...:
|
|
Overwrites the specified components of the .traineddata file
|
|
with those provided on the comand line.
|
|
|
|
*-u* '.traineddata' 'PATHPREFIX'
|
|
Unpacks the .traineddata using the provided prefix.
|
|
|
|
|
|
CAVEATS
|
|
-------
|
|
'Prefix' refers to the full file prefix, including period (.)
|
|
|
|
HISTORY
|
|
-------
|
|
combine_tessdata(1) first appeared in version 3.00 of Tesseract
|
|
|
|
SEE ALSO
|
|
--------
|
|
tesseract(1)
|
|
|
|
COPYING
|
|
-------
|
|
Copyright \(C) 2009, Google Inc.
|
|
Licensed under the Apache License, Version 2.0
|