Tidy tesseract(1), adding missing options

Together with:
- fix "C\++"
- align executable --print-parameters message
Chris Mayo 2017-03-23 20:02:50 +00:00
parent 6c3d8fad17
commit b231aee212
5 changed files with 157 additions and 22 deletions
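
For orientation, the options this commit documents can be exercised with a small argument builder. This is only a sketch: `build_tesseract_args` and `OEM_MODES` are hypothetical names, not part of Tesseract. The mode table is copied from the `--oem N` text added below, and the ordering follows the man page note that `-l lang` and `--psm N` must occur before any configfile. Placing `--oem` before the configfile as well is an assumption.

```python
from typing import List, Optional, Sequence

# OCR Engine modes, as listed in the --oem documentation added by this commit.
OEM_MODES = {
    0: "Original Tesseract only",
    1: "Neural nets LSTM only",
    2: "Tesseract + LSTM",
    3: "Default, based on what is available",
}


def build_tesseract_args(
    image: str,
    outputbase: str,
    lang: Optional[str] = None,
    psm: Optional[int] = None,
    oem: Optional[int] = None,
    configfiles: Sequence[str] = (),
) -> List[str]:
    """Hypothetical helper: build an argument list for the tesseract executable.

    Options are emitted before any configfile, matching the man page note
    that -l and --psm must occur before any configfile.
    """
    if oem is not None and oem not in OEM_MODES:
        raise ValueError(f"unknown OCR Engine mode: {oem!r}")

    args = ["tesseract", image, outputbase]
    if lang is not None:
        args += ["-l", lang]
    if oem is not None:
        # Assumption: --oem is placed with the other options, before configfiles.
        args += ["--oem", str(oem)]
    if psm is not None:
        args += ["--psm", str(psm)]
    # Config files always come last.
    args += list(configfiles)
    return args


if __name__ == "__main__":
    # Build (but do not run) a command line that uses the LSTM engine.
    print(" ".join(build_tesseract_args(
        "page.png", "page", lang="eng", psm=6, oem=1, configfiles=["pdf"])))
```

The list form can be passed to `subprocess.run` unchanged, which avoids shell quoting problems with file names.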


@@ -169,7 +169,7 @@ void PrintHelpMessage(const char* program) {
" --help-oem Show OCR Engine modes.\n"
" -v, --version Show version information.\n"
" --list-langs List available languages for tesseract engine.\n"
" --print-parameters Print tesseract parameters to stdout.\n";
" --print-parameters Print tesseract parameters.\n";
printf("\n%s", single_options);
}


@@ -2,12 +2,12 @@
.\" Title: tesseract
.\" Author: [see the "AUTHOR" section]
.\" Generator: DocBook XSL Stylesheets v1.78.1 <http://docbook.sf.net/>
.\" Date: 06/28/2015
.\" Date: 03/23/2017
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "TESSERACT" "1" "06/28/2015" "\ \&" "\ \&"
.TH "TESSERACT" "1" "03/23/2017" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
@@ -84,7 +84,7 @@ Set value for control parameter\&. Multiple \-c arguments are allowed\&.
The language to use\&. If none is specified, English is assumed\&. Multiple languages may be specified, separated by plus characters\&. Tesseract uses 3\-character ISO 639\-2 language codes\&. (See LANGUAGES)
.RE
.PP
\fI\--psm N\fR
\fI\-\-psm N\fR
.RS 4
Set Tesseract to only run a subset of layout analysis and assume a certain form of image\&. The options for
\fBN\fR
@@ -111,6 +111,26 @@ are:
.\}
.RE
.PP
\fI\-\-oem N\fR
.RS 4
Specify OCR Engine mode\&. The options for
\fBN\fR
are:
.sp
.if n \{\
.RS 4
.\}
.nf
0 = Original Tesseract only\&.
1 = Neural nets LSTM only\&.
2 = Tesseract + LSTM\&.
3 = Default, based on what is available\&.
.fi
.if n \{\
.RE
.\}
.RE
.PP
\fIconfigfile\fR
.RS 4
The name of a config to use\&. A config is a plaintext file which contains a list of variables and their values, one per line, with a space separating variable from value\&. Interesting config files include:
@@ -139,22 +159,37 @@ pdf \- Output in pdf instead of a text file\&.
.RE
.RE
.sp
\fBNota Bene:\fR The options \fI\-l lang\fR and \fI\--psm N\fR must occur before any \fIconfigfile\fR\&.
\fBNota Bene:\fR The options \fI\-l lang\fR and \fI\-\-psm N\fR must occur before any \fIconfigfile\fR\&.
.SH "SINGLE OPTIONS"
.PP
\fI\-v\fR
\fI\-h, \-\-help\fR
.RS 4
Show help message\&.
.RE
.PP
\fI\-\-help\-psm\fR
.RS 4
Show page segmentation modes\&.
.RE
.PP
\fI\-\-help\-oem\fR
.RS 4
Show OCR Engine modes\&.
.RE
.PP
\fI\-v, \-\-version\fR
.RS 4
Returns the current version of the tesseract(1) executable\&.
.RE
.PP
\fI\-\-list\-langs\fR
.RS 4
list available languages for tesseract engine\&. Can be used with \-\-tessdata\-dir\&.
List available languages for tesseract engine\&. Can be used with \-\-tessdata\-dir\&.
.RE
.PP
\fI\-\-print\-parameters\fR
.RS 4
print tesseract parameters to the stdout\&.
Print tesseract parameters\&.
.RE
.SH "LANGUAGES"
.sp
@@ -220,7 +255,7 @@ user_patterns_suffix user\-patterns
Now, if you pass the word \fIbazaar\fR as a trailing command line parameter to Tesseract, Tesseract will not bother loading the system dictionary nor the dictionary of frequent words and will load and use the eng\&.user\-words and eng\&.user\-patterns files you provided\&. The former is a simple word list, one per line\&. The format of the latter is documented in dict/trie\&.h on read_pattern_list()\&.
.SH "HISTORY"
.sp
The engine was developed at Hewlett Packard Laboratories Bristol and at Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998\&. A lot of the code was written in C, and then some more was written in C++\&. The C\e++ code makes heavy use of a list system using macros\&. This predates stl, was portable before stl, and is more efficient than stl lists, but has the big negative that if you do get a segmentation violation, it is hard to debug\&.
The engine was developed at Hewlett Packard Laboratories Bristol and at Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998\&. A lot of the code was written in C, and then some more was written in C++\&. The C++ code makes heavy use of a list system using macros\&. This predates stl, was portable before stl, and is more efficient than stl lists, but has the big negative that if you do get a segmentation violation, it is hard to debug\&.
.sp
Version 2\&.00 brought Unicode (UTF\-8) support, six languages, and the ability to train Tesseract\&.
.sp


@@ -70,6 +70,14 @@ OPTIONS
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
'--oem N'::
Specify OCR Engine mode. The options for *N* are:
0 = Original Tesseract only.
1 = Neural nets LSTM only.
2 = Tesseract + LSTM.
3 = Default, based on what is available.
'configfile'::
The name of a config to use. A config is a plaintext file which
contains a list of variables and their values, one per line, with a
@@ -84,14 +92,23 @@ before any 'configfile'.
SINGLE OPTIONS
--------------
'-v'::
'-h, --help'::
Show help message.
'--help-psm'::
Show page segmentation modes.
'--help-oem'::
Show OCR Engine modes.
'-v, --version'::
Returns the current version of the tesseract(1) executable.
'--list-langs'::
list available languages for tesseract engine. Can be used with --tessdata-dir.
List available languages for tesseract engine. Can be used with --tessdata-dir.
'--print-parameters'::
print tesseract parameters to the stdout.
Print tesseract parameters.
@@ -268,7 +285,7 @@ The engine was developed at Hewlett Packard Laboratories Bristol and at
Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some more
changes made in 1996 to port to Windows, and some C\+\+izing in 1998. A
lot of the code was written in C, and then some more was written in C\+\+.
The C\+\+ code makes heavy use of a list system using macros. This predates
The C++ code makes heavy use of a list system using macros. This predates
stl, was portable before stl, and is more efficient than stl lists, but has
the big negative that if you do get a segmentation violation, it is hard to
debug.


@@ -870,6 +870,21 @@ at Google since then.</p></div>
</div></div>
</dd>
<dt class="hdlist1">
<em>--oem N</em>
</dt>
<dd>
<p>
Specify OCR Engine mode. The options for <strong>N</strong> are:
</p>
<div class="literalblock">
<div class="content">
<pre><code>0 = Original Tesseract only.
1 = Neural nets LSTM only.
2 = Tesseract + LSTM.
3 = Default, based on what is available.</code></pre>
</div></div>
</dd>
<dt class="hdlist1">
<em>configfile</em>
</dt>
<dd>
@@ -902,7 +917,31 @@ before any <em>configfile</em>.</p></div>
<div class="sectionbody">
<div class="dlist"><dl>
<dt class="hdlist1">
<em>-v</em>
<em>-h, --help</em>
</dt>
<dd>
<p>
Show help message.
</p>
</dd>
<dt class="hdlist1">
<em>--help-psm</em>
</dt>
<dd>
<p>
Show page segmentation modes.
</p>
</dd>
<dt class="hdlist1">
<em>--help-oem</em>
</dt>
<dd>
<p>
Show OCR Engine modes.
</p>
</dd>
<dt class="hdlist1">
<em>-v, --version</em>
</dt>
<dd>
<p>
@@ -914,7 +953,7 @@ before any <em>configfile</em>.</p></div>
</dt>
<dd>
<p>
list available languages for tesseract engine. Can be used with --tessdata-dir.
List available languages for tesseract engine. Can be used with --tessdata-dir.
</p>
</dd>
<dt class="hdlist1">
@@ -922,7 +961,7 @@ before any <em>configfile</em>.</p></div>
</dt>
<dd>
<p>
print tesseract parameters to the stdout.
Print tesseract parameters.
</p>
</dd>
</dl></div>
@@ -1099,7 +1138,7 @@ on read_pattern_list().</p></div>
Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some more
changes made in 1996 to port to Windows, and some C++izing in 1998. A
lot of the code was written in C, and then some more was written in C++.
The C\++ code makes heavy use of a list system using macros. This predates
The C++ code makes heavy use of a list system using macros. This predates
stl, was portable before stl, and is more efficient than stl lists, but has
the big negative that if you do get a segmentation violation, it is hard to
debug.</p></div>
@@ -1156,7 +1195,7 @@ Lloyd, Shobhit Saxena, and Thomas Kielbus.</p></div>
<div id="footnotes"><hr /></div>
<div id="footer">
<div id="footer-text">
Last updated 2015-06-28 22:23:47 CEST
Last updated 2017-03-23 19:56:19 GMT
</div>
</div>
</body>


@@ -152,6 +152,20 @@ at Google since then.</simpara>
</varlistentry>
<varlistentry>
<term>
<emphasis>--oem N</emphasis>
</term>
<listitem>
<simpara>
Specify OCR Engine mode. The options for <emphasis role="strong">N</emphasis> are:
</simpara>
<literallayout class="monospaced">0 = Original Tesseract only.
1 = Neural nets LSTM only.
2 = Tesseract + LSTM.
3 = Default, based on what is available.</literallayout>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis>configfile</emphasis>
</term>
<listitem>
@@ -184,7 +198,37 @@ before any <emphasis>configfile</emphasis>.</simpara>
<variablelist>
<varlistentry>
<term>
<emphasis>-v</emphasis>
<emphasis>-h, --help</emphasis>
</term>
<listitem>
<simpara>
Show help message.
</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis>--help-psm</emphasis>
</term>
<listitem>
<simpara>
Show page segmentation modes.
</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis>--help-oem</emphasis>
</term>
<listitem>
<simpara>
Show OCR Engine modes.
</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis>-v, --version</emphasis>
</term>
<listitem>
<simpara>
@@ -198,7 +242,7 @@ before any <emphasis>configfile</emphasis>.</simpara>
</term>
<listitem>
<simpara>
list available languages for tesseract engine. Can be used with --tessdata-dir.
List available languages for tesseract engine. Can be used with --tessdata-dir.
</simpara>
</listitem>
</varlistentry>
@@ -208,7 +252,7 @@ before any <emphasis>configfile</emphasis>.</simpara>
</term>
<listitem>
<simpara>
print tesseract parameters to the stdout.
Print tesseract parameters.
</simpara>
</listitem>
</varlistentry>
@@ -377,7 +421,7 @@ on read_pattern_list().</simpara>
Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some more
changes made in 1996 to port to Windows, and some C++izing in 1998. A
lot of the code was written in C, and then some more was written in C++.
The C\++ code makes heavy use of a list system using macros. This predates
The C++ code makes heavy use of a list system using macros. This predates
stl, was portable before stl, and is more efficient than stl lists, but has
the big negative that if you do get a segmentation violation, it is hard to
debug.</simpara>