From b231aee21215550bb25bd3eac6a4c080be397758 Mon Sep 17 00:00:00 2001
From: Chris Mayo
Date: Thu, 23 Mar 2017 20:02:50 +0000
Subject: [PATCH] tidy tesseract(1) adding missing options

Together with:
- fix "C\++"
- align executable --print-parameters message
---
 api/tesseractmain.cpp |  2 +-
 doc/tesseract.1       | 51 +++++++++++++++++++++++++++++++++++-------
 doc/tesseract.1.asc   | 25 +++++++++++++++++++++----
 doc/tesseract.1.html  | 49 +++++++++++++++++++++++++++++++++++++-----
 doc/tesseract.1.xml   | 52 +++++++++++++++++++++++++++++++++++++++----
 5 files changed, 157 insertions(+), 22 deletions(-)

diff --git a/api/tesseractmain.cpp b/api/tesseractmain.cpp
index 9604d72a..9e6d35d4 100644
--- a/api/tesseractmain.cpp
+++ b/api/tesseractmain.cpp
@@ -169,7 +169,7 @@ void PrintHelpMessage(const char* program) {
 " --help-oem Show OCR Engine modes.\n"
 " -v, --version Show version information.\n"
 " --list-langs List available languages for tesseract engine.\n"
- " --print-parameters Print tesseract parameters to stdout.\n";
+ " --print-parameters Print tesseract parameters.\n";
 
 printf("\n%s", single_options);
 }
diff --git a/doc/tesseract.1 b/doc/tesseract.1
index 89107f03..fdf7cdac 100644
--- a/doc/tesseract.1
+++ b/doc/tesseract.1
@@ -2,12 +2,12 @@
 .\" Title: tesseract
 .\" Author: [see the "AUTHOR" section]
 .\" Generator: DocBook XSL Stylesheets v1.78.1
-.\" Date: 06/28/2015
+.\" Date: 03/23/2017
 .\" Manual: \ \&
 .\" Source: \ \&
 .\" Language: English
 .\"
-.TH "TESSERACT" "1" "06/28/2015" "\ \&" "\ \&"
+.TH "TESSERACT" "1" "03/23/2017" "\ \&" "\ \&"
 .\" -----------------------------------------------------------------
 .\" * Define some portability stuff
 .\" -----------------------------------------------------------------
@@ -84,7 +84,7 @@ Set value for control parameter\&. Multiple \-c arguments are allowed\&.
 The language to use\&. If none is specified, English is assumed\&. Multiple languages may be specified, separated by plus characters\&. Tesseract uses 3\-character ISO 639\-2 language codes\&. (See LANGUAGES)
 .RE
 .PP
-\fI\--psm N\fR
+\fI\-\-psm N\fR
 .RS 4
 Set Tesseract to only run a subset of layout analysis and assume a certain form of image\&. The options for
 \fBN\fR
@@ -111,6 +111,26 @@ are:
 .\}
 .RE
 .PP
+\fI\-\-oem N\fR
+.RS 4
+Specify OCR Engine mode\&. The options for
+\fBN\fR
+are:
+.sp
+.if n \{\
+.RS 4
+.\}
+.nf
+0 = Original Tesseract only\&.
+1 = Neural nets LSTM only\&.
+2 = Tesseract + LSTM\&.
+3 = Default, based on what is available\&.
+.fi
+.if n \{\
+.RE
+.\}
+.RE
+.PP
 \fIconfigfile\fR
 .RS 4
 The name of a config to use\&. A config is a plaintext file which contains a list of variables and their values, one per line, with a space separating variable from value\&. Interesting config files include:
@@ -139,22 +159,37 @@ pdf \- Output in pdf instead of a text file\&.
 .RE
 .RE
 .sp
-\fBNota Bene:\fR The options \fI\-l lang\fR and \fI\--psm N\fR must occur before any \fIconfigfile\fR\&.
+\fBNota Bene:\fR The options \fI\-l lang\fR and \fI\-\-psm N\fR must occur before any \fIconfigfile\fR\&.
 .SH "SINGLE OPTIONS"
 .PP
-\fI\-v\fR
+\fI\-h, \-\-help\fR
+.RS 4
+Show help message\&.
+.RE
+.PP
+\fI\-\-help\-psm\fR
+.RS 4
+Show page segmentation modes\&.
+.RE
+.PP
+\fI\-\-help\-oem\fR
+.RS 4
+Show OCR Engine modes\&.
+.RE
+.PP
+\fI\-v, \-\-version\fR
 .RS 4
 Returns the current version of the tesseract(1) executable\&.
 .RE
 .PP
 \fI\-\-list\-langs\fR
 .RS 4
-list available languages for tesseract engine\&. Can be used with \-\-tessdata\-dir\&.
+List available languages for tesseract engine\&. Can be used with \-\-tessdata\-dir\&.
 .RE
 .PP
 \fI\-\-print\-parameters\fR
 .RS 4
-print tesseract parameters to the stdout\&.
+Print tesseract parameters\&.
 .RE
 .SH "LANGUAGES"
 .sp
@@ -220,7 +255,7 @@ user_patterns_suffix user\-patterns
 Now, if you pass the word \fIbazaar\fR as a trailing command line parameter to Tesseract, Tesseract will not bother loading the system dictionary nor the dictionary of frequent words and will load and use the eng\&.user\-words and eng\&.user\-patterns files you provided\&. The former is a simple word list, one per line\&. The format of the latter is documented in dict/trie\&.h on read_pattern_list()\&.
 .SH "HISTORY"
 .sp
-The engine was developed at Hewlett Packard Laboratories Bristol and at Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998\&. A lot of the code was written in C, and then some more was written in C++\&. The C\e++ code makes heavy use of a list system using macros\&. This predates stl, was portable before stl, and is more efficient than stl lists, but has the big negative that if you do get a segmentation violation, it is hard to debug\&.
+The engine was developed at Hewlett Packard Laboratories Bristol and at Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998\&. A lot of the code was written in C, and then some more was written in C++\&. The C++ code makes heavy use of a list system using macros\&. This predates stl, was portable before stl, and is more efficient than stl lists, but has the big negative that if you do get a segmentation violation, it is hard to debug\&.
 .sp
 Version 2\&.00 brought Unicode (UTF\-8) support, six languages, and the ability to train Tesseract\&.
 .sp
diff --git a/doc/tesseract.1.asc b/doc/tesseract.1.asc
index 312aae07..6832ea0c 100644
--- a/doc/tesseract.1.asc
+++ b/doc/tesseract.1.asc
@@ -70,6 +70,14 @@ OPTIONS
   9 = Treat the image as a single word in a circle.
   10 = Treat the image as a single character.
 
+'--oem N'::
+  Specify OCR Engine mode. The options for *N* are:
+
+  0 = Original Tesseract only.
+  1 = Neural nets LSTM only.
+  2 = Tesseract + LSTM.
+  3 = Default, based on what is available.
+
 'configfile'::
   The name of a config to use. A config is a plaintext file which
   contains a list of variables and their values, one per line, with a
@@ -84,14 +92,23 @@ before any 'configfile'.
 
 SINGLE OPTIONS
 --------------
-'-v'::
+'-h, --help'::
+  Show help message.
+
+'--help-psm'::
+  Show page segmentation modes.
+
+'--help-oem'::
+  Show OCR Engine modes.
+
+'-v, --version'::
   Returns the current version of the tesseract(1) executable.
 
 '--list-langs'::
-  list available languages for tesseract engine. Can be used with --tessdata-dir.
+  List available languages for tesseract engine. Can be used with --tessdata-dir.
 
 '--print-parameters'::
-  print tesseract parameters to the stdout.
+  Print tesseract parameters.
 
 
 
@@ -268,7 +285,7 @@ The engine was developed at Hewlett Packard Laboratories Bristol and at
 Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some
 more changes made in 1996 to port to Windows, and some C\+\+izing in 1998.
 A lot of the code was written in C, and then some more was written in C\+\+.
-The C\+\+ code makes heavy use of a list system using macros. This predates
+The C++ code makes heavy use of a list system using macros. This predates
 stl, was portable before stl, and is more efficient than stl lists, but has
 the big negative that if you do get a segmentation violation, it is hard to
 debug.
diff --git a/doc/tesseract.1.html b/doc/tesseract.1.html
index d0addae6..d9dbcc0b 100644
--- a/doc/tesseract.1.html
+++ b/doc/tesseract.1.html
@@ -870,6 +870,21 @@ at Google since then.

+--oem N +
+
+

+ Specify OCR Engine mode. The options for N are: +

+
+
+
0 = Original Tesseract only.
+1 = Neural nets LSTM only.
+2 = Tesseract + LSTM.
+3 = Default, based on what is available.
+
+
+
configfile
@@ -902,7 +917,31 @@ before any configfile.

--v
+-h, --help
+
+

+ Show help message. +

+
+
+--help-psm +
+
+

+ Show page segmentation modes. +

+
+
+--help-oem +
+
+

+ Show OCR Engine modes. +

+
+
+-v, --version

@@ -914,7 +953,7 @@ before any configfile.

- list available languages for tesseract engine. Can be used with --tessdata-dir.
+ List available languages for tesseract engine. Can be used with --tessdata-dir.

@@ -922,7 +961,7 @@ before any configfile.

- print tesseract parameters to the stdout.
+ Print tesseract parameters.

@@ -1099,7 +1138,7 @@ on read_pattern_list().

 Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some
 more changes made in 1996 to port to Windows, and some C++izing in 1998.
 A lot of the code was written in C, and then some more was written in C++.
-The C\++ code makes heavy use of a list system using macros. This predates
+The C++ code makes heavy use of a list system using macros. This predates
 stl, was portable before stl, and is more efficient than stl lists, but has
 the big negative that if you do get a segmentation violation, it is hard to
 debug.

@@ -1156,7 +1195,7 @@ Lloyd, Shobhit Saxena, and Thomas Kielbus.


diff --git a/doc/tesseract.1.xml b/doc/tesseract.1.xml
index 8ddce87c..941caa5b 100644
--- a/doc/tesseract.1.xml
+++ b/doc/tesseract.1.xml
@@ -152,6 +152,20 @@ at Google since then.
+--oem N
+
+
+
+ Specify OCR Engine mode. The options for N are:
+
+0 = Original Tesseract only.
+1 = Neural nets LSTM only.
+2 = Tesseract + LSTM.
+3 = Default, based on what is available.
+
+
+
+
 configfile
@@ -184,7 +198,37 @@ before any configfile.
--v
+-h, --help
+
+
+
+ Show help message.
+
+
+
+
+
+--help-psm
+
+
+
+ Show page segmentation modes.
+
+
+
+
+
+--help-oem
+
+
+
+ Show OCR Engine modes.
+
+
+
+
+
+-v, --version
@@ -198,7 +242,7 @@ before any configfile.
- list available languages for tesseract engine. Can be used with --tessdata-dir.
+ List available languages for tesseract engine. Can be used with --tessdata-dir.
@@ -208,7 +252,7 @@ before any configfile.
- print tesseract parameters to the stdout.
+ Print tesseract parameters.
@@ -377,7 +421,7 @@ on read_pattern_list().
 Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some
 more changes made in 1996 to port to Windows, and some C++izing in 1998.
 A lot of the code was written in C, and then some more was written in C++.
-The C\++ code makes heavy use of a list system using macros. This predates
+The C++ code makes heavy use of a list system using macros. This predates
 stl, was portable before stl, and is more efficient than stl lists, but has
 the big negative that if you do get a segmentation violation, it is hard to
 debug.
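The `--oem N` entries added by this patch accept exactly four integer modes. Below is a minimal C++ sketch, not Tesseract's own option parser, of how a front end could validate N and map it back to the documented descriptions. The enumerator names are modeled on `tesseract::OcrEngineMode` but are redefined locally (an assumption) so the sketch compiles without Tesseract headers, and `ParseOem` and `DescribeOem` are hypothetical helpers.

```cpp
#include <string>

// Local stand-in for the four values documented under --oem N. Names are
// modeled on tesseract::OcrEngineMode (assumption); redefined here so the
// sketch builds without the Tesseract headers.
enum OcrEngineMode {
  OEM_TESSERACT_ONLY = 0,           // Original Tesseract only.
  OEM_LSTM_ONLY = 1,                // Neural nets LSTM only.
  OEM_TESSERACT_LSTM_COMBINED = 2,  // Tesseract + LSTM.
  OEM_DEFAULT = 3,                  // Default, based on what is available.
};

// Hypothetical helper: parses the N of "--oem N". Accepts only the single
// digits 0-3 and leaves *mode untouched when the value is rejected.
bool ParseOem(const std::string& arg, OcrEngineMode* mode) {
  if (arg.size() != 1 || arg[0] < '0' || arg[0] > '3') {
    return false;
  }
  *mode = static_cast<OcrEngineMode>(arg[0] - '0');
  return true;
}

// Hypothetical helper: returns the man page description for a mode.
const char* DescribeOem(OcrEngineMode mode) {
  switch (mode) {
    case OEM_TESSERACT_ONLY:
      return "Original Tesseract only.";
    case OEM_LSTM_ONLY:
      return "Neural nets LSTM only.";
    case OEM_TESSERACT_LSTM_COMBINED:
      return "Tesseract + LSTM.";
    case OEM_DEFAULT:
      return "Default, based on what is available.";
  }
  return "Unknown mode.";
}
```

For example, `ParseOem("2", &mode)` sets `mode` to `OEM_TESSERACT_LSTM_COMBINED`, while `"4"` or `"12"` is rejected, so a bad `--oem` value can be reported before any OCR work starts.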