Shree 2019-03-16 14:30:24 +00:00
commit af7a97e33e
37 changed files with 208 additions and 334 deletions

@ -33,8 +33,9 @@ EXTRA_DIST = $(man_MANS) Doxyfile
.PHONY: html
html: ${man_MANS:%=%.html}
pdf: ${man_MANS:%=%.pdf}
SUFFIXES = .asc .html
SUFFIXES = .asc .html .pdf
.asc:
-asciidoc -b docbook -d manpage -o - $< | \
@ -43,6 +44,10 @@ SUFFIXES = .asc .html
.asc.html:
asciidoc -b html5 -o $@ $<
.asc.pdf:
asciidoc -b docbook -d manpage -o $*.dbk $<
docbook2pdf $*.dbk
MAINTAINERCLEANFILES = $(man_MANS) Doxyfile
endif

@ -8,7 +8,7 @@ tesseract - command-line OCR engine
SYNOPSIS
--------
*tesseract* 'imagename'|'listname'|'stdin' 'outputbase'|'stdout' [options...] [configfile...]
*tesseract* 'FILE' 'OUTPUTBASE' ['OPTIONS']... ['CONFIGFILE']...
DESCRIPTION
-----------
@ -20,128 +20,139 @@ at Google since then.
IN/OUT ARGUMENTS
----------------
'imagename'::
The name of the input image. Most image file formats (anything
readable by Leptonica) are supported.
'FILE'::
The name of the input file.
This can either be an image file or a text file. +
Most image file formats (anything readable by Leptonica) are supported. +
A text file lists the names of all input images (one image name per line).
The results will be combined in a single file for each output file format
(txt, pdf, hocr, xml). +
If 'FILE' is `stdin` or `-` then the standard input is used.
'listname'::
The name of a text file which lists the names of all input images
(one image name per line). The results will be combined in a
single file for each output file format (txt, pdf, hocr).
'stdin'::
Instruction to read data from standard input.
'outputbase'::
The basename of the output file (to which the appropriate extension
will be appended). By default the output will be a text file
with `.txt` added to the basename unless there are one or more
parameters set which explicitly specify the desired output.
'stdout'::
Instruction to send output data to standard output.
'OUTPUTBASE'::
The basename of the output file (to which the appropriate extension
will be appended). By default the output will be a text file
with `.txt` added to the basename unless there are one or more
parameters set which explicitly specify the desired output. +
If 'OUTPUTBASE' is `stdout` or `-` then the standard output is used.
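As an illustration of the list-file form of 'FILE' described above, the
following Python sketch writes a list file with one image name per line and
assembles a matching command line. The helper name `write_list_file`, the
image names and the output base `book` are hypothetical; the command is only
built and printed, not executed.

```python
import os
import tempfile


def write_list_file(image_names, list_path):
    """Write one image name per line, the layout described for a list FILE."""
    with open(list_path, "w", encoding="utf-8") as handle:
        for name in image_names:
            handle.write(name + "\n")
    return list_path


# Hypothetical page images; they are not opened or checked here.
images = ["page1.png", "page2.png", "page3.png"]
list_path = write_list_file(images, os.path.join(tempfile.mkdtemp(), "pages.txt"))

# FILE is the list file, OUTPUTBASE is "book"; the "pdf" config selects PDF output.
argv = ["tesseract", list_path, "book", "pdf"]
print(" ".join(argv))
```

With such a list file the results for all listed images should end up combined
in one output file per format, as stated in the description of 'FILE'.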
[[TESSDATADIR]]
OPTIONS
-------
'--tessdata-dir /path'::
Specify the location of tessdata path.
*-c* 'CONFIGVAR=VALUE'::
Set value for parameter 'CONFIGVAR' to VALUE. Multiple *-c* arguments are allowed.
'--user-words /path/to/file'::
Specify the location of user words file.
*--dpi* 'N'::
Specify the resolution 'N' in DPI for the input image(s).
A typical value for 'N' is `300`. Without this option,
the resolution is read from the metadata included in the image.
If an image does not include that information, Tesseract tries to guess it.
'--user-patterns /path/to/file'::
Specify the location of user patterns file.
*-l* 'LANG'::
*-l* 'SCRIPT'::
The language or script to use.
If none is specified, `eng` (English) is assumed.
Multiple languages may be specified, separated by plus characters.
Tesseract uses 3-character ISO 639-2 language codes
(see <<LANGUAGES,*LANGUAGES AND SCRIPTS*>>).
'-c configvar=value'::
Set value for parameter 'configvar'. Multiple -c arguments are allowed.
*--psm* 'N'::
Set Tesseract to only run a subset of layout analysis and assume
a certain form of image. The options for 'N' are:
'-l lang'::
The language to use. If none is specified, English is assumed.
Multiple languages may be specified, separated by plus characters.
Tesseract uses 3-character ISO 639-2 language codes. (See LANGUAGES)
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR. (not implemented)
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
11 = Sparse text. Find as much text as possible in no particular order.
12 = Sparse text with OSD.
13 = Raw line. Treat the image as a single text line,
bypassing hacks that are Tesseract-specific.
'--psm N'::
Set Tesseract to only run a subset of layout analysis and assume
a certain form of image. The options for *N* are:
*--oem* 'N'::
Specify OCR Engine mode. The options for 'N' are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR. (not implemented)
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
0 = Original Tesseract only.
1 = Neural nets LSTM only.
2 = Tesseract + LSTM.
3 = Default, based on what is available.
'--oem N'::
Specify OCR Engine mode. The options for *N* are:
*--tessdata-dir* 'PATH'::
Specify the location of the tessdata directory.
0 = Original Tesseract only.
1 = Neural nets LSTM only.
2 = Tesseract + LSTM.
3 = Default, based on what is available.
*--user-patterns* 'FILE'::
Specify the location of user patterns file.
'configfile'::
The name of a config to use. The name can be a file in tessdata/configs
or tessdata/tessconfigs, or an absolute or relative file path.
A config is a plain text file which contains a list of parameters and
their values, one per line, with a space separating parameter from value. +
Interesting config files include:
*--user-words* 'FILE'::
Specify the location of user words file.
* `alto` - Output in ALTO format ('outputbase'`.xml`).
* `hocr` - Output in hOCR format ('outputbase'`.hocr`).
* `pdf` - Output PDF ('outputbase'`.pdf`).
* `tsv` - Output TSV ('outputbase'`.tsv`).
* `txt` - Output plain text ('outputbase'`.txt`).
* `get.images` - Write processed input images to file (`tessinput.tif`).
* `logfile` - Redirect debug messages to file (`tesseract.log`).
* `lstm.train` - Output files used by LSTM training ('outputbase'`.lstmf`).
* `makebox` - Write box file ('outputbase'`.box`).
* `quiet` - Redirect debug messages to /dev/null.
[[CONFIGFILE]]
'CONFIGFILE'::
The name of a config to use. The name can be a file in `tessdata/configs`
or `tessdata/tessconfigs`, or an absolute or relative file path.
A config is a plain text file which contains a list of parameters and
their values, one per line, with a space separating parameter from value. +
Interesting config files include:
* *alto* -- Output in ALTO format ('OUTPUTBASE'`.xml`).
* *hocr* -- Output in hOCR format ('OUTPUTBASE'`.hocr`).
* *pdf* -- Output PDF ('OUTPUTBASE'`.pdf`).
* *tsv* -- Output TSV ('OUTPUTBASE'`.tsv`).
* *txt* -- Output plain text ('OUTPUTBASE'`.txt`).
* *get.images* -- Write processed input images to file (`tessinput.tif`).
* *logfile* -- Redirect debug messages to file (`tesseract.log`).
* *lstm.train* -- Output files used by LSTM training ('OUTPUTBASE'`.lstmf`).
* *makebox* -- Write box file ('OUTPUTBASE'`.box`).
* *quiet* -- Redirect debug messages to '/dev/null'.
It is possible to select several config files, for example
`tesseract image.png demo hocr pdf txt` will create three output files
`demo.hocr`, `demo.pdf` and `demo.txt` with the OCR results.
`tesseract image.png demo alto hocr pdf txt` will create four output files
`demo.xml`, `demo.hocr`, `demo.pdf` and `demo.txt` with the OCR results.
*Nota Bene:* The options `-l lang` and `--psm N` must occur
before any 'configfile'.
*Nota bene:* The options *-l* 'LANG', *-l* 'SCRIPT' and *--psm* 'N'
must occur before any 'CONFIGFILE'.
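The ordering rule above can be sketched in Python. The function
`build_tesseract_argv` is a hypothetical helper, not part of Tesseract; it
only assembles an argument list in which *-l* and *--psm* come before any
'CONFIGFILE', and it does not run the program.

```python
def build_tesseract_argv(file, outputbase, lang=None, psm=None, configs=()):
    """Assemble an argument list with -l and --psm placed before any CONFIGFILE."""
    argv = ["tesseract", file, outputbase]
    if lang is not None:
        argv += ["-l", lang]          # several languages are joined with "+"
    if psm is not None:
        argv += ["--psm", str(psm)]
    argv += list(configs)             # config names always go last
    return argv


argv = build_tesseract_argv(
    "image.png", "demo", lang="eng+deu", psm=6, configs=["hocr", "pdf", "txt"]
)
print(argv)
```

Such a list can be handed to a process launcher as-is, which avoids shell
quoting problems with file names that contain spaces.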
SINGLE OPTIONS
--------------
'-h, --help'::
Show help message.
*-h, --help*::
Show help message.
'--help-extra'::
Show extra help for advanced users.
*--help-extra*::
Show extra help for advanced users.
'--help-psm'::
Show page segmentation modes.
*--help-psm*::
Show page segmentation modes.
'--help-oem'::
Show OCR Engine modes.
*--help-oem*::
Show OCR Engine modes.
'-v, --version'::
Returns the current version of the tesseract(1) executable.
*-v, --version*::
Returns the current version of the tesseract(1) executable.
'--list-langs'::
List available languages for tesseract engine. Can be used with `--tessdata-dir`.
'--print-parameters'::
Print tesseract parameters.
*--list-langs*::
List available languages for tesseract engine.
Can be used with *--tessdata-dir* 'PATH'.
*--print-parameters*::
Print tesseract parameters.
[[LANGUAGES]]
LANGUAGES AND SCRIPTS
---------------------
To recognize some text with Tesseract, it is normally necessary to specify
the language(s) or script of the text (unless it is English text which is
supported by default) using `-l lang`.
the language(s) or script(s) of the text (unless it is English text which is
supported by default) using *-l* 'LANG' or *-l* 'SCRIPT'.
Selecting a language automatically also selects the language specific
character set and dictionary (word list).
@ -153,6 +164,9 @@ In most cases, a script also supports English.
So it is possible to recognize a language that has not been specifically
trained for by using traineddata for the script it is written in.
More than one language or script may be specified by using `+`.
Example: `tesseract myimage.png myimage -l eng+deu+fra`.
https://github.com/tesseract-ocr/tessdata_fast provides fast language and
script models which are also part of Linux distributions.
@ -174,16 +188,16 @@ following languages:
*cat* (Catalan; Valencian),
*ceb* (Cebuano),
*ces* (Czech),
*chi_sim* (Chinese - Simplified),
*chi_tra* (Chinese - Traditional),
*chi_sim* (Chinese simplified),
*chi_tra* (Chinese traditional),
*chr* (Cherokee),
*cym* (Welsh),
*dan* (Danish),
*deu* (German),
*dzo* (Dzongkha),
*ell* (Greek, Modern (1453-)),
*ell* (Greek, Modern, 1453-),
*eng* (English),
*enm* (English, Middle (1100-1500)),
*enm* (English, Middle, 1100-1500),
*epo* (Esperanto),
*equ* (Math / equation detection module),
*est* (Estonian),
@ -192,10 +206,10 @@ following languages:
*fin* (Finnish),
*fra* (French),
*frk* (Frankish),
*frm* (French, Middle (ca.1400-1600)),
*frm* (French, Middle, ca.1400-1600),
*gle* (Irish),
*glg* (Galician),
*grc* (Greek, Ancient (to 1453)),
*grc* (Greek, Ancient, to 1453),
*guj* (Gujarati),
*hat* (Haitian; Haitian Creole),
*heb* (Hebrew),
@ -215,9 +229,9 @@ following languages:
*kaz* (Kazakh),
*khm* (Central Khmer),
*kir* (Kirghiz; Kyrgyz),
*kmr* (Kurdish Kurmanji),
*kor* (Korean),
*kor_vert* (Korean (vertical)),
*kmr* (Kurdish (Kurmanji)),
*kor_vert* (Korean vertical),
*kur* (Kurdish),
*lao* (Lao),
*lat* (Latin),
@ -235,7 +249,7 @@ following languages:
*nep* (Nepali),
*nld* (Dutch; Flemish),
*nor* (Norwegian),
*oci* (Occitan (post 1500)),
*oci* (Occitan post 1500),
*ori* (Oriya),
*osd* (Orientation and script detection module),
*pan* (Panjabi; Punjabi),
@ -277,51 +291,51 @@ following languages:
*yid* (Yiddish),
*yor* (Yoruba)
To use a non-standard language pack named *foo.traineddata*, set the
*TESSDATA_PREFIX* environment variable so the file can be found at
*TESSDATA_PREFIX*/tessdata/*foo*.traineddata and give Tesseract the
argument `-l foo`.
To use a non-standard language pack named `foo.traineddata`, set the
`TESSDATA_PREFIX` environment variable so the file can be found at
`TESSDATA_PREFIX/tessdata/foo.traineddata` and give Tesseract the
argument *-l* `foo`.
For Tesseract 4, `tessdata_fast` includes traineddata files for the
following scripts:
Arabic,
Armenian,
Bengali,
Canadian Aboriginal,
Cherokee,
Cyrillic,
Devanagari,
Ethiopic,
Fraktur,
Georgian,
Greek,
Gujarati,
Gurmukhi,
Han - Simplified,
Han - Simplified (vertical),
Han - Traditional,
Han - Traditional (vertical),
Hangul,
Hangul (vertical),
Hebrew,
Japanese,
Japanese (vertical),
Kannada,
Khmer,
Lao,
Latin,
Malayalam,
Myanmar,
Oriya (Odia),
Sinhala,
Syriac,
Tamil,
Telugu,
Thaana,
Thai,
Tibetan,
Vietnamese.
*Arabic*,
*Armenian*,
*Bengali*,
*Canadian_Aboriginal*,
*Cherokee*,
*Cyrillic*,
*Devanagari*,
*Ethiopic*,
*Fraktur*,
*Georgian*,
*Greek*,
*Gujarati*,
*Gurmukhi*,
*HanS* (Han simplified),
*HanS_vert* (Han simplified, vertical),
*HanT* (Han traditional),
*HanT_vert* (Han traditional, vertical),
*Hangul*,
*Hangul_vert* (Hangul vertical),
*Hebrew*,
*Japanese*,
*Japanese_vert* (Japanese vertical),
*Kannada*,
*Khmer*,
*Lao*,
*Latin*,
*Malayalam*,
*Myanmar*,
*Oriya* (Odia),
*Sinhala*,
*Syriac*,
*Tamil*,
*Telugu*,
*Thaana*,
*Thai*,
*Tibetan*,
*Vietnamese*.
The same languages and scripts are available from
https://github.com/tesseract-ocr/tessdata_best.
@ -343,8 +357,8 @@ Tesseract config files consist of lines with parameter-value pairs (space
separated). The parameters are documented as flags in the source code like
the following one in tesseractclass.h:
STRING_VAR_H(tessedit_char_blacklist, "",
"Blacklist of chars not to recognize");
`STRING_VAR_H(tessedit_char_blacklist, "",
"Blacklist of chars not to recognize");`
These parameters may enable or disable various features of the engine, and
may cause it to load (or not load) various data. For instance, let's suppose
@ -352,10 +366,10 @@ you want to OCR in English, but suppress the normal dictionary and load an
alternative word list and an alternative list of patterns -- these two files
are the most commonly used extra data files.
If your language pack is in /path/to/eng.traineddata and the hocr config
is in /path/to/configs/hocr then create three new files:
If your language pack is in '/path/to/eng.traineddata' and the hocr config
is in '/path/to/configs/hocr' then create three new files:
/path/to/eng.user-words:
'/path/to/eng.user-words':
[verse]
the
quick
@ -363,25 +377,39 @@ brown
fox
jumped
/path/to/eng.user-patterns:
'/path/to/eng.user-patterns':
[verse]
1-\d\d\d-GOOG-411
www.\n\\\*.com
/path/to/configs/bazaar:
'/path/to/configs/bazaar':
[verse]
load_system_dawg F
load_freq_dawg F
user_words_suffix user-words
user_patterns_suffix user-patterns
Now, if you pass the word 'bazaar' as a 'configfile' to Tesseract,
Tesseract will not bother loading the system dictionary nor
the dictionary of frequent words and will load and use the eng.user-words
and eng.user-patterns files you provided. The former is a simple word list,
one per line. The format of the latter is documented in dict/trie.h
on read_pattern_list().
Now, if you pass the word 'bazaar' as a <<CONFIGFILE,'CONFIGFILE'>> to
Tesseract, Tesseract will not bother loading the system dictionary nor
the dictionary of frequent words and will load and use the 'eng.user-words'
and 'eng.user-patterns' files you provided. The former is a simple word list,
one per line. The format of the latter is documented in 'dict/trie.h'
on 'read_pattern_list()'.
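The config file layout used for 'bazaar' (one parameter per line, a space
between parameter and value) can be sketched in Python as follows. Both
helpers, `format_config` and `parse_config`, are hypothetical and simplified;
they are not Tesseract's own reader and only illustrate the line format
described above.

```python
def format_config(params):
    """Render parameter/value pairs, one per line, separated by a space."""
    return "".join(f"{name} {value}\n" for name, value in params.items())


def parse_config(text):
    """Split each non-empty line at the first whitespace into parameter and value."""
    params = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


# The same four settings as in the 'bazaar' example.
bazaar = {
    "load_system_dawg": "F",
    "load_freq_dawg": "F",
    "user_words_suffix": "user-words",
    "user_patterns_suffix": "user-patterns",
}
text = format_config(bazaar)
print(text, end="")
```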
ENVIRONMENT VARIABLES
---------------------
*`TESSDATA_PREFIX`*::
If `TESSDATA_PREFIX` is set to a path, then that path is used to
find the `tessdata` directory with language and script recognition
models and config files.
Using <<TESSDATADIR,*--tessdata-dir* 'PATH'>> is the recommended alternative.
*`OMP_THREAD_LIMIT`*::
If the `tesseract` executable was built with multithreading support,
it will normally use four CPU cores for the OCR process. While this
can be faster for a single image, it performs poorly if the host
computer provides fewer than four CPU cores or if OCR is run on many images.
Only a single CPU core is used with `OMP_THREAD_LIMIT=1`.
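When many images are processed by separate `tesseract` processes, the
variable can be set per process. The Python sketch below only prepares such
an environment; `single_thread_env` is a hypothetical helper and no
`tesseract` process is started here.

```python
import os


def single_thread_env(base_env=None):
    """Return a copy of the environment with OMP_THREAD_LIMIT set to 1."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_THREAD_LIMIT"] = "1"
    return env


env = single_thread_env({"PATH": "/usr/bin"})
print(env["OMP_THREAD_LIMIT"])
# The dictionary could then be passed as env= to subprocess.run() for each
# tesseract process when several images are handled in parallel.
```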
HISTORY
@ -391,7 +419,7 @@ Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some more
changes made in 1996 to port to Windows, and some $$C++$$izing in 1998. A
lot of the code was written in C, and then some more was written in $$C++$$.
The $$C++$$ code makes heavy use of a list system using macros. This predates
stl, was portable before stl, and is more efficient than stl lists, but has
STL, was portable before STL, and is more efficient than STL lists, but has
the big negative that if you do get a segmentation violation, it is hard to
debug.
@ -399,7 +427,8 @@ Version 2.00 brought Unicode (UTF-8) support, six languages, and the ability
to train Tesseract.
Tesseract was included in UNLV's Fourth Annual Test of OCR Accuracy.
See <https://github.com/tesseract-ocr/docs/blob/master/AT-1995.pdf>. With Tesseract 2.00,
See <https://github.com/tesseract-ocr/docs/blob/master/AT-1995.pdf>.
Since Tesseract 2.00,
scripts are now included to allow anyone to reproduce some of these tests.
See <https://github.com/tesseract-ocr/tesseract/wiki/TestingTesseract> for more
details.

@ -4,11 +4,6 @@
* File: blobs.cpp (Formerly blobs.c)
* Description: Blob definition
* Author: Mark Seaman, OCR Technology
* Created: Fri Oct 27 15:39:52 1989
* Modified: Thu Mar 28 15:33:26 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Experimental (Do Not Distribute)
*
* (c) Copyright 1989, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -1,14 +1,9 @@
/* -*-C-*-
********************************************************************************
*
* File: blobs.h (Formerly blobs.h)
* Description: Blob definition
* Author: Mark Seaman, OCR Technology
* Created: Fri Oct 27 15:39:52 1989
* Modified: Thu Mar 28 15:33:38 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Experimental (Do Not Distribute)
* File: blobs.h
* Description: Blob definition
* Author: Mark Seaman, OCR Technology
*
* (c) Copyright 1989, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -4,11 +4,6 @@
* File: matrix.cpp (Formerly matrix.c)
* Description: Ratings matrix code. (Used by associator)
* Author: Mark Seaman, OCR Technology
* Created: Wed May 16 13:18:47 1990
* Modified: Wed Mar 20 09:44:47 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Experimental (Do Not Distribute)
*
* (c) Copyright 1990, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -2,13 +2,7 @@
********************************************************************************
*
* File: seam.cpp (Formerly seam.c)
* Description:
* Author: Mark Seaman, OCR Technology
* Created: Fri Oct 16 14:37:00 1987
* Modified: Fri May 17 16:30:13 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Reusable Software Component
*
* (c) Copyright 1987, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -1,14 +1,8 @@
/* -*-C-*-
********************************************************************************
*
* File: seam.h (Formerly seam.h)
* Description:
* Author: Mark Seaman, SW Productivity
* Created: Fri Oct 16 14:37:00 1987
* Modified: Thu May 16 17:05:52 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Reusable Software Component
* File: seam.h
* Author: Mark Seaman, SW Productivity
*
* (c) Copyright 1987, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -2,13 +2,7 @@
********************************************************************************
*
* File: split.cpp (Formerly split.c)
* Description:
* Author: Mark Seaman, OCR Technology
* Created: Fri Oct 16 14:37:00 1987
* Modified: Fri May 17 16:27:49 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Reusable Software Component
*
* (c) Copyright 1987, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -4,11 +4,6 @@
* File: vecfuncs.cpp (Formerly vecfuncs.c)
* Description: Blob definition
* Author: Mark Seaman, OCR Technology
* Created: Fri Oct 27 15:39:52 1989
* Modified: Tue Jul 9 17:44:12 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Experimental (Do Not Distribute)
*
* (c) Copyright 1989, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -1,14 +1,9 @@
/* -*-C-*-
********************************************************************************
*
* File: vecfuncs.h (Formerly vecfuncs.h)
* Description: Vector calculations
* Author: Mark Seaman, OCR Technology
* Created: Wed Dec 20 09:37:18 1989
* Modified: Tue Jul 9 17:44:37 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Experimental (Do Not Distribute)
* File: vecfuncs.h
* Description: Vector calculations
* Author: Mark Seaman, OCR Technology
*
* (c) Copyright 1989, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -2,7 +2,6 @@
// File: genericvector.h
// Description: Generic vector class
// Author: Daria Antonova
// Created: Mon Jun 23 11:26:43 PDT 2008
//
// (C) Copyright 2007, Google Inc.
// Licensed under the Apache License, Version 2.0 (the "License");

@ -4,10 +4,6 @@
* File: helpers.h
* Description: General utility functions
* Author: Daria Antonova
* Created: Wed Apr 8 14:37:00 2009
* Language: C++
* Package: N/A
* Status: Reusable Software Component
*
* (c) Copyright 2009, Google Inc.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -4,11 +4,6 @@
* File: cutil.h
* Description: General utility functions
* Author: Mark Seaman, SW Productivity
* Created: Fri Oct 16 14:37:00 1987
* Modified: Wed Dec 5 15:40:26 1990 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Reusable Software Component
*
* (c) Copyright 1987, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -4,11 +4,6 @@
* File: structures.cpp (Formerly structures.c)
* Description: Allocate all the different types of structures.
* Author: Mark Seaman, OCR Technology
* Created: Wed May 30 10:27:26 1990
* Modified: Mon Jul 15 10:39:18 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Experimental (Do Not Distribute)
*
* (c) Copyright 1990, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -4,11 +4,6 @@
* File: dawg.cpp (Formerly dawg.c)
* Description: Use a Directed Acyclic Word Graph
* Author: Mark Seaman, OCR Technology
* Created: Fri Oct 16 14:37:00 1987
* Modified: Wed Jul 24 16:59:16 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Reusable Software Component
*
* (c) Copyright 1987, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -5,11 +5,6 @@
* Description: Definition of a class that represents Directed Acyclic Word
* Graph (DAWG), functions to build and manipulate the DAWG.
* Author: Mark Seaman, SW Productivity
* Created: Fri Oct 16 14:37:00 1987
* Modified: Wed Jun 19 16:50:24 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Reusable Software Component
*
* (c) Copyright 1987, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -4,11 +4,6 @@
* File: permdawg.cpp (Formerly permdawg.c)
* Description: Scale word choices by a dictionary
* Author: Mark Seaman, OCR Technology
* Created: Fri Oct 16 14:37:00 1987
* Modified: Tue Jul 9 15:43:18 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Reusable Software Component
*
* (c) Copyright 1987, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -4,11 +4,6 @@
* File: trie.cpp (Formerly trie.c)
* Description: Functions to build a trie data structure.
* Author: Mark Seaman, OCR Technology
* Created: Fri Oct 16 14:37:00 1987
* Modified: Fri Jul 26 12:18:10 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Reusable Software Component
*
* (c) Copyright 1987, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -1,14 +1,9 @@
/* -*-C-*-
********************************************************************************
*
* File: trie.h (Formerly trie.h)
* Description: Functions to build a trie data structure.
* Author: Mark Seaman, SW Productivity
* Created: Fri Oct 16 14:37:00 1987
* Modified: Fri Jul 26 11:26:34 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Reusable Software Component
* File: trie.h
* Description: Functions to build a trie data structure.
* Author: Mark Seaman, SW Productivity
*
* (c) Copyright 1987, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -3,7 +3,6 @@
// Description: Beam search to decode from the re-encoded CJK as a sequence of
// smaller numbers in place of a single large code.
// Author: Ray Smith
// Created: Fri Mar 13 09:39:01 PDT 2015
//
// (C) Copyright 2015, Google Inc.
// Licensed under the Apache License, Version 2.0 (the "License");

@ -1,13 +1,8 @@
/******************************************************************************
*
* File: blkocc.h (Formerly blockocc.h)
* File: blkocc.h (Formerly blockocc.h)
* Description: Block Occupancy routines
* Author: Chris Newton
* Created: Fri Nov 8
* Modified:
* Language: C++
* Package: N/A
* Status: Experimental (Do Not Distribute)
*
* (c) Copyright 1991, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -5,7 +5,6 @@
// that were found in the dictionary followed by the words
// that are ambiguous to them.
// Author: Rika Antonova
// Created: Fri Oct 21 11:26:43 PDT 2011
//
// (C) Copyright 2011, Google Inc.
// Licensed under the Apache License, Version 2.0 (the "License");

@ -2,7 +2,6 @@
// File: wordlist2dawg.cpp
// Description: Program to generate a DAWG from a word list file
// Author: Thomas Kielbus
// Created: Thu May 10 18:11:42 PDT 2007
//
// (C) Copyright 2006, Google Inc.
// Licensed under the Apache License, Version 2.0 (the "License");

@ -2,13 +2,7 @@
******************************************************************************
*
* File: chop.cpp (Formerly chop.c)
* Description:
* Author: Mark Seaman, OCR Technology
* Created: Fri Oct 16 14:37:00 1987
* Modified: Tue Jul 30 16:41:11 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Reusable Software Component
* Author: Mark Seaman, OCR Technology
*
* (c) Copyright 1987, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -1,14 +1,8 @@
/* -*-C-*-
********************************************************************************
*
* File: chop.h (Formerly chop.h)
* Description:
* Author: Mark Seaman, SW Productivity
* Created: Fri Oct 16 14:37:00 1987
* Modified: Wed Jul 10 14:47:37 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Reusable Software Component
* File: chop.h
* Author: Mark Seaman, SW Productivity
*
* (c) Copyright 1987, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -2,13 +2,7 @@
********************************************************************************
*
* File: findseam.cpp (Formerly findseam.c)
* Description:
* Author: Mark Seaman, OCR Technology
* Created: Fri Oct 16 14:37:00 1987
* Modified: Tue Jul 30 15:44:59 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Reusable Software Component
*
* (c) Copyright 1987, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -1,14 +1,9 @@
/* -*-C-*-
********************************************************************************
*
* File: findseam.h (Formerly findseam.h)
* File: findseam.h
* Description:
* Author: Mark Seaman, SW Productivity
* Created: Fri Oct 16 14:37:00 1987
* Modified: Thu May 16 17:05:17 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Reusable Software Component
*
* (c) Copyright 1987, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -4,11 +4,6 @@
* File: gradechop.cpp (Formerly gradechop.c)
* Description:
* Author: Mark Seaman, OCR Technology
* Created: Fri Oct 16 14:37:00 1987
* Modified: Tue Jul 30 16:06:27 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Reusable Software Component
*
* (c) Copyright 1987, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -1,14 +1,9 @@
/* -*-C-*-
********************************************************************************
*
* File: measure.h (Formerly measure.h)
* File: measure.h
* Description: Statistics for a group of single measurements
* Author: Mark Seaman, SW Productivity
* Created: Fri Oct 16 14:37:00 1987
* Modified: Mon Apr 8 09:42:28 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Reusable Software Component
*
* (c) Copyright 1987, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -4,11 +4,6 @@
* File: outlines.cpp (Formerly outlines.c)
* Description: Combinatorial Splitter
* Author: Mark Seaman, OCR Technology
* Created: Thu Jul 27 08:59:01 1989
* Modified: Wed Jul 10 14:56:49 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Experimental (Do Not Distribute)
*
* (c) Copyright 1989, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -4,11 +4,6 @@
* File: outlines.h
* Description: Combinatorial Splitter
* Author: Mark Seaman, OCR Technology
* Created: Thu Jul 27 11:27:55 1989
* Modified: Wed May 15 17:28:47 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Experimental (Do Not Distribute)
*
* (c) Copyright 1989, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -4,11 +4,6 @@
* File: pieces.cpp (Formerly pieces.c)
* Description:
* Author: Mark Seaman, OCR Technology
* Created: Fri Oct 16 14:37:00 1987
* Modified: Mon May 20 12:12:35 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Reusable Software Component
*
* (c) Copyright 1987, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -4,11 +4,6 @@
* File: plotedges.cpp (Formerly plotedges.c)
* Description: Graphics routines for "Edges" and "Outlines" windows
* Author: Mark Seaman, OCR Technology
* Created: Fri Jul 28 13:14:48 1989
* Modified: Tue Jul 9 17:22:22 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Experimental (Do Not Distribute)
*
* (c) Copyright 1989, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -4,11 +4,6 @@
* File: plotedges.h
* Description: Convert the various data type into line lists
* Author: Mark Seaman, OCR Technology
* Created: Fri Jul 28 13:14:48 1989
* Modified: Mon May 13 09:34:51 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Experimental (Do Not Distribute)
*
* (c) Copyright 1989, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -4,11 +4,6 @@
* File: render.cpp (Formerly render.c)
* Description: Convert the various data type into line lists
* Author: Mark Seaman, OCR Technology
* Created: Fri Jul 28 13:14:48 1989
* Modified: Mon Jul 15 10:23:37 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Experimental (Do Not Distribute)
*
* (c) Copyright 1989, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -1,14 +1,9 @@
/* -*-C-*-
********************************************************************************
*
* File: render.h (Formerly render.h)
* File: render.h
* Description: Convert the various data type into line lists
* Author: Mark Seaman, OCR Technology
* Created: Fri Jul 28 13:14:48 1989
* Modified: Fri Apr 26 09:59:45 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Experimental (Do Not Distribute)
*
* (c) Copyright 1989, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");

@ -2,7 +2,6 @@
// File: segsearch.cpp
// Description: Segmentation search functions.
// Author: Daria Antonova
// Created: Mon Jun 23 11:26:43 PDT 2008
//
// (C) Copyright 2009, Google Inc.
// Licensed under the Apache License, Version 2.0 (the "License");