Mirror of https://github.com/tesseract-ocr/tesseract.git (synced 2024-11-24 02:59:07 +08:00)
Minor edits to Readme
parent f8ebff262e
commit a36a5f96d0
README.md (66 changed lines)
@@ -1,32 +1,35 @@
Note that this is a text-only and possibly out-of-date version of the
wiki ReadMe, which is located at:

-https://github.com/tesseract-ocr/tesseract/blob/master/README
+https://github.com/tesseract-ocr/tesseract/blob/master/README.md

Introduction
============

This package contains the Tesseract Open Source OCR Engine.
-Originally developed at Hewlett Packard Laboratories Bristol and
-at Hewlett Packard Co, Greeley Colorado, all the code
+Originally developed at Hewlett-Packard Laboratories Bristol and
+at Hewlett-Packard Co, Greeley Colorado, all the code
in this distribution is now licensed under the Apache License:

-* Licensed under the Apache License, Version 2.0 (the "License");
-* you may not use this file except in compliance with the License.
-* You may obtain a copy of the License at
-* http://www.apache.org/licenses/LICENSE-2.0
-* Unless required by applicable law or agreed to in writing, software
-* distributed under the License is distributed on an "AS IS" BASIS,
-* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-* See the License for the specific language governing permissions and
-* limitations under the License.
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.

+

Dependencies and Licenses
=========================

-Leptonica is required. (www.leptonica.com). Tesseract no longer compiles
-without Leptonica.
+[Leptonica](http://www.leptonica.com) is required. Tesseract no longer
+compiles without Leptonica.

Libtiff is no longer required as a direct dependency.

@@ -34,15 +37,16 @@ Installing and Running Tesseract
--------------------------------

All Users Do NOT Ignore!

The tarballs are split into pieces.

tesseract-x.xx.tar.gz contains all the source code.

-tesseract-x.xx.<lang>.tar.gz contains the language data files for <lang>.
+tesseract-x.xx.`<lang>`.tar.gz contains the language data files for `<lang>`.
You need at least one of these or Tesseract will not work.

Note that tesseract-x.xx.tar.gz unpacks to the tesseract-ocr directory.
-tesseract-x.xx.<lang>.tar.gz unpacks to the tessdata directory which
+tesseract-x.xx.`<lang>`.tar.gz unpacks to the tessdata directory which
belongs inside your tesseract-ocr directory. It is therefore best to
download them into your tesseract-x.xx directory, so you can use unpack
here or equivalent. You can unpack as many of the language packs as you
@@ -52,7 +56,7 @@ before you run make install. If you unpack them as root to the
destination directory of make install, then the user ids and access
permissions might be messed up.

-boxtiff-2.xx.<lang>.tar.gz contains data that was used in training for
+boxtiff-2.xx.`<lang>`.tar.gz contains data that was used in training for
those that want to do their own training. Most users should NOT download
these files.

@@ -63,8 +67,8 @@ Tesseract wiki https://github.com/tesseract-ocr/tesseract/wiki
Windows
-------

-Please use installer (for 3.00 and above). Tesseract is library with
-command line interface. If you need GUI, please check AddOns wiki page
+Please use the installer (for 3.00 and above). Tesseract is a library with a
+command line interface. If you need a GUI, please check the AddOns wiki page.

TODO-UPDATE-WIKI-LINKS

@@ -74,7 +78,7 @@ If you are building from the sources, the recommended build platform is
VC++ Express 2008 (optionally 2010).

The executables are built with static linking, so they stand more chance
-of working out of the box on more windows systems.
+of working out of the box on more Windows systems.

The executable must reside in the same directory as the tessdata
directory or you need to set up environment variable TESSDATA_PREFIX.
@@ -82,7 +86,7 @@ Installer will set it up for you.

The command line is:

-tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfiles...]
+tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfiles...]

If you need interface to other applications, please check wrapper section
on AddOns wiki page:
@@ -98,19 +102,19 @@ Non-Windows (or Cygwin)
You have to tell Tesseract through a standard unix mechanism where to
find its data directory. You must either:

-./autogen.sh
-./configure
-make
-make install
-sudo ldconfig
+./autogen.sh
+./configure
+make
+make install
+sudo ldconfig

to move the data files to the standard place, or:

-export TESSDATA_PREFIX="directory in which your tessdata resides/"
+export TESSDATA_PREFIX="directory in which your tessdata resides/"

In either case the command line is:

-tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfiles...]
+tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfiles...]

New there is a tesseract.spec for making rpms. (Thanks to Andrew Ziem for
the help.) It might work with your OS if you know how to do that.
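The Non-Windows hunk above describes two ways of letting tesseract find its language data: install it to the standard place, or point TESSDATA_PREFIX at it. As an editorial sketch (not part of Tesseract or of this commit), the POSIX shell functions below check that `<lang>.traineddata` exists before running the quoted command line. They assume the 3.x convention suggested by the README's example, where TESSDATA_PREFIX names the directory that *contains* `tessdata/` (hence the trailing slash), and they assume `/usr/local/share/` as the fallback location after a default `make install`; later releases interpret TESSDATA_PREFIX differently, so check the documentation for your version. The names `tessdata_file` and `run_ocr` are hypothetical.

```shell
#!/bin/sh
# Editorial sketch, not part of Tesseract; helper names are hypothetical.
# Assumption: 3.x layout, where TESSDATA_PREFIX is the directory that
# contains tessdata/, and a default `make install` puts the data under
# /usr/local/share/tessdata (used here as the fallback).

# Print the path where the data file for language $1 is expected.
tessdata_file() {
    prefix="${TESSDATA_PREFIX:-/usr/local/share/}"
    # The README's example ends the prefix with "/"; add it if missing.
    case "$prefix" in
        */) ;;
        *) prefix="$prefix/" ;;
    esac
    printf '%stessdata/%s.traineddata\n' "$prefix" "$1"
}

# Run: tesseract imagename outputbase -l lang
# Returns 1 without calling tesseract if the language data is missing.
run_ocr() {
    image="$1"
    outbase="$2"
    lang="${3:-eng}"
    data="$(tessdata_file "$lang")"
    if [ ! -f "$data" ]; then
        echo "language data not found: $data" >&2
        return 1
    fi
    tesseract "$image" "$outbase" -l "$lang"
}
```

For example, `run_ocr scan.tif scan eng` checks for `eng.traineddata` and then runs `tesseract scan.tif scan -l eng`, which writes the recognized text to `scan.txt`. Checking the data file first turns a missing language pack into an explicit error message naming the path that was searched.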
@@ -126,8 +130,8 @@ instead of `./configure` above.

History
=======
-The engine was developed at Hewlett Packard Laboratories Bristol and
-at Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some
+The engine was developed at Hewlett-Packard Laboratories Bristol and
+at Hewlett-Packard Co, Greeley Colorado between 1985 and 1994, with some
more changes made in 1996 to port to Windows, and some C++izing in 1998.
A lot of the code was written in C, and then some more was written in C++.
Since then all the code has been converted to at least compile with a C++
@@ -138,7 +142,7 @@ lists, but has the big negative that if you do get a segmentation violation,
it is hard to debug.

The most recent change is that Tesseract can now recognize 39 languages,
-including Arabic, Hindi, Vietnamese, plus 3 Fraktur variants
+including Arabic, Hindi, Vietnamese, plus 3 Fraktur variants,
is fully UTF8 capable, and is fully trainable. See TrainingTesseract for
more information on training.
