diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index b28e57e7..d2e3e678 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -9,8 +9,8 @@ If you think you found a bug in Tesseract, please create an issue.
 Use the [users mailing-list](https://groups.google.com/d/forum/tesseract-ocr) instead of creating an Issue if ...
 * You have problems using Tesseract and need some help.
 * You have problems installing the software.
-* You are not satisfied with the accuracy of the OCR, and want to ask how you can improve it. Note: You should first read the [ImproveQuality](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality) wiki page.
-* You are trying to train Tesseract and you have a problem and/or want to ask a question about the training process. Note: You should first read the **official** guides [[1]](https://github.com/tesseract-ocr/tesseract/wiki) or [[2]](https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract) found in the project wiki.
+* You are not satisfied with the accuracy of the OCR, and want to ask how you can improve it. Note: You should first read the [ImproveQuality](https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html) documentation.
+* You are trying to train Tesseract and you have a problem and/or want to ask a question about the training process. Note: You should first read the **official** guides [[1]](https://tesseract-ocr.github.io/tessdoc/) or [[2]](https://tesseract-ocr.github.io/tessdoc/Training-Tesseract.html) found in the project documentation.
 * You have a general question.
 
 An issue should only be reported if the platform you are using is one of these:
@@ -26,7 +26,7 @@ Search through open and closed issues to see if similar issue has been reported
 
 Similarly, before you post your question in the forum, search through past threads to see if similar question has been asked already.
 
-Read the [wiki](https://github.com/tesseract-ocr/tesseract/wiki) before you report your issue or ask a question in the forum.
+Read the [documentation](https://tesseract-ocr.github.io/tessdoc/) before you report your issue or ask a question in the forum.
 
 Only report an issue in the latest official release. Optionally, try to check if the issue is not already solved in the latest snapshot in the git repository.
 
@@ -71,7 +71,7 @@ You should always make sure your changes build and run successfully.
 
 For that, your clone needs to have all submodules (`abseil`, `googletest`, `test`) included. To do so, either specify `--recurse-submodules` during the initial clone, or run `git submodule update --init --recursive NAME` for each `NAME` later. If `configure` already created those directories (blocking the clone), remove them first (or `make distclean`), then clone and reconfigure.
 
-Have a look at [the README](./README.md) and [testing README](./test/testing/README.md) and the [wiki page](https://github.com/tesseract-ocr/tesseract/wiki/Compiling-%E2%80%93-GitInstallation#unit-test-builds) on installation.
+Have a look at [the README](./README.md) and [testing README](./test/testing/README.md) and the [documentation](https://tesseract-ocr.github.io/tessdoc/Compiling-%E2%80%93-GitInstallation.html#unit-test-builds) on installation.
 
 In short, after running `configure` from the build directory of your choice, to build the library and CLI, run `make`. To test it, run `make check`. To build the training tools, run `make training`.
 
diff --git a/ChangeLog b/ChangeLog
index c6c3a5eb..df755184 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -30,7 +30,7 @@
   * Enabled OpenMP support.
   * Parameter unlv_tilde_crunching change to false.
   * Miscellaneous Fixes.
-  * Detailed Changelog can be found at https://github.com/tesseract-ocr/tesseract/wiki/4.0x-Changelog and https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes#tesseract-release-notes-oct-29-2018---v400
+  * Detailed Changelog can be found at https://tesseract-ocr.github.io/tessdoc/4.0x-Changelog.html and https://tesseract-ocr.github.io/tessdoc/ReleaseNotes.html#tesseract-release-notes-oct-29-2018---v400
 
 2017-02-16 - V3.05.00
   * Made some fine tuning to the hOCR output.
diff --git a/INSTALL.GIT.md b/INSTALL.GIT.md
index 50c98075..dc01004c 100644
--- a/INSTALL.GIT.md
+++ b/INSTALL.GIT.md
@@ -48,7 +48,7 @@ Just run:
 
     $ make ScrollView.jar
 
-and follow the instruction on [Viewer Debugging wiki](https://github.com/tesseract-ocr/tesseract/wiki/ViewerDebugging).
+and follow the instruction on [Viewer Debugging](https://tesseract-ocr.github.io/tessdoc/ViewerDebugging.html).
 
 # CMAKE
 
@@ -64,4 +64,4 @@ There is alternative build system based on multiplatform [cmake](https://cmake.o
 
 ## WINDOWS
 
-See [Wiki](https://github.com/tesseract-ocr/tesseract/wiki) for more information on this.
+See the [documentation](https://tesseract-ocr.github.io/tessdoc/) for more information on this.
diff --git a/README.md b/README.md
index 0e5c1c19..bf4a1c18 100644
--- a/README.md
+++ b/README.md
@@ -29,11 +29,14 @@ Tesseract has **unicode (UTF-8) support**, and can **recognize more than 100 lan
 
 Tesseract supports **various output formats**: plain text, hOCR (HTML), PDF, invisible-text-only PDF, TSV. The master branch also has experimental support for ALTO (XML) output.
 
-You should note that in many cases, in order to get better OCR results, you'll need to **[improve the quality](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality) of the image** you are giving Tesseract.
+You should note that in many cases, in order to get better OCR results,
+you'll need to **[improve the quality](https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html) of the image** you are giving Tesseract.
 
-This project **does not include a GUI application**. If you need one, please see the [3rdParty](https://github.com/tesseract-ocr/tesseract/wiki/User-Projects-%E2%80%93-3rdParty) wiki page.
+This project **does not include a GUI application**.
+If you need one, please see the [3rdParty](https://tesseract-ocr.github.io/tessdoc/User-Projects-%E2%80%93-3rdParty.html) documentation.
 
-Tesseract **can be trained to recognize other languages**. See [Tesseract Training](https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract) for more information.
+Tesseract **can be trained to recognize other languages**.
+See [Tesseract Training](https://tesseract-ocr.github.io/tessdoc/Training-Tesseract.html) for more information.
 
 ## Brief history
 
@@ -45,15 +48,18 @@ In 2005 Tesseract was open sourced by HP. Since 2006 it is developed by Google.
 The latest (LSTM based) stable version is **[4.1.1](https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.1)**, released on December 26, 2019.
 Latest source code is available from [master branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/master).
 Open issues can be found in [issue tracker](https://github.com/tesseract-ocr/tesseract/issues),
-and [Planning wiki](https://github.com/tesseract-ocr/tesseract/wiki/Planning).
+and [planning documentation](https://tesseract-ocr.github.io/tessdoc/Planning.html).
 
-The latest 3.0x version is **[3.05.02](https://github.com/tesseract-ocr/tesseract/releases/tag/3.05.02)**, released on June 19, 2018. Latest source code for 3.05 is available from [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05). There is no development for this version, but it can be used for special cases (e.g. see [Regression of features from 3.0x](https://github.com/tesseract-ocr/tesseract/wiki/Planning#regression-of-features-from-30x)).
+The latest 3.0x version is **[3.05.02](https://github.com/tesseract-ocr/tesseract/releases/tag/3.05.02)**, released on June 19, 2018. Latest source code for 3.05 is available from [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05).
+There is no development for this version, but it can be used for special cases (e.g. see [Regression of features from 3.0x](https://tesseract-ocr.github.io/tessdoc/Planning.html#regression-of-features-from-30x)).
 
-See **[Release Notes](https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes)** and **[Change Log](https://github.com/tesseract-ocr/tesseract/blob/master/ChangeLog)** for more details of the releases.
+See **[Release Notes](https://tesseract-ocr.github.io/tessdoc/ReleaseNotes.html)**
+and **[Change Log](https://github.com/tesseract-ocr/tesseract/blob/master/ChangeLog)** for more details of the releases.
 
 ## Installing Tesseract
 
-You can either [Install Tesseract via pre-built binary package](https://github.com/tesseract-ocr/tesseract/wiki) or [build it from source](https://github.com/tesseract-ocr/tesseract/wiki/Compiling).
+You can either [Install Tesseract via pre-built binary package](https://tesseract-ocr.github.io/tessdoc/)
+or [build it from source](https://tesseract-ocr.github.io/tessdoc/Compiling.html).
 
 Supported Compilers are:
 
@@ -65,20 +71,20 @@ Other compilers might work, but are not officially supported.
 
 ## Running Tesseract
 
-Basic **[command line usage](https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage)**:
+Basic **[command line usage](https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html)**:
 
     tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]
 
 For more information about the various command line options use `tesseract --help` or `man tesseract`.
 
-Examples can be found in the [wiki](https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#simplest-invocation-to-ocr-an-image).
+Examples can be found in the [documentation](https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html#simplest-invocation-to-ocr-an-image).
 
 ## For developers
 
 Developers can use `libtesseract` [C](https://github.com/tesseract-ocr/tesseract/blob/master/include/tesseract/capi.h) or
 [C++](https://github.com/tesseract-ocr/tesseract/blob/master/include/tesseract/baseapi.h) API to build their own application.
 If you need bindings to `libtesseract` for other programming languages, please see the
-[wrapper](https://github.com/tesseract-ocr/tesseract/wiki/AddOns#tesseract-wrappers) section on AddOns wiki page.
+[wrapper](https://tesseract-ocr.github.io/tessdoc/AddOns.html#tesseract-wrappers) section in the AddOns documentation.
 
 Documentation of Tesseract generated from source code by doxygen can be found on [tesseract-ocr.github.io](https://tesseract-ocr.github.io/).
 
@@ -86,7 +92,9 @@ Documentation of Tesseract generated from source code by doxygen can be found on
 
 Before you submit an issue, please review **[the guidelines for this repository](https://github.com/tesseract-ocr/tesseract/blob/master/CONTRIBUTING.md)**.
 
-For support, first read the [Wiki](https://github.com/tesseract-ocr/tesseract/wiki), particularly the [FAQ](https://github.com/tesseract-ocr/tesseract/wiki/FAQ) to see if your problem is addressed there. If not, search the [Tesseract user forum](https://groups.google.com/d/forum/tesseract-ocr), the [Tesseract developer forum](https://groups.google.com/d/forum/tesseract-dev) and [past issues](https://github.com/tesseract-ocr/tesseract/issues), and if you still can't find what you need, ask for support in the mailing-lists.
+For support, first read the [documentation](https://tesseract-ocr.github.io/tessdoc/),
+particularly the [FAQ](https://tesseract-ocr.github.io/tessdoc/FAQ.html) to see if your problem is addressed there.
+If not, search the [Tesseract user forum](https://groups.google.com/d/forum/tesseract-ocr), the [Tesseract developer forum](https://groups.google.com/d/forum/tesseract-dev) and [past issues](https://github.com/tesseract-ocr/tesseract/issues), and if you still can't find what you need, ask for support in the mailing-lists.
 
 Mailing-lists:
 * [tesseract-ocr](https://groups.google.com/d/forum/tesseract-ocr) - For tesseract users.
diff --git a/doc/cntraining.1.asc b/doc/cntraining.1.asc
index ef98112e..0bbeb95f 100644
--- a/doc/cntraining.1.asc
+++ b/doc/cntraining.1.asc
@@ -24,7 +24,7 @@ SEE ALSO
 --------
 tesseract(1), shapeclustering(1), mftraining(1)
 
-
+
 
 COPYING
 -------
diff --git a/doc/combine_lang_model.1.asc b/doc/combine_lang_model.1.asc
index cbe4b925..fc64b219 100644
--- a/doc/combine_lang_model.1.asc
+++ b/doc/combine_lang_model.1.asc
@@ -54,7 +54,7 @@ combine_lang_model(1) was first made available for tesseract4.00.00alpha.
 RESOURCES
 ---------
 Main web site: +
-Information on training tesseract LSTM:
+Information on training tesseract LSTM:
 
 SEE ALSO
 --------
diff --git a/doc/combine_tessdata.1.asc b/doc/combine_tessdata.1.asc
index 04d2487f..8bdefd04 100644
--- a/doc/combine_tessdata.1.asc
+++ b/doc/combine_tessdata.1.asc
@@ -83,9 +83,9 @@ COMPONENTS
 ----------
 The components in a Tesseract lang.traineddata file as of Tesseract 4.0 are
 briefly described below; For more information on many of these files, see
-
+
 and
-
+
 
 lang.config::
 (Optional) Language-specific overrides to default config variables.
diff --git a/doc/dawg2wordlist.1.asc b/doc/dawg2wordlist.1.asc
index 93594d61..cbe18d89 100644
--- a/doc/dawg2wordlist.1.asc
+++ b/doc/dawg2wordlist.1.asc
@@ -32,7 +32,7 @@ SEE ALSO
 tesseract(1), mftraining(1), wordlist2dawg(1), unicharset(5),
 combine_tessdata(1)
 
-
+
 
 COPYING
 -------
diff --git a/doc/lstmeval.1.asc b/doc/lstmeval.1.asc
index 202d2401..94c9dcb6 100644
--- a/doc/lstmeval.1.asc
+++ b/doc/lstmeval.1.asc
@@ -38,7 +38,7 @@ lstmeval(1) was first made available for tesseract4.00.00alpha.
 RESOURCES
 ---------
 Main web site: +
-Information on training tesseract LSTM:
+Information on training tesseract LSTM:
 
 SEE ALSO
 --------
diff --git a/doc/lstmtraining.1.asc b/doc/lstmtraining.1.asc
index ea47e31e..a5e77678 100644
--- a/doc/lstmtraining.1.asc
+++ b/doc/lstmtraining.1.asc
@@ -19,7 +19,8 @@ SYNOPSIS
 
 DESCRIPTION
 -----------
-lstmtraining(1) trains LSTM-based networks using a list of lstmf files and starter traineddata file as the main input. Training from scratch is not recommended to be done by users. Finetuning (example command shown in synopsis above) or replacing a layer options can be used instead. Different options apply to different types of training. Read [Training Wiki page](https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00) for details.
+lstmtraining(1) trains LSTM-based networks using a list of lstmf files and starter traineddata file as the main input. Training from scratch is not recommended to be done by users. Finetuning (example command shown in synopsis above) or replacing a layer options can be used instead. Different options apply to different types of training.
+Read the [training documentation](https://tesseract-ocr.github.io/tessdoc/TrainingTesseract-4.00.html) for details.
 
 OPTIONS
 -------
@@ -100,7 +101,7 @@ lstmtraining(1) was first made available for tesseract4.00.00alpha.
 RESOURCES
 ---------
 Main web site: +
-Information on training tesseract LSTM:
+Information on training tesseract LSTM:
 
 SEE ALSO
 --------
diff --git a/doc/merge_unicharsets.1.asc b/doc/merge_unicharsets.1.asc
index 5e4d1112..cb3e1833 100644
--- a/doc/merge_unicharsets.1.asc
+++ b/doc/merge_unicharsets.1.asc
@@ -34,7 +34,7 @@ merge_unicharsets(1) was first made available for tesseract4.00.00alpha.
 RESOURCES
 ---------
 Main web site: +
-Information on training tesseract LSTM:
+Information on training tesseract LSTM:
 
 SEE ALSO
 --------
diff --git a/doc/mftraining.1.asc b/doc/mftraining.1.asc
index 43fe533a..c80b8626 100644
--- a/doc/mftraining.1.asc
+++ b/doc/mftraining.1.asc
@@ -43,7 +43,7 @@ SEE ALSO
 tesseract(1), cntraining(1), unicharset_extractor(1), combine_tessdata(1),
 shapeclustering(1), unicharset(5)
 
-
+
 
 COPYING
 -------
diff --git a/doc/set_unicharset_properties.1.asc b/doc/set_unicharset_properties.1.asc
index e86911a5..793631e7 100644
--- a/doc/set_unicharset_properties.1.asc
+++ b/doc/set_unicharset_properties.1.asc
@@ -33,7 +33,7 @@ set_unicharset_properties(1) was first made available for tesseract version 3.03
 RESOURCES
 ---------
 Main web site: +
-Information on training:
+Information on training:
 
 SEE ALSO
 --------
diff --git a/doc/shapeclustering.1.asc b/doc/shapeclustering.1.asc
index 0a1bfb03..23d2df23 100644
--- a/doc/shapeclustering.1.asc
+++ b/doc/shapeclustering.1.asc
@@ -46,7 +46,7 @@ SEE ALSO
 tesseract(1), cntraining(1), unicharset_extractor(1), combine_tessdata(1),
 unicharset(5)
 
-
+
 
 COPYING
 -------
diff --git a/doc/tesseract.1.asc b/doc/tesseract.1.asc
index d8dc479d..55bdc9ab 100644
--- a/doc/tesseract.1.asc
+++ b/doc/tesseract.1.asc
@@ -430,7 +430,7 @@ Tesseract was included in UNLV's Fourth Annual Test of OCR Accuracy.
 See .
 Since Tesseract 2.00, scripts are now included to allow anyone to reproduce
 some of these tests.
-See for more
+See for more
 details.
 
 Tesseract 3.00 added a number of new languages, including Chinese, Japanese,
@@ -447,16 +447,16 @@ Tesseract 3 is enabled by `--oem 0`. This also needs traineddata files which
 support the legacy engine, for example those from the tessdata repository
 (https://github.com/tesseract-ocr/tessdata).
 
-For further details, see the release notes in the Tesseract wiki
-().
+For further details, see the release notes in the Tesseract documentation
+().
 
 RESOURCES
 ---------
 Main web site: +
 User forum: +
-Wiki: +
-Information on training:
+Documentation: +
+Information on training:
 
 SEE ALSO
 --------
diff --git a/doc/text2image.1.asc b/doc/text2image.1.asc
index 2a689b5f..46c53b64 100644
--- a/doc/text2image.1.asc
+++ b/doc/text2image.1.asc
@@ -151,7 +151,7 @@ text2image(1) was first made available for tesseract 3.03.
 RESOURCES
 ---------
 Main web site: +
-Information on training tesseract LSTM:
+Information on training tesseract LSTM:
 
 SEE ALSO
 --------
diff --git a/doc/unicharambigs.5.asc b/doc/unicharambigs.5.asc
index 6981128f..4ff2bd16 100644
--- a/doc/unicharambigs.5.asc
+++ b/doc/unicharambigs.5.asc
@@ -81,7 +81,7 @@ letters in the unicharset.
 SEE ALSO
 --------
 tesseract(1), unicharset(5)
-https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.03%E2%80%933.05#the-unicharambigs-file
+https://tesseract-ocr.github.io/tessdoc/Training-Tesseract-3.03%E2%80%933.05.html#the-unicharambigs-file
 
 AUTHOR
 ------
diff --git a/doc/unicharset.5.asc b/doc/unicharset.5.asc
index 5b859daa..0a9a48f5 100644
--- a/doc/unicharset.5.asc
+++ b/doc/unicharset.5.asc
@@ -124,7 +124,7 @@ SEE ALSO
 --------
 tesseract(1), combine_tessdata(1), unicharset_extractor(1)
 
-
+
 
 
 AUTHOR
diff --git a/doc/unicharset_extractor.1.asc b/doc/unicharset_extractor.1.asc
index 2918350c..b2ec55a3 100644
--- a/doc/unicharset_extractor.1.asc
+++ b/doc/unicharset_extractor.1.asc
@@ -30,7 +30,7 @@ SEE ALSO
 --------
 tesseract(1), unicharset(5)
 
-
+
 
 HISTORY
 -------
diff --git a/doc/wordlist2dawg.1.asc b/doc/wordlist2dawg.1.asc
index b4f84ad5..ad3bcbc3 100644
--- a/doc/wordlist2dawg.1.asc
+++ b/doc/wordlist2dawg.1.asc
@@ -56,7 +56,7 @@ SEE ALSO
 --------
 tesseract(1), combine_tessdata(1), dawg2wordlist(1)
 
-
+
 
 COPYING
 -------
diff --git a/src/training/lstmtraining.cpp b/src/training/lstmtraining.cpp
index 6196f559..407c8e83 100644
--- a/src/training/lstmtraining.cpp
+++ b/src/training/lstmtraining.cpp
@@ -79,7 +79,7 @@ int main(int argc, char **argv) {
     return EXIT_FAILURE;
   }
   if (FLAGS_traineddata.empty()) {
-    tprintf("Must provide a --traineddata see training wiki\n");
+    tprintf("Must provide a --traineddata see training documentation\n");
     return EXIT_FAILURE;
   }
 
diff --git a/src/training/tesstrain.py b/src/training/tesstrain.py
index 357e1afc..e827c981 100755
--- a/src/training/tesstrain.py
+++ b/src/training/tesstrain.py
@@ -14,7 +14,7 @@
 #
 # This script provides an easy way to execute various phases of training
 # Tesseract. For a detailed description of the phases, see
-# https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract
+# https://tesseract-ocr.github.io/tessdoc/Training-Tesseract.html.
 
 import logging
 import os
diff --git a/src/training/tesstrain.sh b/src/training/tesstrain.sh
index fec155b6..2fe39d0c 100755
--- a/src/training/tesstrain.sh
+++ b/src/training/tesstrain.sh
@@ -12,7 +12,7 @@
 #
 # This script provides an easy way to execute various phases of training
 # Tesseract. For a detailed description of the phases, see
-# https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract
+# https://tesseract-ocr.github.io/tessdoc/Training-Tesseract.html.
 #
 
 display_usage() {
diff --git a/src/training/tesstrain_utils.py b/src/training/tesstrain_utils.py
index 1877fc92..a88c0997 100644
--- a/src/training/tesstrain_utils.py
+++ b/src/training/tesstrain_utils.py
@@ -11,7 +11,7 @@
 # limitations under the License.
 #
 # For a detailed description of the phases, see
-# https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract
+# https://tesseract-ocr.github.io/tessdoc/Training-Tesseract.html.
 #
 
 import argparse
diff --git a/src/training/tesstrain_utils.sh b/src/training/tesstrain_utils.sh
index 2735bfa2..f5080e43 100644
--- a/src/training/tesstrain_utils.sh
+++ b/src/training/tesstrain_utils.sh
@@ -12,7 +12,7 @@
 #
 # This script defines functions that are used by tesstrain.sh
 # For a detailed description of the phases, see
-# https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract
+# https://tesseract-ocr.github.io/tessdoc/Training-Tesseract.html.
 #
 # USAGE: source tesstrain_utils.sh