# Contributing

**Please follow these rules and advice**.

## Creating an Issue or Using the Forum

If you think you found a bug in Tesseract, please create an issue.

Use the [user forum](https://groups.google.com/g/tesseract-ocr) instead of creating an issue if ...

* You have problems using Tesseract and need some help.
* You have problems installing the software.
* You are not satisfied with the accuracy of the OCR, and want to ask how you can improve it. Note: You should first read the [ImproveQuality](https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html) documentation.
* You are trying to train Tesseract and you have a problem and/or want to ask a question about the training process. Note: You should first read the **official** guides [[1]](https://tesseract-ocr.github.io/tessdoc/) or [[2]](https://tesseract-ocr.github.io/tessdoc/tess5/TrainingTesseract-5.html) found in the project documentation.
* You have a general question.

An issue should only be reported if the platform you are using is one of these:
  * Linux (but not a version that is more than 4 years old)
  * Windows (Windows 7 or newer version)
  * macOS (last 3 releases)

For older versions or other operating systems, use the Tesseract forum.

When creating an issue, please report your operating system, including its specific version: "Ubuntu 16.04", "Windows 10", "Mac OS X 10.11" etc.
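If you are not sure which exact version you are running, a small helper can print it. This is a minimal sketch, assuming a POSIX shell (on Windows, for example Git Bash; from `cmd.exe`, the built-in `ver` command prints the Windows version instead). `os_version` is a hypothetical name. It reads `sw_vers` on macOS and `/etc/os-release` on Linux, and falls back to `uname` elsewhere.

```shell
# Print a one-line OS description to paste into an issue report.
# Sketch only: os_version is a hypothetical helper, not part of Tesseract.
os_version() {
  case "$(uname -s)" in
    Darwin)
      # macOS: sw_vers reports the product version.
      echo "macOS $(sw_vers -productVersion)"
      ;;
    Linux)
      if [ -r /etc/os-release ]; then
        # Source the file in a subshell so its variables do not leak.
        ( . /etc/os-release && echo "${PRETTY_NAME:-Linux}" )
      else
        echo "Linux $(uname -r)"
      fi
      ;;
    *)
      # Fallback for other systems: kernel name and release.
      uname -sr
      ;;
  esac
}

os_version
```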

Search through open and closed issues to see whether a similar issue has already been reported (and perhaps already solved).

Similarly, before you post your question in the forum, search through past threads to see whether a similar question has already been asked.

Read the [documentation](https://tesseract-ocr.github.io/tessdoc/) before you report your issue or ask a question in the forum.

Only report issues that occur with the latest official release. Optionally, check whether the issue has already been fixed in the latest snapshot of the git repository.

Make sure you can reproduce the problem with the Tesseract command-line program. For problems in external programs that use Tesseract (including wrappers, and your own program if you are a developer), report the issue to the developers of that software if possible. You can also try to find help in the Tesseract forum.
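A reproduction is easiest to review when the version, the exact command, and the console output end up in one text file, which also avoids screenshots of error messages. This is a minimal sketch, assuming `tesseract` is on your `PATH`; `repro_report`, its arguments, and the default file name `tesseract-report.txt` are hypothetical. It passes `stdout` as the output base so the recognized text is written to the console.

```shell
# Sketch: run the tesseract CLI on a sample and save version info, the command,
# and all console output (including errors) to a text file for attaching.
# repro_report is a hypothetical helper, not part of Tesseract.
repro_report() {
  image=$1                              # sample image that shows the problem
  langs=${2:-eng}                       # language(s) passed to -l
  report=${3:-tesseract-report.txt}     # text file to attach to the issue
  {
    echo "### tesseract --version"
    tesseract --version 2>&1
    echo
    echo "### tesseract $image stdout -l $langs"
    # "stdout" as output base prints the text; 2>&1 also keeps error messages.
    tesseract "$image" stdout -l "$langs" 2>&1
    echo "(exit status: $?)"
  } > "$report"
  echo "wrote $report"
}

# Usage:
#   repro_report sample.png eng
```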

Each version of Tesseract has its own language data, which you need to obtain. You **must** obtain and install the trained data for English (`eng`) and for orientation and script detection (`osd`). Verify that Tesseract finds these two files (and any other trained data you installed) with this command:
`tesseract --list-langs`.
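The check can also be automated. This is a minimal sketch; `check_required_langs` is a hypothetical helper that reads the listing on stdin and expects one language code per line. The `2>&1` in the usage line is there in case your version prints the list to stderr.

```shell
# Sketch: check that eng and osd appear in the output of `tesseract --list-langs`.
# check_required_langs is a hypothetical helper, not part of Tesseract.
check_required_langs() {
  listing=$(tr -d '\r')   # read stdin, dropping Windows line endings
  status=0
  for lang in eng osd; do
    # Each installed language is listed on its own line.
    if printf '%s\n' "$listing" | grep -qx "$lang"; then
      echo "ok: $lang"
    else
      echo "missing: $lang"
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   tesseract --list-langs 2>&1 | check_required_langs
```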

Post example files that demonstrate the problem, but **don't** post files with private information (about yourself or others).

When attaching a file to the issue report / forum ...
  * Do not post a file larger than 20 MB.
  * GitHub supports only a few file name extensions, such as `.png` or `.txt`. If GitHub rejects your files, compress them into a zip archive and upload that zip file instead.
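Before attaching files, you can check them against the size limit above. This is a minimal sketch; it assumes "20 MB" means 20 * 1024 * 1024 bytes, and `oversized_files` and `MAX_BYTES` are hypothetical names. The `zip` line in the usage note uses the common Info-ZIP command-line tool.

```shell
# Sketch: list files that exceed the attachment size limit.
# MAX_BYTES and oversized_files are hypothetical; 20 MB taken as 20 MiB.
MAX_BYTES=${MAX_BYTES:-20971520}

oversized_files() {
  for f in "$@"; do
    # wc -c prints the size in bytes; tr removes padding some systems add.
    size=$(wc -c < "$f" | tr -d ' ')
    if [ "$size" -gt "$MAX_BYTES" ]; then
      echo "$f ($size bytes): too large, upload elsewhere and post a link"
    fi
  done
}

# Usage, then zip the files if GitHub rejects their extension:
#   oversized_files sample.tif output.txt
#   zip samples.zip sample.tif output.txt
```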

Do not attach programs or libraries to your issues/posts.

For large files or for programs, add a link to a location where they can be downloaded (your site, Git repo, Google Drive, Dropbox, etc.).

Attaching a multi-page TIFF image is useful only if your problem is with multi-page functionality; otherwise, attach only one or a few single-page images.

Copy the error message from the console instead of sending a screenshot of it.

Use the toolbar above the comment edit area to format your comment.

Put three backticks (`` ``` ``) on separate lines before and after a code sample or command output to format it (the `Insert code` button can help you do this).

If your comment includes a code sample or command output longer than about 25 lines, post it as an attached text file (`filename.txt`).

Use `Preview` before you send your issue. Read it again before sending.

Note that most of the people who respond to issues and answer questions are either other 'regular' users or **volunteer** developers. Please be nice to them :-)

Use the [tesseract developers](https://groups.google.com/g/tesseract-dev) forum to discuss Tesseract development: bug fixes, enhancements, and add-ons.

Sometimes you will not get a response to your issue or question. We apologize in advance! Please don't take it personally. There can be many reasons for this, including time limits, no one knowing the answer (at least no one available at that time), or simply that your question has been asked (and answered) many times before...

## For Developers: Creating a Pull Request

You should always make sure your changes build and run successfully.

For that, your clone needs to include all submodules (`googletest`, `test`). To get them, either pass `--recurse-submodules` during the initial clone, or later run `git submodule update --init --recursive NAME` for each `NAME`. If `configure` already created those directories (blocking the clone), remove them first (or run `make distclean`), then clone the submodules and reconfigure.
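To see which submodules are still missing, you can filter the output of `git submodule status`, which prefixes uninitialized submodules with `-`. This is a minimal sketch; `missing_submodules` is a hypothetical helper name.

```shell
# Sketch: print the paths of submodules that are not initialized yet.
# Reads `git submodule status` output on stdin; lines starting with "-"
# are uninitialized, and the second field is the submodule path.
missing_submodules() {
  awk '/^-/ { print $2 }'
}

# Usage from the top of your tesseract clone:
#   git submodule status --recursive | missing_submodules
# Fetch everything that is missing:
#   git submodule update --init --recursive
# Or include the submodules when cloning:
#   git clone --recurse-submodules https://github.com/tesseract-ocr/tesseract.git
```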

Have a look at [the README](./README.md), the [testing README](https://github.com/tesseract-ocr/test/blob/main/README.md), and the [documentation](https://tesseract-ocr.github.io/tessdoc/Compiling-%E2%80%93-GitInstallation.html#unit-test-builds) on installation.

In short: after running `configure` from the build directory of your choice, run `make` to build the library and CLI, `make check` to run the tests, and `make training` to build the training tools.
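The same steps can be collected into one function for an out-of-tree build. This is a sketch, not an official project script: it assumes the build dependencies are installed and that the source directory is a clone with its submodules. A fresh git clone has no `configure` script yet, so it runs `./autogen.sh` first. `build_and_test`, `run`, and `DRY_RUN` are hypothetical names; `DRY_RUN=1` only prints the commands.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Body runs in a subshell, so `cd` and `set -e` do not affect the caller.
build_and_test() (
  set -e
  src=$(cd "$1" && pwd)        # absolute path of the tesseract clone
  case $2 in                   # build directory of your choice
    /*) build=$2 ;;
    *) build=$PWD/$2 ;;
  esac
  run cd "$src"
  run ./autogen.sh             # generates configure in a git clone
  run mkdir -p "$build"
  run cd "$build"
  run "$src/configure"
  run make                     # library and CLI
  run make check               # tests (needs the submodules)
  run make training            # training tools
)

# Usage:
#   DRY_RUN=1 build_and_test ~/src/tesseract ~/build/tesseract
#   build_and_test ~/src/tesseract ~/build/tesseract
```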

As soon as your changes build and the tests pass, you can publish them. If you have not already done so, please [fork](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) tesseract on GitHub and push your changes to a new branch in that fork. Then [submit a PR](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork) from that branch.
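The push step can be sketched with plain git commands. This assumes you have already forked tesseract on GitHub and committed your change locally; the remote name `fork`, the branch name, and `push_to_fork` are placeholders, not project conventions.

```shell
# Sketch: push the current commit to a new branch in your fork.
# push_to_fork and the remote name "fork" are hypothetical placeholders.
push_to_fork() {
  fork_url=$1   # URL of your fork on GitHub
  branch=$2     # new branch for this change
  # Add the fork as a remote only if it does not exist yet.
  git remote get-url fork >/dev/null 2>&1 || git remote add fork "$fork_url"
  # Create the branch at the current commit and push it, tracking the fork.
  git checkout -b "$branch" &&
    git push -u fork "$branch"
}

# Usage, from your tesseract clone after committing:
#   push_to_fork https://github.com/<your-user>/tesseract.git fix-some-bug
# Then open the pull request from that branch on GitHub.
```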

Please also keep track of reports from CI (automated build status) and Coverity/CodeQL (quality scan). When the indicators show deterioration after your changes, further action may be required to improve them.