mirror of
https://github.com/tesseract-ocr/tesseract.git
synced 2025-07-20 19:16:19 +08:00
use github for image urls
parent
7cb006a569
commit
6420fc74fc
@ -16,28 +16,28 @@ You can see how Tesseract has processed the image by using the [configuration va
|
||||
|
||||
### Binarisation
|
||||
|
||||

|
||||

|
||||
|
||||
This is converting an image to black and white. Tesseract does this internally, but it can make mistakes, particularly if the page background is of uneven darkness.
|
||||
|
||||
|
||||
### Noise
|
||||
|
||||

|
||||

|
||||
|
||||
Noise is random variation of brightness or colour in an image, that can make the text of the image more difficult to read. Certain types of noise cannot be removed by Tesseract in the binarisation step, which can cause accuracy rates to drop.
|
||||
|
||||
|
||||
### Orientation / Skew
|
||||
|
||||

|
||||

|
||||
|
||||
This is when an page has been scanned when not straight. The quality of Tesseract's line segmentation reduces significantly if a page is too skewed, which severely impacts the quality of the OCR. To address this rotating the page image so that the text lines are horizontal.
|
||||
|
||||
|
||||
### Borders
|
||||
|
||||

|
||||

|
||||
|
||||
Scanned pages often have dark borders around them. These can be erroneously picked up as extra characters, especially if they vary in shape and gradation.
|
||||
|
||||
|
@ -39,7 +39,7 @@ tesseract phototest.tif test1 segdemo inter
|
||||
|
||||
You should see something like this:
|
||||
|
||||

|
||||

|
||||
|
||||
The words found in the image are represented as blue rectangles. There are 3 menus:
|
||||
|
||||
|
Loading…
Reference in New Issue
Block a user