Setting the page segmentation mode to 6 ("Assume a single uniform block
of text") typically improves the layout detection for such texts, but
should not be done in the config file.
unlvtests/runtestset.sh adds `--psm 6` explicitly, so test results
won't change when using that script.
This is similar to commit ecfee53bac.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Revert fd429c32, 43834da7, 05de195e.
See #49, #59.
The code in this commit solves the issue in a more elegant way, IMHO.
Now you can use:
* `tesseract eurotext.tif eurotext txt pdf`
* `tesseract eurotext.tif eurotext txt hocr`
* `tesseract eurotext.tif eurotext txt hocr pdf`
NOTE:
With `tesseract eurotext.tif eurotext`
or `tesseract eurotext.tif eurotext txt`
the psm will be set to '3', but...
With `tesseract eurotext.tif eurotext txt pdf`
or `tesseract eurotext.tif eurotext txt hocr`
the psm will be set to '1'.