tesseract/tessdata
Stefan Weil ecfee53bac Don't set page segmentation mode for hocr, pdf and tsv configs
Setting the page segmentation mode in those config files gives unexpected
results: the recognized text differs between a run with no config (or only
txt) and a run that combines txt with any of hocr, pdf or tsv.

In a test set of nearly 200 pages from historical books, segmentation
mode 1 was typically slightly better than the default, but in some cases
it was much worse. Therefore users should be able to decide which page
segmentation mode works best.

Reproducing old results for hocr, pdf or tsv now requires an explicit
`--psm 1`.
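As a minimal sketch of what "an explicit `--psm 1`" means in practice, the helper below assembles a tesseract command line in the documented `tesseract imagename outputbase [options...] [configfile...]` order. The function name `tesseract_argv` and its parameters are hypothetical and only illustrate the idea; it assumes Tesseract 4.x, where `--psm` accepts modes 0 to 13.

```python
from typing import List, Optional, Sequence


def tesseract_argv(image: str, outputbase: str,
                   configs: Sequence[str] = (),
                   psm: Optional[int] = None) -> List[str]:
    """Build an argument list for the tesseract CLI (hypothetical helper).

    The hocr, pdf and tsv configs no longer force a page segmentation
    mode, so a caller who wants the old behaviour passes psm=1.
    """
    argv = ["tesseract", image, outputbase]
    if psm is not None:
        # Assumption: Tesseract 4.x page segmentation modes are 0..13.
        if not 0 <= psm <= 13:
            raise ValueError(f"invalid page segmentation mode: {psm}")
        argv += ["--psm", str(psm)]
    # Config file names (txt, hocr, pdf, tsv, ...) go last.
    argv += list(configs)
    return argv
```

For example, `tesseract_argv("page.png", "page", ["hocr"], psm=1)` yields the argument list for `tesseract page.png page --psm 1 hocr`, which could then be passed to `subprocess.run(..., check=True)` on a system with tesseract installed. Leaving `psm` unset keeps Tesseract's default mode, which is the behaviour the hocr, pdf and tsv configs now share with txt.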

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-04 12:05:49 +02:00
configs Don't set page segmentation mode for hocr, pdf and tsv configs 2018-10-04 12:05:49 +02:00
tessconfigs Fix file endings 2018-04-25 19:35:33 +02:00
eng.user-patterns fix issue 755; add example config files from tesseract manpage 2013-10-20 20:20:10 +00:00
eng.user-words fix issue 755; add example config files from tesseract manpage 2013-10-20 20:20:10 +00:00
Makefile.am More makefile changes to remove cube 2016-12-14 11:17:06 -08:00
pdf.ttf fix #712: Ghostscript mangling Tesseract-produced PDFs 2017-02-15 17:09:37 +01:00