tesseract/tessdata/configs
Stefan Weil ecfee53bac Don't set page segmentation mode for hocr, pdf and tsv configs
Setting the page segmentation mode in those config files gives unexpected
results: the text recognized when no config or only txt is given changes
if both txt and any of hocr, pdf or tsv is chosen.

In a test set of nearly 200 pages from historical books, using
segmentation mode 1 is typically slightly better than the default,
but there are also cases where it is much worse. Therefore the user
should be able to decide which page segmentation mode is best.

Old results for hocr, pdf or tsv now need an explicit `--psm 1` for
reproduction.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-04 12:05:49 +02:00
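
As a rough illustration of the reproduction note in the commit message above, the sketch below assembles a Tesseract command line that combines several output configs with an explicit page segmentation mode. The helper name `build_tesseract_command` and the file names `page.png` and `out` are hypothetical. Only the `--psm` option and the config names (`txt`, `hocr`, `pdf`, `tsv`) come from the commit message and the listing.

```python
from typing import List, Optional, Sequence


def build_tesseract_command(
    image: str,
    output_base: str,
    configs: Sequence[str] = (),
    psm: Optional[int] = None,
) -> List[str]:
    """Build an argument list for the tesseract CLI (hypothetical helper).

    When psm is None, no --psm option is added, so Tesseract uses its
    default page segmentation mode regardless of the configs chosen.
    """
    cmd = ["tesseract", image, output_base]
    if psm is not None:
        # Explicit mode, e.g. 1, to reproduce older hocr/pdf/tsv results.
        cmd += ["--psm", str(psm)]
    # Config names such as "txt", "hocr", "pdf" or "tsv" go last.
    cmd += list(configs)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_tesseract_command("page.png", "out", ["txt", "hocr"], psm=1)))
    # prints: tesseract page.png out --psm 1 txt hocr
```

If Tesseract is installed, the returned list could be passed to `subprocess.run`. Leaving `psm` unset matches the behavior after this commit, where the config files no longer choose a segmentation mode themselves.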
| Name | Last commit | Date |
| --- | --- | --- |
| ambigs.train | Cube trained data for fra, ita, rus, spa | 2012-02-02 03:08:26 +00:00 |
| api_config | fix filemode; | 2014-08-14 23:37:17 +02:00 |
| bazaar | fix issue 755; add example config files from tesseract manpage | 2013-10-20 20:20:10 +00:00 |
| bigram | New config for testing bigram correction. | 2012-02-02 18:46:19 +00:00 |
| box.train | Fixes #64 - tessedit_create_txt 0 blocks box training | 2015-07-25 22:49:55 +02:00 |
| box.train.stderr | Result of clang tidy on recent merge | 2016-11-07 10:46:33 -08:00 |
| digits | Added Hindi traineddata | 2011-03-21 21:57:08 +00:00 |
| get.image | Fix file endings | 2018-04-25 19:35:33 +02:00 |
| hocr | Don't set page segmentation mode for hocr, pdf and tsv configs | 2018-10-04 12:05:49 +02:00 |
| inter | fix filemode; | 2014-08-14 23:37:17 +02:00 |
| kannada | Major internationalization improvements | 2008-02-01 00:23:05 +00:00 |
| linebox | 3.01 code from http://github.com/jimregan/tesseract-ocr with addaptions related to Linux and Windows (VC2008) compile process | 2010-11-23 18:34:14 +00:00 |
| logfile | Added logfile config | 2009-08-20 16:24:59 +00:00 |
| lstm.train | Remove execute permission from config file (#1263) | 2018-01-10 16:43:02 +01:00 |
| makebox | If there is no explicit renderer(s), default to TessTextRenderer | 2015-12-11 19:06:49 +02:00 |
| Makefile.am | Update Makefile.am (add 'lstm.train') | 2017-04-02 17:06:12 +09:00 |
| pdf | Don't set page segmentation mode for hocr, pdf and tsv configs | 2018-10-04 12:05:49 +02:00 |
| quiet | fix --enable-multiple-libraries; implement quite mode (issue 580) | 2012-03-03 11:48:59 +00:00 |
| rebox | 3.01 code from http://github.com/jimregan/tesseract-ocr with addaptions related to Linux and Windows (VC2008) compile process | 2010-11-23 18:34:14 +00:00 |
| strokewidth | 3.01 code from http://github.com/jimregan/tesseract-ocr with addaptions related to Linux and Windows (VC2008) compile process | 2010-11-23 18:34:14 +00:00 |
| tsv | Don't set page segmentation mode for hocr, pdf and tsv configs | 2018-10-04 12:05:49 +02:00 |
| txt | If there is no explicit renderer(s), default to TessTextRenderer | 2015-12-11 19:06:49 +02:00 |
| unlv | If there is no explicit renderer(s), default to TessTextRenderer | 2015-12-11 19:06:49 +02:00 |