tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-11-30 23:49:05 +08:00

Author	SHA1	Message	Date
Stefan Weil	ecfee53bac	Don't set page segmentation mode for hocr, pdf and tsv configs Setting the page segmentation mode in those config files gives unexpected results: the text recognized when no config or only txt is given changes if both txt and any of hocr, pdf or tsv is chosen. In a test set of nearly 200 pages from historical books, using segmentation mode 1 is typically slightly better than the default, but there are also cases where it is much worse. Therefore the user should be able to decide which page segmentation mode is best. Old results for hocr, pdf or tsv now need an explicit `--psm 1` for reproduction. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-04 12:05:49 +02:00
Tom Morris	e3e1fe0e20	Document hocr_font_info in config	2016-02-14 16:49:00 -05:00
amitdo	c2f5e9b849	If there is no explicit renderer(s), default to TessTextRenderer Revert `fd429c32`, `43834da7`, `05de195e`. See #49, #59. The code in this commit solves the issue in a more elegant way, IMHO. Now you can use: * `tesseract eurotext.tif eurotext txt pdf` * `tesseract eurotext.tif eurotext txt hocr` * `tesseract eurotext.tif eurotext txt hocr pdf` NOTE: With `tesseract eurotext.tif eurotext` or `tesseract eurotext.tif eurotext txt` the psm will be set to '3', but... With `tesseract eurotext.tif eurotext txt pdf` or `tesseract eurotext.tif eurotext txt hocr` the psm will be set to '1'.	2015-12-11 19:06:49 +02:00
Jim O'Regan	43834da7a2	disable text creation when creating hOCR (issue #49 )	2015-07-18 08:56:21 +01:00
theraysmith@gmail.com	91d2265429	More minor fixes from issues and cleanup git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@974 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2014-01-10 01:38:00 +00:00
zdenop@gmail.com	fa4d4589cb	fixed hocr (escape special special characters; thank to aizvorski) + hocr config) git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@515 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2010-10-29 19:03:06 +00:00

6 Commits
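
Per commit ecfee53bac, the hocr, pdf and tsv configs no longer set the page segmentation mode, so results produced with them before that change need an explicit `--psm 1` to be reproduced. A minimal sketch of such an invocation follows. It assumes the `eurotext.tif` sample image named in commit c2f5e9b849 is in the current directory, and that options go before the config names on the command line. The `image`, `base` and `cmd` variable names are illustrative only.

```shell
#!/bin/sh
# Illustrative names: input image and output base taken from the
# examples in commit c2f5e9b849; adjust them to your own files.
image="eurotext.tif"
base="eurotext"

# Request page segmentation mode 1 explicitly, since the hocr config
# no longer sets it (commit ecfee53bac).
cmd="tesseract $image $base --psm 1 txt hocr"
echo "$cmd"

# Run the command only if tesseract is installed and the image exists.
if command -v tesseract >/dev/null 2>&1 && [ -f "$image" ]; then
  $cmd
fi
```

Dropping `--psm 1` leaves the choice to the default mode, which the same commit message describes as usually slightly worse but occasionally much better on its test set of historical book pages.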