tesseract/tessdata/configs
Jan Kamlah 577e8a8b93 Add PAGE XML renderer / export (#4214)
Add PAGE XML export and documentation.
To generate PAGE XML output, add 'page' to the tesseract command.

The output file is outputname + '.page.xml', which avoids conflicts with the ALTO export.

The output can be customized with the flags
tessedit_create_page_polygon and tessedit_create_page_wordlevel.

Co-authored-by: Stefan Weil <sw@weilnetz.de>
2024-04-19 21:12:39 +02:00
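The invocation described in the commit message above can be sketched as a small helper that assembles the command line and the expected output file name. This is a minimal sketch, not the project's own tooling: the helper name `build_page_command`, the file names `scan.png` and `result`, and the 0/1 encoding of the two flags are assumptions for illustration. The commit message only names the flags; it does not state their types or defaults.

```python
import shlex
from pathlib import Path


def build_page_command(image, outputname, polygon=None, wordlevel=None):
    """Return (argv, expected_output_path) for a PAGE XML run.

    Hypothetical helper: it only builds the argument list and does not
    run tesseract.
    """
    argv = ["tesseract", str(image), str(outputname)]
    # "-c NAME=VALUE" sets a Tesseract parameter on the command line.
    # Treating both flags as 0/1 switches is an assumption.
    for name, value in (
        ("tessedit_create_page_polygon", polygon),
        ("tessedit_create_page_wordlevel", wordlevel),
    ):
        if value is not None:
            argv += ["-c", f"{name}={int(bool(value))}"]
    # The trailing 'page' selects the config file of that name.
    argv.append("page")
    # Per the commit message, the output is outputname + '.page.xml'.
    return argv, Path(f"{outputname}.page.xml")


argv, out = build_page_command("scan.png", "result", wordlevel=False)
print(shlex.join(argv))
print(out)
```

The argument list can be passed to `subprocess.run(argv, check=True)` on a machine where a tesseract build containing this commit is installed.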
alto Add support for ALTO output 2018-11-30 06:09:36 +01:00
ambigs.train Cube trained data for fra, ita, rus, spa 2012-02-02 03:08:26 +00:00
api_config fix filemode; 2014-08-14 23:37:17 +02:00
bazaar fix issue 755; add example config files from tesseract manpage 2013-10-20 20:20:10 +00:00
bigram New config for testing bigram correction. 2012-02-02 18:46:19 +00:00
box.train Removed unused parameters 2019-10-03 09:18:29 +02:00
box.train.stderr Removed unused parameters 2019-10-03 09:18:29 +02:00
digits Added Hindi traineddata 2011-03-21 21:57:08 +00:00
get.images Rename get.image config to get.images and install 2019-02-05 19:57:53 +00:00
hocr Don't set page segmentation mode for hocr, pdf and tsv configs 2018-10-04 12:05:49 +02:00
inter fix filemode; 2014-08-14 23:37:17 +02:00
kannada Major internationalization improvements 2008-02-01 00:23:05 +00:00
linebox 3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to Linux and Windows (VC2008) compile process 2010-11-23 18:34:14 +00:00
logfile Added logfile config 2009-08-20 16:24:59 +00:00
lstm.train remove legacy parameter disable_character_fragments from lstm.train 2019-10-23 13:15:16 +02:00
lstmbox Add a new renderer to create box files from images for LSTM training 2019-02-05 14:03:29 +00:00
lstmdebug Add debug configuration for LSTM 2018-10-27 08:04:45 +02:00
makebox If there is no explicit renderer(s), default to TessTextRenderer 2015-12-11 19:06:49 +02:00
Makefile.am Add PAGE XML renderer / export (#4214) 2024-04-19 21:12:39 +02:00
page Add PAGE XML renderer / export (#4214) 2024-04-19 21:12:39 +02:00
pdf Don't set page segmentation mode for hocr, pdf and tsv configs 2018-10-04 12:05:49 +02:00
quiet fix --enable-multiple-libraries; implement quiet mode (issue 580) 2012-03-03 11:48:59 +00:00
rebox 3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to Linux and Windows (VC2008) compile process 2010-11-23 18:34:14 +00:00
strokewidth 3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to Linux and Windows (VC2008) compile process 2010-11-23 18:34:14 +00:00
tsv Don't set page segmentation mode for hocr, pdf and tsv configs 2018-10-04 12:05:49 +02:00
txt Fixed 2 errors 2022-10-06 03:53:11 -07:00
unlv set unlv_tilde_crunching to false; fixes #1449 #948 2018-10-23 09:26:32 +02:00
wordstrbox Add renderer to create WordStr box files from images 2019-02-10 19:59:17 +00:00