tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-11-28 05:39:35 +08:00

Author	SHA1	Message	Date
Tom Morris	6700edd8bc	Cleanup TSV renderer Remove all references to hocr, hocr.tsv, etc. Remove dead code for font info, input filename, HTML escapes. Improved comments. Fixed indentation.	2016-03-01 13:41:19 -05:00
Sundar M. Vaidya	738fe4f757	Adds BoolParam tessedit_create_hocrtsv in class Tesseract.	2016-03-01 12:30:39 -05:00
amitdo	c2f5e9b849	If there is no explicit renderer(s), default to TessTextRenderer Revert `fd429c32`, `43834da7`, `05de195e`. See #49, #59. The code in this commit solves the issue in a more elegant way, IMHO. Now you can use: * `tesseract eurotext.tif eurotext txt pdf` * `tesseract eurotext.tif eurotext txt hocr` * `tesseract eurotext.tif eurotext txt hocr pdf` NOTE: With `tesseract eurotext.tif eurotext` or `tesseract eurotext.tif eurotext txt` the psm will be set to '3', but... With `tesseract eurotext.tif eurotext txt pdf` or `tesseract eurotext.tif eurotext txt hocr` the psm will be set to '1'.	2015-12-11 19:06:49 +02:00
Stefan Weil	318b88daa6	ccmain: Fix typos in comments and strings Most of them were found by codespell. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2015-09-14 21:59:16 +02:00
Zdenko Podobný	41478fd5a1	implement build without cube (-DNO_CUBE_BUILD)	2015-07-24 11:51:44 +02:00
Ray Smith	0e868ef377	Major change to improve layout analysis for heavily diacritic languages: Tha, Vie, Kan, Tel etc. There is a new overlap detector that detects when diacritics cause a big increase in textline overlap. In such cases, diacritics from overlap regions are kept separate from layout analysis completely, allowing textline formation to happen without them. The diacritics are then assigned to 0, 1 or 2 close words at the end of layout analysis, using and modifying an old noise detection data path. The stored diacritics are used or not during recognition according to the character classifier's liking for them.	2015-05-12 16:47:02 -07:00
Ray Smith	4a3caefd92	Add ability to build under android (without cube or scrollview).	2015-05-12 15:41:15 -07:00
Zdenko Podobný	4c7c960bfd	fix issue 1417	2015-02-07 22:22:20 +01:00
Zdenko Podobný	36883b4faf	preserve interword spaces patch - Issue 1409	2015-01-27 22:58:04 +01:00
Ray Smith	f927728169	Fixed issue 1207	2014-10-09 13:28:03 -07:00
Zdenko Podobný	d0cb1071b2	remove parameters tessedit_pdf_jpg_quality, tessedit_pdf_compression (reasons are in i1300 and i1285)	2014-10-07 23:37:34 +02:00
Ray Smith	55d11ad3c2	Moved params from global in page layout to tesseractclass, improved single column layout analysis	2014-10-07 09:31:00 -07:00
Zdenko Podobný	9e8629d9ef	allow multiple output in tesseract executable (https://groups.google.com/d/msg/tesseract-ocr/Z_WUKmJDVxc/1vc3W0xJZ2oJ )	2014-09-19 23:33:47 +02:00
Zdenko Podobný	ff87944171	fix typo	2014-09-07 18:23:47 +02:00
Zdenko Podobný	d1aa61c110	fix issue 1285: reimplement option to select pdf compression	2014-09-06 09:32:22 +02:00
Ray Smith	09b439b05a	Fixed issue 1241, but disabled due to making accuracy worse	2014-08-13 13:33:10 -07:00
theraysmith@gmail.com	dbf6197471	Major refactor of control.cpp to enable line recognition git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1147 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2014-08-11 23:23:06 +00:00
zdenop	6941bffbd2	fix typo git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1135 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2014-08-09 17:53:57 +00:00
zdenop	bce2cd5f33	enable to select pdf compression type and jpeg quality (fix issue 1263) git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1134 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2014-08-08 21:18:44 +00:00
zdenop	1156098567	Add font info to hocr output - fix issue 1219 git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1132 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2014-08-03 16:22:12 +00:00
theraysmith@gmail.com	d2ad450502	Added PDF renderer git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@957 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2014-01-09 17:47:34 +00:00
theraysmith@gmail.com	7ec4fd7a56	Refactored control functions to enable parallel blob classification git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@904 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2013-11-08 20:30:56 +00:00
theraysmith@gmail.com	2aafc9df24	Improved sub/superscript treatment git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@872 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2013-09-20 19:49:47 +00:00
theraysmith@gmail.com	3a998fe7ac	Added Right-to-left/Bidi capability in the output iterators for Hebrew/Arabic, Added paragraph detection in layout analysis/post OCR, Fixed inconsistent xheight during training and over-chopping, Added simultaneous multi-language capability, Refactored top-level word recognition module, Fixed problems with internally scaled images git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@651 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2012-02-02 02:59:49 +00:00
zdenop@gmail.com	da41b96f7f	removed check for libtiff - leptonica is required; cleanup #ifdef/#ifndef HAVE_LIBLEPT git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@624 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2011-08-30 06:34:41 +00:00
theraysmith	3e8c0bc228	Various fixes, including memory leak in fixspace, font labels on output, removed some annoying debug output, fixes to initialization of parameters, general cleanup, and added Hindi git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@567 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2011-03-21 21:44:05 +00:00
theraysmith	c8465252e4	Rewrite of DENORM git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@538 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2010-11-30 01:05:48 +00:00
zdenop@gmail.com	4523ce9f7d	3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to Linux and Windows (VC2008) compile process git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2010-11-23 18:34:14 +00:00
theraysmith	96e8b51feb	More changes to ccmain for 3.00 git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@287 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2009-07-11 02:07:25 +00:00

29 Commits