tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-11-30 23:49:05 +08:00

Author	SHA1	Message	Date
Stefan Weil	18c8f8833f	Remove deprecated parameters (#1418) They were deprecated nearly 3 years ago in commit `0e868ef377`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-03-25 17:35:29 +02:00
Stefan Weil	023e1b340e	Use POSIX data types and macros (#878) * api: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * ccmain: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * ccstruct: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * classify: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * cutil: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * dict: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * textord: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * training: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * wordrec: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * ccutil: Replace Tesseract data types by POSIX data types Now all Tesseract data types which are no longer needed can be removed from ccutil/host.h.
Signed-off-by: Stefan Weil <sw@weilnetz.de> * ccmain: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * ccstruct: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * classify: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * dict: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * lstm: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * textord: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * wordrec: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * ccutil: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Remove the macros which are now unused from ccutil/host.h. Remove also the obsolete history comments. 
Signed-off-by: Stefan Weil <sw@weilnetz.de> * Fix build error caused by ambiguous ClipToRange Error message from Appveyor CI: C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2672: 'ClipToRange': no matching overloaded function found [C:\projects\tesseract\build\libtesseract.vcxproj] C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2782: 'T ClipToRange(const T &,const T &,const T &)': template parameter 'T' is ambiguous [C:\projects\tesseract\build\libtesseract.vcxproj] c:\projects\tesseract\ccutil\helpers.h(122): note: see declaration of 'ClipToRange' C:\projects\tesseract\ccstruct\coutln.cpp(818): note: could be 'char' C:\projects\tesseract\ccstruct\coutln.cpp(818): note: or 'int' Signed-off-by: Stefan Weil <sw@weilnetz.de> * unittest: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * arch: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-03-13 21:36:30 +01:00
Stefan Weil	aa6eb6bd46	Remove Tesseract parameter "include_page_breaks" and use FF by default Now Tesseract adds a page break (normally form feed) by default. It is still possible to suppress page breaks by setting an empty page_separator. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2017-09-19 07:34:32 +02:00
jm	2a77d5ad69	returns the correct dictionary if lstm only used	2017-09-14 13:03:22 +02:00
Ray Smith	cec1037260	Fixed BestPix to always return the highest resolution available, even if a lower bit depth than the original	2017-07-19 16:28:26 -07:00
Ray Smith	7a116ce8bb	More formatting fixes from clang tidy	2017-04-28 13:38:32 -07:00
Ray Smith	1cc511188d	Added extra Init that takes a memory buffer or a filereader function pointer to enable read of traineddata from memory or foreign file systems. Updated existing readers to use TFile API instead of FILE. This does not yet add big-endian capability to LSTM, but it is very easy from here.	2017-04-27 15:48:23 -07:00
Jeff Breidenbach	bd45b3ae4f	fix #537: Error in pixClone: pixs not defined	2017-01-29 16:59:52 +01:00
Ray Smith	f566a45b30	clang-tidy changes from sync	2017-01-25 16:20:19 -08:00
Ray Smith	b453f74e01	Fixed issue #633 (multi-language mode	2017-01-25 15:58:39 -08:00
Wikinaut	c03299e2b4	Improve textonly_pdf parameter description	2017-01-21 16:18:53 +01:00
Zdenko Podobný	effa5741e6	Implement invisible text only for PDF	2017-01-20 21:26:34 +01:00
Wikinaut	f06ef543fc	typo correction "specific"	2017-01-13 04:24:16 +01:00
Simon Strandgaard	d38cffc332	Fixed typo	2016-12-15 14:58:53 +00:00
zdenop	da4c064c2e	Merge pull request #531 from stweil/guards Fix header file guards and replace reserved identifiers	2016-12-15 08:29:32 +01:00
Ray Smith	9f5ba9105f	Removed dependency on cube from the code	2016-12-14 10:55:15 -08:00
Ray Smith	13e46ae1c4	Made LSTM the default engine, pushed cube out	2016-12-13 14:37:40 -08:00
Ray Smith	5deebe6c27	Fixed multilang for LSTM, pushed cube to one side without actually deleting it	2016-12-05 14:41:43 -08:00
Stefan Weil	4897796d57	Replace reserved identifiers used in #define guards header files Use macro names as suggested by the Google C++ Style Guide (https://google.github.io/styleguide/cppguide.html#The__define_Guard). Signed-off-by: Stefan Weil <sw@weilnetz.de>	2016-12-04 15:43:03 +01:00
Ray Smith	c1c1e426b3	Added new LSTM-based neural network line recognizer	2016-11-07 15:38:07 -08:00
Ray Smith	2c837dffc3	Result of clang tidy on recent merge	2016-11-07 10:46:33 -08:00
Tom Morris	6700edd8bc	Cleanup TSV renderer Remove all references to hocr, hocr.tsv, etc. Remove dead code for font info, input filename, HTML escapes. Improved comments. Fixed indentation.	2016-03-01 13:41:19 -05:00
Sundar M. Vaidya	738fe4f757	Adds BoolParam tessedit_create_hocrtsv in class Tesseract.	2016-03-01 12:30:39 -05:00
amitdo	c2f5e9b849	If there is no explicit renderer(s), default to TessTextRenderer Revert `fd429c32`, `43834da7`, `05de195e`. See #49, #59. The code in this commit solves the issue in a more elegant way, IMHO. Now you can use: * `tesseract eurotext.tif eurotext txt pdf` * `tesseract eurotext.tif eurotext txt hocr` * `tesseract eurotext.tif eurotext txt hocr pdf` NOTE: With `tesseract eurotext.tif eurotext` or `tesseract eurotext.tif eurotext txt` the psm will be set to '3', but... With `tesseract eurotext.tif eurotext txt pdf` or `tesseract eurotext.tif eurotext txt hocr` the psm will be set to '1'.	2015-12-11 19:06:49 +02:00
Stefan Weil	318b88daa6	ccmain: Fix typos in comments and strings Most of them were found by codespell. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2015-09-14 21:59:16 +02:00
Zdenko Podobný	41478fd5a1	implement build without cube (-DNO_CUBE_BUILD)	2015-07-24 11:51:44 +02:00
Ray Smith	78b5e1a77d	Fixed occurrence of small rotated blocks in loosely spaced text	2015-06-12 11:05:00 -07:00
Ray Smith	0e868ef377	Major change to improve layout analysis for heavily diacritic languages: Tha, Vie, Kan, Tel etc. There is a new overlap detector that detects when diacritics cause a big increase in textline overlap. In such cases, diacritics from overlap regions are kept separate from layout analysis completely, allowing textline formation to happen without them. The diacritics are then assigned to 0, 1 or 2 close words at the end of layout analysis, using and modifying an old noise detection data path. The stored diacritics are used or not during recognition according to the character classifier's liking for them.	2015-05-12 16:47:02 -07:00
Ray Smith	b6d0184806	Fixed problems with shifted baselines so recognition can recover from layout analysis errors.	2015-05-12 15:53:45 -07:00
Ray Smith	4a3caefd92	Add ability to build under android (without cube or scrollview).	2015-05-12 15:41:15 -07:00
Zdenko Podobný	4c7c960bfd	fix issue 1417	2015-02-07 22:22:20 +01:00
Zdenko Podobný	36883b4faf	preserve interword spaces patch - Issue 1409	2015-01-27 22:58:04 +01:00
Ray Smith	f927728169	Fixed issue 1207	2014-10-09 13:28:03 -07:00
Zdenko Podobný	d0cb1071b2	remove parameters tessedit_pdf_jpg_quality, tessedit_pdf_compression (reasons are in i1300 and i1285)	2014-10-07 23:37:34 +02:00
Ray Smith	55d11ad3c2	Moved params from global in page layout to tesseractclass, improved single column layout analysis	2014-10-07 09:31:00 -07:00
Zdenko Podobný	9e8629d9ef	allow multiple output in tesseract executable (https://groups.google.com/d/msg/tesseract-ocr/Z_WUKmJDVxc/1vc3W0xJZ2oJ)	2014-09-19 23:33:47 +02:00
Ray Smith	2f197cd653	Fixed issues 899/1220/1246 (mixed eng+ara)	2014-09-17 18:27:49 -07:00
Zdenko Podobný	ff87944171	fix typo	2014-09-07 18:23:47 +02:00
Zdenko Podobný	d1aa61c110	fix issue 1285: reimplement option to select pdf compression	2014-09-06 09:32:22 +02:00
Ray Smith	09b439b05a	Fixed issue 1241, but disabled due to making accuracy worse	2014-08-13 13:33:10 -07:00
theraysmith@gmail.com	dbf6197471	Major refactor of control.cpp to enable line recognition git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1147 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2014-08-11 23:23:06 +00:00
zdenop	6941bffbd2	fix typo git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1135 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2014-08-09 17:53:57 +00:00
zdenop	bce2cd5f33	enable to select pdf compression type and jpeg quality (fix issue 1263) git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1134 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2014-08-08 21:18:44 +00:00
zdenop	1156098567	Add font info to hocr output - fix issue 1219 git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1132 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2014-08-03 16:22:12 +00:00
theraysmith@gmail.com	8364f24f4b	Added ability for box files to store spaces and newlines git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1060 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2014-04-23 22:52:05 +00:00
zdenop	790a3da22f	remove 'class IMAGE;' git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1045 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2014-02-03 23:32:23 +00:00
theraysmith@gmail.com	d2ad450502	Added PDF renderer git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@957 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2014-01-09 17:47:34 +00:00
theraysmith@gmail.com	7ec4fd7a56	Refactored control functions to enable parallel blob classification git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@904 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2013-11-08 20:30:56 +00:00
theraysmith@gmail.com	2aafc9df24	Improved sub/superscript treatment git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@872 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2013-09-20 19:49:47 +00:00
theraysmith@gmail.com	64c739c8af	Added sparse text mode, also fixed issue 653. git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@820 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2013-01-03 19:06:41 +00:00

57 Commits