tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-12 07:29:07 +08:00

Author	SHA1	Message	Date
Stefan Weil	5b4ce2431d	ccmain: Replace NULL by nullptr Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-04-22 17:42:34 +02:00
Noah Metzger	34efcd40be	Fixed a resource leak detected by Coverity Replaced the inheritance relation of BLOCK and PDBLK by a member relation. This avoids the necessity of a virtual destructor in PDBLK for the occurring upcasts. Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2018-04-19 13:55:39 +02:00
Stefan Weil	023e1b340e	Use POSIX data types and macros (#878 ) * api: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * ccmain: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * ccstruct: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * classify: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * cutil: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * dict: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * textord: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * training: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * wordrec: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * ccutil: Replace Tesseract data types by POSIX data types Now all Tesseract data types which are no longer needed can be removed from ccutil/host.h. 
Signed-off-by: Stefan Weil <sw@weilnetz.de> * ccmain: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * ccstruct: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * classify: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * dict: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * lstm: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * textord: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * wordrec: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * ccutil: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Remove the macros which are now unused from ccutil/host.h. Remove also the obsolete history comments. 
Signed-off-by: Stefan Weil <sw@weilnetz.de> * Fix build error caused by ambiguous ClipToRange Error message from Appveyor CI: C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2672: 'ClipToRange': no matching overloaded function found [C:\projects\tesseract\build\libtesseract.vcxproj] C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2782: 'T ClipToRange(const T &,const T &,const T &)': template parameter 'T' is ambiguous [C:\projects\tesseract\build\libtesseract.vcxproj] c:\projects\tesseract\ccutil\helpers.h(122): note: see declaration of 'ClipToRange' C:\projects\tesseract\ccstruct\coutln.cpp(818): note: could be 'char' C:\projects\tesseract\ccstruct\coutln.cpp(818): note: or 'int' Signed-off-by: Stefan Weil <sw@weilnetz.de> * unittest: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * arch: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-03-13 21:36:30 +01:00
Ray Smith	da03e4e910	Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion	2017-07-14 09:30:14 -07:00
Stefan Weil	5cc8c058fa	ccmain: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de>	2017-05-02 18:21:51 +02:00
Ray Smith	b453f74e01	Fixed issue #633 (multi-language mode	2017-01-25 15:58:39 -08:00
Ray Smith	9f5ba9105f	Removed dependency on cube from the code	2016-12-14 10:55:15 -08:00
Ray Smith	5deebe6c27	Fixed multilang for LSTM, pushed cube to one side without actually deleting it	2016-12-05 14:41:43 -08:00
Ray Smith	c1c1e426b3	Added new LSTM-based neural network line recognizer	2016-11-07 15:38:07 -08:00
Ray Smith	2c837dffc3	Result of clang tidy on recent merge	2016-11-07 10:46:33 -08:00
Stefan Weil	a5b61e2b35	ccmain: Remove unused constants In osdetect.cpp, a local definition of kMinCredibleResolution was identical to a global one, so the local one could be removed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2016-09-06 21:49:27 +02:00
Zdenko Podobný	1db94823a9	Add info for progress monitor, make it visible in doxygen doc; remove commented code	2016-01-05 17:21:53 +01:00
zdenop	c53add706e	Merge pull request #27 from tesseract-ocr/monitor Monitor	2016-01-05 16:28:42 +01:00
Stefan Weil	c714330d2f	ccmain: Remove unused local variables Signed-off-by: Stefan Weil <sw@weilnetz.de>	2015-11-04 09:44:52 +01:00
Stefan Weil	318b88daa6	ccmain: Fix typos in comments and strings Most of them were found by codespell. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2015-09-14 21:59:16 +02:00
Zdenko Podobný	41478fd5a1	implement build without cube (-DNO_CUBE_BUILD)	2015-07-24 11:51:44 +02:00
Jim O'Regan	524a61452d	Doxygen Squashed commit from https://github.com/tesseract-ocr/tesseract/tree/more-doxygen closes #14 Commits: `6317305` doxygen `9f42f69` doxygen `0fc4d52` doxygen `37b4b55` fix typo `bded8f1` some more doxy `020eb00` slight tweak `524666d` doxygenify `2a36a3e` doxygenify `229d218` doxygenify `7fd28ae` doxygenify `a8c64bc` doxygenify `f5d21b6` fix `5d8ede8` doxygenify `a58a4e0` language_model.cpp `fa85709` lm_pain_points.cpp lm_state.cpp `6418da3` merge `06190ba` Merge branch 'old_doxygen_merge' into more-doxygen `84acf08` Merge branch 'master' into more-doxygen `50fe1ff` pagewalk.cpp cube_reco_context.cpp `2982583` change to relative `192a24a` applybox.cpp, take one `8eeb053` delete docs for obsolete params `52e4c77` modernise classify/ocrfeatures.cpp `2a1cba6` modernise cutil/emalloc.cpp `773e006` silence doxygen warning `aeb1731` silence doxygen warning `f18387f` silence doxygen; new params are unused? `15ad6bd` doxygenify cutil/efio.cpp `c8b5dad` doxygenify cutil/danerror.cpp `784450f` the globals and exceptions parts are obsolete; remove `8bca324` doxygen classify/normfeat.cpp `9bcbe16` doxygen classify/normmatch.cpp `aa9a971` doxygen ccmain/cube_control.cpp `c083ff2` doxygen ccmain/cube_reco_context.cpp `f842850` params changed `5c94f12` doxygen ccmain/cubeclassifier.cpp `15ba750` case sensitive `f5c71d4` case sensitive `f85655b` doxygen classify/intproto.cpp `4bbc7aa` partial doxygen classify/mfx.cpp `dbb6041` partial doxygen classify/intproto.cpp `2aa72db` finish doxygen classify/intproto.cpp `0b8de99` doxygen training/mftraining.cpp `0b5b35c` partial doxygen ccstruct/coutln.cpp `b81c766` partial doxygen ccstruct/coutln.cpp `40fc415` finished? 
doxygen ccstruct/coutln.cpp `6e4165c` doxygen classify/clusttool.cpp `0267dec` doxygen classify/cutoffs.cpp `7f0c70c` doxygen classify/fpoint.cpp `512f3bd` ignore ~ files `5668a52` doxygen classify/intmatcher.cpp `84788d4` doxygen classify/kdtree.cpp `29f36ca` doxygen classify/mfoutline.cpp `40b94b1` silence doxygen warnings `6c511b9` doxygen classify/mfx.cpp `f9b4080` doxygen classify/outfeat.cpp `aa1df05` doxygen classify/picofeat.cpp `cc5f466` doxygen training/cntraining.cpp `cce044f` doxygen training/commontraining.cpp `167e216` missing param `9498383` renamed params `37eeac2` renamed param `d87b5dd` case `c8ee174` renamed params `b858db8` typo `4c2a838` h2 context? `81a2c0c` fix some param names; add some missing params, no docs `bcf8a4c` add some missing params, no docs `af77f86` add some missing params, no docs; fix some param names `01df24e` fix some params `6161056` fix some params `68508b6` fix some params `285aeb6` doxygen complains here no matter what `529bcfa` rm some missing params, typos `cd21226` rm some missing params, add some new ones `48a4bc2` fix params `c844628` missing param `312ce37` missing param; rename one `ec2fdec` missing param `05e15e0` missing params `d515858` change "<" to < to make doxygen happy `b476a28` wrong place	2015-07-20 18:48:00 +01:00
Ray Smith	b1d99dfe23	Added a backup adaptive classifier to take over from primary when it fills on a large document	2015-06-12 11:10:53 -07:00
Zdenko Podobný	917e994caa	extend ETEXT_DESC by progress_callback	2015-05-17 21:56:40 +02:00
Ray Smith	1e3b671298	Fixes to make yesterday's changes compile	2015-05-13 09:58:59 -07:00
Ray Smith	84920b92b3	Font and classifier output structure cleanup. Font recognition was poor, due to forcing a 1st and 2nd choice at a character level, when the total score for the correct font is often correct at the word level, so allowed the propagation of a full set of fonts and scores to the word recognizer, which can now decide word level fonts using the scores instead of simple votes. Change precipitated a cleanup of output data structures for classifier results, eliminating ScoredClass and INT_RESULT_STRUCT, with a few extra elements going in UnicharRating, and using that wherever possible. That added the extra complexity of 1-rating due to a flip between 0 is good and 0 is bad for the internal classifier scores before they are converted to rating and certainty.	2015-05-12 17:24:34 -07:00
Ray Smith	0e868ef377	Major change to improve layout analysis for heavily diacritic languages: Tha, Vie, Kan, Tel etc. There is a new overlap detector that detects when diacritics cause a big increase in textline overlap. In such cases, diacritics from overlap regions are kept separate from layout analysis completely, allowing textline formation to happen without them. The diacritics are then assigned to 0, 1 or 2 close words at the end of layout analysis, using and modifying an old noise detection data path. The stored diacritics are used or not during recognition according to the character classifier's liking for them.	2015-05-12 16:47:02 -07:00
Ray Smith	b6d0184806	Fixed problems with shifted baselines so recognition can recover from layout analysis errors.	2015-05-12 15:53:45 -07:00
Ray Smith	4a3caefd92	Add ability to build under android (without cube or scrollview).	2015-05-12 15:41:15 -07:00
Ray Smith	a441993100	Fixed issue 1301	2014-10-07 09:27:25 -07:00
Ray Smith	2f197cd653	Fixed issues 899/1220/1246 (mixed eng+ara)	2014-09-17 18:27:49 -07:00
Ray Smith	09b439b05a	Fixed issue 1241, but disabled due to making accuracy worse	2014-08-13 13:33:10 -07:00
theraysmith@gmail.com	dbf6197471	Major refactor of control.cpp to enable line recognition git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1147 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2014-08-11 23:23:06 +00:00
theraysmith@gmail.com	cda8e748b1	Fixed some formatting issues git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1083 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2014-04-25 01:25:42 +00:00
theraysmith@gmail.com	5d61f46332	Fixed issue 1112 git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1079 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2014-04-24 20:13:38 +00:00
theraysmith@gmail.com	7f5e5264d3	Fixed issues 1093-1097 git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1048 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2014-02-04 23:36:24 +00:00
theraysmith@gmail.com	7ec4fd7a56	Refactored control functions to enable parallel blob classification git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@904 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2013-11-08 20:30:56 +00:00
theraysmith@gmail.com	4d514d5a60	Major refactor of beam search, elimination of dead code, misc bug fixes, updates to Makefile.am, Changelog etc. git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@878 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2013-09-23 15:26:50 +00:00
zdenop@gmail.com	10c1169d98	remove unused code (Windows related) git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@860 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2013-07-08 18:21:10 +00:00
zdenop@gmail.com	5958f01f5f	fix doxygen warnings git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@715 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2012-03-30 15:42:06 +00:00
david.eger@gmail.com	018f192fc2	Abolish populate_unichars(), fixing seg fault reported in Debian: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=658634 git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@675 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2012-02-15 01:37:00 +00:00
david.eger@gmail.com	78a8356a76	Put one last bigram correction debug statement behind a debug flag. git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@669 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2012-02-09 20:08:17 +00:00
david.eger@gmail.com	56bc885721	Fix some debug messaging about bigram correction -- the two lists of alternates are not independent. git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@664 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2012-02-03 19:43:25 +00:00
theraysmith@gmail.com	3a998fe7ac	Added Right-to-left/Bidi capability in the output iterators for Hebrew/Arabic, Added paragraph detection in layout analysis/post OCR, Fixed inconsistent xheight during training and over-chopping, Added simultaneous multi-language capability, Refactored top-level word recognition module, Fixed problems with internally scaled images git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@651 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2012-02-02 02:59:49 +00:00
theraysmith	3e8c0bc228	Various fixes, including memory leak in fixspace, font labels on output, removed some annoying debug output, fixes to initialization of parameters, general cleanup, and added Hindi git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@567 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2011-03-21 21:44:05 +00:00
theraysmith	7121e51422	Deleted lots of dead code, including PBLOB git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@556 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2011-03-18 21:52:08 +00:00
theraysmith	137f4806b6	Added sub/superscript, small/dropcap detection git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@547 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2010-12-09 01:32:20 +00:00
theraysmith	c8465252e4	Rewrite of DENORM git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@538 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2010-11-30 01:05:48 +00:00
zdenop@gmail.com	4523ce9f7d	3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to Linux and Windows (VC2008) compile process git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2010-11-23 18:34:14 +00:00
joregan	f2506871f9	move include of config_auto.h to not conflict with local types. Not finished git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@490 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2010-09-30 15:53:40 +00:00
joregan	b6e3cbea5a	more doxygen git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@445 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2010-07-27 16:39:45 +00:00
joregan	4acaabdb62	make some static git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@440 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2010-07-26 18:21:10 +00:00
joregan	522a8ccfc4	fix issue 332 git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@429 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2010-07-20 10:31:49 +00:00
joregan	5c8ad7ee72	add config_auto.h anywhere #ifndef GRAPHICS_DISABLED is used git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@384 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2010-05-28 12:03:45 +00:00
theraysmith	109d1c8f21	Some changes in ccmain for 3.00 git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@286 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2009-07-11 02:03:51 +00:00

57 Commits