tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-05 02:47:00 +08:00

Author	SHA1	Message	Date
Stefan Weil	16e00b59fe	dict: Replace NULL by nullptr Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-04-22 17:42:35 +02:00
Stefan Weil	023e1b340e	Use POSIX data types and macros (#878) * api: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * ccmain: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * ccstruct: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * classify: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * cutil: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * dict: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * textord: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * training: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * wordrec: Replace Tesseract data types by POSIX data types Signed-off-by: Stefan Weil <sw@weilnetz.de> * ccutil: Replace Tesseract data types by POSIX data types Now all Tesseract data types which are no longer needed can be removed from ccutil/host.h.
Signed-off-by: Stefan Weil <sw@weilnetz.de> * ccmain: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * ccstruct: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * classify: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * dict: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * lstm: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * textord: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * wordrec: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * ccutil: Replace Tesseract's MIN_INT, MAX_INT* by POSIX INT_MIN, INT_MAX Remove the macros which are now unused from ccutil/host.h. Remove also the obsolete history comments. 
Signed-off-by: Stefan Weil <sw@weilnetz.de> * Fix build error caused by ambiguous ClipToRange Error message from Appveyor CI: C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2672: 'ClipToRange': no matching overloaded function found [C:\projects\tesseract\build\libtesseract.vcxproj] C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2782: 'T ClipToRange(const T &,const T &,const T &)': template parameter 'T' is ambiguous [C:\projects\tesseract\build\libtesseract.vcxproj] c:\projects\tesseract\ccutil\helpers.h(122): note: see declaration of 'ClipToRange' C:\projects\tesseract\ccstruct\coutln.cpp(818): note: could be 'char' C:\projects\tesseract\ccstruct\coutln.cpp(818): note: or 'int' Signed-off-by: Stefan Weil <sw@weilnetz.de> * unittest: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de> * arch: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-03-13 21:36:30 +01:00
Ray Smith	5deebe6c27	Fixed multilang for LSTM, pushed cube to one side without actually deleting it	2016-12-05 14:41:43 -08:00
Ray Smith	2c837dffc3	Result of clang tidy on recent merge	2016-11-07 10:46:33 -08:00
Jim O'Regan	524a61452d	Doxygen Squashed commit from https://github.com/tesseract-ocr/tesseract/tree/more-doxygen closes #14 Commits: `6317305` doxygen `9f42f69` doxygen `0fc4d52` doxygen `37b4b55` fix typo `bded8f1` some more doxy `020eb00` slight tweak `524666d` doxygenify `2a36a3e` doxygenify `229d218` doxygenify `7fd28ae` doxygenify `a8c64bc` doxygenify `f5d21b6` fix `5d8ede8` doxygenify `a58a4e0` language_model.cpp `fa85709` lm_pain_points.cpp lm_state.cpp `6418da3` merge `06190ba` Merge branch 'old_doxygen_merge' into more-doxygen `84acf08` Merge branch 'master' into more-doxygen `50fe1ff` pagewalk.cpp cube_reco_context.cpp `2982583` change to relative `192a24a` applybox.cpp, take one `8eeb053` delete docs for obsolete params `52e4c77` modernise classify/ocrfeatures.cpp `2a1cba6` modernise cutil/emalloc.cpp `773e006` silence doxygen warning `aeb1731` silence doxygen warning `f18387f` silence doxygen; new params are unused? `15ad6bd` doxygenify cutil/efio.cpp `c8b5dad` doxygenify cutil/danerror.cpp `784450f` the globals and exceptions parts are obsolete; remove `8bca324` doxygen classify/normfeat.cpp `9bcbe16` doxygen classify/normmatch.cpp `aa9a971` doxygen ccmain/cube_control.cpp `c083ff2` doxygen ccmain/cube_reco_context.cpp `f842850` params changed `5c94f12` doxygen ccmain/cubeclassifier.cpp `15ba750` case sensitive `f5c71d4` case sensitive `f85655b` doxygen classify/intproto.cpp `4bbc7aa` partial doxygen classify/mfx.cpp `dbb6041` partial doxygen classify/intproto.cpp `2aa72db` finish doxygen classify/intproto.cpp `0b8de99` doxygen training/mftraining.cpp `0b5b35c` partial doxygen ccstruct/coutln.cpp `b81c766` partial doxygen ccstruct/coutln.cpp `40fc415` finished? 
doxygen ccstruct/coutln.cpp `6e4165c` doxygen classify/clusttool.cpp `0267dec` doxygen classify/cutoffs.cpp `7f0c70c` doxygen classify/fpoint.cpp `512f3bd` ignore ~ files `5668a52` doxygen classify/intmatcher.cpp `84788d4` doxygen classify/kdtree.cpp `29f36ca` doxygen classify/mfoutline.cpp `40b94b1` silence doxygen warnings `6c511b9` doxygen classify/mfx.cpp `f9b4080` doxygen classify/outfeat.cpp `aa1df05` doxygen classify/picofeat.cpp `cc5f466` doxygen training/cntraining.cpp `cce044f` doxygen training/commontraining.cpp `167e216` missing param `9498383` renamed params `37eeac2` renamed param `d87b5dd` case `c8ee174` renamed params `b858db8` typo `4c2a838` h2 context? `81a2c0c` fix some param names; add some missing params, no docs `bcf8a4c` add some missing params, no docs `af77f86` add some missing params, no docs; fix some param names `01df24e` fix some params `6161056` fix some params `68508b6` fix some params `285aeb6` doxygen complains here no matter what `529bcfa` rm some missing params, typos `cd21226` rm some missing params, add some new ones `48a4bc2` fix params `c844628` missing param `312ce37` missing param; rename one `ec2fdec` missing param `05e15e0` missing params `d515858` change "<" to "&lt;" to make doxygen happy `b476a28` wrong place	2015-07-20 18:48:00 +01:00
Ray Smith	84920b92b3	Font and classifier output structure cleanup. Font recognition was poor, due to forcing a 1st and 2nd choice at a character level, when the total score for the correct font is often correct at the word level, so allowed the propagation of a full set of fonts and scores to the word recognizer, which can now decide word level fonts using the scores instead of simple votes. Change precipitated a cleanup of output data structures for classifier results, eliminating ScoredClass and INT_RESULT_STRUCT, with a few extra elements going in UnicharRating, and using that wherever possible. That added the extra complexity of 1-rating due to a flip between 0 is good and 0 is bad for the internal classifier scores before they are converted to rating and certainty.	2015-05-12 17:24:34 -07:00
theraysmith@gmail.com	67f9af58b8	Removed dependence on IMAGE class git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@944 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2014-01-09 17:31:29 +00:00
theraysmith@gmail.com	4c3475ad2e	Fixed fmemopen portability problem git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@890 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2013-10-10 02:07:26 +00:00
theraysmith@gmail.com	4d514d5a60	Major refactor of beam search, elimination of dead code, misc bug fixes, updates to Makefile.am, Changelog etc. git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@878 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2013-09-23 15:26:50 +00:00
david.eger@gmail.com	0aadbd0169	Save BLOB_CHOICEs for alternate choices saved during segmentation search so we have them when trying to replace words with alternates in the bigram correction pass. git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@739 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2012-09-01 00:33:46 +00:00
david.eger@gmail.com	4f0ff358a7	Missing close bracket. git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@714 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2012-03-29 06:15:33 +00:00
david.eger@gmail.com	4ddb3e5941	Good moming, Good aftemoon. During our initial chopping for each word, pay attention to whether a dangerous ambiguity (like rn <-> m) would lead us to a dictionary word. If so, make sure that blob gets chopped so that we can evaluate said dictionary word during the segmentation search. Large accuracy improvement, especially on English printed books (~9%). git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@713 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2012-03-28 21:02:54 +00:00
david.eger@gmail.com	0d5e8b5cb6	Recording segmentation state for a choice at LogNewChoice() time was a bad idea -- a VIABLE_CHOICE's Blob->NumChunks can be modified as we go by a call from Dict::LogNewSplit(). Relying on the auxiliary segmentation_state makes alt choices sometimes reference the wrong blobs. git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@711 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2012-03-28 20:11:57 +00:00
david.eger@gmail.com	018f192fc2	Abolish populate_unichars(), fixing seg fault reported in Debian: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=658634 git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@675 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2012-02-15 01:37:00 +00:00
theraysmith@gmail.com	fdd4ffe85e	Fixed endian bug in dawg reader, added word bigram correction. git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@649 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2012-02-02 02:56:18 +00:00
zdenop@gmail.com	4523ce9f7d	3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to Linux and Windows (VC2008) compile process git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2010-11-23 18:34:14 +00:00
joregan	edf7e7694c	silence more useless warnings git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@432 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2010-07-21 15:11:19 +00:00
theraysmith	3a13d80d24	Changes to dict for 3.00 git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@293 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2009-07-11 02:20:33 +00:00
theraysmith	bea5e04b76	Fixed compilation with GRAPHICS_DISABLED git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@250 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2009-06-03 17:24:08 +00:00
theraysmith	520077bd41	Fixed name collision with jpeg library git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@164 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2008-04-22 00:42:51 +00:00
theraysmith	2a678305c6	Major internationalization improvements git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@133 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2008-02-01 00:21:49 +00:00
theraysmith	570af48b8b	Remaining changes for Unicodeization project git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@87 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2007-07-18 01:15:07 +00:00
theraysmith	bc769e29b2	Preparations for unicodization git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@32 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2007-05-16 00:44:44 +00:00
tmbdev	425d593ebe	top-skimming import from sf.net git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk/trunk@2 d0cd1f9f-072b-0410-8dd7-cf729c803f20	2007-03-07 20:03:40 +00:00

24 Commits