tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-11-28 13:49:35 +08:00

Author	SHA1	Message	Date
Ray Smith	84920b92b3	Font and classifier output structure cleanup. Font recognition was poor, due to forcing a 1st and 2nd choice at a character level, when the total score for the correct font is often correct at the word level, so allowed the propagation of a full set of fonts and scores to the word recognizer, which can now decide word level fonts using the scores instead of simple votes. Change precipitated a cleanup of output data structures for classifier results, eliminating ScoredClass and INT_RESULT_STRUCT, with a few extra elements going in UnicharRating, and using that wherever possible. That added the extra complexity of 1-rating due to a flip between 0 is good and 0 is bad for the internal classifier scores before they are converted to rating and certainty.	2015-05-12 17:24:34 -07:00
Ray Smith	0e868ef377	Major change to improve layout analysis for heavily diacritic languages: Tha, Vie, Kan, Tel etc. There is a new overlap detector that detects when diacritics cause a big increase in textline overlap. In such cases, diacritics from overlap regions are kept separate from layout analysis completely, allowing textline formation to happen without them. The diacritics are then assigned to 0, 1 or 2 close words at the end of layout analysis, using and modifying an old noise detection data path. The stored diacritics are used or not during recognition according to the character classifier's liking for them.	2015-05-12 16:47:02 -07:00
Ray Smith	b6d0184806	Fixed problems with shifted baselines so recognition can recover from layout analysis errors.	2015-05-12 15:53:45 -07:00
Ray Smith	4a3caefd92	Add ability to build under android (without cube or scrollview).	2015-05-12 15:41:15 -07:00
Ray Smith	2eec979577	Makefile.am for fix to issue 1252	2015-05-12 15:25:00 -07:00
Ray Smith	53fc4456cc	Fixed issue 1252: Refactored LearnBlob and its call hierarchy to make it a member of Classify. Eliminated the flexfx scheme for calling global feature extractor functions through an array of function pointers. Deleted dead code I found as a by-product. This CL does not change BlobToTrainingSample or ExtractFeatures to be full members of Classify (the eventual goal) as that would make it even bigger, since there are a lot of callers to these functions. When ExtractFeatures and BlobToTrainingSample are members of Classify they will be able to access control parameters in Classify, which will greatly simplify developing variations to the feature extraction process.	2015-05-12 15:22:34 -07:00
Ray Smith	e735a9017b	Makefile.am change for Split/seam refactor	2015-05-12 15:05:56 -07:00
Ray Smith	25d0968d09	Major refactor to improve speed on difficult images, especially when running a heap checker. SEAM and SPLIT have been begging for a refactor for a LONG time. This change does most of the work of turning them into proper classes: Moved relevant code into SEAM/SPLIT/TBLOB/EDGEPT etc from global helper functions. Made the splits full data members of SEAM in an array instead of 3 separate pointers. This greatly reduces the amount of new/delete happening in the chopper, which is the main goal. Deleted redundant files: olutil., makechop. Brought other code into SEAM in order to keep its data members private with only priority having accessors.	2015-05-12 14:59:14 -07:00
Zdenko Podobný	4c7c960bfd	fix issue 1417	2015-02-07 22:22:20 +01:00
Zdenko Podobný	09b0c91fc9	fix Issue 1398	2015-02-06 23:44:58 +01:00
Zdenko Podobný	15d48361b4	fix VS2010 build;	2015-02-05 17:27:18 +01:00
Zdenko Podobný	9bca55c73b	fix space issue in revision `36883b4faf`	2015-01-30 22:24:26 +01:00
Zdenko Podobný	36883b4faf	preserve interword spaces patch - Issue 1409	2015-01-27 22:58:04 +01:00
Zdenko Podobný	e0441d0c6b	fix typo/ issue 1397	2014-12-31 22:31:50 +01:00
Zdenko Podobný	473141c1de	fix bool in c-api	2014-12-28 17:55:56 +01:00
Zdenko Podobný	4da712d04d	Add paragraph info to C-API(fix issue 1388)	2014-12-07 14:07:14 +01:00
Zdenko Podobný	239f350a72	remove const from C API TessResultIteratorGetChoiceIterator (issue 1342)	2014-10-14 22:46:11 +02:00
Ray Smith	242b14ae7f	Reduced size of multi-renderer implementation from code review	2014-10-09 13:29:46 -07:00
Ray Smith	d9699c4099	Fixed bidi handling in PDF output	2014-10-09 13:29:01 -07:00
Ray Smith	f927728169	Fixed issue 1207	2014-10-09 13:28:03 -07:00
Zdenko Podobný	d0cb1071b2	remove parameters tessedit_pdf_jpg_quality, tessedit_pdf_compression (reasons are in i1300 and i1285)	2014-10-07 23:37:34 +02:00
Ray Smith	55d11ad3c2	Moved params from global in page layout to tesseractclass, improved single column layout analysis	2014-10-07 09:31:00 -07:00
Ray Smith	a441993100	Fixed issue 1301	2014-10-07 09:27:25 -07:00
Ray Smith	f77d01eb7b	Fixed issue 1302	2014-10-07 09:25:53 -07:00
Ray Smith	26235d69e8	Fixed issue 1304	2014-10-07 09:24:24 -07:00
Ray Smith	bfd2cb83d5	Fixed issue 1303	2014-10-07 09:21:17 -07:00
Zdenko Podobný	4904afe65b	fix issue 1300 - patch from #35	2014-10-06 22:43:56 +02:00
Zdenko Podobný	4c01561b0f	fix issue 1300 - patch from #26	2014-10-02 21:19:17 +02:00
Zdenko Podobný	c0640a4bef	fix cygwin build (issue 1289)	2014-09-28 23:19:52 +02:00
Zdenko Podobný	f8613fab22	fix issue 1300 /patches from breidenbach	2014-09-21 16:38:24 +02:00
Zdenko Podobný	c44f3da353	Doxygen - improve strip path	2014-09-21 15:16:38 +02:00
Zdenko Podobný	9e8629d9ef	allow multiple output in tesseract executable (https://groups.google.com/d/msg/tesseract-ocr/Z_WUKmJDVxc/1vc3W0xJZ2oJ )	2014-09-19 23:33:47 +02:00
Ray Smith	d3448c37ab	Fixed issue 1264	2014-09-17 18:29:32 -07:00
Ray Smith	2f197cd653	Fixed issues 899/1220/1246 (mixed eng+ara)	2014-09-17 18:27:49 -07:00
Ray Smith	e46b605469	Improved script consistency in numbers	2014-09-17 18:22:32 -07:00
Ray Smith	648e7ca311	Merge branch 'master' of https://code.google.com/p/tesseract-ocr Usual git need to merge if local is out of date.	2014-09-17 18:10:17 -07:00
Ray Smith	0256529c1f	Fixed issue 1243	2014-09-17 18:09:45 -07:00
Zdenko Podobný	93f7899a9e	fix tesstrain.sh/issue 1311. Patch from Mark Zealey <zealey@gmail.com> https://groups.google.com/forum/#!msg/tesseract-dev/uYTr1D656-M/xLXgjKy9fywJ	2014-09-14 15:10:25 +02:00
Jim O'Regan	c4b39bd89e	Merge branch 'master' of https://code.google.com/p/tesseract-ocr	2014-09-09 20:37:54 +01:00
Jim O'Regan	c0c719306a	update docs for TessBaseAPI::SetProbabilityInContextFunc based on Ray's email today	2014-09-09 20:37:27 +01:00
Zdenko Podobný	ff87944171	fix typo	2014-09-07 18:23:47 +02:00
Thomas G. (Syryos)	541e06c2b2	typo correction! fixes 1287	2014-09-06 10:57:56 +01:00
Zdenko Podobný	d1aa61c110	fix issue 1285: reimplement option to select pdf compression	2014-09-06 09:32:22 +02:00
Zdenko Podobný	298e31465a	require leptonica 1.71 for tesseract build	2014-08-18 23:04:08 +02:00
Zdenko Podobný	5755a5cecb	fix opencl build on OSX (issue 1272)	2014-08-18 09:37:21 +02:00
Zdenop	524ee27f01	increase version number based on baseapi.h	2014-08-16 21:02:41 +02:00
Zdenop	689c8e5667	fix VS2010 build	2014-08-15 23:00:20 +02:00
Zdenko Podobný	369fabb7fc	fix filemode; update autotools and distribution script to repository changes; ignore doxygen generated files and language data files;	2014-08-14 23:37:17 +02:00
Ray Smith	3c21c14949	Fixed issue 1245	2014-08-13 18:51:28 -07:00
Ray Smith	3adb03b5c8	Merge branch 'master' of https://code.google.com/p/tesseract-ocr Why? Isn't git easier? Just updating from remote.	2014-08-13 13:36:36 -07:00

920 Commits