tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-11-23 18:49:08 +08:00

Author	SHA1	Message	Date
Jim O'Regan	87753e3646	add DAWG_TYPE_HFST, used by OCRicola	2015-05-18 16:08:32 +01:00
Jim Regan	05acff6253	Merge pull request #23 from tesseract-ocr/training-sh /usr/share/fonts is the wrong path on Mac	2015-05-18 14:05:44 +01:00
Jim O'Regan	16ac3b0a20	/usr/share/fonts is the wrong path on Mac	2015-05-18 09:53:14 +01:00
zdenop	e9f59351de	Merge pull request #19 from haf/feature/readme-improvement [infra] updating readme	2015-05-18 08:46:46 +02:00
Henrik Feldt	a0ea634e15	[infra] README -> README.md, links	2015-05-16 19:19:54 +02:00
Henrik Feldt	03c29f96d8	[infra] updating readme	2015-05-16 19:10:10 +02:00
Zdenko Podobný	59bcbc79b3	fix GIT_VER info in VS2010	2015-05-15 15:14:49 +02:00
Zdenko Podobný	e98849b482	Print error message when pdf.ttf is not found.	2015-05-15 15:14:00 +02:00
Jim O'Regan	e7b087ffe6	update Doxyfile	2015-05-14 13:43:07 +01:00
Zdenko Podobný	aec22a47ec	fix autotools c++11 issue with disabled training	2015-05-14 14:25:49 +02:00
Zdenko Podobný	1d6de86150	fix VS2010 linking error	2015-05-14 14:24:55 +02:00
Zdenko Podobný	035b324f0f	reflect the latest commits in VS2010 build	2015-05-14 10:52:54 +02:00
Ray Smith	941d87057e	Fixed training build	2015-05-13 17:46:58 -07:00
Ray Smith	81b67f7ed9	Removed debug logging that doesn't belong	2015-05-13 17:12:23 -07:00
Ray Smith	d91df9856b	Fixed crash on debugging classifier with a shapetable present	2015-05-13 17:10:23 -07:00
Ray Smith	4598061324	Fixed infinite loop in training due to poor clipping of the table filler	2015-05-13 17:09:35 -07:00
Ray Smith	5bb0d89291	Improved debug of class pruner	2015-05-13 17:07:11 -07:00
Ray Smith	1e3b671298	Fixes to make yesterday's changes compile	2015-05-13 09:58:59 -07:00
Ray Smith	7bc6d3e059	Merge remote-tracking branch 'refs/remotes/origin/master' Updating from master.	2015-05-13 09:06:44 -07:00
Ray Smith	c34dea6543	Missing from `25d0968`	2015-05-13 09:05:08 -07:00
Jim O'Regan	b13691fda0	Merge conflict: going with Ray's version	2015-05-13 08:54:28 +01:00
Ray Smith	03f3c9dc88	Misc fixes missed from previous commits	2015-05-12 18:13:15 -07:00
Ray Smith	b2a3924585	Major updates to training system as a result of extensive testing on 100 languages - makefile.am	2015-05-12 18:08:39 -07:00
Ray Smith	6be25156f7	Major updates to training system as a result of extensive testing on 100 languages	2015-05-12 18:04:31 -07:00
Ray Smith	21805e63a4	Improved performance with PIC compilation option	2015-05-12 17:56:04 -07:00
Ray Smith	164897210a	Improved newlines and spaces in a box file so it works better with RTL languages.	2015-05-12 17:51:03 -07:00
Ray Smith	6b634170c1	Significant change to invisible font system to improve correctness and compatibility with external programs, particularly ghostscript. We will start mapping everything to a single glyph, rather than allowing characters to run off the end of the font. A more detailed design discussion is embedded into pdfrenderer.cpp comments. The font, source code that produces the font, and the design comments were contributed by Ken Sharp from Artifex Software.	2015-05-12 17:33:18 -07:00
Ray Smith	2924d3ae15	Changes missed from diacritic fix edit	2015-05-12 17:28:56 -07:00
Ray Smith	84920b92b3	Font and classifier output structure cleanup. Font recognition was poor, due to forcing a 1st and 2nd choice at a character level, when the total score for the correct font is often correct at the word level, so allowed the propagation of a full set of fonts and scores to the word recognizer, which can now decide word level fonts using the scores instead of simple votes. Change precipitated a cleanup of output data structures for classifier results, eliminating ScoredClass and INT_RESULT_STRUCT, with a few extra elements going in UnicharRating, and using that wherever possible. That added the extra complexity of 1-rating due to a flip between 0 is good and 0 is bad for the internal classifier scores before they are converted to rating and certainty.	2015-05-12 17:24:34 -07:00
Ray Smith	0e868ef377	Major change to improve layout analysis for heavily diacritic languages: Tha, Vie, Kan, Tel etc. There is a new overlap detector that detects when diacritics cause a big increase in textline overlap. In such cases, diacritics from overlap regions are kept separate from layout analysis completely, allowing textline formation to happen without them. The diacritics are then assigned to 0, 1 or 2 close words at the end of layout analysis, using and modifying an old noise detection data path. The stored diacritics are used or not during recognition according to the character classifier's liking for them.	2015-05-12 16:47:02 -07:00
Ray Smith	b6d0184806	Fixed problems with shifted baselines so recognition can recover from layout analysis errors.	2015-05-12 15:53:45 -07:00
Ray Smith	4a3caefd92	Add ability to build under android (without cube or scrollview).	2015-05-12 15:41:15 -07:00
Ray Smith	2eec979577	Makefile.am for fix to issue 1252	2015-05-12 15:25:00 -07:00
Ray Smith	53fc4456cc	Fixed issue 1252: Refactored LearnBlob and its call hierarchy to make it a member of Classify. Eliminated the flexfx scheme for calling global feature extractor functions through an array of function pointers. Deleted dead code I found as a by-product. This CL does not change BlobToTrainingSample or ExtractFeatures to be full members of Classify (the eventual goal) as that would make it even bigger, since there are a lot of callers to these functions. When ExtractFeatures and BlobToTrainingSample are members of Classify they will be able to access control parameters in Classify, which will greatly simplify developing variations to the feature extraction process.	2015-05-12 15:22:34 -07:00
Ray Smith	e735a9017b	Makefile.am change for Split/seam refactor	2015-05-12 15:05:56 -07:00
Ray Smith	25d0968d09	Major refactor to improve speed on difficult images, especially when running a heap checker. SEAM and SPLIT have been begging for a refactor for a LONG time. This change does most of the work of turning them into proper classes: Moved relevant code into SEAM/SPLIT/TBLOB/EDGEPT etc from global helper functions. Made the splits full data members of SEAM in an array instead of 3 separate pointers. This greatly reduces the amount of new/delete happening in the chopper, which is the main goal. Deleted redundant files: olutil., makechop. Brought other code into SEAM in order to keep its data members private with only priority having accessors.	2015-05-12 14:59:14 -07:00
Zdenko Podobný	d508751e58	Fixed issue 1317 - git revision info used as version info for autotools & DEBUG	2015-05-02 12:15:13 +02:00
Zdenko Podobný	d1c749f6ad	Fixed issue 1133 - part3 (Nick's replacement of InputBuffer-ReadLine with InputBuffer-Read)	2015-05-01 19:33:56 +02:00
Zdenko Podobný	5e754af9cb	Fixed issue 1133 - part2	2015-05-01 19:12:03 +02:00
Zdenko Podobný	53eab2ee92	fix issue 1354	2015-04-15 22:37:58 +02:00
Zdenko Podobný	370f1c65ad	fix issue 1436	2015-04-12 16:38:03 +02:00
Zdenko Podobný	4c7c960bfd	fix issue 1417	2015-02-07 22:22:20 +01:00
Zdenko Podobný	09b0c91fc9	fix Issue 1398	2015-02-06 23:44:58 +01:00
Zdenko Podobný	15d48361b4	fix VS2010 build;	2015-02-05 17:27:18 +01:00
Zdenko Podobný	9bca55c73b	fix space issue in revision `36883b4faf`	2015-01-30 22:24:26 +01:00
Zdenko Podobný	36883b4faf	preserve interword spaces patch - Issue 1409	2015-01-27 22:58:04 +01:00
Zdenko Podobný	e0441d0c6b	fix typo/ issue 1397	2014-12-31 22:31:50 +01:00
Zdenko Podobný	473141c1de	fix bool in c-api	2014-12-28 17:55:56 +01:00
Zdenko Podobný	4da712d04d	Add paragraph info to C-API(fix issue 1388)	2014-12-07 14:07:14 +01:00
Zdenko Podobný	239f350a72	remove const from C API TessResultIteratorGetChoiceIterator (issue 1342)	2014-10-14 22:46:11 +02:00

953 Commits