tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-15 09:47:46 +08:00

Author	SHA1	Message	Date
Jim O'Regan	99be295349	Merge branch 'monitor' of https://github.com/tesseract-ocr/tesseract into monitor	2015-05-18 12:29:11 +01:00
Renard Wellnitz	49a7ed13ea	fix to compile tesseract on mac with clang	2015-05-18 09:59:10 +01:00
Jim O'Regan	16ac3b0a20	/usr/share/fonts is the wrong path on Mac	2015-05-18 09:53:14 +01:00
zdenop	e9f59351de	Merge pull request #19 from haf/feature/readme-improvement [infra] updating readme	2015-05-18 08:46:46 +02:00
Zdenko Podobný	438edd6c7b	added row attributes to hocr output	2015-05-17 22:13:59 +02:00
Zdenko Podobný	917e994caa	extend ETEXT_DESC by progress_callback	2015-05-17 21:56:40 +02:00
Zdenko Podobný	ed6ae9b974	Add monitor to GetHOCRText	2015-05-17 21:55:50 +02:00
Henrik Feldt	a0ea634e15	[infra] README -> README.md, links	2015-05-16 19:19:54 +02:00
Henrik Feldt	03c29f96d8	[infra] updating readme	2015-05-16 19:10:10 +02:00
Zdenko Podobný	59bcbc79b3	fix GIT_VER info in VS2010	2015-05-15 15:14:49 +02:00
Zdenko Podobný	e98849b482	print error message when pdf.ttf is not found.	2015-05-15 15:14:00 +02:00
Jim O'Regan	e7b087ffe6	update Doxyfile	2015-05-14 13:43:07 +01:00
Zdenko Podobný	aec22a47ec	fix autotools c++11 issue with disabled training	2015-05-14 14:25:49 +02:00
Zdenko Podobný	1d6de86150	fix VS2010 linking error	2015-05-14 14:24:55 +02:00
Zdenko Podobný	035b324f0f	reflect the latest commits in VS2010 build	2015-05-14 10:52:54 +02:00
Ray Smith	941d87057e	Fixed training build	2015-05-13 17:46:58 -07:00
Ray Smith	81b67f7ed9	Removed debug logging that doesn't belong	2015-05-13 17:12:23 -07:00
Ray Smith	d91df9856b	Fixed crash on debugging classifier with a shapetable present	2015-05-13 17:10:23 -07:00
Ray Smith	4598061324	Fixed infinite loop in training due to poor clipping of the table filler	2015-05-13 17:09:35 -07:00
Ray Smith	5bb0d89291	Improved debug of class pruner	2015-05-13 17:07:11 -07:00
zhivko.tabakov@gmail.com	07be522e43	Issue 1351: OpenCL build - kernel_ThresholdRectToPix() not accounting for padding bits in the output pix?! https://code.google.com/p/tesseract-ocr/issues/detail?id=1351 What steps will reproduce the problem? 1. Use a tesseract build with OpenCL. 2. Pass a full-color image whose width is not a multiple of 32. 3. Recognition is far too slow and does not recognize anything. I read the article at http://www.sk-spell.sk.cx/tesseract-meets-the-opencl-first-test and decided to give OCL a try. The initial result was as described in point 3 above. After some debugging I found that the OCL version of threshold rect generation does not account for the padding bits at the end of each output pix line. To test this, I made a quick fix in oclkernels.h, replacing the definition of kernel_ThresholdRectToPix. Note that OCL kernel recompilation must be forced after changing this source (e.g. delete “kernel - <device>.bin” from the exec folder). The fix works, but I am not sure about it, since the original source apparently works for other people (as per the article). If I am right, the OS/GPU are irrelevant because the bug is algorithmic, but mine are Windows/AMD. A similar fix applies to kernel_ThresholdRectToPix_OneChan(), but there the input array might have some padding bytes as well, so its indexing will need further adjustments. I can come up with a proof/fix for that too, but I have not looked into it yet. Disclaimer: I have no prior experience with image processing, the tesseract source, GPU computing, or OpenCL (but please do explain if I am wrong).	2015-05-13 21:23:23 +01:00
Ray Smith	1e3b671298	Fixes to make yesterday's changes compile	2015-05-13 09:58:59 -07:00
Ray Smith	7bc6d3e059	Merge remote-tracking branch 'refs/remotes/origin/master' Updating from master.	2015-05-13 09:06:44 -07:00
Ray Smith	c34dea6543	Missing from `25d0968`	2015-05-13 09:05:08 -07:00
Jim O'Regan	a94943cc1f	remove unneeded comment from commit	2015-05-13 14:59:02 +01:00
oriahulrich@microvu.com	d3252f926e	Issue 1316: The traineddata file must be closed after it was opened	2015-05-13 14:53:37 +01:00
Jim O'Regan	b13691fda0	Merge conflict: going with Ray's version	2015-05-13 08:54:28 +01:00
Ray Smith	03f3c9dc88	Misc fixes missed from previous commits	2015-05-12 18:13:15 -07:00
Ray Smith	b2a3924585	Major updates to training system as a result of extensive testing on 100 languages - makefile.am	2015-05-12 18:08:39 -07:00
Ray Smith	6be25156f7	Major updates to training system as a result of extensive testing on 100 languages	2015-05-12 18:04:31 -07:00
Ray Smith	21805e63a4	Improved performance with PIC compilation option	2015-05-12 17:56:04 -07:00
Ray Smith	164897210a	Improved newlines and spaces in a box file so it works better with RTL languages.	2015-05-12 17:51:03 -07:00
Ray Smith	6b634170c1	Significant change to invisible font system to improve correctness and compatibility with external programs, particularly ghostscript. We will start mapping everything to a single glyph, rather than allowing characters to run off the end of the font. A more detailed design discussion is embedded into pdfrenderer.cpp comments. The font, source code that produces the font, and the design comments were contributed by Ken Sharp from Artifex Software.	2015-05-12 17:33:18 -07:00
Ray Smith	2924d3ae15	Changes missed from diacritic fix edit	2015-05-12 17:28:56 -07:00
Ray Smith	84920b92b3	Font and classifier output structure cleanup. Font recognition was poor because it forced a 1st and 2nd choice at the character level, whereas the total score for the correct font is often correct at the word level. The full set of fonts and scores is now propagated to the word recognizer, which can decide word-level fonts using the scores instead of simple votes. The change precipitated a cleanup of the classifier's output data structures, eliminating ScoredClass and INT_RESULT_STRUCT, moving a few extra elements into UnicharRating, and using UnicharRating wherever possible. This added the extra complexity of a 1 - rating conversion, because the internal classifier scores flip between "0 is good" and "0 is bad" before they are converted to rating and certainty.	2015-05-12 17:24:34 -07:00
Ray Smith	0e868ef377	Major change to improve layout analysis for heavily diacritic languages: Tha, Vie, Kan, Tel etc. There is a new overlap detector that detects when diacritics cause a big increase in textline overlap. In such cases, diacritics from overlap regions are kept separate from layout analysis completely, allowing textline formation to happen without them. The diacritics are then assigned to 0, 1 or 2 close words at the end of layout analysis, using and modifying an old noise detection data path. The stored diacritics are used or not during recognition according to the character classifier's liking for them.	2015-05-12 16:47:02 -07:00
Ray Smith	b6d0184806	Fixed problems with shifted baselines so recognition can recover from layout analysis errors.	2015-05-12 15:53:45 -07:00
Ray Smith	4a3caefd92	Add ability to build under android (without cube or scrollview).	2015-05-12 15:41:15 -07:00
Ray Smith	2eec979577	Makefile.am for fix to issue 1252	2015-05-12 15:25:00 -07:00
Ray Smith	53fc4456cc	Fixed issue 1252: Refactored LearnBlob and its call hierarchy to make it a member of Classify. Eliminated the flexfx scheme for calling global feature extractor functions through an array of function pointers. Deleted dead code I found as a by-product. This CL does not change BlobToTrainingSample or ExtractFeatures to be full members of Classify (the eventual goal) as that would make it even bigger, since there are a lot of callers to these functions. When ExtractFeatures and BlobToTrainingSample are members of Classify they will be able to access control parameters in Classify, which will greatly simplify developing variations to the feature extraction process.	2015-05-12 15:22:34 -07:00
Ray Smith	e735a9017b	Makefile.am change for Split/seam refactor	2015-05-12 15:05:56 -07:00
Ray Smith	25d0968d09	Major refactor to improve speed on difficult images, especially when running a heap checker. SEAM and SPLIT have been begging for a refactor for a LONG time. This change does most of the work of turning them into proper classes: Moved relevant code into SEAM/SPLIT/TBLOB/EDGEPT etc. from global helper functions. Made the splits full data members of SEAM in an array instead of 3 separate pointers. This greatly reduces the amount of new/delete happening in the chopper, which is the main goal. Deleted redundant files: olutil., makechop. Brought other code into SEAM in order to keep its data members private, with only priority having accessors.	2015-05-12 14:59:14 -07:00
Zdenko Podobný	d508751e58	Fixed issue 1317 - git revision info used as version info for autotools & DEBUG	2015-05-02 12:15:13 +02:00
Zdenko Podobný	d1c749f6ad	Fixed issue 1133 - part3 (Nick's replacement of InputBuffer-ReadLine with InputBuffer-Read)	2015-05-01 19:33:56 +02:00
Zdenko Podobný	5e754af9cb	Fixed issue 1133 - part2	2015-05-01 19:12:03 +02:00
Zdenko Podobný	53eab2ee92	fix issue 1354	2015-04-15 22:37:58 +02:00
Zdenko Podobný	370f1c65ad	fix issue 1436	2015-04-12 16:38:03 +02:00
Zdenko Podobný	4c7c960bfd	fix issue 1417	2015-02-07 22:22:20 +01:00
Zdenko Podobný	09b0c91fc9	fix Issue 1398	2015-02-06 23:44:58 +01:00
Zdenko Podobný	15d48361b4	fix VS2010 build;	2015-02-05 17:27:18 +01:00


1509 Commits