tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-15 09:34:30 +08:00

Author	SHA1	Message	Date
Jan Ruzicka	f89c7808cf	more link updates modifying link to training from google code and adding link to documentation by Doxygen.	2015-06-02 14:12:42 -04:00
zdenop	8faea4bf06	Update README.md fix links to wiki	2015-06-02 09:56:55 +02:00
Zdenko Podobný	fc793355a8	Move pdf documents to docs repository	2015-05-22 22:10:31 +02:00
Zdenko Podobný	b1b02572ab	Merge branch 'Issue1474' * Issue1474: Fix potential null pointer dereference in ccmain/paragraphs.cpp.	2015-05-22 21:19:14 +02:00
Zdenko Podobný	d8a55d739d	Fix potential null pointer dereference in ccmain/paragraphs.cpp.	2015-05-22 21:17:33 +02:00
zdenop	e4136f28a5	Merge pull request #33 from rmtheis/tweak-readme Minor edits to Readme	2015-05-22 08:25:44 +02:00
Robert Theis	a36a5f96d0	Minor edits to Readme	2015-05-21 19:36:50 -07:00
zdenop	f8ebff262e	Merge pull request #32 from orbitcowboy/master Fix potential null pointer dereference in ccmain/paragraphs.cpp.	2015-05-20 19:01:13 +02:00
orbitcowboy	9328f0e5d4	Fix potential null pointer dereference in ccmain/paragraphs.cpp.	2015-05-19 10:17:44 +02:00
Jim Regan	05acff6253	Merge pull request #23 from tesseract-ocr/training-sh /usr/share/fonts is the wrong path on Mac	2015-05-18 14:05:44 +01:00
Jim O'Regan	4a6195202c	fix typo	2015-05-18 12:32:36 +01:00
Jim O'Regan	99be295349	Merge branch 'monitor' of https://github.com/tesseract-ocr/tesseract into monitor	2015-05-18 12:29:11 +01:00
Renard Wellnitz	49a7ed13ea	fix to compile tesseract on mac with clang	2015-05-18 09:59:10 +01:00
Jim O'Regan	16ac3b0a20	/usr/share/fonts is the wrong path on Mac	2015-05-18 09:53:14 +01:00
zdenop	e9f59351de	Merge pull request #19 from haf/feature/readme-improvement [infra] updating readme	2015-05-18 08:46:46 +02:00
Zdenko Podobný	438edd6c7b	added row attributes to hocr output	2015-05-17 22:13:59 +02:00
Zdenko Podobný	917e994caa	extend ETEXT_DESC by progress_callback	2015-05-17 21:56:40 +02:00
Zdenko Podobný	ed6ae9b974	Add monitor to GetHOCRText	2015-05-17 21:55:50 +02:00
Henrik Feldt	a0ea634e15	[infra] README -> README.md, links	2015-05-16 19:19:54 +02:00
Henrik Feldt	03c29f96d8	[infra] updating readme	2015-05-16 19:10:10 +02:00
Zdenko Podobný	59bcbc79b3	fix GIT_VER info in VS2010	2015-05-15 15:14:49 +02:00
Zdenko Podobný	e98849b482	Print error message when pdf.ttf is not found.	2015-05-15 15:14:00 +02:00
Jim O'Regan	e7b087ffe6	update Doxyfile	2015-05-14 13:43:07 +01:00
Zdenko Podobný	aec22a47ec	fix autotools c++11 issue with disabled training	2015-05-14 14:25:49 +02:00
Zdenko Podobný	1d6de86150	fix VS2010 linking error	2015-05-14 14:24:55 +02:00
Zdenko Podobný	035b324f0f	reflect the latest commits in VS2010 build	2015-05-14 10:52:54 +02:00
Ray Smith	941d87057e	Fixed training build	2015-05-13 17:46:58 -07:00
Ray Smith	81b67f7ed9	Removed debug logging that doesn't belong	2015-05-13 17:12:23 -07:00
Ray Smith	d91df9856b	Fixed crash on debugging classifier with a shapetable present	2015-05-13 17:10:23 -07:00
Ray Smith	4598061324	Fixed infinite loop in training due to poor clipping of the table filler	2015-05-13 17:09:35 -07:00
Ray Smith	5bb0d89291	Improved debug of class pruner	2015-05-13 17:07:11 -07:00
zhivko.tabakov@gmail.com	07be522e43	Issue 1351: OpenCL build - kernel_ThresholdRectToPix() not accounting for padding bits in the output pix?! https://code.google.com/p/tesseract-ocr/issues/detail?id=1351 What steps will reproduce the problem? 1. Use a tesseract build with OpenCL. 2. Pass a full color image whose width is not a multiple of 32. 3. Recognition is far too slow and does not recognize anything. I read the article at http://www.sk-spell.sk.cx/tesseract-meets-the-opencl-first-test and decided to give OCL a try. The initial result was as described in point 3 above. After some debugging I figured out that the problem is that the OCL version of threshold rect generation does not account for padding bits in the output pix lines. To prove my discovery I made a quick fix in oclkernels.h, replacing the definition of kernel_ThresholdRectToPix. Just a reminder: it is necessary to force OCL kernel recompilation after changing this source (e.g. delete “kernel - <device>.bin” from the exec folder). The fix is working, but I am not sure about it since the original source apparently works for other people (as per the article). If I am right, the OS/GPU are irrelevant since the bug is algorithmic, but mine are Windows/AMD. A similar fix is also applicable to kernel_ThresholdRectToPix_OneChan(), but there the input array might have some padding bytes as well, so its indexing will need further adjustments. I can come up with a proof/fix for it as well; I have not played with it yet. Disclaimer: I have no prior experience with image processing and the tesseract source or with GPU computing and OpenCL (but please do explain if I am wrong).	2015-05-13 21:23:23 +01:00
Ray Smith	1e3b671298	Fixes to make yesterday's changes compile	2015-05-13 09:58:59 -07:00
Ray Smith	7bc6d3e059	Merge remote-tracking branch 'refs/remotes/origin/master' Updating from master.	2015-05-13 09:06:44 -07:00
Ray Smith	c34dea6543	Missing from `25d0968`	2015-05-13 09:05:08 -07:00
Jim O'Regan	a94943cc1f	remove unneeded comment from commit	2015-05-13 14:59:02 +01:00
oriahulrich@microvu.com	d3252f926e	Issue 1316: The traineddata file must be closed after it was opened	2015-05-13 14:53:37 +01:00
Jim O'Regan	b13691fda0	Merge conflict: going with Ray's version	2015-05-13 08:54:28 +01:00
Ray Smith	03f3c9dc88	Misc fixes missed from previous commits	2015-05-12 18:13:15 -07:00
Ray Smith	b2a3924585	Major updates to training system as a result of extensive testing on 100 languages - makefile.am	2015-05-12 18:08:39 -07:00
Ray Smith	6be25156f7	Major updates to training system as a result of extensive testing on 100 languages	2015-05-12 18:04:31 -07:00
Ray Smith	21805e63a4	Improved performance with PIC compilation option	2015-05-12 17:56:04 -07:00
Ray Smith	164897210a	Improved newlines and spaces in a box file so it works better with RTL languages.	2015-05-12 17:51:03 -07:00
Ray Smith	6b634170c1	Significant change to invisible font system to improve correctness and compatibility with external programs, particularly ghostscript. We will start mapping everything to a single glyph, rather than allowing characters to run off the end of the font. A more detailed design discussion is embedded into pdfrenderer.cpp comments. The font, source code that produces the font, and the design comments were contributed by Ken Sharp from Artifex Software.	2015-05-12 17:33:18 -07:00
Ray Smith	2924d3ae15	Changes missed from diacritic fix edit	2015-05-12 17:28:56 -07:00
Ray Smith	84920b92b3	Font and classifier output structure cleanup. Font recognition was poor, due to forcing a 1st and 2nd choice at a character level, when the total score for the correct font is often correct at the word level, so allowed the propagation of a full set of fonts and scores to the word recognizer, which can now decide word level fonts using the scores instead of simple votes. Change precipitated a cleanup of output data structures for classifier results, eliminating ScoredClass and INT_RESULT_STRUCT, with a few extra elements going in UnicharRating, and using that wherever possible. That added the extra complexity of 1-rating due to a flip between 0 is good and 0 is bad for the internal classifier scores before they are converted to rating and certainty.	2015-05-12 17:24:34 -07:00
Ray Smith	0e868ef377	Major change to improve layout analysis for heavily diacritic languages: Tha, Vie, Kan, Tel etc. There is a new overlap detector that detects when diacritics cause a big increase in textline overlap. In such cases, diacritics from overlap regions are kept separate from layout analysis completely, allowing textline formation to happen without them. The diacritics are then assigned to 0, 1 or 2 close words at the end of layout analysis, using and modifying an old noise detection data path. The stored diacritics are used or not during recognition according to the character classifier's liking for them.	2015-05-12 16:47:02 -07:00
Ray Smith	b6d0184806	Fixed problems with shifted baselines so recognition can recover from layout analysis errors.	2015-05-12 15:53:45 -07:00
Ray Smith	4a3caefd92	Add ability to build under android (without cube or scrollview).	2015-05-12 15:41:15 -07:00
Ray Smith	2eec979577	Makefile.am for fix to issue 1252	2015-05-12 15:25:00 -07:00
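
Note on the Issue 1351 entry above: the report says the threshold kernel ignored padding bits at the end of each output line when the image width is not a multiple of 32. The sketch below is not the code from oclkernels.h; it only illustrates the indexing difference, assuming a 1-bit-per-pixel output whose rows are padded to whole 32-bit words. The function names are hypothetical and chosen for this illustration.

```cpp
#include <cstdint>

// Words per row when each row of a 1-bpp image is padded to a 32-bit
// boundary (assumed layout; names here are illustrative only).
constexpr uint32_t WordsPerLine(uint32_t width) {
  return (width + 31) / 32;
}

// Word index of pixel (x, y) when row padding is respected.
constexpr uint32_t PaddedWordIndex(uint32_t x, uint32_t y, uint32_t width) {
  return y * WordsPerLine(width) + x / 32;
}

// Word index of pixel (x, y) if rows were (wrongly) assumed to be packed
// back to back with no padding bits.
constexpr uint32_t PackedWordIndex(uint32_t x, uint32_t y, uint32_t width) {
  return (y * width + x) / 32;
}

// Width 64 is a multiple of 32, so both schemes agree.
static_assert(PaddedWordIndex(0, 1, 64) == PackedWordIndex(0, 1, 64),
              "no padding needed when width is a multiple of 32");

// Width 40 needs 2 words per row; the first pixel of row 1 lives in
// word 2, while the packed computation points at word 1.
static_assert(PaddedWordIndex(0, 1, 40) == 2, "padded index");
static_assert(PackedWordIndex(0, 1, 40) == 1, "packed index");
```

With a width of 40, every row after the first is written to the wrong words under the packed assumption, which is consistent with the report that the bug is algorithmic and independent of OS or GPU.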


1620 Commits