Commit Graph

1024 Commits

Author SHA1 Message Date
Zdenko Podobný
9b7f2527f1 fix links in doc; autotools requires README 2015-06-13 00:08:05 +02:00
Ray Smith
0ee178d79b Clang fixes to earlier changes and build compatibility with Google environment part 2 2015-06-12 11:17:47 -07:00
Ray Smith
d174c4fd33 Fixed occurrence of small rotated blocks in loosely spaced text part 2 2015-06-12 11:12:06 -07:00
Ray Smith
b1d99dfe23 Added a backup adaptive classifier to take over from primary when it fills on a large document 2015-06-12 11:10:53 -07:00
Ray Smith
78b5e1a77d Fixed occurrence of small rotated blocks in loosely spaced text 2015-06-12 11:05:00 -07:00
Ray Smith
d74c625e52 Fixed blob division params to fix CJK training speed. 2015-06-12 10:59:26 -07:00
Ray Smith
4c7ab0caea Fixed font lists, improved wordlist management 2015-06-12 10:56:40 -07:00
Ray Smith
ab0f4e2c38 Clang fixes to earlier changes and build compatibility with Google environment 2015-06-12 10:53:21 -07:00
zdenop
3ba1f83eb1 Merge pull request #36 from jan-ruzicka/patch-2
ChangeLog reformatting for consistent ordering
2015-06-11 09:50:38 +02:00
Jan Ruzicka
953c563efb change order of entries V1.0 ... V2.04
This is to have newest-on-top ordering of revisions.
2015-06-11 01:34:45 -04:00
Jan Ruzicka
36740897e0 convert date formats 2015-06-11 01:27:11 -04:00
Jan Ruzicka
42481f2cf4 uniform bullet formatting 2015-06-10 22:52:37 -04:00
zdenop
10ea4f0636 Merge pull request #35 from jan-ruzicka/patch-1
more link updates
2015-06-02 21:29:47 +02:00
Jan Ruzicka
f89c7808cf more link updates
modifying link to training from google code and adding link to documentation by Doxygen.
2015-06-02 14:12:42 -04:00
zdenop
8faea4bf06 Update README.md
fix links to wiki
2015-06-02 09:56:55 +02:00
Zdenko Podobný
fc793355a8 Move pdf documents to docs repository 2015-05-22 22:10:31 +02:00
Zdenko Podobný
b1b02572ab Merge branch 'Issue1474'
* Issue1474:
  Fix potential null pointer dereference in ccmain/paragraphs.cpp.
2015-05-22 21:19:14 +02:00
Zdenko Podobný
d8a55d739d Fix potential null pointer dereference in ccmain/paragraphs.cpp. 2015-05-22 21:17:33 +02:00
zdenop
e4136f28a5 Merge pull request #33 from rmtheis/tweak-readme
Minor edits to Readme
2015-05-22 08:25:44 +02:00
Robert Theis
a36a5f96d0 Minor edits to Readme 2015-05-21 19:36:50 -07:00
zdenop
f8ebff262e Merge pull request #32 from orbitcowboy/master
Fix potential null pointer dereference in ccmain/paragraphs.cpp.
2015-05-20 19:01:13 +02:00
orbitcowboy
9328f0e5d4 Fix potential null pointer dereference in ccmain/paragraphs.cpp. 2015-05-19 10:17:44 +02:00
Jim Regan
05acff6253 Merge pull request #23 from tesseract-ocr/training-sh
/usr/share/fonts is the wrong path on Mac
2015-05-18 14:05:44 +01:00
Jim O'Regan
16ac3b0a20 /usr/share/fonts is the wrong path on Mac 2015-05-18 09:53:14 +01:00
zdenop
e9f59351de Merge pull request #19 from haf/feature/readme-improvement
[infra] updating readme
2015-05-18 08:46:46 +02:00
Henrik Feldt
a0ea634e15 [infra] README -> README.md, links 2015-05-16 19:19:54 +02:00
Henrik Feldt
03c29f96d8 [infra] updating readme 2015-05-16 19:10:10 +02:00
Zdenko Podobný
59bcbc79b3 fix GIT_VER info in VS2010 2015-05-15 15:14:49 +02:00
Zdenko Podobný
e98849b482 Print error message when pdf.ttf is not found. 2015-05-15 15:14:00 +02:00
Jim O'Regan
e7b087ffe6 update Doxyfile 2015-05-14 13:43:07 +01:00
Zdenko Podobný
aec22a47ec fix autotools c++11 issue with disabled training 2015-05-14 14:25:49 +02:00
Zdenko Podobný
1d6de86150 fix VS2010 linking error 2015-05-14 14:24:55 +02:00
Zdenko Podobný
035b324f0f reflect the latest commits in VS2010 build 2015-05-14 10:52:54 +02:00
Ray Smith
941d87057e Fixed training build 2015-05-13 17:46:58 -07:00
Ray Smith
81b67f7ed9 Removed debug logging that doesn't belong 2015-05-13 17:12:23 -07:00
Ray Smith
d91df9856b Fixed crash on debugging classifier with a shapetable present 2015-05-13 17:10:23 -07:00
Ray Smith
4598061324 Fixed infinite loop in training due to poor clipping of the table filler 2015-05-13 17:09:35 -07:00
Ray Smith
5bb0d89291 Improved debug of class pruner 2015-05-13 17:07:11 -07:00
Ray Smith
1e3b671298 Fixes to make yesterday's changes compile 2015-05-13 09:58:59 -07:00
Ray Smith
7bc6d3e059 Merge remote-tracking branch 'refs/remotes/origin/master'
Updating from master.
2015-05-13 09:06:44 -07:00
Ray Smith
c34dea6543 Missing from 25d0968 2015-05-13 09:05:08 -07:00
Jim O'Regan
b13691fda0 Merge conflict: going with Ray's version 2015-05-13 08:54:28 +01:00
Ray Smith
03f3c9dc88 Misc fixes missed from previous commits 2015-05-12 18:13:15 -07:00
Ray Smith
b2a3924585 Major updates to training system as a result of extensive testing on 100 languages - makefile.am 2015-05-12 18:08:39 -07:00
Ray Smith
6be25156f7 Major updates to training system as a result of extensive testing on 100 languages 2015-05-12 18:04:31 -07:00
Ray Smith
21805e63a4 Improved performance with PIC compilation option 2015-05-12 17:56:04 -07:00
Ray Smith
164897210a Improved newlines and spaces in a box file so it works better with RTL languages. 2015-05-12 17:51:03 -07:00
Ray Smith
6b634170c1 Significant change to invisible font system
to improve correctness and compatibility with
external programs, particularly ghostscript.
We will start mapping everything to a single glyph,
rather than allowing characters to run off the end
of the font.

A more detailed design discussion is embedded into
pdfrenderer.cpp comments. The font, source code
that produces the font, and the design comments
were contributed by Ken Sharp from Artifex Software.
2015-05-12 17:33:18 -07:00
Ray Smith
2924d3ae15 Changes missed from diacritic fix edit 2015-05-12 17:28:56 -07:00
Ray Smith
84920b92b3 Font and classifier output structure cleanup.
Font recognition was poor because a 1st and 2nd choice were forced at
the character level, whereas the total score often identifies the
correct font at the word level. The full set of fonts and scores is now
propagated to the word recognizer, which can decide word-level fonts
using the scores instead of simple votes.

This change precipitated a cleanup of the output data structures for
classifier results, eliminating ScoredClass and INT_RESULT_STRUCT, with a
few extra elements going into UnicharRating, which is now used wherever
possible. That added the extra complexity of a 1-rating conversion,
because the internal classifier scores flip between "0 is good" and
"0 is bad" before they are converted to rating and certainty.
2015-05-12 17:24:34 -07:00