Ray Smith
d174c4fd33
Fixed occurrence of small rotated blocks in loosely spaced text part 2
2015-06-12 11:12:06 -07:00
Ray Smith
b1d99dfe23
Added a backup adaptive classifier to take over from primary when it fills on a large document
2015-06-12 11:10:53 -07:00
Ray Smith
78b5e1a77d
Fixed occurrence of small rotated blocks in loosely spaced text
2015-06-12 11:05:00 -07:00
Ray Smith
d74c625e52
Fixed blob division params to fix CJK training speed.
2015-06-12 10:59:26 -07:00
Ray Smith
4c7ab0caea
Fixed font lists, improved wordlist management
2015-06-12 10:56:40 -07:00
Ray Smith
ab0f4e2c38
Clang fixes to earlier changes and build compatibility with Google environment
2015-06-12 10:53:21 -07:00
zdenop
3ba1f83eb1
Merge pull request #36 from jan-ruzicka/patch-2
ChangeLog reformatting for consistent ordering
2015-06-11 09:50:38 +02:00
Jan Ruzicka
953c563efb
change order of entries V1.0 ... V2.04
This puts the revisions in newest-on-top order.
2015-06-11 01:34:45 -04:00
Jan Ruzicka
36740897e0
convert date formats
2015-06-11 01:27:11 -04:00
Jan Ruzicka
42481f2cf4
uniform bullet formatting
2015-06-10 22:52:37 -04:00
zdenop
10ea4f0636
Merge pull request #35 from jan-ruzicka/patch-1
more link updates
2015-06-02 21:29:47 +02:00
Jan Ruzicka
f89c7808cf
more link updates
modifying link to training from google code and adding link to documentation by Doxygen.
2015-06-02 14:12:42 -04:00
zdenop
8faea4bf06
Update README.md
fix links to wiki
2015-06-02 09:56:55 +02:00
Zdenko Podobný
fc793355a8
Move pdf documents to docs repository
2015-05-22 22:10:31 +02:00
Zdenko Podobný
b1b02572ab
Merge branch 'Issue1474'
* Issue1474:
Fix potential null pointer dereference in ccmain/paragraphs.cpp.
2015-05-22 21:19:14 +02:00
Zdenko Podobný
d8a55d739d
Fix potential null pointer dereference in ccmain/paragraphs.cpp.
2015-05-22 21:17:33 +02:00
zdenop
e4136f28a5
Merge pull request #33 from rmtheis/tweak-readme
Minor edits to Readme
2015-05-22 08:25:44 +02:00
Robert Theis
a36a5f96d0
Minor edits to Readme
2015-05-21 19:36:50 -07:00
zdenop
f8ebff262e
Merge pull request #32 from orbitcowboy/master
Fix potential null pointer dereference in ccmain/paragraphs.cpp.
2015-05-20 19:01:13 +02:00
orbitcowboy
9328f0e5d4
Fix potential null pointer dereference in ccmain/paragraphs.cpp.
2015-05-19 10:17:44 +02:00
Jim Regan
05acff6253
Merge pull request #23 from tesseract-ocr/training-sh
/usr/share/fonts is the wrong path on Mac
2015-05-18 14:05:44 +01:00
Jim O'Regan
16ac3b0a20
/usr/share/fonts is the wrong path on Mac
2015-05-18 09:53:14 +01:00
zdenop
e9f59351de
Merge pull request #19 from haf/feature/readme-improvement
[infra] updating readme
2015-05-18 08:46:46 +02:00
Henrik Feldt
a0ea634e15
[infra] README -> README.md, links
2015-05-16 19:19:54 +02:00
Henrik Feldt
03c29f96d8
[infra] updating readme
2015-05-16 19:10:10 +02:00
Zdenko Podobný
59bcbc79b3
fix GIT_VER info in VS2010
2015-05-15 15:14:49 +02:00
Zdenko Podobný
e98849b482
Print error message when pdf.ttf is not found.
2015-05-15 15:14:00 +02:00
Jim O'Regan
e7b087ffe6
update Doxyfile
2015-05-14 13:43:07 +01:00
Zdenko Podobný
aec22a47ec
fix autotools c++11 issue with disabled training
2015-05-14 14:25:49 +02:00
Zdenko Podobný
1d6de86150
fix VS2010 linking error
2015-05-14 14:24:55 +02:00
Zdenko Podobný
035b324f0f
reflect the latest commits in VS2010 build
2015-05-14 10:52:54 +02:00
Ray Smith
941d87057e
Fixed training build
2015-05-13 17:46:58 -07:00
Ray Smith
81b67f7ed9
Removed debug logging that doesn't belong
2015-05-13 17:12:23 -07:00
Ray Smith
d91df9856b
Fixed crash on debugging classifier with a shapetable present
2015-05-13 17:10:23 -07:00
Ray Smith
4598061324
Fixed infinite loop in training due to poor clipping of the table filler
2015-05-13 17:09:35 -07:00
Ray Smith
5bb0d89291
Improved debug of class pruner
2015-05-13 17:07:11 -07:00
Ray Smith
1e3b671298
Fixes to make yesterday's changes compile
2015-05-13 09:58:59 -07:00
Ray Smith
7bc6d3e059
Merge remote-tracking branch 'refs/remotes/origin/master'
Updating from master.
2015-05-13 09:06:44 -07:00
Ray Smith
c34dea6543
Missing from 25d0968
2015-05-13 09:05:08 -07:00
Jim O'Regan
b13691fda0
Merge conflict: going with Ray's version
2015-05-13 08:54:28 +01:00
Ray Smith
03f3c9dc88
Misc fixes missed from previous commits
2015-05-12 18:13:15 -07:00
Ray Smith
b2a3924585
Major updates to training system as a result of extensive testing on 100 languages - makefile.am
2015-05-12 18:08:39 -07:00
Ray Smith
6be25156f7
Major updates to training system as a result of extensive testing on 100 languages
2015-05-12 18:04:31 -07:00
Ray Smith
21805e63a4
Improved performance with PIC compilation option
2015-05-12 17:56:04 -07:00
Ray Smith
164897210a
Improved newlines and spaces in a box file so it works better with RTL languages.
2015-05-12 17:51:03 -07:00
Ray Smith
6b634170c1
Significant change to invisible font system
to improve correctness and compatibility with
external programs, particularly ghostscript.
We will start mapping everything to a single glyph,
rather than allowing characters to run off the end
of the font.
A more detailed design discussion is embedded into
pdfrenderer.cpp comments. The font, source code
that produces the font, and the design comments
were contributed by Ken Sharp from Artifex Software.
2015-05-12 17:33:18 -07:00
Ray Smith
2924d3ae15
Changes missed from diacritic fix edit
2015-05-12 17:28:56 -07:00
Ray Smith
84920b92b3
Font and classifier output structure cleanup.
Font recognition was poor because a 1st and 2nd choice were forced at the
character level, even though the total score for the correct font is often
right at the word level. A full set of fonts and scores is now propagated
to the word recognizer, which can decide word-level fonts using the scores
instead of simple votes.
This change precipitated a cleanup of output data structures for classifier
results, eliminating ScoredClass and INT_RESULT_STRUCT, with a few
extra elements going in UnicharRating, and using that wherever possible.
The cleanup added the extra complexity of 1-rating, due to a flip between
"0 is good" and "0 is bad" for the internal classifier scores before they are
converted to rating and certainty.
2015-05-12 17:24:34 -07:00
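The difference between character-level votes and word-level scores described in commit 84920b92b3 can be illustrated with a minimal sketch. This is not the project's code: the function names, the font names, and the per-character score dictionaries are all hypothetical, and the scores are assumed to follow a "higher is better" convention.

```python
from collections import Counter, defaultdict

def font_by_votes(char_font_scores):
    """Each character votes for its single best font; the majority wins."""
    votes = Counter(max(scores, key=scores.get) for scores in char_font_scores)
    return votes.most_common(1)[0][0]

def font_by_scores(char_font_scores):
    """Sum every font's score over the whole word; the highest total wins."""
    totals = defaultdict(float)
    for scores in char_font_scores:
        for font, score in scores.items():
            totals[font] += score
    return max(totals, key=totals.get)

# Hypothetical per-character font scores for a three-character word.
# Two characters barely prefer "Arial"; the third strongly prefers "Times".
word = [
    {"Arial": 0.51, "Times": 0.49},
    {"Arial": 0.52, "Times": 0.48},
    {"Arial": 0.10, "Times": 0.90},
]

print(font_by_votes(word))   # majority of per-character winners
print(font_by_scores(word))  # best summed score over the word
```

In this made-up word, voting picks the font that narrowly wins two characters, while summing the scores picks the font with the stronger evidence overall, which is the kind of case the commit message refers to.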
Ray Smith
0e868ef377
Major change to improve layout analysis for heavily diacritic languages:
Tha, Vie, Kan, Tel etc.
There is a new overlap detector that detects when diacritics
cause a big increase in textline overlap. In such cases, diacritics from
overlap regions are kept separate from layout analysis completely, allowing
textline formation to happen without them. The diacritics are then assigned
to 0, 1 or 2 close words at the end of layout analysis, using and modifying
an old noise detection data path.
The stored diacritics are used or not during recognition according to the
character classifier's liking for them.
2015-05-12 16:47:02 -07:00
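The step in commit 0e868ef377 where a stored diacritic is assigned to 0, 1 or 2 close words can be pictured roughly as below. This is only a sketch under stated assumptions: the commit message does not say how closeness is measured, so the horizontal-gap test, the `max_gap` threshold, and the `(left, right)` interval representation are all hypothetical.

```python
def horizontal_gap(a, b):
    """Distance between two (left, right) intervals; 0 when they overlap."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))

def assign_diacritic(diacritic, words, max_gap=5):
    """Return the indices of at most two words lying within max_gap
    of the diacritic, nearest first (hypothetical closeness rule)."""
    near = sorted(
        (horizontal_gap(diacritic, word), index)
        for index, word in enumerate(words)
        if horizontal_gap(diacritic, word) <= max_gap
    )
    return [index for _, index in near[:2]]

# Hypothetical word extents on one text line.
words = [(0, 40), (44, 90), (200, 260)]

straddling = assign_diacritic((38, 46), words)  # touches the first two words
isolated = assign_diacritic((120, 125), words)  # far from every word
inside = assign_diacritic((10, 15), words)      # sits over the first word only
```

A diacritic that matches no word stays unassigned, one that straddles a word gap is offered to both neighbours, and whether it is actually used is then left to the character classifier, as the commit message says.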
Ray Smith
b6d0184806
Fixed problems with shifted baselines so recognition can recover from layout analysis errors.
2015-05-12 15:53:45 -07:00