Commit Graph

1556 Commits

Author SHA1 Message Date
zdenop
8c7d158977 Merge pull request #227 from tfmorris/hocr-both
hOCR improvements
2016-02-17 19:49:26 +01:00
Tom Morris
acdbcecd3c Add LTR & mixed direction test files 2016-02-17 11:40:31 -05:00
Tom Morris
6c44775d8a Emit fewer "lang" attributes
Add "lang" attribute to paragraph markup and only include
word lang attribute if it's different from the paragraph's value.
2016-02-17 10:23:41 -05:00
Tom Morris
ea401c9046 Only generate dir for HOCR when needed - fixes #208
Takes advantage of inheritance and dir="ltr" default to:
 - only generate paragraph dirs which are not ltr
 - only generate word dirs which don't match enclosing paragraph

Tested against LTR, RTL, and mixed direction files. Files for the
latter two cases are in a separate commit on the ltr-test-files branch.
2016-02-17 10:23:41 -05:00
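
The rule described in the commit above (skip `dir` wherever the inherited value already applies) can be illustrated with a minimal sketch. This is not the Tesseract implementation: the function names `dir_attr` and `render_paragraph` and the simplified markup are assumptions made for illustration, and text is inserted without HTML escaping.

```python
def dir_attr(direction: str, inherited: str = "ltr") -> str:
    """Return a dir attribute only when it differs from the inherited value.

    The default inherited value is "ltr", matching the HTML default.
    """
    return "" if direction == inherited else f' dir="{direction}"'


def render_paragraph(para_dir: str, words: list) -> str:
    """Render a paragraph from (text, direction) word pairs.

    Hypothetical, simplified markup: real hOCR output carries ids,
    title attributes, and line-level elements as well.
    """
    spans = "".join(
        # A word only needs dir when it differs from its paragraph.
        f"<span class='ocrx_word'{dir_attr(word_dir, para_dir)}>{text}</span>"
        for text, word_dir in words
    )
    # A paragraph only needs dir when it is not the "ltr" default.
    return f"<p class='ocr_par'{dir_attr(para_dir)}>{spans}</p>"
```

With this sketch, an all-LTR paragraph produces no `dir` attributes at all, while an RTL paragraph containing one LTR word produces one attribute on the paragraph and one on that word.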
Tom Morris
809bbd9bfa Fix varsize array for Microsoft compiler 2016-02-17 10:20:18 -05:00
zdenop
e028274ae6 Merge pull request #226 from tfmorris/issue225
INCOMPATIBLE fix to hOCR line height information - fixes #225.
2016-02-16 21:15:47 +01:00
Tom Morris
431786276c INCOMPATIBLE fix to hOCR line height information - fixes #225.
This fixes the duplicate line IDs caused by inserting height information
into the middle of the ID and it moves the line height info into
the title attribute like everything else, rather than using non-standard
HTML attributes (which won't validate).

This change may break consumers of the HTML output, but 3.04 has only
been in the wild for 6 months and the current HTML is invalid, so I 
believe the benefit outweighs the cost for the fix.
2016-02-15 18:02:46 -05:00
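
As a rough illustration of the approach in the commit above, the sketch below keeps the line ID untouched and packs all per-line measurements into the `title` attribute as semicolon-separated properties. The helper names and the property name `x_size` are assumptions for illustration and are not taken from the commit itself.

```python
def hocr_title(props: list) -> str:
    """Join (name, values) pairs into a 'name v1 v2; name v1' title string."""
    return "; ".join(
        f"{name} {' '.join(str(v) for v in values)}" for name, values in props
    )


def line_span_open(line_id: str, props: list) -> str:
    """Build the opening tag of a line element.

    The id stays a plain unique identifier; measurements such as the
    (hypothetically named) x_size go into title rather than into the id
    or into non-standard attributes.
    """
    return f"<span class='ocr_line' id='{line_id}' title=\"{hocr_title(props)}\">"
```

For example, `line_span_open("line_1_1", [("bbox", [36, 92, 580, 122]), ("x_size", [31.5])])` yields a tag whose `title` is `bbox 36 92 580 122; x_size 31.5`.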
Egor Pugin
3422059e09 Merge pull request #222 from tfmorris/hocr-config
Document hocr_font_info in config
2016-02-15 13:48:32 +03:00
Tom Morris
e3e1fe0e20 Document hocr_font_info in config 2016-02-14 16:49:00 -05:00
zdenop
640a98f24b Merge pull request #211 from amitdo/amitdo-readme-update1
Update README.md
2016-02-12 09:46:37 +01:00
zdenop
4393d040bd Merge pull request #220 from jbarlow83/master
Replace pdf.ttf with sharp2.ttf, keep name the same
2016-02-12 09:46:20 +01:00
James R. Barlow
b30930b95a Replace pdf.ttf with sharp2.ttf, keep name the same
As discussed at length in issue #182, the existing pdf.ttf causes difficulties
for certain PDF viewers, in part because the old file had zero advance width.

With testing, sharp2.ttf seems to be the best available compromise, although
it's not perfect and causes some visual difficulties in Evince.  It does
seem to fix Kindle and OS X Preview.
2016-02-11 15:44:11 -08:00
Amit Dovev
a67278f61a Update README.md 2016-02-09 23:30:46 +02:00
Amit Dovev
cc88f3509b Update README.md 2016-02-09 16:42:12 +02:00
Amit Dovev
7a90446a0b Update README.md 2016-02-06 15:04:27 +02:00
Egor Pugin
b68be44265 Merge branch 'master' of github.com:tesseract-ocr/tesseract 2016-02-04 13:44:22 +03:00
Egor Pugin
7b94871ba2 Add more include directories. 2016-02-04 13:44:07 +03:00
zdenop
ec44221e33 Merge pull request #205 from devurandom/fix/leptonica-1.73-compat
Compatibility with Leptonica 1.73
2016-01-31 21:42:05 +01:00
zdenop
cd3ea0760b Merge pull request #206 from amitdo/fix-box-training
Fix #64. Make box training work
2016-01-31 14:13:58 +01:00
Dennis Schridde
6072814fea Compatibility with Leptonica 1.73
http://www.leptonica.org/source/version-notes.html:
       Naming changes (to avoid collisions):
         #defines MALLOC --> LEPT_MALLOC, CALLOC --> LEPT_CALLOC, etc.
         ByteBuffer --> L_ByteBuffer

Introduction of the TESSERACT_LIBLEPT_PREREQ macro allows backward compatibility with Leptonica <1.73.
2016-01-31 12:21:20 +01:00
amitdo
6be9d7a5f8 Fix #64. Make box training work
This commit is better than 06fc0533c. Hopefully, this is the last fix for the box training issue.
2016-01-29 03:37:34 +02:00
zdenop
1826ac140b Merge pull request #198 from egorpugin/master
[ci] Switch to leptonica 1.73. Better leptonica search on *nix systems.
2016-01-26 13:57:41 +01:00
Egor Pugin
b48abd8e17 Improve leptonica search. 2016-01-26 14:52:18 +03:00
Egor Pugin
0970227ca7 Update CMakeLists.txt 2016-01-26 14:28:41 +03:00
Egor Pugin
354526e85e Merge branch 'master' of github.com-egorpugin:egorpugin/tesseract 2016-01-26 14:21:29 +03:00
Egor Pugin
ddcac38ffc Update appveyor.yml 2016-01-26 14:15:37 +03:00
Egor Pugin
ef32ec9c68 Update .travis.yml 2016-01-26 14:15:17 +03:00
Egor Pugin
94be926be0 Update leptonica version. 2016-01-26 14:14:48 +03:00
Egor Pugin
b9a6aa823b Update CMakeLists.txt 2016-01-26 14:04:29 +03:00
Egor Pugin
d855a9d611 Merge branch 'master' of github.com:tesseract-ocr/tesseract 2016-01-26 13:47:16 +03:00
Egor Pugin
9bfa7643b4 Update .travis.yml 2016-01-26 13:42:59 +03:00
Egor Pugin
2cf2cfcf99 Update CMakeLists.txt 2016-01-26 13:39:59 +03:00
Egor Pugin
74a72cd015 Update appveyor.yml 2016-01-26 12:44:52 +03:00
Egor Pugin
dac1bd4c9e Update .travis.yml 2016-01-26 12:44:36 +03:00
zdenop
516d58dc88 Merge pull request #189 from ryanfb/latin-language-specific
Use different font list and exposures for "lat" language training in language-specific.sh
2016-01-25 10:41:51 +01:00
zdenop
167565fdb3 Merge pull request #191 from amitdo/fix-184
Fix #184. Training should work now
2016-01-17 20:44:21 +01:00
amitdo
06fc0533c8 Fix #184. Training should work now 2016-01-17 14:27:35 +02:00
Egor Pugin
a3b175de7e Update appveyor.yml 2016-01-14 14:35:29 +03:00
Egor Pugin
c2e8dd0fc8 Update appveyor.yml 2016-01-14 14:28:46 +03:00
Egor Pugin
bbf25ee871 Update appveyor.yml 2016-01-14 14:23:12 +03:00
Egor Pugin
2da1fb1914 Test release build on windows. 2016-01-14 14:02:48 +03:00
Egor Pugin
fceb3abc1f Update ci scripts. 2016-01-14 14:01:55 +03:00
Ryan Baumann
bd5452d40c Add Junicode to neo-Latin fonts 2016-01-13 10:15:57 -05:00
zdenop
6f6953a972 Merge pull request #180 from stweil/master
Remove unneeded definition for NULL
2016-01-05 17:22:57 +01:00
Zdenko Podobný
1db94823a9 Add info for progress monitor, make it visible in doxygen doc; remove commented code 2016-01-05 17:21:53 +01:00
zdenop
c53add706e Merge pull request #27 from tesseract-ocr/monitor
Monitor
2016-01-05 16:28:42 +01:00
Ryan Baumann
5b40277d08 Use different font list and exposures for "lat" language training 2016-01-04 11:48:02 -05:00
zdenop
add1ed1067 Merge pull request #179 from hamidsafdari/master
correct minor syntax errors in language-specific.sh
2015-12-25 21:43:03 +01:00
Stefan Weil
7334572c4c Remove unneeded definition for NULL
NULL is already defined in stddef.h,
so a local definition is not needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-12-25 12:25:54 +01:00
Hamid Safdari
0cd6e17419 correct minor syntax errors in language-specific.sh 2015-12-25 09:50:15 +04:30