Commit Graph

1522 Commits

Author SHA1 Message Date
Tom Morris
fc80ceafb9 Fix hocrtsv references in Makefile 2016-03-02 10:46:52 -05:00
Tom Morris
6700edd8bc Cleanup TSV renderer
Remove all references to hocr, hocr.tsv, etc. Remove dead code for font
info, input filename, HTML escapes. Improved comments. Fixed
indentation.
2016-03-01 13:41:19 -05:00
Sundar M. Vaidya
858f4b75ce Avoids HTML escaping. 2016-03-01 12:30:39 -05:00
Sundar M. Vaidya
b1e4a82b0b Render output in TSV format. 2016-03-01 12:30:39 -05:00
Sundar M. Vaidya
738fe4f757 Adds BoolParam tessedit_create_hocrtsv in class Tesseract. 2016-03-01 12:30:39 -05:00
Sundar M. Vaidya
937ceb2d1b Adds hocrtsv to tessdata/configs/Makefile.am 2016-03-01 12:25:15 -05:00
Sundar M. Vaidya
3163b38151 Adds hocrtsv file to configs folder. 2016-03-01 12:23:12 -05:00
Sundar M. Vaidya
59d593d796 Calls TessHOcrTsvRenderer if tessedit_create_hocrtsv is true. 2016-03-01 12:23:12 -05:00
Sundar M. Vaidya
4d13892f5b Adds TessHOcrTsvRenderer class for rendering HOCR info in tsv format. 2016-03-01 12:13:42 -05:00
Sundar M. Vaidya
d04e3259af Adds char* GetHOCRTSVText(int) as placeholder. Copy of char* GetHOCRText(int). 2016-03-01 12:13:42 -05:00
zdenop
2597296b69 Merge pull request #228 from tfmorris/ltr-test-files
Add LTR & mixed direction test files
2016-02-29 15:03:48 +01:00
zdenop
c35a36cb83 Merge pull request #229 from amitdo/amitdo-readme-new-release
Update README.md
2016-02-18 08:48:00 +01:00
zdenop
b0aaade8ae Merge pull request #230 from stweil/master
Fix compiler warning (signed / unsigned mismatch)
2016-02-18 08:37:28 +01:00
Stefan Weil
8a04050df3 Fix compiler warning (signed / unsigned mismatch)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-18 07:38:37 +01:00
Amit Dovev
b26d328e33 Update README.md 2016-02-18 00:16:29 +02:00
Egor Pugin
f4366c1f5a Merge pull request #89 from ceisserer/master
Initialize output parameters of word_char_quality() to zero before early exit
2016-02-17 22:26:36 +03:00
zdenop
8c7d158977 Merge pull request #227 from tfmorris/hocr-both
hOCR improvements
2016-02-17 19:49:26 +01:00
Tom Morris
acdbcecd3c Add LTR & mixed direction test files 2016-02-17 11:40:31 -05:00
Tom Morris
6c44775d8a Emit fewer "lang" attributes
Add "lang" attribute to paragraph markup and only include
word lang attribute if it's different from the paragraph's value.
2016-02-17 10:23:41 -05:00
Tom Morris
ea401c9046 Only generate dir for HOCR when needed - fixes #208
Takes advantage of inheritance and dir="ltr" default to:
 - only generate paragraph dirs which are not ltr
 - only generate word dirs which don't match enclosing paragraph

Tested against LTR, RTL, and mixed direction files. Files for the
latter two cases are in a separate commit on the ltr-test-files branch.
2016-02-17 10:23:41 -05:00
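The rule described in commit ea401c9046 above (emit `dir` only where it differs from what the element would inherit) can be sketched roughly as follows. This is an illustrative sketch, not the renderer's actual code: `AttrIfDifferent`, `ParagraphDirAttr`, and `WordDirAttr` are hypothetical helper names. The same comparison would apply to the `lang` attribute from commit 6c44775d8a.

```cpp
#include <string>

// Hypothetical helper (not part of Tesseract's API): returns the attribute
// text to emit, or an empty string when the value matches what the element
// already inherits from its parent.
std::string AttrIfDifferent(const std::string& name,
                            const std::string& value,
                            const std::string& inherited) {
  if (value == inherited) return "";
  return " " + name + "=\"" + value + "\"";
}

// Paragraphs inherit the HTML default direction, "ltr".
std::string ParagraphDirAttr(const std::string& para_dir) {
  return AttrIfDifferent("dir", para_dir, "ltr");
}

// Words inherit the direction of the enclosing paragraph.
std::string WordDirAttr(const std::string& word_dir,
                        const std::string& para_dir) {
  return AttrIfDifferent("dir", word_dir, para_dir);
}
```

Under this sketch, an LTR word inside an RTL paragraph still gets an explicit `dir="ltr"`, because the comparison is against the paragraph rather than the document default.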
Tom Morris
809bbd9bfa Fix varsize array for Microsoft compiler 2016-02-17 10:20:18 -05:00
zdenop
e028274ae6 Merge pull request #226 from tfmorris/issue225
INCOMPATIBLE fix to hOCR line height information - fixes #225.
2016-02-16 21:15:47 +01:00
Tom Morris
431786276c INCOMPATIBLE fix to hOCR line height information - fixes #225.
This fixes the duplicate line IDs caused by inserting height information
into the middle of the ID and it moves the line height info into
the title attribute like everything else, rather than using non-standard
HTML attributes (which won't validate).

This change may break consumers of the HTML output, but 3.04 has only
been in the wild for 6 months and the current HTML is invalid, so I
believe the benefit outweighs the cost for the fix.
2016-02-15 18:02:46 -05:00
Egor Pugin
3422059e09 Merge pull request #222 from tfmorris/hocr-config
Document hocr_font_info in config
2016-02-15 13:48:32 +03:00
Tom Morris
e3e1fe0e20 Document hocr_font_info in config 2016-02-14 16:49:00 -05:00
zdenop
640a98f24b Merge pull request #211 from amitdo/amitdo-readme-update1
Update README.md
2016-02-12 09:46:37 +01:00
zdenop
4393d040bd Merge pull request #220 from jbarlow83/master
Replace pdf.ttf with sharp2.ttf, keep name the same
2016-02-12 09:46:20 +01:00
James R. Barlow
b30930b95a Replace pdf.ttf with sharp2.ttf, keep name the same
As discussed at length in issue #182, the existing pdf.ttf causes difficulties
for certain PDF viewers, in part because the old file had zero advance width.

With testing, sharp2.ttf seems to be the best available compromise, although
it's not perfect and causes some visual difficulties in Evince. It does
seem to fix Kindle and OS X Preview.
2016-02-11 15:44:11 -08:00
Amit Dovev
a67278f61a Update README.md 2016-02-09 23:30:46 +02:00
Amit Dovev
cc88f3509b Update README.md 2016-02-09 16:42:12 +02:00
Amit Dovev
7a90446a0b Update README.md 2016-02-06 15:04:27 +02:00
Egor Pugin
b68be44265 Merge branch 'master' of github.com:tesseract-ocr/tesseract 2016-02-04 13:44:22 +03:00
Egor Pugin
7b94871ba2 Add more include directories. 2016-02-04 13:44:07 +03:00
zdenop
ec44221e33 Merge pull request #205 from devurandom/fix/leptonica-1.73-compat
Compatibility with Leptonica 1.73
2016-01-31 21:42:05 +01:00
zdenop
cd3ea0760b Merge pull request #206 from amitdo/fix-box-training
Fix #64. Make box training work
2016-01-31 14:13:58 +01:00
Dennis Schridde
6072814fea Compatibility with Leptonica 1.73
http://www.leptonica.org/source/version-notes.html:
       Naming changes (to avoid collisions):
         #defines MALLOC --> LEPT_MALLOC, CALLOC --> LEPT_CALLOC, etc.
         ByteBuffer --> L_ByteBuffer

Introduction of the TESSERACT_LIBLEPT_PREREQ macro allows backward compatibility with Leptonica <1.73.
2016-01-31 12:21:20 +01:00
amitdo
6be9d7a5f8 Fix #64. Make box training work
This commit is better than 06fc0533c. Hopefully, this is the last fix to box training issue.
2016-01-29 03:37:34 +02:00
zdenop
1826ac140b Merge pull request #198 from egorpugin/master
[ci] Switch to leptonica 1.73. Better leptonica search on *nix systems.
2016-01-26 13:57:41 +01:00
Egor Pugin
b48abd8e17 Improve leptonica search. 2016-01-26 14:52:18 +03:00
Egor Pugin
0970227ca7 Update CMakeLists.txt 2016-01-26 14:28:41 +03:00
Egor Pugin
354526e85e Merge branch 'master' of github.com-egorpugin:egorpugin/tesseract 2016-01-26 14:21:29 +03:00
Egor Pugin
ddcac38ffc Update appveyor.yml 2016-01-26 14:15:37 +03:00
Egor Pugin
ef32ec9c68 Update .travis.yml 2016-01-26 14:15:17 +03:00
Egor Pugin
94be926be0 Update leptonica version. 2016-01-26 14:14:48 +03:00
Egor Pugin
b9a6aa823b Update CMakeLists.txt 2016-01-26 14:04:29 +03:00
Egor Pugin
d855a9d611 Merge branch 'master' of github.com:tesseract-ocr/tesseract 2016-01-26 13:47:16 +03:00
Egor Pugin
9bfa7643b4 Update .travis.yml 2016-01-26 13:42:59 +03:00
Egor Pugin
2cf2cfcf99 Update CMakeLists.txt 2016-01-26 13:39:59 +03:00
Egor Pugin
74a72cd015 Update appveyor.yml 2016-01-26 12:44:52 +03:00
Egor Pugin
dac1bd4c9e Update .travis.yml 2016-01-26 12:44:36 +03:00