Tom Morris
fc80ceafb9
Fix hocrtsv references in Makefile
2016-03-02 10:46:52 -05:00
Tom Morris
6700edd8bc
Cleanup TSV renderer
...
Remove all references to hocr, hocr.tsv, etc. Remove dead code for font
info, input filename, HTML escapes. Improved comments. Fixed
indentation.
2016-03-01 13:41:19 -05:00
Sundar M. Vaidya
858f4b75ce
Avoids HTML escaping.
2016-03-01 12:30:39 -05:00
Sundar M. Vaidya
b1e4a82b0b
Render output in TSV format.
2016-03-01 12:30:39 -05:00
Sundar M. Vaidya
738fe4f757
Adds BoolParam tessedit_create_hocrtsv in class Tesseract.
2016-03-01 12:30:39 -05:00
Sundar M. Vaidya
937ceb2d1b
Adds hocrtsv to tessdata/configs/Makefile.am
2016-03-01 12:25:15 -05:00
Sundar M. Vaidya
3163b38151
Adds hocrtsv file to configs folder.
2016-03-01 12:23:12 -05:00
Sundar M. Vaidya
59d593d796
Calls TessHOcrTsvRenderer if tessedit_create_hocrtsv is true.
2016-03-01 12:23:12 -05:00
Sundar M. Vaidya
4d13892f5b
Adds TessHOcrTsvRenderer class for rendering HOCR info in tsv format.
2016-03-01 12:13:42 -05:00
Sundar M. Vaidya
d04e3259af
Adds char* GetHOCRTSVText(int) as placeholder. Copy of char* GetHOCRText(int).
2016-03-01 12:13:42 -05:00
zdenop
2597296b69
Merge pull request #228 from tfmorris/ltr-test-files
...
Add LTR & mixed direction test files
2016-02-29 15:03:48 +01:00
zdenop
c35a36cb83
Merge pull request #229 from amitdo/amitdo-readme-new-release
...
Update README.md
2016-02-18 08:48:00 +01:00
zdenop
b0aaade8ae
Merge pull request #230 from stweil/master
...
Fix compiler warning (signed / unsigned mismatch)
2016-02-18 08:37:28 +01:00
Stefan Weil
8a04050df3
Fix compiler warning (signed / unsigned mismatch)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-02-18 07:38:37 +01:00
Amit Dovev
b26d328e33
Update README.md
2016-02-18 00:16:29 +02:00
Egor Pugin
f4366c1f5a
Merge pull request #89 from ceisserer/master
...
Initialize output parameters of word_char_quality() to zero before early exit
2016-02-17 22:26:36 +03:00
zdenop
8c7d158977
Merge pull request #227 from tfmorris/hocr-both
...
hOCR improvements
2016-02-17 19:49:26 +01:00
Tom Morris
acdbcecd3c
Add LTR & mixed direction test files
2016-02-17 11:40:31 -05:00
Tom Morris
6c44775d8a
Emit fewer "lang" attributes
...
Add "lang" attribute to paragraph markup and only include
word lang attribute if it's different from the paragraph's value.
2016-02-17 10:23:41 -05:00
Tom Morris
ea401c9046
Only generate dir for HOCR when needed - fixes #208
...
Takes advantage of inheritance and dir="ltr" default to:
- only generate paragraph dirs which are not ltr
- only generate word dirs which don't match enclosing paragraph
Tested against LTR, RTL, and mixed direction files. Files for the
latter two cases are in a separate commit on the ltr-test-files branch.
2016-02-17 10:23:41 -05:00
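Note on the commit above: the rule it describes (rely on the HTML default of dir="ltr" and on inheritance from the enclosing paragraph) can be illustrated with a minimal sketch. This is not the Tesseract implementation, which is C++. The function names `paragraph_dir_attr`, `word_dir_attr`, and `render_paragraph` are hypothetical, and the markup is reduced to the hOCR classes `ocr_par` and `ocrx_word` only.

```python
# Illustrative sketch only; names are hypothetical, not Tesseract's API.
DEFAULT_DIR = "ltr"  # HTML treats text as left-to-right unless told otherwise


def paragraph_dir_attr(par_dir):
    """Return a dir attribute for a paragraph, or '' if it is the ltr default."""
    return "" if par_dir == DEFAULT_DIR else f' dir="{par_dir}"'


def word_dir_attr(word_dir, par_dir):
    """Return a dir attribute for a word only if it differs from its paragraph."""
    return "" if word_dir == par_dir else f' dir="{word_dir}"'


def render_paragraph(par_dir, words):
    """Render a paragraph from (text, direction) pairs with minimal dir attributes."""
    spans = " ".join(
        f'<span class="ocrx_word"{word_dir_attr(word_dir, par_dir)}>{text}</span>'
        for text, word_dir in words
    )
    return f'<p class="ocr_par"{paragraph_dir_attr(par_dir)}>{spans}</p>'
```

With this rule a purely LTR paragraph carries no dir attributes at all, while an RTL paragraph carries one on the paragraph and one on each embedded LTR word.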
Tom Morris
809bbd9bfa
Fix varsize array for Microsoft compiler
2016-02-17 10:20:18 -05:00
zdenop
e028274ae6
Merge pull request #226 from tfmorris/issue225
...
INCOMPATIBLE fix to hOCR line height information - fixes #225.
2016-02-16 21:15:47 +01:00
Tom Morris
431786276c
INCOMPATIBLE fix to hOCR line height information - fixes #225.
...
This fixes the duplicate line IDs caused by inserting height information
into the middle of the ID and it moves the line height info into
the title attribute like everything else, rather than using non-standard
HTML attributes (which won't validate).
This change may break consumers of the HTML output, but 3.04 has only
been in the wild for 6 months and the current HTML is invalid, so I
believe the benefit outweighs the cost for the fix.
2016-02-15 18:02:46 -05:00
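Note on the commit above: the idea of keeping the line ID untouched and carrying the height information inside the title attribute can be sketched as follows. This is a hedged illustration, not the actual C++ change. The helper names `line_title` and `line_span` are hypothetical, the property names `x_size`, `x_descenders`, and `x_ascenders` are assumed to be the ones used for line height, and the numbers in the usage note are made up.

```python
# Illustrative sketch only; helper names and sample values are hypothetical.
def line_title(bbox, extras):
    """Build an hOCR-style title value: 'bbox x0 y0 x1 y1; key value; ...'."""
    parts = ["bbox " + " ".join(str(v) for v in bbox)]
    parts += [f"{key} {value:g}" for key, value in extras.items()]
    return "; ".join(parts)


def line_span(line_id, bbox, extras):
    """Open an ocr_line span whose id stays unique and free of extra data."""
    return f'<span class="ocr_line" id="{line_id}" title="{line_title(bbox, extras)}">'
```

For example, `line_span("line_1_1", (36, 92, 580, 122), {"x_size": 30.5})` keeps `line_1_1` as the ID and places the height value after the bounding box in the title, so no non-standard HTML attributes are needed.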
Egor Pugin
3422059e09
Merge pull request #222 from tfmorris/hocr-config
...
Document hocr_font_info in config
2016-02-15 13:48:32 +03:00
Tom Morris
e3e1fe0e20
Document hocr_font_info in config
2016-02-14 16:49:00 -05:00
zdenop
640a98f24b
Merge pull request #211 from amitdo/amitdo-readme-update1
...
Update README.md
2016-02-12 09:46:37 +01:00
zdenop
4393d040bd
Merge pull request #220 from jbarlow83/master
...
Replace pdf.ttf with sharp2.ttf, keep name the same
2016-02-12 09:46:20 +01:00
James R. Barlow
b30930b95a
Replace pdf.ttf with sharp2.ttf, keep name the same
...
As discussed at length in issue #182, the existing pdf.ttf causes difficulties
for certain PDF viewers, in part because the old file had zero advance width.
With testing, sharp2.ttf seems to be the best available compromise, although
it's not perfect and causes some visual difficulties in Evince. It does
seem to fix Kindle and OS X Preview.
2016-02-11 15:44:11 -08:00
Amit Dovev
a67278f61a
Update README.md
2016-02-09 23:30:46 +02:00
Amit Dovev
cc88f3509b
Update README.md
2016-02-09 16:42:12 +02:00
Amit Dovev
7a90446a0b
Update README.md
2016-02-06 15:04:27 +02:00
Egor Pugin
b68be44265
Merge branch 'master' of github.com:tesseract-ocr/tesseract
2016-02-04 13:44:22 +03:00
Egor Pugin
7b94871ba2
Add more include directories.
2016-02-04 13:44:07 +03:00
zdenop
ec44221e33
Merge pull request #205 from devurandom/fix/leptonica-1.73-compat
...
Compatibility with Leptonica 1.73
2016-01-31 21:42:05 +01:00
zdenop
cd3ea0760b
Merge pull request #206 from amitdo/fix-box-training
...
Fix #64. Make box training work
2016-01-31 14:13:58 +01:00
Dennis Schridde
6072814fea
Compatibility with Leptonica 1.73
...
http://www.leptonica.org/source/version-notes.html :
Naming changes (to avoid collisions):
#defines MALLOC --> LEPT_MALLOC, CALLOC --> LEPT_CALLOC, etc.
ByteBuffer --> L_ByteBuffer
Introduction of the TESSERACT_LIBLEPT_PREREQ macro allows backward compatibility with Leptonica <1.73.
2016-01-31 12:21:20 +01:00
amitdo
6be9d7a5f8
Fix #64. Make box training work
...
This commit is better than 06fc0533c. Hopefully, this is the last fix to the box training issue.
2016-01-29 03:37:34 +02:00
zdenop
1826ac140b
Merge pull request #198 from egorpugin/master
...
[ci] Switch to leptonica 1.73. Better leptonica search on *nix systems.
2016-01-26 13:57:41 +01:00
Egor Pugin
b48abd8e17
Improve leptonica search.
2016-01-26 14:52:18 +03:00
Egor Pugin
0970227ca7
Update CMakeLists.txt
2016-01-26 14:28:41 +03:00
Egor Pugin
354526e85e
Merge branch 'master' of github.com-egorpugin:egorpugin/tesseract
2016-01-26 14:21:29 +03:00
Egor Pugin
ddcac38ffc
Update appveyor.yml
2016-01-26 14:15:37 +03:00
Egor Pugin
ef32ec9c68
Update .travis.yml
2016-01-26 14:15:17 +03:00
Egor Pugin
94be926be0
Update leptonica version.
2016-01-26 14:14:48 +03:00
Egor Pugin
b9a6aa823b
Update CMakeLists.txt
2016-01-26 14:04:29 +03:00
Egor Pugin
d855a9d611
Merge branch 'master' of github.com:tesseract-ocr/tesseract
2016-01-26 13:47:16 +03:00
Egor Pugin
9bfa7643b4
Update .travis.yml
2016-01-26 13:42:59 +03:00
Egor Pugin
2cf2cfcf99
Update CMakeLists.txt
2016-01-26 13:39:59 +03:00
Egor Pugin
74a72cd015
Update appveyor.yml
2016-01-26 12:44:52 +03:00
Egor Pugin
dac1bd4c9e
Update .travis.yml
2016-01-26 12:44:36 +03:00