Nick White
d71133a769
Use ocrx_cinfo to hold character box and confidence information
...
With hocr_char_boxes enabled in hocr output, each grapheme now gets
its own span tag, which holds the character confidence and box
coordinates. Using x_bboxes at the ocrx_word level was
inappropriate, as it was impossible to find which grapheme was
represented by each bounding box.
2016-05-06 13:06:46 +01:00
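A minimal sketch of the per-grapheme markup described in the commit above. The `ocrx_cinfo` class and the `x_bboxes` property come from the commit messages; the `x_conf` property name, the exact attribute layout, and the helper `CharInfoSpan` are assumptions made for illustration and are not taken from the Tesseract source.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Tesseract): wraps one grapheme in its own
// span so that the box and confidence are tied to exactly that grapheme.
// The grapheme is assumed to be HTML-escaped already.
std::string CharInfoSpan(const std::string& grapheme, int left, int top,
                         int right, int bottom, int confidence) {
  // A box with negative width or height would be meaningless here.
  assert(left <= right && top <= bottom);
  std::ostringstream out;
  out << "<span class='ocrx_cinfo' title='x_bboxes " << left << ' ' << top
      << ' ' << right << ' ' << bottom << "; x_conf " << confidence << "'>"
      << grapheme << "</span>";
  return out.str();
}
```

Because each grapheme carries its own span, a consumer no longer has to guess which box in a word-level list belongs to which character.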
Nick White
06b7a7b188
Add option to include character bounding boxes in hocr output
...
Add the 'hocr_char_boxes' configuration option (off by default),
which enables printing the bounding boxes of each character in the
x_bboxes property of an ocrx_word element in hocr output.
2016-04-29 15:37:46 +01:00
Philip Rinn
7461b61743
Fix ABI break introduced in 3.04.00, fixes #254
2016-03-08 11:35:24 +01:00
Zdenko Podobný
b2262750eb
solve segfault for box.train; fixes #57
2016-03-04 23:04:55 +01:00
Tom Morris
6700edd8bc
Cleanup TSV renderer
...
Remove all references to hocr, hocr.tsv, etc. Remove dead code for font
info, input filename, HTML escapes. Improved comments. Fixed
indentation.
2016-03-01 13:41:19 -05:00
Sundar M. Vaidya
858f4b75ce
Avoids HTML escaping.
2016-03-01 12:30:39 -05:00
Sundar M. Vaidya
b1e4a82b0b
Render output in TSV format.
2016-03-01 12:30:39 -05:00
Sundar M. Vaidya
d04e3259af
Adds char* GetHOCRTSVText(int) as placeholder. Copy of char* GetHOCRText(int).
2016-03-01 12:13:42 -05:00
Tom Morris
6c44775d8a
Emit fewer "lang" attributes
...
Add "lang" attribute to paragraph markup and only include
word lang attribute if it's different from the paragraph's value.
2016-02-17 10:23:41 -05:00
Tom Morris
ea401c9046
Only generate dir for HOCR when needed - fixes #208
...
Takes advantage of inheritance and dir="ltr" default to:
- only generate paragraph dirs which are not ltr
- only generate word dirs which don't match enclosing paragraph
Tested against LTR, RTL, and mixed direction files. Files for the
latter two cases are in a separate commit on the ltr-test-files branch.
2016-02-17 10:23:41 -05:00
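The rule in the commit above can be sketched roughly as follows, assuming a two-valued direction and relying on the `dir="ltr"` default mentioned in the message. The `Direction` enum and the function names are hypothetical and do not come from the Tesseract source.

```cpp
// Hypothetical types and helpers, for illustration only.
enum class Direction { kLeftToRight, kRightToLeft };

// An element needs an explicit dir attribute only when its direction differs
// from the one it would otherwise inherit.
constexpr bool NeedsDirAttribute(Direction inherited, Direction own) {
  return own != inherited;
}

// Paragraphs inherit the document default, which is left-to-right.
constexpr bool ParagraphNeedsDir(Direction paragraph) {
  return NeedsDirAttribute(Direction::kLeftToRight, paragraph);
}

// Words inherit from their enclosing paragraph.
constexpr bool WordNeedsDir(Direction paragraph, Direction word) {
  return NeedsDirAttribute(paragraph, word);
}
```

Under this sketch a left-to-right word inside a right-to-left paragraph still gets an explicit attribute, since it no longer matches what it inherits.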
Tom Morris
809bbd9bfa
Fix varsize array for Microsoft compiler
2016-02-17 10:20:18 -05:00
Tom Morris
431786276c
INCOMPATIBLE fix to hOCR line height information - fixes #225.
...
This fixes the duplicate line IDs caused by inserting height information
into the middle of the ID and it moves the line height info into
the title attribute like everything else, rather than using non-standard
HTML attributes (which won't validate).
This change may break consumers of the HTML output, but 3.04 has only
been in the wild for 6 months and the current HTML is invalid, so I
believe the benefit outweighs the cost for the fix.
2016-02-15 18:02:46 -05:00
zdenop
c53add706e
Merge pull request #27 from tesseract-ocr/monitor
...
Monitor
2016-01-05 16:28:42 +01:00
Stefan Weil
3272b62201
Don't use NULL for integer arguments
...
This fixes compiler warnings:
api/baseapi.cpp:1422:49: warning: passing NULL to non-pointer argument 6 of
'int MultiByteToWideChar(UINT, DWORD, LPCCH, int, LPWSTR, int)'
[-Wconversion-null]
api/baseapi.cpp:1427:54: warning: passing NULL to non-pointer argument 6 of
'int WideCharToMultiByte(UINT, DWORD, LPCWCH, int, LPSTR, int, LPCCH, LPBOOL)'
[-Wconversion-null]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-05 06:38:01 +01:00
amitdo
6bbcb50dd9
Added osd renderer for psm 0.
...
Works for single page and multi-page.
2015-10-30 20:09:00 +02:00
Stefan Weil
11b2a4d9af
api: Fix typos in comments (all found by codespell)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-09-14 21:54:27 +02:00
Zdenko Podobný
67ede37b50
Fixes #74 NO_CUBE_BUILD with reverting to ANDROID_BUILD in baseapi
2015-08-09 18:09:30 +02:00
Zdenko Podobný
41478fd5a1
implement build without cube (-DNO_CUBE_BUILD)
2015-07-24 11:51:44 +02:00
artem
2b6801eddb
Fix null pointer dereference when writing font name into HOCR.
2015-07-19 22:05:02 +02:00
Ray Smith
b1d99dfe23
Added a backup adaptive classifier to take over from primary when it fills on a large document
2015-06-12 11:10:53 -07:00
Zdenko Podobný
438edd6c7b
added row attributes to hocr output
2015-05-17 22:13:59 +02:00
Zdenko Podobný
ed6ae9b974
Add monitor to GetHOCRText
2015-05-17 21:55:50 +02:00
Zdenko Podobný
59bcbc79b3
fix GIT_VER info in VS2010
2015-05-15 15:14:49 +02:00
Zdenko Podobný
035b324f0f
reflect the latest commits in VS2010 build
2015-05-14 10:52:54 +02:00
Jim O'Regan
b13691fda0
Merge conflict: going with Ray's version
2015-05-13 08:54:28 +01:00
Ray Smith
4a3caefd92
Add ability to build under android (without cube or scrollview).
2015-05-12 15:41:15 -07:00
Ray Smith
53fc4456cc
Fixed issue 1252: Refactored LearnBlob and its call hierarchy to make it a member of Classify.
...
Eliminated the flexfx scheme for calling global feature extractor functions
through an array of function pointers.
Deleted dead code I found as a by-product.
This CL does not change BlobToTrainingSample or ExtractFeatures to be full
members of Classify (the eventual goal) as that would make it even bigger,
since there are a lot of callers to these functions.
When ExtractFeatures and BlobToTrainingSample are members of Classify they
will be able to access control parameters in Classify, which will greatly
simplify developing variations to the feature extraction process.
2015-05-12 15:22:34 -07:00
Zdenko Podobný
d508751e58
Fixed issue 1317 - git revision info used as version info for autotools & DEBUG
2015-05-02 12:15:13 +02:00
Zdenko Podobný
09b0c91fc9
fix Issue 1398
2015-02-06 23:44:58 +01:00
Ray Smith
648e7ca311
Merge branch 'master' of https://code.google.com/p/tesseract-ocr
...
The usual git merge, needed because the local branch was out of date.
2014-09-17 18:10:17 -07:00
Ray Smith
0256529c1f
Fixed issue 1243
2014-09-17 18:09:45 -07:00
Jim O'Regan
c0c719306a
update docs for TessBaseAPI::SetProbabilityInContextFunc based on Ray's email today
2014-09-09 20:37:27 +01:00
Ray Smith
cd2653c167
Cleanup from previous changes
2014-08-12 16:12:46 -07:00
theraysmith@gmail.com
dbf6197471
Major refactor of control.cpp to enable line recognition
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1147 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-11 23:23:06 +00:00
zdenop
1156098567
Add font info to hocr output - fix issue 1219
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1132 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-03 16:22:12 +00:00
zdenop
95b7783a95
fix issue 1228: bilevel pdf output - horizontal/vertical lines removed
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1118 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-06-23 21:04:37 +00:00
zdenop
905e6162b9
put info about (API) version; fix typo
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1117 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-06-22 18:31:42 +00:00
zdenop
fad9de4e1b
fix issue 1217: GetThresholdedImage accesses possibly NULL thresholder_
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1113 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-31 21:21:37 +00:00
zdenop
36f3f76d64
fix tiff issue on windows
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1111 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-31 07:27:54 +00:00
zdenop@gmail.com
84cdcb32cc
fixed windows build
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1110 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-26 06:48:58 +00:00
zdenop
ffe52737d5
check if input file exists
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1108 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-25 19:58:00 +00:00
theraysmith@gmail.com
25a8c7b720
Enabled streaming input and output of multi-page documents
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1105 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-21 15:46:21 +00:00
zdenop
44b0d0e28e
addition to r1100
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1101 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-11 21:24:54 +00:00
zdenop
6051e40212
fix issue 1197
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1100 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-11 21:20:38 +00:00
zdenop
bdb912c186
escape input_file name in hOCR output - fix issue 1154
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1098 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-09 22:19:30 +00:00
theraysmith@gmail.com
45e106820f
Fixed issue 1116
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1074 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-04-24 00:50:27 +00:00
theraysmith@gmail.com
2fcea93846
Fixed issues 1081-1090
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1046 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-04 02:23:18 +00:00
theraysmith@gmail.com
d11dc049e3
Fixed a lot of compiler/clang warnings
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1015 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-25 02:28:51 +00:00
theraysmith@gmail.com
1a487252f4
Fixed slow-down that was caused by upping MAX_NUM_CLASSES
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1013 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-24 21:12:35 +00:00
zdenop@gmail.com
71ae509354
fix for mingw32/g++ 4.8.1
...
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@998 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-22 08:10:15 +00:00