Commit Graph

124 Commits

Author SHA1 Message Date
James R. Barlow
bc95798e01 Implement a new orientation and script detection API for C and C++
See issue #424.

The existing C API for TessBaseAPIDetectOS requires a C caller to successfully allocate struct OSResults, which is actually a C++ class. Generally it won't be possible for a regular C compiler to do this properly.

It's also assumed that most API level users of Tesseract are only interested in Tesseract's best guess as to script and orientation, not the individual values for all possible scripts.

This introduces a new API with a better name that is more closely aligned with the output of 'tesseract -psm 0'.  Both tesseract -psm 0 and this API now share the same code in baseapi.cpp.
2016-12-07 13:21:05 -08:00
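
The problem described in the commit above (a C caller cannot correctly allocate a C++ class) is commonly avoided by keeping the class on the C++ side and returning only the best guess through plain out-parameters. The following is a minimal sketch of that pattern, not the code from this commit: `DetectionResults`, `DetectBestOrientationScript`, and the fixed values it returns are all hypothetical names and data chosen for illustration.

```cpp
#include <cassert>

// Hypothetical C++ results type with a constructor. A C compiler has no way
// to run this constructor, so a C caller cannot allocate it properly.
class DetectionResults {
 public:
  DetectionResults()
      : orientation_deg(0), orientation_conf(0.0f),
        script_name("Latin"), script_conf(0.0f) {}
  int orientation_deg;
  float orientation_conf;
  const char* script_name;  // points at static storage in this sketch
  float script_conf;
};

// Hypothetical C-callable wrapper. The C++ object lives entirely inside the
// function; the caller only sees built-in types. Returns 1 on success.
extern "C" int DetectBestOrientationScript(int* orient_deg,
                                           float* orient_conf,
                                           const char** script_name,
                                           float* script_conf) {
  DetectionResults results;  // constructed on the C++ side
  // A real implementation would run detection here. Fixed values are used
  // purely for illustration.
  results.orientation_deg = 90;
  results.orientation_conf = 0.5f;
  results.script_conf = 0.25f;
  assert(results.script_name != nullptr);

  // Each out-parameter is optional: callers pass a null pointer for values
  // they do not need.
  if (orient_deg) *orient_deg = results.orientation_deg;
  if (orient_conf) *orient_conf = results.orientation_conf;
  if (script_name) *script_name = results.script_name;
  if (script_conf) *script_conf = results.script_conf;
  return 1;
}
```

A C caller would declare four local variables, pass their addresses, and read the best guess back, without ever naming the C++ class.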
Stefan Weil
85e37798cb Simplify delete operations
It is not necessary to check for null pointers.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-24 17:59:13 +01:00
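
The rationale of the commit above is that C++ defines `delete` on a null pointer as a no-op, so a preceding null check adds nothing. A minimal sketch of that behavior follows; `Buffer`, `Release`, and `DemoRelease` are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical resource type that counts how many instances are alive.
struct Buffer {
  static int live;
  Buffer() { ++live; }
  ~Buffer() { --live; }
};
int Buffer::live = 0;

// Frees the object and resets the pointer. No `if (p != nullptr)` guard is
// needed because deleting a null pointer does nothing.
void Release(Buffer*& p) {
  delete p;
  p = nullptr;
}

// Exercises Release() on a valid pointer, on an already released pointer,
// and on a pointer that was never allocated. Returns the live-object count.
int DemoRelease() {
  Buffer* b = new Buffer;
  assert(Buffer::live == 1);
  Release(b);  // frees the object
  Release(b);  // b is null now; this call is harmless
  Buffer* never_allocated = nullptr;
  Release(never_allocated);
  return Buffer::live;
}
```

Resetting the pointer after `delete` is a separate choice made in this sketch so that a repeated call stays safe; the commit message itself only concerns the redundant check.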
Egor Pugin
644469595c Fix windows build. 2016-11-24 17:32:23 +03:00
Ray Smith
c1c1e426b3 Added new LSTM-based neural network line recognizer 2016-11-07 15:38:07 -08:00
Ray Smith
2c837dffc3 Result of clang tidy on recent merge 2016-11-07 10:46:33 -08:00
Stefan Weil
ea786e25a4 api/baseapi: Fix memory leaks at program termination
Calling TessBaseAPI::Clear() which calls TessBaseAPI::ClearResults()
which calls SavePixForCrash(0, NULL) is needed to release objects
allocated in global_crash_pixes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-10-25 19:11:10 +02:00
Zdenko Podobný
54fafc4e2e improve multipage tiff processing (jbreiden patch from 2016-03-29) 2016-10-06 11:13:42 +02:00
Stefan Weil
db2a8e9f85 api: Remove unused constant kBytesPerBlob
This fixes a compiler warning:

api/baseapi.cpp:1743:11: warning:
 unused variable 'kBytesPerBlob' [-Wunused-const-variable]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-09-06 21:49:26 +02:00
Stefan Weil
caffb3133b Remove unneeded 'struct' from TessBaseAPI::GetHOCRText (issue #414)
It conflicts with a previous 'class' declaration for ETEXT_DESC:

include/tesseract/baseapi.h:594:21:
 Struct 'ETEXT_DESC' was previously declared as a class

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-09-05 13:17:13 +02:00
Steffen Rehberg
c0fcce2f8f Fix text box width/height calculation (addition)
This occurrence should have been included in commit 29d971e
but was overlooked by error.
2016-06-27 21:58:29 +02:00
Steffen Rehberg
29d971eb0c Fix text box width/height calculation
In Tesseract's coordinate system, width is just right - left, cf. slide #2 of
github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/2ArchitectureAndDataStructures.pdf
2016-06-25 12:40:28 +02:00
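
The rule stated in the commit above, that width is just right - left, can be sketched as follows. The `Box` struct and helper names are hypothetical; computing height as bottom - top assumes image coordinates in which y grows downward, which is an assumption of this sketch and not something the commit message states.

```cpp
#include <cassert>

// Hypothetical bounding box. The right and bottom edges are treated as
// exclusive, so no "+ 1" adjustment is applied to either dimension.
struct Box {
  int left;
  int top;
  int right;
  int bottom;
};

// Width is simply right - left.
int Width(const Box& b) {
  assert(b.right >= b.left);
  return b.right - b.left;
}

// Height computed the same way, assuming y grows downward (bottom >= top).
int Height(const Box& b) {
  assert(b.bottom >= b.top);
  return b.bottom - b.top;
}
```

For example, a box spanning x = 10 to x = 110 has a width of 100 under this convention, where an inclusive-pixel convention would report 101.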
Philip Rinn
7461b61743 Fix ABI break introduced in 3.04.00, fixes #254 2016-03-08 11:35:24 +01:00
Zdenko Podobný
b2262750eb solve segfault for box.train; fixes #57 2016-03-04 23:04:55 +01:00
Tom Morris
6700edd8bc Cleanup TSV renderer
Remove all references to hocr, hocr.tsv, etc. Remove dead code for font
info, input filename, HTML escapes. Improved comments. Fixed
indentation.
2016-03-01 13:41:19 -05:00
Sundar M. Vaidya
858f4b75ce Avoids HTML escaping. 2016-03-01 12:30:39 -05:00
Sundar M. Vaidya
b1e4a82b0b Render output in TSV format. 2016-03-01 12:30:39 -05:00
Sundar M. Vaidya
d04e3259af Adds char* GetHOCRTSVText(int) as placeholder. Copy of char* GetHOCRText(int). 2016-03-01 12:13:42 -05:00
Tom Morris
6c44775d8a Emit fewer "lang" attributes
Add "lang" attribute to paragraph markup and only include
word lang attribute if it's different from the paragraph's value.
2016-02-17 10:23:41 -05:00
Tom Morris
ea401c9046 Only generate dir for HOCR when needed - fixes #208
Takes advantage of inheritance and dir="ltr" default to:
 - only generate paragraph dirs which are not ltr
 - only generate word dirs which don't match enclosing paragraph

Tested against LTR, RTL, and mixed direction files. Files for the
latter two cases are in a separate commit on the ltr-test-files branch.
2016-02-17 10:23:41 -05:00
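
The inheritance rule described in the commit above (paragraphs emit a direction only when it is not ltr, words only when they differ from the enclosing paragraph) can be sketched as below. The function names and the single-quoted attribute format are assumptions made for illustration, not code taken from the commit.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the dir attribute to emit for an element, or
// an empty string when the element would inherit the same value anyway.
std::string DirAttribute(const std::string& dir,
                         const std::string& inherited_dir) {
  assert(!dir.empty());
  if (dir == inherited_dir) return "";
  return " dir='" + dir + "'";
}

// Paragraphs inherit the document default, which is ltr.
std::string ParagraphDir(const std::string& para_dir) {
  return DirAttribute(para_dir, "ltr");
}

// Words inherit from their enclosing paragraph.
std::string WordDir(const std::string& word_dir,
                    const std::string& para_dir) {
  return DirAttribute(word_dir, para_dir);
}
```

With this scheme an all-LTR document carries no dir attributes at all, and an RTL paragraph marks only the words whose direction differs from its own.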
Tom Morris
809bbd9bfa Fix varsize array for Microsoft compiler 2016-02-17 10:20:18 -05:00
Tom Morris
431786276c INCOMPATIBLE fix to hOCR line height information - fixes #225.
This fixes the duplicate line IDs caused by inserting height information
into the middle of the ID and it moves the line height info into
the title attribute like everything else, rather than using non-standard
HTML attributes (which won't validate).

This change may break consumers of the HTML output, but 3.04 has only
been in the wild for 6 months and the current HTML is invalid, so I 
believe the benefit outweighs the cost for the fix.
2016-02-15 18:02:46 -05:00
zdenop
c53add706e Merge pull request #27 from tesseract-ocr/monitor
Monitor
2016-01-05 16:28:42 +01:00
Stefan Weil
3272b62201 Don't use NULL for integer arguments
This fixes compiler warnings:

api/baseapi.cpp:1422:49: warning:
 passing NULL to non-pointer argument 6 of
 'int MultiByteToWideChar(UINT, DWORD, LPCCH, int, LPWSTR, int)'
 [-Wconversion-null]
api/baseapi.cpp:1427:54:
 warning: passing NULL to non-pointer argument 6 of
 'int WideCharToMultiByte(UINT, DWORD, LPCWCH, int, LPSTR, int, LPCCH, LPBOOL)'
 [-Wconversion-null]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-05 06:38:01 +01:00
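
The warnings quoted in the commit above arise when NULL is passed where an integer is expected. A minimal sketch of the corrected call style follows; `CopyOrMeasure` is a hypothetical stand-in whose "size 0 means report the required length" behavior is modeled loosely on size-query APIs and is an assumption of this example.

```cpp
#include <cassert>

// Hypothetical conversion-style routine. When out_size is 0 it only reports
// how many chars (including the terminator) the output would need.
int CopyOrMeasure(const char* src, char* out, int out_size) {
  assert(src != nullptr);
  int needed = 0;
  while (src[needed] != '\0') ++needed;
  ++needed;  // room for the terminator
  if (out_size == 0) return needed;                   // query mode
  if (out == nullptr || out_size < needed) return 0;  // buffer too small
  for (int i = 0; i < needed; ++i) out[i] = src[i];
  return needed;
}

// Pointer argument: nullptr (or NULL). Integer argument: a literal 0.
// Writing NULL for the integer argument is what GCC flags with
// -Wconversion-null.
int MeasureHello() {
  return CopyOrMeasure("hello", nullptr, 0);
}
```

Both spellings often compile to the same value, but using 0 for integers and a null pointer constant for pointers keeps the intent of each argument clear and silences the warning.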
amitdo
6bbcb50dd9 Added osd renderer for psm 0.
Works for single page and multi-page.
2015-10-30 20:09:00 +02:00
Stefan Weil
11b2a4d9af api: Fix typos in comments (all found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-09-14 21:54:27 +02:00
Zdenko Podobný
67ede37b50 Fixes #74 NO_CUBE_BUILD with reverting to ANDROID_BUILD in baseapi 2015-08-09 18:09:30 +02:00
Zdenko Podobný
41478fd5a1 implement build without cube (-DNO_CUBE_BUILD) 2015-07-24 11:51:44 +02:00
artem
2b6801eddb Fix null pointer dereference when writing font name into HOCR. 2015-07-19 22:05:02 +02:00
Ray Smith
b1d99dfe23 Added a backup adaptive classifier to take over from primary when it fills on a large document 2015-06-12 11:10:53 -07:00
Zdenko Podobný
438edd6c7b added row attributes to hocr output 2015-05-17 22:13:59 +02:00
Zdenko Podobný
ed6ae9b974 Add monitor to GetHOCRText 2015-05-17 21:55:50 +02:00
Zdenko Podobný
59bcbc79b3 fix GIT_VER info in VS2010 2015-05-15 15:14:49 +02:00
Zdenko Podobný
035b324f0f reflect the latest commits in VS2010 build 2015-05-14 10:52:54 +02:00
Jim O'Regan
b13691fda0 Merge conflict: going with Ray's version 2015-05-13 08:54:28 +01:00
Ray Smith
4a3caefd92 Add ability to build under android (without cube or scrollview). 2015-05-12 15:41:15 -07:00
Ray Smith
53fc4456cc Fixed issue 1252: Refactored LearnBlob and its call hierarchy to make it a member of Classify.
Eliminated the flexfx scheme for calling global feature extractor functions
through an array of function pointers.
Deleted dead code I found as a by-product.
This CL does not change BlobToTrainingSample or ExtractFeatures to be full
members of Classify (the eventual goal) as that would make it even bigger,
since there are a lot of callers to these functions.
When ExtractFeatures and BlobToTrainingSample are members of Classify they
will be able to access control parameters in Classify, which will greatly
simplify developing variations to the feature extraction process.
2015-05-12 15:22:34 -07:00
Zdenko Podobný
d508751e58 Fixed issue 1317 - git revision info used as version info for autotools & DEBUG 2015-05-02 12:15:13 +02:00
Zdenko Podobný
09b0c91fc9 fix Issue 1398 2015-02-06 23:44:58 +01:00
Ray Smith
648e7ca311 Merge branch 'master' of https://code.google.com/p/tesseract-ocr
Usual git need to merge if local is out of date.
2014-09-17 18:10:17 -07:00
Ray Smith
0256529c1f Fixed issue 1243 2014-09-17 18:09:45 -07:00
Jim O'Regan
c0c719306a update docs for TessBaseAPI::SetProbabilityInContextFunc based on Ray's email today 2014-09-09 20:37:27 +01:00
Ray Smith
cd2653c167 Cleanup from previous changes 2014-08-12 16:12:46 -07:00
theraysmith@gmail.com
dbf6197471 Major refactor of control.cpp to enable line recognition
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1147 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-11 23:23:06 +00:00
zdenop
1156098567 Add font info to hocr output - fix issue 1219
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1132 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-03 16:22:12 +00:00
zdenop
95b7783a95 fix issue 1228: bilevel pdf output - horizontal/vertical lines removed
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1118 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-06-23 21:04:37 +00:00
zdenop
905e6162b9 put info about (API) version; fix typo
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1117 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-06-22 18:31:42 +00:00
zdenop
fad9de4e1b fix issue 1217: GetThresholdedImage accesses possibly NULL thresholder_
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1113 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-31 21:21:37 +00:00
zdenop
36f3f76d64 fix tiff issue on windows
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1111 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-31 07:27:54 +00:00
zdenop@gmail.com
84cdcb32cc fixed windows build
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1110 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-26 06:48:58 +00:00
zdenop
ffe52737d5 check if input file exists
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1108 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-25 19:58:00 +00:00