Commit Graph

298 Commits

Author SHA1 Message Date
Stefan Weil
8af3629e9f openmp: Fix OpenMP support
* Add OPENMP_CXXFLAGS for ccmain.
* Replace OPENMP_CFLAGS by OPENMP_CXXFLAGS.
* Always use _OPENMP for conditional compilation.
* Remove OPENMP as there is already _OPENMP.
* Include omp.h conditionally.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-11 22:43:37 +01:00
James R. Barlow
66c03c9166 Revise after code review 2016-12-11 22:26:05 +01:00
James R. Barlow
bdb690ba06 Implement a new orientation and script detection API for C and C++
See issue #424.

The existing C API for TessBaseAPIDetectOS requires a C caller to successfully allocate struct OSResults which is actually a C++ class.  Generally it won't
be possible for a regular C compiler to do this properly.

It's also assumed that most API level users of Tesseract are only interested in Tesseract's best guess as to script and orientation, not the individual values for all possible scripts.

This introduces a new API with a better name that is more closely aligned with the output of 'tesseract -psm 0'.  Both tesseract -psm 0 and this API now share the same code in baseapi.cpp.
2016-12-11 22:25:59 +01:00
Zdenko Podobn??
59ba80bb3a More clang-tidy from previous commits
# Conflicts:
#	opencl/opencl_device_selection.h
#	opencl/openclwrapper.cpp
2016-12-08 15:50:22 +01:00
Jeff Breidenbach
d969ed1352 Produce warning for invalid resolution. Fix #453 2016-12-07 22:03:28 +01:00
Stefan Weil
f29abea160 tesseract: Disable Leptonica messages
Disable debugging and informational messages from Leptonica
for release builds.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-07 21:55:23 +01:00
Stefan Weil
4535d24d13 Remove extra semicolons after member function definitions
clang++ report:
api/baseapi.h:852:4: warning:
 extra ';' after member function definition [-Wextra-semi]
[...]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-07 17:18:51 +01:00
Stefan Weil
6933b0618c Change tesseract parameter -psm to --psm
For compatibility reasons the old variant is still supported.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-07 17:12:45 +01:00
Stefan Weil
9984077798 Change tesseract parameter -oem to --oem
It was introduced recently in commit f24ef67d, so there is no need
to support the old variant for compatibility reasons.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-07 17:12:37 +01:00
Stefan Weil
4789ca2ab8 Simplify new operations
It is not necessary to check for null pointers after new.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-07 17:10:10 +01:00
Stefan Weil
743eb8104a Simplify delete operations
It is not necessary to check for null pointers.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-07 15:59:11 +01:00
Zdenko Podobný
fc3d07b44f backport from 4.00: api changes 2016-12-02 21:01:17 +01:00
Zdenko Podobný
775a108dc7 backport from 4.00: enable selection of OCR engine mode from command line 2016-12-02 19:50:54 +01:00
Zdenko Podobn??
ed0c60bc65 backport from 4.00: use ".empty()" instead of ".size() > 0" 2016-11-30 11:33:35 +01:00
Zdenko Podobný
c8e2be63d0 backport from 4.00: fix pdfrenderer 2016-11-29 11:21:21 +01:00
Zdenko Podobný
d01dd0bdd4 backport from 4.00: show PSM 11-13 2016-11-29 11:18:52 +01:00
Zdenko Podobný
7169545a86 fix code style 2016-11-29 11:16:10 +01:00
Zdenko Podobný
90651e111f backport style changes from 4.00 for better identification of fixes and new code 2016-11-25 15:14:46 +01:00
Stefan Weil
ea786e25a4 api/baseapi: Fix memory leaks at program termination
Calling TessBaseAPI::Clear() which calls TessBaseAPI::ClearResults()
which calls SavePixForCrash(0, NULL) is needed to release objects
allocated in global_crash_pixes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-10-25 19:11:10 +02:00
Stefan Weil
f1d3a3b7c3 api/tesseractmain: Fix memory leak caused by exit()
When exit() is called from ParseArgs(), no destructors are executed
for the auto variables vars_vec and vars_values.

Making both variables static fixes the memory leaks, because now the
destructors are always executed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-10-24 20:20:24 +02:00
Zdenko Podobný
54fafc4e2e improve multipage tiff processing (jbreiden patch from 2016-03-29) 2016-10-06 11:13:42 +02:00
Stefan Weil
db2a8e9f85 api: Remove unused constant kBytesPerBlob
This fixes a compiler warning:

api/baseapi.cpp:1743:11: warning:
 unused variable 'kBytesPerBlob' [-Wunused-const-variable]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-09-06 21:49:26 +02:00
Stefan Weil
caffb3133b Remove unneeded 'struct' from TessBaseAPI::GetHOCRText (issue #414)
It conflicts with a previous 'class' declaration for ETEXT_DESC:

include/tesseract/baseapi.h:594:21:
 Struct 'ETEXT_DESC' was previously declared as a class

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-09-05 13:17:13 +02:00
Zdenko Podobný
5610738be9 fix #369 - pdf output with transparent background image 2016-08-05 22:37:58 +02:00
Stefan Weil
75fdc086ec win32: Check whether tiffio.h is available
The previous commit added a dependency on tiffio.h, so enable the new
code only if that file is available.

The code which conditionally defines HAVE_TIFFIO_H was already there
although that macro was unused up to now.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-07-17 12:07:58 +02:00
Stefan Weil
896e80d9a7 win32: Show TIFF warnings on console
Showing them in a window (default) is not acceptable for a console
application like Tesseract which must be able to work in batch mode.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-07-16 10:45:18 +02:00
zdenop
647b88daf0 Merge pull request #359 from StefRe/tsv-fix
Fix TSV bounding box width/hight calculation (addition to #358)
2016-06-27 22:19:22 +02:00
Steffen Rehberg
c0fcce2f8f Fix text box width/hight calculation (addition)
This occurrence was should have been included in commit 29d971e
but was overlooked by error.
2016-06-27 21:58:29 +02:00
zdenop
828f8528a8 Merge pull request #358 from StefRe/tsv-fix
Fix TSV bounding box width/hight calculation
2016-06-27 09:09:12 +02:00
Steffen Rehberg
29d971eb0c Fix text box width/hight calculation
In Tesseract's coordinate system, width is just right - left, cf. slide #2 of
github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/2ArchitectureAndDataStructures.pdf
2016-06-25 12:40:28 +02:00
Marco Atzeri
b1c921b59e Fix Cygwin compatibility 2016-06-17 15:52:01 +03:00
Stefan Weil
e59be55bcc Print list of languages to stdout instead to stderr
It is common practice for command line programs to print
user requested information on stdout.

This seems to be reasonable for Tesseract, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-05-16 17:59:48 +02:00
Stefan Weil
7e98c33432 Print help text to stdout instead to stderr
It is common practice for command line programs to show help text
on stdout. This seems to be reasonable for Tesseract, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-05-16 17:59:48 +02:00
Zdenko Podobný
66f37f0cd3 add copyright to renderer.cpp and pdfr.cpp 2016-03-18 19:43:45 +01:00
Zdenko Podobný
75e27414eb add copyright to C-API 2016-03-18 19:17:09 +01:00
Stefan Weil
076f21c1f2 Print version to stdout instead to stderr
Most command line programs print the version to stdout.
This seams to be reasonable for Tesseract, too.

Now a shell statement like "VERSION=$(tesseract --version)" works
without I/O redirection.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-03-16 12:10:27 +01:00
Philip Rinn
7461b61743 Fix ABI break introduced in 3.04.00, fixes #254 2016-03-08 11:35:24 +01:00
amitdo
bf5345f6a1 Don't display tesseract's banner when quiet mode is active 2016-03-07 19:25:09 +02:00
Zdenko Podobný
b2262750eb solve segfault for box.train; fixes #57 2016-03-04 23:04:55 +01:00
Tom Morris
6700edd8bc Cleanup TSV renderer
Remove all references to hocr, hocr.tsv, etc. Remove dead code for font
info, input filename, HTML escapes. Improved comments. Fixed
indentation.
2016-03-01 13:41:19 -05:00
Sundar M. Vaidya
858f4b75ce Avoids HTML escaping. 2016-03-01 12:30:39 -05:00
Sundar M. Vaidya
b1e4a82b0b Render output in TSV format. 2016-03-01 12:30:39 -05:00
Sundar M. Vaidya
59d593d796 Calls TessHOcrTsvRenderer if tessedit_create_hocrtsv is true. 2016-03-01 12:23:12 -05:00
Sundar M. Vaidya
4d13892f5b Adds TessHOcrTsvRenderer class for rendering HOCR info in tsv format. 2016-03-01 12:13:42 -05:00
Sundar M. Vaidya
d04e3259af Adds char* GetHOCRTSVText(int) as placeholder. Copy of char* GetHOCRText(int). 2016-03-01 12:13:42 -05:00
Tom Morris
6c44775d8a Emit fewer "lang" attributes
Add "lang" attribute to paragraph markup and only include
word lang attribute if it's different from the paragraph's value.
2016-02-17 10:23:41 -05:00
Tom Morris
ea401c9046 Only generate dir for HOCR when needed - fixes #208
Takes advantage of inheritance and dir="ltr" default to:
 - only generate paragraph dirs which are not ltr
 - only generate word dirs which don't match enclosing paragraph

Tested against LTR, RTL, and mixed direction files. Files for the
latter two cases are in a separate commit on the ltr-test-files branch.
2016-02-17 10:23:41 -05:00
Tom Morris
809bbd9bfa Fix varsize array for Microsoft compiler 2016-02-17 10:20:18 -05:00
Tom Morris
431786276c INCOMPATIBLE fix to hOCR line height information - fixes #225.
This fixes the duplicate line IDs caused by inserting height information
into the middle of the ID and it moves the line height info into
the title attribute like everything else, rather than using non-standard
HTML attributes (which won't validate).

This change may break consumers of the HTML output, but 3.04 has only
been in the wild for 6 months and the current HTML is invalid, so I 
believe the benefit outweighs the cost for the fix.
2016-02-15 18:02:46 -05:00
amitdo
6be9d7a5f8 Fix #64. Make box training work
This commit is better than 06fc0533c. Hopefully, this is the last fix to box training issue.
2016-01-29 03:37:34 +02:00