306 Commits

Author SHA1 Message Date
Stefan Weil
f52d445074 Update Leptonica configuration
This synchronizes the code with the master branch.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-04-22 19:26:18 +02:00
James R. Barlow
4c044eb361 [3.05] Fix #786 - 3.05 linkage fails on macOS Sierra with --enable-opencl 2017-04-10 14:42:06 -07:00
Zdenko Podobný
f30cac479d libtiff is needed for windows build of tesseract executable 2017-03-17 20:44:37 +01:00
Zdenko Podobný
697c3dc4da Fix indentation after conditional [-Wmisleading-indentation]
The indentation is wrong since commit
fd0683f9e0 and results in a gcc warning:

api/baseapi.cpp: In member function 'bool tesseract::TessBaseAPI::ProcessPagesMultipageTiff(const l_uint8*, size_t, const char*, const char*, int, tesseract::TessResultRenderer*, int)':
api/baseapi.cpp:986:5: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
     if (tessedit_page_number >= 0)
     ^~
api/baseapi.cpp:988:7: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'if'
       pix = (data) ? pixReadMemFromMultipageTiff(data, size, &offset)
       ^~~

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-03-10 11:24:53 +01:00
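
The gcc warning quoted in the commit above comes down to a basic rule: an `if` without braces guards only the single statement that follows it, whatever the indentation suggests. A minimal sketch of that rule, using hypothetical function names that do not appear in api/baseapi.cpp:

```cpp
// Hypothetical illustration; these functions are not part of Tesseract.

// Without braces, only the first statement after `if` is conditional.
int without_braces(bool condition) {
  int executed = 0;
  if (condition)
    executed += 1;  // guarded by the `if`
  executed += 10;   // always runs; indenting it deeper would not change that
  return executed;
}

// With braces, both statements are guarded.
int with_braces(bool condition) {
  int executed = 0;
  if (condition) {
    executed += 1;
    executed += 10;
  }
  return executed;
}
```

Calling both functions with `false` shows the difference: the unbraced version still executes its second statement, the braced version executes neither.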
Zdenko Podobný
998d4735d0 3.05.00 release 2017-02-16 18:59:32 +01:00
Jeff Breidenbach
d500231f06 fix #665 process file list 2017-01-19 15:18:52 +01:00
Zdenko Podobný
3df54a4318 remove (fake) OPENMP support 2016-12-26 13:44:20 +01:00
Zdenko Podobný
245eebdf29 Multi-page TIFF buffering is broken - fix #233 2016-12-26 12:11:25 +01:00
Stefan Weil
8af3629e9f openmp: Fix OpenMP support
* Add OPENMP_CXXFLAGS for ccmain.
* Replace OPENMP_CFLAGS by OPENMP_CXXFLAGS.
* Always use _OPENMP for conditional compilation.
* Remove OPENMP as there is already _OPENMP.
* Include omp.h conditionally.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-11 22:43:37 +01:00
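
The pattern described in the commit above (rely on the compiler-defined `_OPENMP` macro and include `omp.h` only when it is set) might look roughly like the following. This is a sketch, not code from the Tesseract tree; the function name `max_worker_threads` is hypothetical.

```cpp
// _OPENMP is defined by the compiler when OpenMP is enabled
// (for example with -fopenmp), so no project-specific macro is needed.
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: reports how many threads a parallel region could use,
// falling back to 1 when the build has no OpenMP support.
int max_worker_threads() {
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}
```

The same source compiles with and without OpenMP; only the compiler flags decide which branch is used.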
James R. Barlow
66c03c9166 Revise after code review 2016-12-11 22:26:05 +01:00
James R. Barlow
bdb690ba06 Implement a new orientation and script detection API for C and C++
See issue #424.

The existing C API for TessBaseAPIDetectOS requires a C caller to successfully allocate struct OSResults which is actually a C++ class.  Generally it won't
be possible for a regular C compiler to do this properly.

It's also assumed that most API level users of Tesseract are only interested in Tesseract's best guess as to script and orientation, not the individual values for all possible scripts.

This introduces a new API with a better name that is more closely aligned with the output of 'tesseract -psm 0'.  Both tesseract -psm 0 and this API now share the same code in baseapi.cpp.
2016-12-11 22:25:59 +01:00
Zdenko Podobný
59ba80bb3a More clang-tidy from previous commits
# Conflicts:
#	opencl/opencl_device_selection.h
#	opencl/openclwrapper.cpp
2016-12-08 15:50:22 +01:00
Jeff Breidenbach
d969ed1352 Produce warning for invalid resolution. Fix #453 2016-12-07 22:03:28 +01:00
Stefan Weil
f29abea160 tesseract: Disable Leptonica messages
Disable debugging and informational messages from Leptonica
for release builds.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-07 21:55:23 +01:00
Stefan Weil
4535d24d13 Remove extra semicolons after member function definitions
clang++ report:
api/baseapi.h:852:4: warning:
 extra ';' after member function definition [-Wextra-semi]
[...]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-07 17:18:51 +01:00
Stefan Weil
6933b0618c Change tesseract parameter -psm to --psm
For compatibility reasons the old variant is still supported.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-07 17:12:45 +01:00
Stefan Weil
9984077798 Change tesseract parameter -oem to --oem
It was introduced recently in commit f24ef67d, so there is no need
to support the old variant for compatibility reasons.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-07 17:12:37 +01:00
Stefan Weil
4789ca2ab8 Simplify new operations
It is not necessary to check for null pointers after new.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-07 17:10:10 +01:00
Stefan Weil
743eb8104a Simplify delete operations
It is not necessary to check for null pointers.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-07 15:59:11 +01:00
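
The two commits above rest on standard C++ behaviour: a plain `new` expression reports failure by throwing `std::bad_alloc` instead of returning a null pointer, and `delete` applied to a null pointer does nothing. A short sketch with a hypothetical `Buffer` type, assuming a C++11 compiler with exceptions enabled and no use of `new (std::nothrow)`:

```cpp
#include <new>

// Hypothetical type used only for this illustration.
struct Buffer {
  explicit Buffer(int s) : size(s) {}
  int size;
};

int make_and_release(int size) {
  // No null check needed: on failure this throws std::bad_alloc.
  Buffer* buffer = new Buffer(size);
  int observed = buffer->size;

  delete buffer;
  buffer = nullptr;

  // No null check needed: deleting a null pointer is a well-defined no-op.
  delete buffer;
  return observed;
}
```

Code that does use `new (std::nothrow)` still has to check the result, so the simplification applies only to the plain form.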
Zdenko Podobný
fc3d07b44f backport from 4.00: api changes 2016-12-02 21:01:17 +01:00
Zdenko Podobný
775a108dc7 backport from 4.00: enable selection of OCR engine mode from command line 2016-12-02 19:50:54 +01:00
Zdenko Podobný
ed0c60bc65 backport from 4.00: use ".empty()" instead of ".size() > 0" 2016-11-30 11:33:35 +01:00
Zdenko Podobný
c8e2be63d0 backport from 4.00: fix pdfrenderer 2016-11-29 11:21:21 +01:00
Zdenko Podobný
d01dd0bdd4 backport from 4.00: show PSM 11-13 2016-11-29 11:18:52 +01:00
Zdenko Podobný
7169545a86 fix code style 2016-11-29 11:16:10 +01:00
Zdenko Podobný
90651e111f backport style changes from 4.00 for better identification of fixes and new code 2016-11-25 15:14:46 +01:00
Stefan Weil
ea786e25a4 api/baseapi: Fix memory leaks at program termination
Calling TessBaseAPI::Clear() which calls TessBaseAPI::ClearResults()
which calls SavePixForCrash(0, NULL) is needed to release objects
allocated in global_crash_pixes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-10-25 19:11:10 +02:00
Stefan Weil
f1d3a3b7c3 api/tesseractmain: Fix memory leak caused by exit()
When exit() is called from ParseArgs(), no destructors are executed
for the auto variables vars_vec and vars_values.

Making both variables static fixes the memory leaks, because now the
destructors are always executed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-10-24 20:20:24 +02:00
Zdenko Podobný
54fafc4e2e improve multipage tiff processing (jbreiden patch from 2016-03-29) 2016-10-06 11:13:42 +02:00
Stefan Weil
db2a8e9f85 api: Remove unused constant kBytesPerBlob
This fixes a compiler warning:

api/baseapi.cpp:1743:11: warning:
 unused variable 'kBytesPerBlob' [-Wunused-const-variable]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-09-06 21:49:26 +02:00
Stefan Weil
caffb3133b Remove unneeded 'struct' from TessBaseAPI::GetHOCRText (issue #414)
It conflicts with a previous 'class' declaration for ETEXT_DESC:

include/tesseract/baseapi.h:594:21:
 Struct 'ETEXT_DESC' was previously declared as a class

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-09-05 13:17:13 +02:00
Zdenko Podobný
5610738be9 fix #369 - pdf output with transparent background image 2016-08-05 22:37:58 +02:00
Stefan Weil
75fdc086ec win32: Check whether tiffio.h is available
The previous commit added a dependency on tiffio.h, so enable the new
code only if that file is available.

The code which conditionally defines HAVE_TIFFIO_H was already there
although that macro was unused up to now.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-07-17 12:07:58 +02:00
Stefan Weil
896e80d9a7 win32: Show TIFF warnings on console
Showing them in a window (default) is not acceptable for a console
application like Tesseract which must be able to work in batch mode.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-07-16 10:45:18 +02:00
zdenop
647b88daf0 Merge pull request #359 from StefRe/tsv-fix
Fix TSV bounding box width/height calculation (addition to #358)
2016-06-27 22:19:22 +02:00
Steffen Rehberg
c0fcce2f8f Fix text box width/height calculation (addition)
This occurrence should have been included in commit 29d971e
but was overlooked by mistake.
2016-06-27 21:58:29 +02:00
zdenop
828f8528a8 Merge pull request #358 from StefRe/tsv-fix
Fix TSV bounding box width/height calculation
2016-06-27 09:09:12 +02:00
Steffen Rehberg
29d971eb0c Fix text box width/height calculation
In Tesseract's coordinate system, width is just right - left, cf. slide #2 of
github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/2ArchitectureAndDataStructures.pdf
2016-06-25 12:40:28 +02:00
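
The rule stated in the commit above, width is `right - left`, can be sketched as follows. The `Box` struct and function names are hypothetical and are not Tesseract's own types; the height formula is an assumption made by analogy with the width rule, taking `bottom` to be the larger coordinate.

```cpp
// Hypothetical bounding box for illustration only.
struct Box {
  int left;
  int top;
  int right;
  int bottom;
};

// Width as stated in the commit message: right - left, with no extra offset.
int box_width(const Box& box) { return box.right - box.left; }

// Assumed by analogy: height is the difference of the vertical edges.
int box_height(const Box& box) { return box.bottom - box.top; }
```

For a box with left 10, top 20, right 110 and bottom 70, this gives a width of 100 and a height of 50.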
Marco Atzeri
b1c921b59e Fix Cygwin compatibility 2016-06-17 15:52:01 +03:00
Stefan Weil
e59be55bcc Print list of languages to stdout instead of stderr
It is common practice for command line programs to print
user requested information on stdout.

This seems to be reasonable for Tesseract, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-05-16 17:59:48 +02:00
Stefan Weil
7e98c33432 Print help text to stdout instead of stderr
It is common practice for command line programs to show help text
on stdout. This seems to be reasonable for Tesseract, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-05-16 17:59:48 +02:00
Zdenko Podobný
66f37f0cd3 add copyright to renderer.cpp and pdfr.cpp 2016-03-18 19:43:45 +01:00
Zdenko Podobný
75e27414eb add copyright to C-API 2016-03-18 19:17:09 +01:00
Stefan Weil
076f21c1f2 Print version to stdout instead of stderr
Most command line programs print the version to stdout.
This seems to be reasonable for Tesseract, too.

Now a shell statement like "VERSION=$(tesseract --version)" works
without I/O redirection.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-03-16 12:10:27 +01:00
Philip Rinn
7461b61743 Fix ABI break introduced in 3.04.00, fixes #254 2016-03-08 11:35:24 +01:00
amitdo
bf5345f6a1 Don't display tesseract's banner when quiet mode is active 2016-03-07 19:25:09 +02:00
Zdenko Podobný
b2262750eb solve segfault for box.train; fixes #57 2016-03-04 23:04:55 +01:00
Tom Morris
6700edd8bc Cleanup TSV renderer
Remove all references to hocr, hocr.tsv, etc. Remove dead code for font
info, input filename, HTML escapes. Improved comments. Fixed
indentation.
2016-03-01 13:41:19 -05:00
Sundar M. Vaidya
858f4b75ce Avoids HTML escaping. 2016-03-01 12:30:39 -05:00
Sundar M. Vaidya
b1e4a82b0b Render output in TSV format. 2016-03-01 12:30:39 -05:00