Commit Graph

395 Commits

Author SHA1 Message Date
Ray Smith
d55f462c9c More clang-tidy from previous commits 2016-12-06 13:45:49 -08:00
Ray Smith
5deebe6c27 Fixed multilang for LSTM, pushed cube to one side without actually deleting it 2016-12-05 14:41:43 -08:00
Stefan Weil
6140be6a55 openmp: Fix build with clang++ and compilers without OpenMP support
Builds without support for OpenMP failed with the old code. Fix this:

* Add OPENMP_CXXFLAGS for ccmain.
* Replace unconditional -fopenmp by OPENMP_CXXFLAGS for lstm.
* Always use _OPENMP for conditional compilation.
* Remove OPENMP as there is already _OPENMP.
* Include omp.h conditionally.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 18:44:03 +01:00
Stefan Weil
4897796d57 Replace reserved identifiers used in #define guards header files
Use macro names as suggested by the Google C++ Style Guide
(https://google.github.io/styleguide/cppguide.html#The__define_Guard).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 15:43:03 +01:00
Stefan Weil
cefc420ddb Remove extra semicolons after member function definitions
clang++ report:
api/baseapi.h:852:4: warning:
 extra ';' after member function definition [-Wextra-semi]
[...]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 14:54:52 +01:00
zdenop
e4faf95586 Merge pull request #515 from stweil/parameters
Support standard parameter form --oem, --psm for tesseract executable
2016-12-01 20:37:30 +01:00
Stefan Weil
92d981b93a Change tesseract parameter -psm to --psm
For compatibility reasons the old variant is still supported.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-30 22:23:46 +01:00
Stefan Weil
d2f9264383 Change tesseract parameter -oem to --oem
It was introduced recently in commit f24ef67d, so there is no need
to support the old variant for compatibility reasons.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-30 22:23:46 +01:00
Stefan Weil
78d91701bd Simplify new operations
It is not necessary to check for null pointers after new.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-30 20:24:38 +01:00
Stefan Weil
85e37798cb Simplify delete operations
It is not necessary to check for null pointers.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-24 17:59:13 +01:00
Egor Pugin
644469595c Fix windows build. 2016-11-24 17:32:23 +03:00
Ray Smith
5913d7344f Added missing license headers 2016-11-18 15:53:11 -08:00
Ray Smith
f24ef67df4 Limited max height to 48 even in variable height input, enabled neural nets via ocr engine mode 2016-11-08 14:01:04 -08:00
Ray Smith
c1c1e426b3 Added new LSTM-based neural network line recognizer 2016-11-07 15:38:07 -08:00
Ray Smith
2c837dffc3 Result of clang tidy on recent merge 2016-11-07 10:46:33 -08:00
Stefan Weil
ea786e25a4 api/baseapi: Fix memory leaks at program termination
Calling TessBaseAPI::Clear() which calls TessBaseAPI::ClearResults()
which calls SavePixForCrash(0, NULL) is needed to release objects
allocated in global_crash_pixes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-10-25 19:11:10 +02:00
Stefan Weil
f1d3a3b7c3 api/tesseractmain: Fix memory leak caused by exit()
When exit() is called from ParseArgs(), no destructors are executed
for the auto variables vars_vec and vars_values.

Making both variables static fixes the memory leaks, because now the
destructors are always executed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-10-24 20:20:24 +02:00
Zdenko Podobný
54fafc4e2e improve multipage tiff processing (jbreiden patch from 2016-03-29) 2016-10-06 11:13:42 +02:00
Stefan Weil
db2a8e9f85 api: Remove unused constant kBytesPerBlob
This fixes a compiler warning:

api/baseapi.cpp:1743:11: warning:
 unused variable 'kBytesPerBlob' [-Wunused-const-variable]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-09-06 21:49:26 +02:00
Stefan Weil
caffb3133b Remove unneeded 'struct' from TessBaseAPI::GetHOCRText (issue #414)
It conflicts with a previous 'class' declaration for ETEXT_DESC:

include/tesseract/baseapi.h:594:21:
 Struct 'ETEXT_DESC' was previously declared as a class

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-09-05 13:17:13 +02:00
Zdenko Podobný
5610738be9 fix #369 - pdf output with transparent background image 2016-08-05 22:37:58 +02:00
Stefan Weil
75fdc086ec win32: Check whether tiffio.h is available
The previous commit added a dependency on tiffio.h, so enable the new
code only if that file is available.

The code which conditionally defines HAVE_TIFFIO_H was already there
although that macro was unused up to now.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-07-17 12:07:58 +02:00
Stefan Weil
896e80d9a7 win32: Show TIFF warnings on console
Showing them in a window (default) is not acceptable for a console
application like Tesseract which must be able to work in batch mode.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-07-16 10:45:18 +02:00
zdenop
647b88daf0 Merge pull request #359 from StefRe/tsv-fix
Fix TSV bounding box width/hight calculation (addition to #358)
2016-06-27 22:19:22 +02:00
Steffen Rehberg
c0fcce2f8f Fix text box width/hight calculation (addition)
This occurrence was should have been included in commit 29d971e
but was overlooked by error.
2016-06-27 21:58:29 +02:00
zdenop
828f8528a8 Merge pull request #358 from StefRe/tsv-fix
Fix TSV bounding box width/hight calculation
2016-06-27 09:09:12 +02:00
Steffen Rehberg
29d971eb0c Fix text box width/hight calculation
In Tesseract's coordinate system, width is just right - left, cf. slide #2 of
github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/2ArchitectureAndDataStructures.pdf
2016-06-25 12:40:28 +02:00
Marco Atzeri
b1c921b59e Fix Cygwin compatibility 2016-06-17 15:52:01 +03:00
Stefan Weil
e59be55bcc Print list of languages to stdout instead to stderr
It is common practice for command line programs to print
user requested information on stdout.

This seems to be reasonable for Tesseract, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-05-16 17:59:48 +02:00
Stefan Weil
7e98c33432 Print help text to stdout instead to stderr
It is common practice for command line programs to show help text
on stdout. This seems to be reasonable for Tesseract, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-05-16 17:59:48 +02:00
Zdenko Podobný
66f37f0cd3 add copyright to renderer.cpp and pdfr.cpp 2016-03-18 19:43:45 +01:00
Zdenko Podobný
75e27414eb add copyright to C-API 2016-03-18 19:17:09 +01:00
Stefan Weil
076f21c1f2 Print version to stdout instead to stderr
Most command line programs print the version to stdout.
This seams to be reasonable for Tesseract, too.

Now a shell statement like "VERSION=$(tesseract --version)" works
without I/O redirection.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-03-16 12:10:27 +01:00
Philip Rinn
7461b61743 Fix ABI break introduced in 3.04.00, fixes #254 2016-03-08 11:35:24 +01:00
amitdo
bf5345f6a1 Don't display tesseract's banner when quiet mode is active 2016-03-07 19:25:09 +02:00
Zdenko Podobný
b2262750eb solve segfault for box.train; fixes #57 2016-03-04 23:04:55 +01:00
Tom Morris
6700edd8bc Cleanup TSV renderer
Remove all references to hocr, hocr.tsv, etc. Remove dead code for font
info, input filename, HTML escapes. Improved comments. Fixed
indentation.
2016-03-01 13:41:19 -05:00
Sundar M. Vaidya
858f4b75ce Avoids HTML escaping. 2016-03-01 12:30:39 -05:00
Sundar M. Vaidya
b1e4a82b0b Render output in TSV format. 2016-03-01 12:30:39 -05:00
Sundar M. Vaidya
59d593d796 Calls TessHOcrTsvRenderer if tessedit_create_hocrtsv is true. 2016-03-01 12:23:12 -05:00
Sundar M. Vaidya
4d13892f5b Adds TessHOcrTsvRenderer class for rendering HOCR info in tsv format. 2016-03-01 12:13:42 -05:00
Sundar M. Vaidya
d04e3259af Adds char* GetHOCRTSVText(int) as placeholder. Copy of char* GetHOCRText(int). 2016-03-01 12:13:42 -05:00
Tom Morris
6c44775d8a Emit fewer "lang" attributes
Add "lang" attribute to paragraph markup and only include
word lang attribute if it's different from the paragraph's value.
2016-02-17 10:23:41 -05:00
Tom Morris
ea401c9046 Only generate dir for HOCR when needed - fixes #208
Takes advantage of inheritance and dir="ltr" default to:
 - only generate paragraph dirs which are not ltr
 - only generate word dirs which don't match enclosing paragraph

Tested against LTR, RTL, and mixed direction files. Files for the
latter two cases are in a separate commit on the ltr-test-files branch.
2016-02-17 10:23:41 -05:00
Tom Morris
809bbd9bfa Fix varsize array for Microsoft compiler 2016-02-17 10:20:18 -05:00
Tom Morris
431786276c INCOMPATIBLE fix to hOCR line height information - fixes #225.
This fixes the duplicate line IDs caused by inserting height information
into the middle of the ID and it moves the line height info into
the title attribute like everything else, rather than using non-standard
HTML attributes (which won't validate).

This change may break consumers of the HTML output, but 3.04 has only
been in the wild for 6 months and the current HTML is invalid, so I 
believe the benefit outweighs the cost for the fix.
2016-02-15 18:02:46 -05:00
amitdo
6be9d7a5f8 Fix #64. Make box training work
This commit is better than 06fc0533c. Hopefully, this is the last fix to box training issue.
2016-01-29 03:37:34 +02:00
amitdo
06fc0533c8 Fix #184. Training should work now 2016-01-17 14:27:35 +02:00
zdenop
c53add706e Merge pull request #27 from tesseract-ocr/monitor
Monitor
2016-01-05 16:28:42 +01:00
amitdo
a20156fc67 Add missing ')'_to make the code compile 2015-12-11 19:42:16 +02:00
amitdo
c2f5e9b849 If there is no explicit renderer(s), default to TessTextRenderer
Revert fd429c32, 43834da7, 05de195e.

See #49, #59.

The code in this commit solves the issue in a more elegant way, IMHO.

Now you can use:
  * `tesseract eurotext.tif eurotext txt pdf`
  * `tesseract eurotext.tif eurotext txt hocr`
  * `tesseract eurotext.tif eurotext txt hocr pdf`

NOTE:
  With `tesseract eurotext.tif eurotext`
  or `tesseract eurotext.tif eurotext txt`
  the psm will be set to '3', but...
  With `tesseract eurotext.tif eurotext txt pdf`
  or `tesseract eurotext.tif eurotext txt hocr`
  the psm will be set to '1'.
2015-12-11 19:06:49 +02:00
Stefan Weil
71c9e028f7 tesseractmain: Prettify help message
Commit 99110df757 improved the help text
in several aspects, but also introduced new inconsistencies which this
patch tries to fix.

* Align columns (this needed replacing tabs by spaces).
* Start explaining text with uppercase.
* Replace "the stdout" by "stdout.
* Small changes in help text for page segmentation modes.
* Split options in OCR options and single options
  (partially revert commit 99110df757).

In addition, whitespace characters at end of lines were removed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-29 10:26:40 +01:00
zdenop
7cc7c6f9c2 Merge pull request #156 from stweil/master
pdfrenderer: Fix uninitialized local variables
2015-11-27 21:53:55 +01:00
amitdo
99110df757 tesseractmain.cpp: Split huge main() to sub functions
Add these functions to api/tesseractmain.cpp:
PrintVersionInfo()
PrintUsage()
PrintHelpForPSM()
PrintHelpMessage()
SetVariablesFromCLArgs()
PrintLangsList()
FixPageSegMode()
ParseArgs()
PreloadRenderers()
2015-11-26 11:36:16 +02:00
Stefan Weil
5ce88d7f49 pdfrenderer: Fix uninitialized local variables
Coverity bug reports:

CID 1270405: Uninitialized scalar variable
CID 1270408: Uninitialized scalar variable
CID 1270409: Uninitialized scalar variable
CID 1270410: Uninitialized scalar variable

Those variables are set conditionally in the while loop
and must keep their values in following iterations, so
they must be declared outside of the loop.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-25 22:24:06 +01:00
Stefan Weil
03f37c0cdc tesseractmain: Fix unterminated string
Coverity bug report: CID 1270421 "Buffer not null terminated".

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-24 17:17:17 +01:00
Stefan Weil
997c4a6078 api: Fix printing of a size_t value
size_t is not always the same as long, especially not for 64 bit Windows:

api/pdfrenderer.cpp:549:31: warning:
 format '%ld' expects argument of type 'long int',
 but argument 4 has type 'size_t {aka long long unsigned int}' [-Wformat=]

size_t normally requires a format string "%zu", but this is unsupported
by Visual Studio, so use a type cast.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-05 06:39:35 +01:00
Stefan Weil
3272b62201 Don't use NULL for integer arguments
This fixes compiler warnings:

api/baseapi.cpp:1422:49: warning:
 passing NULL to non-pointer argument 6 of
 'int MultiByteToWideChar(UINT, DWORD, LPCCH, int, LPWSTR, int)'
 [-Wconversion-null]
api/baseapi.cpp:1427:54:
 warning: passing NULL to non-pointer argument 6 of
 'int WideCharToMultiByte(UINT, DWORD, LPCWCH, int, LPSTR, int, LPCCH, LPBOOL)'
 [-Wconversion-null]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-05 06:38:01 +01:00
Stefan Weil
edf765b952 Remove unneeded const qualifiers
This fixes compiler warnings like this one:

api/baseapi.h:739:32: warning:
 type qualifiers ignored on function return type [-Wignored-qualifiers]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-05 06:36:42 +01:00
amitdo
6bbcb50dd9 Added osd renderer for psm 0.
Works for single page and multi-page.
2015-10-30 20:09:00 +02:00
amitdo
dcfdd5c035 OSD: Print script name instead of meaningless script id 2015-10-28 09:50:28 +02:00
Stefan Weil
11b2a4d9af api: Fix typos in comments (all found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-09-14 21:54:27 +02:00
James R. Barlow
18ac7ae7ef Get OpenCL to compile on OS X
However, the output of the OpenCL build is garbage....
2015-08-26 02:03:07 -07:00
Zdenko Podobný
bb19f2c16b Fixes #76 - enable OpenMP support 2015-08-14 21:39:40 +02:00
Robert Theis
aa6a0b12f9 Remove extraneous line feed 2015-08-12 18:02:35 -07:00
Zdenko Podobný
0337d898d4 fix bug in UTF-16BE conversion 2015-08-10 21:22:20 +02:00
Zdenko Podobný
545a0634da improve NO_CUBE_BUILD 2015-08-09 18:09:52 +02:00
Zdenko Podobný
67ede37b50 Fixes #74 NO_CUBE_BUILD with reverting to ANDROID_BUILD in baseapi 2015-08-09 18:09:30 +02:00
Zdenko Podobný
628de5ba3f enable pdfrender with NO_CUBE_BUILD 2015-08-07 23:20:22 +02:00
Jeff Breidenbach
9dcf2c6aa8 replace CubeUtils::UTF8ToUTF32 in pdfrenderer 2015-08-07 22:18:33 +02:00
Zdenko Podobný
66a76a9477 Revert "temporary add config/*, configure and Makefile.in for release"
This reverts commits ec9581d8f2, 1afe382c4e, 4b2cfabcc1
2015-07-31 21:44:43 +02:00
Zdenko Podobný
41478fd5a1 implement build without cube (-DNO_CUBE_BUILD) 2015-07-24 11:51:44 +02:00
Zdenko Podobný
71e226c44f increase version number 2015-07-21 22:46:52 +02:00
zdenop
e4f4893fb8 Merge pull request #52 from unbe/null-pointer-access-in-hocr
Fix null pointer dereference when writing font name into HOCR.
2015-07-20 07:40:59 +02:00
artem
2b6801eddb Fix null pointer dereference when writing font name into HOCR. 2015-07-19 22:05:02 +02:00
unbe
67ffea8877 Update capi.cpp
Make TessDeleteResultRenderer use delete, not delete[]
2015-07-19 15:15:42 +02:00
Zdenko Podobný
ec9581d8f2 temporary add configure and Makefile.in for release 2015-07-11 09:42:43 +02:00
Ray Smith
a303ab9d00 Misc fixes, mostly clang formatting, but some bug fixes in matrix, werd, and tesstrain_utils. Also updates unicharset to match traineddata files. 2015-07-09 14:28:20 -07:00
Ray Smith
b1d99dfe23 Added a backup adaptive classifier to take over from primary when it fills on a large document 2015-06-12 11:10:53 -07:00
Ray Smith
ab0f4e2c38 Clang fixes to earlier changes and build compatability with Google environment 2015-06-12 10:53:21 -07:00
orbitcowboy
9328f0e5d4 Fix potential null pointer dereference in ccmain/paragraphs.cpp. 2015-05-19 10:17:44 +02:00
Jim O'Regan
4a6195202c fix typo 2015-05-18 12:32:36 +01:00
Zdenko Podobný
438edd6c7b added row attributes to hocr output 2015-05-17 22:13:59 +02:00
Zdenko Podobný
ed6ae9b974 Add monitor to GetHOCRText 2015-05-17 21:55:50 +02:00
Zdenko Podobný
59bcbc79b3 fix GIT_VER info in VS2010 2015-05-15 15:14:49 +02:00
Zdenko Podobný
e98849b482 rint error message when pdf.ttf is not found. 2015-05-15 15:14:00 +02:00
Zdenko Podobný
035b324f0f reflect the latest commits in VS2010 build 2015-05-14 10:52:54 +02:00
Jim O'Regan
b13691fda0 Merge conflict: going with Ray's version 2015-05-13 08:54:28 +01:00
Ray Smith
03f3c9dc88 Misc fixes missed from previous commits 2015-05-12 18:13:15 -07:00
Ray Smith
6b634170c1 Significant change to invisible font system
to improve correctness and compatibility with
external programs, particularly ghostscript.
We will start mapping everything to a single glyph,
rather than allowing characters to run off the end
of the font.

A more detailed design discussion is embedded into
pdfrenderer.cpp comments. The font, source code
that produces the font, and the design comments
were contributed by Ken Sharp from Artifex Software.
2015-05-12 17:33:18 -07:00
Ray Smith
4a3caefd92 Add ability to build under android (without cube or scrollview). 2015-05-12 15:41:15 -07:00
Ray Smith
53fc4456cc Fixed issue 1252: Refactored LearnBlob and its call hierarchy to make it a member of Classify.
Eliminated the flexfx scheme for calling global feature extractor functions
through an array of function pointers.
Deleted dead code I found as a by-product.
This CL does not change BlobToTrainingSample or ExtractFeatures to be full
members of Classify (the eventual goal) as that would make it even bigger,
since there are a lot of callers to these functions.
When ExtractFeatures and BlobToTrainingSample are members of Classify they
will be able to access control parameters in Classify, which will greatly
simplify developing variations to the feature extraction process.
2015-05-12 15:22:34 -07:00
Zdenko Podobný
d508751e58 Fixed issue 1317 - git revision info used as version info for autotools & DEBUG 2015-05-02 12:15:13 +02:00
Zdenko Podobný
4c7c960bfd fix issue 1417 2015-02-07 22:22:20 +01:00
Zdenko Podobný
09b0c91fc9 fix Issue 1398 2015-02-06 23:44:58 +01:00
Zdenko Podobný
e0441d0c6b fix typo/ issue 1397 2014-12-31 22:31:50 +01:00
Zdenko Podobný
473141c1de fix bool in c-api 2014-12-28 17:55:56 +01:00
Zdenko Podobný
4da712d04d Add paragraph info to C-API(fix issue 1388) 2014-12-07 14:07:14 +01:00
Zdenko Podobný
239f350a72 remove const from C API TessResultIteratorGetChoiceIterator (issue 1342) 2014-10-14 22:46:11 +02:00
Ray Smith
242b14ae7f Reduced size of multi-renderer implementation from code review 2014-10-09 13:29:46 -07:00
Ray Smith
d9699c4099 Fixed bidi handling in PDF output 2014-10-09 13:29:01 -07:00
Zdenko Podobný
d0cb1071b2 remove parameters tessedit_pdf_jpg_quality, tessedit_pdf_compression (reasons are in i1300 and i1285) 2014-10-07 23:37:34 +02:00
Zdenko Podobný
4904afe65b fix issue 1300 - patch from #35 2014-10-06 22:43:56 +02:00
Zdenko Podobný
4c01561b0f fix issue 1300 - patch from #26 2014-10-02 21:19:17 +02:00
Zdenko Podobný
c0640a4bef fix cygwin build (issue 1289) 2014-09-28 23:19:52 +02:00
Zdenko Podobný
f8613fab22 fix issue 1300 /patches from breidenbach 2014-09-21 16:38:24 +02:00
Zdenko Podobný
9e8629d9ef allow multiple output in tesseract executable (https://groups.google.com/d/msg/tesseract-ocr/Z_WUKmJDVxc/1vc3W0xJZ2oJ) 2014-09-19 23:33:47 +02:00
Ray Smith
648e7ca311 Merge branch 'master' of https://code.google.com/p/tesseract-ocr
Usual git need to merge if local is out of date.
2014-09-17 18:10:17 -07:00
Ray Smith
0256529c1f Fixed issue 1243 2014-09-17 18:09:45 -07:00
Jim O'Regan
c0c719306a update docs for TessBaseAPI::SetProbabilityInContextFunc based on Ray's email today 2014-09-09 20:37:27 +01:00
Zdenko Podobný
d1aa61c110 fix issue 1285: reimplement option to select pdf compression 2014-09-06 09:32:22 +02:00
Ray Smith
cd2653c167 Cleanup from previous changes 2014-08-12 16:12:46 -07:00
theraysmith@gmail.com
dbf6197471 Major refactor of control.cpp to enable line recognition
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1147 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-11 23:23:06 +00:00
theraysmith@gmail.com
b64ad05096 Improved efficiency of image processing for PDF
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1141 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-11 23:15:25 +00:00
zdenop
bce2cd5f33 enable to select pdf compression type and jpeg quality (fix issue 1263)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1134 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-08 21:18:44 +00:00
zdenop
1156098567 Add font info to hocr output - fix issue 1219
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1132 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-03 16:22:12 +00:00
zdenop
5b779456f9 fix compatibility with leptonica 1.71 and 1.70
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1126 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-07-24 19:11:39 +00:00
zdenop
95b7783a95 fix issue 1228: bilevel pdf output - horizontal/vertical lines removed
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1118 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-06-23 21:04:37 +00:00
zdenop
905e6162b9 put info about (API) version; fix typo
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1117 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-06-22 18:31:42 +00:00
zdenop
fad9de4e1b fix issue 1217: GetThresholdedImage accesses possibly NULL thresholder_
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1113 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-31 21:21:37 +00:00
zdenop
e64f555567 fix Issue 1223: TessPolyBlockType enum is outdated in C-API
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1112 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-31 20:31:48 +00:00
zdenop
36f3f76d64 fix tiff issue on windows
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1111 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-31 07:27:54 +00:00
zdenop@gmail.com
84cdcb32cc fixed windows build
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1110 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-26 06:48:58 +00:00
zdenop
19c4c2f0e7 fix C-API to resent C++ API changes - thanks to Nick White
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1109 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-25 21:03:11 +00:00
zdenop
ffe52737d5 check if input file exists
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1108 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-25 19:58:00 +00:00
theraysmith@gmail.com
25a8c7b720 Enabled streaming input and output of multi-page documents
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1105 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-21 15:46:21 +00:00
zdenop
979f9cafe5 Add word recognition language to C-API - fix issue 1200
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1102 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-16 18:35:54 +00:00
zdenop
44b0d0e28e addition to r1100
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1101 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-11 21:24:54 +00:00
zdenop
6051e40212 fix issue 1197
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1100 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-11 21:20:38 +00:00
zdenop
2e520f2fac fix hocr/pdf output when image is provided from stdin - issue 1196
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1099 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-11 15:59:47 +00:00
zdenop
bdb912c186 escape input_file name in hOCR output - fix issue 1154
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1098 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-09 22:19:30 +00:00
zdenop
30f6ae6742 amendment to r1091
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1095 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-07 20:53:03 +00:00
zdenop
ee73e3b107 fix issue 123: user-words (and user-patterns) file specified by command line
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1093 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-04 21:11:00 +00:00
zdenop
bc09cd9040 fix formating in C-API and add TessChoiceIteratorDelete
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1092 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-03 20:21:37 +00:00
zdenop
f86e9d83d4 add ChoiceIterator to C-API - fix issue 1149
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1091 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-03 09:29:20 +00:00
theraysmith@gmail.com
45e106820f Fixed issue 1116
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1074 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-04-24 00:50:27 +00:00
zdenop@gmail.com
2367ba1f6e fix PDF rendering for Arabic. http://ftp.de.debian.org/debian/pool/main/t/tesseract/tesseract_3.03.02-3.diff.gz
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1055 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-03-21 10:11:32 +00:00
zdenop
d451b28054 fix issue 1127; add unvl output to tesseract executable
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1052 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-03-02 14:40:21 +00:00
zdenop
f01ea0e485 C-API: remove comments
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1047 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-04 20:01:55 +00:00
theraysmith@gmail.com
2fcea93846 Fixed issues 1081-1090
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1046 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-04 02:23:18 +00:00
zdenop
790a3da22f remove 'class IMAGE;'
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1045 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-03 23:32:23 +00:00
theraysmith@gmail.com
864b2f6d80 Fixed problems with selection/copy/paste in some PDF viewers
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1042 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-03 19:14:16 +00:00
zdenop
4e526f987e C-API: another update API based on changes in baseapi.h
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1041 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-03 13:24:55 +00:00
zdenop
0e238d43ba C-API: update API based on changes in baseapi.h; add renderer
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1040 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-02 23:50:47 +00:00
zdenop
32789291a8 provide output for -psm 0
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1037 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-01 12:56:36 +00:00
zdenop
080e0c028a C-API: add function to set init parameter during Init with c-string array
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1036 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-01 10:36:40 +00:00
theraysmith@gmail.com
4585a4c9df Fixed empty page with color input
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1032 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-30 02:18:51 +00:00
theraysmith@gmail.com
0ddc7bfcaf Fixed first-word only bug in PDF output.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1022 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-27 22:40:03 +00:00
theraysmith@gmail.com
d11dc049e3 Fixed a lot of compiler/clang warnings
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1015 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-25 02:28:51 +00:00
theraysmith@gmail.com
1a487252f4 Fixed slow-down that was caused by upping MAX_NUM_CLASSES
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1013 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-24 21:12:35 +00:00