Commit Graph

51 Commits

Author SHA1 Message Date
Thijs Leegwater
f061503a14 Added JPEG quality option parameter (-c jpg_quality=n) 2018-01-11 09:11:30 +01:00
Ray Smith
da03e4e910 Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion 2017-07-14 09:30:14 -07:00
Stefan Weil
1cf8fe51a0 Remove mathfix.h
It was only needed for MS Visual Studio 2012 and older.
Those compilers are not supported for Tesseract.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-05 20:26:25 +02:00
Raf Schietekat
c335508e84 Fewer g++ -Wsign-compare warnings 2017-05-11 23:14:52 +02:00
Raf Schietekat
986970d6ca RAII: pdfrenderer.cpp: pdftext 2017-05-11 02:02:37 +02:00
Raf Schietekat
3c6e18ecf9 RAII: pdfrenderer.cpp: buffer 2017-05-11 02:02:37 +02:00
Raf Schietekat
936ca00c44 RAII: pdfrenderer.cpp: cidtogidmap 2017-05-11 02:02:37 +02:00
Raf Schietekat
4840c65bf0 RAII: ResultIterator::GetUTF8Text(): was leaked inside TessBaseAPI::GetUTF8Text() 2017-05-11 02:02:37 +02:00
Stefan Weil
1c59914b61 Use Leptonica struct names L_Compressed_Data, Pix
The Tesseract project prefers those names, so fix the remaining exceptions.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-04-30 10:50:12 +02:00
Ray Smith
77015526fa Jeff's fixes to pdf rendering 2017-04-28 13:38:13 -07:00
Jeff Breidenbach
9038faf436 Better escaping for PDF title; fixes #636 2017-04-02 19:01:16 +02:00
Ray Smith
ca16a08c10 Removed dead TODO 2017-01-25 15:54:11 -08:00
James R. Barlow
bf638b9202 Fix PDF syntax error: "XObject" instead of "/XObject" when textonly_pdf=false 2017-01-20 13:36:38 -08:00
Zdenko Podobný
effa5741e6 Implement invisible text only for PDF 2017-01-20 21:26:34 +01:00
Stefan Weil
78d91701bd Simplify new operations
It is not necessary to check for null pointers after new.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-30 20:24:38 +01:00
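Note on the commit above. In standard C++, a plain `new` expression reports allocation failure by throwing `std::bad_alloc`. It never returns a null pointer, which is why a null check after it is dead code. Only the `new (std::nothrow)` form returns `nullptr` on failure. The sketch below is a generic illustration, not Tesseract code, and the helper names `make_buffer` and `make_buffer_nothrow` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Plain new throws std::bad_alloc on failure, so the returned pointer is
// never null and checking it against nullptr is redundant.
int* make_buffer(std::size_t n) {
  return new int[n]();  // "()" value-initializes every element to zero
}

// The nothrow form returns nullptr on failure instead of throwing,
// so here a null check by the caller is meaningful.
int* make_buffer_nothrow(std::size_t n) {
  return new (std::nothrow) int[n]();
}
```

Callers of `make_buffer` should handle `std::bad_alloc` if they need to recover from allocation failure. Callers of `make_buffer_nothrow` should test the result for `nullptr`.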
Ray Smith
5913d7344f Added missing license headers 2016-11-18 15:53:11 -08:00
Ray Smith
c1c1e426b3 Added new LSTM-based neural network line recognizer 2016-11-07 15:38:07 -08:00
Ray Smith
2c837dffc3 Result of clang tidy on recent merge 2016-11-07 10:46:33 -08:00
Zdenko Podobný
5610738be9 fix #369 - pdf output with transparent background image 2016-08-05 22:37:58 +02:00
Zdenko Podobný
66f37f0cd3 add copyright to renderer.cpp and pdfr.cpp 2016-03-18 19:43:45 +01:00
Stefan Weil
5ce88d7f49 pdfrenderer: Fix uninitialized local variables
Coverity bug reports:

CID 1270405: Uninitialized scalar variable
CID 1270408: Uninitialized scalar variable
CID 1270409: Uninitialized scalar variable
CID 1270410: Uninitialized scalar variable

Those variables are set conditionally in the while loop
and must keep their values in following iterations, so
they must be declared outside of the loop.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-25 22:24:06 +01:00
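Note on the commit above. The following is a minimal, hypothetical sketch of the pattern it describes, not the actual pdfrenderer.cpp code. A variable that is assigned only on some loop iterations but read on every iteration must be declared and initialized before the loop. If it were declared inside the loop, every iteration would start with a fresh variable, and the iterations that skip the assignment would read an uninitialized value. Coverity reports exactly that. Here `resolve_sizes` and the "0 means no new size" convention are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: each entry inherits the most recent positive
// "size marker"; a marker of 0 means "keep the previous size".
std::vector<int> resolve_sizes(const std::vector<int>& markers) {
  std::vector<int> out;
  int last_size = 0;  // declared and initialized outside the loop
  for (std::size_t i = 0; i < markers.size(); ++i) {
    if (markers[i] > 0) {
      last_size = markers[i];  // assigned only on some iterations
    }
    out.push_back(last_size);  // read on every iteration
  }
  return out;
}
```

For example, `resolve_sizes({12, 0, 0, 10, 0})` yields `{12, 12, 12, 10, 10}`.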
Stefan Weil
997c4a6078 api: Fix printing of a size_t value
size_t is not always the same as long, especially not for 64 bit Windows:

api/pdfrenderer.cpp:549:31: warning:
 format '%ld' expects argument of type 'long int',
 but argument 4 has type 'size_t {aka long long unsigned int}' [-Wformat=]

size_t normally requires a format string "%zu", but this is unsupported
by Visual Studio, so use a type cast.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-05 06:39:35 +01:00
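Note on the commit above. The following is a sketch of the portable-cast approach, not the actual fix. The commit says only that a type cast is used. Casting to `unsigned long long` and printing with `%llu` is an assumption made here, and `format_size` is a hypothetical helper. Because `unsigned long long` is at least 64 bits wide, the cast does not lose information on platforms where `size_t` is 32 or 64 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats a size_t without relying on "%zu": cast to a type whose
// printf conversion ("%llu") is known, then print that instead.
std::string format_size(std::size_t n) {
  char buf[32];  // large enough for any 64-bit unsigned value
  std::snprintf(buf, sizeof(buf), "%llu", static_cast<unsigned long long>(n));
  return std::string(buf);
}
```

Where `%zu` is available, `std::printf("%zu", n)` is the direct alternative and needs no cast.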
Stefan Weil
11b2a4d9af api: Fix typos in comments (all found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-09-14 21:54:27 +02:00
Zdenko Podobný
0337d898d4 fix bug in UTF-16BE conversion 2015-08-10 21:22:20 +02:00
Zdenko Podobný
628de5ba3f enable pdfrender with NO_CUBE_BUILD 2015-08-07 23:20:22 +02:00
Jeff Breidenbach
9dcf2c6aa8 replace CubeUtils::UTF8ToUTF32 in pdfrenderer 2015-08-07 22:18:33 +02:00
Ray Smith
a303ab9d00 Misc fixes, mostly clang formatting, but some bug fixes in matrix, werd, and tesstrain_utils. Also updates unicharset to match traineddata files. 2015-07-09 14:28:20 -07:00
Ray Smith
ab0f4e2c38 Clang fixes to earlier changes and build compatibility with Google environment 2015-06-12 10:53:21 -07:00
orbitcowboy
9328f0e5d4 Fix potential null pointer dereference in ccmain/paragraphs.cpp. 2015-05-19 10:17:44 +02:00
Zdenko Podobný
e98849b482 print error message when pdf.ttf is not found. 2015-05-15 15:14:00 +02:00
Ray Smith
6b634170c1 Significant change to invisible font system
to improve correctness and compatibility with
external programs, particularly ghostscript.
We will start mapping everything to a single glyph,
rather than allowing characters to run off the end
of the font.

A more detailed design discussion is embedded into
pdfrenderer.cpp comments. The font, source code
that produces the font, and the design comments
were contributed by Ken Sharp from Artifex Software.
2015-05-12 17:33:18 -07:00
Ray Smith
d9699c4099 Fixed bidi handling in PDF output 2014-10-09 13:29:01 -07:00
Zdenko Podobný
d0cb1071b2 remove parameters tessedit_pdf_jpg_quality, tessedit_pdf_compression (reasons are in i1300 and i1285) 2014-10-07 23:37:34 +02:00
Zdenko Podobný
4904afe65b fix issue 1300 - patch from #35 2014-10-06 22:43:56 +02:00
Zdenko Podobný
4c01561b0f fix issue 1300 - patch from #26 2014-10-02 21:19:17 +02:00
Zdenko Podobný
f8613fab22 fix issue 1300 /patches from breidenbach 2014-09-21 16:38:24 +02:00
Zdenko Podobný
d1aa61c110 fix issue 1285: reimplement option to select pdf compression 2014-09-06 09:32:22 +02:00
theraysmith@gmail.com
b64ad05096 Improved efficiency of image processing for PDF
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1141 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-11 23:15:25 +00:00
zdenop
bce2cd5f33 enable to select pdf compression type and jpeg quality (fix issue 1263)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1134 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-08 21:18:44 +00:00
zdenop
5b779456f9 fix compatibility with leptonica 1.71 and 1.70
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1126 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-07-24 19:11:39 +00:00
zdenop
905e6162b9 put info about (API) version; fix typo
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1117 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-06-22 18:31:42 +00:00
theraysmith@gmail.com
25a8c7b720 Enabled streaming input and output of multi-page documents
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1105 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-21 15:46:21 +00:00
zdenop@gmail.com
2367ba1f6e fix PDF rendering for Arabic. http://ftp.de.debian.org/debian/pool/main/t/tesseract/tesseract_3.03.02-3.diff.gz
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1055 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-03-21 10:11:32 +00:00
theraysmith@gmail.com
864b2f6d80 Fixed problems with selection/copy/paste in some PDF viewers
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1042 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-02-03 19:14:16 +00:00
theraysmith@gmail.com
4585a4c9df Fixed empty page with color input
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1032 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-30 02:18:51 +00:00
theraysmith@gmail.com
0ddc7bfcaf Fixed first-word only bug in PDF output.
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1022 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-27 22:40:03 +00:00
theraysmith@gmail.com
d11dc049e3 Fixed a lot of compiler/clang warnings
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1015 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-25 02:28:51 +00:00
theraysmith@gmail.com
5b9a7e06eb Turned on pdfrenderer functionality that needs leptonica 1.70
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1009 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-23 23:01:10 +00:00
zdenop@gmail.com
ef3b1d936e fix mingw build issues
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@995 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-18 09:00:54 +00:00
zdenop@gmail.com
94d08567e1 fix vs2010 (and maybe vs2008) build
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@983 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-12 20:13:55 +00:00