378 Commits

Author SHA1 Message Date
Thijs Leegwater
f061503a14 Added JPEG quality option parameter (-c jpg_quality=n) 2018-01-11 09:11:30 +01:00
Josh Reid
cdc35338c5 Added check if input PSM value is outside of range (#1236)
Wrote a function to throw an error if PSM is outside 0-13 or OEM is outside 0-5.
fixes #1234
2017-12-14 11:37:44 +01:00
Stefan Weil
aa6eb6bd46 Remove Tesseract parameter "include_page_breaks" and use FF by default
Now Tesseract adds a page break (normally form feed) by default.

It is still possible to suppress page breaks by setting an empty
page_separator.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-09-19 07:34:32 +02:00
amitdo
a905548ed6 Autotools build: Remove the option 'USING_MULTIPLELIBS'
Libtool's convenience libraries should never be installed. Fixes #985.
2017-09-11 15:03:53 +03:00
Ray Smith
fc6a390c6c Added intsimdmatrix as a generic integer matrixdotvector function with AVX2 and SSE specializations 2017-09-08 15:06:19 +01:00
Ray Smith
a18620cfea Improved results on images with no resolution. Estimates resolution
from the size of the connected components, based on average text size.
2017-09-08 09:37:03 +01:00
Stefan Weil
b9365cdff1 api: Fix typo in comment
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-09-03 09:14:00 +02:00
zdenop
7afa05a03e Merge pull request #1072 from stweil/listlangs
List available languages recursively
2017-08-13 14:50:42 +02:00
chrismamo1
5fd3e22f74 move code around so that list-langs will work without an English traineddata file 2017-08-12 17:15:27 -05:00
Stefan Weil
cc0d87c5b8 List available languages recursively
Tesseract supports hierarchies of languages and uses them since
the new files best/*.traineddata were added.

Now `tesseract --list-langs` also shows any traineddata files in
subdirectories of the tessdata directory.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-08-10 18:55:38 +02:00
Stefan Weil
0720b3f38b Change default resolution from 70 to 300 dpi
The default resolution is used for images without an explicit resolution
or with an unreasonable resolution (smaller than 70 or larger than 2400).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-08-08 16:48:10 +02:00
Ray Smith
2ef1aeaeb4 Added AVX2 and AVX512 detector 2017-08-02 14:15:50 -07:00
Ray Smith
dc8745e6fd Move LSTM unicharset and recoder to traineddata with version string part1. Backwards compatible - maybe. 2017-07-14 11:14:23 -07:00
Ray Smith
7588540296 Removed changes from last commit that didn't belong 2017-07-14 11:08:26 -07:00
Ray Smith
3ec11bd37a Deleted some dead LSTM code, making everything use the recoder 2017-07-14 10:58:21 -07:00
Ray Smith
da03e4e910 Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion 2017-07-14 09:30:14 -07:00
Justin Hotchkiss Palermo
f057938069 fix filenames in comments 2017-07-02 17:35:47 -04:00
Justin Hotchkiss Palermo
1d862a54bd Add new line to a few error messages. 2017-07-01 08:40:57 -04:00
Stefan Weil
1cf8fe51a0 Remove mathfix.h
It was only needed for MS Visual Studio 2012 and older.
Those compilers are not supported for Tesseract.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-05 20:26:25 +02:00
zdenop
ffb1ec3535 Merge pull request #918 from rfschtkt/issue529
Issue529
2017-05-13 19:33:46 +02:00
Raf Schietekat
b4cf46697f Issue #529: inside main() use return rather than exit 2017-05-13 18:02:00 +02:00
Stefan Weil
84396707a8 Fix crash if output file could not be opened
This error case results in fout_ == nullptr.
Closing a nullptr file is not allowed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-13 17:27:07 +02:00
zdenop
29f3de9be1 Merge pull request #914 from stweil/clean
Clean code
2017-05-13 12:45:57 +02:00
Stefan Weil
5dc4af62fb baseapi: Simplify code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-13 12:14:29 +02:00
Stefan Weil
78142593d2 Fix order of destructor calls for DawgCache and TessBaseAPI
TessBaseAPI must release its cache use before DawgCache is destroyed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-13 11:35:30 +02:00
Stefan Weil
f37f858c99 main: Fix two memory leaks
When Tesseract terminates by calling the exit function,
the destructor of any local auto variable is not called.

Fix two cases by using static variables.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-12 21:15:52 +02:00
Stefan Weil
5e3665c6ae Remove most libtiff dependencies
libtiff is no longer needed for OpenCL, so remove that dependency.

It is still suggested for Windows to redirect warning messages
from the tesseract executable to the console.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-12 10:15:35 +02:00
Raf Schietekat
c335508e84 Fewer g++ -Wsign-compare warnings 2017-05-11 23:14:52 +02:00
zdenop
64994a2707 Merge pull request #900 from rfschtkt/cast
Reviewed uses of reinterpret_cast
2017-05-11 16:08:12 +02:00
Raf Schietekat
8aa0a2dd48 RAII: *::GetUNLVText() 2017-05-11 02:02:37 +02:00
Raf Schietekat
1dab23916f RAII: *::GetBoxText() 2017-05-11 02:02:37 +02:00
Raf Schietekat
b7b68a65dd RAII: *::GetTSVText() 2017-05-11 02:02:37 +02:00
Raf Schietekat
a1fff874b4 RAII: *::GetHOCRText() 2017-05-11 02:02:37 +02:00
Raf Schietekat
986970d6ca RAII: pdfrenderer.cpp: pdftext 2017-05-11 02:02:37 +02:00
Raf Schietekat
3c6e18ecf9 RAII: pdfrenderer.cpp: buffer 2017-05-11 02:02:37 +02:00
Raf Schietekat
936ca00c44 RAII: pdfrenderer.cpp: cidtogidmap 2017-05-11 02:02:37 +02:00
Raf Schietekat
2772f78170 RAII: LTRResultIterator::GetUTF8Text 2017-05-11 02:02:37 +02:00
Raf Schietekat
f75665c34f RAII: TessBaseAPI::GetUTF8Text() 2017-05-11 02:02:37 +02:00
Raf Schietekat
4840c65bf0 RAII: ResultIterator::GetUTF8Text(): was leaked inside TessBaseAPI::GetUTF8Text() 2017-05-11 02:02:37 +02:00
Raf Schietekat
3983d2f76a Reviewed uses of reinterpret_cast 2017-05-11 01:58:40 +02:00
Egor Pugin
0afd5939b1 Use NDEBUG macro instead of DEBUG. 2017-05-08 13:01:22 +03:00
Ray Smith
6ac31dcbdd Fixed DetectOS so it doesn't crash with a big image 2017-05-03 15:50:31 -07:00
Stefan Weil
c1d649ebbc api: Replace Tesseract data types by POSIX data types
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-02 18:21:44 +02:00
Stefan Weil
aea0d9a8d5 api: Remove unneeded NULL checks
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-04-30 19:23:24 +02:00
Stefan Weil
1c59914b61 Use Leptonica struct names L_Compressed_Data, Pix
The Tesseract project prefers those names, so fix the remaining exceptions.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-04-30 10:50:12 +02:00
Ray Smith
7a116ce8bb More formatting fixes from clang tidy 2017-04-28 13:38:32 -07:00
Ray Smith
77015526fa Jeff's fixes to pdf rendering 2017-04-28 13:38:13 -07:00
zdenop
13b7900ebf Merge pull request #778 from cjmayo/singleopts
tidy tesseract(1) adding missing options
2017-04-28 18:58:40 +02:00
Ray Smith
1cc511188d Added extra Init that takes a memory buffer or a filereader function pointer to enable read of traineddata from memory or foreign file systems. Updated existing readers to use TFile API instead of FILE. This does not yet add big-endian capability to LSTM, but it is very easy from here. 2017-04-27 15:48:23 -07:00
James R. Barlow
f54577e6be Fix #786 - 3.05 linkage fails on macOS Sierra with --enable-opencl
Also needed for 4.00.
2017-04-10 22:22:49 -07:00
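
The fallback rule described in commit 0720b3f38b (use the default resolution for images with no explicit resolution, or with one smaller than 70 or larger than 2400) can be sketched as follows. This is a minimal illustration, not the code from that commit: the function name, the constant names, and the convention that a missing resolution is reported as 0 are all assumptions made for the example. Only the thresholds (70, 2400) and the new default (300 dpi) come from the commit message.

```cpp
// Illustrative sketch only; names are hypothetical, not taken from the
// Tesseract sources. Thresholds and default come from the commit message.
constexpr int kMinReasonableDpi = 70;    // below this, the value is distrusted
constexpr int kMaxReasonableDpi = 2400;  // above this, the value is distrusted
constexpr int kDefaultDpi = 300;         // default after the change (was 70)

// Returns the resolution to use for an image whose metadata reports
// `reported_dpi`. Assumption: an image without an explicit resolution
// reports 0, which falls below the lower bound and so maps to the default.
constexpr int EffectiveResolution(int reported_dpi) {
  return (reported_dpi < kMinReasonableDpi || reported_dpi > kMaxReasonableDpi)
             ? kDefaultDpi
             : reported_dpi;
}
```

Under these assumptions, an image reporting 150 dpi keeps its value, while one reporting 0 or 5000 dpi is treated as 300 dpi. The bounds themselves (70 and 2400) are accepted, since the commit message excludes only values smaller or larger than them.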