291 Commits

Author SHA1 Message Date
Amit D
ad5ee18415 Make font size estimation work with the lstm engine (#1173)
**Partial** fix for issue #1074
2017-10-20 10:07:16 +02:00
Stefan Weil
aa6eb6bd46 Remove Tesseract parameter "include_page_breaks" and use FF by default
Now Tesseract adds a page break (normally form feed) by default.

It is still possible to suppress page breaks by setting an empty
page_separator.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-09-19 07:34:32 +02:00
jm
2a77d5ad69 returns the correct dictionary if only lstm is used 2017-09-14 13:03:22 +02:00
amitdo
a905548ed6 Autotools build: Remove the option 'USING_MULTIPLELIBS'
Libtool's convenience libraries should never be installed. Fixes #985.
2017-09-11 15:03:53 +03:00
Ray Smith
0382222d85 More clang-tidy fixes from sync 2017-09-08 10:22:32 +01:00
Ray Smith
a18620cfea Improved results on images with no resolution. Estimates resolution
from the size of the connected components, based on average text size.
2017-09-08 09:37:03 +01:00
Stefan Weil
b016c48d06 Add missing spaces in help text
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-08-23 19:12:41 +02:00
Stefan Weil
8bb5a89d5a Don't add empty line to text output
Empty lines in text output are needed to separate paragraphs,
but there should not be an empty line at the end of the text.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-08-21 09:47:35 +02:00
Ray Smith
4e8018d013 Important fix to RTL languages: saves the last space on each line, which was previously lost 2017-07-19 17:04:06 -07:00
Ray Smith
cec1037260 Fixed BestPix to always return the highest resolution available, even if a lower bit depth than the original 2017-07-19 16:28:26 -07:00
Stefan Weil
9929587f36 Remove extra semicolons
This fixes these compiler warnings:

    ccmain/equationdetect.cpp:1519:2: warning: extra ‘;’ [-Wpedantic]
    ccstruct/blobs.cpp:65:17: warning: extra ‘;’ [-Wpedantic]
    ccstruct/blobs.h:178:18: warning: extra ‘;’ [-Wpedantic]
    ccstruct/ratngs.cpp:36:22: warning: extra ‘;’ [-Wpedantic]
    ccstruct/ratngs.cpp:37:22: warning: extra ‘;’ [-Wpedantic]
    ccutil/ambigs.cpp:46:20: warning: extra ‘;’ [-Wpedantic]
    ccutil/ambigs.h:137:21: warning: extra ‘;’ [-Wpedantic]
    cutil/structures.cpp:36:45: warning: extra ‘;’ [-Wpedantic]
    textord/equationdetectbase.cpp:65:2: warning: extra ‘;’ [-Wpedantic]
    textord/equationdetectbase.h:57:2: warning: extra ‘;’ [-Wpedantic]
    wordrec/lm_state.cpp:25:28: warning: extra ‘;’ [-Wpedantic]
    wordrec/lm_state.h:190:29: warning: extra ‘;’ [-Wpedantic]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-15 12:40:34 +02:00
Ray Smith
dc8745e6fd Move LSTM unicharset and recoder to traineddata with version string part1. Backwards compatible - maybe. 2017-07-14 11:14:23 -07:00
Ray Smith
3ec11bd37a Deleted some dead LSTM code, making everything use the recoder 2017-07-14 10:58:21 -07:00
Ray Smith
da03e4e910 Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion 2017-07-14 09:30:14 -07:00
Justin Hotchkiss Palermo
f057938069 fix filenames in comments 2017-07-02 17:35:47 -04:00
Stefan Weil
1cf8fe51a0 Remove mathfix.h
It was only needed for MS Visual Studio 2012 and older.
Those compilers are not supported for Tesseract.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-05 20:26:25 +02:00
Stefan Weil
fef5972d23 EquationDetect: Remove unneeded new / delete operations
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-18 07:39:36 +02:00
Stefan Weil
3a67ff930e Optimize code by replacing init_to_size with resize_no_init
There is no need to initialize memory with a fixed value which is
overwritten in the next step.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-12 14:34:55 +02:00
Raf Schietekat
c335508e84 Fewer g++ -Wsign-compare warnings 2017-05-11 23:14:52 +02:00
Stefan Weil
8c75d26657 Remove unneeded type casts when using Leptonica macro GET_DATA_BYTE
The first parameter is cast to an unsigned byte by Leptonica,
so we don't need additional type casts in Tesseract code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 20:08:27 +02:00
Stefan Weil
ef1d9600b1 Use standard macros for format strings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 19:32:51 +02:00
zdenop
64994a2707 Merge pull request #900 from rfschtkt/cast
Reviewed uses of reinterpret_cast
2017-05-11 16:08:12 +02:00
Raf Schietekat
4840c65bf0 RAII: ResultIterator::GetUTF8Text(): was leaked inside TessBaseAPI::GetUTF8Text() 2017-05-11 02:02:37 +02:00
Raf Schietekat
3983d2f76a Reviewed uses of reinterpret_cast 2017-05-11 01:58:40 +02:00
Ray Smith
8e79297dce Final part of endian improvement. Adds big-endian support to lstm and fixes issue 518 2017-05-03 16:09:44 -07:00
Ray Smith
6ac31dcbdd Fixed DetectOS so it doesn't crash with a big image 2017-05-03 15:50:31 -07:00
Stefan Weil
5cc8c058fa ccmain: Replace Tesseract data types by POSIX data types
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-02 18:21:51 +02:00
Ray Smith
7a116ce8bb More formatting fixes from clang tidy 2017-04-28 13:38:32 -07:00
Ray Smith
1cc511188d Added extra Init that takes a memory buffer or a filereader function pointer to enable read of traineddata from memory or foreign file systems. Updated existing readers to use TFile API instead of FILE. This does not yet add big-endian capability to LSTM, but it is very easy from here. 2017-04-27 15:48:23 -07:00
Stefan Weil
becec34057 Fix some typos in comments (found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-03-10 19:50:17 +01:00
Jeff Breidenbach
bd45b3ae4f fix #537: Error in pixClone: pixs not defined 2017-01-29 16:59:52 +01:00
Ray Smith
f566a45b30 clang-tidy changes from sync 2017-01-25 16:20:19 -08:00
Ray Smith
a1c22fb0d0 Fixed issue #557 2017-01-25 16:05:59 -08:00
Ray Smith
b453f74e01 Fixed issue #633 (multi-language mode) 2017-01-25 15:58:39 -08:00
zdenop
c768b5867d Merge pull request #668 from Wikinaut/chg-textonly-pdf-parameter-description
Improve textonly_pdf parameter description
2017-01-21 16:29:06 +01:00
Wikinaut
c03299e2b4 Improve textonly_pdf parameter description 2017-01-21 16:18:53 +01:00
Wikinaut
98df78ca8a fix typo in parameter description 2017-01-21 10:48:25 +01:00
Zdenko Podobný
effa5741e6 Implement invisible text only for PDF 2017-01-20 21:26:34 +01:00
Wikinaut
f06ef543fc typo correction "specific" 2017-01-13 04:24:16 +01:00
Wikinaut
39274d8000 typo correction "specific" 2017-01-13 04:17:32 +01:00
Stefan Weil
680bfddb4f Remove code for old versions of Leptonica
Those versions are no longer supported.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-27 11:57:44 +01:00
Simon Strandgaard
d38cffc332 Fixed typo 2016-12-15 14:58:53 +00:00
Egor Pugin
ead87a7180 Fix build. 2016-12-15 12:43:13 +03:00
zdenop
da4c064c2e Merge pull request #531 from stweil/guards
Fix header file guards and replace reserved identifiers
2016-12-15 08:29:32 +01:00
Ray Smith
5c3839bdb4 Delete cube code 2016-12-14 11:00:43 -08:00
Ray Smith
432684dd6e Makefile changes to remove cube 2016-12-14 10:58:24 -08:00
Ray Smith
9f5ba9105f Removed dependency on cube from the code 2016-12-14 10:55:15 -08:00
Ray Smith
13e46ae1c4 Made LSTM the default engine, pushed cube out 2016-12-13 14:37:40 -08:00
Jeff Breidenbach
ed4c4c6bf5 Produce warning for invalid resolution. Fix #453 2016-12-07 22:06:00 +01:00
zdenop
7f7cea1ee6 Merge pull request #532 from stweil/openmp
openmp: Fix build with clang++ and compilers without OpenMP support
2016-12-07 14:47:08 +01:00