Commit Graph

298 Commits

Author SHA1 Message Date
Stefan Weil
18c8f8833f Remove deprecated parameters (#1418)
They were deprecated nearly 3 years ago in
commit 0e868ef377.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-25 17:35:29 +02:00
Stefan Weil
023e1b340e Use POSIX data types and macros (#878)
* api: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* cutil: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* training: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract data types by POSIX data types

Now all Tesseract data types which are no longer needed can be removed
from ccutil/host.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* lstm: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Remove the macros which are now unused from ccutil/host.h.
Remove also the obsolete history comments.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix build error caused by ambiguous ClipToRange

Error message vom Appveyor CI:

    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2672: 'ClipToRange': no matching overloaded function found [C:\projects\tesseract\build\libtesseract.vcxproj]
    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2782: 'T ClipToRange(const T &,const T &,const T &)': template parameter 'T' is ambiguous [C:\projects\tesseract\build\libtesseract.vcxproj]
      c:\projects\tesseract\ccutil\helpers.h(122): note: see declaration of 'ClipToRange'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: could be 'char'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: or       'int'

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* unittest: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* arch: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 21:36:30 +01:00
Stefan Weil
7972b13e3a Remove macro USE_STD_NAMESPACE (#1360)
The related code in training/util.h now uses the GOOGLE_TESSERACT macro
to enable Google specific code to disable heap checking.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-04 14:43:28 +01:00
Amit D
766b7bd620 Don't drop words with low certainty (#1264)
Fix #681.
2018-02-20 17:19:10 +01:00
Stefan Weil
af6994efd9 Don't try alternate path for tessdata (#1328)
This simplifies the code and the user interface.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-20 17:18:14 +01:00
Stefan Weil
01f9a7f3c2 Clean use of double / float (#1323)
The variable 'diff' gets a double value and is compared with a double value,
so it should be double, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-18 15:34:04 +01:00
Stefan Weil
20b3ff8796 Fix some minor issues reported by Coverity Scan (#1321)
* Dereference pointer after NULL check (CID 1385638)

Move the statement which dereferences the pointer variable "current"
after the NULL check.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Dereference pointer after NULL check (CID 1385635)

Move the statement which dereferences the pointer variable "current"
after the NULL check.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Dereference pointer after NULL check (CID 1385634)

Move the statement which dereferences the pointer variable "current"
after the NULL check.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix CID 1164527 'Constant' variable guards dead code

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-18 15:22:59 +01:00
Amit D
ad5ee18415 Make font size estimation work with the lstm engine (#1173)
**Partial** fix for issue #1074
2017-10-20 10:07:16 +02:00
Stefan Weil
aa6eb6bd46 Remove Tesseract parameter "include_page_breaks" and use FF by default
Now Tesseract adds a page break (normally form feed) by default.

It is still possible to suppress page breaks by setting an empty
page_separator.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-09-19 07:34:32 +02:00
jm
2a77d5ad69 returns the correct dictionary if lstm only used 2017-09-14 13:03:22 +02:00
amitdo
a905548ed6 Autotools build: Remove the option 'USING_MULTIPLELIBS'
Libtool's convenience libraries should never be installed. Fixes #985.
2017-09-11 15:03:53 +03:00
Ray Smith
0382222d85 More clang-tidy fixes from sync 2017-09-08 10:22:32 +01:00
Ray Smith
a18620cfea Improved results on images with no resolution. Estimates resolution
from the size of the connected components, based on average text size.
2017-09-08 09:37:03 +01:00
Stefan Weil
b016c48d06 Add missing spaces in help text
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-08-23 19:12:41 +02:00
Stefan Weil
8bb5a89d5a Don't add empty line to text output
Empty lines in text output are needed to separate paragraphs,
but there should not be an empty line at the end of the text.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-08-21 09:47:35 +02:00
Ray Smith
4e8018d013 Important fix to RTL languages saves last space on each line, which was previously lost 2017-07-19 17:04:06 -07:00
Ray Smith
cec1037260 Fixed BestPix to always return the highest resolution available, even if a lower bit depth than the original 2017-07-19 16:28:26 -07:00
Stefan Weil
9929587f36 Remove extra semicolons
This fixes these compiler warnings:

    ccmain/equationdetect.cpp:1519:2: warning: extra ‘;’ [-Wpedantic]
    ccstruct/blobs.cpp:65:17: warning: extra ‘;’ [-Wpedantic]
    ccstruct/blobs.h:178:18: warning: extra ‘;’ [-Wpedantic]
    ccstruct/ratngs.cpp:36:22: warning: extra ‘;’ [-Wpedantic]
    ccstruct/ratngs.cpp:37:22: warning: extra ‘;’ [-Wpedantic]
    ccutil/ambigs.cpp:46:20: warning: extra ‘;’ [-Wpedantic]
    ccutil/ambigs.h:137:21: warning: extra ‘;’ [-Wpedantic]
    cutil/structures.cpp:36:45: warning: extra ‘;’ [-Wpedantic]
    textord/equationdetectbase.cpp:65:2: warning: extra ‘;’ [-Wpedantic]
    textord/equationdetectbase.h:57:2: warning: extra ‘;’ [-Wpedantic]
    wordrec/lm_state.cpp:25:28: warning: extra ‘;’ [-Wpedantic]
    wordrec/lm_state.h:190:29: warning: extra ‘;’ [-Wpedantic]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-15 12:40:34 +02:00
Ray Smith
dc8745e6fd Move LSTM unicharset and recoder to traineddata with version string part1. Backwards compatible - maybe. 2017-07-14 11:14:23 -07:00
Ray Smith
3ec11bd37a Deleted some dead LSTM code, making everything use the recoder 2017-07-14 10:58:21 -07:00
Ray Smith
da03e4e910 Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion 2017-07-14 09:30:14 -07:00
Justin Hotchkiss Palermo
f057938069 fix filenames in comments 2017-07-02 17:35:47 -04:00
Stefan Weil
1cf8fe51a0 Remove mathfix.h
It was only needed for MS Visual Studio 2012 and older.
Those compilers are not supported for Tesseract.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-05 20:26:25 +02:00
Stefan Weil
fef5972d23 EquationDetect: Remove unneeded new / delete operations
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-18 07:39:36 +02:00
Stefan Weil
3a67ff930e Optimize code by replacing init_to_size with resize_no_init
There is no need to initialize memory with a fixed value which is
overwritten in the next step.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-12 14:34:55 +02:00
Raf Schietekat
c335508e84 Fewer g++ -Wsign-compare warnings 2017-05-11 23:14:52 +02:00
Stefan Weil
8c75d26657 Remove unneeded type casts when using Leptonica macro GET_DATA_BYTE
The first parameter is casted to an unsigned byte by Leptonica,
so we don't need additional type casts in Tesseract code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 20:08:27 +02:00
Stefan Weil
ef1d9600b1 Use standard macros for format strings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 19:32:51 +02:00
zdenop
64994a2707 Merge pull request #900 from rfschtkt/cast
Reviewed uses of reinterpret_cast
2017-05-11 16:08:12 +02:00
Raf Schietekat
4840c65bf0 RAII: ResultIterator::GetUTF8Text(): was leaked inside TessBaseAPI::GetUTF8Text() 2017-05-11 02:02:37 +02:00
Raf Schietekat
3983d2f76a Reviewed uses of reinterpret_cast 2017-05-11 01:58:40 +02:00
Ray Smith
8e79297dce Final part of endian improvement. Adds big-endian support to lstm and fixes issue 518 2017-05-03 16:09:44 -07:00
Ray Smith
6ac31dcbdd Fixed DetectOS so it doesn't crash with a big image 2017-05-03 15:50:31 -07:00
Stefan Weil
5cc8c058fa ccmain: Replace Tesseract data types by POSIX data types
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-02 18:21:51 +02:00
Ray Smith
7a116ce8bb More formatting fixes from clang tidy 2017-04-28 13:38:32 -07:00
Ray Smith
1cc511188d Added extra Init that takes a memory buffer or a filereader function pointer to enable read of traineddata from memory or foreign file systems. Updated existing readers to use TFile API instead of FILE. This does not yet add big-endian capability to LSTM, but it is very easy from here. 2017-04-27 15:48:23 -07:00
Stefan Weil
becec34057 Fix some typos in comments (found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-03-10 19:50:17 +01:00
Jeff Breidenbach
bd45b3ae4f fix #537: Error in pixClone: pixs not defined 2017-01-29 16:59:52 +01:00
Ray Smith
f566a45b30 clang-tidy changes from sync 2017-01-25 16:20:19 -08:00
Ray Smith
a1c22fb0d0 Fixed issue #557 2017-01-25 16:05:59 -08:00
Ray Smith
b453f74e01 Fixed issue #633 (multi-language mode 2017-01-25 15:58:39 -08:00
zdenop
c768b5867d Merge pull request #668 from Wikinaut/chg-textonly-pdf-parameter-description
Improve textonly_pdf parameter description
2017-01-21 16:29:06 +01:00
Wikinaut
c03299e2b4 Improve textonly_pdf parameter description 2017-01-21 16:18:53 +01:00
Wikinaut
98df78ca8a fix typo in parameter description 2017-01-21 10:48:25 +01:00
Zdenko Podobný
effa5741e6 Implement invisible text only for PDF 2017-01-20 21:26:34 +01:00
Wikinaut
f06ef543fc typo correction "specific" 2017-01-13 04:24:16 +01:00
Wikinaut
39274d8000 typo correction "specific" 2017-01-13 04:17:32 +01:00
Stefan Weil
680bfddb4f Remove code for old versions of Leptonica
Those versions are no longer supported.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-27 11:57:44 +01:00
Simon Strandgaard
d38cffc332 Fixed typo 2016-12-15 14:58:53 +00:00
Egor Pugin
ead87a7180 Fix build. 2016-12-15 12:43:13 +03:00