Commit Graph

163 Commits

Author SHA1 Message Date
Stefan Weil
d13b862050 Remove deprecated method DumpPGM (#1420)
It was deprecated in commit a18816f83 more than 7 years ago.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-25 17:29:11 +02:00
Stefan Weil
a02b0f9726 Remove vcsversion.h (#1412)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-23 18:42:20 +01:00
Stefan Weil
b94bbd6e83 Update version handling (#1408)
ccutil/version.h is now no longer needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-22 21:49:47 +01:00
Stefan Weil
023e1b340e Use POSIX data types and macros (#878)
* api: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* cutil: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* training: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract data types by POSIX data types

Now all Tesseract data types which are no longer needed can be removed
from ccutil/host.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* lstm: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Remove the macros which are now unused from ccutil/host.h.
Remove also the obsolete history comments.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix build error caused by ambiguous ClipToRange

Error message from Appveyor CI:

    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2672: 'ClipToRange': no matching overloaded function found [C:\projects\tesseract\build\libtesseract.vcxproj]
    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2782: 'T ClipToRange(const T &,const T &,const T &)': template parameter 'T' is ambiguous [C:\projects\tesseract\build\libtesseract.vcxproj]
      c:\projects\tesseract\ccutil\helpers.h(122): note: see declaration of 'ClipToRange'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: could be 'char'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: or       'int'

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* unittest: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* arch: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 21:36:30 +01:00
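A minimal sketch of what the migration above looks like in practice, assuming only the standard <cstdint> header. The ClipToRange below is a local stand-in with the signature quoted in the error message, not the actual contents of ccutil/helpers.h, and ClipOffset is a hypothetical helper. It also shows one way the 'char' versus 'int' ambiguity can be avoided: naming the template argument explicitly.

```cpp
#include <cstdint>

// POSIX/C99 limit macros replace the old MAX_INT8-style macros.
static_assert(INT8_MAX == 127, "int8_t upper limit");
static_assert(INT8_MIN == -128, "int8_t lower limit");

// Local stand-in with the signature shown in the compiler error above.
template <typename T>
T ClipToRange(const T& x, const T& lower_bound, const T& upper_bound) {
  if (x < lower_bound) return lower_bound;
  if (x > upper_bound) return upper_bound;
  return x;
}

// Hypothetical helper: clamp an 8-bit value to [-limit, limit].
// Assumes 0 <= limit <= INT8_MAX so the result fits in int8_t.
inline int8_t ClipOffset(int8_t value, int limit) {
  // ClipToRange(value, -limit, limit) would not compile: T would be
  // deduced as int8_t from the first argument and as int from the others.
  // Naming T explicitly removes the ambiguity.
  return static_cast<int8_t>(ClipToRange<int>(value, -limit, limit));
}
```

Casting one argument so that all three have the same type would work equally well; the explicit template argument just keeps the call site short.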
Stefan Weil
638b025884 Fix CID 1164569 (Dereference after null check) (#1332)
If equ_detect_ can be NULL, we must catch that case and show a warning
instead of crashing in method SetEquationDetect.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:38:38 +01:00
Stefan Weil
eb8a6a5cf2 Fix CID 1164570 (Dereference after null check) (#1333)
Show a warning if datapath_ is NULL instead of crashing.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:37:53 +01:00
Ray Smith
a18620cfea Improved results on images with no resolution. Estimates resolution
from the size of the connected components, based on average text size.
2017-09-08 09:37:03 +01:00
Stefan Weil
b9365cdff1 api: Fix typo in comment
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-09-03 09:14:00 +02:00
Stefan Weil
cc0d87c5b8 List available languages recursively
Tesseract supports hierarchies of languages and uses them since
the new files best/*.traineddata were added.

Now `tesseract --list-langs` also shows any traineddata files in
subdirectories of the tessdata directory.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-08-10 18:55:38 +02:00
Stefan Weil
0720b3f38b Change default resolution from 70 to 300 dpi
The default resolution is used for images without an explicit resolution
or with an unreasonable resolution (smaller than 70 or larger than 2400).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-08-08 16:48:10 +02:00
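The rule described in this commit message can be sketched as follows. The numeric values (70, 2400, 300) come from the message; the constant and function names are illustrative and are not claimed to match the Tesseract source. A non-positive input stands for an image without an explicit resolution.

```cpp
// Bounds and default taken from the commit message; names are illustrative.
constexpr int kMinResolution = 70;
constexpr int kMaxResolution = 2400;
constexpr int kDefaultResolution = 300;

// Returns the resolution to use for an image whose metadata reports
// reported_dpi. Values smaller than 70 (including 0 for "not set") or
// larger than 2400 are treated as unreasonable and replaced by the default.
constexpr int EffectiveResolution(int reported_dpi) {
  return (reported_dpi < kMinResolution || reported_dpi > kMaxResolution)
             ? kDefaultResolution
             : reported_dpi;
}
```

Note that the boundary values 70 and 2400 themselves are accepted, since the message says "smaller than" and "larger than".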
Ray Smith
da03e4e910 Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion 2017-07-14 09:30:14 -07:00
Justin Hotchkiss Palermo
1d862a54bd Add new line to a few error messages. 2017-07-01 08:40:57 -04:00
Stefan Weil
1cf8fe51a0 Remove mathfix.h
It was only needed for MS Visual Studio 2012 and older.
Those compilers are not supported for Tesseract.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-05 20:26:25 +02:00
Stefan Weil
5dc4af62fb baseapi: Simplify code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-13 12:14:29 +02:00
zdenop
64994a2707 Merge pull request #900 from rfschtkt/cast
Reviewed uses of reinterpret_cast
2017-05-11 16:08:12 +02:00
Raf Schietekat
8aa0a2dd48 RAII: *::GetUNLVText() 2017-05-11 02:02:37 +02:00
Raf Schietekat
1dab23916f RAII: *::GetBoxText() 2017-05-11 02:02:37 +02:00
Raf Schietekat
b7b68a65dd RAII: *::GetTSVText() 2017-05-11 02:02:37 +02:00
Raf Schietekat
a1fff874b4 RAII: *::GetHOCRText() 2017-05-11 02:02:37 +02:00
Raf Schietekat
2772f78170 RAII: LTRResultIterator::GetUTF8Text 2017-05-11 02:02:37 +02:00
Raf Schietekat
f75665c34f RAII: TessBaseAPI::GetUTF8Text() 2017-05-11 02:02:37 +02:00
Raf Schietekat
4840c65bf0 RAII: ResultIterator::GetUTF8Text(): was leaked inside TessBaseAPI::GetUTF8Text() 2017-05-11 02:02:37 +02:00
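The diffs of the RAII commits above are not shown in this listing. As a generic illustration of the idiom only, the sketch below wraps a new[]-allocated C string in std::unique_ptr<char[]> so that it is released on every exit path. MakeText is a hypothetical stand-in for any function that hands ownership of such a buffer to its caller; it is not a Tesseract API.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical stand-in: returns a new[]-allocated copy that the caller owns.
inline char* MakeText(const char* src) {
  char* copy = new char[std::strlen(src) + 1];
  std::strcpy(copy, src);
  return copy;
}

// The unique_ptr takes ownership immediately, so the buffer is freed with
// delete[] when 'text' goes out of scope, including on early return or throw.
inline std::size_t TextLength(const char* src) {
  const std::unique_ptr<char[]> text(MakeText(src));
  return std::strlen(text.get());
}
```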
Raf Schietekat
3983d2f76a Reviewed uses of reinterpret_cast 2017-05-11 01:58:40 +02:00
Ray Smith
6ac31dcbdd Fixed DetectOS so it doesn't crash with a big image 2017-05-03 15:50:31 -07:00
Stefan Weil
c1d649ebbc api: Replace Tesseract data types by POSIX data types
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-02 18:21:44 +02:00
Stefan Weil
aea0d9a8d5 api: Remove unneeded NULL checks
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-04-30 19:23:24 +02:00
Ray Smith
7a116ce8bb More formatting fixes from clang tidy 2017-04-28 13:38:32 -07:00
Ray Smith
1cc511188d Added extra Init that takes a memory buffer or a filereader function pointer to enable read of traineddata from memory or foreign file systems. Updated existing readers to use TFile API instead of FILE. This does not yet add big-endian capability to LSTM, but it is very easy from here. 2017-04-27 15:48:23 -07:00
Igor Pylypiv
cea24b7e44 Remove redundant condition from TessBaseAPI::AdaptToWordStr()
Expression (wordstr[w] != '\0') is always true if (wordstr[w] == ' ') is true.
2017-03-23 22:55:40 -07:00
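The reasoning in this commit message can be checked with a small sketch: a character equal to ' ' can never be '\0', so the extra comparison adds nothing. The two function names below are hypothetical and only contrast the two forms of the condition.

```cpp
// Condition with the redundant check, as described in the commit message.
constexpr bool IsSpaceWithRedundantCheck(char c) {
  return c != '\0' && c == ' ';
}

// Simplified condition: c == ' ' already implies c != '\0'.
constexpr bool IsSpace(char c) { return c == ' '; }

// Both forms agree, including on the terminating NUL character.
static_assert(IsSpaceWithRedundantCheck(' ') == IsSpace(' '), "space");
static_assert(IsSpaceWithRedundantCheck('\0') == IsSpace('\0'), "NUL");
static_assert(IsSpaceWithRedundantCheck('a') == IsSpace('a'), "letter");
```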
Stefan Weil
7b33dad059 api: Remove unused variables
This fixes a compiler warning:

api/baseapi.cpp:1621:17: warning:
 variable 'font_name' set but not used [-Wunused-but-set-variable]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-03-08 07:38:46 +01:00
Stefan Weil
cd925fd812 Fix indentation after conditional [-Wmisleading-indentation]
The indentation is wrong since commit
fd0683f9e0 and results in a gcc warning:

api/baseapi.cpp: In member function 'bool tesseract::TessBaseAPI::ProcessPagesMultipageTiff(const l_uint8*, size_t, const char*, const char*, int, tesseract::TessResultRenderer*, int)':
api/baseapi.cpp:986:5: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
     if (tessedit_page_number >= 0)
     ^~
api/baseapi.cpp:988:7: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'if'
       pix = (data) ? pixReadMemFromMultipageTiff(data, size, &offset)
       ^~~

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-03-07 19:05:40 +01:00
Jeff Breidenbach
fd0683f9e0 remove obsolete OpenCL code from TessBaseAPI::ProcessPagesMultipageTiff; fixes #635 2017-01-29 16:43:10 +01:00
Ray Smith
f566a45b30 clang-tidy changes from sync 2017-01-25 16:20:19 -08:00
Jeff Breidenbach
a979494897 fix #665 process file list 2017-01-19 15:19:35 +01:00
Zdenko Podobný
11f205707e Multi-page TIFF buffering is broken - fix #233 2016-12-24 09:17:02 +01:00
Ray Smith
9f5ba9105f Removed dependency on cube from the code 2016-12-14 10:55:15 -08:00
zdenop
3a47adcbe1 Merge pull request #544 from jbarlow83/master
Add new C API for detecting orientation and script, remove old one (4.00)
2016-12-09 13:34:18 +01:00
James R. Barlow
56b6f061cd Revise after code review 2016-12-08 15:08:48 -08:00
James R. Barlow
bc95798e01 Implement a new orientation and script detection API for C and C++
See issue #424.

The existing C API for TessBaseAPIDetectOS requires a C caller to successfully allocate struct OSResults which is actually a C++ class.  Generally it won't
be possible for a regular C compiler to do this properly.

It's also assumed that most API level users of Tesseract are only interested in Tesseract's best guess as to script and orientation, not the individual values for all possible scripts.

This introduces a new API with a better name that is more closely aligned with the output of 'tesseract -psm 0'.  Both tesseract -psm 0 and this API now share the same code in baseapi.cpp.
2016-12-07 13:21:05 -08:00
Jeff Breidenbach
ed4c4c6bf5 Produce warning for invalid resolution. Fix #453 2016-12-07 22:06:00 +01:00
Stefan Weil
85e37798cb Simplify delete operations
It is not necessary to check for null pointers.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-24 17:59:13 +01:00
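The simplification relies on a guarantee of the C++ language: applying delete or delete[] to a null pointer has no effect. A minimal sketch, using a hypothetical Holder type that is not taken from the Tesseract source:

```cpp
// Hypothetical owner of a heap-allocated array.
struct Holder {
  int* data = nullptr;

  Holder() = default;
  Holder(const Holder&) = delete;             // avoid double deletion
  Holder& operator=(const Holder&) = delete;

  // Before the cleanup this might have read: if (data != nullptr) delete[] data;
  // The check is unnecessary because delete[] on a null pointer is a no-op.
  void Reset() {
    delete[] data;
    data = nullptr;
  }

  ~Holder() { delete[] data; }
};
```

Calling Reset() on a freshly constructed Holder, whose pointer is still null, is therefore safe.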
Egor Pugin
644469595c Fix windows build. 2016-11-24 17:32:23 +03:00
Ray Smith
c1c1e426b3 Added new LSTM-based neural network line recognizer 2016-11-07 15:38:07 -08:00
Ray Smith
2c837dffc3 Result of clang tidy on recent merge 2016-11-07 10:46:33 -08:00
Stefan Weil
ea786e25a4 api/baseapi: Fix memory leaks at program termination
Calling TessBaseAPI::Clear() which calls TessBaseAPI::ClearResults()
which calls SavePixForCrash(0, NULL) is needed to release objects
allocated in global_crash_pixes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-10-25 19:11:10 +02:00
Zdenko Podobný
54fafc4e2e improve multipage tiff processing (jbreiden patch from 2016-03-29) 2016-10-06 11:13:42 +02:00
Stefan Weil
db2a8e9f85 api: Remove unused constant kBytesPerBlob
This fixes a compiler warning:

api/baseapi.cpp:1743:11: warning:
 unused variable 'kBytesPerBlob' [-Wunused-const-variable]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-09-06 21:49:26 +02:00
Stefan Weil
caffb3133b Remove unneeded 'struct' from TessBaseAPI::GetHOCRText (issue #414)
It conflicts with a previous 'class' declaration for ETEXT_DESC:

include/tesseract/baseapi.h:594:21:
 Struct 'ETEXT_DESC' was previously declared as a class

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-09-05 13:17:13 +02:00
Steffen Rehberg
c0fcce2f8f Fix text box width/height calculation (addition)
This occurrence should have been included in commit 29d971e
but was overlooked by mistake.
2016-06-27 21:58:29 +02:00
Steffen Rehberg
29d971eb0c Fix text box width/height calculation
In Tesseract's coordinate system, width is just right - left, cf. slide #2 of
github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/2ArchitectureAndDataStructures.pdf
2016-06-25 12:40:28 +02:00
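A minimal sketch of the width rule stated in this commit message. The Box struct and function names are hypothetical, and the height formula (top - bottom) is an assumption made by analogy with the width; only "width is just right - left" comes from the message.

```cpp
// Hypothetical bounding box with integer edge coordinates.
struct Box {
  int left;
  int bottom;
  int right;
  int top;
};

// Width as stated in the commit message: right - left, with no "+ 1".
constexpr int Width(const Box& box) { return box.right - box.left; }

// Assumed by analogy: height is top - bottom.
constexpr int Height(const Box& box) { return box.top - box.bottom; }
```

For a box with left = 10 and right = 40, Width returns 30.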