Commit Graph

395 Commits

Author SHA1 Message Date
Stefan Weil
9612ca4a23 Fix compiler errors when including baseapi.h and capi.h
Including baseapi.h before capi.h is now possible.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-04-15 10:26:48 +02:00
Zdenko Podobný
e9e1e93686 add tess_version.h to distribution 2018-04-02 21:48:29 +02:00
Zdenko Podobný
64a73155ba add licence info 2018-04-02 19:11:46 +02:00
Zdenko Podobný
af037c27e7 rename version.h.in because the filename is too general for distribution 2018-04-02 19:11:02 +02:00
Stefan Weil
6bbfc3b5fc Create version.h from available version information (#1432)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-27 14:32:30 +02:00
Stefan Weil
d13b862050 Remove deprecated method DumpPGM (#1420)
It was deprecated in commit a18816f83 more than 7 years ago.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-25 17:29:11 +02:00
Stefan Weil
ee201e1f4f Remove deprecated support for -psm argument (#1419)
It was replaced by --psm and deprecated in commit 92d981b93.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-25 17:28:33 +02:00
Stefan Weil
a02b0f9726 Remove vcsversion.h (#1412)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-23 18:42:20 +01:00
Stefan Weil
b94bbd6e83 Update version handling (#1408)
ccutil/version.h is now no longer needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-22 21:49:47 +01:00
Stefan Weil
660b366401 Fix issues reported by Coverity Scan (#1409)
* Fix CID 1164532 'Constant' variable guards dead code

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix CID 1164594 Argument cannot be negative

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix CID 1164597 Argument cannot be negative

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix CID 1366447 Argument cannot be negative

Fix also the data type for current_pos, as ftell returns a long value.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix CID 1270404 Arguments in wrong order

This does not change the code, but should help Coverity Scan to see
that the argument order is as intended.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-22 20:00:01 +01:00
Stefan Weil
023e1b340e Use POSIX data types and macros (#878)
* api: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* cutil: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* training: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract data types by POSIX data types

Now all Tesseract data types which are no longer needed can be removed
from ccutil/host.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* lstm: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Remove the macros which are now unused from ccutil/host.h.
Remove also the obsolete history comments.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix build error caused by ambiguous ClipToRange

Error message vom Appveyor CI:

    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2672: 'ClipToRange': no matching overloaded function found [C:\projects\tesseract\build\libtesseract.vcxproj]
    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2782: 'T ClipToRange(const T &,const T &,const T &)': template parameter 'T' is ambiguous [C:\projects\tesseract\build\libtesseract.vcxproj]
      c:\projects\tesseract\ccutil\helpers.h(122): note: see declaration of 'ClipToRange'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: could be 'char'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: or       'int'

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* unittest: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* arch: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 21:36:30 +01:00
Stefan Weil
7972b13e3a Remove macro USE_STD_NAMESPACE (#1360)
The related code in training/util.h now uses the GOOGLE_TESSERACT macro
to enable Google specific code to disable heap checking.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-04 14:43:28 +01:00
Stefan Weil
8130b8d346 Fix some typos in comments (found by codespell) (#1331)
Fix also a grammar issue.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:39:07 +01:00
Stefan Weil
638b025884 Fix CID 1164569 (Dereference after null check) (#1332)
If equ_detect_ can be NULL, we must catch that case and show a warning
instead of crashing in method SetEquationDetect.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:38:38 +01:00
Stefan Weil
eb8a6a5cf2 Fix CID 1164570 (Dereference after null check) (#1333)
Show a warning if datapath_ is NULL instead of crashing.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:37:53 +01:00
Stefan Weil
a50ff5277d Improve help text for OCR engine mode (#1326)
The new text was suggested by Amit Dovev, see
https://github.com/tesseract-ocr/tesseract/pull/1325#issuecomment-366613553.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-19 09:43:58 +01:00
Stefan Weil
349de8b739 Support different help texts for normal and advanced users and restore legacy mode (#1325)
* Restore support for the legacy engine

It is still needed to get text attributes which are unsupported by the
LSTM engine, and it also has better recognition rates for some texts.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* tesseractmain: Add missing 'static' attributes

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Support different help texts for normal and advanced users

The old option --help now shows a very basic help text.
The new option --help-extra shows the full help information.
It now also includes a hint that Tesseract supports lists of images.

Fix also the indentation in the PSM help and
use a more neutral text in the OEM help.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Add missing line feed in error message

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-19 07:30:38 +01:00
Zdenko Podobný
173ad2bd00 mark& block legacy OCR Engine untill it will be removed. 2018-02-18 19:31:09 +01:00
Josh Reid
cdc35338c5 Added check if input PSM value is outside of range (#1236)
Wrote a function to throw an error if PSM is outside 0-13 or OEM is outside 0-5.
fixes #1234
2017-12-14 11:37:44 +01:00
Stefan Weil
aa6eb6bd46 Remove Tesseract parameter "include_page_breaks" and use FF by default
Now Tesseract adds a page break (normally form feed) by default.

It is still possible to suppress page breaks by setting an empty
page_separator.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-09-19 07:34:32 +02:00
amitdo
a905548ed6 Autotools build: Remove the option 'USING_MULTIPLELIBS'
Libtool's convenience libraries should never be installed. Fixes #985.
2017-09-11 15:03:53 +03:00
Ray Smith
fc6a390c6c Added intsimdmatrix as a generic integer matrixdotvector function with AVX2 and SSE specializations 2017-09-08 15:06:19 +01:00
Ray Smith
a18620cfea Improved results on images with no resolution. Estimates resolution
from the size of the connected components, based on average text size.
2017-09-08 09:37:03 +01:00
Stefan Weil
b9365cdff1 api: Fix typo in comment
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-09-03 09:14:00 +02:00
zdenop
7afa05a03e Merge pull request #1072 from stweil/listlangs
List available languages recursively
2017-08-13 14:50:42 +02:00
chrismamo1
5fd3e22f74 move code around so that list-langs will work without an English traineddata file 2017-08-12 17:15:27 -05:00
Stefan Weil
cc0d87c5b8 List available languages recursively
Tesseract supports hierarchies of languages and uses them since
the new files best/*.traineddata were added.

Now `tesseract --list-langs` also shows any traineddata files in
subdirectories of the tessdata directory.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-08-10 18:55:38 +02:00
Stefan Weil
0720b3f38b Change default resolution from 70 to 300 dpi
The default resolution is used for images without an explicit resolution
or with an unreasonable resolution (smaller than 70 or larger than 2400).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-08-08 16:48:10 +02:00
Ray Smith
2ef1aeaeb4 Added AVX2 and AVX512 detector 2017-08-02 14:15:50 -07:00
Ray Smith
dc8745e6fd Move LSTM unicharset and recoder to traineddata with version string part1. Backwards compatible - maybe. 2017-07-14 11:14:23 -07:00
Ray Smith
7588540296 Removed changes from last commit that didn't belong 2017-07-14 11:08:26 -07:00
Ray Smith
3ec11bd37a Deleted some dead LSTM code, making everything use the recoder 2017-07-14 10:58:21 -07:00
Ray Smith
da03e4e910 Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion 2017-07-14 09:30:14 -07:00
Justin Hotchkiss Palermo
f057938069 fix filenames in comments 2017-07-02 17:35:47 -04:00
Justin Hotchkiss Palermo
1d862a54bd Add new line to a few error messages. 2017-07-01 08:40:57 -04:00
Stefan Weil
1cf8fe51a0 Remove mathfix.h
It was only needed for MS Visual Studio 2012 and older.
Those compilers are not supported for Tesseract.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-06-05 20:26:25 +02:00
zdenop
ffb1ec3535 Merge pull request #918 from rfschtkt/issue529
Issue529
2017-05-13 19:33:46 +02:00
Raf Schietekat
b4cf46697f Issue #529: inside main() use return rather than exit 2017-05-13 18:02:00 +02:00
Stefan Weil
84396707a8 Fix crash if output file could not be opened
This error case results in fout_ == nullptr.
Closing a nullptr file is not allowed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-13 17:27:07 +02:00
zdenop
29f3de9be1 Merge pull request #914 from stweil/clean
Clean code
2017-05-13 12:45:57 +02:00
Stefan Weil
5dc4af62fb baseapi: Simplify code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-13 12:14:29 +02:00
Stefan Weil
78142593d2 Fix order of destructor calls for DawgCache and TessBaseAPI
TessBaseAPI must release its cache use before DawgCache is destroyed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-13 11:35:30 +02:00
Stefan Weil
f37f858c99 main: Fix two memory leaks
When Tesseract terminates by calling the exit function,
the destructor of any local auto variable is not called.

Fix two cases by using static variables.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-12 21:15:52 +02:00
Stefan Weil
5e3665c6ae Remove most libtiff dependencies
libtiff is no longer needed for OpenCL, so remove that dependency.

It is still suggested for Windows to redirect warning messages
from the tesseract executable to the console.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-12 10:15:35 +02:00
Raf Schietekat
c335508e84 Fewer g++ -Wsign-compare warnings 2017-05-11 23:14:52 +02:00
zdenop
64994a2707 Merge pull request #900 from rfschtkt/cast
Reviewed uses of reinterpret_cast
2017-05-11 16:08:12 +02:00
Raf Schietekat
8aa0a2dd48 RAII: *::GetUNLVText() 2017-05-11 02:02:37 +02:00
Raf Schietekat
1dab23916f RAII: *::GetBoxText() 2017-05-11 02:02:37 +02:00
Raf Schietekat
b7b68a65dd RAII: *::GetTSVText() 2017-05-11 02:02:37 +02:00
Raf Schietekat
a1fff874b4 RAII: *::GetHOCRText() 2017-05-11 02:02:37 +02:00