Commit Graph

2408 Commits

Author SHA1 Message Date
Stefan Weil
660b366401 Fix issues reported by Coverity Scan (#1409)
* Fix CID 1164532 'Constant' variable guards dead code

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix CID 1164594 Argument cannot be negative

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix CID 1164597 Argument cannot be negative

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix CID 1366447 Argument cannot be negative

Fix also the data type for current_pos, as ftell returns a long value.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix CID 1270404 Arguments in wrong order

This does not change the code, but should help Coverity Scan to see
that the argument order is as intended.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-22 20:00:01 +01:00
Stefan Weil
1694be9223 tessdatamanager: Use PACKAGE_VERSION instead of TESSERACT_VERSION_STR (#1407)
This allows further simplifications for the version handling.

Move the implementation for the constructors from .h file to .cpp file
to reduce dependencies.

Remove unneeded include statements from the .h file to reduce more
dependencies.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-22 07:43:35 +01:00
Shreeshrii
198664fb0b
Add additional Unicodes to IsVedicAccent 2018-03-20 20:33:25 +05:30
Shreeshrii
aab83da67d
Add kVedicMark to ConsumeVowelIfValid 2018-03-20 20:31:25 +05:30
Stefan Weil
8c258750de Simplify Makefile and add missing dependency for target training-install (#1403)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-18 22:10:41 +01:00
Stefan Weil
8209ce3663 cmake: Update version and add it to config_auto.h (#1402)
In a next step, the package version should be read from the VERSION file.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-18 21:10:07 +01:00
Stefan Weil
81c47288a2 configure: Use m4_esyscmd_s to suppress linefeed (fix needed for macOS) (#1401)
While "echo -n" works on Debian GNU Linux, it fails to produce a valid
configure file on macOS, so try a different shorter solution.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-18 20:15:14 +01:00
Stefan Weil
64af706f0c arch: Fix some compiler warnings (-Wignored-qualifiers) (#1400)
Fix these gcc warnings:

arch/dotproductavx.cpp:53:45: warning:
 type qualifiers ignored on cast result type [-Wignored-qualifiers]
arch/dotproductavx.cpp:54:45: warning:
 type qualifiers ignored on cast result type [-Wignored-qualifiers]
arch/dotproductsse.cpp:59:45: warning:
 type qualifiers ignored on cast result type [-Wignored-qualifiers]
arch/dotproductsse.cpp:60:45: warning:
 type qualifiers ignored on cast result type [-Wignored-qualifiers]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-18 08:40:53 +01:00
Stefan Weil
6b2a0901c6 viewer: Fix some compiler warnings (-Wstringop-truncation) (#1399)
gcc warnings:

viewer/scrollview.cpp:72:10: warning:
 ‘char* strncpy(char*, const char*, size_t)’ output truncated before
 terminating nul copying as many bytes from a string as its length
 [-Wstringop-truncation]
viewer/scrollview.cpp:118:14: warning:
 ‘char* strncpy(char*, const char*, size_t)’ specified bound depends on
 the length of the source argument [-Wstringop-overflow=]
viewer/scrollview.cpp:746:10: warning:
 ‘char* strncpy(char*, const char*, size_t)’ output truncated before
 terminating nul copying as many bytes from a string as its length
 [-Wstringop-truncation]
viewer/scrollview.cpp:830:10: warning:
 ‘char* strncpy(char*, const char*, size_t)’ output truncated before
 terminating nul copying as many bytes from a string as its length
 [-Wstringop-truncation]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-18 08:33:14 +01:00
Stefan Weil
c222145c38 wordrec: Fix compiler warning (-Wstringop-truncation) (#1398)
gcc warning:
wordrec/language_model.cpp:959:16: warning:
 ‘char* strncpy(char*, const char*, size_t)’ output truncated before
 terminating nul copying as many bytes from a string as its length
 [-Wstringop-truncation]

memcpy could also be a little bit faster than strncpy.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-18 08:03:23 +01:00
Stefan Weil
860dd10b8b autogen: Fix typo in comment (#1396)
It was introduced by commit d50769dc01.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-17 14:35:30 +01:00
Stefan Weil
d50769dc01 autogen: Report missing requirements (#1394)
* autogen: Report missing autoconf-archive

autoconf-archive is required, but users often missed that requirement.

The script now detects and reports missing autoconf-archive and removes
the incomplete generated configure script.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* autogen: Report missing pkg-config

pkg-config is required.

The script now detects and reports missing pkg-config and removes
the incomplete generated configure script.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-17 14:10:53 +01:00
Stefan Weil
023e1b340e Use POSIX data types and macros (#878)
* api: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* cutil: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* training: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract data types by POSIX data types

Now all Tesseract data types which are no longer needed can be removed
from ccutil/host.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* lstm: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Remove the macros which are now unused from ccutil/host.h.
Remove also the obsolete history comments.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix build error caused by ambiguous ClipToRange

Error message vom Appveyor CI:

    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2672: 'ClipToRange': no matching overloaded function found [C:\projects\tesseract\build\libtesseract.vcxproj]
    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2782: 'T ClipToRange(const T &,const T &,const T &)': template parameter 'T' is ambiguous [C:\projects\tesseract\build\libtesseract.vcxproj]
      c:\projects\tesseract\ccutil\helpers.h(122): note: see declaration of 'ClipToRange'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: could be 'char'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: or       'int'

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* unittest: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* arch: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 21:36:30 +01:00
Stefan Weil
40c71bfcb8 Update unittest for new script data location and fix out-of-tree build (#1386)
tessdata_best and tessdata_fast recently changed the path for script data,
so the tests have to be updated, too.

In addition, the relative paths did not work with out-of-tree builds.
Use absolute paths and add them as C macros to the compiler flags.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 21:15:44 +01:00
Stefan Weil
49dd464e5c Update googletest (#1383)
This updates the code to the latest version from git.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 21:15:12 +01:00
Stefan Weil
47a326b02d Use POSIX data types for external interfaces (#1358)
Replace the Tesseract specific data types in header files which are
part of Debian package libtesseract-dev by POSIX data types.

Update also matching cpp files.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 19:01:40 +01:00
Stefan Weil
c6afad03b2 Fix compiler warning (-Wsign-compare) (#1385)
gcc reports this warning about 250 times:

ccutil/genericvector.h:378:48: warning:
 comparison between signed and unsigned integer expressions [-Wsign-compare]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 19:00:49 +01:00
Stefan Weil
15638a5ce4 doc: Add missing language to list (#1368)
tessdata_fast includes bre.traineddata.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 18:58:53 +01:00
Stefan Weil
bdf6629722 Update version in README and manpages (#1381)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-12 21:39:29 +01:00
Stefan Weil
8fb68746fb configure: Get version string from git or from VERSION file (#1380)
Use git to create the version string if possible.
Otherwise get the version from the VERSION file.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-12 21:38:46 +01:00
Stefan Weil
2d319cb8d3 configure: Update date, version and add project URL (#1379)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-12 19:53:00 +01:00
Shreeshrii
df58108972 Manpages (#1378)
* Add missing man pages

* Update lstmeval.1.asc

* Update combine_lang_model.1.asc

* Update lstmtraining.1.asc

* Update merge_unicharsets.1.asc

* Update set_unicharset_properties.1.asc

* Update text2image.1.asc

* Update text2image.1.asc

* Update combine_lang_model.1.asc
2018-03-12 19:08:15 +01:00
Stefan Weil
79c6fa6d10 Update package version (Visual Studio) (#1373)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-12 09:17:10 +01:00
Amit D
4b2bea79a5 Update TESSERACT_VERSION_STR (#1372) 2018-03-11 18:25:35 +01:00
Stefan Weil
14ee911978 lstm: Use MS C intrinsic function for faster calculation of log2 (#1369)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-10 20:52:39 +01:00
Stefan Weil
960007e58e Fix compiler warning (possible loss of data) (#1370)
Fix 306 warnings from MS C:

tesseract\ccutil\unicharset.h(242): warning C4267:
 'argument': conversion from 'size_t' to 'int', possible loss of data

The change also avoids some type conversions.
2018-03-10 20:51:52 +01:00
Stefan Weil
08ef815fe5 doc: Remove unsupported traineddata from list (#1367)
The languages dan_frak, deu_frak and slk_frak were contributions.
They are not part of tessdata_fast.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-10 08:41:58 +01:00
Amit D
53f791ba8b Remove obsolete code (#1365)
MSVC 8.0 was released in 2005 and we don't support it.
2018-03-08 21:12:23 +01:00
Egor Pugin
59dc3e627e
Update appveyor.yml 2018-03-05 14:20:49 +03:00
Egor Pugin
1d6e9c1dc1
Update appveyor.yml 2018-03-05 01:03:26 +03:00
Shreeshrii
5845e1a27d Add unit test for OSD, update apiexample test (#1359)
* Update apiexample_test.cc

* Add OSD test and logging function

* Add images for OSD unittest
2018-03-04 14:52:27 +01:00
Stefan Weil
7972b13e3a Remove macro USE_STD_NAMESPACE (#1360)
The related code in training/util.h now uses the GOOGLE_TESSERACT macro
to enable Google specific code to disable heap checking.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-04 14:43:28 +01:00
Stefan Weil
0d9cdbe6dd README: Use CamelCase for GitHub (#1357)
Fix also some whitespace issues.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-04 14:40:13 +01:00
Stefan Weil
068d43d3d8 Remove old code for string class (no longer needed) (#1354)
* Remove old code for string class (no longer needed)

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Add std namespace to string class

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-03 14:36:28 +01:00
Stefan Weil
9035217acd Remove parameter m_data_sub_dir (#1356)
This further simplifies the finding of the tessdata directory.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-03 14:34:24 +01:00
Hintz
cf66bc84c8 Fix syntax error. (#1355) 2018-03-03 14:33:39 +01:00
Stefan Weil
b9b08c7e50 Replace log2(n) by local functions (#1353)
* Replace log2(n) by faster local function

This also adds support for environments without a log2 function (Android).

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Provide local log2 function on platforms without log2 function

The existing implementation in wordrec/language_model.cpp is modified
to use a local inline function in the tesseract namespace and copied
to lstm/weightmatrix.cpp, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-02 06:54:43 +01:00
Jeroen Ooms
c6e8916065 fixes for C++11 (#1164) 2018-02-28 15:42:33 +01:00
fifothekid
ad6f3b412a Fixed unqualified class "string" (#1082) 2018-02-28 15:16:23 +01:00
Shreeshrii
40f43111e0 Add list of scripts to manpage for tesseract (#1347) 2018-02-24 09:37:25 +01:00
Shreeshrii
bb89dc3594 Add info regarding LSTM components and options (#1346) 2018-02-23 21:59:50 +01:00
zdenop
44588a3c7c
add commas to language list 2018-02-23 11:27:55 +01:00
Zdenko Podobný
035325dfd0 Update language list based on tessdata_fast; fix #1343 2018-02-23 11:19:18 +01:00
Egor Pugin
6f80c35b3f
Update appveyor.yml 2018-02-22 23:41:44 +03:00
Jim Regan
0d1365a5ee gle_uncial (#1342) 2018-02-22 17:20:31 +01:00
Egor Pugin
ce638c4b35
Update appveyor.yml 2018-02-21 21:11:24 +03:00
Stefan Weil
9f888f044a Fix typo in documentation (#1330)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:39:47 +01:00
Stefan Weil
8130b8d346 Fix some typos in comments (found by codespell) (#1331)
Fix also a grammar issue.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:39:07 +01:00
Stefan Weil
638b025884 Fix CID 1164569 (Dereference after null check) (#1332)
If equ_detect_ can be NULL, we must catch that case and show a warning
instead of crashing in method SetEquationDetect.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:38:38 +01:00
Stefan Weil
eb8a6a5cf2 Fix CID 1164570 (Dereference after null check) (#1333)
Show a warning if datapath_ is NULL instead of crashing.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:37:53 +01:00