Commit Graph

2770 Commits

Author SHA1 Message Date
Stefan Weil
3dcab4e187 classify: Remove deprecated method ExtractIntFeat (#1422)
It was deprecated with commit 99edf4ccb four years ago.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-25 17:34:41 +02:00
Stefan Weil
bf0bddb9ca dict: Remove deprecated parameters (#1421)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-25 17:32:35 +02:00
Stefan Weil
d13b862050 Remove deprecated method DumpPGM (#1420)
It was deprecated in commit a18816f83 more than 7 years ago.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-25 17:29:11 +02:00
Stefan Weil
ee201e1f4f Remove deprecated support for -psm argument (#1419)
It was replaced by --psm and deprecated in commit 92d981b93.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-25 17:28:33 +02:00
Stefan Weil
afcd4cbec1 Remove unused local variable max_num_strokes (#1417)
This fixes a compiler warning. The variable is unused since commit
0e95e2ca87.

Remove also a related code comment.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-25 17:26:28 +02:00
Stefan Weil
d087f20212 configure: Remove GIT_REV which is no longer used (#1416)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-25 17:20:13 +02:00
Stefan Weil
8f7be2e72c lstm: Replace NULL by nullptr (#1415)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-25 17:19:27 +02:00
Egor Pugin
3fa25d9bbc Install training tools with CMake. 2018-03-23 23:28:51 +03:00
Stefan Weil
e45d9587b4 arch: Remove stray ) in pragma (#1413)
This fixes a warning from clang:

arch/dotproductavx.cpp:94:51: warning:
 unexpected token in pragma diagnostic [-Wunknown-pragmas]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-23 18:42:46 +01:00
Stefan Weil
a02b0f9726 Remove vcsversion.h (#1412)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-23 18:42:20 +01:00
Stefan Weil
b94bbd6e83 Update version handling (#1408)
ccutil/version.h is now no longer needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-22 21:49:47 +01:00
Stefan Weil
8c84d29c00 Fix CID 1270406, CID 1270407 and CID 1270411 Arguments in wrong order (#1410)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-22 21:33:54 +01:00
Stefan Weil
660b366401 Fix issues reported by Coverity Scan (#1409)
* Fix CID 1164532 'Constant' variable guards dead code

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix CID 1164594 Argument cannot be negative

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix CID 1164597 Argument cannot be negative

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix CID 1366447 Argument cannot be negative

Fix also the data type for current_pos, as ftell returns a long value.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix CID 1270404 Arguments in wrong order

This does not change the code, but should help Coverity Scan to see
that the argument order is as intended.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-22 20:00:01 +01:00
Stefan Weil
1694be9223 tessdatamanager: Use PACKAGE_VERSION instead of TESSERACT_VERSION_STR (#1407)
This allows further simplifications for the version handling.

Move the implementation for the constructors from .h file to .cpp file
to reduce dependencies.

Remove unneeded include statements from the .h file to reduce more
dependencies.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-22 07:43:35 +01:00
Shreeshrii
198664fb0b
Add additional Unicodes to IsVedicAccent 2018-03-20 20:33:25 +05:30
Shreeshrii
aab83da67d
Add kVedicMark to ConsumeVowelIfValid 2018-03-20 20:31:25 +05:30
Stefan Weil
8c258750de Simplify Makefile and add missing dependency for target training-install (#1403)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-18 22:10:41 +01:00
Stefan Weil
8209ce3663 cmake: Update version and add it to config_auto.h (#1402)
In a next step, the package version should be read from the VERSION file.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-18 21:10:07 +01:00
Stefan Weil
81c47288a2 configure: Use m4_esyscmd_s to suppress linefeed (fix needed for macOS) (#1401)
While "echo -n" works on Debian GNU Linux, it fails to produce a valid
configure file on macOS, so try a different shorter solution.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-18 20:15:14 +01:00
Stefan Weil
64af706f0c arch: Fix some compiler warnings (-Wignored-qualifiers) (#1400)
Fix these gcc warnings:

arch/dotproductavx.cpp:53:45: warning:
 type qualifiers ignored on cast result type [-Wignored-qualifiers]
arch/dotproductavx.cpp:54:45: warning:
 type qualifiers ignored on cast result type [-Wignored-qualifiers]
arch/dotproductsse.cpp:59:45: warning:
 type qualifiers ignored on cast result type [-Wignored-qualifiers]
arch/dotproductsse.cpp:60:45: warning:
 type qualifiers ignored on cast result type [-Wignored-qualifiers]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-18 08:40:53 +01:00
Stefan Weil
6b2a0901c6 viewer: Fix some compiler warnings (-Wstringop-truncation) (#1399)
gcc warnings:

viewer/scrollview.cpp:72:10: warning:
 ‘char* strncpy(char*, const char*, size_t)’ output truncated before
 terminating nul copying as many bytes from a string as its length
 [-Wstringop-truncation]
viewer/scrollview.cpp:118:14: warning:
 ‘char* strncpy(char*, const char*, size_t)’ specified bound depends on
 the length of the source argument [-Wstringop-overflow=]
viewer/scrollview.cpp:746:10: warning:
 ‘char* strncpy(char*, const char*, size_t)’ output truncated before
 terminating nul copying as many bytes from a string as its length
 [-Wstringop-truncation]
viewer/scrollview.cpp:830:10: warning:
 ‘char* strncpy(char*, const char*, size_t)’ output truncated before
 terminating nul copying as many bytes from a string as its length
 [-Wstringop-truncation]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-18 08:33:14 +01:00
Stefan Weil
c222145c38 wordrec: Fix compiler warning (-Wstringop-truncation) (#1398)
gcc warning:
wordrec/language_model.cpp:959:16: warning:
 ‘char* strncpy(char*, const char*, size_t)’ output truncated before
 terminating nul copying as many bytes from a string as its length
 [-Wstringop-truncation]

memcpy could also be a little bit faster than strncpy.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-18 08:03:23 +01:00
Stefan Weil
860dd10b8b autogen: Fix typo in comment (#1396)
It was introduced by commit d50769dc01.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-17 14:35:30 +01:00
Stefan Weil
d50769dc01 autogen: Report missing requirements (#1394)
* autogen: Report missing autoconf-archive

autoconf-archive is required, but users often missed that requirement.

The script now detects and reports missing autoconf-archive and removes
the incomplete generated configure script.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* autogen: Report missing pkg-config

pkg-config is required.

The script now detects and reports missing pkg-config and removes
the incomplete generated configure script.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-17 14:10:53 +01:00
Stefan Weil
023e1b340e Use POSIX data types and macros (#878)
* api: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* cutil: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* training: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract data types by POSIX data types

Now all Tesseract data types which are no longer needed can be removed
from ccutil/host.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* lstm: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Remove the macros which are now unused from ccutil/host.h.
Remove also the obsolete history comments.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix build error caused by ambiguous ClipToRange

Error message vom Appveyor CI:

    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2672: 'ClipToRange': no matching overloaded function found [C:\projects\tesseract\build\libtesseract.vcxproj]
    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2782: 'T ClipToRange(const T &,const T &,const T &)': template parameter 'T' is ambiguous [C:\projects\tesseract\build\libtesseract.vcxproj]
      c:\projects\tesseract\ccutil\helpers.h(122): note: see declaration of 'ClipToRange'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: could be 'char'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: or       'int'

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* unittest: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* arch: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 21:36:30 +01:00
Stefan Weil
40c71bfcb8 Update unittest for new script data location and fix out-of-tree build (#1386)
tessdata_best and tessdata_fast recently changed the path for script data,
so the tests have to be updated, too.

In addition, the relative paths did not work with out-of-tree builds.
Use absolute paths and add them as C macros to the compiler flags.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 21:15:44 +01:00
Stefan Weil
49dd464e5c Update googletest (#1383)
This updates the code to the latest version from git.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 21:15:12 +01:00
Stefan Weil
47a326b02d Use POSIX data types for external interfaces (#1358)
Replace the Tesseract specific data types in header files which are
part of Debian package libtesseract-dev by POSIX data types.

Update also matching cpp files.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 19:01:40 +01:00
Stefan Weil
c6afad03b2 Fix compiler warning (-Wsign-compare) (#1385)
gcc reports this warning about 250 times:

ccutil/genericvector.h:378:48: warning:
 comparison between signed and unsigned integer expressions [-Wsign-compare]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 19:00:49 +01:00
Stefan Weil
15638a5ce4 doc: Add missing language to list (#1368)
tessdata_fast includes bre.traineddata.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 18:58:53 +01:00
Stefan Weil
bdf6629722 Update version in README and manpages (#1381)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-12 21:39:29 +01:00
Stefan Weil
8fb68746fb configure: Get version string from git or from VERSION file (#1380)
Use git to create the version string if possible.
Otherwise get the version from the VERSION file.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-12 21:38:46 +01:00
Stefan Weil
2d319cb8d3 configure: Update date, version and add project URL (#1379)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-12 19:53:00 +01:00
Shreeshrii
df58108972 Manpages (#1378)
* Add missing man pages

* Update lstmeval.1.asc

* Update combine_lang_model.1.asc

* Update lstmtraining.1.asc

* Update merge_unicharsets.1.asc

* Update set_unicharset_properties.1.asc

* Update text2image.1.asc

* Update text2image.1.asc

* Update combine_lang_model.1.asc
2018-03-12 19:08:15 +01:00
Stefan Weil
79c6fa6d10 Update package version (Visual Studio) (#1373)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-12 09:17:10 +01:00
Amit D
4b2bea79a5 Update TESSERACT_VERSION_STR (#1372) 2018-03-11 18:25:35 +01:00
Stefan Weil
14ee911978 lstm: Use MS C intrinsic function for faster calculation of log2 (#1369)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-10 20:52:39 +01:00
Stefan Weil
960007e58e Fix compiler warning (possible loss of data) (#1370)
Fix 306 warnings from MS C:

tesseract\ccutil\unicharset.h(242): warning C4267:
 'argument': conversion from 'size_t' to 'int', possible loss of data

The change also avoids some type conversions.
2018-03-10 20:51:52 +01:00
Stefan Weil
08ef815fe5 doc: Remove unsupported traineddata from list (#1367)
The languages dan_frak, deu_frak and slk_frak were contributions.
They are not part of tessdata_fast.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-10 08:41:58 +01:00
Amit D
53f791ba8b Remove obsolete code (#1365)
MSVC 8.0 was released in 2005 and we don't support it.
2018-03-08 21:12:23 +01:00
Egor Pugin
59dc3e627e
Update appveyor.yml 2018-03-05 14:20:49 +03:00
Egor Pugin
1d6e9c1dc1
Update appveyor.yml 2018-03-05 01:03:26 +03:00
Shreeshrii
5845e1a27d Add unit test for OSD, update apiexample test (#1359)
* Update apiexample_test.cc

* Add OSD test and logging function

* Add images for OSD unittest
2018-03-04 14:52:27 +01:00
Stefan Weil
7972b13e3a Remove macro USE_STD_NAMESPACE (#1360)
The related code in training/util.h now uses the GOOGLE_TESSERACT macro
to enable Google specific code to disable heap checking.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-04 14:43:28 +01:00
Stefan Weil
0d9cdbe6dd README: Use CamelCase for GitHub (#1357)
Fix also some whitespace issues.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-04 14:40:13 +01:00
Stefan Weil
068d43d3d8 Remove old code for string class (no longer needed) (#1354)
* Remove old code for string class (no longer needed)

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Add std namespace to string class

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-03 14:36:28 +01:00
Stefan Weil
9035217acd Remove parameter m_data_sub_dir (#1356)
This further simplifies the finding of the tessdata directory.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-03 14:34:24 +01:00
Hintz
cf66bc84c8 Fix syntax error. (#1355) 2018-03-03 14:33:39 +01:00
Stefan Weil
b9b08c7e50 Replace log2(n) by local functions (#1353)
* Replace log2(n) by faster local function

This also adds support for environments without a log2 function (Android).

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Provide local log2 function on platforms without log2 function

The existing implementation in wordrec/language_model.cpp is modified
to use a local inline function in the tesseract namespace and copied
to lstm/weightmatrix.cpp, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-02 06:54:43 +01:00
Jeroen Ooms
c6e8916065 fixes for C++11 (#1164) 2018-02-28 15:42:33 +01:00