Commit Graph

2506 Commits

Author SHA1 Message Date
Shreeshrii
198664fb0b Add additional Unicodes to IsVedicAccent 2018-03-20 20:33:25 +05:30
Shreeshrii
aab83da67d Add kVedicMark to ConsumeVowelIfValid 2018-03-20 20:31:25 +05:30
Stefan Weil
8c258750de Simplify Makefile and add missing dependency for target training-install (#1403)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-18 22:10:41 +01:00
Stefan Weil
8209ce3663 cmake: Update version and add it to config_auto.h (#1402)
In a next step, the package version should be read from the VERSION file.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-18 21:10:07 +01:00
Stefan Weil
81c47288a2 configure: Use m4_esyscmd_s to suppress linefeed (fix needed for macOS) (#1401)
While "echo -n" works on Debian GNU Linux, it fails to produce a valid
configure file on macOS, so try a different shorter solution.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-18 20:15:14 +01:00
Stefan Weil
64af706f0c arch: Fix some compiler warnings (-Wignored-qualifiers) (#1400)
Fix these gcc warnings:

arch/dotproductavx.cpp:53:45: warning:
 type qualifiers ignored on cast result type [-Wignored-qualifiers]
arch/dotproductavx.cpp:54:45: warning:
 type qualifiers ignored on cast result type [-Wignored-qualifiers]
arch/dotproductsse.cpp:59:45: warning:
 type qualifiers ignored on cast result type [-Wignored-qualifiers]
arch/dotproductsse.cpp:60:45: warning:
 type qualifiers ignored on cast result type [-Wignored-qualifiers]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-18 08:40:53 +01:00
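gcc emits these warnings when a cast names a const-qualified scalar type. The result of such a cast is a prvalue, so the `const` has no effect. The sketch below shows the pattern and its fix using a hypothetical alignment helper. It is not the actual code in `arch/dotproductavx.cpp` or `arch/dotproductsse.cpp`.

```cpp
#include <cstdint>

// Hypothetical helper, not taken from arch/dotproduct*.cpp.
// Writing reinterpret_cast<const uintptr_t>(p) would trigger
// -Wignored-qualifiers: the cast yields a prvalue of scalar type, so the
// top-level const is discarded anyway. Dropping it changes no behavior.
static bool IsAligned32(const void* p) {
  return (reinterpret_cast<uintptr_t>(p) & 31u) == 0;
}
```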
Stefan Weil
6b2a0901c6 viewer: Fix some compiler warnings (-Wstringop-truncation) (#1399)
gcc warnings:

viewer/scrollview.cpp:72:10: warning:
 ‘char* strncpy(char*, const char*, size_t)’ output truncated before
 terminating nul copying as many bytes from a string as its length
 [-Wstringop-truncation]
viewer/scrollview.cpp:118:14: warning:
 ‘char* strncpy(char*, const char*, size_t)’ specified bound depends on
 the length of the source argument [-Wstringop-overflow=]
viewer/scrollview.cpp:746:10: warning:
 ‘char* strncpy(char*, const char*, size_t)’ output truncated before
 terminating nul copying as many bytes from a string as its length
 [-Wstringop-truncation]
viewer/scrollview.cpp:830:10: warning:
 ‘char* strncpy(char*, const char*, size_t)’ output truncated before
 terminating nul copying as many bytes from a string as its length
 [-Wstringop-truncation]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-18 08:33:14 +01:00
Stefan Weil
c222145c38 wordrec: Fix compiler warning (-Wstringop-truncation) (#1398)
gcc warning:
wordrec/language_model.cpp:959:16: warning:
 ‘char* strncpy(char*, const char*, size_t)’ output truncated before
 terminating nul copying as many bytes from a string as its length
 [-Wstringop-truncation]

memcpy could also be a little bit faster than strncpy.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-18 08:03:23 +01:00
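This commit and the viewer commit above address the same gcc diagnostics:

- `-Wstringop-truncation` fires when `strncpy` is bounded by the source length, so no terminating NUL is copied.
- `-Wstringop-overflow=` fires when the bound itself is derived from the source, for example `strlen(src)`.

The sketch below shows the `memcpy` approach named in the commit message. The helper name is hypothetical, and it assumes `dst` has room for `len + 1` bytes.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper. Copies exactly len bytes and then terminates the
// string explicitly, which makes clear that the source NUL is not copied.
// Precondition: dst has room for at least len + 1 bytes.
static void CopyAndTerminate(char* dst, const char* src, size_t len) {
  std::memcpy(dst, src, len);
  dst[len] = '\0';
}
```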
Stefan Weil
860dd10b8b autogen: Fix typo in comment (#1396)
It was introduced by commit d50769dc01.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-17 14:35:30 +01:00
Stefan Weil
d50769dc01 autogen: Report missing requirements (#1394)
* autogen: Report missing autoconf-archive

autoconf-archive is required, but users often missed that requirement.

The script now detects and reports missing autoconf-archive and removes
the incomplete generated configure script.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* autogen: Report missing pkg-config

pkg-config is required.

The script now detects and reports missing pkg-config and removes
the incomplete generated configure script.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-17 14:10:53 +01:00
Stefan Weil
023e1b340e Use POSIX data types and macros (#878)
* api: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* cutil: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* training: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract data types by POSIX data types

Now all Tesseract data types which are no longer needed can be removed
from ccutil/host.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* lstm: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Remove the macros which are now unused from ccutil/host.h.
Also remove the obsolete history comments.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix build error caused by ambiguous ClipToRange

Error message from AppVeyor CI:

    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2672: 'ClipToRange': no matching overloaded function found [C:\projects\tesseract\build\libtesseract.vcxproj]
    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2782: 'T ClipToRange(const T &,const T &,const T &)': template parameter 'T' is ambiguous [C:\projects\tesseract\build\libtesseract.vcxproj]
      c:\projects\tesseract\ccutil\helpers.h(122): note: see declaration of 'ClipToRange'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: could be 'char'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: or       'int'

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* unittest: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* arch: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 21:36:30 +01:00
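The MSVC error quoted in this commit is a template argument deduction failure. `ClipToRange` takes all three arguments as `const T&`, so mixing an `int` value with `char` bounds yields two candidate types for `T`. The minimal sketch below uses the signature shown in the error message. Its body is an assumption, not a copy of `ccutil/helpers.h`, and the caller is hypothetical. It resolves the ambiguity by naming `T` explicitly.

```cpp
// Same signature as reported by MSVC; the body is an assumed clamp.
template <typename T>
inline T ClipToRange(const T& x, const T& lower_bound, const T& upper_bound) {
  if (x < lower_bound) return lower_bound;
  if (x > upper_bound) return upper_bound;
  return x;
}

// Hypothetical caller. ClipToRange(value, lower, upper) would not compile,
// because T could be deduced as 'int' (from value) or 'char' (from the bounds).
// Specifying T = int converts the char bounds and removes the ambiguity.
inline int ClipToCharBounds(int value, char lower, char upper) {
  return ClipToRange<int>(value, lower, upper);
}
```

Casting the mismatched arguments to a common type at the call site would work equally well. The explicit template argument just keeps the call short.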
Stefan Weil
40c71bfcb8 Update unittest for new script data location and fix out-of-tree build (#1386)
tessdata_best and tessdata_fast recently changed the path for script data,
so the tests have to be updated, too.

In addition, the relative paths did not work with out-of-tree builds.
Use absolute paths and add them as C macros to the compiler flags.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 21:15:44 +01:00
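The sketch below shows how a test might consume an absolute data path passed as a C macro on the compiler command line. The macro name `TESSDATA_DIR`, its fallback value, and the helper are hypothetical, because the commit message does not name them.

```cpp
#include <string>

// Hypothetical macro. A build could pass it as, for example:
//   -DTESSDATA_DIR='"/absolute/path/to/tessdata"'
// so the test finds its data no matter which directory it is built in.
#ifndef TESSDATA_DIR
#define TESSDATA_DIR "/usr/local/share/tessdata"  // fallback for this sketch only
#endif

// Builds an absolute path to a data file below TESSDATA_DIR.
static std::string DataFilePath(const std::string& file_name) {
  return std::string(TESSDATA_DIR) + "/" + file_name;
}
```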
Stefan Weil
49dd464e5c Update googletest (#1383)
This updates the code to the latest version from git.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 21:15:12 +01:00
Stefan Weil
47a326b02d Use POSIX data types for external interfaces (#1358)
Replace the Tesseract specific data types in header files which are
part of Debian package libtesseract-dev by POSIX data types.

Also update the matching cpp files.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 19:01:40 +01:00
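For code using these headers, this change and #878 amount to using the fixed-width types and limit macros from `<cstdint>` instead of Tesseract's own names. In the small sketch below, the commented-out `MAX_INT8` definition only illustrates the kind of project macro that was replaced, and the helper is hypothetical.

```cpp
#include <cstdint>

// Before (illustrative): a project-specific limit such as
//   #define MAX_INT8 127
// After: the standard INT8_MIN / INT8_MAX from <cstdint>.

// Hypothetical helper that saturates a 32-bit value to the int8_t range.
static int8_t SaturateToInt8(int32_t value) {
  if (value > INT8_MAX) return INT8_MAX;
  if (value < INT8_MIN) return INT8_MIN;
  return static_cast<int8_t>(value);
}
```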
Stefan Weil
c6afad03b2 Fix compiler warning (-Wsign-compare) (#1385)
gcc reports this warning about 250 times:

ccutil/genericvector.h:378:48: warning:
 comparison between signed and unsigned integer expressions [-Wsign-compare]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 19:00:49 +01:00
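`-Wsign-compare` fires when a signed index is compared directly with an unsigned size. In that case a negative index is silently converted to a huge unsigned value. A common fix is to rule out negative values first and then cast, as in the hypothetical bounds check below. It uses `std::vector` so the example is self-contained and is not the code in `ccutil/genericvector.h`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical bounds check. Checking the sign first makes the subsequent
// cast to size_t safe, and the comparison is then unsigned vs unsigned.
static bool IsValidIndex(const std::vector<int>& v, int index) {
  return index >= 0 && static_cast<size_t>(index) < v.size();
}
```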
Stefan Weil
15638a5ce4 doc: Add missing language to list (#1368)
tessdata_fast includes bre.traineddata.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 18:58:53 +01:00
Stefan Weil
bdf6629722 Update version in README and manpages (#1381)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-12 21:39:29 +01:00
Stefan Weil
8fb68746fb configure: Get version string from git or from VERSION file (#1380)
Use git to create the version string if possible.
Otherwise get the version from the VERSION file.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-12 21:38:46 +01:00
Stefan Weil
2d319cb8d3 configure: Update date, version and add project URL (#1379)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-12 19:53:00 +01:00
Shreeshrii
df58108972 Manpages (#1378)
* Add missing man pages

* Update lstmeval.1.asc

* Update combine_lang_model.1.asc

* Update lstmtraining.1.asc

* Update merge_unicharsets.1.asc

* Update set_unicharset_properties.1.asc

* Update text2image.1.asc

* Update text2image.1.asc

* Update combine_lang_model.1.asc
2018-03-12 19:08:15 +01:00
Stefan Weil
79c6fa6d10 Update package version (Visual Studio) (#1373)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-12 09:17:10 +01:00
Amit D
4b2bea79a5 Update TESSERACT_VERSION_STR (#1372) 2018-03-11 18:25:35 +01:00
Stefan Weil
14ee911978 lstm: Use MS C intrinsic function for faster calculation of log2 (#1369)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-10 20:52:39 +01:00
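The commit message gives no details beyond its title. The sketch below is an assumption about what an intrinsic-based integer `floor(log2(n))` typically looks like, not the Tesseract implementation, and the helper name is hypothetical. MSVC's `_BitScanReverse` (from `<intrin.h>`) reports the index of the highest set bit, and GCC/Clang offer `__builtin_clz` for the same purpose.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hypothetical helper: floor(log2(n)) for n > 0, computed from the position
// of the highest set bit instead of calling the floating-point log2().
// Precondition: n != 0 (both intrinsics are undefined for zero).
static inline int FloorLog2(uint32_t n) {
#if defined(_MSC_VER)
  unsigned long index;
  _BitScanReverse(&index, n);  // MS C intrinsic: index of highest set bit
  return static_cast<int>(index);
#else
  return 31 - __builtin_clz(n);  // GCC/Clang builtin: count leading zeros
#endif
}
```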
Stefan Weil
960007e58e Fix compiler warning (possible loss of data) (#1370)
Fix 306 warnings from MS C:

tesseract\ccutil\unicharset.h(242): warning C4267:
 'argument': conversion from 'size_t' to 'int', possible loss of data

The change also avoids some type conversions.
2018-03-10 20:51:52 +01:00
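On 64-bit targets, MSVC's C4267 fires when a `size_t` (for example from `strlen` or `std::string::size`) is passed to an `int` parameter. The sketch below shows the kind of signature change that both silences the warning and removes the conversion. The function is hypothetical and not the one in `ccutil/unicharset.h`.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical API. Taking the length as size_t (rather than int) lets
// callers pass strlen() or std::string::size() directly, with no narrowing
// conversion for the compiler to warn about.
static size_t CountSpaces(const char* text, size_t length) {
  size_t count = 0;
  for (size_t i = 0; i < length; ++i) {
    if (text[i] == ' ') ++count;
  }
  return count;
}
```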
Stefan Weil
08ef815fe5 doc: Remove unsupported traineddata from list (#1367)
The languages dan_frak, deu_frak and slk_frak were contributions.
They are not part of tessdata_fast.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-10 08:41:58 +01:00
Amit D
53f791ba8b Remove obsolete code (#1365)
MSVC 8.0 was released in 2005 and we don't support it.
2018-03-08 21:12:23 +01:00
Egor Pugin
59dc3e627e Update appveyor.yml 2018-03-05 14:20:49 +03:00
Egor Pugin
1d6e9c1dc1 Update appveyor.yml 2018-03-05 01:03:26 +03:00
Shreeshrii
5845e1a27d Add unit test for OSD, update apiexample test (#1359)
* Update apiexample_test.cc

* Add OSD test and logging function

* Add images for OSD unittest
2018-03-04 14:52:27 +01:00
Stefan Weil
7972b13e3a Remove macro USE_STD_NAMESPACE (#1360)
The related code in training/util.h now uses the GOOGLE_TESSERACT macro
to enable Google specific code to disable heap checking.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-04 14:43:28 +01:00
Stefan Weil
0d9cdbe6dd README: Use CamelCase for GitHub (#1357)
Also fix some whitespace issues.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-04 14:40:13 +01:00
Stefan Weil
068d43d3d8 Remove old code for string class (no longer needed) (#1354)
* Remove old code for string class (no longer needed)

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Add std namespace to string class

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-03 14:36:28 +01:00
Stefan Weil
9035217acd Remove parameter m_data_sub_dir (#1356)
This further simplifies the finding of the tessdata directory.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-03 14:34:24 +01:00
Hintz
cf66bc84c8 Fix syntax error. (#1355) 2018-03-03 14:33:39 +01:00
Stefan Weil
b9b08c7e50 Replace log2(n) by local functions (#1353)
* Replace log2(n) by faster local function

This also adds support for environments without a log2 function (Android).

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Provide local log2 function on platforms without log2 function

The existing implementation in wordrec/language_model.cpp is modified
to use a local inline function in the tesseract namespace and copied
to lstm/weightmatrix.cpp, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-02 06:54:43 +01:00
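The sketch below shows a local fallback for platforms whose C library lacks `log2`, computed as `log(n) / log(2)`. It is placed in the `tesseract` namespace as the commit describes, but the name `Log2` and the body are assumptions. The commit message does not show the code in `wordrec/language_model.cpp` or `lstm/weightmatrix.cpp`.

```cpp
#include <cmath>

namespace tesseract {

// Hypothetical fallback: log2(n) = ln(n) / ln(2), using only std::log.
// The reciprocal of ln(2) is computed once and reused.
inline double Log2(double n) {
  static const double kInvLn2 = 1.0 / std::log(2.0);
  return std::log(n) * kInvLn2;
}

}  // namespace tesseract
```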
Jeroen Ooms
c6e8916065 fixes for C++11 (#1164) 2018-02-28 15:42:33 +01:00
fifothekid
ad6f3b412a Fixed unqualified class "string" (#1082) 2018-02-28 15:16:23 +01:00
Shreeshrii
40f43111e0 Add list of scripts to manpage for tesseract (#1347) 2018-02-24 09:37:25 +01:00
Shreeshrii
bb89dc3594 Add info regarding LSTM components and options (#1346) 2018-02-23 21:59:50 +01:00
zdenop
44588a3c7c add commas to language list 2018-02-23 11:27:55 +01:00
Zdenko Podobný
035325dfd0 Update language list based on tessdata_fast; fix #1343 2018-02-23 11:19:18 +01:00
Egor Pugin
6f80c35b3f Update appveyor.yml 2018-02-22 23:41:44 +03:00
Jim Regan
0d1365a5ee gle_uncial (#1342) 2018-02-22 17:20:31 +01:00
Egor Pugin
ce638c4b35 Update appveyor.yml 2018-02-21 21:11:24 +03:00
Stefan Weil
9f888f044a Fix typo in documentation (#1330)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:39:47 +01:00
Stefan Weil
8130b8d346 Fix some typos in comments (found by codespell) (#1331)
Also fix a grammar issue.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:39:07 +01:00
Stefan Weil
638b025884 Fix CID 1164569 (Dereference after null check) (#1332)
If equ_detect_ can be NULL, we must catch that case and show a warning
instead of crashing in method SetEquationDetect.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:38:38 +01:00
Stefan Weil
eb8a6a5cf2 Fix CID 1164570 (Dereference after null check) (#1333)
Show a warning if datapath_ is NULL instead of crashing.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:37:53 +01:00
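The Coverity fixes CID 1164569 and CID 1164570 follow the same pattern: check a pointer that may be NULL before dereferencing it, and report a warning instead of crashing. In the minimal sketch below, the `Detector` type and the setter are hypothetical stand-ins, not the real `equ_detect_` or `datapath_` members.

```cpp
#include <cstdio>

// Hypothetical stand-in for an optional component such as an equation detector.
struct Detector {
  int id;
};

// Hypothetical setter: guard the pointer before dereferencing it and print a
// warning instead of crashing when it is NULL.
static bool ApplyDetector(const Detector* detector, int* out_id) {
  if (detector == nullptr) {
    std::fprintf(stderr, "Warning: detector is NULL, request ignored\n");
    return false;
  }
  *out_id = detector->id;
  return true;
}
```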
Egor Pugin
6a58b2e682 Remove whitespace. 2018-02-21 08:08:55 +03:00
Amit D
766b7bd620 Don't drop words with low certainty (#1264)
Fix #681.
2018-02-20 17:19:10 +01:00