Commit Graph

3250 Commits

Author SHA1 Message Date
Stefan Weil
c222145c38 wordrec: Fix compiler warning (-Wstringop-truncation) (#1398)
gcc warning:
wordrec/language_model.cpp:959:16: warning:
 ‘char* strncpy(char*, const char*, size_t)’ output truncated before
 terminating nul copying as many bytes from a string as its length
 [-Wstringop-truncation]

memcpy could also be a little bit faster than strncpy.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-18 08:03:23 +01:00
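Note on c222145c38: a minimal sketch of the kind of change the message describes, replacing a bounded strncpy by memcpy plus explicit termination. The helper name `CopyPrefix` and its parameters are hypothetical; this is not the code from wordrec/language_model.cpp.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, not taken from the Tesseract sources.
// Copies the first `len` bytes of `src` into `dst` and terminates the result.
// The caller must provide room for at least len + 1 bytes in `dst`.
inline void CopyPrefix(char* dst, const char* src, std::size_t len) {
  // strncpy(dst, src, len) would copy the same bytes, but it writes no
  // terminating NUL when `src` is at least `len` bytes long, which is what
  // -Wstringop-truncation reports. memcpy states the intent directly.
  std::memcpy(dst, src, len);
  dst[len] = '\0';  // terminate explicitly
}
```

Since the length is already known, memcpy also skips the per-byte NUL check that strncpy performs.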
Stefan Weil
860dd10b8b autogen: Fix typo in comment (#1396)
It was introduced by commit d50769dc01.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-17 14:35:30 +01:00
Stefan Weil
d50769dc01 autogen: Report missing requirements (#1394)
* autogen: Report missing autoconf-archive

autoconf-archive is required, but users often missed that requirement.

The script now detects and reports missing autoconf-archive and removes
the incomplete generated configure script.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* autogen: Report missing pkg-config

pkg-config is required.

The script now detects and reports missing pkg-config and removes
the incomplete generated configure script.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-17 14:10:53 +01:00
Stefan Weil
023e1b340e Use POSIX data types and macros (#878)
* api: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* cutil: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* training: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract data types by POSIX data types

Now all Tesseract data types which are no longer needed can be removed
from ccutil/host.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* lstm: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Remove the macros which are now unused from ccutil/host.h.
Remove also the obsolete history comments.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix build error caused by ambiguous ClipToRange

Error message from AppVeyor CI:

    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2672: 'ClipToRange': no matching overloaded function found [C:\projects\tesseract\build\libtesseract.vcxproj]
    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2782: 'T ClipToRange(const T &,const T &,const T &)': template parameter 'T' is ambiguous [C:\projects\tesseract\build\libtesseract.vcxproj]
      c:\projects\tesseract\ccutil\helpers.h(122): note: see declaration of 'ClipToRange'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: could be 'char'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: or       'int'

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* unittest: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* arch: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 21:36:30 +01:00
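Note on 023e1b340e: a sketch of the ClipToRange ambiguity quoted in the error message, and one way to resolve it. The template signature comes from the error message; its body, the caller `ClampToInt8`, and the fix with an explicit template argument are assumptions for illustration and may differ from what the commit actually does.

```cpp
#include <stdint.h>

// Template with the signature quoted in the MSVC error message.
// The body is an assumption made for this sketch.
template <typename T>
T ClipToRange(const T& x, const T& lower_bound, const T& upper_bound) {
  if (x < lower_bound) return lower_bound;
  if (x > upper_bound) return upper_bound;
  return x;
}

// Hypothetical caller: clamp an int to the int8_t range using the POSIX
// limit macros INT8_MIN and INT8_MAX.
inline int8_t ClampToInt8(int value) {
  // ClipToRange(value, INT8_MIN, INT8_MAX) deduces T from all three
  // arguments. If a compiler gives the limit macros a type other than int,
  // deduction finds two candidates for T ('char' and 'int' in the message)
  // and fails. Naming T explicitly avoids the ambiguity.
  return static_cast<int8_t>(ClipToRange<int>(value, INT8_MIN, INT8_MAX));
}
```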
Stefan Weil
40c71bfcb8 Update unittest for new script data location and fix out-of-tree build (#1386)
tessdata_best and tessdata_fast recently changed the path for script data,
so the tests have to be updated, too.

In addition, the relative paths did not work with out-of-tree builds.
Use absolute paths and add them as C macros to the compiler flags.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 21:15:44 +01:00
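Note on 40c71bfcb8: a sketch of how a test can pick up an absolute directory that the build system passes as a C macro on the compiler command line. The macro name `SKETCH_TESTING_DIR` and the helper `TestDataPath` are hypothetical, not the names used in the unittest directory.

```cpp
#include <string>

// Hypothetical macro. A build system would define it with an absolute path,
// for example: -DSKETCH_TESTING_DIR='"/abs/path/to/testing"'
// The fallback only keeps this sketch compilable on its own.
#ifndef SKETCH_TESTING_DIR
#define SKETCH_TESTING_DIR "."
#endif

// Builds the full path of a test input file. Because the directory is
// absolute, the result does not depend on the directory the test runs in,
// which is what an out-of-tree build needs.
inline std::string TestDataPath(const std::string& file_name) {
  return std::string(SKETCH_TESTING_DIR) + "/" + file_name;
}
```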
Stefan Weil
49dd464e5c Update googletest (#1383)
This updates the code to the latest version from git.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 21:15:12 +01:00
Stefan Weil
47a326b02d Use POSIX data types for external interfaces (#1358)
Replace the Tesseract specific data types in header files which are
part of Debian package libtesseract-dev by POSIX data types.

Update also matching cpp files.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 19:01:40 +01:00
Stefan Weil
c6afad03b2 Fix compiler warning (-Wsign-compare) (#1385)
gcc reports this warning about 250 times:

ccutil/genericvector.h:378:48: warning:
 comparison between signed and unsigned integer expressions [-Wsign-compare]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 19:00:49 +01:00
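Note on c6afad03b2: a minimal illustration of what -Wsign-compare reports and one common way to silence it safely. The function `IndexInRange` is hypothetical; the actual code at ccutil/genericvector.h:378 is not shown in this log.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical function, not taken from ccutil/genericvector.h.
inline bool IndexInRange(int index, const std::vector<int>& v) {
  // Writing `index < v.size()` compares int with size_t and triggers
  // -Wsign-compare; a negative index would be converted to a very large
  // unsigned value and compare as "not less". Checking the sign first and
  // casting explicitly keeps the comparison within one type.
  return index >= 0 && static_cast<std::size_t>(index) < v.size();
}
```

A warning in a widely included header is repeated for every translation unit that includes it, which would explain a count such as "about 250 times".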
Stefan Weil
15638a5ce4 doc: Add missing language to list (#1368)
tessdata_fast includes bre.traineddata.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 18:58:53 +01:00
Stefan Weil
bdf6629722 Update version in README and manpages (#1381)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-12 21:39:29 +01:00
Stefan Weil
8fb68746fb configure: Get version string from git or from VERSION file (#1380)
Use git to create the version string if possible.
Otherwise get the version from the VERSION file.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-12 21:38:46 +01:00
Stefan Weil
2d319cb8d3 configure: Update date, version and add project URL (#1379)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-12 19:53:00 +01:00
Shreeshrii
df58108972 Manpages (#1378)
* Add missing man pages

* Update lstmeval.1.asc

* Update combine_lang_model.1.asc

* Update lstmtraining.1.asc

* Update merge_unicharsets.1.asc

* Update set_unicharset_properties.1.asc

* Update text2image.1.asc

* Update text2image.1.asc

* Update combine_lang_model.1.asc
2018-03-12 19:08:15 +01:00
Stefan Weil
79c6fa6d10 Update package version (Visual Studio) (#1373)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-12 09:17:10 +01:00
Amit D
4b2bea79a5 Update TESSERACT_VERSION_STR (#1372) 2018-03-11 18:25:35 +01:00
Stefan Weil
14ee911978 lstm: Use MS C intrinsic function for faster calculation of log2 (#1369)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-10 20:52:39 +01:00
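Note on 14ee911978: a sketch of an integer log2 that uses a Microsoft C intrinsic where available. It assumes the value needed is floor(log2(n)) for an unsigned integer; the function name `IntLog2` and the choice of `_BitScanReverse` are assumptions, since the commit message does not name the intrinsic.

```cpp
#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hypothetical helper: returns floor(log2(n)). The result is only
// meaningful for n > 0.
inline int IntLog2(uint32_t n) {
#if defined(_MSC_VER)
  // _BitScanReverse stores the position of the highest set bit.
  unsigned long index = 0;
  _BitScanReverse(&index, n);
  return static_cast<int>(index);
#else
  // Portable fallback: count how often n can be halved.
  int result = 0;
  while (n >>= 1) ++result;
  return result;
#endif
}
```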
Stefan Weil
960007e58e Fix compiler warning (possible loss of data) (#1370)
Fix 306 warnings from MS C:

tesseract\ccutil\unicharset.h(242): warning C4267:
 'argument': conversion from 'size_t' to 'int', possible loss of data

The change also avoids some type conversions.
2018-03-10 20:51:52 +01:00
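Note on 960007e58e: a sketch of the two usual ways to deal with warning C4267, either making the narrowing explicit or keeping the value as size_t so that no conversion is needed. The class `NameTable` is hypothetical and not related to ccutil/unicharset.h.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical class, not taken from the Tesseract sources.
class NameTable {
 public:
  void Add(const std::string& name) { names_.push_back(name); }

  // Option 1: keep an int interface and make the narrowing explicit.
  // `return names_.size();` would trigger C4267 here.
  int size() const { return static_cast<int>(names_.size()); }

  // Option 2: keep size_t end to end, which avoids the conversion.
  std::size_t count() const { return names_.size(); }

 private:
  std::vector<std::string> names_;
};
```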
Stefan Weil
08ef815fe5 doc: Remove unsupported traineddata from list (#1367)
The languages dan_frak, deu_frak and slk_frak were contributions.
They are not part of tessdata_fast.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-10 08:41:58 +01:00
Amit D
53f791ba8b Remove obsolete code (#1365)
MSVC 8.0 was released in 2005 and we don't support it.
2018-03-08 21:12:23 +01:00
Egor Pugin
59dc3e627e Update appveyor.yml 2018-03-05 14:20:49 +03:00
Egor Pugin
1d6e9c1dc1 Update appveyor.yml 2018-03-05 01:03:26 +03:00
Shreeshrii
5845e1a27d Add unit test for OSD, update apiexample test (#1359)
* Update apiexample_test.cc

* Add OSD test and logging function

* Add images for OSD unittest
2018-03-04 14:52:27 +01:00
Stefan Weil
7972b13e3a Remove macro USE_STD_NAMESPACE (#1360)
The related code in training/util.h now uses the GOOGLE_TESSERACT macro
to enable Google specific code to disable heap checking.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-04 14:43:28 +01:00
Stefan Weil
0d9cdbe6dd README: Use CamelCase for GitHub (#1357)
Fix also some whitespace issues.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-04 14:40:13 +01:00
Stefan Weil
068d43d3d8 Remove old code for string class (no longer needed) (#1354)
* Remove old code for string class (no longer needed)

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Add std namespace to string class

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-03 14:36:28 +01:00
Stefan Weil
9035217acd Remove parameter m_data_sub_dir (#1356)
This further simplifies the finding of the tessdata directory.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-03 14:34:24 +01:00
Hintz
cf66bc84c8 Fix syntax error. (#1355) 2018-03-03 14:33:39 +01:00
Stefan Weil
b9b08c7e50 Replace log2(n) by local functions (#1353)
* Replace log2(n) by faster local function

This also adds support for environments without a log2 function (Android).

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Provide local log2 function on platforms without log2 function

The existing implementation in wordrec/language_model.cpp is modified
to use a local inline function in the tesseract namespace and copied
to lstm/weightmatrix.cpp, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-02 06:54:43 +01:00
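Note on b9b08c7e50: a sketch of a local inline log2 for platforms whose C library lacks one. The namespace `sketch`, the name `Log2`, and the formula via the natural logarithm are assumptions; the commit message only says that a local inline function in the tesseract namespace is used.

```cpp
#include <cmath>

namespace sketch {

// Hypothetical replacement for log2 based on the identity
// log2(n) = ln(n) / ln(2).
inline double Log2(double n) {
  return std::log(n) / std::log(2.0);
}

}  // namespace sketch
```

Keeping the function inside a namespace avoids a clash with a global log2 on platforms that do provide one.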
Jeroen Ooms
c6e8916065 fixes for C++11 (#1164) 2018-02-28 15:42:33 +01:00
fifothekid
ad6f3b412a Fixed unqualified class "string" (#1082) 2018-02-28 15:16:23 +01:00
Shreeshrii
40f43111e0 Add list of scripts to manpage for tesseract (#1347) 2018-02-24 09:37:25 +01:00
Shreeshrii
bb89dc3594 Add info regarding LSTM components and options (#1346) 2018-02-23 21:59:50 +01:00
zdenop
44588a3c7c add commas to language list 2018-02-23 11:27:55 +01:00
Zdenko Podobný
035325dfd0 Update language list based on tessdata_fast; fix #1343 2018-02-23 11:19:18 +01:00
Egor Pugin
6f80c35b3f Update appveyor.yml 2018-02-22 23:41:44 +03:00
Jim Regan
0d1365a5ee gle_uncial (#1342) 2018-02-22 17:20:31 +01:00
Egor Pugin
ce638c4b35 Update appveyor.yml 2018-02-21 21:11:24 +03:00
Stefan Weil
9f888f044a Fix typo in documentation (#1330)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:39:47 +01:00
Stefan Weil
8130b8d346 Fix some typos in comments (found by codespell) (#1331)
Fix also a grammar issue.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:39:07 +01:00
Stefan Weil
638b025884 Fix CID 1164569 (Dereference after null check) (#1332)
If equ_detect_ can be NULL, we must catch that case and show a warning
instead of crashing in method SetEquationDetect.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:38:38 +01:00
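Note on 638b025884: a sketch of the pattern the message describes, checking a pointer that may be null and printing a warning instead of dereferencing it. All types and the function `AttachEquationDetector` are hypothetical stand-ins; only the names `equ_detect` and `SetEquationDetect` echo the commit message.

```cpp
#include <cstdio>

struct EquationDetector {};  // hypothetical stand-in

// Hypothetical class that stores the detector it is given.
class ColumnFinderSketch {
 public:
  void SetEquationDetect(EquationDetector* detect) { detect_ = detect; }
  const EquationDetector* detect() const { return detect_; }

 private:
  EquationDetector* detect_ = nullptr;
};

// Returns false and warns if no detector is available, instead of passing
// a null pointer on to code that would dereference it.
inline bool AttachEquationDetector(ColumnFinderSketch* finder,
                                   EquationDetector* equ_detect) {
  if (equ_detect == nullptr) {
    std::fprintf(stderr, "Warning: equation detection is not available\n");
    return false;
  }
  finder->SetEquationDetect(equ_detect);
  return true;
}
```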
Stefan Weil
eb8a6a5cf2 Fix CID 1164570 (Dereference after null check) (#1333)
Show a warning if datapath_ is NULL instead of crashing.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:37:53 +01:00
Egor Pugin
6a58b2e682 Remove whitespace. 2018-02-21 08:08:55 +03:00
Amit D
766b7bd620 Don't drop words with low certainty (#1264)
Fix #681.
2018-02-20 17:19:10 +01:00
Stefan Weil
af6994efd9 Don't try alternate path for tessdata (#1328)
This simplifies the code and the user interface.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-20 17:18:14 +01:00
Stefan Weil
2ca7d9451a Remove files generated by libtool (#1329)
It looks like those files were added accidentally
in commit fc6a390c6c.
Add them to .gitignore to avoid that from now on.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-20 17:17:49 +01:00
Stefan Weil
a50ff5277d Improve help text for OCR engine mode (#1326)
The new text was suggested by Amit Dovev, see
https://github.com/tesseract-ocr/tesseract/pull/1325#issuecomment-366613553.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-19 09:43:58 +01:00
Stefan Weil
349de8b739 Support different help texts for normal and advanced users and restore legacy mode (#1325)
* Restore support for the legacy engine

It is still needed to get text attributes which are unsupported by the
LSTM engine, and it also has better recognition rates for some texts.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* tesseractmain: Add missing 'static' attributes

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Support different help texts for normal and advanced users

The old option --help now shows a very basic help text.
The new option --help-extra shows the full help information.
It now also includes a hint that Tesseract supports lists of images.

Fix also the indentation in the PSM help and
use a more neutral text in the OEM help.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Add missing line feed in error message

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-19 07:30:38 +01:00
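Note on 349de8b739: a sketch of how the two help options named in the message could be told apart on the command line. The options --help and --help-extra come from the commit message; the enum `HelpMode` and the function `ParseHelpMode` are hypothetical and not the code from tesseractmain.

```cpp
#include <cstring>

enum class HelpMode { kNone, kBasic, kExtra };

// Hypothetical parser: scans the arguments and reports which help text,
// if any, was requested.
inline HelpMode ParseHelpMode(int argc, const char* const argv[]) {
  for (int i = 1; i < argc; ++i) {
    if (std::strcmp(argv[i], "--help-extra") == 0) return HelpMode::kExtra;
    if (std::strcmp(argv[i], "--help") == 0) return HelpMode::kBasic;
  }
  return HelpMode::kNone;
}
```

Exact string comparison is used here so that --help does not also match --help-extra.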
Zdenko Podobný
173ad2bd00 Mark and block the legacy OCR engine until it is removed. 2018-02-18 19:31:09 +01:00
Stefan Weil
01f9a7f3c2 Clean use of double / float (#1323)
The variable 'diff' gets a double value and is compared with a double value,
so it should be double, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-18 15:34:04 +01:00
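Note on 01f9a7f3c2: a sketch of the point made in the message, that a value computed in double and compared with a double should also be stored as double. The function `NearlyEqual` and its parameters are hypothetical.

```cpp
#include <cmath>

// Hypothetical helper, not taken from the Tesseract sources.
inline bool NearlyEqual(double a, double b, double tolerance) {
  // Declaring `diff` as float would convert the double result to float and
  // back to double for the comparison. Keeping it double avoids both
  // conversions and the rounding that comes with them.
  const double diff = std::fabs(a - b);
  return diff < tolerance;
}
```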
Stefan Weil
43f34f5c3e Clean Makefile.am (#1322)
Replace the doc-dummy hack with .PHONY.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-18 15:25:31 +01:00