
4667 Commits

Author SHA1 Message Date
Stefan Weil
e161501df6 Optimize performance by using inline MatrixDotVectorInternal
This improves performance for the "best" models because it
avoids function calls.

The compiler also knows the passed values for the parameters
add_bias_fwd and skip_bias_back.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-29 21:37:32 +01:00
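The following C++ sketch illustrates the idea behind this commit. It is not Tesseract's code. Only the name MatrixDotVectorInternal and the parameter names add_bias_fwd and skip_bias_back come from the commit message. The Matrix type, its row-major layout, the wrapper names ForwardWithBias and PlainProduct, and the meaning given to the two flags here (a bias weight in the last column of w, a skipped last row) are all assumptions. Each wrapper passes literal true/false values, so once the helper is inlined the compiler can resolve the branches on those flags at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix; Tesseract uses its own 2-D array class.
struct Matrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;  // rows * cols values, row by row
  const double* row(std::size_t i) const { return data.data() + i * cols; }
};

// Sketch: computes v = w * u.
// Assumption: add_bias_fwd means the last column of w is a bias weight that
// is added without a matching entry in u; skip_bias_back means the last row
// of w is not computed (v receives rows - 1 results).
static inline void MatrixDotVectorInternal(const Matrix& w, bool add_bias_fwd,
                                           bool skip_bias_back,
                                           const double* u, double* v) {
  assert(w.data.size() == w.rows * w.cols);
  assert(!skip_bias_back || w.rows > 0);
  assert(!add_bias_fwd || w.cols > 0);
  std::size_t num_results = skip_bias_back ? w.rows - 1 : w.rows;
  std::size_t extent = add_bias_fwd ? w.cols - 1 : w.cols;
  for (std::size_t i = 0; i < num_results; ++i) {
    const double* wi = w.row(i);
    double total = 0.0;
    for (std::size_t j = 0; j < extent; ++j) total += wi[j] * u[j];
    if (add_bias_fwd) total += wi[extent];  // bias weight, implicit input 1
    v[i] = total;
  }
}

// Hypothetical callers: the flags are compile-time constants here, so the
// inlined body needs no runtime checks of add_bias_fwd or skip_bias_back.
void ForwardWithBias(const Matrix& w, const double* u, double* v) {
  MatrixDotVectorInternal(w, true, false, u, v);
}

void PlainProduct(const Matrix& w, const double* u, double* v) {
  MatrixDotVectorInternal(w, false, false, u, v);
}
```

With a 2x3 matrix {1, 2, 10; 3, 4, 20} and u = {1, 1}, ForwardWithBias yields {13, 27}, because the third column is added as a bias.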
Egor Pugin
685b136d89
Fix incorrect condition. 2018-11-29 19:02:54 +03:00
Stefan Weil
9e49539429
Merge pull request #2079 from ajschaeper/master
Add example in README.md
2018-11-26 09:44:24 +01:00
ajschaeper
0223abcb96 Add example in README.md 2018-11-26 09:30:22 +01:00
Egor Pugin
267b79982d
Merge pull request #2076 from jbarlow83/pythonize-training
RFC: Pythonize tesstrain.sh and friends
2018-11-25 13:31:48 +03:00
James R. Barlow
8aa25239ae Fix some of Codacy's complaints 2018-11-24 16:59:01 -08:00
James R. Barlow
9122e6249e Autoreformat code
This increases the deviation from the bash scripts, so it is done separately.
2018-11-24 00:50:29 -08:00
James R. Barlow
d9ae7ecc49 Pythonize tesstrain.sh -> tesstrain.py
This is a lightweight, semi-Pythonic conversion of tesstrain.sh that currently
supports only LSTM and not the Tesseract 3 training mode.

I attempted to keep source changes minimal so it would be easy to compare
bash to Python in code review and confirm equivalence.

Python 3.6+ is required.  Ubuntu 18.04 ships Python 3.6 and it is a mandatory
package (the package manager is also written in Python), so it is available
in the baseline Tesseract 4.0 system.

There are minor changes in output and behavior, as well as some advantages.
Python's logging is used.  Temporary files are only deleted on success, so they
can be inspected if training fails.  Console output is more terse and the log
file is more verbose.  And there are progress bars!  (The python3-tqdm package is required.)
Where tesstrain.sh would sometimes fail without explanation and return an error
code of 1, it is much easier to find the point of failure in this version.
That was also the main motivation for this work.

Argument checking is also more comprehensive.
2018-11-24 00:45:35 -08:00
pndaza
fc8a3d5bbc combine condition with next 2018-11-24 09:21:05 +06:30
pndaza
5c85d8e03d add missing letters and symbols (0x104a to 0x104f) 2018-11-24 09:14:31 +06:30
Egor Pugin
b08624acc0 Reapply: Add sw build system script (future cppan replacement). 2018-11-23 01:47:02 +03:00
Egor Pugin
19580b18bc Revert "Add sw build system script (future cppan replacement)."
This reverts commit b1e20043fd.
2018-11-23 01:44:58 +03:00
Egor Pugin
e98661c0e7
Merge pull request #2065 from thegrizzlylabs/fix-configure-android
fix(configure) Don't add rt on Android
2018-11-21 23:29:44 +03:00
Egor Pugin
b1e20043fd Add sw build system script (future cppan replacement). 2018-11-20 00:12:26 +03:00
zdenop
def7cdd641
Merge pull request #2063 from stweil/tprintf
Remove unused include statements for tprintf.h
2018-11-18 19:02:07 +01:00
Stefan Weil
9b783822a0 Remove unused include statements for tprintf.h
Also format a call of tprintf and add a missing explicit include statement.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-18 17:25:01 +01:00
zdenop
670ce8e4cf
Merge pull request #2060 from stweil/overflow
Fix wrong results from function streamtofloat
2018-11-18 08:11:55 +01:00
Stefan Weil
a93426c9ff Fix wrong results from function streamtofloat
The local variable k should be 10^(number of digits after the decimal point),
but it overflows when there are more than 9 digits after the decimal point
because a 32-bit int cannot store 10000000000.

For example, this results in wrong double values when reading .tr files
(or in a runtime exception if Tesseract was compiled with -ftrapv).

Using uint64_t does not fix the general problem but allows more digits
which should be sufficient for the data read by Tesseract.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-17 20:02:21 +01:00
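A minimal C++ sketch of the parsing scheme described above, assuming a plain decimal string without an exponent. ParseDecimal is a hypothetical helper that reads from a std::string, whereas Tesseract's streamtofloat reads from a FILE*. The fractional digits are collected in w and the scale k = 10^(digits) in 64-bit unsigned integers. The cap at 19 fractional digits is an extra assumption, not part of the commit: 10^19 still fits in uint64_t (maximum about 1.8 * 10^19), but 10^20 does not.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// 10^10 has more than 31 value bits, so a 32-bit int k overflows once there
// are ten or more digits after the decimal point.
static_assert(10000000000LL > INT32_MAX, "10^10 does not fit in int32_t");

// Hypothetical helper: parses "[+-]digits[.digits]" into a double.
double ParseDecimal(const std::string& s) {
  std::size_t pos = 0;
  bool minus = false;
  if (pos < s.size() && (s[pos] == '-' || s[pos] == '+')) {
    minus = (s[pos] == '-');
    ++pos;
  }
  std::uint64_t v = 0;  // integer part
  while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
    v = v * 10 + static_cast<std::uint64_t>(s[pos] - '0');
    ++pos;
  }
  std::uint64_t w = 0;  // fractional digits read as an integer
  std::uint64_t k = 1;  // 10^(number of fractional digits kept)
  if (pos < s.size() && s[pos] == '.') {
    ++pos;
    int digits = 0;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
      // Assumption beyond the commit: keep at most 19 digits so that k and w
      // cannot overflow; any further digits are below 1e-19 and are skipped.
      if (digits < 19) {
        w = w * 10 + static_cast<std::uint64_t>(s[pos] - '0');
        k *= 10;
        ++digits;
      }
      ++pos;
    }
  }
  assert(k <= 10000000000000000000ULL);  // 10^19, largest power of ten kept
  double result =
      static_cast<double>(v) + static_cast<double>(w) / static_cast<double>(k);
  return minus ? -result : result;
}
```

With an int k, the input "0.12345678901" (11 fractional digits) would overflow; here it parses to the expected value.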
zdenop
b67ea2c1a7
Merge pull request #2058 from stweil/sh-fix
Fix some issues with the shell scripts for training
2018-11-17 09:51:26 +01:00
Stefan Weil
acca4fb999 Fix some unbound variables and other small issues in training shell scripts
Also fix the logging helper functions so they work without a log file.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-16 11:13:46 +01:00
Stefan Weil
a4b03fbb27 Fix warning from shellcheck
shellcheck warning:

    In /tesseract/src/training/tesstrain_utils.sh line 209:
        TIMESTAMP=`date +%Y-%m-%d`
                  ^-- SC2006: Use $(..) instead of legacy `..`.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-15 17:45:20 +01:00
John Lin
bfe58aa56f Fix unbound variable $FONTS 2018-11-15 17:43:15 +01:00
Guillaume Gigaud
92b8833838
fix(configure) Don't add rt on Android
The rt library is part of libc on Android: https://developer.android.com/ndk/guides/stable_apis#a3
2018-11-15 13:56:28 +01:00
Stefan Weil
0915cbd535 Simplify shell script using mktemp
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-15 13:36:52 +01:00
John Lin
edb76e281a Simplify MKTEMP_DT logic 2018-11-15 10:38:40 +08:00
John Lin
dbfc89f9af Fix mktemp in tesstrain_utils.sh
Commit 10f2c45c00 unified the usage of mktemp, but with incorrect
bash syntax and unnecessary definitions of LANG_CODE and TIMESTAMP.
This patch fixes these problems.
2018-11-14 09:04:34 +08:00
zdenop
ec476f908e
Merge pull request #2050 from stweil/leaks
Fix some memory leaks in unit tests
2018-11-13 12:59:36 +01:00
Stefan Weil
ff5347c4ad Fix memory leak in osd_test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-13 09:09:23 +01:00
Stefan Weil
5209aa6c95 Fix memory leak in loadlang_test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-13 09:09:10 +01:00
Stefan Weil
74f6d0e7ff Fix memory leak in apiexample_test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-13 09:08:42 +01:00
Stefan Weil
303ac97102 Fix memory leaks and typos in progress_test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-13 08:24:50 +01:00
Zdenko Podobný
4ef51d8bc0 Merge branch 'master' of https://github.com/tesseract-ocr/tesseract 2018-11-12 12:54:33 +01:00
Ray Smith
ce88adbf32 fix issue #1192 2018-11-12 12:53:12 +01:00
zdenop
de3734a0f4
Merge pull request #2046 from stweil/tests
Update test submodule
2018-11-09 09:18:00 +01:00
Stefan Weil
fae47eb876 Update test submodule
This is needed to include the moved langtests and unlvtests.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-09 08:18:19 +01:00
zdenop
724957167e fix typo in non-VS build 2018-11-08 23:10:14 +01:00
zdenop
eb104f9fe4 VS build: fix warning C4996: The POSIX name for this item is deprecated. Instead, use the ISO C and C++ conformant name. 2018-11-08 22:55:04 +01:00
zdenop
cdfb768010 move langtests and unlvtests from tesseract-ocr repository to test repository 2018-11-08 22:31:32 +01:00
zdenop
cbef2ebe12 implement patches vcpkg tesseract 2018-11-08 21:37:47 +01:00
zdenop
7a7f226228 ocrclass: Remove unused macros
Signed-off-by: Stefan Weil <sw@weilnetz.de>

# Conflicts:
#	src/ccutil/ocrclass.h
2018-11-08 20:23:36 +01:00
zdenop
28df28123e Merge branch 'master' of https://github.com/tesseract-ocr/tesseract 2018-11-08 19:44:22 +01:00
Zdenko Podobný
2dd753ee4c replace VS implementation of gettimeofday with std::chrono::steady_clock::now(); fixes #2038 2018-11-08 19:43:46 +01:00
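For illustration, a minimal C++ sketch of measuring elapsed time with std::chrono::steady_clock, the standard facility named in this commit. ElapsedSeconds is a hypothetical helper, not the code changed in the commit. steady_clock is monotonic, so a measurement is not disturbed if the system time is adjusted, which can happen with wall-clock sources such as gettimeofday; it is also available on every conforming C++11 compiler, including Visual Studio.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: seconds elapsed between two steady_clock readings.
// duration<double> (seconds, floating point) avoids truncating sub-second
// parts, so no manual tv_sec/tv_usec arithmetic is needed.
double ElapsedSeconds(std::chrono::steady_clock::time_point start,
                      std::chrono::steady_clock::time_point end) {
  assert(end >= start && "end must not be earlier than start");
  return std::chrono::duration<double>(end - start).count();
}
```

A typical use takes auto t0 = std::chrono::steady_clock::now() before the work and passes a second now() reading as end afterwards.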
zdenop
39e5fe15ff
Merge pull request #2044 from stweil/tests
Remove dummy test
2018-11-08 19:30:12 +01:00
Stefan Weil
f4ec5beedc Remove dummy test
This reverts commit 99755b0732.
The dummy test is no longer needed because there are now many real tests.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-08 18:53:15 +01:00
zdenop
7e27f78752
Merge pull request #2043 from stweil/copying
Remove redundant file COPYING
2018-11-08 15:25:47 +01:00
Stefan Weil
73cefff3a1 Include LICENSE file in distribution
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-08 15:14:59 +01:00
Stefan Weil
6184892905 Remove redundant file COPYING
Most of the information was already in README.md.
Add the missing hint for Leptonica, too, so the file can now be removed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-08 13:16:39 +01:00
Egor Pugin
361f32641a
Merge pull request #2040 from stweil/docker
Update Dockerfile
2018-11-04 01:39:08 +03:00
Stefan Weil
ad30f52eed Dockerfile: Delete the apt-get lists after installing
This fixes an issue reported by Codacy and Hadolint.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-02 19:36:19 +01:00
Stefan Weil
0869fdfd16 Dockerfile: Replace deprecated MAINTAINER by LABEL
This fixes an issue reported by Codacy and Hadolint.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-02 19:36:19 +01:00