Commit Graph

3577 Commits

Author SHA1 Message Date
zdenop
53600c677e
Merge pull request #2092 from stweil/format
Format new ALTO code with clang-format
2018-11-30 08:08:52 +01:00
zdenop
53ee0595f0
Merge pull request #2091 from stweil/alto
Add configuration file for ALTO to installation
2018-11-30 08:08:04 +01:00
zdenop
f6493dd5e8
Merge pull request #2090 from stweil/inline
Optimize performance by using inline functions
2018-11-30 08:07:45 +01:00
Stefan Weil
c59c45fb3e Fix Amharic font list
This was reported for the Python code by LGTM.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 08:00:22 +01:00
Stefan Weil
57d0ae06c0 Use Python3 for LGTM
The Python scripts require Python3 and give errors with Python2.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 07:11:53 +01:00
Stefan Weil
b148644c1b Make Python script executable
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 07:08:45 +01:00
Stefan Weil
ed48b2a8f5 Format new ALTO code with clang-format
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 06:37:25 +01:00
Stefan Weil
e817d93e62 Add configuration file for ALTO to installation
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 06:17:04 +01:00
Jake Sebright
d7cee03a94 Add support for ALTO output 2018-11-30 06:09:36 +01:00
Stefan Weil
3c047f0ac8 Optimize performance by using inline function DotProduct
This improves performance for the "best" models because it
avoids function calls.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-29 21:43:41 +01:00
Stefan Weil
e161501df6 Optimize performance by using inline MatrixDotVectorInternal
This improves performance for the "best" models because it
avoids function calls.

The compiler also knows the passed values for the parameters
add_bias_fwd and skip_bias_back.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-29 21:37:32 +01:00
Egor Pugin
685b136d89
Fix incorrect condition. 2018-11-29 19:02:54 +03:00
Stefan Weil
9e49539429
Merge pull request #2079 from ajschaeper/master
Add example in README.md
2018-11-26 09:44:24 +01:00
ajschaeper
0223abcb96 Add example in README.md 2018-11-26 09:30:22 +01:00
Egor Pugin
267b79982d
Merge pull request #2076 from jbarlow83/pythonize-training
RFC: Pythonize tesstrain.sh and friends
2018-11-25 13:31:48 +03:00
James R. Barlow
8aa25239ae Fix some of Codacy's complaints 2018-11-24 16:59:01 -08:00
James R. Barlow
9122e6249e Autoreformat code
This increases the deviation from the bash scripts so is done separately.
2018-11-24 00:50:29 -08:00
James R. Barlow
d9ae7ecc49 Pythonize tesstrain.sh -> tesstrain.py
This is a lightweight, semi-Pythonic conversion of tesstrain.sh that currently
supports only LSTM and not the Tesseract 3 training mode.

I attempted to keep source changes minimal so it would be easy to compare
bash to Python in code review and confirm equivalence.

Python 3.6+ is required.  Ubuntu 18.04 ships Python 3.6 and it is a mandatory
package (the package manager is also written in Python), so it is available
in the baseline Tesseract 4.0 system.

There are minor output and behavioral changes, and advantages.  Python's logging
is used.  Temporary files are only deleted on success, so they can be inspected
if training fails.  Console output is more terse and the log file is more
verbose.  And there are progress bars!  (The python3-tqdm package is required.)
Where tesstrain.sh would sometimes fail without explanation and return an error
code of 1, it is much easier to find the point of failure in this version.
That was also the main motivation for this work.

Argument checking is also more comprehensive.
2018-11-24 00:45:35 -08:00
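
The "temporary files are only deleted on success" behavior described in the tesstrain.py commit above can be sketched roughly as follows. This is a minimal illustration, not the code from tesstrain.py: the function name `run_with_workdir`, the logger name, and the directory prefix are all hypothetical, and only the Python standard library is used.

```python
import logging
import shutil
import tempfile
from pathlib import Path

# Hypothetical logger name, chosen for this sketch only.
log = logging.getLogger("tesstrain_sketch")


def run_with_workdir(step):
    """Run step(workdir) and delete workdir only if the step succeeds.

    `step` is any callable taking a pathlib.Path. On failure the
    directory is left on disk so its contents can be inspected.
    """
    workdir = Path(tempfile.mkdtemp(prefix="tesstrain_sketch_"))
    log.info("Working directory: %s", workdir)
    try:
        result = step(workdir)
    except Exception:
        # Keep the files and log the traceback, then re-raise.
        log.exception("Step failed; temporary files kept in %s", workdir)
        raise
    # Reached only when the step did not raise.
    shutil.rmtree(workdir)
    log.debug("Step succeeded; removed %s", workdir)
    return result
```

Cleaning up after the `try` block rather than in a `finally` clause is what keeps the directory around when the step raises.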
pndaza
fc8a3d5bbc combine condition with next 2018-11-24 09:21:05 +06:30
pndaza
5c85d8e03d add missed letters and symbols - 0x104a to 0x104f - 2018-11-24 09:14:31 +06:30
Egor Pugin
b08624acc0 Reapply: Add sw build system script (future cppan replacement). 2018-11-23 01:47:02 +03:00
Egor Pugin
19580b18bc Revert "Add sw build system script (future cppan replacement)."
This reverts commit b1e20043fd.
2018-11-23 01:44:58 +03:00
Egor Pugin
e98661c0e7
Merge pull request #2065 from thegrizzlylabs/fix-configure-android
fix(configure) Don't add rt on Android
2018-11-21 23:29:44 +03:00
Egor Pugin
b1e20043fd Add sw build system script (future cppan replacement). 2018-11-20 00:12:26 +03:00
zdenop
def7cdd641
Merge pull request #2063 from stweil/tprintf
Remove unused include statements for tprintf.h
2018-11-18 19:02:07 +01:00
Stefan Weil
9b783822a0 Remove unused include statements for tprintf.h
Format also a call of tprintf and add a missing explicit include statement.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-18 17:25:01 +01:00
zdenop
670ce8e4cf
Merge pull request #2060 from stweil/overflow
Fix wrong results from function streamtofloat
2018-11-18 08:11:55 +01:00
Stefan Weil
a93426c9ff Fix wrong results from function streamtofloat
The local variable k should be 10 ^ (number of digits after comma),
but will overflow when there are more than 9 digits after the comma
because an int value cannot store 10000000000.

This results in wrong double values read from .tr files for example
(or in a runtime exception if Tesseract was compiled with -ftrapv).

Using uint64_t does not fix the general problem but allows more digits
which should be sufficient for the data read by Tesseract.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-17 20:02:21 +01:00
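
The arithmetic behind the streamtofloat commit above can be checked with a small simulation. This is only a sketch of the overflow described in the message, not Tesseract's code: the helper names are hypothetical, and `ctypes` is used to mimic fixed-width integers. Wraparound is assumed for the 32-bit case, although signed overflow in C++ is undefined behavior (or a trap with -ftrapv, as the message notes).

```python
import ctypes


def scale_int32(digits):
    """Compute 10**digits the way a wrapping 32-bit signed int would."""
    k = 1
    for _ in range(digits):
        # c_int32 performs no overflow checking; it keeps the low 32 bits.
        k = ctypes.c_int32(k * 10).value
    return k


def scale_uint64(digits):
    """Compute 10**digits in a 64-bit unsigned integer."""
    k = 1
    for _ in range(digits):
        k = ctypes.c_uint64(k * 10).value
    return k


if __name__ == "__main__":
    for n in (9, 10, 19, 20):
        print(n, scale_int32(n) == 10**n, scale_uint64(n) == 10**n)
```

With 9 digits both variants are exact. With 10 digits the 32-bit value is no longer 10000000000, while the 64-bit value stays exact up to 19 digits and wraps at 20, matching the remark that uint64_t allows more digits without fixing the general problem.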
zdenop
b67ea2c1a7
Merge pull request #2058 from stweil/sh-fix
Fix some issues with the shell scripts for training
2018-11-17 09:51:26 +01:00
Stefan Weil
acca4fb999 Fix some unbound variables and other small issues in training shell scripts
Fix also the logging helper functions to work without log file.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-16 11:13:46 +01:00
Stefan Weil
a4b03fbb27 Fix warning from shellcheck
shellcheck warning:

    In /tesseract/src/training/tesstrain_utils.sh line 209:
        TIMESTAMP=`date +%Y-%m-%d`
                  ^-- SC2006: Use $(..) instead of legacy `..`.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-15 17:45:20 +01:00
John Lin
bfe58aa56f Fix unbound variable $FONTS 2018-11-15 17:43:15 +01:00
Guillaume Gigaud
92b8833838
fix(configure) Don't add rt on Android
Library rt is included in the libc on Android: https://developer.android.com/ndk/guides/stable_apis#a3
2018-11-15 13:56:28 +01:00
Stefan Weil
0915cbd535 Simplify shell script using mktemp
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-15 13:36:52 +01:00
John Lin
edb76e281a Simplify MKTEMP_DT logic 2018-11-15 10:38:40 +08:00
John Lin
dbfc89f9af Fix mktemp in tesstrain_utils.sh
The commit 10f2c45c00 unified the usage of mktemp, but with
incorrect bash syntax and an unnecessary definition of LANG_CODE
and TIMESTAMP. This patch fixes the above problems.
2018-11-14 09:04:34 +08:00
zdenop
ec476f908e
Merge pull request #2050 from stweil/leaks
Fix some memory leaks in unit tests
2018-11-13 12:59:36 +01:00
Stefan Weil
ff5347c4ad Fix memory leak in osd_test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-13 09:09:23 +01:00
Stefan Weil
5209aa6c95 Fix memory leak in loadlang_test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-13 09:09:10 +01:00
Stefan Weil
74f6d0e7ff Fix memory leak in apiexample_test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-13 09:08:42 +01:00
Stefan Weil
303ac97102 Fix memory leaks and typos in progress_test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-13 08:24:50 +01:00
Zdenko Podobný
4ef51d8bc0 Merge branch 'master' of https://github.com/tesseract-ocr/tesseract 2018-11-12 12:54:33 +01:00
Ray Smith
ce88adbf32 fix issue #1192 2018-11-12 12:53:12 +01:00
zdenop
de3734a0f4
Merge pull request #2046 from stweil/tests
Update test submodule
2018-11-09 09:18:00 +01:00
Stefan Weil
fae47eb876 Update test submodule
This is needed to include the moved langtests and unlvtests.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-09 08:18:19 +01:00
zdenop
724957167e fix typo in non VS build 2018-11-08 23:10:14 +01:00
zdenop
eb104f9fe4 VS build: fix warning C4996: The POSIX name for this item is deprecated. Instead, use the ISO C and C++ conformant name. 2018-11-08 22:55:04 +01:00
zdenop
cdfb768010 move langtests and unlvtests from tesseract-ocr repository to test repository 2018-11-08 22:31:32 +01:00
zdenop
cbef2ebe12 implement patches vcpkg tesseract 2018-11-08 21:37:47 +01:00
zdenop
7a7f226228 ocrclass: Remove unused macros
Signed-off-by: Stefan Weil <sw@weilnetz.de>

# Conflicts:
#	src/ccutil/ocrclass.h
2018-11-08 20:23:36 +01:00