zdenop
07b140364f
Merge pull request #2093 from stweil/python
Updates for Python scripts
2018-11-30 08:10:20 +01:00
zdenop
53600c677e
Merge pull request #2092 from stweil/format
Format new ALTO code with clang-format
2018-11-30 08:08:52 +01:00
zdenop
53ee0595f0
Merge pull request #2091 from stweil/alto
Add configuration file for ALTO to installation
2018-11-30 08:08:04 +01:00
zdenop
f6493dd5e8
Merge pull request #2090 from stweil/inline
Optimize performance by using inline functions
2018-11-30 08:07:45 +01:00
Stefan Weil
c59c45fb3e
Fix Amharic font list
This was reported for the Python code by LGTM.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 08:00:22 +01:00
Stefan Weil
57d0ae06c0
Use Python3 for LGTM
The Python scripts require Python3 and give errors with Python2.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 07:11:53 +01:00
Stefan Weil
b148644c1b
Make Python script executable
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 07:08:45 +01:00
Stefan Weil
ed48b2a8f5
Format new ALTO code with clang-format
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 06:37:25 +01:00
Stefan Weil
e817d93e62
Add configuration file for ALTO to installation
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 06:17:04 +01:00
Jake Sebright
d7cee03a94
Add support for ALTO output
2018-11-30 06:09:36 +01:00
Stefan Weil
3c047f0ac8
Optimize performance by using inline function DotProduct
This improves performance for the "best" models because it
avoids function calls.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-29 21:43:41 +01:00
Stefan Weil
e161501df6
Optimize performance by using inline MatrixDotVectorInternal
This improves performance for the "best" models because it
avoids function calls.
The compiler also knows the passed values for the parameters
add_bias_fwd and skip_bias_back.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-29 21:37:32 +01:00
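The two inlining commits above argue that inline functions avoid call overhead and let the compiler see the constant values passed for add_bias_fwd and skip_bias_back. A minimal C++ sketch of that idea follows; DotProductSketch, MatrixDotVectorSketch, ForwardWithBiasSketch and the row-major std::vector<double> layout are hypothetical illustrations, not Tesseract's actual implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: plain dot product, defined inline so it can be
// expanded at each call site instead of being called.
inline double DotProductSketch(const double* u, const double* v, int n) {
  double total = 0.0;
  for (int i = 0; i < n; ++i) total += u[i] * v[i];
  return total;
}

// Hypothetical sketch: w is a row-major matrix with `rows` rows and `cols`
// columns. If add_bias_fwd is true, the last column of each row is a bias
// weight that is added to the result rather than multiplied by an input.
// If skip_bias_back is true, the last row is not computed.
inline void MatrixDotVectorSketch(const std::vector<double>& w, int rows,
                                  int cols, bool add_bias_fwd,
                                  bool skip_bias_back, const double* u,
                                  double* v) {
  const int num_results = rows - (skip_bias_back ? 1 : 0);
  const int extent = cols - (add_bias_fwd ? 1 : 0);
  for (int i = 0; i < num_results; ++i) {
    const double* wi = &w[static_cast<std::size_t>(i) * cols];
    double total = DotProductSketch(wi, u, extent);
    if (add_bias_fwd) total += wi[extent];  // bias weight
    v[i] = total;
  }
}

// Wrapper with literal flag values: once MatrixDotVectorSketch is inlined
// here, the compiler can fold both bool parameters and drop dead branches.
inline void ForwardWithBiasSketch(const std::vector<double>& w, int rows,
                                  int cols, const double* u, double* v) {
  MatrixDotVectorSketch(w, rows, cols, /*add_bias_fwd=*/true,
                        /*skip_bias_back=*/false, u, v);
}
```

Note that `inline` is only a hint: whether the calls are actually inlined and the flag branches folded depends on the compiler and the optimization level.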
Egor Pugin
685b136d89
Fix incorrect condition.
2018-11-29 19:02:54 +03:00
Stefan Weil
9e49539429
Merge pull request #2079 from ajschaeper/master
Add example in README.md
2018-11-26 09:44:24 +01:00
ajschaeper
0223abcb96
Add example in README.md
2018-11-26 09:30:22 +01:00
Egor Pugin
267b79982d
Merge pull request #2076 from jbarlow83/pythonize-training
RFC: Pythonize tesstrain.sh and friends
2018-11-25 13:31:48 +03:00
James R. Barlow
8aa25239ae
Fix some of Codacy's complaints
2018-11-24 16:59:01 -08:00
James R. Barlow
9122e6249e
Autoreformat code
This increases the deviation from the bash scripts so is done separately.
2018-11-24 00:50:29 -08:00
James R. Barlow
d9ae7ecc49
Pythonize tesstrain.sh -> tesstrain.py
This is a lightweight, semi-Pythonic conversion of tesstrain.sh that currently
supports only LSTM and not the Tesseract 3 training mode.
I attempted to keep source changes minimal so it would be easy to compare
bash to Python in code review and confirm equivalence.
Python 3.6+ is required. Ubuntu 18.04 ships Python 3.6 and it is a mandatory
package (the package manager is also written in Python), so it is available
in the baseline Tesseract 4.0 system.
There are minor output and behavioral changes, and advantages. Python's logging
is used. Temporary files are only deleted on success, so they can be inspected
if training fails. Console output is more terse and the log file is more
verbose. And there are progress bars! (The python3-tqdm package is required.)
Where tesstrain.sh would sometimes fail without explanation and return an error
code of 1, it is much easier to find the point of failure in this version.
That was also the main motivation for this work.
Argument checking is also more comprehensive.
2018-11-24 00:45:35 -08:00
pndaza
fc8a3d5bbc
combine condition with next
2018-11-24 09:21:05 +06:30
pndaza
5c85d8e03d
add missing letters and symbols (0x104a to 0x104f)
2018-11-24 09:14:31 +06:30
Egor Pugin
b08624acc0
Reapply: Add sw build system script (future cppan replacement).
2018-11-23 01:47:02 +03:00
Egor Pugin
19580b18bc
Revert "Add sw build system script (future cppan replacement)."
This reverts commit b1e20043fd.
2018-11-23 01:44:58 +03:00
Egor Pugin
e98661c0e7
Merge pull request #2065 from thegrizzlylabs/fix-configure-android
fix(configure) Don't add rt on Android
2018-11-21 23:29:44 +03:00
Egor Pugin
b1e20043fd
Add sw build system script (future cppan replacement).
2018-11-20 00:12:26 +03:00
zdenop
def7cdd641
Merge pull request #2063 from stweil/tprintf
Remove unused include statements for tprintf.h
2018-11-18 19:02:07 +01:00
Stefan Weil
9b783822a0
Remove unused include statements for tprintf.h
Also format a call of tprintf and add a missing explicit include statement.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-18 17:25:01 +01:00
zdenop
670ce8e4cf
Merge pull request #2060 from stweil/overflow
Fix wrong results from function streamtofloat
2018-11-18 08:11:55 +01:00
Stefan Weil
a93426c9ff
Fix wrong results from function streamtofloat
The local variable k should be 10^(number of digits after the decimal point),
but it overflows when there are more than 9 digits after the decimal point
because an int value cannot store 10000000000.
This results in wrong double values read from .tr files for example
(or in a runtime exception if Tesseract was compiled with -ftrapv).
Using uint64_t does not fix the general problem but allows more digits
which should be sufficient for the data read by Tesseract.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-17 20:02:21 +01:00
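To illustrate the overflow described in the streamtofloat fix, here is a minimal C++ sketch that keeps the power-of-ten divisor k in a 64-bit unsigned integer, as the commit describes. ParseDecimalSketch is a hypothetical name and a simplified parser (non-negative numbers only, no exponent), not Tesseract's streamtofloat.

```cpp
#include <cstdint>
#include <sstream>

// Hypothetical sketch, not Tesseract's streamtofloat: parse a non-negative
// decimal number such as "0.12345678901" from a stream.
// k holds 10^(digits after the decimal point). A 32-bit int overflows once
// k must reach 10^10 (INT_MAX is 2147483647); std::uint64_t holds values up
// to about 1.8e19, so k and the fraction digits fit for up to 19 digits.
// Note: the first character after the number is consumed and discarded.
double ParseDecimalSketch(std::istream& in) {
  std::uint64_t int_part = 0;
  std::uint64_t frac = 0;
  std::uint64_t k = 1;
  int c = in.get();
  while (c >= '0' && c <= '9') {
    int_part = int_part * 10 + static_cast<std::uint64_t>(c - '0');
    c = in.get();
  }
  if (c == '.') {
    c = in.get();
    while (c >= '0' && c <= '9') {
      frac = frac * 10 + static_cast<std::uint64_t>(c - '0');
      k *= 10;  // would overflow a 32-bit int after 9 fraction digits
      c = in.get();
    }
  }
  return static_cast<double>(int_part) +
         static_cast<double>(frac) / static_cast<double>(k);
}
```

For example, parsing `std::istringstream("0.12345678901")` needs k = 10^11, which fits in `std::uint64_t` but not in a 32-bit `int`. As the commit message says, this only moves the limit: more than 19 fractional digits would still overflow.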
zdenop
b67ea2c1a7
Merge pull request #2058 from stweil/sh-fix
Fix some issues with the shell scripts for training
2018-11-17 09:51:26 +01:00
Stefan Weil
acca4fb999
Fix some unbound variables and other small issues in training shell scripts
Also fix the logging helper functions to work without a log file.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-16 11:13:46 +01:00
Stefan Weil
a4b03fbb27
Fix warning from shellcheck
shellcheck warning:
In /tesseract/src/training/tesstrain_utils.sh line 209:
TIMESTAMP=`date +%Y-%m-%d`
^-- SC2006: Use $(..) instead of legacy `..`.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-15 17:45:20 +01:00
John Lin
bfe58aa56f
Fix unbound variable $FONTS
2018-11-15 17:43:15 +01:00
Guillaume Gigaud
92b8833838
fix(configure) Don't add rt on Android
Library rt is included in the libc on Android: https://developer.android.com/ndk/guides/stable_apis#a3
2018-11-15 13:56:28 +01:00
Stefan Weil
0915cbd535
Simplify shell script using mktemp
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-15 13:36:52 +01:00
John Lin
edb76e281a
Simplify MKTEMP_DT logic
2018-11-15 10:38:40 +08:00
John Lin
dbfc89f9af
Fix mktemp in tesstrain_utils.sh
The commit 10f2c45c00 unified the usage of mktemp, but with an
incorrect bash syntax and unnecessary definitions of LANG_CODE
and TIMESTAMP. This patch fixes the above problems.
2018-11-14 09:04:34 +08:00
zdenop
ec476f908e
Merge pull request #2050 from stweil/leaks
Fix some memory leaks in unit tests
2018-11-13 12:59:36 +01:00
Stefan Weil
ff5347c4ad
Fix memory leak in osd_test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-13 09:09:23 +01:00
Stefan Weil
5209aa6c95
Fix memory leak in loadlang_test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-13 09:09:10 +01:00
Stefan Weil
74f6d0e7ff
Fix memory leak in apiexample_test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-13 09:08:42 +01:00
Stefan Weil
303ac97102
Fix memory leaks and typos in progress_test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-13 08:24:50 +01:00
Zdenko Podobný
4ef51d8bc0
Merge branch 'master' of https://github.com/tesseract-ocr/tesseract
2018-11-12 12:54:33 +01:00
Ray Smith
ce88adbf32
fix issue #1192
2018-11-12 12:53:12 +01:00
zdenop
de3734a0f4
Merge pull request #2046 from stweil/tests
Update test submodule
2018-11-09 09:18:00 +01:00
Stefan Weil
fae47eb876
Update test submodule
This is needed to include the moved langtests and unlvtests.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-09 08:18:19 +01:00
zdenop
724957167e
fix typo in non VS build
2018-11-08 23:10:14 +01:00
zdenop
eb104f9fe4
VS build: fix warning C4996: The POSIX name for this item is deprecated. Instead, use the ISO C and C++ conformant name.
2018-11-08 22:55:04 +01:00
zdenop
cdfb768010
move langtests and unlvtests from tesseract-ocr repository to test repository
2018-11-08 22:31:32 +01:00
zdenop
cbef2ebe12
implement vcpkg patches for tesseract
2018-11-08 21:37:47 +01:00