Stefan Weil
78a957b989
Remove spaces at line endings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-13 18:54:42 +02:00
Stefan Weil
72c874140e
Modernize code by replacing C type casts
...
This was done using clang-tidy.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-07 09:04:51 +02:00
zdenop
ab09b09da6
Merge pull request #2294 from bertsky/lstm-with-char-whitelist
...
trying to add tessedit_char_whitelist etc. again:
2019-04-06 14:41:30 +02:00
Robert Schubert
25a42ea42f
fixed failure report for tesstrain commands:
...
- with `set -e` in effect, looking at stdout
to detect failure is too late
2019-04-06 08:13:03 +02:00
Robert Schubert
d5584e793e
fixed failure report for tesstrain commands:
...
- with `set -e` in effect, it does not make sense
to query `$?` indirectly
2019-04-06 08:13:03 +02:00
Stefan Weil
802f42e821
Remove BOOL8, TRUE, FALSE from host.h
...
Remove unneeded include statements for host.h, add required ones and
update the comments for the remaining include statements.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 18:27:20 +02:00
Stefan Weil
cbb5e729a1
classify: Use bool and replace TRUE, FALSE
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:53:50 +02:00
Stefan Weil
664811a869
Replace BOOL8, TRUE, FALSE by bool, true, false
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:28:28 +02:00
zdenop
5f06402755
python: optimize imports, reformat code
2019-03-31 16:53:39 +02:00
zdenop
2e9fd69c9e
use 'import pathlib'; fix "TypeError: argument of type 'WindowsPath' is not iterable"
2019-03-31 16:53:33 +02:00
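
Note on the TypeError above: a minimal sketch of how this kind of error can arise, assuming the original code applied a membership test (`in`) directly to a path object. The path and the helper name `contains_text` are hypothetical and not taken from the training scripts. `PureWindowsPath` is used so the behaviour can be reproduced on any OS.

```python
import pathlib

# Hypothetical path, for illustration only.
font_dir = pathlib.PureWindowsPath(r"C:\fonts\latin")


def contains_text(path, needle):
    """Substring test on the string form of a path."""
    return needle in str(path)


try:
    "latin" in font_dir   # membership test on the path object itself
    raised = False
except TypeError:
    raised = True         # path objects are neither containers nor iterables
```

Converting the path with `str()` (or `os.fspath()`) before the substring test avoids the error.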
zdenop
a0527b41bd
fix LGTM reports for python
2019-03-31 16:53:25 +02:00
Shreeshrii
ea36e94e58
fix Could not parse bool from flag (#2359)
2019-03-29 14:50:21 +01:00
Stefan Weil
f877640bc9
Merge pull request #2319 from bertsky/tesstrain-parallel-wait-retval
...
tesstrain: check failure of subjobs
2019-03-25 16:10:09 +01:00
Stefan Weil
d8d2f6f48a
Fix broken shell scripts for training
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-25 15:32:43 +01:00
Shreeshrii
8749f3553e
LINEDATA=false
2019-03-23 19:16:49 +05:30
Shree
bcb7cf9846
sort arguments, use true/false instead of 1/0
2019-03-23 12:28:53 +00:00
Shree
c2db272134
Modify distort_image for Boolean
2019-03-22 17:02:46 +00:00
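
Note on Boolean flags: the commits above switch `--distort_image` to true/false values. The following is only a sketch of one way to accept such values with `argparse`. The helper `str_to_bool` and the set of accepted spellings are assumptions, not the code in tesstrain.py.

```python
import argparse


def str_to_bool(text):
    """Convert 'true'/'false' (or '1'/'0') to a bool; hypothetical helper."""
    lowered = text.lower()
    if lowered in ("true", "1"):
        return True
    if lowered in ("false", "0"):
        return False
    raise argparse.ArgumentTypeError(f"could not parse bool from {text!r}")


parser = argparse.ArgumentParser()
# Default of False is an assumption made for this example.
parser.add_argument("--distort_image", type=str_to_bool, default=False)
args = parser.parse_args(["--distort_image", "true"])
```

Rejecting unknown spellings with an explicit error keeps a typo from being silently read as false.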
Shree
9b915d5efb
add --distort_image
2019-03-22 05:39:38 +00:00
Shree
f7ffde99d5
add --distort_image
2019-03-22 05:34:00 +00:00
zdenop
ac7ea4322a
Merge pull request #2335 from Shreeshrii/master
...
Changes to tesstrain.py - max_workers=8, distort_image=false
2019-03-17 15:27:34 +01:00
zdenop
26877ba703
check min. python version; os.uname is not available on windows
2019-03-17 15:25:48 +01:00
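
Note on the portability fix above: `os.uname()` exists only on Unix-like systems, while `platform.uname()` is available everywhere. A minimal sketch of both checks follows. The minimum version `(3, 6)` and the function names are assumptions for illustration, not values taken from the commit.

```python
import platform
import sys

MIN_VERSION = (3, 6)  # assumed threshold, for illustration only


def check_python(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the interpreter is at least the minimum version."""
    return tuple(version_info[:2]) >= minimum


def system_name():
    """Portable replacement for os.uname().sysname."""
    # platform.uname() also works on Windows, unlike os.uname().
    return platform.uname().system
```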
Shreeshrii
f8e8521606
Update tesstrain_utils.py
2019-03-17 15:32:35 +05:30
Shree
6fa8e1bb15
Set max_workers=8
2019-03-17 09:58:11 +00:00
Shree
e21499e81e
Set default value for distort_image
2019-03-17 09:54:16 +00:00
Stefan Weil
ee2f9bf7bf
Remove old comments in file headers
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-16 10:55:00 +01:00
Shree
d47b0d588a
Use LATIN_FONTS for kmr
2019-03-15 15:47:56 +00:00
Shree
3eee1d217a
Add kmr and kur_ara, remove kur from training scripts
2019-03-15 15:37:49 +00:00
Shree
b2ebf0195f
Add kmr and kur_ara, remove kur from training scripts
2019-03-15 14:39:39 +00:00
Shree
37befdf6c4
Add option for --distort_image
2019-03-15 13:32:36 +00:00
Robert Schubert
14346e56b0
tesstrain: catch+handle SIGINT (to stop waiting on subjobs)
2019-03-15 00:03:16 +01:00
Robert Schubert
6cbad17e30
tesstrain: check all subjobs' retval
2019-03-14 14:38:51 +01:00
Robert Schubert
5316bcbb94
tesstrain: check failure of subjobs
2019-03-14 11:42:01 +01:00
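
Note on the subjob commits above: they concern shell scripts that start jobs in parallel and must notice when any of them fails. The same idea is sketched here in Python under stated assumptions; `run_subjobs` is a hypothetical helper, and the two dummy commands only exit with fixed codes.

```python
import subprocess
import sys


def run_subjobs(commands):
    """Start all commands in parallel, wait for every one, return the failed ones."""
    procs = [(cmd, subprocess.Popen(cmd)) for cmd in commands]
    failed = []
    for cmd, proc in procs:
        # wait() returns the exit status; every job is checked, not just the last.
        if proc.wait() != 0:
            failed.append(cmd)
    return failed


jobs = [
    [sys.executable, "-c", "raise SystemExit(0)"],  # succeeds
    [sys.executable, "-c", "raise SystemExit(3)"],  # fails
]
failed = run_subjobs(jobs)
```

Collecting the status of every job, rather than only the most recent one, is what lets the caller report which subjobs failed.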
Stefan Weil
896698a4f5
Fix runtime error (left shift of negative value)
...
Runtime error:
src/training/util.h:37:28: runtime error: left shift of negative value -17
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-12 06:56:54 +01:00
Stefan Weil
5202208a8c
Remove globals.h
...
It only included other files which are already included where needed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-11 19:01:23 +01:00
zdenop
f80085c0bf
Merge pull request #2289 from Armyke/master
...
Added an additional optional --tmp_dir parameter to specify the tempo…
2019-03-06 15:03:14 +01:00
Stefan Weil
1c7e00611b
Add initial support for traineddata files in standard archive formats
...
This requires libarchive-dev.
Tesseract can now load traineddata files in any of the archive formats
which are supported by libarchive. Example of a zipped BagIt archive:
$ unzip -l /usr/local/share/tessdata/zip.traineddata
Archive:  /usr/local/share/tessdata/zip.traineddata
  Length      Date    Time    Name
---------  ---------- -----   ----
       55  2019-03-05 15:27   bagit.txt
        0  2019-03-05 15:25   data/
     1557  2019-03-05 15:28   manifest-sha256.txt
  1082890  2019-03-05 15:25   data/eng.word-dawg
  1487588  2019-03-05 15:25   data/eng.lstm
     7477  2019-03-05 15:25   data/eng.unicharset
    63346  2019-03-05 15:25   data/eng.shapetable
   976552  2019-03-05 15:25   data/eng.inttemp
    13408  2019-03-05 15:25   data/eng.normproto
     4322  2019-03-05 15:25   data/eng.punc-dawg
     4738  2019-03-05 15:25   data/eng.lstm-number-dawg
     1410  2019-03-05 15:25   data/eng.freq-dawg
      844  2019-03-05 15:25   data/eng.pffmtable
     6360  2019-03-05 15:25   data/eng.lstm-unicharset
     1012  2019-03-05 15:25   data/eng.lstm-recoder
     1047  2019-03-05 15:25   data/eng.unicharambigs
     4322  2019-03-05 15:25   data/eng.lstm-punc-dawg
 16109842  2019-03-05 15:25   data/eng.bigram-dawg
       80  2019-03-05 15:25   data/eng.version
     6426  2019-03-05 15:25   data/eng.number-dawg
  3694794  2019-03-05 15:25   data/eng.lstm-word-dawg
---------                     -------
 23468070                     21 files
`combine_tessdata -d` and `combine_tessdata -u` also work.
The traineddata files in the new format can be generated with
standard tools like zip or tar.
More work is needed for other training tools and big endian support.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-05 17:18:48 +01:00
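
Note on the archive format above: since such a traineddata file is an ordinary archive, its components can be inspected with standard tools. The sketch below uses Python's `zipfile` module on a tiny in-memory stand-in. It does not show how Tesseract loads the file (that goes through libarchive); `list_components` is a hypothetical helper, and the member contents are placeholders.

```python
import io
import zipfile


def list_components(archive_bytes):
    """Return {member name: uncompressed size} for the files under data/."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {
            info.filename: info.file_size
            for info in archive.infolist()
            if info.filename.startswith("data/") and not info.is_dir()
        }


# Build a small stand-in archive; names mirror the listing, contents do not.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("bagit.txt", "placeholder")
    archive.writestr("data/eng.version", "placeholder")
    archive.writestr("data/eng.unicharset", "placeholder data")

components = list_components(buffer.getvalue())
```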
Armyke
56b04d4ea7
Added the same --tmp_dir flag to tesstrain_utils.sh
2019-03-04 14:05:25 +00:00
Armyke
25fa392887
Added an optional --tmp_dir parameter to specify the directory in which tesstrain.py creates its temporary training files. The main motivation is slow read/write on HDDs: anyone who wants to speed up the process can point tmp_dir at a directory on an SSD.
2019-03-04 13:26:53 +00:00
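
Note on `--tmp_dir`: a minimal sketch of how an optional temporary-directory flag can be wired up, assuming `argparse` and `tempfile`. Only the flag name comes from the commit; the function name, the `tesstrain-` prefix, and the fallback behaviour are assumptions for illustration.

```python
import argparse
import os
import tempfile


def make_training_dir(argv):
    """Create a working directory below --tmp_dir, or the system default."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--tmp_dir", default=None,
                        help="parent directory for temporary training files")
    args = parser.parse_args(argv)
    # dir=None makes tempfile fall back to the system default location.
    return tempfile.mkdtemp(prefix="tesstrain-", dir=args.tmp_dir)


fast_disk = tempfile.mkdtemp()  # stand-in for an SSD-backed directory
work_dir = make_training_dir(["--tmp_dir", fast_disk])
```

Leaving the flag optional keeps the previous behaviour for callers that do not pass it.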
Stefan Weil
295996ed05
commandlineflags: Fix compiler warnings (signed/unsigned)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-02 14:21:04 +01:00
Stefan Weil
fb0f1bcf66
BoxChar: Fix compiler warnings (signed/unsigned)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-02 14:04:54 +01:00
Stefan Weil
0e1a1fc3cf
Validator: Fix compiler warnings (signed/unsigned)
...
This also fixes a regression in validate_grapheme_test introduced
by commit 32e9d7c8f5.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-02 13:05:03 +01:00
zdenop
2ba8e0061a
Merge branch 'master' into mya
2019-03-01 18:37:24 +01:00
zdenop
646b043d2c
use space instead of tab
2019-03-01 14:36:09 +01:00
Shree
5ee1deaea2
correct handling of 0BF0-0BFA Tamil numbers and symbols
2019-03-01 13:21:49 +00:00
zdenop
d7ddc4c5b7
Merge pull request #2270 from Shreeshrii/U_ARABIC_NUMBER
...
Treat U_ARABIC_NUMBER as LTR
2019-02-28 09:27:54 +01:00
Shree
25b02bf1f2
Treat U_ARABIC_NUMBER as LTR
2019-02-26 09:51:21 +00:00
Shreeshrii
2f71fe280c
Use an alternative way to comment out a block of code (using the C preprocessor).
...
https://github.com/tesseract-ocr/tesseract/pull/2268#pullrequestreview-207605382
Thanks @amitdo
2019-02-26 15:05:51 +05:30
Shree
449f1cd4ba
Remove test for Word started with a combiner
2019-02-25 18:47:42 +00:00
zdenop
25c43b1e7c
Merge branch 'master' into distort
2019-02-23 18:23:14 +01:00
Stefan Weil
b3e355a682
Remove whitespace at line endings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-23 17:49:56 +01:00