Shreeshrii
8749f3553e
LINEDATA=false
2019-03-23 19:16:49 +05:30
Shree
bcb7cf9846
sort arguments, use true/false instead of 1/0
2019-03-23 12:28:53 +00:00
Shree
c2db272134
Modify distort_image for Boolean
2019-03-22 17:02:46 +00:00
Shree
9b915d5efb
add --distort_image
2019-03-22 05:39:38 +00:00
Shree
f7ffde99d5
add --distort_image
2019-03-22 05:34:00 +00:00
zdenop
ac7ea4322a
Merge pull request #2335 from Shreeshrii/master
...
Changes to tesstrain.py - max_workers=8, distort_image=false
2019-03-17 15:27:34 +01:00
zdenop
26877ba703
check min. python version; os.uname is not available on windows
2019-03-17 15:25:48 +01:00
Shreeshrii
f8e8521606
Update tesstrain_utils.py
2019-03-17 15:32:35 +05:30
Shree
6fa8e1bb15
Set max_workers=8
2019-03-17 09:58:11 +00:00
Shree
e21499e81e
Set default value for distort_image
2019-03-17 09:54:16 +00:00
Stefan Weil
ee2f9bf7bf
Remove old comments in file headers
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-16 10:55:00 +01:00
Shree
d47b0d588a
Use LATIN_FONTS for kmr
2019-03-15 15:47:56 +00:00
Shree
3eee1d217a
Add kmr and kur_ara, remove kur from training scripts
2019-03-15 15:37:49 +00:00
Shree
b2ebf0195f
Add kmr and kur_ara, remove kur from training scripts
2019-03-15 14:39:39 +00:00
Shree
37befdf6c4
Add option for --distort_image
2019-03-15 13:32:36 +00:00
Stefan Weil
896698a4f5
Fix runtime error (left shift of negative value)
...
Runtime error:
src/training/util.h:37:28: runtime error: left shift of negative value -17
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-12 06:56:54 +01:00
Stefan Weil
5202208a8c
Remove globals.h
...
It only included other files which are already included where needed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-11 19:01:23 +01:00
zdenop
f80085c0bf
Merge pull request #2289 from Armyke/master
...
Added an additional optional --tmp_dir parameter to specify the tempo…
2019-03-06 15:03:14 +01:00
Stefan Weil
1c7e00611b
Add initial support for traineddata files in standard archive formats
...
This requires libarchive-dev.
Tesseract can now load traineddata files in any of the archive formats
which are supported by libarchive. Example of a zipped BagIt archive:
$ unzip -l /usr/local/share/tessdata/zip.traineddata
Archive:  /usr/local/share/tessdata/zip.traineddata
  Length      Date    Time    Name
---------  ---------- -----   ----
       55  2019-03-05 15:27   bagit.txt
        0  2019-03-05 15:25   data/
     1557  2019-03-05 15:28   manifest-sha256.txt
  1082890  2019-03-05 15:25   data/eng.word-dawg
  1487588  2019-03-05 15:25   data/eng.lstm
     7477  2019-03-05 15:25   data/eng.unicharset
    63346  2019-03-05 15:25   data/eng.shapetable
   976552  2019-03-05 15:25   data/eng.inttemp
    13408  2019-03-05 15:25   data/eng.normproto
     4322  2019-03-05 15:25   data/eng.punc-dawg
     4738  2019-03-05 15:25   data/eng.lstm-number-dawg
     1410  2019-03-05 15:25   data/eng.freq-dawg
      844  2019-03-05 15:25   data/eng.pffmtable
     6360  2019-03-05 15:25   data/eng.lstm-unicharset
     1012  2019-03-05 15:25   data/eng.lstm-recoder
     1047  2019-03-05 15:25   data/eng.unicharambigs
     4322  2019-03-05 15:25   data/eng.lstm-punc-dawg
 16109842  2019-03-05 15:25   data/eng.bigram-dawg
       80  2019-03-05 15:25   data/eng.version
     6426  2019-03-05 15:25   data/eng.number-dawg
  3694794  2019-03-05 15:25   data/eng.lstm-word-dawg
---------                     -------
 23468070                     21 files
`combine_tessdata -d` and `combine_tessdata -u` also work.
The traineddata files in the new format can be generated with
standard tools like zip or tar.
More work is needed for other training tools and big endian support.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-05 17:18:48 +01:00
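Note on 1c7e00611b: the remark about generating traineddata files with standard tools can be sketched roughly as follows, here with tar. This is only an illustration under assumptions: the file contents are placeholders, the output name tar.traineddata and the BagIt version number are hypothetical, and the commit message does not say whether Tesseract requires bagit.txt or a manifest. Only the data/ layout is taken from the unzip listing in the commit message.

```shell
# Sketch only: placeholder files stand in for real traineddata components.
workdir=$(mktemp -d)
mkdir -p "${workdir}/bag/data"

# BagIt declaration file (the version number here is an assumption).
printf 'BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n' \
  > "${workdir}/bag/bagit.txt"

# Hypothetical stand-ins for components such as eng.lstm and eng.unicharset.
printf 'placeholder\n' > "${workdir}/bag/data/eng.lstm"
printf 'placeholder\n' > "${workdir}/bag/data/eng.unicharset"

# Pack with paths relative to the bag root, mirroring the layout in the listing.
tar -cf "${workdir}/tar.traineddata" -C "${workdir}/bag" bagit.txt data

# List the archive members to check the layout.
listing=$(tar -tf "${workdir}/tar.traineddata")
echo "${listing}"
```

Since the commit states that `combine_tessdata -d` and `combine_tessdata -u` work with the new format, running one of them on a real archive built this way would be a natural follow-up check.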
Armyke
56b04d4ea7
Added the same --tmp_dir flag to tesstrain_utils.sh
2019-03-04 14:05:25 +00:00
Armyke
25fa392887
Added an optional --tmp_dir parameter to specify the temporary directory in which tesstrain.py creates its temporary training files. The main reason is slow read/write on HDDs; anyone who wants to speed up the process can use a directory on an SSD as tmp_dir.
2019-03-04 13:26:53 +00:00
Stefan Weil
295996ed05
commandlineflags: Fix compiler warnings (signed/unsigned)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-02 14:21:04 +01:00
Stefan Weil
fb0f1bcf66
BoxChar: Fix compiler warnings (signed/unsigned)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-02 14:04:54 +01:00
Stefan Weil
0e1a1fc3cf
Validator: Fix compiler warnings (signed/unsigned)
...
This also fixes a regression in validate_grapheme_test introduced
by commit 32e9d7c8f5.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-02 13:05:03 +01:00
zdenop
2ba8e0061a
Merge branch 'master' into mya
2019-03-01 18:37:24 +01:00
zdenop
646b043d2c
use space instead of tab
2019-03-01 14:36:09 +01:00
Shree
5ee1deaea2
correct handling of 0BF0-0BFA Tamil numbers and symbols
2019-03-01 13:21:49 +00:00
zdenop
d7ddc4c5b7
Merge pull request #2270 from Shreeshrii/U_ARABIC_NUMBER
...
Treat U_ARABIC_NUMBER as LTR
2019-02-28 09:27:54 +01:00
Shree
25b02bf1f2
Treat U_ARABIC_NUMBER as LTR
2019-02-26 09:51:21 +00:00
Shreeshrii
2f71fe280c
Use alternative way to comment a block of code (using the c preprocessor).
...
https://github.com/tesseract-ocr/tesseract/pull/2268#pullrequestreview-207605382
Thanks @amitdo
2019-02-26 15:05:51 +05:30
Shree
449f1cd4ba
Remove test for Word started with a combiner
2019-02-25 18:47:42 +00:00
zdenop
25c43b1e7c
Merge branch 'master' into distort
2019-02-23 18:23:14 +01:00
Stefan Weil
b3e355a682
Remove whitespace at line endings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-23 17:49:56 +01:00
Shreeshrii
34e4d6b1d7
Revert to 0 (50% of images inverted).
2019-02-23 17:59:00 +05:30
Shreeshrii
287d5341bf
TODO
2019-02-23 17:56:02 +05:30
Shreeshrii
3e3e1ed55d
Remove commented Code
2019-02-23 17:54:00 +05:30
Shree
2aded47a3c
Implement distort_image in text2image - default false
2019-02-22 12:27:27 +00:00
Shree
49ed3a72d4
implement PrepareDistortedPix as part of DegradeImage
2019-02-21 14:48:29 +00:00
Stefan Weil
b3bd23edb7
Remove whitespace at line endings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-19 13:53:31 +01:00
Stefan Weil
b95598a0b1
Merge pull request #2070 from pndaza/master
...
add missed letters ( ၌ ၍ ၎ ၏ ) and symbols ( ၊ ။ ) - 0x104a to 0x104f -
2019-02-19 12:22:53 +01:00
Shree
a044f64375
fix Myanmar validation rules as per Unicode charts
2019-02-15 04:40:55 +00:00
Shreeshrii
c28a68115e
Merge branch 'master' into boxtiff
2019-02-02 23:42:39 +05:30
Shree Devi Kumar
d9590f8adf
allow user specified box/tiff pairs with tesstrain.sh
2019-02-02 11:35:45 +00:00
Shree Devi Kumar
323361b902
allow user specified box/tiff pairs with tesstrain.sh
2019-02-02 11:33:32 +00:00
Shree Devi Kumar
ad223296af
use --xsize instead of --x_size
...
(cherry picked from commit 94b8988b8cca3812137933db00750bd6e2e84e32)
2019-02-02 11:08:34 +00:00
Shree Devi Kumar
4d9bc11fd3
add --xsize as parameter for tesstrain
2019-01-27 07:00:25 +00:00
zdenop
059c50be8c
Merge pull request #2184 from stweil/tests
...
Fix and enable stringrenderer_test
2019-01-24 07:59:07 +01:00
Diego de la Hera
1a398a5b5d
removed reference to unbound variable
2019-01-23 15:04:16 -03:00
Stefan Weil
ecf73f5bc7
training: Don't terminate after processing 8 fonts or 8 images
...
tesstrain_utils.sh sets the shell flag -e, so it exits immediately
if a command exits with a non-zero status.
The following command returns a non-zero status as soon as counter is a
multiple of par_factor (par_factor=8, that means as soon as 8 fonts or
images are processed):
let rem=counter%par_factor
The new code fixes this undesired exit.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-23 17:26:40 +01:00
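Note on ecf73f5bc7: the behaviour described in the commit message can be reproduced with a small experiment. The sketch below runs the `let` statement in child bash processes so that its exit status can be inspected safely. The last part shows one common alternative (arithmetic expansion in an assignment); that is an assumption for illustration, not necessarily the change the commit actually made.

```shell
# `let` is a bash builtin; it returns status 1 when its expression evaluates to 0.
let_status=0
bash -c 'counter=8; par_factor=8; let rem=counter%par_factor' || let_status=$?
echo "let exit status: ${let_status}"

# Under `set -e` that non-zero status ends the script before "reached" is printed.
aborted_output=$(bash -c 'set -e; counter=8; par_factor=8; let rem=counter%par_factor; echo reached' || true)
echo "output under set -e: '${aborted_output}'"

# An assignment using arithmetic expansion returns status 0 whatever the result is.
safe_output=$(bash -c 'set -e; counter=8; par_factor=8; rem=$((counter % par_factor)); echo "reached rem=${rem}"')
echo "output with arithmetic expansion: '${safe_output}'"
```

With counter=8 and par_factor=8 the remainder is 0, which is exactly the case that made the script stop after 8 fonts or images.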
Stefan Weil
32e9d7c8f5
training: Fix some compiler warnings (signed/unsigned)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-23 13:55:13 +01:00