mirror of
https://github.com/tesseract-ocr/tesseract.git
synced 2024-11-27 12:49:35 +08:00
d9ae7ecc49
This is a lightweight, semi-Pythonic conversion of tesstrain.sh that currently supports only LSTM and not the Tesseract 3 training mode. I attempted to keep source changes minimal so it would be easy to compare bash to Python in code review and confirm equivalence.

Python 3.6+ is required. Ubuntu 18.04 ships Python 3.6 and it is a mandatory package (the package manager is also written in Python), so it is available in the baseline Tesseract 4.0 system.

There are minor output and behavioral changes, along with some advantages. Python's logging is used. Temporary files are only deleted on success, so they can be inspected if training fails. Console output is more terse and the log file is more verbose. And there are progress bars! (The python3-tqdm package is required.)

Where tesstrain.sh would sometimes fail without explanation and return an error code of 1, it is much easier to find the point of failure in this version. That was also the main motivation for this work. Argument checking is also more comprehensive.
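The "delete temporary files only on success" behavior described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from the actual conversion: the names `run_with_workdir`, `training_sketch`, and the `step` callback are hypothetical and chosen for this example only.

```python
import logging
import shutil
import tempfile
from pathlib import Path

# Hypothetical logger name, used only in this sketch.
log = logging.getLogger("training_sketch")


def run_with_workdir(step):
    """Run step(workdir) and delete workdir only if step succeeds."""
    workdir = Path(tempfile.mkdtemp(prefix="training_sketch_"))
    log.info("Working directory: %s", workdir)
    try:
        result = step(workdir)
    except Exception:
        # On failure, keep the files so they can be inspected afterwards,
        # and record the traceback so the point of failure is easy to find.
        log.exception("Step failed; keeping %s for inspection", workdir)
        raise
    # On success there is nothing left to inspect, so clean up.
    shutil.rmtree(workdir)
    return result
```

A caller would pass a function that does its work inside the given directory. If that function raises, the directory survives and its path appears in the log.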
119 lines
1.5 KiB
Plaintext
*~
# Windows
*.user
*.log
*.tlog
*.cache
*.obj
*.sdf
*.opensdf
*.lastbuildstate
*.unsuccessfulbuild
*.suo
*.res
*.ipch
*.manifest

# Linux
# ignore local configuration
config.*
config/*
Makefile
Makefile.in
*.m4

# ignore help scripts/files
configure
libtool
stamp-h1
tesseract.pc
config_auto.h
/doc/html/*
/doc/*.1
/doc/*.5
/doc/*.html
/doc/*.xml

# generated version file
/src/api/tess_version.h

# executables
/src/api/tesseract
/src/training/ambiguous_words
/src/training/classifier_tester
/src/training/cntraining
/src/training/combine_tessdata
/src/training/dawg2wordlist
/src/training/merge_unicharsets
/src/training/mftraining
/src/training/set_unicharset_properties
/src/training/shapeclustering
/src/training/text2image
/src/training/unicharset_extractor
/src/training/wordlist2dawg

*.patch

# files generated by libtool
/src/training/combine_lang_model
/src/training/lstmeval
/src/training/lstmtraining

# ignore compilation files
build/*
/bin
*/.deps/*
*/.libs/*
*/*/.deps/*
*/*/.libs/*
*.lo
*.la
*.o
*.Plo
*.a
*.class
*.jar
__pycache__

# tessdata
*.traineddata

# OpenCL
tesseract_opencl_profile_devices.dat
kernel*.bin

# build dirs
/build*
/.cppan
/cppan
/*.dll
/*.lib
/*.exe
/*.lnk
/win*
.vs*
.s*

# files generated by "make check"
/tests/.dirstamp
/unittest/*.trs

# test programs
/unittest/*_test
/unittest/primesbitvector
/unittest/primesmap
/unittest/tesseracttests

# generated files from unlvtests
times.txt
/unlvtests/results*

# snap packaging specific rules
/parts/
/stage/
/prime/
/snap/.snapcraft/

/*.snap
/*_source.tar.bz2