mirror of
https://github.com/tesseract-ocr/tesseract.git
synced 2024-12-11 23:19:04 +08:00
971c6e6d6b
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Unit Testing for Tesseract
Requirements
Files and structure
├── langdata_lstm
│   ├── common.punc
│   ├── common.unicharambigs
│   ├── desired_bigrams.txt
│   ├── eng
│   │   ├── desired_characters
│   │   ├── eng.config
│   │   ├── eng.numbers
│   │   ├── eng.punc
│   │   ├── eng.singles_text
│   │   ├── eng.training_text
│   │   ├── eng.unicharambigs
│   │   ├── eng.wordlist
│   │   └── okfonts.txt
│   ├── extended
│   │   └── extended.config
│   ├── extendedhin
│   │   └── extendedhin.config
│   ├── font_properties
│   ├── forbidden_characters_default
│   ├── hin
│   │   ├── hin.config
│   │   ├── hin.numbers
│   │   ├── hin.punc
│   │   └── hin.wordlist
│   ├── kan
│   │   └── kan.config
│   ├── kor
│   │   └── kor.config
│   ├── osd
│   │   └── osd.unicharset
│   └── radical-stroke.txt
├── tessdata
│   ├── ara.traineddata
│   ├── chi_tra.traineddata
│   ├── eng.traineddata
│   ├── heb.traineddata
│   ├── hin.traineddata
│   ├── jpn.traineddata
│   ├── kmr.traineddata
│   ├── osd.traineddata
│   └── vie.traineddata
├── tessdata_best
│   ├── eng.traineddata
│   ├── fra.traineddata
│   ├── kmr.traineddata
│   └── osd.traineddata
├── tessdata_fast
│   ├── eng.traineddata
│   ├── kmr.traineddata
│   ├── osd.traineddata
│   └── script
│       └── Latin.traineddata
└── tesseract
    ├── abseil
    ...
    ├── test
    ├── unittest
    └── VERSION
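The layout above can be checked before building. The following is a minimal sketch, not part of the Tesseract tooling: the function name `missing_test_dirs` is hypothetical, and it assumes the directory passed as the first argument is the parent that holds the `tesseract` checkout and its sibling data directories.

```shell
# Hypothetical helper: print every top-level directory from the tree above
# that is missing under the directory given as $1.
# Assumption: $1 is the parent of the tesseract checkout.
missing_test_dirs() {
  for dir in langdata_lstm tessdata tessdata_best tessdata_fast tesseract; do
    # Report the name only when the directory does not exist.
    [ -d "$1/$dir" ] || printf '%s\n' "$dir"
  done
}
```

Run from inside the tesseract folder, `missing_test_dirs ..` prints nothing when all five directories are present. It only checks the top level, not the individual files inside each directory.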
Fonts
- Microsoft fonts: arialbi.ttf, times.ttf, verdana.ttf - installation guide
- ae_Arab.ttf
- dejavu-fonts: DejaVuSans-ExtraLight.ttf
- Lohit-Hindi.ttf
- UnBatang.ttf
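A quick way to see which of these fonts are still missing is sketched below. This README does not say where the tests look for fonts, so the search directory is left as an argument; the function name `missing_test_fonts` is hypothetical.

```shell
# Hypothetical helper: print each font file from the list above that is not
# found anywhere below the directory given as $1.
# Assumption: the caller knows which directory the fonts were installed to.
missing_test_fonts() {
  for font in arialbi.ttf times.ttf verdana.ttf ae_Arab.ttf \
              DejaVuSans-ExtraLight.ttf Lohit-Hindi.ttf UnBatang.ttf; do
    # find prints a path for every match; empty output means not found.
    if [ -z "$(find "$1" -type f -name "$font" 2>/dev/null)" ]; then
      printf '%s\n' "$font"
    fi
  done
}
```

For example, `missing_test_fonts /usr/share/fonts` lists the fonts that still need to be installed on a system that keeps fonts there; the path is only an illustration.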
Run tests
To run the tests, run the following commands in the tesseract folder:
autoreconf -fiv
git submodule update --init
export TESSDATA_PREFIX=/prefix/to/path/to/tessdata
make check
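Because the commands above depend on `TESSDATA_PREFIX` being set correctly, a small guard can catch a wrong path before the test run starts. This is a sketch under assumptions: the function name `check_tessdata_prefix` is hypothetical, and using `eng.traineddata` as the marker file is a guess based on the `tessdata` listing in this README rather than a documented requirement.

```shell
# Hypothetical guard: succeed only if TESSDATA_PREFIX is set and the
# directory it names contains eng.traineddata (assumed marker file).
check_tessdata_prefix() {
  if [ -z "${TESSDATA_PREFIX:-}" ]; then
    echo "TESSDATA_PREFIX is not set" >&2
    return 1
  fi
  if [ ! -f "$TESSDATA_PREFIX/eng.traineddata" ]; then
    echo "eng.traineddata not found in $TESSDATA_PREFIX" >&2
    return 1
  fi
  return 0
}
```

With the function defined, `check_tessdata_prefix && make check` runs the tests only when the data directory looks usable.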