mirror of
https://github.com/tesseract-ocr/tesseract.git
synced 2024-11-27 20:59:36 +08:00
872816897a
Avoid 1) floating point division by 127, 2) conversion of bias to double, 3) FP addition, in favour of 1) integer multiplication by 127, and 2) integer addition. (Also costs extra work in the serialisation/deserialisation of the scale values, and conversion of weights to int formats, but these are all one offs).
fuzzers
syntaxnet
third_party/utf
util/utf8
apiexample_test.cc
applybox_test.cc
baseapi_test.cc
baseapi_thread_test.cc
bitvector_test.cc
cleanapi_test.cc
colpartition_test.cc
commandlineflags_test.cc
cycletimer.h
dawg_test.cc
denorm_test.cc
equationdetect_test.cc
fileio_test.cc
heap_test.cc
imagedata_test.cc
include_gunit.h
indexmapbidi_test.cc
intfeaturemap_test.cc
intsimdmatrix_test.cc
lang_model_test.cc
layout_test.cc
ligature_table_test.cc
linlsq_test.cc
loadlang_test.cc
log.h
lstm_recode_test.cc
lstm_squashed_test.cc
lstm_test.cc
lstm_test.h
lstmtrainer_test.cc
Makefile.am
mastertrainer_test.cc
matrix_test.cc
networkio_test.cc
normstrngs_test.cc
normstrngs_test.h
nthitem_test.cc
osd_test.cc
pagesegmode_test.cc
pango_font_info_test.cc
paragraphs_test.cc
params_model_test.cc
progress_test.cc
qrsequence_test.cc
README.md
recodebeam_test.cc
rect_test.cc
resultiterator_test.cc
scanutils_test.cc
shapetable_test.cc
stats_test.cc
stridemap_test.cc
stringrenderer_test.cc
tablefind_test.cc
tablerecog_test.cc
tabvector_test.cc
tatweel_test.cc
textlineprojection_test.cc
tfile_test.cc
unichar_test.cc
unicharcompress_test.cc
unicharset_test.cc
validate_grapheme_test.cc
validate_indic_test.cc
validate_khmer_test.cc
validate_myanmar_test.cc
validator_test.cc
Unit Testing for Tesseract
Requirements
Files and structure
├── langdata_lstm
│ ├── common.punc
│ ├── common.unicharambigs
│ ├── desired_bigrams.txt
│ ├── eng
│ │ ├── desired_characters
│ │ ├── eng.config
│ │ ├── eng.numbers
│ │ ├── eng.punc
│ │ ├── eng.singles_text
│ │ ├── eng.training_text
│ │ ├── eng.unicharambigs
│ │ ├── eng.wordlist
│ │ └── okfonts.txt
│ ├── extended
│ │ └── extended.config
│ ├── extendedhin
│ │ └── extendedhin.config
│ ├── font_properties
│ ├── forbidden_characters_default
│ ├── hin
│ │ ├── hin.config
│ │ ├── hin.numbers
│ │ ├── hin.punc
│ │ └── hin.wordlist
│ ├── kan
│ │ └── kan.config
│ ├── kor
│ │ └── kor.config
│ ├── osd
│ │ └── osd.unicharset
│ └── radical-stroke.txt
├── tessdata
│ ├── ara.traineddata
│ ├── chi_tra.traineddata
│ ├── eng.traineddata
│ ├── heb.traineddata
│ ├── hin.traineddata
│ ├── jpn.traineddata
│ ├── kmr.traineddata
│ ├── osd.traineddata
│ └── vie.traineddata
├── tessdata_best
│ ├── eng.traineddata
│ ├── fra.traineddata
│ ├── kmr.traineddata
│ └── osd.traineddata
├── tessdata_fast
│ ├── eng.traineddata
│ ├── kmr.traineddata
│ ├── osd.traineddata
│ └── script
│   └── Latin.traineddata
└── tesseract
  ├── abseil
  ...
  ├── test
  ├── unittest
  └── VERSION
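The layout above can be checked before building. The following is a minimal sketch, not part of the Tesseract repository: `check_layout` is a hypothetical helper name, and it only verifies that the five top-level directories shown in the tree exist under a given root.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not shipped with Tesseract):
# report which top-level directories from the tree above are missing.
check_layout() {
  root="$1"
  missing=""
  for dir in langdata_lstm tessdata tessdata_best tessdata_fast tesseract; do
    if [ ! -d "$root/$dir" ]; then
      missing="$missing $dir"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "layout ok"
}
```

Run from inside the tesseract folder, `check_layout ..` would inspect the parent directory that is expected to hold the data directories alongside the source tree.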
Fonts
- Microsoft fonts: arialbi.ttf, times.ttf, verdana.ttf - installation guide
- ae_Arab.ttf
- dejavu-fonts: DejaVuSans-ExtraLight.ttf
- Lohit-Hindi.ttf
- UnBatang.ttf
Run tests
To run the tests, run the following commands in the tesseract folder:
autoreconf -fiv
git submodule update --init
export TESSDATA_PREFIX=/prefix/to/path/to/tessdata
make check
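Since the tests depend on `TESSDATA_PREFIX`, a pre-flight check can catch a wrong path before `make check` starts. This is a sketch under assumptions: `check_tessdata_prefix` is a hypothetical function name, and it assumes that finding `eng.traineddata` (listed under tessdata above) is a reasonable sign that the variable points at the right directory.

```shell
#!/bin/sh
# Hypothetical pre-flight check (an assumption, not part of Tesseract):
# confirm TESSDATA_PREFIX is set and contains eng.traineddata.
check_tessdata_prefix() {
  if [ -z "${TESSDATA_PREFIX:-}" ]; then
    echo "TESSDATA_PREFIX is not set"
    return 1
  fi
  if [ ! -f "$TESSDATA_PREFIX/eng.traineddata" ]; then
    echo "no eng.traineddata in $TESSDATA_PREFIX"
    return 1
  fi
  echo "tessdata ok"
}
```

One way to use it is `check_tessdata_prefix && make check`, so the test run is skipped when the data directory is not found.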