mirror of
https://github.com/tesseract-ocr/tesseract.git
Unit Testing for Tesseract
Requirements
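Files and structure
The tests expect the following layout, with the data directories placed next to the tesseract checkout: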
├── langdata_lstm
│ ├── common.punc
│ ├── common.unicharambigs
│ ├── desired_bigrams.txt
│ ├── eng
│ │ ├── desired_characters
│ │ ├── eng.config
│ │ ├── eng.numbers
│ │ ├── eng.punc
│ │ ├── eng.singles_text
│ │ ├── eng.training_text
│ │ ├── eng.unicharambigs
│ │ ├── eng.wordlist
│ │ └── okfonts.txt
│ ├── extended
│ │ └── extended.config
│ ├── extendedhin
│ │ └── extendedhin.config
│ ├── font_properties
│ ├── forbidden_characters_default
│ ├── hin
│ │ ├── hin.config
│ │ ├── hin.numbers
│ │ ├── hin.punc
│ │ └── hin.wordlist
│ ├── kan
│ │ └── kan.config
│ ├── kor
│ │ └── kor.config
│ ├── osd
│ │ └── osd.unicharset
│ └── radical-stroke.txt
├── tessdata
│ ├── ara.traineddata
│ ├── chi_tra.traineddata
│ ├── eng.traineddata
│ ├── heb.traineddata
│ ├── hin.traineddata
│ ├── jpn.traineddata
│ ├── kmr.traineddata
│ ├── osd.traineddata
│ └── vie.traineddata
├── tessdata_best
│ ├── eng.traineddata
│ ├── fra.traineddata
│ ├── kmr.traineddata
│ └── osd.traineddata
├── tessdata_fast
│ ├── eng.traineddata
│ ├── kmr.traineddata
│ ├── osd.traineddata
│ └── script
│   └── Latin.traineddata
└── tesseract
  ├── abseil
  ...
  ├── test
  ├── unittest
  └── VERSION
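Before running the suite, it can help to confirm that the data directories are in place. The snippet below is a minimal sketch, not part of tesseract. It defines a hypothetical helper, check_test_layout, and assumes its ROOT argument is the parent directory of the tesseract checkout. It checks only a representative subset of the paths in the tree above, and you can extend the list as needed.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesseract): check that a representative
# subset of the test-data layout exists under ROOT, assumed to be the
# parent directory of the tesseract checkout.
# Prints each missing path to stderr and returns 1 if anything is missing.
check_test_layout() {
  root=$1
  missing=0
  for path in \
    langdata_lstm/eng/eng.config \
    langdata_lstm/font_properties \
    langdata_lstm/radical-stroke.txt \
    tessdata/eng.traineddata \
    tessdata/osd.traineddata \
    tessdata_best/eng.traineddata \
    tessdata_fast/eng.traineddata \
    tessdata_fast/script/Latin.traineddata \
    tesseract/unittest
  do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $root/$path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, from inside the tesseract folder you could run: check_test_layout .. && echo "layout OK"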
Fonts
The following fonts must be installed:
- Microsoft fonts: arialbi.ttf, times.ttf, verdana.ttf - installation guide
- ae_Arab.ttf
- dejavu-fonts: DejaVuSans-ExtraLight.ttf
- Lohit-Hindi.ttf
- UnBatang.ttf
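One way to see which of these fonts are missing is to search your font directories for the listed file names. This is a sketch under stated assumptions. check_test_fonts is a hypothetical helper. It matches exact file names only, so a font installed under a different name or case will be reported as missing. It also assumes you pass the directories to search, for example /usr/share/fonts and ~/.fonts on a typical Linux system.

```shell
#!/bin/sh
# Hypothetical helper: report which required font files are not found under
# the given directories. Matching is by exact file name (symlinks included).
check_test_fonts() {
  if [ "$#" -eq 0 ]; then
    echo "usage: check_test_fonts DIR..." >&2
    return 2
  fi
  status=0
  for font in arialbi.ttf times.ttf verdana.ttf ae_Arab.ttf \
              DejaVuSans-ExtraLight.ttf Lohit-Hindi.ttf UnBatang.ttf
  do
    # find accepts several starting directories; errors for absent ones
    # are discarded so the remaining directories are still searched.
    found=$(find "$@" -name "$font" 2>/dev/null | head -n 1)
    if [ -z "$found" ]; then
      echo "font not found: $font" >&2
      status=1
    fi
  done
  return "$status"
}
```

Example: check_test_fonts /usr/share/fonts "$HOME/.fonts"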
Run tests
To run the tests, run the following commands in the tesseract folder:
autoreconf -fiv
git submodule update --init
export TESSDATA_PREFIX=/prefix/to/path/to/tessdata
make check
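If make check fails early, a misconfigured TESSDATA_PREFIX is one thing to rule out. The sketch below defines a hypothetical pre-flight helper, check_tessdata_prefix. It assumes that TESSDATA_PREFIX should point at the tessdata directory from the layout above, so that directory should contain eng.traineddata.

```shell
#!/bin/sh
# Hypothetical pre-flight helper: fail early if TESSDATA_PREFIX is unset or
# does not name a directory containing eng.traineddata (assumed layout).
check_tessdata_prefix() {
  if [ -z "${TESSDATA_PREFIX:-}" ]; then
    echo "TESSDATA_PREFIX is not set" >&2
    return 1
  fi
  if [ ! -f "$TESSDATA_PREFIX/eng.traineddata" ]; then
    echo "no eng.traineddata in $TESSDATA_PREFIX" >&2
    return 1
  fi
  return 0
}
```

It can be chained in front of the last step, for example: check_tessdata_prefix && make check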