Flag neon_available_ is automatically set to true when __aarch64__ is defined,
but the actual check for neon_available_ required having also HAVE_NEON defined.
Now we check the flag also when only __aarch64__ is defined.
We also move to relying on both scales and output having been
padded to accomodate us writing more results than are actually
needed here. This was allowed for a few commits back.
Avoid 1) floating point division by 127, 2) conversion of
bias to double, 3) FP addition, in favour of 1) integer
multiplication by 127, and 2) integer addition.
(Also costs extra work in the serialisation/deserialisation of
the scale values, and conversion of weights to int formats, but
these are all one offs).
Currently, the size of the scales array is not rounded up
in the same way as the weights are. This blocks us pushing
the scale calculations into the SIMD, as when we "overread"
the end of the scale array, we potentially get errors.
Here, we adjust the intSimdMatrix stuff to ensure that the
scales array reserves enough entries to allow such overreads
to work.
This doesn't make any difference for now, but opens the way
for future optimisations.
The intsimdmatrix mechanism ensures that inputs would be
resized so that we'd only ever get "whole blocks" of data.
I'd assumed that that meant the same thing for scales/outputs
too, but this appears not to to be the case, as we can get
called (sometimes) with num_out % 8 == 7.
Possibly we could benefit from resizing those matrices so
that special cases in this innermost loop are not actually
required, but unless and until that is done, let's fix the
inner loop.
In tests on my pi3b+, a release build of my ghostscript integration
takes 2 minutes 27 seconds to render a PDF and OCR it with the
vanilla sources. With this NEON coded added the time drops to 37
seconds.
I have not tested the configure/Makefile changes as I'm not using
them.
This won't affect anything using the supplied build system. For
other projects that include tesseract within them, however, this
may make their life easier.
For example, I have an integration of Tesseract with Ghostscript,
in which tesseract is built as part of the Ghostscript build,
without using the tesseract build system.
The Ghostscript build system is makefile based, and has to work
on a range of make systems, including unix make, gnu make and
nmake. As such we have to avoid conditionals in the common
makefiles. It therefore becomes hard to build one set of files on
x86 systems, and another on (say) ARM systems.
Accordingly, this commit makes small tweaks to the architecture
specific files, so that they compile on EVERY platform; just they
only compile to anything useful on the appropriate platform.
Thus the makefiles can build all the files on all the systems, and
the preprocessor flags mean that the correct functions are actually
built.
- Replace AVX_OPT, AVX2_OPT, FMA_OPT, SSE41_OPT
- Replace AVX, AVX2, FMA, SSE4_1
- Write new HAVE_AVX, HAVE_AVX2, HAVE_FMA, HAVE_SSE4_1 into config_auto.h
- Put related conditionals in Makefile.am in one place
This makes the code clearer and fixes a log message in
IntSimdMatrixTest.AVX2.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The code was modernized using clang-tidy with "modernize-use-using".
The modified files were then formatted using clang-tidy with
"google-readability-braces-around-statements", then clang-format
was applied.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The modifications were done using this command:
run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-auto' -fix
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Commit d36231e3e4 did not distinguish
between AVX and AVX2, so AVX2 code was enabled for IntSimdMatrix
even when only AVX was supported.
This resulted in an illegal instruction.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
* Move IntDotProductSSE. That allows inlining of the code.
* Improve IntDotProductSSE by moving some instructions.
* Remove unused num_input_groups_ from IntSimdMatrix.
* Re-order elements in IntSimdMatrix to avoid padding.
Signed-off-by: Stefan Weil <sw@weilnetz.de>