In tests on my pi3b+, a release build of my ghostscript integration
takes 2 minutes 27 seconds to render a PDF and OCR it with the
vanilla sources. With this NEON coded added the time drops to 37
seconds.
I have not tested the configure/Makefile changes as I'm not using
them.
- Use less noinst_LTLIBRARIES (saves build time and disk space)
- Move DISABLED_LEGACY_ENGINE from compiler flags to config_auto.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
- Replace AVX_OPT, AVX2_OPT, FMA_OPT, SSE41_OPT
- Replace AVX, AVX2, FMA, SSE4_1
- Write new HAVE_AVX, HAVE_AVX2, HAVE_FMA, HAVE_SSE4_1 into config_auto.h
- Put related conditionals in Makefile.am in one place
This makes the code clearer and fixes a log message in
IntSimdMatrixTest.AVX2.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
`tesseract --version` now also shows the version of libcurl and related
libraries if it was build with libcurl.
The preprocessor macro HAVE_LIBCURL is now defined in config_auto.h.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This allows OCR of images from the internet without downloading them first:
tesseract http://IMAGE_URL OUTPUT ...
It uses libcurl.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This allows using new features of C++17 conditionally.
Simplify also the code which checks and sets the C++ version.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
AC_CHECK_FILE does not work in cross builds. Such builds aborted.
Replace it by AC_CHECK_HEADERS. This fixes cross builds.
To enable TensorFlow in cross builds, more work is needed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This looks for one of the header files which are included by Tesseract.
It currently uses a hard coded path which works for Debian / Ubuntu.
Simplify also the rules for linking Tensorflow.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
It expects include files in /usr/include/tensorflow.
* Add configure option --with-tensorflow (disabled by default)
* Fix data type tensorflow::int64
* Remove "third_party/" in include statements
* Add dummy implementations for Backward and DebugWeights in TFNetwork
* Add files generated with protoc from tfnetwork.proto
(so the Tensorflow sources are not needed for the build)
* Update Makefiles
Signed-off-by: Stefan Weil <sw@weilnetz.de>
AX_CHECK_COMPILE_FLAG fails if it is used with -Werror and the compiler
raises error -Wextra-semi-stmt:
configure:4224: checking whether C++ compiler accepts -mavx
configure:4243: clang++-8 -c -g -O2 -Wall -Wextra -Wpedantic -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -march=native -Werror -Wno-unused-macros -mavx conftest.cpp >&5
conftest.cpp:20:3: error: empty expression statement has no effect; remove unnecessary ';' to silence this warning [-Werror,-Wextra-semi-stmt]
;
^
1 error generated.
Add -Wno-extra-semi-stmt to disable those errors if possible.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
AX_CHECK_COMPILE_FLAG fails if it is used with -Werror and the compiler
raises error -Wunused-macros. Add -Wno-unused-macros to disable those
errors if possible.
Simplify also the setting of several conditionals (AVX, AVX2, ...).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
gcc fails if an unsupported compile flag is given, but clang and clang++
normally only emit a warning "argument unused during compilation".
The old test had accepted flags like -mavx for clang++ on non Intel hosts.
This resulted in build failures because Intel code was included.
Now the check runs with -Werror, and unsupported flags are detected as
an error. This fixes the build problem with clang++ on non Intel hosts.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This requires libarchive-dev.
Tesseract can now load traineddata files in any of the archive formats
which are supported by libarchive. Example of a zipped BagIt archive:
$ unzip -l /usr/local/share/tessdata/zip.traineddata
Archive: /usr/local/share/tessdata/zip.traineddata
Length Date Time Name
--------- ---------- ----- ----
55 2019-03-05 15:27 bagit.txt
0 2019-03-05 15:25 data/
1557 2019-03-05 15:28 manifest-sha256.txt
1082890 2019-03-05 15:25 data/eng.word-dawg
1487588 2019-03-05 15:25 data/eng.lstm
7477 2019-03-05 15:25 data/eng.unicharset
63346 2019-03-05 15:25 data/eng.shapetable
976552 2019-03-05 15:25 data/eng.inttemp
13408 2019-03-05 15:25 data/eng.normproto
4322 2019-03-05 15:25 data/eng.punc-dawg
4738 2019-03-05 15:25 data/eng.lstm-number-dawg
1410 2019-03-05 15:25 data/eng.freq-dawg
844 2019-03-05 15:25 data/eng.pffmtable
6360 2019-03-05 15:25 data/eng.lstm-unicharset
1012 2019-03-05 15:25 data/eng.lstm-recoder
1047 2019-03-05 15:25 data/eng.unicharambigs
4322 2019-03-05 15:25 data/eng.lstm-punc-dawg
16109842 2019-03-05 15:25 data/eng.bigram-dawg
80 2019-03-05 15:25 data/eng.version
6426 2019-03-05 15:25 data/eng.number-dawg
3694794 2019-03-05 15:25 data/eng.lstm-word-dawg
--------- -------
23468070 21 files
`combine_tessdata -d` and `combine_tessdata -u` also work.
The traineddata files in the new format can be generated with
standard tools like zip or tar.
More work is needed for other training tools and big endian support.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
snprintf is a standard function which should be available
on all relevant platforms, so those checks are unnecessary.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
It wrongly detects old versions of ICU as valid.
Checking with pkg-config is sufficient and also sets ICU_UC_LIBS.
Signed-off-by: Stefan Weil <sw@weilnetz.de>