This allows OCR of images from the internet without downloading them first:
tesseract http://IMAGE_URL OUTPUT ...
It uses libcurl.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This allows using new features of C++17 conditionally.
Also simplify the code which checks and sets the C++ version.
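A minimal sketch of how code can use a C++17 feature conditionally, assuming the compiler reports the language level through the standard `__cplusplus` macro. The helper name `ClampToRange` is hypothetical and not taken from the Tesseract sources:

```cpp
#include <algorithm>  // std::clamp (C++17), std::min, std::max

// Hypothetical helper, not part of Tesseract: limit value to [lo, hi].
inline int ClampToRange(int value, int lo, int hi) {
#if __cplusplus >= 201703L
  // C++17 or newer: use the standard library function.
  return std::clamp(value, lo, hi);
#else
  // Older standard: equivalent fallback, valid for lo <= hi.
  return std::max(lo, std::min(value, hi));
#endif
}
```

Some compilers need an extra option before `__cplusplus` reflects the selected standard, so a real check may have to be more careful than this one.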
Signed-off-by: Stefan Weil <sw@weilnetz.de>
AC_CHECK_FILE does not work in cross builds, so such builds aborted.
Replace it with AC_CHECK_HEADERS. This fixes cross builds.
To enable TensorFlow in cross builds, more work is needed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This looks for one of the header files which are included by Tesseract.
It currently uses a hard-coded path which works for Debian / Ubuntu.
Also simplify the rules for linking TensorFlow.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
It expects include files in /usr/include/tensorflow.
* Add configure option --with-tensorflow (disabled by default)
* Fix data type tensorflow::int64
* Remove "third_party/" in include statements
* Add dummy implementations for Backward and DebugWeights in TFNetwork
* Add files generated with protoc from tfnetwork.proto
(so the TensorFlow sources are not needed for the build)
* Update Makefiles
Signed-off-by: Stefan Weil <sw@weilnetz.de>
AX_CHECK_COMPILE_FLAG fails if it is used with -Werror and the compiler
raises error -Wextra-semi-stmt:
configure:4224: checking whether C++ compiler accepts -mavx
configure:4243: clang++-8 -c -g -O2 -Wall -Wextra -Wpedantic -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -march=native -Werror -Wno-unused-macros -mavx conftest.cpp >&5
conftest.cpp:20:3: error: empty expression statement has no effect; remove unnecessary ';' to silence this warning [-Werror,-Wextra-semi-stmt]
  ;
  ^
1 error generated.
Add -Wno-extra-semi-stmt to disable those errors if possible.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
AX_CHECK_COMPILE_FLAG fails if it is used with -Werror and the compiler
raises error -Wunused-macros. Add -Wno-unused-macros to disable those
errors if possible.
Also simplify the setting of several conditionals (AVX, AVX2, ...).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
gcc fails if an unsupported compile flag is given, but clang and clang++
normally only emit a warning "argument unused during compilation".
The old test therefore accepted flags like -mavx for clang++ on non-Intel hosts.
This resulted in build failures because Intel code was included.
Now the check runs with -Werror, so unsupported flags are detected as
an error. This fixes the build problem with clang++ on non-Intel hosts.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This requires libarchive-dev.
Tesseract can now load traineddata files in any of the archive formats
which are supported by libarchive. Example of a zipped BagIt archive:
$ unzip -l /usr/local/share/tessdata/zip.traineddata
Archive:  /usr/local/share/tessdata/zip.traineddata
  Length      Date    Time    Name
---------  ---------- -----   ----
       55  2019-03-05 15:27   bagit.txt
        0  2019-03-05 15:25   data/
     1557  2019-03-05 15:28   manifest-sha256.txt
  1082890  2019-03-05 15:25   data/eng.word-dawg
  1487588  2019-03-05 15:25   data/eng.lstm
     7477  2019-03-05 15:25   data/eng.unicharset
    63346  2019-03-05 15:25   data/eng.shapetable
   976552  2019-03-05 15:25   data/eng.inttemp
    13408  2019-03-05 15:25   data/eng.normproto
     4322  2019-03-05 15:25   data/eng.punc-dawg
     4738  2019-03-05 15:25   data/eng.lstm-number-dawg
     1410  2019-03-05 15:25   data/eng.freq-dawg
      844  2019-03-05 15:25   data/eng.pffmtable
     6360  2019-03-05 15:25   data/eng.lstm-unicharset
     1012  2019-03-05 15:25   data/eng.lstm-recoder
     1047  2019-03-05 15:25   data/eng.unicharambigs
     4322  2019-03-05 15:25   data/eng.lstm-punc-dawg
 16109842  2019-03-05 15:25   data/eng.bigram-dawg
       80  2019-03-05 15:25   data/eng.version
     6426  2019-03-05 15:25   data/eng.number-dawg
  3694794  2019-03-05 15:25   data/eng.lstm-word-dawg
---------                     -------
 23468070                     21 files
`combine_tessdata -d` and `combine_tessdata -u` also work.
The traineddata files in the new format can be generated with
standard tools like zip or tar.
More work is needed for other training tools and big endian support.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
snprintf is a standard function which should be available
on all relevant platforms, so those checks are unnecessary.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
It wrongly detects old versions of ICU as valid.
Checking with pkg-config is sufficient and also sets ICU_UC_LIBS.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The first usage of AC_CHECK_HEADERS must be unconditional,
otherwise configure fails to detect support for shared libraries.
This fixes a regression introduced by commit a07025c993.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
* Remove unneeded arguments for AC_ARG_ENABLE
* Use AS_HELP_STRING
* Use [] instead of () for default in help text
* Run AC_CHECK_HEADERS, AC_CHECK_LIB only if OpenCL support is enabled
Signed-off-by: Stefan Weil <sw@weilnetz.de>
* Remove unneeded arguments for AC_ARG_ENABLE
* Fix formatting of help text
* Remove help text for --enable-legacy
Signed-off-by: Stefan Weil <sw@weilnetz.de>
* Remove unneeded arguments for AC_ARG_ENABLE
* Use AS_HELP_STRING
* Use [] instead of () for default in help text
Signed-off-by: Stefan Weil <sw@weilnetz.de>
* Remove unneeded arguments for AC_ARG_ENABLE (needs renaming of macro)
* Use [] instead of () for default in help text
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Instead of defining the DISABLED_LEGACY_ENGINE macro in config_auto.h
(which is not included by all source files), define it as a preprocessor
option for those parts of the code which require it.
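A minimal sketch of how source code can depend on the macro without including config_auto.h, assuming DISABLED_LEGACY_ENGINE is passed on the compiler command line (for example as -DDISABLED_LEGACY_ENGINE). The function name `LegacyEngineAvailable` is hypothetical:

```cpp
// Hypothetical helper: reports whether the legacy engine was compiled in.
// DISABLED_LEGACY_ENGINE is expected to come from the compiler command
// line, not from an included configuration header.
constexpr bool LegacyEngineAvailable() {
#ifdef DISABLED_LEGACY_ENGINE
  return false;
#else
  return true;
#endif
}
```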
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The tesseract/ subdirectory is no longer automatically added to the
include path of the compiler. Therefore existing code which uses an include like
#include "capi.h"
must now change it to
#include "tesseract/capi.h"
This avoids name conflicts with header files from other projects.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Not only version names like 4.0.0, but also version names like
v4.0.0 or tesseract-4.0.0 are now supported and give the same
GENERIC_MAJOR_VERSION = 4.
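The intended mapping can be sketched in C++ as follows. This is only an illustration of the rule "skip any non-numeric prefix, then read the first number"; the real logic lives in the configure script, and the function name `MajorVersion` is hypothetical:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns the major version from names such as
// "4.0.0", "v4.0.0" or "tesseract-4.0.0", or -1 if no digit is found.
inline int MajorVersion(const std::string& name) {
  std::string::size_type pos = 0;
  // Skip any prefix which is not a digit ("v", "tesseract-", ...).
  while (pos < name.size() &&
         !std::isdigit(static_cast<unsigned char>(name[pos]))) {
    ++pos;
  }
  if (pos == name.size()) {
    return -1;
  }
  int major = 0;
  // Accumulate digits up to the first non-digit (normally the first dot).
  while (pos < name.size() &&
         std::isdigit(static_cast<unsigned char>(name[pos]))) {
    major = major * 10 + (name[pos] - '0');
    ++pos;
  }
  return major;
}
```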
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The newer macros that replace the obsolete ones are already present in configure.ac.
* AC_PROG_LIBTOOL -> LT_INIT
* AC_LANG_CPLUSPLUS -> AC_LANG([C++])
Building with G++ on Darwin breaks when any of the AVX, AVX2, or SSE4.1
compiler options is set, unless G++ is actually Clang.
This commit allows building with G++ by asking G++ to delegate assembly
to the Clang integrated assembler instead of the GNU one.
Commit f9157fd91d changed the rules for the documentation.
Since that commit, make always tried to build it and failed if
asciidoc was missing.
Now configure tests whether asciidoc is available and builds the
documentation conditionally. It also reports the result to the user.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
That macro disables automated updates when configure.ac or a Makefile.am
changes. Normally those updates are wanted because users typically
forget to run ./autogen.sh.
See also the GNU documentation on why AM_MAINTAINER_MODE should not be used:
https://www.gnu.org/software/automake/manual/html_node/maintainer_002dmode.html
Signed-off-by: Stefan Weil <sw@weilnetz.de>
AX_SPLIT_VERSION only works after AM_INIT_AUTOMAKE, so that macro had
to be moved.
GENERIC_MAJOR_VERSION, GENERIC_MINOR_VERSION and GENERIC_MICRO_VERSION
are now set automatically and can be used in further processing.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
While "echo -n" works on Debian GNU/Linux, it fails to produce a valid
configure file on macOS, so use a different, shorter solution.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
libtiff is no longer needed for OpenCL, so remove that dependency.
It is still suggested on Windows, where it is used to redirect warning
messages from the tesseract executable to the console.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The OpenCL code of Tesseract uses TIFF functions, but the TIFF library
was not added to the linker flags for macOS.
This fixes builds with OpenCL on Mac.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
_mm256_extract_epi64 is not available on 32-bit platforms,
but it can be replaced by "a very simple workaround".
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The current implementation for AVX uses 64-bit code,
so run the AVX test only when the compiler is a 64-bit compiler.
This fixes the broken implementation for 32-bit hosts
which provide AVX but call the stub of DotProductAVX.
Also simplify the conditional code for AVX_OPT and SSE41_OPT.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
All relevant versions of Leptonica support pkg-config, so the old
configuration code can be removed.
Also update the error message for missing Leptonica.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The old settings don't work for cross compilations (wrong include path)
or require setting LIBLEPT_HEADERSDIR. They are used as fallback if
there is no pkg-config configuration.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Builds without support for OpenMP failed with the old code. Fix this:
* Add OPENMP_CXXFLAGS for ccmain.
* Replace unconditional -fopenmp by OPENMP_CXXFLAGS for lstm.
* Always use _OPENMP for conditional compilation.
* Remove OPENMP as there is already _OPENMP.
* Include omp.h conditionally.
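The pattern behind the last three items can be sketched as follows. This is a minimal example, assuming only that the compiler defines the standard `_OPENMP` macro when OpenMP is enabled; the function name `MaxThreads` is hypothetical and not taken from the Tesseract sources:

```cpp
#ifdef _OPENMP
#include <omp.h>  // only present when the compiler supports OpenMP
#endif

// Hypothetical helper: number of threads OpenMP may use,
// or 1 if the code was built without OpenMP.
inline int MaxThreads() {
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}
```

Because both the include and the call are guarded by the same macro, the code builds with and without the OpenMP compiler flags.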
Signed-off-by: Stefan Weil <sw@weilnetz.de>
openclwrapper calls function TIFFReadRGBAImageOriented which is provided
by libtiff, so add that library to OPENCL_LIBS.
This fixes a linker error (unresolved symbol) when OpenCL is enabled.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This is not strictly necessary, but recommended in the GNU autoconf manual.
No [] was added to arguments like true or false.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The individual checks set ENABLE_TRAINING unconditionally,
thus overwriting the value from the preceding checks.
So if pango and cairo were available, but icu was missing,
the user was still offered the option to build the training tools.
The changes for icu and has_cpp11 are not strictly necessary,
but are made here to have uniform code patterns.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The wchar_t type is defined in `wchar.h`, and if this header is not
included by autoconf, the detection of the type will fail. This type is
required by `unicharset_extractor` to autogenerate the character
properties.
This problem was detected when running on Fedora 21.