Commit Graph

197 Commits

Author SHA1 Message Date
Stefan Weil
3dfd72721b Simplify configure.ac
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:53:34 +01:00
Stefan Weil
ca172592da Add support for image or image list by URL
This allows OCR of images from the internet without downloading them first:

    tesseract http://IMAGE_URL OUTPUT ...

It uses libcurl.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:53:28 +01:00
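The URL support described above implies that the command line argument has to be classified as a local path or a remote resource before libcurl is involved. A minimal sketch of such a check, assuming that only the `http://` and `https://` prefixes count as URLs; the helper name `LooksLikeUrl` is hypothetical and not taken from the commit:

```cpp
#include <string>

// Hypothetical helper: returns true if the input names a remote resource.
// The list of accepted schemes is an assumption, not taken from the commit.
bool LooksLikeUrl(const std::string& input) {
  for (const char* scheme : {"http://", "https://"}) {
    // rfind with position 0 only matches when the scheme is a prefix.
    if (input.rfind(scheme, 0) == 0) return true;
  }
  return false;
}
```

With this sketch, `LooksLikeUrl("http://IMAGE_URL")` is true and `LooksLikeUrl("image.png")` is false, so only the first would be passed on to a download step.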
Zdenko Podobný
5e3772cad8 fix #2101 2019-11-01 12:30:15 +01:00
Stefan Weil
2e1cd1d448 Add dot product implementation for Intel FMA (double = tessdata_best)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:29:39 +01:00
Zdenko Podobný
5280bbcade 4.1.0 Release 2019-07-07 14:34:08 +02:00
Stefan Weil
3dff32e407 Fix check for icu 52.1 or newer
It detected old versions but did not disable the training build.
This completes commit 66da4df11d.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-25 14:55:03 +02:00
Stefan Weil
d7d0500030 Remove code for embedded build
That code is unrelated to Tesseract and can be easily implemented
by external projects which require it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-18 09:46:16 +02:00
Stefan Weil
d33ced1958 Use C++17 compiler if possible
This allows using new features of C++17 conditionally.
Also simplify the code which checks and sets the C++ version.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-16 18:32:21 +02:00
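Using C++17 features "conditionally" usually means guarding them with the standard `__cplusplus` macro, which is `201703L` for C++17 and `201402L` for C++14. A minimal sketch under that assumption; the function name `CompiledStandard` is hypothetical and only serves to make the selected branch visible:

```cpp
#include <string>

// Hypothetical helper: reports which language standard this
// translation unit was compiled with, based on __cplusplus.
std::string CompiledStandard() {
#if __cplusplus >= 201703L
  return "C++17 or newer";  // C++17 features may be used in this branch
#elif __cplusplus >= 201402L
  return "C++14";
#else
  return "C++11 or older";
#endif
}
```

The same pattern lets a code base keep building with an older compiler while newer code paths are enabled wherever configure selects a newer standard.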
Stefan Weil
831a3e6167 configure: Fix cross builds (check for TensorFlow header)
AC_CHECK_FILE does not work in cross builds. Such builds aborted.
Replace it by AC_CHECK_HEADERS. This fixes cross builds.

To enable TensorFlow in cross builds, more work is needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-16 18:30:22 +02:00
Raphael Graf
d809200b6b Do not link librt on OpenBSD 2019-06-16 18:28:45 +02:00
Stefan Weil
227452f872 Replace Tensorflow by TensorFlow
The name is written in camel case, see https://www.tensorflow.org/.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-16 18:27:08 +02:00
Stefan Weil
6c68f08460 Implement check for Tensorflow header file
This looks for one of the header files which are included by Tesseract.
It currently uses a hard-coded path which works for Debian / Ubuntu.

Also simplify the rules for linking Tensorflow.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-16 18:26:42 +02:00
Stefan Weil
b7537e94a0 Support build with Tensorflow
It expects include files in /usr/include/tensorflow.

* Add configure option --with-tensorflow (disabled by default)
* Fix data type tensorflow::int64
* Remove "third_party/" in include statements
* Add dummy implementations for Backward and DebugWeights in TFNetwork
* Add files generated with protoc from tfnetwork.proto
  (so the Tensorflow sources are not needed for the build)
* Update Makefiles

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-16 18:26:18 +02:00
Stefan Weil
ecd0384a31 configure: Use a hopefully more robust way to fix AX_CHECK_COMPILE_FLAG
The check for -Wno-extra-semi-stmt failed on Linux with clang++-7.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-16 18:21:50 +02:00
Stefan Weil
0c70c2a69f configure: Fix for clang++-8 and newer
AX_CHECK_COMPILE_FLAG fails if it is used with -Werror and the compiler
raises error -Wextra-semi-stmt:

    configure:4224: checking whether C++ compiler accepts -mavx
    configure:4243: clang++-8 -c -g -O2 -Wall -Wextra -Wpedantic -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -march=native -Werror -Wno-unused-macros -mavx  conftest.cpp >&5
    conftest.cpp:20:3: error: empty expression statement has no effect; remove unnecessary ';' to silence this warning [-Werror,-Wextra-semi-stmt]
      ;
      ^
    1 error generated.

Add -Wno-extra-semi-stmt to disable those errors if possible.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-16 18:21:04 +02:00
Stefan Weil
ab695f882d configure: Fix for latest developer tools on macOS
AX_CHECK_COMPILE_FLAG fails if it is used with -Werror and the compiler
raises error -Wunused-macros. Add -Wno-unused-macros to disable those
errors if possible.

Also simplify the setting of several conditionals (AVX, AVX2, ...).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 20:19:26 +02:00
James R. Barlow
8ef392cb08 Fix CPPFLAGS configuration for icu4c and libarchive missing from configure.ac 2019-05-08 15:42:10 +02:00
Zdenko Podobný
3bbe4327c0 fix #2344 libpthread under-linking on FreeBSD 2019-03-27 15:37:14 +01:00
Stefan Weil
4ccbb9f830 configure: Check support of compile flags with -Werror
gcc fails if an unsupported compile flag is given, but clang and clang++
normally only emit a warning "argument unused during compilation".

The old test accepted flags like -mavx for clang++ on non-Intel hosts.
This resulted in build failures because Intel code was included.

Now the check runs with -Werror, and unsupported flags are detected as
an error. This fixes the build problem with clang++ on non-Intel hosts.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 16:41:52 +01:00
Stefan Weil
1c7e00611b Add initial support for traineddata files in standard archive formats
This requires libarchive-dev.

Tesseract can now load traineddata files in any of the archive formats
which are supported by libarchive. Example of a zipped BagIt archive:

    $ unzip -l /usr/local/share/tessdata/zip.traineddata
    Archive:  /usr/local/share/tessdata/zip.traineddata
      Length      Date    Time    Name
    ---------  ---------- -----   ----
           55  2019-03-05 15:27   bagit.txt
            0  2019-03-05 15:25   data/
         1557  2019-03-05 15:28   manifest-sha256.txt
      1082890  2019-03-05 15:25   data/eng.word-dawg
      1487588  2019-03-05 15:25   data/eng.lstm
         7477  2019-03-05 15:25   data/eng.unicharset
        63346  2019-03-05 15:25   data/eng.shapetable
       976552  2019-03-05 15:25   data/eng.inttemp
        13408  2019-03-05 15:25   data/eng.normproto
         4322  2019-03-05 15:25   data/eng.punc-dawg
         4738  2019-03-05 15:25   data/eng.lstm-number-dawg
         1410  2019-03-05 15:25   data/eng.freq-dawg
          844  2019-03-05 15:25   data/eng.pffmtable
         6360  2019-03-05 15:25   data/eng.lstm-unicharset
         1012  2019-03-05 15:25   data/eng.lstm-recoder
         1047  2019-03-05 15:25   data/eng.unicharambigs
         4322  2019-03-05 15:25   data/eng.lstm-punc-dawg
     16109842  2019-03-05 15:25   data/eng.bigram-dawg
           80  2019-03-05 15:25   data/eng.version
         6426  2019-03-05 15:25   data/eng.number-dawg
      3694794  2019-03-05 15:25   data/eng.lstm-word-dawg
    ---------                     -------
     23468070                     21 files

`combine_tessdata -d` and `combine_tessdata -u` also work.

The traineddata files in the new format can be generated with
standard tools like zip or tar.

More work is needed for other training tools and big endian support.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-05 17:18:48 +01:00
Stefan Weil
42ea432418 configure: Check for xsltproc (needed to generate manpages)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-15 22:19:52 +01:00
Stefan Weil
fd6e281c61 Use C++14 compiler if possible
This allows using new features of C++14 conditionally.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-13 11:05:34 +01:00
Stefan Weil
b3327f4e90 Remove unneeded checks for snprintf
snprintf is a standard function which should be available
on all relevant platforms, so those checks are unnecessary.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-13 08:04:52 +01:00
Stefan Weil
66da4df11d configure: Remove header check for ICU
It wrongly detects old versions of ICU as valid.
Checking with pkg-config is sufficient and also sets ICU_UC_LIBS.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-01 10:06:34 +01:00
Stefan Weil
2ccc5810f3 Add check whether compiler supports -march=native flag
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-05 20:13:28 +01:00
Guillaume Gigaud
92b8833838 fix(configure) Don't add rt on Android
Library rt is included in the libc on Android: https://developer.android.com/ndk/guides/stable_apis#a3
2018-11-15 13:56:28 +01:00
zdenop
cdfb768010 move langtests and unlvtests from tesseract-ocr repository to test repository 2018-11-08 22:31:32 +01:00
zdenop
51316994cc 4.0.0 Release 2018-10-29 09:53:12 +01:00
Marco Atzeri
ebbd4e3efc fixes #426; define NOUNDEFINED for cygwin 2018-10-20 11:25:28 +02:00
zdenop
d9372662ec add "sudo ldconfig" to install instructions. fixes #1212 2018-09-29 13:33:36 +02:00
Stefan Weil
be1393b1e8 Replace macro MINGW by __MINGW32__
MINGW is no longer used and now removed from configure.ac.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-04 16:05:27 +02:00
Shree Devi Kumar
92922b421c Add langtests framework with frk example 2018-08-30 14:28:34 +00:00
Stefan Weil
b15624eb2f Fix regression (shared libraries no longer supported)
The first usage of AC_CHECK_HEADERS must be unconditional,
otherwise configure fails to detect support for shared libraries.

This fixes a regression introduced by commit a07025c993.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-21 11:06:38 +02:00
Stefan Weil
58208522f0 configure: Clean code for --enable-visibility
* Remove unneeded arguments for AC_ARG_ENABLE
* Use [] instead of () for default in help text

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 16:33:28 +02:00
Stefan Weil
a07025c993 configure: Clean code for --enable-opencl
* Remove unneeded arguments for AC_ARG_ENABLE
* Use AS_HELP_STRING
* Use [] instead of () for default in help text
* Run AC_CHECK_HEADERS, AC_CHECK_LIB only if OpenCL support is enabled

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 16:33:28 +02:00
Stefan Weil
0ad6e3e77f configure: Clean code for --enable-legacy
* Remove unneeded arguments for AC_ARG_ENABLE
* Fix formatting of help text
* Remove help text for --enable-legacy

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 16:33:28 +02:00
Stefan Weil
e47a9272d7 configure: Clean code for --enable-graphics
* Remove unneeded arguments for AC_ARG_ENABLE
* Remove help text for --enable-graphics

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 16:33:28 +02:00
Stefan Weil
cfc5ef65a2 configure: Clean code for --enable-embedded
* Remove unneeded arguments for AC_ARG_ENABLE
* Use AS_HELP_STRING
* Use [] instead of () for default in help text

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 16:33:28 +02:00
Stefan Weil
11cafd7673 configure: Clean code for --enable-debug
* Remove unneeded arguments for AC_ARG_ENABLE (needs renaming of macro)
* Use [] instead of () for default in help text

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 16:33:28 +02:00
Stefan Weil
11d9d8e59a configure: Remove macro AC_SYS_INTERPRETER
The macro sets interpval which is not used by Tesseract.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 16:19:58 +02:00
Stefan Weil
0a4edf618a configure: Remove large file support
Tesseract does not handle large files (more than 2 GiB).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 16:19:58 +02:00
Stefan Weil
4bbebd3f7e Remove tests for function getline
The Tesseract code does not use getline.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 16:19:58 +02:00
Stefan Weil
081793ff48 Fix build with legacy engine disabled
Instead of defining the DISABLED_LEGACY_ENGINE macro in config_auto.h
(which is not included by all source files), define it as a preprocessor
option for those parts of the code which require it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-04 17:56:42 +02:00
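The approach in the commit above, passing the macro as a preprocessor option so that every source file sees it, can be illustrated with a small sketch. The function name `LegacyEngineAvailable` is hypothetical; only the macro name `DISABLED_LEGACY_ENGINE` comes from the commit message:

```cpp
// Hypothetical helper: reports whether legacy engine code was compiled in.
// The macro is expected on the compiler command line
// (-DDISABLED_LEGACY_ENGINE), so no header has to be included first.
bool LegacyEngineAvailable() {
#ifdef DISABLED_LEGACY_ENGINE
  return false;  // legacy code paths are compiled out
#else
  return true;   // default build: legacy engine is present
#endif
}
```

Because the definition arrives through the compiler flags, a source file that never includes config_auto.h still takes the same branch as the rest of the build.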
amitdo
aa9f4b4861 Add an option to compile tesseract without the code of the legacy OCR engine 2018-07-03 18:49:42 +03:00
Stefan Weil
c1c87d73ee Require tesseract/ for API header files (fixes potential name conflicts)
The tesseract/ subdirectory is no longer automatically added to the
include path of the compiler. Therefore old code which used code like

    #include "capi.h"

must now change that to

    #include "tesseract/capi.h"

This avoids name conflicts with header files from other projects.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-06-17 22:01:19 +02:00
Shree Devi Kumar
2563380d51 move testing and testdata to test, add unlvtests 2018-06-06 12:20:14 +00:00
Egor Pugin
104fe7931c Move training to src. 2018-04-25 11:35:26 +03:00
Egor Pugin
e95ff1159e Move sources into src dir. Update build scripts. 2018-04-25 11:02:54 +03:00
Eric Platon
4ded0d066e Revert failed attempt to support MacPorts' g++
The support will require more work and is postponed for now.
2018-04-24 08:38:17 +09:00
Eric Platon
54b048fa0d Fix wrong environment test that breaks clang++ builds.
g++ builds require extra flags rejected by clang++. The bug is that the
flags are actually added unconditionally. This commit fixes the
condition.

See https://github.com/tesseract-ocr/tesseract/pull/1474
2018-04-23 16:11:24 +09:00