Commit Graph

142 Commits

Author SHA1 Message Date
Stefan Weil
e12b99d49b Remove host.h from Tesseract API
It is not needed by other API header files.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-22 21:51:07 +02:00
Stefan Weil
09255ebe44 Don't include windows.h from platform.h
This partially reverts commit c150b9832d.
Now params.cpp includes host.h which also gets the definition for MAX_PATH.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-21 22:20:13 +02:00
Stefan Weil
217c2530e6 Remove strtofloat
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-19 11:19:04 +02:00
Stefan Weil
7c3f9000cd Replace sscanf by std::stringstream
Using std::stringstream allows working with the C locale, independent
of the current locale settings.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-19 11:19:04 +02:00
Stefan Weil
a1ffcd3654 Use std::stringstream for add_str_double
Using std::stringstream allows conversion of double to string
independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-14 16:16:16 +02:00
Stefan Weil
12ca2513d4 Revert "e" flag for fopen
clang-tidy added it in commit ac0b191f6b.

The "e" flag is an extension for glibc which sets the O_CLOEXEC flag,
so the file handle is not leaked to child processes. It is not needed
here.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-13 18:53:57 +02:00
Samuel Lee
e32b3360aa
Fix for MSVC
LoadDataFromFile/SaveDataToFile use fopen with unsupport file mode 'e' in MSVC.
2019-04-11 02:33:51 +09:00
Stefan Weil
72c874140e Modernize code by replacing C type casts
This was done using clang-tidy.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-07 09:04:51 +02:00
zdenop
ab09b09da6
Merge pull request #2294 from bertsky/lstm-with-char-whitelist
trying to add tessedit_char_whitelist etc. again:
2019-04-06 14:41:30 +02:00
Stefan Weil
20d5eedd45 Modernize code (clang-tidy check modernize-loop-convert)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-05 08:29:00 +02:00
amitdo
fab9a54981 Remove unneeded 'SUBDIRS=' from 3 Makefile.am files 2019-04-04 19:31:39 +02:00
Stefan Weil
ab009fae94 Remove macro WINDLLNAME
It is now no longer used.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 20:05:41 +02:00
Stefan Weil
77a5f2623e Remove unused config variable tessedit_module_name
It was only defined for Windows builds.

Use also false instead of 0 to set the default value of
two boolean config variables.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 20:04:00 +02:00
Stefan Weil
c150b9832d Add missing include statements for Windows build
The last commits which removed BOOL8 had broken the Windows build.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 19:02:29 +02:00
Stefan Weil
802f42e821 Remove BOOL8, TRUE, FALSE from host.h
Remove unneeded include statements for host.h, add required ones and
update the comments for the remaining include statements.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 18:27:20 +02:00
Stefan Weil
be96b7b660 bits16: Format code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 18:26:50 +02:00
Stefan Weil
4e0c726d6c ccutil: replace TRUE, FALSE by true, false
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:56:47 +02:00
Stefan Weil
89ba48b106 strngs: Modernize and format code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:13:38 +02:00
Stefan Weil
127d0e31f0 serialis: Modernize and format code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:12:11 +02:00
Stefan Weil
8b663e7620 helpers: Modernize and format code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:06:19 +02:00
Stefan Weil
1948f0d520 ocrclass: Modernize and format code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 16:39:44 +02:00
Stefan Weil
83d4433d3b Modernize and format unichar.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 16:30:15 +02:00
Stefan Weil
ac0b191f6b Modernize and format genericvector.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 16:21:32 +02:00
Stefan Weil
36ed08636b Modernize and format tesscallback.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 16:16:00 +02:00
Stefan Weil
a44bf41f14 Modernize C++ loops
The modifications were done using this command:

    run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-loop-convert' -fix

Then the resulting code was cleaned manually.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 08:38:21 +01:00
Stefan Weil
a0fd90583b Modernize C++ code using auto
The modifications were done using this command:

    run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-auto' -fix

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 07:55:08 +01:00
Stefan Weil
36f768853a Modernize C++ code using override
The modifications were done using this command:

    run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-override' -fix

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 07:37:52 +01:00
Stefan Weil
631882a346 Fix compiler warnings (signed / unsigned mismatch)
clang warnings:

    src/ccutil/unicharcompress.cpp:172:27: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    src/lstm/recodebeam.cpp:129:29: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare]
    src/lstm/recodebeam.cpp:276:48: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare]
    unittest/imagedata_test.cc:101:21: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    unittest/linlsq_test.cc:33:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    unittest/linlsq_test.cc:44:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    unittest/nthitem_test.cc:27:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]
    unittest/nthitem_test.cc:68:21: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]
    unittest/stats_test.cc:26:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-25 08:36:07 +01:00
Stefan Weil
f9860cda41 Optimize functions ResetFrom
The loop can terminate as soon as the parameter name was found.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 21:21:23 +01:00
Stefan Weil
41da5afe9d UNICHARSET: Fix compiler warning (signed/unsigned mismatch)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 21:18:21 +01:00
Stefan Weil
91e2b253c0 Format modified code with clang-format
Format the files which were changed in
commit 297d7d86ce.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 21:10:29 +01:00
Stefan Weil
58423d2f6c
Merge pull request #2328 from bertsky/lstm-with-user-patterns2
Add user words / patterns again
2019-03-24 19:38:40 +01:00
Stefan Weil
da6305b632 Fix compiler warnings caused by ASSERT_HOST
The modified definition avoids warnings caused by redundant semicolons.
Now a semicolon is required when using the macro, so a few code locations
had to be updated.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 17:47:04 +01:00
Stefan Weil
ee2f9bf7bf Remove old comments in file headers
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-16 10:55:00 +01:00
Robert Schubert
297d7d86ce trying to add user words/patterns again:
- pass in ParamsVectors from Tesseract
  (carrying values from langdata/config/api)
  into LSTMRecognizer::Load and LoadDictionary
- after LSTMRecognizer's Dict is initialised
  (with default values), reset the variables
  user_{words,patterns}_{suffix,file} from the
  corresponding entries in the passed vector
2019-03-15 16:06:19 +01:00
Robert Schubert
3912cb1c33 LSTM char_whitelist/blacklist (6ac2ff0): more robust
- unicharset can be null too
2019-03-09 10:40:40 +01:00
Stefan Weil
1c7e00611b Add initial support for traineddata files in standard archive formats
This requires libarchive-dev.

Tesseract can now load traineddata files in any of the archive formats
which are supported by libarchive. Example of a zipped BagIt archive:

    $ unzip -l /usr/local/share/tessdata/zip.traineddata
    Archive:  /usr/local/share/tessdata/zip.traineddata
      Length      Date    Time    Name
    ---------  ---------- -----   ----
           55  2019-03-05 15:27   bagit.txt
            0  2019-03-05 15:25   data/
         1557  2019-03-05 15:28   manifest-sha256.txt
      1082890  2019-03-05 15:25   data/eng.word-dawg
      1487588  2019-03-05 15:25   data/eng.lstm
         7477  2019-03-05 15:25   data/eng.unicharset
        63346  2019-03-05 15:25   data/eng.shapetable
       976552  2019-03-05 15:25   data/eng.inttemp
        13408  2019-03-05 15:25   data/eng.normproto
         4322  2019-03-05 15:25   data/eng.punc-dawg
         4738  2019-03-05 15:25   data/eng.lstm-number-dawg
         1410  2019-03-05 15:25   data/eng.freq-dawg
          844  2019-03-05 15:25   data/eng.pffmtable
         6360  2019-03-05 15:25   data/eng.lstm-unicharset
         1012  2019-03-05 15:25   data/eng.lstm-recoder
         1047  2019-03-05 15:25   data/eng.unicharambigs
         4322  2019-03-05 15:25   data/eng.lstm-punc-dawg
     16109842  2019-03-05 15:25   data/eng.bigram-dawg
           80  2019-03-05 15:25   data/eng.version
         6426  2019-03-05 15:25   data/eng.number-dawg
      3694794  2019-03-05 15:25   data/eng.lstm-word-dawg
    ---------                     -------
     23468070                     21 files

`combine_tessdata -d` and `combine_tessdata -u` also work.

The traineddata files in the new format can be generated with
standard tools like zip or tar.

More work is needed for other training tools and big endian support.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-05 17:18:48 +01:00
Stefan Weil
2cbe723d03 Fix doxygen comments
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-20 21:11:38 +01:00
Stefan Weil
38861be639 Use __builtin_trap instead of null pointer dereference to abort
This fixes a warning from Apple's clang compiler:

    [ 34%] Building CXX object CMakeFiles/libtesseract.dir/src/ccutil/errcode.cpp.o
    /Users/travis/build/stweil/tesseract/src/ccutil/errcode.cpp:83:7: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
          *reinterpret_cast<int*>(0) = 0;
          ^~~~~~~~~~~~~~~~~~~~~~~~~~
    /Users/travis/build/stweil/tesseract/src/ccutil/errcode.cpp:83:7: note: consider using __builtin_trap() or qualifying pointer with 'volatile'

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-18 10:49:51 +01:00
Stefan Weil
2a355ea103 Fix compiler warnings (-Wimplicit-fallthrough)
gcc warnings:

    src/ccmain/docqual.cpp:734:26: warning: this statement may fall through [-Wimplicit-fallthrough=]
    src/ccmain/docqual.cpp:764:26: warning: this statement may fall through [-Wimplicit-fallthrough=]
    src/ccmain/docqual.cpp:782:26: warning: this statement may fall through [-Wimplicit-fallthrough=]
    [...]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-09 16:32:20 +01:00
Stefan Weil
aa2dcca295 Fix compiler warnings (-Wstringop-truncation)
gcc warnings:

    src/api/tesseractmain.cpp:252:14: warning:
        ‘char* strncpy(char*, const char*, size_t)’ specified bound 255
        equals destination size [-Wstringop-truncation]
    src/ccutil/unicharset.h:66:12: warning:
        ‘char* strncpy(char*, const char*, size_t)’ output may be truncated copying 30 bytes from a string of length 30 [-Wstringop-truncation]
    src/ccutil/unicharset.cpp:806:12: warning:
        ‘char* strncpy(char*, const char*, size_t)’ specified bound 64 equals destination size [-Wstringop-truncation]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-09 16:32:09 +01:00
Stefan Weil
e2419b1968 Fix potential crash in tprintf
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 23:14:11 +01:00
Stefan Weil
6b6d9de497 Fix potential crash in STRING class
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 23:14:11 +01:00
Stefan Weil
9b783822a0 Remove unused include statements for tprintf.h
Format also a call of tprintf and add a missing explicit include statement.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-18 17:25:01 +01:00
Stefan Weil
a93426c9ff Fix wrong results from function streamtofloat
The local variable k should be 10 ^ (number of digits after comma),
but will overflow when there are more than 9 digits after the comma
because an int value cannot store 10000000000.

This results in wrong double values read from .tr files for example
(or in a runtime exception if Tesseract was compiled with -ftrapv).

Using uint64_t does not fix the general problem but allows more digits
which should be sufficient for the data read by Tesseract.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-17 20:02:21 +01:00
zdenop
724957167e fix typo in non VS build 2018-11-08 23:10:14 +01:00
zdenop
eb104f9fe4 VS build: fix warning C4996: The POSIX name for this item is deprecated. Instead, use the ISO C and C++ conformant name. 2018-11-08 22:55:04 +01:00
zdenop
7a7f226228 ocrclass: Remove unused macros
Signed-off-by: Stefan Weil <sw@weilnetz.de>

# Conflicts:
#	src/ccutil/ocrclass.h
2018-11-08 20:23:36 +01:00
Zdenko Podobný
2dd753ee4c replace VS implementation of gettimeofday with std::chrono::steady_clock::now(); fixes #2038 2018-11-08 19:43:46 +01:00
chrismamo1
30be5aaaac fix a couple minor compiler warnings 2018-10-30 18:00:32 -06:00