tesseract/src/ccutil
Stefan Weil 1c7e00611b
Add initial support for traineddata files in standard archive formats

This requires libarchive-dev.

Tesseract can now load traineddata files in any archive format
that libarchive supports. Example of a zipped BagIt archive:

    $ unzip -l /usr/local/share/tessdata/zip.traineddata
    Archive:  /usr/local/share/tessdata/zip.traineddata
      Length      Date    Time    Name
    ---------  ---------- -----   ----
           55  2019-03-05 15:27   bagit.txt
            0  2019-03-05 15:25   data/
         1557  2019-03-05 15:28   manifest-sha256.txt
      1082890  2019-03-05 15:25   data/eng.word-dawg
      1487588  2019-03-05 15:25   data/eng.lstm
         7477  2019-03-05 15:25   data/eng.unicharset
        63346  2019-03-05 15:25   data/eng.shapetable
       976552  2019-03-05 15:25   data/eng.inttemp
        13408  2019-03-05 15:25   data/eng.normproto
         4322  2019-03-05 15:25   data/eng.punc-dawg
         4738  2019-03-05 15:25   data/eng.lstm-number-dawg
         1410  2019-03-05 15:25   data/eng.freq-dawg
          844  2019-03-05 15:25   data/eng.pffmtable
         6360  2019-03-05 15:25   data/eng.lstm-unicharset
         1012  2019-03-05 15:25   data/eng.lstm-recoder
         1047  2019-03-05 15:25   data/eng.unicharambigs
         4322  2019-03-05 15:25   data/eng.lstm-punc-dawg
     16109842  2019-03-05 15:25   data/eng.bigram-dawg
           80  2019-03-05 15:25   data/eng.version
         6426  2019-03-05 15:25   data/eng.number-dawg
      3694794  2019-03-05 15:25   data/eng.lstm-word-dawg
    ---------                     -------
     23468070                     21 files
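
The same inventory can be read programmatically. The following is a minimal Python sketch, not the loader added by this commit (which uses libarchive from C++): it uses only the standard `zipfile` module, so it handles zip archives only. The helper name `list_components` and the rule that the component name is whatever follows the first dot in the file name are assumptions made for illustration.

```python
import zipfile


def list_components(traineddata_path):
    """Return {component_suffix: size_in_bytes} for a zipped traineddata archive.

    Hypothetical helper for illustration; not part of Tesseract.
    """
    components = {}
    with zipfile.ZipFile(traineddata_path) as archive:
        for info in archive.infolist():
            # Skip directory entries and BagIt tag files that live outside data/.
            if info.is_dir() or not info.filename.startswith("data/"):
                continue
            # Assumption: the text after the first '.' names the component,
            # e.g. "data/eng.lstm-unicharset" -> "lstm-unicharset".
            basename = info.filename.rsplit("/", 1)[-1]
            _, _, suffix = basename.partition(".")
            components[suffix] = info.file_size
    return components
```

Calling `list_components("/usr/local/share/tessdata/zip.traineddata")` on the archive listed above would yield one entry per file under `data/`, keyed by names such as `lstm` or `word-dawg`.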

`combine_tessdata -d` and `combine_tessdata -u` also work with such archives.

The traineddata files in the new format can be generated with
standard tools like zip or tar.
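
As a rough illustration of such a file, the Python sketch below packs a directory of component files into a zipped BagIt-style archive with the same layout as the listing (`bagit.txt`, `manifest-sha256.txt`, and the components under `data/`). The helper name `make_bagit_traineddata`, the input directory layout, and the exact `bagit.txt` contents are assumptions; the commit only states that standard tools such as zip or tar are sufficient.

```python
import hashlib
import zipfile
from pathlib import Path


def make_bagit_traineddata(component_dir, output_path):
    """Pack component files (e.g. eng.lstm) into a zipped BagIt-style archive.

    Hypothetical helper for illustration; not part of Tesseract.
    """
    component_dir = Path(component_dir)
    manifest_lines = []
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Bag declaration; the version value used here is an assumption.
        archive.writestr(
            "bagit.txt",
            "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        )
        for path in sorted(component_dir.iterdir()):
            if not path.is_file():
                continue
            payload = path.read_bytes()
            member = f"data/{path.name}"
            archive.writestr(member, payload)
            # One manifest line per payload file: "<sha256>  <path in bag>".
            digest = hashlib.sha256(payload).hexdigest()
            manifest_lines.append(f"{digest}  {member}")
        archive.writestr("manifest-sha256.txt", "\n".join(manifest_lines) + "\n")
    return output_path
```

For example, `make_bagit_traineddata("eng_components", "zip.traineddata")` would bundle every regular file in the (hypothetical) `eng_components` directory. Whether Tesseract needs the BagIt tag files at all is not stated in the commit message.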

More work is needed for other training tools and big-endian support.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-05 17:18:48 +01:00
ambigs.cpp Rename all C-style headers (e.g. <stdio.h>) to C++ style (<cstdio>). 2018-05-20 00:52:04 +03:00
ambigs.h Use default keyword instead of empty ctors/dtors. 2018-05-21 13:35:46 +03:00
basedir.cpp replace deprecated C++ headers (reported by clang-tidy) - partially supersedes PR #1605 2018-09-18 18:51:11 +02:00
basedir.h Fix line endings 2018-04-25 19:04:50 +02:00
bits16.h Replace tabs by blanks in source code 2018-07-03 16:29:14 +02:00
bitvector.cpp BitVector: Use new serialization API 2018-07-18 17:07:03 +02:00
bitvector.h Rename all C-style headers (e.g. <stdio.h>) to C++ style (<cstdio>). 2018-05-20 00:52:04 +03:00
ccutil.cpp CCUtil: Define virtual destructor in .cpp file 2018-09-04 07:44:27 +02:00
ccutil.h CCUtil: Define virtual destructor in .cpp file 2018-09-04 07:44:27 +02:00
clst.cpp Remove unneeded type casts 2018-07-04 14:23:55 +02:00
clst.h Format code (replace ( xxx ) by (xxx)) 2018-09-29 08:21:25 +02:00
doubleptr.h Move sources into src dir. Update build scripts. 2018-04-25 11:02:54 +03:00
elst2.cpp Format code (replace ( xxx ) by (xxx)) 2018-09-29 08:21:25 +02:00
elst2.h Format code (replace ( xxx ) by (xxx)) 2018-09-29 08:21:25 +02:00
elst.cpp Remove unneeded type casts 2018-07-04 14:23:55 +02:00
elst.h Format code (replace ( xxx ) by (xxx)) 2018-09-29 08:21:25 +02:00
errcode.cpp Use __builtin_trap instead of null pointer dereference to abort 2019-02-18 10:49:51 +01:00
errcode.h Fix line endings 2018-04-25 19:04:50 +02:00
fileerr.h Fix line endings 2018-04-25 19:04:50 +02:00
fileio.cpp fix typo in non VS build 2018-11-08 23:10:14 +01:00
fileio.h Move class tesseract::File from training to ccutil 2018-08-25 18:16:46 +02:00
genericheap.h Use default keyword instead of empty ctors/dtors. 2018-05-21 13:35:46 +03:00
genericvector.h fix a couple minor compiler warnings 2018-10-30 18:00:32 -06:00
globaloc.cpp Add missing 'static' keyword 2018-10-22 17:48:17 +02:00
globaloc.h Fix line endings 2018-04-25 19:04:50 +02:00
helpers.h Clean use of qsort function sort_floats 2018-08-31 23:17:27 +02:00
host.h Replace FLOAT32 by float data type 2018-07-02 13:29:39 +02:00
indexmapbidi.cpp IndexMapBiDi: Define virtual destructor in .cpp file 2018-09-04 13:08:29 +02:00
indexmapbidi.h IndexMapBiDi: Define virtual destructor in .cpp file 2018-09-04 13:08:29 +02:00
kdpair.h Use default keyword instead of empty ctors/dtors. 2018-05-21 13:35:46 +03:00
lsterr.h Move sources into src dir. Update build scripts. 2018-04-25 11:02:54 +03:00
mainblk.cpp Remove unused macros 2018-10-22 17:48:17 +02:00
Makefile.am Add initial support for traineddata files in standard archive formats 2019-03-05 17:18:48 +01:00
object_cache.h Use default keyword instead of empty ctors/dtors. 2018-05-21 13:35:46 +03:00
ocrclass.h ocrclass: Remove unused macros 2018-11-08 20:23:36 +01:00
params.cpp Don't call exit when parameter in file is unknown 2018-09-20 08:37:33 +02:00
params.h replace deprecated C++ headers (reported by clang-tidy) - partially supersedes PR #1605 2018-09-18 18:51:11 +02:00
platform.h Merge branch 'master' of https://github.com/tesseract-ocr/tesseract 2018-10-09 15:37:40 +02:00
qrsequence.h Fix some typos (most found by codespell) 2018-05-27 18:49:43 +02:00
scanutils.cpp Remove unused include statements for tprintf.h 2018-11-18 17:25:01 +01:00
scanutils.h scanutils: Fix typos in comments 2018-06-09 07:53:20 +02:00
serialis.cpp TFile: Add helper functions for serialization of simple data types 2018-07-18 11:19:37 +02:00
serialis.h Format code (replace ( xxx ) by (xxx)) 2018-09-29 08:21:25 +02:00
sorthelper.h Rename all C-style headers (e.g. <stdio.h>) to C++ style (<cstdio>). 2018-05-20 00:52:04 +03:00
strngs.cpp Fix potential crash in STRING class 2018-11-30 23:14:11 +01:00
strngs.h Replace tabs by blanks in source code 2018-07-03 16:29:14 +02:00
tesscallback.h Fix compiler warning 2018-10-23 17:01:53 +02:00
tessdatamanager.cpp Add initial support for traineddata files in standard archive formats 2019-03-05 17:18:48 +01:00
tessdatamanager.h Add initial support for traineddata files in standard archive formats 2019-03-05 17:18:48 +01:00
tprintf.cpp Fix potential crash in tprintf 2018-11-30 23:14:11 +01:00
tprintf.h Replace tprintf_internal by tprintf and clean tprintf code 2018-07-07 21:47:10 +02:00
unichar.cpp Fix compiler warnings (-Wimplicit-fallthrough) 2019-02-09 16:32:20 +01:00
unichar.h Replace string.h by standard C++ cstring 2018-06-21 20:40:26 +02:00
unicharcompress.cpp Use using instead of typedef. Reason: https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rt-using 2018-05-20 01:31:03 +03:00
unicharcompress.h RecodedCharID: Use new serialization API 2018-07-18 16:22:01 +02:00
unicharmap.cpp UNICHARMAP: Remove comparison which is always false 2018-10-08 14:15:17 +02:00
unicharmap.h Fix line endings 2018-04-25 19:04:50 +02:00
unicharset.cpp Fix compiler warnings (-Wstringop-truncation) 2019-02-09 16:32:09 +01:00
unicharset.h Fix compiler warnings (-Wstringop-truncation) 2019-02-09 16:32:09 +01:00
unicity_table.h Fix compiler warnings [-Wzero-as-null-pointer-constant] 2018-07-04 20:40:56 +02:00
unicodes.cpp Move sources into src dir. Update build scripts. 2018-04-25 11:02:54 +03:00
unicodes.h Fix doxygen comments 2019-02-20 21:11:38 +01:00
universalambigs.cpp Fix line endings 2018-04-25 19:04:50 +02:00
universalambigs.h Fix line endings 2018-04-25 19:04:50 +02:00