This requires libarchive-dev.
Tesseract can now load traineddata files in any of the archive formats
which are supported by libarchive. Example of a zipped BagIt archive:
$ unzip -l /usr/local/share/tessdata/zip.traineddata
Archive: /usr/local/share/tessdata/zip.traineddata
Length Date Time Name
--------- ---------- ----- ----
55 2019-03-05 15:27 bagit.txt
0 2019-03-05 15:25 data/
1557 2019-03-05 15:28 manifest-sha256.txt
1082890 2019-03-05 15:25 data/eng.word-dawg
1487588 2019-03-05 15:25 data/eng.lstm
7477 2019-03-05 15:25 data/eng.unicharset
63346 2019-03-05 15:25 data/eng.shapetable
976552 2019-03-05 15:25 data/eng.inttemp
13408 2019-03-05 15:25 data/eng.normproto
4322 2019-03-05 15:25 data/eng.punc-dawg
4738 2019-03-05 15:25 data/eng.lstm-number-dawg
1410 2019-03-05 15:25 data/eng.freq-dawg
844 2019-03-05 15:25 data/eng.pffmtable
6360 2019-03-05 15:25 data/eng.lstm-unicharset
1012 2019-03-05 15:25 data/eng.lstm-recoder
1047 2019-03-05 15:25 data/eng.unicharambigs
4322 2019-03-05 15:25 data/eng.lstm-punc-dawg
16109842 2019-03-05 15:25 data/eng.bigram-dawg
80 2019-03-05 15:25 data/eng.version
6426 2019-03-05 15:25 data/eng.number-dawg
3694794 2019-03-05 15:25 data/eng.lstm-word-dawg
--------- -------
23468070 21 files
`combine_tessdata -d` and `combine_tessdata -u` also work.
The traineddata files in the new format can be generated with
standard tools like zip or tar.
More work is needed for other training tools and big endian support.
Signed-off-by: Stefan Weil <sw@weilnetz.de>