Stefan Weil
51a2c2eae8
Format code with clang-format
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:24:02 +02:00
Stefan Weil
95ea778745
capi: Replace FALSE, TRUE and simplify and format code
...
Format code using clang-format and clang-tidy.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:19:04 +02:00
zdenop
f47c7c92dd
fix uninitialized variables in wordstrboxrenderer and lstmboxrenderer;
...
CID 1399132, 1399134, 1399135, 1399137, 1399140, 1399141, 1399142
2019-03-31 12:26:49 +02:00
Stefan Weil
a0fd90583b
Modernize C++ code using auto
...
The modifications were done using this command:
run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-auto' -fix
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 07:55:08 +01:00
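A minimal sketch of the kind of rewrite the modernize-use-auto check performs on iterator declarations. The function `total_count` is hypothetical and is not taken from the Tesseract sources:

```cpp
#include <map>
#include <string>

// Hypothetical helper for illustration only.
// Before the check, the loop variable would be declared as
//   std::map<std::string, int>::const_iterator it = counts.begin();
// After the check, the iterator type is deduced from the initializer.
int total_count(const std::map<std::string, int>& counts) {
  int total = 0;
  for (auto it = counts.begin(); it != counts.end(); ++it) {
    total += it->second;
  }
  return total;
}
```

The behaviour is unchanged; only the spelled-out type is replaced.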
Stefan Weil
36f768853a
Modernize C++ code using override
...
The modifications were done using this command:
run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-override' -fix
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 07:37:52 +01:00
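A minimal sketch of what the modernize-use-override check changes. The classes `BaseRenderer` and `DemoRenderer` are hypothetical and do not exist in Tesseract:

```cpp
#include <string>

// Hypothetical base class for illustration only.
struct BaseRenderer {
  virtual ~BaseRenderer() = default;
  virtual std::string Extension() const { return "txt"; }
};

struct DemoRenderer : BaseRenderer {
  // Before the check:
  //   virtual std::string Extension() const { return "demo"; }
  // With override, the compiler rejects the declaration if it does
  // not actually override a virtual function of the base class.
  std::string Extension() const override { return "demo"; }
};
```

Calling `Extension()` through a `BaseRenderer` reference to a `DemoRenderer` still dispatches to the derived implementation.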
Tadahito Yao
bbbd262a8d
Added missing linker flags for MinGW.
2019-03-13 22:10:36 +09:00
Noah Metzger
5b3e2fe812
Integrated accumulated Symbol Choice in the Choice Iterator and made the api lstm_choice_mode independent
...
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-03-12 09:15:10 +01:00
Noah Metzger
754e38d2b4
Added the option to get the timesteps separated by the suggested segmentation
...
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-03-11 10:50:56 +01:00
zdenop
e817607280
archive_version_details is available from libArchive version 3.2.0
2019-03-10 22:57:48 +01:00
zdenop
5cfe4cc1f0
Merge pull request #2286 from Shreeshrii/lstmbox
...
Rename function to TessBaseAPIGetTsvText to be consistent with Create method
2019-03-10 21:41:52 +01:00
zdenop
02a1ffe87a
Report libArchive support
2019-03-10 20:08:45 +01:00
Stefan Weil
1c7e00611b
Add initial support for traineddata files in standard archive formats
...
This requires libarchive-dev.
Tesseract can now load traineddata files in any of the archive formats
which are supported by libarchive. Example of a zipped BagIt archive:
$ unzip -l /usr/local/share/tessdata/zip.traineddata
Archive:  /usr/local/share/tessdata/zip.traineddata
  Length      Date    Time    Name
---------  ---------- -----   ----
       55  2019-03-05 15:27   bagit.txt
        0  2019-03-05 15:25   data/
     1557  2019-03-05 15:28   manifest-sha256.txt
  1082890  2019-03-05 15:25   data/eng.word-dawg
  1487588  2019-03-05 15:25   data/eng.lstm
     7477  2019-03-05 15:25   data/eng.unicharset
    63346  2019-03-05 15:25   data/eng.shapetable
   976552  2019-03-05 15:25   data/eng.inttemp
    13408  2019-03-05 15:25   data/eng.normproto
     4322  2019-03-05 15:25   data/eng.punc-dawg
     4738  2019-03-05 15:25   data/eng.lstm-number-dawg
     1410  2019-03-05 15:25   data/eng.freq-dawg
      844  2019-03-05 15:25   data/eng.pffmtable
     6360  2019-03-05 15:25   data/eng.lstm-unicharset
     1012  2019-03-05 15:25   data/eng.lstm-recoder
     1047  2019-03-05 15:25   data/eng.unicharambigs
     4322  2019-03-05 15:25   data/eng.lstm-punc-dawg
 16109842  2019-03-05 15:25   data/eng.bigram-dawg
       80  2019-03-05 15:25   data/eng.version
     6426  2019-03-05 15:25   data/eng.number-dawg
  3694794  2019-03-05 15:25   data/eng.lstm-word-dawg
---------                     -------
 23468070                     21 files
`combine_tessdata -d` and `combine_tessdata -u` also work.
The traineddata files in the new format can be generated with
standard tools like zip or tar.
More work is needed for other training tools and big endian support.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-05 17:18:48 +01:00
Stefan Weil
7fbde96a04
Format new code with clang-format
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-02 20:26:07 +01:00
Stefan Weil
38fac625cd
Format new code with clang-format
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-02 20:01:48 +01:00
Shree
a0202bac70
Rename function to TessBaseAPIGetTsvText to be consistent with the Create method
2019-03-02 16:29:53 +00:00
Shree
c7e8131efc
Add TSV option to C-API
2019-03-02 09:50:54 +00:00
Shree
22c099348b
rename LSTMBOX to LSTMBox
2019-03-02 09:11:47 +00:00
Shree
c33f03e33e
Add lstmbox and wordstrbox to capi.h
2019-03-01 17:16:59 +00:00
Shree
76ec21df3d
Add lstmbox and wordstrbox to C-API
2019-03-01 16:40:41 +00:00
Michal Čihař
14c4494f42
Allow UTF-8 variant of C locale
...
It behaves the same in scanf, but it allows proper handling of Unicode
characters.
2019-02-26 21:37:33 +01:00
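A minimal sketch of a locale check that accepts both variants. The helper names are hypothetical, and the locale name "C.UTF-8" is an assumption about how the UTF-8 variant is spelled; this is not the actual Tesseract check:

```cpp
#include <clocale>
#include <cstring>

// Hypothetical helper for illustration only.
// Accepts the plain "C" locale and its assumed UTF-8 variant "C.UTF-8".
bool is_supported_locale(const char* name) {
  return name != nullptr &&
         (std::strcmp(name, "C") == 0 || std::strcmp(name, "C.UTF-8") == 0);
}

// Queries the current LC_ALL setting without changing it.
bool current_locale_is_supported() {
  return is_supported_locale(std::setlocale(LC_ALL, nullptr));
}
```

Any other locale name, such as a language-specific one, is rejected by this sketch.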
Stefan Weil
b3bd23edb7
Remove whitespace at line endings
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-19 13:53:31 +01:00
Shree Devi Kumar
f3362a4b5b
Add renderer to create WordStr box files from images
2019-02-10 19:59:17 +00:00
zdenop
2ae65b2493
Merge pull request #2216 from Shreeshrii/lstmbox
...
Lstmbox
2019-02-10 13:53:41 +01:00
Shree Devi Kumar
311053681c
put common code in AddBoxToLSTM
2019-02-10 09:16:45 +00:00
zdenop
e51f1885e6
Merge pull request #2229 from stweil/warn
...
Fix some compiler warnings
2019-02-10 08:20:23 +01:00
Shree Devi Kumar
b51c1bf05a
change to const char* as suggested by @stweil
2019-02-10 05:13:18 +00:00
Stefan Weil
aa2dcca295
Fix compiler warnings (-Wstringop-truncation)
...
gcc warnings:
src/api/tesseractmain.cpp:252:14: warning:
‘char* strncpy(char*, const char*, size_t)’ specified bound 255
equals destination size [-Wstringop-truncation]
src/ccutil/unicharset.h:66:12: warning:
‘char* strncpy(char*, const char*, size_t)’ output may be truncated copying 30 bytes from a string of length 30 [-Wstringop-truncation]
src/ccutil/unicharset.cpp:806:12: warning:
‘char* strncpy(char*, const char*, size_t)’ specified bound 64 equals destination size [-Wstringop-truncation]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-09 16:32:09 +01:00
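The warnings above fire when the bound passed to strncpy equals the destination size, because the result may then lack a terminating NUL. One common way to avoid that is sketched below; `copy_truncated` is a hypothetical helper and not necessarily the fix applied in this commit:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper for illustration only.
// Copies at most size - 1 bytes and always terminates the destination,
// so the result is a valid C string even when src is truncated.
void copy_truncated(char* dest, std::size_t size, const char* src) {
  if (size == 0) {
    return;  // No room for even the terminator.
  }
  std::strncpy(dest, src, size - 1);
  dest[size - 1] = '\0';
}
```

With a 4-byte buffer and the source "abcdef", the destination ends up holding "abc".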
Stefan Weil
d42413dd17
OpenCL: Remove PERF_COUNT framework
...
It was rarely used, but added a lot of code and an unconditional
dependency on openclwrapper.h.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-09 10:58:15 +01:00
Shree Devi Kumar
0f42fd8c69
change to use bbox coordinates for TEXTLINE for all characters
...
(cherry picked from commit 049db108b2d6cd3a7f52e480212320613117d50b)
2019-02-05 14:03:29 +00:00
Shree Devi Kumar
9c89cd51cf
Add a new renderer to create box files from images for LSTM training
...
(cherry picked from commit 921da6be2bdbda2ddd64514f9b6bec40a336246a)
fix typo
(cherry picked from commit 7bd1a0c80393fce2f34e2845cb26760bcf3791cd)
Add lstmboxrenderer to CMakeLists
(cherry picked from commit cfef3a889aef830725921b5c0218d5e9c633b03e)
fix formatting
(cherry picked from commit 7ba2b01ede7940ed609a073364948ef8c838cd10)
2019-02-05 14:03:29 +00:00
Mikhail Akopov
7be04342cf
Fix typo
2019-02-01 09:58:44 +01:00
Stefan Weil
9e6e3a0232
Fix memory leak for PNG images
...
Commit 5fe1390748 used an implementation
which created a new Pix object. That object was never destroyed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-23 20:05:10 +01:00
Stefan Weil
7fc7d28dd0
Compile files for AVX, AVX2 or SSE only when needed
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-14 21:34:37 +01:00
zdenop
f75b2c1948
Merge pull request #310 from nickjwhite/hocrcharboxes
...
Character boxes in hOCR output
2019-01-14 19:19:04 +01:00
Nick White
ebbf907c56
Fix typo in hocr character box output
2019-01-13 16:28:31 +00:00
Nick White
4ce797b6f6
Fix hocr character box info to use new hocr renderer correctly
2019-01-13 13:01:14 +00:00
Nick White
c43e4501e3
Merge remote-tracking branch 'origin/master' into hocrcharboxes
2019-01-13 12:41:42 +00:00
zdenop
238cb219d5
Merge pull request #2152 from stweil/clean
...
Remove opencl_device_selection.h
2019-01-09 15:02:59 +01:00
Stefan Weil
a0e6586e63
Fix documentation for page segmentation mode 2
...
It never worked, so add a comment that the implementation is missing.
Also add a to-do comment.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-09 13:51:44 +01:00
Stefan Weil
0fae848b58
OpenCL: Add comments to users of openclwrapper.h
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-09 12:11:00 +01:00
Stefan Weil
e0fc4f2945
Remove opencl_device_selection.h
...
Always use OpenCL device selection if OpenCL is enabled.
This fixes a regression which was introduced by commit 5c6a57b727,
which removed the definition for USE_DEVICE_SELECTION.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-09 12:09:56 +01:00
zdenop
d3065520fa
fix 2 clang warnings
2018-12-30 20:25:24 +01:00
Stefan Weil
cb049133cd
Fix compiler warning
...
clang warning:
tesseractmain.cpp(512,21): warning: '&&' within '||' [-Wlogical-op-parentheses]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-29 22:17:33 +01:00
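A minimal sketch of the pattern behind this warning. The predicate `should_run` is hypothetical and is not the expression from tesseractmain.cpp:

```cpp
// Hypothetical predicate for illustration only.
// Written as `a || b && c`, clang reports -Wlogical-op-parentheses.
// The parentheses state the grouping that operator precedence
// already implies, so the result is unchanged.
bool should_run(bool a, bool b, bool c) {
  return a || (b && c);
}
```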
zdenop
420fb0ced0
Merge branch 'master' of https://github.com/tesseract-ocr/tesseract
2018-12-29 10:31:33 +01:00
zdenop
8885fe2ccb
provide info about compiled openmp version
2018-12-29 10:18:27 +01:00
Stefan Weil
993e56ffde
Don't try to create text output if other renderers failed (fix regression)
...
Commit 49d7df6dc3 added error handling,
but since that commit Tesseract used the text fallback if the
user-selected output failed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-27 10:23:28 +01:00
zdenop
cc997b53c7
add the missing implementation for TessBaseAPIGetAltoText method in C-API
2018-12-26 21:35:47 +01:00
Stefan Weil
db9c7e0312
Use std::stringstream to generate hOCR output
...
Using std::stringstream simplifies the code and allows conversion of
double to string independent of the current locale setting.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-16 20:14:11 +01:00
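A minimal sketch of locale-independent number formatting with std::stringstream. The helper `format_number` is hypothetical and is not the hOCR renderer code; imbuing the classic locale is one way to get this behaviour and is an assumption about the approach:

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper for illustration only.
// Imbuing the classic "C" locale keeps '.' as the decimal separator
// even if the global locale has been changed elsewhere.
std::string format_number(double value) {
  std::stringstream out;
  out.imbue(std::locale::classic());
  out << value;
  return out.str();
}
```

A stream picks up the global C++ locale when it is constructed, so the explicit imbue call is what makes the output independent of that setting.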
zdenop
72d8df581b
Merge pull request #2121 from stweil/hocr
...
Move code for hOCR renderer to new file
2018-12-16 16:26:27 +01:00
Stefan Weil
c7e8d30280
Fix value for PHYSICAL_IMG_NR in ALTO output
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-16 15:07:02 +01:00