Commit Graph

2874 Commits

Author SHA1 Message Date
zdenop
e9b4e21e6f
Merge pull request #1822 from stweil/clean
ColPartition: Rename median_size_ -> median_height_
2018-08-03 10:06:03 +02:00
Stefan Weil
6a0f8e8c07 ColPartition: Rename median_size_ -> median_height_
This implements a TODO. Also rename some related items.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-08-03 08:46:38 +02:00
Egor Pugin
4370714779
Merge pull request #1819 from stweil/ocl
Fix ImageThresholder::OtsuThresholdRectToPix for OpenCL
2018-08-02 02:01:32 +03:00
Stefan Weil
8af80b7ba6 Fix ImageThresholder::OtsuThresholdRectToPix for OpenCL
The ThresholdRectToPix OpenCL kernel only supports 4 channels.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-08-01 22:49:28 +02:00
zdenop
c044b8c916
Merge pull request #1818 from stweil/psm
Fix potential crash with --psm 0 and use osd.traineddata automatically
2018-08-01 16:56:56 +02:00
zdenop
d22ca6bb06
Merge pull request #1817 from noahmetzger/winfix
Fix issue detected by Coverity Scan
2018-08-01 16:55:56 +02:00
Stefan Weil
27ce472666 Fix potential crash with --psm 0 and use osd.traineddata automatically
Page segmentation mode "OSD only" requires osd.traineddata,
so use it automatically.

Report a warning if the user specified a different language.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-08-01 16:52:37 +02:00
Noah Metzger
65997bed16 Fix issue detected by Coverity Scan
CID: 1340285 (Division or modulo by zero)

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-08-01 15:56:19 +02:00
zdenop
b23568f3d1
Merge pull request #1816 from noahmetzger/winfix
Fix issues detected by Coverity Scan
2018-08-01 14:45:00 +02:00
Noah Metzger
d28631a274 Fix issues detected by Coverity Scan
CID: 1164604 (Nesting level does not match indentation)

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-08-01 14:30:13 +02:00
Egor Pugin
8bb8b75692
Merge pull request #1815 from stweil/whitespace
Fix whitespace issues
2018-08-01 14:54:35 +03:00
Stefan Weil
6a28cce96b Fix whitespace issues
* Remove whitespace (blanks, tabs, cr) at line endings

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-08-01 13:19:52 +02:00
zdenop
3af2773d0e
Merge pull request #1814 from noahmetzger/winfix
Fix issue detected by Coverity Scan
2018-08-01 11:20:13 +02:00
Noah Metzger
2d96c66126 Fix issue detected by Coverity Scan
CID: 1164533 (Logically dead code)

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-08-01 10:30:52 +02:00
zdenop
10259698d8
Merge pull request #1813 from stweil/fix
TessPDFRenderer: Improve robustness of API (issue #1804)
2018-08-01 09:17:55 +02:00
Stefan Weil
eb69dd0201 TessPDFRenderer: Improve robustness of API (issue #1804)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-08-01 09:11:04 +02:00
Egor Pugin
9ce4d05188
Merge pull request #1812 from noahmetzger/winfix
Fix issue reported by Coverity Scan
2018-07-31 13:52:05 +03:00
Noah Metzger
d4490af06d Fix issue reported by Coverity Scan
CID: 1375395 (Dereference after null check)

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-07-31 10:43:39 +02:00
zdenop
7d99cb4e28
Merge pull request #1811 from noahmetzger/winfix
Fix issue reported by Coverity Scan
2018-07-31 09:53:33 +02:00
Noah Metzger
83a4eb3b44 Fix issue reported by Coverity Scan
CID: 1391264 (Improper use of negative value)

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-07-31 09:43:30 +02:00
zdenop
18787ea12b
Merge pull request #1808 from stweil/fix
Revert "Change default width for images output by text2image"
2018-07-27 08:10:37 +02:00
Stefan Weil
9cf170cb7a Revert "Change default width for images output by text2image"
This reverts commit fdc243b363 because
it caused a regression reported in issue #1798.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-27 07:29:30 +02:00
Egor Pugin
57224bc9b5
Merge pull request #1805 from kant/patch-3
Minor formatting proposals
2018-07-26 20:06:02 +03:00
Egor Pugin
51c1950129
Merge pull request #1806 from stweil/training
training: Add new flag --workspace_dir to tesstraining_utils.sh
2018-07-26 20:05:34 +03:00
Stefan Weil
b19e69086c training: Add new flag --workspace_dir to tesstraining_utils.sh
By default, that script creates two new temporary directories with random
names in /tmp.

The new command line flag --workspace_dir PATH uses the given path as
a base directory for all temporary files.

That makes training results more reproducible (no random directory
names in log files).

Signed-off-by: Stefan Weil <stweil@ub-backup.bib.uni-mannheim.de>
2018-07-26 17:14:19 +02:00
Darío Hereñú
b50073ec48
Minor formatting proposals
2018-07-26 12:00:14 -03:00
Egor Pugin
fbff323d6a
Merge pull request #1802 from noahmetzger/winfix
Added a feature to enrich the hOCR output with glyph confidences
2018-07-26 12:29:47 +03:00
zdenop
fc6d6fb25d
Merge pull request #1803 from kant/patch-2
Minor formatting proposals
2018-07-26 07:51:55 +02:00
Darío Hereñú
2315fe2a77
Minor formatting proposals
2018-07-25 22:13:50 -03:00
Noah Metzger
91c7504a35 Added a feature to enrich the hOCR output with glyph confidences
With the parameter -c glyph_confidences=true, the user can enrich the hOCR
output with additional information. For every timestep of the LSTM,
Tesseract then also lists all glyphs that were considered, together with
their confidences.

The format of the hOCR output is slightly changed: There is now a linebreak
after every word for better readability by humans.

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-07-25 18:18:58 +02:00
zdenop
607e8fd85c
Merge pull request #1795 from stweil/fix
Fix regression (shared libraries no longer supported)
2018-07-21 13:01:15 +02:00
zdenop
390f9ed55b
Merge pull request #1796 from stweil/limit
Increase limit for deserialization of large arrays
2018-07-21 13:00:37 +02:00
Stefan Weil
132c540c85 Increase limit for deserialization of large arrays
The last limit was still too small.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-21 11:10:09 +02:00
Stefan Weil
b15624eb2f Fix regression (shared libraries no longer supported)
The first usage of AC_CHECK_HEADERS must be unconditional,
otherwise configure fails to detect support for shared libraries.

This fixes a regression introduced by commit a07025c993.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-21 11:06:38 +02:00
Egor Pugin
0e1e68d843
Merge pull request #1794 from stweil/fix
Increase limit and add assertions for deserialization of large arrays
2018-07-20 14:04:33 +03:00
Stefan Weil
f577e292c2 Increase limit and add assertions for deserialization of large arrays
One of the checks was too restrictive, as lstmeval deserializes
char arrays with 14000000 elements, so raise the limit to 30000000.
That check was added in commit 992031e824.

Also add assertions that help find such problems in debug mode.

Signed-off-by: Stefan Weil <stweil@ub-backup.bib.uni-mannheim.de>
2018-07-20 11:47:49 +02:00
zdenop
364ffeb0ab
Merge pull request #1792 from stweil/mode
Add missing execute permission for script files
2018-07-19 20:53:36 +02:00
zdenop
62be158fd0
Merge pull request #1790 from stweil/configure
Clean configuration code
2018-07-19 20:53:19 +02:00
Stefan Weil
ca25d88538 Add missing execute permission for script files
It is needed for running the training tutorial on Linux.

The correct mode was lost when moving the files in
commit 104fe7931c.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 20:25:41 +02:00
Stefan Weil
58208522f0 configure: Clean code for --enable-visibility
* Remove unneeded arguments for AC_ARG_ENABLE
* Use [] instead of () for default in help text

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 16:33:28 +02:00
Stefan Weil
a07025c993 configure: Clean code for --enable-opencl
* Remove unneeded arguments for AC_ARG_ENABLE
* Use AS_HELP_STRING
* Use [] instead of () for default in help text
* Run AC_CHECK_HEADERS, AC_CHECK_LIB only if OpenCL support is enabled

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 16:33:28 +02:00
Stefan Weil
0ad6e3e77f configure: Clean code for --enable-legacy
* Remove unneeded arguments for AC_ARG_ENABLE
* Fix formatting of help text
* Remove help text for --enable-legacy

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 16:33:28 +02:00
Stefan Weil
e47a9272d7 configure: Clean code for --enable-graphics
* Remove unneeded arguments for AC_ARG_ENABLE
* Remove help text for --enable-graphics

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 16:33:28 +02:00
Stefan Weil
cfc5ef65a2 configure: Clean code for --enable-embedded
* Remove unneeded arguments for AC_ARG_ENABLE
* Use AS_HELP_STRING
* Use [] instead of () for default in help text

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 16:33:28 +02:00
Stefan Weil
11cafd7673 configure: Clean code for --enable-debug
* Remove unneeded arguments for AC_ARG_ENABLE (needs renaming of macro)
* Use [] instead of () for default in help text

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 16:33:28 +02:00
Stefan Weil
11d9d8e59a configure: Remove macro AC_SYS_INTERPRETER
The macro sets interpval which is not used by Tesseract.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 16:19:58 +02:00
Stefan Weil
0a4edf618a configure: Remove large file support
Tesseract does not handle large files (more than 2 GiB).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 16:19:58 +02:00
Stefan Weil
4bbebd3f7e Remove tests for function getline
The Tesseract code does not use getline.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-19 16:19:58 +02:00
Egor Pugin
3a7f5e4de4
Merge pull request #1786 from stweil/serialize
Use new serialization API
2018-07-18 23:28:30 +03:00
Stefan Weil
b7b8dba5db LSTMTrainer: Use new serialization API
Also improve portability by using int32_t instead of int
for a serialized member variable.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-07-18 19:28:05 +02:00