Commit Graph

3524 Commits

Author SHA1 Message Date
zdenop
420fb0ced0 Merge branch 'master' of https://github.com/tesseract-ocr/tesseract 2018-12-29 10:31:33 +01:00
zdenop
3e6ec97ac3 Remove altorenderer.cpp from resource compiling (already included in tesseract_src) 2018-12-29 10:30:56 +01:00
zdenop
8885fe2ccb provide info about compiled openmp version 2018-12-29 10:18:27 +01:00
zdenop
d44b58323f
Merge pull request #2133 from stweil/fix
Don't try to create text output if other renderers failed (fix regres…
2018-12-27 14:10:34 +01:00
Stefan Weil
993e56ffde Don't try to create text output if other renderers failed (fix regression)
Commit 49d7df6dc3 added error handling,
but since that commit Tesseract used the text fallback if the
user-selected output failed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-27 10:23:28 +01:00
zdenop
cc997b53c7 add the missing implementation of the TessBaseAPIGetAltoText method in C-API 2018-12-26 21:35:47 +01:00
Egor Pugin
e3a39c3577 Revert "Switch windows builds to SW."
This reverts commit 0967a32498.
2018-12-18 18:33:56 +03:00
Egor Pugin
0967a32498 Switch windows builds to SW. 2018-12-18 01:21:24 +03:00
zdenop
a3b2d74ce3
Merge pull request #2123 from stweil/hocr
Use std::stringstream to generate hOCR output
2018-12-16 22:21:14 +01:00
Stefan Weil
db9c7e0312 Use std::stringstream to generate hOCR output
Using std::stringstream simplifies the code and allows conversion of
double to string independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-16 20:14:11 +01:00
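The locale point made in the commit message above can be illustrated with a minimal sketch. This is not the code from the hOCR renderer: the helper name `format_coordinate` is hypothetical, and pinning the stream to the classic "C" locale with `imbue` is an assumption added here so that the locale independence is explicit.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper, not taken from Tesseract: formats a double with a
// '.' decimal separator regardless of the locale the program has selected.
std::string format_coordinate(double value) {
  std::stringstream stream;
  // Assumption for this sketch: pin the stream to the classic "C" locale.
  stream.imbue(std::locale::classic());
  stream << value;
  return stream.str();
}
```

With this helper, `format_coordinate(1.5)` yields `1.5` even when the global locale uses a comma as the decimal separator.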
zdenop
72d8df581b
Merge pull request #2121 from stweil/hocr
Move code for hOCR renderer to new file
2018-12-16 16:26:27 +01:00
zdenop
c7247a8be5
Merge pull request #2122 from stweil/alto
Fix value for PHYSICAL_IMG_NR in ALTO output
2018-12-16 16:24:27 +01:00
Stefan Weil
c7e8d30280 Fix value for PHYSICAL_IMG_NR in ALTO output
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-16 15:07:02 +01:00
Stefan Weil
fda0fa4e7e Add new hocrrenderer.cpp to CMakeList.txt and Android.mk
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 22:58:51 +01:00
Stefan Weil
457c53026d Fix indentation of hOCR output
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 17:51:59 +01:00
Stefan Weil
5de3fc47bb Format code in new file hocrrenderer.cpp
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 15:35:21 +01:00
Stefan Weil
48713f7df2 Move code for hOCR renderer to new file
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 15:33:47 +01:00
Jake Sebright
e398601bf5 Include ALTO in list of supported output formats 2018-12-15 10:41:24 +01:00
zdenop
1f5fb15af3 remove setting constant resolution from ImageThresholder::SetImage.
Credible resolution will be set afterward. Fixes #2080.
2018-12-14 19:23:22 +01:00
zdenop
6d06d39bf4
Merge pull request #2118 from stweil/clean
protos: Remove several unused macros, functions and global variables
2018-12-14 09:20:53 +01:00
Stefan Weil
b8c4f1b9fc protos: Remove unused config variable
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-13 21:37:33 +01:00
zdenop
945216af58
Merge pull request #2117 from stweil/alto
Use std::stringstream to generate ALTO output and add <SP> element
2018-12-13 21:37:27 +01:00
Stefan Weil
f35eeb3b4a protos: Remove several unused macros, functions and global variables
The unused global variable TrainingData used a lot of runtime memory.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-13 21:32:56 +01:00
Stefan Weil
fbbbdb4565 Use std::stringstream to generate ALTO output and add <SP> element
Using std::stringstream simplifies the code.
The <SP> element is needed between two <String> elements.
Remove also some unneeded spaces in the ALTO output.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-12 22:29:35 +01:00
Egor Pugin
5307b2c111
Merge pull request #2113 from stweil/typo
Fix several typos (most of them found by codespell)
2018-12-11 12:59:08 +03:00
Stefan Weil
7ebd3153ae Fix several typos (most of them found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-10 18:59:58 +01:00
Egor Pugin
5b65982fa4
Merge pull request #2111 from stweil/clean
Remove several unused methods
2018-12-08 20:07:35 +03:00
Stefan Weil
81ab302d52 FPRow: Remove three unused methods
This fixes warnings from the Intel compiler:

    src/textord/cjkpitch.cpp(319): warning #177:
      function "<unnamed>::FPRow::good_gaps" was declared but never referenced
    src/textord/cjkpitch.cpp(383): warning #177:
      function "<unnamed>::FPRow::is_bad" was declared but never referenced
    src/textord/cjkpitch.cpp(387): warning #177:
      function "<unnamed>::FPRow::is_unknown" was declared but never referenced

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-08 16:43:52 +01:00
Stefan Weil
404f9cd147 SimpleStats: Remove unused method
This fixes a warning from the Intel compiler:

    src/textord/cjkpitch.cpp(79): warning #177:
      function "<unnamed>::SimpleStats::maximum" was declared
      but never referenced

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-08 16:39:46 +01:00
Stefan Weil
a9121d28f3
Merge pull request #2107 from stweil/march
Add check whether compiler supports -march=native flag
2018-12-08 10:53:09 +01:00
Stefan Weil
2c044df959 Fix wrong x_fsize in hOCR output (regression)
The regression was caused by the latest commit
c9e85ab78f.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-08 10:39:31 +01:00
Stefan Weil
2ccc5810f3 Add check whether compiler supports -march=native flag
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-05 20:13:28 +01:00
zdenop
ad40131385
Merge pull request #2104 from stweil/fix
Fix two runtime errors
2018-12-04 11:27:48 +01:00
Stefan Weil
c9e85ab78f Fix wrong font attributes in hOCR output
Instrumented code throws this runtime error during OCR:

    ../../src/api/baseapi.cpp:1616:5: runtime error: load of value 128,
      which is not a valid value for type 'bool'
    ../../src/api/baseapi.cpp:1627:5: runtime error: load of value 128,
      which is not a valid value for type 'bool'

If there is no font information (typical for Tesseract with a LSTM model),
the font attributes got random values resulting in wrong hOCR output.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-04 10:52:46 +01:00
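The kind of error reported above typically comes from reading a `bool` that was never written. A minimal sketch of the usual remedy follows, assuming a hypothetical `FontAttributes` struct and `QueryFontAttributes` function; neither name comes from Tesseract, and the actual fix in baseapi.cpp may look different.

```cpp
// Hypothetical record of font flags; the default member initializers
// guarantee a defined value even when no font information is available.
struct FontAttributes {
  bool is_bold = false;
  bool is_italic = false;
};

// Hypothetical query: only overwrites the defaults when font data exists.
FontAttributes QueryFontAttributes(bool has_font_info) {
  FontAttributes attributes;  // all flags start as false
  if (has_font_info) {
    attributes.is_bold = true;  // stand-in for values read from a font record
  }
  return attributes;
}
```

Because every flag has an initial value, a caller that receives no font information reads `false` instead of an indeterminate byte.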
Stefan Weil
0bdae8f8bf GENERIC_2D_ARRAY: Fix runtime error in assignment operator
Instrumented code throws this runtime error during OCR:

    ../../src/ccstruct/matrix.h:84:11: runtime error:
      null pointer passed as argument 2, which is declared to never be null

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-04 10:48:46 +01:00
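The message above matches what instrumented builds report when `memcpy` is called with a null source pointer, which is undefined behaviour even for a length of zero. A minimal sketch of a guard, assuming a hypothetical helper `CopyInts` rather than the actual GENERIC_2D_ARRAY code:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, not taken from matrix.h: copies `count` ints and
// skips the memcpy call entirely when there is nothing to copy.
void CopyInts(int* destination, const int* source, std::size_t count) {
  if (source == nullptr || count == 0) {
    return;  // avoid passing a null pointer to memcpy
  }
  std::memcpy(destination, source, count * sizeof(int));
}
```

An assignment operator for an array class could call such a helper so that assigning from an empty array never reaches `memcpy`.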
zdenop
bee874a1c7
Merge pull request #2098 from stweil/dp-perf
Add config variable for selection of dot product function
2018-12-01 07:50:56 +01:00
Stefan Weil
f0a4d04187 Add config variable for selection of dot product function
Add also a C++ implementation with more aggressive compiler options
which is optimized for the CPU where the software was built.

It is now possible to select the function used for the dot product
with -c dotproduct=FUNCTION where FUNCTION can be one of those values:

* auto      selection based on detected hardware (default)
* generic   C++ code with default compiler options
* native    C++ code optimized for build host
* avx       optimized code for AVX
* sse       optimized code for SSE

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-01 00:19:28 +01:00
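The selection described in the commit message above can be sketched as a lookup from the configured name to a function pointer. This is an illustration only: `DotProductFunction`, `DotProductGeneric` and `SelectDotProduct` are hypothetical names, and in this sketch every known value falls back to the plain C++ loop instead of dispatching to real AVX or SSE code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical signature for a dot product implementation.
using DotProductFunction = double (*)(const double*, const double*,
                                      std::size_t);

// Plain C++ loop, standing in for the "generic" variant.
static double DotProductGeneric(const double* u, const double* v,
                                std::size_t n) {
  double total = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    total += u[i] * v[i];
  }
  return total;
}

// Hypothetical selector: maps the configured name to an implementation and
// returns nullptr for an unknown name.
DotProductFunction SelectDotProduct(const std::string& name) {
  if (name == "auto" || name == "generic") {
    return DotProductGeneric;
  }
  if (name == "native" || name == "avx" || name == "sse") {
    // A real build would return specialised code here; the sketch falls
    // back to the generic loop.
    return DotProductGeneric;
  }
  return nullptr;
}
```

Returning `nullptr` for an unknown name leaves it to the caller to report the bad setting or fall back to a default.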
zdenop
b527b37825
Merge pull request #2097 from stweil/namespace
SIMDDetect: Use tesseract namespace and format code
2018-12-01 00:02:18 +01:00
zdenop
d69dc27fa3
Merge pull request #2096 from stweil/clean
Clean code and fix crash with early use of tprintf
2018-12-01 00:02:08 +01:00
Stefan Weil
1910b1a72b SIMDDetect: Use tesseract namespace and format code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 23:36:39 +01:00
Stefan Weil
66d3275d0b IntSimdMatrixSSE: Remove unused include statement and simplify code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 23:14:11 +01:00
Stefan Weil
048eb34934 Add missing static attribute to local inline functions
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 23:14:11 +01:00
Stefan Weil
b73370aac9 Remove unneeded test for nullptr
IntSimdMatrix::GetFastestMultiplier never returns a nullptr.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 23:14:11 +01:00
Stefan Weil
e2419b1968 Fix potential crash in tprintf
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 23:14:11 +01:00
Stefan Weil
6b6d9de497 Fix potential crash in STRING class
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 23:14:11 +01:00
zdenop
b6057f5755
Merge pull request #2095 from stweil/optimize
Use -ffast-math for calculation of dot product
2018-11-30 23:04:59 +01:00
Stefan Weil
59fb3370bb Use -ffast-math for calculation of dot product
This reduces the code size for intsimdmatrixavx2 from 2700 to 2668
and slightly improves the performance for fast models with AVX2.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 22:52:04 +01:00
Stefan Weil
fda3ba9009 IntSimdMatrixSSE: Fix comment
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 22:13:32 +01:00
zdenop
07b140364f
Merge pull request #2093 from stweil/python
Updates for Python scripts
2018-11-30 08:10:20 +01:00
zdenop
53600c677e
Merge pull request #2092 from stweil/format
Format new ALTO code with clang-format
2018-11-30 08:08:52 +01:00