zdenop
72d8df581b
Merge pull request #2121 from stweil/hocr
...
Move code for hOCR renderer to new file
2018-12-16 16:26:27 +01:00
Stefan Weil
c7e8d30280
Fix value for PHYSICAL_IMG_NR in ALTO output
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-16 15:07:02 +01:00
Stefan Weil
457c53026d
Fix indentation of hOCR output
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 17:51:59 +01:00
Stefan Weil
5de3fc47bb
Format code in new file hocrrenderer.cpp
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 15:35:21 +01:00
Stefan Weil
48713f7df2
Move code for hOCR renderer to new file
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 15:33:47 +01:00
Stefan Weil
fbbbdb4565
Use std::stringstream to generate ALTO output and add <SP> element
...
Using std::stringstream simplifies the code.
The <SP> element is needed between two <String> elements.
Remove also some unneeded spaces in the ALTO output.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-12 22:29:35 +01:00
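The idea described in this commit can be sketched as follows. This is a minimal illustration, not the renderer's actual code: the function name `AltoLine` is hypothetical, only the `CONTENT` attribute is written (real ALTO output carries position attributes as well), and XML escaping of the words is omitted.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of Tesseract: builds the ALTO elements
// for one text line, placing an <SP/> element between two <String> elements.
std::string AltoLine(const std::vector<std::string>& words) {
  std::stringstream out;
  bool first = true;
  for (const auto& word : words) {
    assert(!word.empty());  // assumption of this sketch: no empty words
    if (!first) {
      out << "<SP/>";  // separator between neighbouring <String> elements
    }
    // Note: the word is written as-is; a real renderer must escape XML.
    out << "<String CONTENT=\"" << word << "\"/>";
    first = false;
  }
  return out.str();
}
```

Streaming into a `std::stringstream` avoids manual buffer handling, which is the simplification the commit message refers to.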
Stefan Weil
f0a4d04187
Add config variable for selection of dot product function
...
Also add a C++ implementation with more aggressive compiler options
which is optimized for the CPU where the software was built.
It is now possible to select the function used for the dot product
with `-c dotproduct=FUNCTION` where FUNCTION can be one of these values:
* auto: selection based on detected hardware (default)
* generic: C++ code with default compiler options
* native: C++ code optimized for build host
* avx: optimized code for AVX
* sse: optimized code for SSE
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-01 00:19:28 +01:00
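A selection by name as described in this commit might look roughly like the sketch below. All identifiers (`DotProductFunction`, `DotProductGeneric`, `SelectDotProduct`) are hypothetical and chosen for illustration; the sketch has only one implementation, so every known name maps to the generic loop, whereas the real code would dispatch to SIMD variants.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical function pointer type for this sketch.
using DotProductFunction = double (*)(const double* u, const double* v,
                                      std::size_t n);

// Plain C++ loop, standing in for the "generic" option.
double DotProductGeneric(const double* u, const double* v, std::size_t n) {
  assert(u != nullptr && v != nullptr);
  double total = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    total += u[i] * v[i];
  }
  return total;
}

// Maps the value of the config variable to a function.
// Returns nullptr for an unknown name so the caller can report an error.
DotProductFunction SelectDotProduct(const std::string& name) {
  if (name == "auto" || name == "generic") {
    return DotProductGeneric;
  }
  if (name == "native" || name == "avx" || name == "sse") {
    // Assumption of this sketch: no specialized builds are available,
    // so these names fall back to the generic implementation.
    return DotProductGeneric;
  }
  return nullptr;
}
```

Returning a function pointer keeps the choice out of the hot loop: the name is resolved once at start-up and the selected function is called directly afterwards.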
Stefan Weil
1910b1a72b
SIMDDetect: Use tesseract namespace and format code
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 23:36:39 +01:00
Stefan Weil
ed48b2a8f5
Format new ALTO code with clang-format
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 06:37:25 +01:00
Jake Sebright
d7cee03a94
Add support for ALTO output
2018-11-30 06:09:36 +01:00
Egor Pugin
685b136d89
Fix incorrect condition.
2018-11-29 19:02:54 +03:00
Zdenko Podobný
3d508a65a7
set unlv_tilde_crunching to false; fixes #1449 #948
2018-10-23 09:26:32 +02:00
Stefan Weil
be0cf03778
tesseractmain: Fix memory leak
...
Commit 49d7df6dc3 introduced a memory leak
when the output file could not be created.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-22 18:50:47 +02:00
Stefan Weil
d75ef80f12
Get sorted list of available languages
...
TessBaseAPI::GetAvailableLanguagesAsVector returned the list of languages
without sorting, so the result was random and not user friendly.
Now `tesseract --list-langs` shows the available languages and scripts
in alphabetic order.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-22 14:07:03 +02:00
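The effect of this change can be illustrated with a small sketch. The helper `SortedLanguages` is hypothetical and is not the Tesseract API; it only shows sorting a list of language names with `std::sort`, which compares strings byte by byte (so uppercase names would sort before lowercase ones).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of Tesseract: returns the given
// language names in ascending (byte-wise) order.
std::vector<std::string> SortedLanguages(std::vector<std::string> langs) {
  std::sort(langs.begin(), langs.end());
  assert(std::is_sorted(langs.begin(), langs.end()));  // sanity check
  return langs;
}
```

Directory listings usually come back in file system order, which is why an explicit sort is needed to get a stable, readable result.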
Stefan Weil
e232114089
Fix use of undefined macro USE_DEVICE_SELECTION
...
This fixes compiler warnings.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-20 13:58:12 +02:00
Stefan Weil
d364750cb3
Remove type cast and fix compiler warning (-Wcast-qual)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-20 12:04:46 +02:00
Marco Atzeri
ebbd4e3efc
fixes #426 ; define NOUNDEFINED for cygwin
2018-10-20 11:25:28 +02:00
Stefan Weil
bb181ec8d3
Rename API function from GetBestLSTMChoices to GetBestLSTMSymbolChoices
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-19 10:50:38 +02:00
Stefan Weil
df7d1e1f97
Rename API function for getting LSTM choices
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-19 10:50:38 +02:00
Stefan Weil
49d7df6dc3
tesseractmain: Show error message when output file could not be created
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-18 19:22:49 +02:00
Stefan Weil
b0b8dfbc81
TessResultRenderer: Extend API to access status of renderer
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-18 19:22:48 +02:00
Noah Metzger
c13371d6e0
Renamed GetGlyphConfidences() to GetChoices() and glyph_confidences to lstm_choice_mode
...
Renamed the global attribute glyph_confidences to lstm_choice_mode and the method GetGlyphConfidences() to GetChoices(). All variables and comments contained in related methods were renamed as well.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-10-17 16:43:39 +02:00
Stefan Weil
32e1e4b6b4
TessPDFRenderer: Remove unused member variable jpg_quality_ (CID 1396172)
...
This fixes a warning from Coverity Scan
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-16 16:53:23 +02:00
Stefan Weil
d89ec15571
Revert "Fix CID 1396172 (Uninitialized members)"
...
This reverts commit cbd09de7fe.
The variable can be removed as it is not used.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-16 16:53:23 +02:00
Zdenko Podobný
cbd09de7fe
Fix CID 1396172 (Uninitialized members)
2018-10-16 12:24:10 +02:00
Stefan Weil
6ffb53f815
win32: Show TIFF errors on console
...
Showing them in a window (default) is not acceptable for a console
application like Tesseract which must be able to work in batch mode.
Such error messages can be triggered by TIFF files which include
vendor specific tags.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-13 20:42:14 +02:00
Stefan Weil
d86d520fd0
Remove tab character in source files
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-12 11:31:10 +02:00
zdenop
ca5d285a28
hocr: add ocrp_wconf to unconditional ocr-capabilities; fixes #1470
2018-10-09 16:34:50 +02:00
zdenop
956525f5a4
fix uninitialized variable, remove unused variable
2018-10-09 15:47:20 +02:00
zdenop
c375f4fbf7
keep API compatibility with #1265
2018-10-09 11:22:15 +02:00
zdenop
f794571195
use pdf L_FLATE_ENCODE only for png input; fixes #1961
2018-10-07 20:57:19 +02:00
Stefan Weil
67bf9062df
Rework check for readable input file
...
This reverts commit 1a096441d0 and
implements an alternate check which allows input from stdin.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-06 22:33:02 +02:00
Stefan Weil
8dc9e9fd14
Fix use of wrong UNICHARSET
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-06 13:21:09 +02:00
Stefan Weil
26bfd2b9d3
Allow orientation detection with any traineddata
...
While orientation and script detection (OSD) normally requires
osd.traineddata to detect both, it must also be possible to do
only orientation detection with eng.traineddata or any other
traineddata.
Enforce osd.traineddata only if there was no `-l` command line option.
Commit 27ce472666 was too restrictive.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-04 17:07:14 +02:00
Egor Pugin
6ee7f4eac2
Fix typo.
2018-09-29 17:04:25 +03:00
zdenop
d5b6222856
Merge pull request #1935 from stweil/style
...
Format code and fix some style issues
2018-09-29 09:32:56 +02:00
zdenop
1a096441d0
tesseract app: check if input file exists; fixes #1023
2018-09-29 08:51:00 +02:00
Stefan Weil
0f3206d5fe
Format code (replace ( xxx ) by (xxx))
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-29 08:21:25 +02:00
zdenop
a0564fd4ec
Allow user to specify dpi for input image
2018-09-28 20:28:52 +02:00
zdenop
5fe1390748
remove alpha channel from png: issue #1914
2018-09-27 19:40:15 +02:00
zdenop
971fe50031
fixed #714 : use binary mode when generating pdf to stdout on Windows
2018-09-27 18:35:15 +02:00
Zdenko Podobný
5dfce7471c
fix #1889 : part 2
2018-09-26 09:28:22 +02:00
zdenop
4ca179d3fa
remove condition because fontsize is always > 0
2018-09-20 21:48:44 +02:00
Zdenko Podobný
5d22fdfeed
replace deprecated C++ headers (reported by clang-tidy) - partially supersedes PR #1605
2018-09-18 18:51:11 +02:00
David Thornley
92e291250a
Fix missing default parameter value that caused compilation to fail.
2018-09-14 09:56:06 +02:00
David Thornley
31aeb534d9
Fix merge conflicts
...
Merge branch 'master' into jpg_quality_option
* master: (577 commits)
fix issue #1889
Add badges for download , licence and lgtm
Replace macro MINGW by __MINGW32__
EquationDetectBase: Define virtual destructor in .cpp file
BlobGrid: Define virtual destructor in .cpp file
GridBase: Define virtual destructor in .cpp file
AlignedBlob: Define virtual destructor in .cpp file
TransposedArray: Define virtual destructor in .cpp file
IndexMapBiDi: Define virtual destructor in .cpp file
Add missing include file (fixes linker error for Visual Studio)
NthItemTest: Add definition for virtual destructor
HeapTest: Add definition for virtual destructor
IcuErrorCode: Define virtual destructor in .cpp file
Validator: Define virtual destructor in .cpp file
Dawg: Define virtual destructor in .cpp file
CUtil: Define virtual destructor in .cpp file
IndexMap: Define virtual destructor in .cpp file
CCUtil: Define virtual destructor in .cpp file
MATRIX: Define virtual destructor in .cpp file
CCStruct: Define virtual destructor in .cpp file
...
2018-09-13 16:03:24 +02:00
Zdenko Podobný
59e42fcef6
fix issue #1889
2018-09-13 07:26:37 +02:00
Stefan Weil
be1393b1e8
Replace macro MINGW by __MINGW32__
...
MINGW is no longer used and now removed from configure.ac.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-04 16:05:27 +02:00
Stefan Weil
9f8ed31a26
api/pdfrenderer.cpp: Fix compiler warning
...
Compiler warning from clang:
src/api/pdfrenderer.cpp:848:28: warning:
cast from 'const char *' to 'char *' drops const qualifier [-Wcast-qual]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-03 12:32:35 +02:00
Noah Metzger
663be426f6
Added the option for character accumulated glyph confidences.
...
The parameter glyph_confidences is changed from bool to int.
An execution with value 1 outputs the hOCR file enriched with glyph confidences
for every timestep like before. An execution with value 2 outputs the timesteps
accumulated over the recognized characters.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-08-20 10:43:58 +02:00