Commit Graph

96 Commits

Author SHA1 Message Date
zdenop
72d8df581b
Merge pull request #2121 from stweil/hocr
Move code for hOCR renderer to new file
2018-12-16 16:26:27 +01:00
Stefan Weil
c7e8d30280 Fix value for PHYSICAL_IMG_NR in ALTO output
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-16 15:07:02 +01:00
Stefan Weil
457c53026d Fix indentation of hOCR output
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 17:51:59 +01:00
Stefan Weil
5de3fc47bb Format code in new file hocrrenderer.cpp
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 15:35:21 +01:00
Stefan Weil
48713f7df2 Move code for hOCR renderer to new file
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 15:33:47 +01:00
Stefan Weil
fbbbdb4565 Use std::stringstream to generate ALTO output and add <SP> element
Using std::stringstream simplifies the code.
The <SP> element is needed between two <String> elements.
Remove also some unneeded spaces in the ALTO output.
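
As a rough illustration of the approach described above (not the code from this commit), the sketch below streams ALTO-like markup into a std::stringstream and emits an <SP/> element between consecutive <String> elements. The helper name BuildAltoTextLine and the reduced attribute set are assumptions made for this example; real ALTO output carries position attributes and needs XML escaping, both omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Tesseract): renders one text line as
// ALTO-like markup, with an <SP/> element between consecutive <String>s.
// Word contents are written verbatim; XML escaping is left out of the sketch.
std::string BuildAltoTextLine(const std::vector<std::string>& words) {
  std::stringstream alto;
  alto << "<TextLine>";
  for (std::size_t i = 0; i < words.size(); ++i) {
    // An empty word would produce a meaningless <String> element.
    assert(!words[i].empty());
    if (i > 0) {
      alto << "<SP/>";  // separator only between two <String> elements
    }
    alto << "<String CONTENT=\"" << words[i] << "\"/>";
  }
  alto << "</TextLine>";
  return alto.str();
}
```

Streaming into a single std::stringstream avoids manual buffer management and keeps the separator logic in one place.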

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-12 22:29:35 +01:00
Stefan Weil
f0a4d04187 Add config variable for selection of dot product function
Add also a C++ implementation with more aggressive compiler options
which is optimized for the CPU where the software was built.

It is now possible to select the function used for the dot product
with -c dotproduct=FUNCTION where FUNCTION can be one of those values:

* auto      selection based on detected hardware (default)
* generic   C++ code with default compiler options
* native    C++ code optimized for build host
* avx       optimized code for AVX
* sse       optimized code for SSE
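
The selection by name could look roughly like the following sketch, which is an assumption-laden illustration and not the code from this commit. The names DotProductFunction, DotProductGeneric and SelectDotProduct are hypothetical. Only the generic variant is implemented: "auto" falls back to it because the sketch does no hardware detection, and the other names are reported as unavailable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical function pointer type for a dot product implementation.
using DotProductFunction = double (*)(const double*, const double*,
                                      std::size_t);

// Plain C++ dot product, compiled with whatever options the build uses.
double DotProductGeneric(const double* u, const double* v, std::size_t n) {
  assert(n == 0 || (u != nullptr && v != nullptr));
  double total = 0.0;
  for (std::size_t k = 0; k < n; ++k) {
    total += u[k] * v[k];
  }
  return total;
}

// Hypothetical selector: maps a configuration value to an implementation.
// Returns nullptr for names this sketch does not provide (native, avx, sse)
// and for unknown names.
DotProductFunction SelectDotProduct(const std::string& name) {
  if (name == "generic" || name == "auto") {
    return DotProductGeneric;  // "auto" without detection: use the fallback
  }
  return nullptr;
}
```

A caller would check the returned pointer before use and report an error for an unsupported value.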

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-01 00:19:28 +01:00
Stefan Weil
1910b1a72b SIMDDetect: Use tesseract namespace and format code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 23:36:39 +01:00
Stefan Weil
ed48b2a8f5 Format new ALTO code with clang-format
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 06:37:25 +01:00
Jake Sebright
d7cee03a94 Add support for ALTO output 2018-11-30 06:09:36 +01:00
Egor Pugin
685b136d89
Fix incorrect condition. 2018-11-29 19:02:54 +03:00
Zdenko Podobný
3d508a65a7 set unlv_tilde_crunching to false; fixes #1449 #948 2018-10-23 09:26:32 +02:00
Stefan Weil
be0cf03778 tesseractmain: Fix memory leak
Commit 49d7df6dc3 introduced a memory leak
when the output file could not be created.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-22 18:50:47 +02:00
Stefan Weil
d75ef80f12 Get sorted list of available languages
TessBaseAPI::GetAvailableLanguagesAsVector returned the list of languages
without sorting, so the result was random and not user friendly.

Now `tesseract --list-langs` shows the available languages and scripts
in alphabetic order.
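
A minimal sketch of the idea, assuming the language names are held in a std::vector<std::string>; the helper SortedLanguages is hypothetical and not the Tesseract API. Note that std::sort on std::string compares byte-wise, so uppercase names sort before lowercase ones.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the given language names in sorted order.
// Takes the vector by value so the caller's list is left untouched.
std::vector<std::string> SortedLanguages(std::vector<std::string> langs) {
  std::sort(langs.begin(), langs.end());
  assert(std::is_sorted(langs.begin(), langs.end()));
  return langs;
}
```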

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-22 14:07:03 +02:00
Stefan Weil
e232114089 Fix use of undefined macro USE_DEVICE_SELECTION
This fixes compiler warnings.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-20 13:58:12 +02:00
Stefan Weil
d364750cb3 Remove type cast and fix compiler warning (-Wcast-qual)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-20 12:04:46 +02:00
Marco Atzeri
ebbd4e3efc fixes #426; define NOUNDEFINED for cygwin 2018-10-20 11:25:28 +02:00
Stefan Weil
bb181ec8d3 Rename API function from GetBestLSTMChoices to GetBestLSTMSymbolChoices
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-19 10:50:38 +02:00
Stefan Weil
df7d1e1f97 Rename API function for getting LSTM choices
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-19 10:50:38 +02:00
Stefan Weil
49d7df6dc3 tesseractmain: Show error message when output file could not be created
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-18 19:22:49 +02:00
Stefan Weil
b0b8dfbc81 TessResultRenderer: Extend API to access status of renderer
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-18 19:22:48 +02:00
Noah Metzger
c13371d6e0 Renamed GetGlyphConfidences() to GetChoices() and glyph_confidences to lstm_choice_mode
Renamed the global attribute glyph_confidences to lstm_choice_mode and the method GetGlyphConfidences() to GetChoices(). All variables and comments contained in related methods were renamed as well.

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-10-17 16:43:39 +02:00
Stefan Weil
32e1e4b6b4 TessPDFRenderer: Remove unused member variable jpg_quality_ (CID 1396172)
This fixes a warning from Coverity Scan

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-16 16:53:23 +02:00
Stefan Weil
d89ec15571 Revert "Fix CID 1396172 (Uninitialized members)"
This reverts commit cbd09de7fe.
The variable can be removed as it is not used.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-16 16:53:23 +02:00
Zdenko Podobný
cbd09de7fe Fix CID 1396172 (Uninitialized members) 2018-10-16 12:24:10 +02:00
Stefan Weil
6ffb53f815 win32: Show TIFF errors on console
Showing them in a window (default) is not acceptable for a console
application like Tesseract which must be able to work in batch mode.

Such error messages can be triggered by TIFF files which include
vendor specific tags.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-13 20:42:14 +02:00
Stefan Weil
d86d520fd0 Remove tab character in source files
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-12 11:31:10 +02:00
zdenop
ca5d285a28 hocr: add ocrp_wconf to unconditional ocr-capabilities; fixes #1470 2018-10-09 16:34:50 +02:00
zdenop
956525f5a4 fix uninitialized variable, remove unused variable 2018-10-09 15:47:20 +02:00
zdenop
c375f4fbf7 keep API compatibility with #1265 2018-10-09 11:22:15 +02:00
zdenop
f794571195 use pdf L_FLATE_ENCODE only for png input; fixes #1961 2018-10-07 20:57:19 +02:00
Stefan Weil
67bf9062df Rework check for readable input file
This reverts commit 1a096441d0 and
implements an alternate check which allows input from stdin.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-06 22:33:02 +02:00
Stefan Weil
8dc9e9fd14 Fix use of wrong UNICHARSET
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-06 13:21:09 +02:00
Stefan Weil
26bfd2b9d3 Allow orientation detection with any traineddata
While orientation and script detection (OSD) normally requires
osd.traineddata to detect both, it must also be possible to do
only orientation detection with eng.traineddata or any other
traineddata.

Enforce osd.traineddata only if there was no `-l` command line option.

Commit 27ce472666 was too restrictive.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-04 17:07:14 +02:00
Egor Pugin
6ee7f4eac2
Fix typo. 2018-09-29 17:04:25 +03:00
zdenop
d5b6222856
Merge pull request #1935 from stweil/style
Format code and fix some style issues
2018-09-29 09:32:56 +02:00
zdenop
1a096441d0 tesseract app: check if input file exists; fixes #1023 2018-09-29 08:51:00 +02:00
Stefan Weil
0f3206d5fe Format code (replace ( xxx ) by (xxx))
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-29 08:21:25 +02:00
zdenop
a0564fd4ec Allow user to specify dpi for input image 2018-09-28 20:28:52 +02:00
zdenop
5fe1390748 remove alpha channel from png: issue #1914 2018-09-27 19:40:15 +02:00
zdenop
971fe50031 fixed #714: use binary mode when generating pdf to stdout on Windows 2018-09-27 18:35:15 +02:00
Zdenko Podobný
5dfce7471c fix #1889: part 2 2018-09-26 09:28:22 +02:00
zdenop
4ca179d3fa remove condition because fontsize is always > 0 2018-09-20 21:48:44 +02:00
Zdenko Podobný
5d22fdfeed replace deprecated C++ headers (reported by clang-tidy) - partially supersedes PR #1605 2018-09-18 18:51:11 +02:00
David Thornley
92e291250a Fix missing default parameter value that caused the compile to fail. 2018-09-14 09:56:06 +02:00
David Thornley
31aeb534d9 Fix merge conflicts
Merge branch 'master' into jpg_quality_option

* master: (577 commits)
  fix issue #1889
  Add badges for download, licence and lgtm
  Replace macro MINGW by __MINGW32__
  EquationDetectBase: Define virtual destructor in .cpp file
  BlobGrid: Define virtual destructor in .cpp file
  GridBase: Define virtual destructor in .cpp file
  AlignedBlob: Define virtual destructor in .cpp file
  TransposedArray: Define virtual destructor in .cpp file
  IndexMapBiDi: Define virtual destructor in .cpp file
  Add missing include file (fixes linker error for Visual Studio)
  NthItemTest: Add definition for virtual destructor
  HeapTest: Add definition for virtual destructor
  IcuErrorCode: Define virtual destructor in .cpp file
  Validator: Define virtual destructor in .cpp file
  Dawg: Define virtual destructor in .cpp file
  CUtil: Define virtual destructor in .cpp file
  IndexMap: Define virtual destructor in .cpp file
  CCUtil: Define virtual destructor in .cpp file
  MATRIX: Define virtual destructor in .cpp file
  CCStruct: Define virtual destructor in .cpp file
  ...
2018-09-13 16:03:24 +02:00
Zdenko Podobný
59e42fcef6 fix issue #1889 2018-09-13 07:26:37 +02:00
Stefan Weil
be1393b1e8 Replace macro MINGW by __MINGW32__
MINGW is no longer used and now removed from configure.ac.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-04 16:05:27 +02:00
Stefan Weil
9f8ed31a26 api/pdfrenderer.cpp: Fix compiler warning
Compiler warning from clang:

src/api/pdfrenderer.cpp:848:28: warning:
 cast from 'const char *' to 'char *' drops const qualifier [-Wcast-qual]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-09-03 12:32:35 +02:00
Noah Metzger
663be426f6 Added the option for character accumulated glyph confidences.
The parameter glyph_confidences is changed from bool to int.
An execution with value 1 outputs the hOCR file enriched with glyph confidences
for every timestep like before. An execution with value 2 outputs the timesteps
accumulated over the recognized characters.

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-08-20 10:43:58 +02:00