Commit Graph

117 Commits

Author SHA1 Message Date
Shree Devi Kumar
b51c1bf05a change to const char* as suggested by @stweil 2019-02-10 05:13:18 +00:00
Shree Devi Kumar
0f42fd8c69 change to use bbox coordinates for TEXTLINE for all characters
(cherry picked from commit 049db108b2d6cd3a7f52e480212320613117d50b)
2019-02-05 14:03:29 +00:00
Shree Devi Kumar
9c89cd51cf Add a new renderer to create box files from images for LSTM training
(cherry picked from commit 921da6be2bdbda2ddd64514f9b6bec40a336246a)

fix typo

(cherry picked from commit 7bd1a0c80393fce2f34e2845cb26760bcf3791cd)

Add lstmboxrenderer to CMakeLists

(cherry picked from commit cfef3a889aef830725921b5c0218d5e9c633b03e)

fix formatting

(cherry picked from commit 7ba2b01ede7940ed609a073364948ef8c838cd10)
2019-02-05 14:03:29 +00:00
Mikhail Akopov
7be04342cf Fix typo 2019-02-01 09:58:44 +01:00
Stefan Weil
9e6e3a0232 Fix memory leak for PNG images
Commit 5fe1390748 used an implementation
which created a new Pix object. That object was never destroyed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-23 20:05:10 +01:00
Stefan Weil
7fc7d28dd0 Compile files for AVX, AVX2 or SSE only when needed
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-14 21:34:37 +01:00
zdenop
f75b2c1948
Merge pull request #310 from nickjwhite/hocrcharboxes
Character boxes in hOCR output
2019-01-14 19:19:04 +01:00
Nick White
ebbf907c56 Fix typo in hocr character box output 2019-01-13 16:28:31 +00:00
Nick White
4ce797b6f6 Fix hocr character box info to use new hocr renderer correctly 2019-01-13 13:01:14 +00:00
Nick White
c43e4501e3 Merge remote-tracking branch 'origin/master' into hocrcharboxes 2019-01-13 12:41:42 +00:00
zdenop
238cb219d5
Merge pull request #2152 from stweil/clean
Remove opencl_device_selection.h
2019-01-09 15:02:59 +01:00
Stefan Weil
a0e6586e63 Fix documentation for page segmentation mode 2
It never worked, so add a comment that the implementation is missing.
Add also a to-do comment.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-09 13:51:44 +01:00
Stefan Weil
0fae848b58 OpenCL: Add comments to users of openclwrapper.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-09 12:11:00 +01:00
Stefan Weil
e0fc4f2945 Remove opencl_device_selection.h
Always use OpenCL device selection if OpenCL is enabled.

This fixes a regression which was introduced by commit
5c6a57b727 which removed
the definition for USE_DEVICE_SELECTION.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-09 12:09:56 +01:00
zdenop
d3065520fa fix 2 clang warnings 2018-12-30 20:25:24 +01:00
Stefan Weil
cb049133cd Fix compiler warning
clang warning:

    tesseractmain.cpp(512,21): warning: '&&' within '||' [-Wlogical-op-parentheses]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-29 22:17:33 +01:00
zdenop
420fb0ced0 Merge branch 'master' of https://github.com/tesseract-ocr/tesseract 2018-12-29 10:31:33 +01:00
zdenop
8885fe2ccb provide info about compiled openmp version 2018-12-29 10:18:27 +01:00
Stefan Weil
993e56ffde Don't try to create text output if other renderers failed (fix regression)
Commit 49d7df6dc3 added error handling,
but since that commit Tesseract used the text fallback if the user
selected output failed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-27 10:23:28 +01:00
zdenop
cc997b53c7 add missing the implementation for TessBaseAPIGetAltoText method in C-API 2018-12-26 21:35:47 +01:00
Stefan Weil
db9c7e0312 Use std::stringstream to generate hOCR output
Using std::stringstream simplifies the code and allows conversion of
double to string independant of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-16 20:14:11 +01:00
zdenop
72d8df581b
Merge pull request #2121 from stweil/hocr
Move code for hOCR renderer to new file
2018-12-16 16:26:27 +01:00
Stefan Weil
c7e8d30280 Fix value for PHYSICAL_IMG_NR in ALTO output
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-16 15:07:02 +01:00
Stefan Weil
457c53026d Fix indentation of hOCR output
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 17:51:59 +01:00
Stefan Weil
5de3fc47bb Format code in new file hocrrenderer.cpp
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 15:35:21 +01:00
Stefan Weil
48713f7df2 Move code for hOCR renderer to new file
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 15:33:47 +01:00
Stefan Weil
fbbbdb4565 Use std::stringstream to generate ALTO output and add <SP> element
Using std::stringstream simplifies the code.
The <SP> element is needed between two >String> elements.
Remove also some unneeded spaces in the ALTO output.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-12 22:29:35 +01:00
Stefan Weil
f0a4d04187 Add config variable for selection of dot product function
All also a C++ implementation with more aggressive compiler options
which is optimized for the CPU where the software was built.

It is now possible to select the function used for the dot product
with -c dotproduct=FUNCTION where FUNCTION can be one of those values:

* auto      selection based on detected hardware (default)
* generic   C++ code with default compiler options
* native    C++ code optimized for build host
* avx       optimized code for AVX
* sse       optimized code for SSE

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-01 00:19:28 +01:00
Stefan Weil
1910b1a72b SIMDDetect: Use tesseract namespace and format code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 23:36:39 +01:00
Stefan Weil
ed48b2a8f5 Format new ALTO code with clang-format
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 06:37:25 +01:00
Jake Sebright
d7cee03a94 Add support for ALTO output 2018-11-30 06:09:36 +01:00
Egor Pugin
685b136d89
Fix incorrect condition. 2018-11-29 19:02:54 +03:00
Zdenko Podobný
3d508a65a7 set unlv_tilde_crunching to false; fixes #1449 #948 2018-10-23 09:26:32 +02:00
Stefan Weil
be0cf03778 tesseractmain: Fix memory leak
Commit 49d7df6dc3 introduced a memory leak
when the output file could not be created.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-22 18:50:47 +02:00
Stefan Weil
d75ef80f12 Get sorted list of available languages
TessBaseAPI::GetAvailableLanguagesAsVector returned the list of languages
without sorting, so the result was random and not user friendly.

Now `tesseract --list-langs` shows the available languages and scripts
in alphabetic order.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-22 14:07:03 +02:00
Stefan Weil
e232114089 Fix use of undefined macro USE_DEVICE_SELECTION
This fixes compiler warnings.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-20 13:58:12 +02:00
Stefan Weil
d364750cb3 Remove type cast and fix compiler warning (-Wcast-qual)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-20 12:04:46 +02:00
Marco Atzeri
ebbd4e3efc fixes #426; define NOUNDEFINED for cygwin 2018-10-20 11:25:28 +02:00
Stefan Weil
bb181ec8d3 Rename API function from GetBestLSTMChoices to GetBestLSTMSymbolChoices
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-19 10:50:38 +02:00
Stefan Weil
df7d1e1f97 Rename API function for getting LSTM choices
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-19 10:50:38 +02:00
Stefan Weil
49d7df6dc3 tesseractmain: Show error message when output file could not be created
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-18 19:22:49 +02:00
Stefan Weil
b0b8dfbc81 TessResultRenderer: Extend API to access status of renderer
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-18 19:22:48 +02:00
Noah Metzger
c13371d6e0 Renamed GetGlyphConfidences() to GetChoices() and glyph_confidences to lstm_choice_mode
Renamed the global attribute glyph_confidences to lstm_choice_mode and the method GetGlyphConfidences() to GetChoices(). All Variables and comments contained in related methods were renamed as well.

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-10-17 16:43:39 +02:00
Stefan Weil
32e1e4b6b4 TessPDFRenderer: Remove unused member variable jpg_quality_ (CID 1396172)
This fixes a warning from Coverity Scan

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-16 16:53:23 +02:00
Stefan Weil
d89ec15571 Revert "Fix CID 1396172 (Uninitialized members)"
This reverts commit cbd09de7fe.
The variable can be removed as it is not used.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-16 16:53:23 +02:00
Zdenko Podobný
cbd09de7fe Fix CID 1396172 (Uninitialized members) 2018-10-16 12:24:10 +02:00
Stefan Weil
6ffb53f815 win32: Show TIFF errors on console
Showing them in a window (default) is not acceptable for a console
application like Tesseract which must be able to work in batch mode.

Such error messages can be triggered by TIFF files which include
vendor specific tags.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-13 20:42:14 +02:00
Stefan Weil
d86d520fd0 Remove tab character in source files
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-12 11:31:10 +02:00
zdenop
ca5d285a28 hocr: add ocrp_wconf to unconditional ocr-capabilities; fixes #1470 2018-10-09 16:34:50 +02:00
zdenop
956525f5a4 fix uninitialized variable, remove unused variable 2018-10-09 15:47:20 +02:00