Shree Devi Kumar
b51c1bf05a
change to const char* as suggested by @stweil
2019-02-10 05:13:18 +00:00
Shree Devi Kumar
0f42fd8c69
change to use bbox coordinates for TEXTLINE for all characters
...
(cherry picked from commit 049db108b2d6cd3a7f52e480212320613117d50b)
2019-02-05 14:03:29 +00:00
Shree Devi Kumar
9c89cd51cf
Add a new renderer to create box files from images for LSTM training
...
(cherry picked from commit 921da6be2bdbda2ddd64514f9b6bec40a336246a)
fix typo
(cherry picked from commit 7bd1a0c80393fce2f34e2845cb26760bcf3791cd)
Add lstmboxrenderer to CMakeLists
(cherry picked from commit cfef3a889aef830725921b5c0218d5e9c633b03e)
fix formatting
(cherry picked from commit 7ba2b01ede7940ed609a073364948ef8c838cd10)
2019-02-05 14:03:29 +00:00
Mikhail Akopov
7be04342cf
Fix typo
2019-02-01 09:58:44 +01:00
Stefan Weil
9e6e3a0232
Fix memory leak for PNG images
...
Commit 5fe1390748 used an implementation which created a new Pix object.
That object was never destroyed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-23 20:05:10 +01:00
Stefan Weil
7fc7d28dd0
Compile files for AVX, AVX2 or SSE only when needed
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-14 21:34:37 +01:00
zdenop
f75b2c1948
Merge pull request #310 from nickjwhite/hocrcharboxes
...
Character boxes in hOCR output
2019-01-14 19:19:04 +01:00
Nick White
ebbf907c56
Fix typo in hocr character box output
2019-01-13 16:28:31 +00:00
Nick White
4ce797b6f6
Fix hocr character box info to use new hocr renderer correctly
2019-01-13 13:01:14 +00:00
Nick White
c43e4501e3
Merge remote-tracking branch 'origin/master' into hocrcharboxes
2019-01-13 12:41:42 +00:00
zdenop
238cb219d5
Merge pull request #2152 from stweil/clean
...
Remove opencl_device_selection.h
2019-01-09 15:02:59 +01:00
Stefan Weil
a0e6586e63
Fix documentation for page segmentation mode 2
...
It never worked, so add a comment that the implementation is missing.
Also add a to-do comment.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-09 13:51:44 +01:00
Stefan Weil
0fae848b58
OpenCL: Add comments to users of openclwrapper.h
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-09 12:11:00 +01:00
Stefan Weil
e0fc4f2945
Remove opencl_device_selection.h
...
Always use OpenCL device selection if OpenCL is enabled.
This fixes a regression which was introduced by commit 5c6a57b727,
which removed the definition for USE_DEVICE_SELECTION.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-09 12:09:56 +01:00
zdenop
d3065520fa
fix 2 clang warnings
2018-12-30 20:25:24 +01:00
Stefan Weil
cb049133cd
Fix compiler warning
...
clang warning:
tesseractmain.cpp(512,21): warning: '&&' within '||' [-Wlogical-op-parentheses]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-29 22:17:33 +01:00
zdenop
420fb0ced0
Merge branch 'master' of https://github.com/tesseract-ocr/tesseract
2018-12-29 10:31:33 +01:00
zdenop
8885fe2ccb
provide info about compiled openmp version
2018-12-29 10:18:27 +01:00
Stefan Weil
993e56ffde
Don't try to create text output if other renderers failed (fix regression)
...
Commit 49d7df6dc3 added error handling,
but since that commit Tesseract used the text fallback if the
user-selected output failed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-27 10:23:28 +01:00
zdenop
cc997b53c7
add the missing implementation for TessBaseAPIGetAltoText method in C-API
2018-12-26 21:35:47 +01:00
Stefan Weil
db9c7e0312
Use std::stringstream to generate hOCR output
...
Using std::stringstream simplifies the code and allows conversion of
double to string independent of the current locale setting.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-16 20:14:11 +01:00
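The locale point in the commit above can be illustrated with a minimal sketch. It is not taken from the Tesseract sources: `FormatDouble` is a hypothetical helper, and imbuing the stream with the classic "C" locale is an assumption about one way to make the conversion independent of the global locale.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Tesseract): converts a double to a string
// with a fixed number of decimals. The stream is imbued with the classic "C"
// locale, so the decimal separator is '.' regardless of the global locale.
std::string FormatDouble(double value, int precision) {
  std::stringstream stream;
  stream.imbue(std::locale::classic());  // do not follow the global locale
  stream.precision(precision);
  stream << std::fixed << value;
  return stream.str();
}
```

With a printf-style conversion the separator can follow the C locale set by the application, which is a problem for machine-readable formats such as hOCR; the stream above carries its own locale instead.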
zdenop
72d8df581b
Merge pull request #2121 from stweil/hocr
...
Move code for hOCR renderer to new file
2018-12-16 16:26:27 +01:00
Stefan Weil
c7e8d30280
Fix value for PHYSICAL_IMG_NR in ALTO output
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-16 15:07:02 +01:00
Stefan Weil
457c53026d
Fix indentation of hOCR output
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 17:51:59 +01:00
Stefan Weil
5de3fc47bb
Format code in new file hocrrenderer.cpp
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 15:35:21 +01:00
Stefan Weil
48713f7df2
Move code for hOCR renderer to new file
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 15:33:47 +01:00
Stefan Weil
fbbbdb4565
Use std::stringstream to generate ALTO output and add <SP> element
...
Using std::stringstream simplifies the code.
The <SP> element is needed between two <String> elements.
Remove also some unneeded spaces in the ALTO output.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-12 22:29:35 +01:00
Stefan Weil
f0a4d04187
Add config variable for selection of dot product function
...
Add also a C++ implementation with more aggressive compiler options
which is optimized for the CPU where the software was built.
It is now possible to select the function used for the dot product
with -c dotproduct=FUNCTION where FUNCTION can be one of those values:
* auto: selection based on detected hardware (default)
* generic: C++ code with default compiler options
* native: C++ code optimized for build host
* avx: optimized code for AVX
* sse: optimized code for SSE
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-01 00:19:28 +01:00
Stefan Weil
1910b1a72b
SIMDDetect: Use tesseract namespace and format code
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 23:36:39 +01:00
Stefan Weil
ed48b2a8f5
Format new ALTO code with clang-format
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-11-30 06:37:25 +01:00
Jake Sebright
d7cee03a94
Add support for ALTO output
2018-11-30 06:09:36 +01:00
Egor Pugin
685b136d89
Fix incorrect condition.
2018-11-29 19:02:54 +03:00
Zdenko Podobný
3d508a65a7
set unlv_tilde_crunching to false; fixes #1449 #948
2018-10-23 09:26:32 +02:00
Stefan Weil
be0cf03778
tesseractmain: Fix memory leak
...
Commit 49d7df6dc3 introduced a memory leak
when the output file could not be created.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-22 18:50:47 +02:00
Stefan Weil
d75ef80f12
Get sorted list of available languages
...
TessBaseAPI::GetAvailableLanguagesAsVector returned the list of languages
without sorting, so the result was random and not user friendly.
Now `tesseract --list-langs` shows the available languages and scripts
in alphabetic order.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-22 14:07:03 +02:00
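The change described in the commit above amounts to sorting the collected names before they are returned. A minimal sketch, assuming the names are already gathered in a `std::vector<std::string>`; `SortedLanguages` is a hypothetical function and does not reproduce the actual TessBaseAPI code or its container type.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in (not the TessBaseAPI implementation): takes the names
// in whatever order the directory scan produced them and returns them in
// ascending byte-wise order, which is alphabetic for plain ASCII names.
std::vector<std::string> SortedLanguages(std::vector<std::string> langs) {
  std::sort(langs.begin(), langs.end());
  return langs;
}
```

Sorting once at the point where the list is built keeps every caller, including a listing such as `tesseract --list-langs`, on the same stable order.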
Stefan Weil
e232114089
Fix use of undefined macro USE_DEVICE_SELECTION
...
This fixes compiler warnings.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-20 13:58:12 +02:00
Stefan Weil
d364750cb3
Remove type cast and fix compiler warning (-Wcast-qual)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-20 12:04:46 +02:00
Marco Atzeri
ebbd4e3efc
fixes #426 ; define NOUNDEFINED for cygwin
2018-10-20 11:25:28 +02:00
Stefan Weil
bb181ec8d3
Rename API function from GetBestLSTMChoices to GetBestLSTMSymbolChoices
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-19 10:50:38 +02:00
Stefan Weil
df7d1e1f97
Rename API function for getting LSTM choices
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-19 10:50:38 +02:00
Stefan Weil
49d7df6dc3
tesseractmain: Show error message when output file could not be created
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-18 19:22:49 +02:00
Stefan Weil
b0b8dfbc81
TessResultRenderer: Extend API to access status of renderer
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-18 19:22:48 +02:00
Noah Metzger
c13371d6e0
Renamed GetGlyphConfidences() to GetChoices() and glyph_confidences to lstm_choice_mode
...
Renamed the global attribute glyph_confidences to lstm_choice_mode and the method GetGlyphConfidences() to GetChoices(). All Variables and comments contained in related methods were renamed as well.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-10-17 16:43:39 +02:00
Stefan Weil
32e1e4b6b4
TessPDFRenderer: Remove unused member variable jpg_quality_ (CID 1396172)
...
This fixes a warning from Coverity Scan
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-16 16:53:23 +02:00
Stefan Weil
d89ec15571
Revert "Fix CID 1396172 (Uninitialized members)"
...
This reverts commit cbd09de7fe.
The variable can be removed as it is not used.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-16 16:53:23 +02:00
Zdenko Podobný
cbd09de7fe
Fix CID 1396172 (Uninitialized members)
2018-10-16 12:24:10 +02:00
Stefan Weil
6ffb53f815
win32: Show TIFF errors on console
...
Showing them in a window (default) is not acceptable for a console
application like Tesseract which must be able to work in batch mode.
Such error messages can be triggered by TIFF files which include
vendor specific tags.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-13 20:42:14 +02:00
Stefan Weil
d86d520fd0
Remove tab character in source files
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-12 11:31:10 +02:00
zdenop
ca5d285a28
hocr: add ocrp_wconf to unconditional ocr-capabilities; fixes #1470
2018-10-09 16:34:50 +02:00
zdenop
956525f5a4
fix uninitialized variable, remove unused variable
2018-10-09 15:47:20 +02:00