Commit Graph

33 Commits

Author SHA1 Message Date
Stefan Weil
d8d63fd71b Optimize performance with clang-tidy
The code was partially formatted with clang-format and optimized with

    clang-tidy --checks="-*,perfor*" --fix src/*/*.cpp

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-14 15:54:04 +01:00
Merlijn Wajer
ca177e72f3 hocrrenderer: write scan_res property to the ocr_page
This will make Tesseract emit the DPI of the document, if known at OCR
time. This is requird to properly interpret the x_fsize (font size)
property of words, since Tesseract scales the font size to the DPI.

See issue #3326 (https://github.com/tesseract-ocr/tesseract/issues/3326)
2021-09-21 11:02:52 +02:00
Ger Hobbelt
444fe14273 Fix a couple of 'shadowed local variables' compiler warnings
These fixes got through while I manually extracted the template work
from my mainline (warnings due to running MSVC at Level 4)

[sw]: Format commit message and use different fix for blamer.cpp

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:49:03 +02:00
Stefan Weil
897e59613d Clean code for hOCR renderer
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-06 16:36:23 +02:00
Stefan Weil
d4d51910e1 Add braces to single line statements (clang-tidy -checks='-*,google-readability-braces-around-statements')
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-22 09:02:13 +01:00
Egor Pugin
0eb7ba88bf [clang-format] Execute clang format on include and src dirs.
Script:
find include src -type f | sort > all.txt
find include src -type f | grep -v "\.cpp" | grep -v "\.h" | sort > skip.txt
comm -23 all.txt skip.txt | xargs clang-format -i
2021-03-12 22:35:02 +03:00
Stefan Weil
ea446b1eae Remove blanks at line endings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-02-26 14:05:36 +01:00
Stefan Weil
fec9c11c8c Use std::vector, std::string in baseapi.h
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Robin Watts
27d513462c Avoid using PACKAGE_VERSION in favour of TESSERACT_VERSION_STR.
This means the sources compile perfectly in the absence of
config_auto.h/HAVE_CONFIG_H as they were intended to do.

TESSERACT_VERSION_STR is set to be precisely PACKAGE_VERSION
by autoconf, so there are no actual changes in compiled code.
2020-05-12 21:45:12 +02:00
Egor Pugin
2a37f5dd62 Update includes to use <>. 2019-10-29 14:50:11 +03:00
amitdo
e1bae15547 Fix #include path of public headers 2019-10-28 19:10:30 +02:00
Stefan Weil
994ec697d8 Remove member functions STRING::string and StringParam::string
They were redundant because there exist member functions 'c_str' which do the same.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-23 08:33:08 +02:00
Noah Metzger
c350077b96 Made the lstm_choice mode compatible with the hocr_char_boxes mode
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-09-02 11:09:54 +02:00
Noah Metzger
e8b9c10d07 Clean up lstm_choice_mode and cut it down to 2 modes instead of 4
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-09-02 11:09:53 +02:00
Stefan Weil
e6ca7f3ec6 hocrrenderer: Add missing escaping of special characters in HTML output
This converts special character like '<' or '>' to the
correct HTML entities.

Optimize also the code a little bit.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-19 13:53:36 +02:00
Stefan Weil
4cb3f34c09 Improve formatting of hOCR output with character boxes
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-18 11:07:18 +02:00
Noah Metzger
2dd5d0d60a Fixed a bug when first decode iteration stays empty and added some comments.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-07-15 10:05:22 +02:00
Noah Metzger
11a4cd298b Added parameters for the LSTM CTC Choice mode
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-07-10 16:34:41 +02:00
Noah Metzger
f2d685a90f Added CTC-based Symbolchoices.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-07-10 16:34:41 +02:00
Nick White
068eb4c35d Add different classes to hocr output depending on BlockType
These classes are taken from the hOCR specification, and seem
to map well onto the BlockType types. There are probably more that
could be added.
2019-05-14 13:25:08 +01:00
Stefan Weil
708511adcb Only include windows.h using host.h
host.h sets the macros NOMINMAX and WIN32_LEAN_AND_MEAN which must be
set before including windows.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-22 21:51:07 +02:00
Stefan Weil
20d5eedd45 Modernize code (clang-tidy check modernize-loop-convert)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-05 08:29:00 +02:00
Stefan Weil
81fbd878dd Add more missing include statements for Windows build
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-01 08:10:25 +02:00
Noah Metzger
5b3e2fe812 Integrated accumulated Symbol Choice in the Choice Iterator and made the api lstm_choice_mode independent
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-03-12 09:15:10 +01:00
Noah Metzger
754e38d2b4 Added the option to get the timesteps separated by the suggested segmentation
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-03-11 10:50:56 +01:00
Shree Devi Kumar
9c89cd51cf Add a new renderer to create box files from images for LSTM training
(cherry picked from commit 921da6be2bdbda2ddd64514f9b6bec40a336246a)

fix typo

(cherry picked from commit 7bd1a0c80393fce2f34e2845cb26760bcf3791cd)

Add lstmboxrenderer to CMakeLists

(cherry picked from commit cfef3a889aef830725921b5c0218d5e9c633b03e)

fix formatting

(cherry picked from commit 7ba2b01ede7940ed609a073364948ef8c838cd10)
2019-02-05 14:03:29 +00:00
Nick White
ebbf907c56 Fix typo in hocr character box output 2019-01-13 16:28:31 +00:00
Nick White
4ce797b6f6 Fix hocr character box info to use new hocr renderer correctly 2019-01-13 13:01:14 +00:00
Nick White
c43e4501e3 Merge remote-tracking branch 'origin/master' into hocrcharboxes 2019-01-13 12:41:42 +00:00
Stefan Weil
db9c7e0312 Use std::stringstream to generate hOCR output
Using std::stringstream simplifies the code and allows conversion of
double to string independant of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-16 20:14:11 +01:00
Stefan Weil
457c53026d Fix indentation of hOCR output
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 17:51:59 +01:00
Stefan Weil
5de3fc47bb Format code in new file hocrrenderer.cpp
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 15:35:21 +01:00
Stefan Weil
48713f7df2 Move code for hOCR renderer to new file
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 15:33:47 +01:00