Stefan Weil
d8d63fd71b
Optimize performance with clang-tidy
The code was partially formatted with clang-format and optimized with
clang-tidy --checks="-*,perfor*" --fix src/*/*.cpp
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-14 15:54:04 +01:00
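The `performance-*` family of clang-tidy checks flags avoidable copies, such as large objects passed by value or containers iterated by value. A minimal before/after sketch of the kind of rewrite that `performance-unnecessary-value-param` and `performance-for-range-copy` apply with `--fix`. The function names are hypothetical and not taken from the commit:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Before: the vector is copied on every call, and each string is copied again
// inside the loop. Both copies are flagged by the performance-* checks.
std::size_t TotalLengthBefore(std::vector<std::string> words) {
  std::size_t total = 0;
  for (std::string w : words) {
    total += w.size();
  }
  return total;
}

// After: pass and iterate by const reference, so nothing is copied.
std::size_t TotalLengthAfter(const std::vector<std::string> &words) {
  std::size_t total = 0;
  for (const std::string &w : words) {
    total += w.size();
  }
  return total;
}
```

Both versions return the same result; only the copying behaviour differs.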
Merlijn Wajer
ca177e72f3
hocrrenderer: write scan_res property to the ocr_page
This will make Tesseract emit the DPI of the document, if known at OCR
time. This is required to properly interpret the x_fsize (font size)
property of words, since Tesseract scales the font size to the DPI.
See issue #3326 (https://github.com/tesseract-ocr/tesseract/issues/3326)
2021-09-21 11:02:52 +02:00
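The hOCR `scan_res` property records the horizontal and vertical scan resolution in DPI. A minimal sketch of building an `ocr_page` title that includes it, and of using the DPI to turn a font size into pixels. This assumes `x_fsize` is given in points (1/72 inch); `PageTitle` and `PointsToPixels` are hypothetical helpers, and the exact property order is illustrative rather than Tesseract's output:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: build the title attribute of an hOCR ocr_page element.
// scan_res is written as "scan_res <x_dpi> <y_dpi>".
std::string PageTitle(int width, int height, int page, int dpi) {
  std::stringstream ss;
  ss << "bbox 0 0 " << width << ' ' << height << "; ppageno " << page;
  if (dpi > 0) {  // only emit scan_res when the resolution is known
    ss << "; scan_res " << dpi << ' ' << dpi;
  }
  return ss.str();
}

// Assumption: font size in points (1/72 inch), converted to pixels at `dpi`.
double PointsToPixels(double points, int dpi) {
  return points * dpi / 72.0;
}
```

For example, a 12 pt `x_fsize` on a 300 DPI page corresponds to 50 pixels, which is why a consumer needs the DPI to interpret the value.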
Ger Hobbelt
444fe14273
Fix a couple of 'shadowed local variables' compiler warnings
These fixes slipped through while I manually extracted the template work
from my mainline (the warnings come from running MSVC at warning level 4).
[sw]: Format commit message and use a different fix for blamer.cpp
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:49:03 +02:00
Stefan Weil
897e59613d
Clean code for hOCR renderer
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-06 16:36:23 +02:00
Stefan Weil
d4d51910e1
Add braces to single line statements (clang-tidy -checks='-*,google-readability-braces-around-statements')
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-22 09:02:13 +01:00
Egor Pugin
0eb7ba88bf
[clang-format] Execute clang format on include and src dirs.
Script:
find include src -type f | sort > all.txt
find include src -type f | grep -v "\.cpp" | grep -v "\.h" | sort > skip.txt
comm -23 all.txt skip.txt | xargs clang-format -i
2021-03-12 22:35:02 +03:00
Stefan Weil
ea446b1eae
Remove blanks at line endings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-02-26 14:05:36 +01:00
Stefan Weil
fec9c11c8c
Use std::vector, std::string in baseapi.h
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Robin Watts
27d513462c
Avoid using PACKAGE_VERSION in favour of TESSERACT_VERSION_STR.
This means the sources compile perfectly in the absence of
config_auto.h/HAVE_CONFIG_H as they were intended to do.
TESSERACT_VERSION_STR is set to be precisely PACKAGE_VERSION
by autoconf, so there are no actual changes in compiled code.
2020-05-12 21:45:12 +02:00
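Because `TESSERACT_VERSION_STR` is presumably provided by a regular Tesseract header rather than by the autoconf-generated `config_auto.h`, code that embeds the version string compiles without `HAVE_CONFIG_H`. A minimal sketch; the fallback definition below is a placeholder so the snippet compiles on its own, and `OcrSystemMeta` is a hypothetical helper:

```cpp
#include <string>

// Placeholder so this sketch compiles standalone. In a real build the macro
// would come from Tesseract's version header, not from config_auto.h.
#ifndef TESSERACT_VERSION_STR
#define TESSERACT_VERSION_STR "0.0.0-placeholder"
#endif

// Adjacent string literals are concatenated at compile time, so the version
// is baked into the literal without any runtime formatting.
std::string OcrSystemMeta() {
  return "<meta name='ocr-system' content='tesseract " TESSERACT_VERSION_STR
         "' />";
}
```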
Egor Pugin
2a37f5dd62
Update includes to use <>.
2019-10-29 14:50:11 +03:00
amitdo
e1bae15547
Fix #include path of public headers
2019-10-28 19:10:30 +02:00
Stefan Weil
994ec697d8
Remove member functions STRING::string and StringParam::string
They were redundant because the existing member functions 'c_str' do the same.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-23 08:33:08 +02:00
Noah Metzger
c350077b96
Made the lstm_choice mode compatible with the hocr_char_boxes mode
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-09-02 11:09:54 +02:00
Noah Metzger
e8b9c10d07
Clean up lstm_choice_mode and cut it down to 2 modes instead of 4
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-09-02 11:09:53 +02:00
Stefan Weil
e6ca7f3ec6
hocrrenderer: Add missing escaping of special characters in HTML output
This converts special characters like '<' or '>' to the
correct HTML entities.
Also optimize the code a little.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-19 13:53:36 +02:00
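Escaping replaces each character that has special meaning in HTML/XML with an entity reference, so recognized text such as `a<b` cannot break the markup. A minimal sketch; `HtmlEscape` is a hypothetical name and not necessarily the helper Tesseract uses:

```cpp
#include <string>

// Replace characters with special meaning in HTML/XML by entity references.
std::string HtmlEscape(const std::string &text) {
  std::string out;
  out.reserve(text.size());  // at least as long as the input
  for (char c : text) {
    switch (c) {
      case '<': out += "&lt;"; break;
      case '>': out += "&gt;"; break;
      case '&': out += "&amp;"; break;
      case '"': out += "&quot;"; break;
      case '\'': out += "&#39;"; break;
      default: out += c;
    }
  }
  return out;
}
```

Escaping `&` is as important as `<` and `>`, since an unescaped ampersand starts an entity reference.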
Stefan Weil
4cb3f34c09
Improve formatting of hOCR output with character boxes
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-18 11:07:18 +02:00
Noah Metzger
2dd5d0d60a
Fixed a bug when the first decode iteration stays empty and added some comments.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-07-15 10:05:22 +02:00
Noah Metzger
11a4cd298b
Added parameters for the LSTM CTC Choice mode
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-07-10 16:34:41 +02:00
Noah Metzger
f2d685a90f
Added CTC-based Symbolchoices.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-07-10 16:34:41 +02:00
Nick White
068eb4c35d
Add different classes to hocr output depending on BlockType
These classes are taken from the hOCR specification, and seem
to map well onto the BlockType types. There are probably more that
could be added.
2019-05-14 13:25:08 +01:00
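The idea is a switch from the block type of the enclosing block to an hOCR class name for each text line. A minimal sketch; `BlockKind` is an illustrative stand-in, not Tesseract's block-type enum, and the chosen mapping is an assumption. The class names `ocr_header`, `ocr_textfloat`, `ocr_caption` and `ocr_line` are defined by the hOCR specification:

```cpp
#include <string>

// Illustrative stand-in for Tesseract's block types; not the real enum.
enum class BlockKind { kFlowingText, kHeadingText, kPulloutText, kCaptionText };

// Choose an hOCR class for a text line based on the kind of block it is in.
const char *LineClassFor(BlockKind kind) {
  switch (kind) {
    case BlockKind::kHeadingText: return "ocr_header";
    case BlockKind::kPulloutText: return "ocr_textfloat";
    case BlockKind::kCaptionText: return "ocr_caption";
    default: return "ocr_line";  // ordinary flowing text
  }
}
```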
Stefan Weil
708511adcb
Only include windows.h using host.h
host.h sets the macros NOMINMAX and WIN32_LEAN_AND_MEAN which must be
set before including windows.h.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-22 21:51:07 +02:00
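The ordering matters because `<windows.h>` checks these macros while it is being processed: `NOMINMAX` stops it from defining `min` and `max` macros, and `WIN32_LEAN_AND_MEAN` reduces the set of headers it pulls in. A minimal sketch of the include pattern that routing everything through one header enforces; the actual contents of host.h are not reproduced here:

```cpp
// The macros must be defined before the first inclusion of <windows.h>;
// defining them afterwards has no effect.
#ifdef _WIN32
#ifndef NOMINMAX
#define NOMINMAX  // keep windows.h from defining min/max macros
#endif
#ifndef WIN32_LEAN_AND_MEAN
#define WIN32_LEAN_AND_MEAN  // include fewer Windows headers
#endif
#include <windows.h>
#endif

#include <algorithm>

// Without NOMINMAX, a max macro from windows.h would break this call.
int LargerOf(int a, int b) {
  return std::max(a, b);
}
```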
Stefan Weil
20d5eedd45
Modernize code (clang-tidy check modernize-loop-convert)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-05 08:29:00 +02:00
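The `modernize-loop-convert` check rewrites index-based and iterator-based loops over containers into range-based `for` loops. A minimal before/after sketch with hypothetical function names, not code from the commit:

```cpp
#include <cstddef>
#include <vector>

// Before: an index-based loop, the pattern modernize-loop-convert looks for.
int SumBefore(const std::vector<int> &values) {
  int sum = 0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    sum += values[i];
  }
  return sum;
}

// After: roughly the range-based loop the check suggests.
int SumAfter(const std::vector<int> &values) {
  int sum = 0;
  for (int value : values) {
    sum += value;
  }
  return sum;
}
```

The range-based form removes the index variable and the signed/unsigned comparison it often invites, without changing behaviour.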
Stefan Weil
81fbd878dd
Add more missing include statements for Windows build
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-01 08:10:25 +02:00
Noah Metzger
5b3e2fe812
Integrated accumulated Symbol Choice in the Choice Iterator and made the API independent of lstm_choice_mode
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-03-12 09:15:10 +01:00
Noah Metzger
754e38d2b4
Added the option to get the timesteps separated by the suggested segmentation
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-03-11 10:50:56 +01:00
Shree Devi Kumar
9c89cd51cf
Add a new renderer to create box files from images for LSTM training
(cherry picked from commit 921da6be2bdbda2ddd64514f9b6bec40a336246a)
fix typo
(cherry picked from commit 7bd1a0c80393fce2f34e2845cb26760bcf3791cd)
Add lstmboxrenderer to CMakeLists
(cherry picked from commit cfef3a889aef830725921b5c0218d5e9c633b03e)
fix formatting
(cherry picked from commit 7ba2b01ede7940ed609a073364948ef8c838cd10)
2019-02-05 14:03:29 +00:00
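A minimal sketch of formatting one box-file line. This assumes the conventional Tesseract box layout `<symbol> <left> <bottom> <right> <top> <page>` with coordinates measured from the bottom-left corner of the image, and assumes that a line whose symbol is a tab marks the end of a text line for LSTM training. `Box` and `BoxLine` are hypothetical names, not the renderer's API:

```cpp
#include <sstream>
#include <string>

// Hypothetical bounding box in image coordinates (origin at the top-left).
struct Box {
  int left, top, right, bottom;
};

// Format one box-file line. Box files use a bottom-left origin, so the y
// coordinates are flipped using the image height.
std::string BoxLine(const std::string &symbol, const Box &b, int image_height,
                    int page) {
  std::ostringstream ss;
  ss << symbol << ' ' << b.left << ' ' << (image_height - b.bottom) << ' '
     << b.right << ' ' << (image_height - b.top) << ' ' << page << '\n';
  return ss.str();
}
```

Under the same assumptions, the end-of-line marker is written by passing `"\t"` as the symbol together with the bounding box of the whole text line.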
Nick White
ebbf907c56
Fix typo in hocr character box output
2019-01-13 16:28:31 +00:00
Nick White
4ce797b6f6
Fix hocr character box info to use new hocr renderer correctly
2019-01-13 13:01:14 +00:00
Nick White
c43e4501e3
Merge remote-tracking branch 'origin/master' into hocrcharboxes
2019-01-13 12:41:42 +00:00
Stefan Weil
db9c7e0312
Use std::stringstream to generate hOCR output
Using std::stringstream simplifies the code and allows conversion of
double to string independent of the current locale setting.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-16 20:14:11 +01:00
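A C++ stream formats numbers with its own imbued `std::locale`, so unlike `printf` it is not affected by `setlocale()`; imbuing `std::locale::classic()` also guards against a changed global C++ locale and guarantees a `.` decimal separator. A minimal sketch; `BaselineProperty` is a hypothetical helper, and the fixed three-digit precision is an arbitrary choice for illustration:

```cpp
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Format a baseline property with a '.' decimal separator regardless of the
// global locale, by imbuing the stream with the classic "C" locale.
std::string BaselineProperty(double slope, int offset) {
  std::stringstream ss;
  ss.imbue(std::locale::classic());
  ss << "baseline " << std::fixed << std::setprecision(3) << slope << ' '
     << offset;  // std::fixed only affects floating-point values
  return ss.str();
}
```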
Stefan Weil
457c53026d
Fix indentation of hOCR output
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 17:51:59 +01:00
Stefan Weil
5de3fc47bb
Format code in new file hocrrenderer.cpp
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 15:35:21 +01:00
Stefan Weil
48713f7df2
Move code for hOCR renderer to new file
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-12-15 15:33:47 +01:00