Commit Graph

53 Commits

Author SHA1 Message Date
Stefan Weil
d8d63fd71b Optimize performance with clang-tidy
The code was partially formatted with clang-format and optimized with

    clang-tidy --checks="-*,perfor*" --fix src/*/*.cpp

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-14 15:54:04 +01:00
Stefan Weil
988102c41d Disable incomplete code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:11:57 +02:00
Stefan Weil
842cca1d49 Fix more signed/unsigned compiler warnings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:11:57 +02:00
Stefan Weil
3bb8263b3e lstm: Fix some signed/unsigned compiler warnings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-10 20:00:31 +02:00
Ger Hobbelt
444fe14273 Fix a couple of 'shadowed local variables' compiler warnings
These fixes got through while I manually extracted the template work
from my mainline (warnings due to running MSVC at Level 4)

[sw]: Format commit message and use different fix for blamer.cpp

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-07-20 20:49:03 +02:00
Stefan Weil
a701454ae5 Fix vector resize with init for all elements (issue #3473) (#3474)
Fixes: c8b8d266d6
Fixes: 9710bc0465
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-06-29 21:05:29 +03:00
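
The commit message above does not say how the resize was fixed. As background, `std::vector::resize(n, value)` applies `value` only to newly appended elements, while existing elements keep their old contents. A minimal sketch of that difference follows; `resize_and_fill` and `check_resize_semantics` are hypothetical names, not taken from the Tesseract sources:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the Tesseract sources): give `v` exactly
// `n` elements, all equal to `value`, regardless of its previous contents.
template <typename T>
void resize_and_fill(std::vector<T>& v, std::size_t n, const T& value) {
  v.assign(n, value);
}

// Checks both behaviours side by side and returns true if they hold.
bool check_resize_semantics() {
  std::vector<int> a{1, 2};
  a.resize(4, 0);  // only the two appended elements are set to 0
  assert((a == std::vector<int>{1, 2, 0, 0}));

  std::vector<int> b{1, 2};
  resize_and_fill(b, 4, 0);  // every element is set to 0
  assert((b == std::vector<int>{0, 0, 0, 0}));
  return true;
}
```
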
Stefan Weil
77ed2886a7 Modernize code (clang-tidy -checks='-*,modernize-loop-convert')
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-22 09:02:51 +01:00
Stefan Weil
d4d51910e1 Add braces to single line statements (clang-tidy -checks='-*,google-readability-braces-around-statements')
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-22 09:02:13 +01:00
Stefan Weil
27293fad62 Modernize code (clang-tidy -checks='-*,modernize-use-emplace')
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-21 21:45:55 +01:00
Stefan Weil
35e143ddfc Modernize code (clang-tidy -checks='-*,modernize-use-auto')
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-21 21:45:55 +01:00
Stefan Weil
02774bda6e Modernize code (clang-tidy -checks='-*,modernize-loop-convert')
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-21 21:45:55 +01:00
Stefan Weil
0c20d3f843 Fix compiler warnings (mostly -Wsign-compare)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-21 09:29:34 +01:00
Stefan Weil
54aec32586 Replace remaining PointerVector by std::vector for src/lstm
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-19 22:22:04 +01:00
Stefan Weil
2a3682a35e Replace remaining GenericVector by std::vector in src/lstm
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-16 12:25:11 +01:00
Egor Pugin
0eb7ba88bf [clang-format] Execute clang format on include and src dirs.
Script:
find include src -type f | sort > all.txt
find include src -type f | grep -v "\.cpp" | grep -v "\.h" | sort > skip.txt
comm -23 all.txt skip.txt | xargs clang-format -i
2021-03-12 22:35:02 +03:00
Stefan Weil
9b15e65900 Replace resize(0) by clear() for std::vector
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-12 19:24:54 +01:00
Egor Pugin
9710bc0465 More std::vector.
2021-01-07 13:57:57 +03:00
Egor Pugin
4175679da6 Revert kdpair, genericheap changes.
2020-12-28 02:31:45 +03:00
Egor Pugin
4fc467a922 Inherit GenericVector from std::vector. Inherit kdpairs from std::pair. Rewrite some move ctors to modern C++ style.
2020-12-26 03:23:09 +03:00
Stefan Weil
51dff483e7 Fix runtime error caused by too large TBOX
Runtime error reported by sanitizer:

    src/ccstruct/rect.h:191:44: runtime error: 50961 is outside the range of representable values of type 'short'
    SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior src/ccstruct/rect.h:191:44 in

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-06-30 20:51:52 +02:00
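
The sanitizer report above reads like an out-of-range conversion from a floating-point value to `short`, which is undefined behavior in C++. The commit message does not describe the actual fix. One common way to avoid this class of error is to clamp before narrowing; the sketch below assumes that approach, and `clip_to_int16` is a hypothetical name:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (an assumption, not the fix used in the commit):
// convert a floating-point coordinate to int16_t without leaving the
// representable range. Values inside the range are truncated toward zero.
std::int16_t clip_to_int16(double value) {
  const std::int16_t lo = std::numeric_limits<std::int16_t>::min();
  const std::int16_t hi = std::numeric_limits<std::int16_t>::max();
  if (!(value > lo)) {
    return lo;  // too small; this branch also catches NaN
  }
  if (value > hi) {
    return hi;  // too large, e.g. 50961
  }
  return static_cast<std::int16_t>(value);
}
```
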
Stefan Weil
994ec697d8 Remove member functions STRING::string and StringParam::string
They were redundant because there exist member functions 'c_str' which do the same.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-23 08:33:08 +02:00
Stefan Weil
5fdd32bea8 Fix CID 1366450 (Uninitialized scalar field) for class RecodeBeamSearch
secondary_beam_size_ is set but never used, so remove it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-13 22:09:03 +02:00
Noah Metzger
c350077b96 Made the lstm_choice mode compatible with the hocr_char_boxes mode
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-09-02 11:09:54 +02:00
Noah Metzger
e8b9c10d07 Clean up lstm_choice_mode and cut it down to 2 modes instead of 4
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-09-02 11:09:53 +02:00
Noah Metzger
3a5e508934 Implemented improved bounding box algorithm
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-07-16 11:38:50 +02:00
zdenop
56d4fdce00 Merge pull request #2554 from noahmetzger/LSTMChoiceRIL
Improved lstm_choice_mode
2019-07-15 11:51:52 +02:00
Noah Metzger
2dd5d0d60a Fixed a bug when first decode iteration stays empty and added some comments.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-07-15 10:05:22 +02:00
Noah Metzger
f2d685a90f Added CTC-based Symbolchoices.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-07-10 16:34:41 +02:00
Stefan Weil
ee04347347 Fix format string for 64 bit integer (CID 1402986)
Commit c1264c189e was not the right fix.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-10 16:20:50 +02:00
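
Neither this commit message nor the one for c1264c189e shows the format string that was finally used. As general background, a portable way to print a 64 bit integer with the printf family is the `PRId64` macro from `<cinttypes>`. A minimal sketch, with `format_i64` as a hypothetical helper name:

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format a 64 bit signed integer portably.
// PRId64 expands to the correct length modifier for the platform
// (for example "ld" or "lld"), so no cast or guess is needed.
std::string format_i64(std::int64_t value) {
  char buf[32];  // large enough for any int64_t plus sign and terminator
  std::snprintf(buf, sizeof(buf), "%" PRId64, value);
  return buf;
}
```
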
Stefan Weil
c1264c189e Fix format string for 64 bit integer
This fixes also a warning from gcc.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-23 09:31:09 +02:00
Stefan Weil
df98bb7368 Move LSTMTrainer from libtesseract to libtesseract_training
LSTMTrainer is only used for training, so the shared library for
Tesseract can be made smaller.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-22 16:23:51 +02:00
Stefan Weil
4bec4a69a0 Add missing static attributes
This fixes lots of compiler warnings like these ones:

    src/api/baseapi.cpp:113:13: warning: no previous extern declaration for non-static variable 'kInputFile' [-Wmissing-variable-declarations]
    src/api/baseapi.cpp:117:13: warning: no previous extern declaration for non-static variable 'kOldVarsFile' [-Wmissing-variable-declarations]
    src/api/baseapi.cpp:97:10: warning: no previous extern declaration for non-static variable 'stream_filelist' [-Wmissing-variable-declarations]
    src/ccmain/equationdetect.cpp:46:10: warning: no previous extern declaration for non-static variable 'equationdetect_save_bi_image' [-Wmissing-variable-declarations]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-26 08:53:09 +02:00
zdenop
ab09b09da6 Merge pull request #2294 from bertsky/lstm-with-char-whitelist
trying to add tessedit_char_whitelist etc. again:
2019-04-06 14:41:30 +02:00
Stefan Weil
20d5eedd45 Modernize code (clang-tidy check modernize-loop-convert)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-05 08:29:00 +02:00
Stefan Weil
a0fd90583b Modernize C++ code using auto
The modifications were done using this command:

    run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-auto' -fix

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 07:55:08 +01:00
Stefan Weil
631882a346 Fix compiler warnings (signed / unsigned mismatch)
clang warnings:

    src/ccutil/unicharcompress.cpp:172:27: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    src/lstm/recodebeam.cpp:129:29: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare]
    src/lstm/recodebeam.cpp:276:48: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare]
    unittest/imagedata_test.cc:101:21: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    unittest/linlsq_test.cc:33:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    unittest/linlsq_test.cc:44:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    unittest/nthitem_test.cc:27:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]
    unittest/nthitem_test.cc:68:21: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]
    unittest/stats_test.cc:26:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-25 08:36:07 +01:00
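
The warnings listed above come from comparing a signed `int` with an unsigned container size. The commit message does not show the individual fixes. A minimal sketch of one common pattern follows; `in_range` is a hypothetical function, not code from the repository:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: true if `index` is a valid position in `v`.
// Writing `index < v.size()` directly would trigger -Wsign-compare and
// would silently convert a negative index to a huge unsigned value.
bool in_range(int index, const std::vector<int>& v) {
  return index >= 0 && static_cast<std::size_t>(index) < v.size();
}
```

Checking the sign first keeps the cast safe. Where the index can never be negative, declaring it as `std::size_t` avoids the cast entirely.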
Stefan Weil
ee2f9bf7bf Remove old comments in file headers
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-16 10:55:00 +01:00
Noah Metzger
5b3e2fe812 Integrated accumulated Symbol Choice in the Choice Iterator and made the api lstm_choice_mode independent
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-03-12 09:15:10 +01:00
Noah Metzger
754e38d2b4 Added the option to get the timesteps separated by the suggested segmentation
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-03-11 10:50:56 +01:00
Robert Schubert
3912cb1c33 LSTM char_whitelist/blacklist (6ac2ff0): more robust
- unicharset can be null too
2019-03-09 10:40:40 +01:00
Robert Schubert
b45999088c LSTM char_whitelist/blacklist (6ac2ff0): multi-code chars
- move decision from ComputeTopN to ContinueContext, where
  it belongs: block context continuations which emit final
  codes translating to disabled unichar_ids.
  (The normal logic for fallback from top2 > top_n > rest
   will apply.)
- pass UNICHARSET refs appropriately
2019-03-08 12:30:16 +01:00
Robert Schubert
6ac2ff083e trying to add tessedit_char_whitelist etc. again:
- ignore matrix outputs in ComputeTopN if they
  belong to a disabled unichar_id
- pass UNICHARSET refs to check that
- in SetBlackAndWhitelist, also update the unicharset
  of the lstm_recognizer_ instance, if any
2019-03-07 01:37:23 +01:00
Stefan Weil
877e62db55 Fix compiler warning (-Wmaybe-uninitialized)
gcc warning:

    src/lstm/recodebeam.cpp:270:41: warning: ‘current_char’ may be used uninitialized in this function [-Wmaybe-uninitialized]

It's a false positive, but setting the variable to 0 satisfies the compiler.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-09 16:32:20 +01:00
Noah Metzger
f7f5f41073 Fixed a mac compiler warning in recodebeam.cpp
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-10-23 16:57:39 +02:00
Noah Metzger
c13371d6e0 Renamed GetGlyphConfidences() to GetChoices() and glyph_confidences to lstm_choice_mode
Renamed the global attribute glyph_confidences to lstm_choice_mode and the method GetGlyphConfidences() to GetChoices(). All variables and comments contained in related methods were renamed as well.

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-10-17 16:43:39 +02:00
Stefan Weil
8dc9e9fd14 Fix use of wrong UNICHARSET
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-06 13:21:09 +02:00
Stefan Weil
f24426cd1b Convert CRLF line endings to LF
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-08-23 18:18:15 +02:00
Noah Metzger
663be426f6 Added the option for character accumulated glyph confidences.
The parameter glyph_confidences is changed from bool to int.
An execution with value 1 outputs the hOCR file enriched with glyph confidences
for every timestep like before. An execution with value 2 outputs the timesteps
accumulated over the recognized characters.

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-08-20 10:43:58 +02:00
Stefan Weil
6a28cce96b Fix whitespace issues
* Remove whitespace (blanks, tabs, cr) at line endings

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-08-01 13:19:52 +02:00
Noah Metzger
d4490af06d Fix issue reported by Coverity Scan
CID: 1375395 (Dereference after null check)

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-07-31 10:43:39 +02:00