Commit Graph

46 Commits

Author SHA1 Message Date
Stefan Weil
d4d51910e1 Add braces to single line statements (clang-tidy -checks='-*,google-readability-braces-around-statements')
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-22 09:02:13 +01:00
Stefan Weil
27293fad62 Modernize code (clang-tidy -checks='-*,modernize-use-emplace')
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-21 21:45:55 +01:00
Stefan Weil
35e143ddfc Modernize code (clang-tidy -checks='-*,modernize-use-auto')
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-21 21:45:55 +01:00
Stefan Weil
02774bda6e Modernize code (clang-tidy -checks='-*,modernize-loop-convert')
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-21 21:45:55 +01:00
Stefan Weil
0c20d3f843 Fix compiler warnings (mostly -Wsign-compare)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-21 09:29:34 +01:00
Stefan Weil
54aec32586 Replace remaining PointerVector by std::vector for src/lstm
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-19 22:22:04 +01:00
Stefan Weil
2a3682a35e Replace remaining GenericVector by std::vector in src/lstm
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-16 12:25:11 +01:00
Egor Pugin
0eb7ba88bf [clang-format] Execute clang format on include and src dirs.
Script:
find include src -type f | sort > all.txt
find include src -type f | grep -v "\.cpp" | grep -v "\.h" | sort > skip.txt
comm -23 all.txt skip.txt | xargs clang-format -i
2021-03-12 22:35:02 +03:00
Stefan Weil
9b15e65900 Replace resize(0) by clear() for std::vector
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-12 19:24:54 +01:00
Egor Pugin
9710bc0465 More std::vector. 2021-01-07 13:57:57 +03:00
Egor Pugin
4175679da6 Revert kdpair, genericheap changes. 2020-12-28 02:31:45 +03:00
Egor Pugin
4fc467a922 Inherit GenericVector from std::vector. Inherit kdpairs from std::pair. Rewrite some move ctors to modern C++ style. 2020-12-26 03:23:09 +03:00
Stefan Weil
51dff483e7 Fix runtime error caused by too large TBOX
Runtime error reported by sanitizer:

    src/ccstruct/rect.h:191:44: runtime error: 50961 is outside the range of representable values of type 'short'
    SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior src/ccstruct/rect.h:191:44 in

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-06-30 20:51:52 +02:00
Stefan Weil
994ec697d8 Remove member functions STRING::string and StringParam::string
They were redundant because there exist member functions 'c_str' which do the same.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-23 08:33:08 +02:00
Stefan Weil
5fdd32bea8 Fix CID 1366450 (Uninitialized scalar field) for class RecodeBeamSearch
secondary_beam_size_ is set but never used, so remove it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-13 22:09:03 +02:00
Noah Metzger
c350077b96 Made the lstm_choice mode compatible with the hocr_char_boxes mode
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-09-02 11:09:54 +02:00
Noah Metzger
e8b9c10d07 Clean up lstm_choice_mode and cut it down to 2 modes instead of 4
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-09-02 11:09:53 +02:00
Noah Metzger
3a5e508934 Implemented improved bounding box algorithm
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-07-16 11:38:50 +02:00
zdenop
56d4fdce00
Merge pull request #2554 from noahmetzger/LSTMChoiceRIL
Improved lstm_choice_mode
2019-07-15 11:51:52 +02:00
Noah Metzger
2dd5d0d60a Fixed a bug when first decode iteration stays empty and added some comments.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-07-15 10:05:22 +02:00
Noah Metzger
f2d685a90f Added CTC-based Symbolchoices.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-07-10 16:34:41 +02:00
Stefan Weil
ee04347347 Fix format string for 64 bit integer (CID 1402986)
Commit c1264c189e was not the right fix.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-10 16:20:50 +02:00
Stefan Weil
c1264c189e Fix format string for 64 bit integer
This fixes also a warning from gcc.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-23 09:31:09 +02:00
Stefan Weil
df98bb7368 Move LSTMTrainer from libtesseract to libtesseract_training
LSTMTrainer is only used for training, so the shared library for
Tesseract can be made smaller.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-22 16:23:51 +02:00
Stefan Weil
4bec4a69a0 Add missing static attributes
This fixes lots of compiler warnings like these ones:

    src/api/baseapi.cpp:113:13: warning: no previous extern declaration for non-static variable 'kInputFile' [-Wmissing-variable-declarations]
    src/api/baseapi.cpp:117:13: warning: no previous extern declaration for non-static variable 'kOldVarsFile' [-Wmissing-variable-declarations]
    src/api/baseapi.cpp:97:10: warning: no previous extern declaration for non-static variable 'stream_filelist' [-Wmissing-variable-declarations]
    src/ccmain/equationdetect.cpp:46:10: warning: no previous extern declaration for non-static variable 'equationdetect_save_bi_image' [-Wmissing-variable-declarations]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-26 08:53:09 +02:00
zdenop
ab09b09da6
Merge pull request #2294 from bertsky/lstm-with-char-whitelist
trying to add tessedit_char_whitelist etc. again:
2019-04-06 14:41:30 +02:00
Stefan Weil
20d5eedd45 Modernize code (clang-tidy check modernize-loop-convert)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-05 08:29:00 +02:00
Stefan Weil
a0fd90583b Modernize C++ code using auto
The modifications were done using this command:

    run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-auto' -fix

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 07:55:08 +01:00
Stefan Weil
631882a346 Fix compiler warnings (signed / unsigned mismatch)
clang warnings:

    src/ccutil/unicharcompress.cpp:172:27: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    src/lstm/recodebeam.cpp:129:29: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare]
    src/lstm/recodebeam.cpp:276:48: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare]
    unittest/imagedata_test.cc:101:21: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    unittest/linlsq_test.cc:33:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    unittest/linlsq_test.cc:44:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    unittest/nthitem_test.cc:27:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]
    unittest/nthitem_test.cc:68:21: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]
    unittest/stats_test.cc:26:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-25 08:36:07 +01:00
Stefan Weil
ee2f9bf7bf Remove old comments in file headers
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-16 10:55:00 +01:00
Noah Metzger
5b3e2fe812 Integrated accumulated Symbol Choice in the Choice Iterator and made the api lstm_choice_mode independent
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-03-12 09:15:10 +01:00
Noah Metzger
754e38d2b4 Added the option to get the timesteps separated by the suggested segmentation
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-03-11 10:50:56 +01:00
Robert Schubert
3912cb1c33 LSTM char_whitelist/blacklist (6ac2ff0): more robust
- unicharset can be null too
2019-03-09 10:40:40 +01:00
Robert Schubert
b45999088c LSTM char_whitelist/blacklist (6ac2ff0): multi-code chars
- move decision from ComputeTopN to ContinueContext, where
  it belongs: block context continuations which emit final
  codes translating to disabled unichar_ids.
  (The normal logic for fallback from top2 > top2 > rest
   will apply.)
- pass UNICHARSET refs appropriately
2019-03-08 12:30:16 +01:00
Robert Schubert
6ac2ff083e trying to add tessedit_char_whitelist etc. again:
- ignore matrix outputs in ComputeTopN if they
  belong to a disabled unichar_id
- pass UNICHARSET refs to check that
- in SetBlackAndWhitelist, also update the unicharset
  of the lstm_recognizer_ instance, if any
2019-03-07 01:37:23 +01:00
Stefan Weil
877e62db55 Fix compiler warning (-Wmaybe-uninitialized)
gcc warning:

    src/lstm/recodebeam.cpp:270:41: warning: ‘current_char’ may be used uninitialized in this function [-Wmaybe-uninitialized]

It's a false positive, but setting the variable to 0 satisfies the compiler.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-09 16:32:20 +01:00
Noah Metzger
f7f5f41073 Fixed a mac compiler warning in recodebeam.cpp
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-10-23 16:57:39 +02:00
Noah Metzger
c13371d6e0 Renamed GetGlyphConfidences() to GetChoices() and glyph_confidences to lstm_choice_mode
Renamed the global attribute glyph_confidences to lstm_choice_mode and the method GetGlyphConfidences() to GetChoices(). All Variables and comments contained in related methods were renamed as well.

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-10-17 16:43:39 +02:00
Stefan Weil
8dc9e9fd14 Fix use of wrong UNICHARSET
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-06 13:21:09 +02:00
Stefan Weil
f24426cd1b Convert CRLF line endings to LF
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-08-23 18:18:15 +02:00
Noah Metzger
663be426f6 Added the option for character accumulated glyph confidences.
The parameter glyph_confidences is changed from bool to int.
An execution with value 1 outputs the hOCR file enriched with glyph confidences
for every timestep like before. An execution with value 2 outputs the timesteps
accumulated over the recognized characters.

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-08-20 10:43:58 +02:00
Stefan Weil
6a28cce96b Fix whitespace issues
* Remove whitespace (blanks, tabs, cr) at line endings

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-08-01 13:19:52 +02:00
Noah Metzger
d4490af06d Fix issue reported by Coverity Scan
CID: 1375395 (Dereference after null check)

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-07-31 10:43:39 +02:00
Noah Metzger
91c7504a35 Added a feature to enrich the hOCR output with glyph confidences
By using the parameter -c glyph_confidences=true the user is able to enrich
the hOCR output with additional information. Tesseract then lists additionally
the timesteps with all glyphs that were considered with their confidence
for every timestep of the LSTM.

The format of the hOCR output is slightly changed: There is now a linebreak
after every word for better readability by humans.

Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2018-07-25 18:18:58 +02:00
Alexander Zaitsev
d54d7486b4 Use std::max/std::min instead of MAX/MIN macros. 2018-05-20 17:49:48 +03:00
Egor Pugin
e95ff1159e Move sources into src dir. Update build scripts. 2018-04-25 11:02:54 +03:00