tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2025-01-21 00:20:45 +08:00

Author	SHA1	Message	Date
Noah Metzger	86b90200fb	Add some of the lstm_choice_mode functionality to restore compatibility with the 4.0 Version Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-04-30 15:33:52 +02:00
Noah Metzger	fa948d640a	Removed lstm_choice_mode for backwards compatibility in 4.1 Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-04-29 22:33:27 +02:00
zdenop	ab09b09da6	Merge pull request #2294 from bertsky/lstm-with-char-whitelist trying to add tessedit_char_whitelist etc. again:	2019-04-06 14:41:30 +02:00
Stefan Weil	20d5eedd45	Modernize code (clang-tidy check modernize-loop-convert) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-05 08:29:00 +02:00
Stefan Weil	a0fd90583b	Modernize C++ code using auto The modifications were done using this command: run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-auto' -fix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 07:55:08 +01:00
Stefan Weil	631882a346	Fix compiler warnings (signed / unsigned mismatch) clang warnings: src/ccutil/unicharcompress.cpp:172:27: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] src/lstm/recodebeam.cpp:129:29: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare] src/lstm/recodebeam.cpp:276:48: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare] unittest/imagedata_test.cc:101:21: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] unittest/linlsq_test.cc:33:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] unittest/linlsq_test.cc:44:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] unittest/nthitem_test.cc:27:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare] unittest/nthitem_test.cc:68:21: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare] unittest/stats_test.cc:26:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-25 08:36:07 +01:00
Stefan Weil	ee2f9bf7bf	Remove old comments in file headers Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-16 10:55:00 +01:00
Noah Metzger	5b3e2fe812	Integrated accumulated Symbol Choice in the Choice Iterator and made the api lstm_choice_mode independent Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-03-12 09:15:10 +01:00
Noah Metzger	754e38d2b4	Added the option to get the timesteps separated by the suggested segmentation Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-03-11 10:50:56 +01:00
Robert Schubert	3912cb1c33	LSTM char_whitelist/blacklist (`6ac2ff0`): more robust - unicharset can be null too	2019-03-09 10:40:40 +01:00
Robert Schubert	b45999088c	LSTM char_whitelist/blacklist (`6ac2ff0`): multi-code chars - move decision from ComputeTopN to ContinueContext, where it belongs: block context continuations which emit final codes translating to disabled unichar_ids. (The normal logic for fallback from top2 > top2 > rest will apply.) - pass UNICHARSET refs appropriately	2019-03-08 12:30:16 +01:00
Robert Schubert	6ac2ff083e	trying to add tessedit_char_whitelist etc. again: - ignore matrix outputs in ComputeTopN if they belong to a disabled unichar_id - pass UNICHARSET refs to check that - in SetBlackAndWhitelist, also update the unicharset of the lstm_recognizer_ instance, if any	2019-03-07 01:37:23 +01:00
Stefan Weil	877e62db55	Fix compiler warning (-Wmaybe-uninitialized) gcc warning: src/lstm/recodebeam.cpp:270:41: warning: ‘current_char’ may be used uninitialized in this function [-Wmaybe-uninitialized] It's a false positive, but setting the variable to 0 satisfies the compiler. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-09 16:32:20 +01:00
Noah Metzger	f7f5f41073	Fixed a mac compiler warning in recodebeam.cpp Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2018-10-23 16:57:39 +02:00
Noah Metzger	c13371d6e0	Renamed GetGlyphConfidences() to GetChoices() and glyph_confidences to lstm_choice_mode Renamed the global attribute glyph_confidences to lstm_choice_mode and the method GetGlyphConfidences() to GetChoices(). All Variables and comments contained in related methods were renamed as well. Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2018-10-17 16:43:39 +02:00
Stefan Weil	8dc9e9fd14	Fix use of wrong UNICHARSET Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-06 13:21:09 +02:00
Stefan Weil	f24426cd1b	Convert CRLF line endings to LF Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-08-23 18:18:15 +02:00
Noah Metzger	663be426f6	Added the option for character accumulated glyph confidences. The parameter glyph_confidences is changed from bool to int. An execution with value 1 outputs the hOCR file enriched with glyph confidences for every timestep like before. An execution with value 2 outputs the timesteps accumulated over the recognized characters. Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2018-08-20 10:43:58 +02:00
Stefan Weil	6a28cce96b	Fix whitespace issues * Remove whitespace (blanks, tabs, cr) at line endings Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-08-01 13:19:52 +02:00
Noah Metzger	d4490af06d	Fix issue reported by Coverity Scan CID: 1375395 (Dereference after null check) Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2018-07-31 10:43:39 +02:00
Noah Metzger	91c7504a35	Added a feature to enrich the hOCR output with glyph confidences By using the parameter -c glyph_confidences=true the user is able to enrich the hOCR output with additional information. Tesseract then lists additionally the timesteps with all glyphs that were considered with their confidence for every timestep of the LSTM. The format of the hOCR output is slightly changed: There is now a linebreak after every word for better readability by humans. Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2018-07-25 18:18:58 +02:00
Alexander Zaitsev	d54d7486b4	Use std::max/std::min instead of MAX/MIN macros.	2018-05-20 17:49:48 +03:00
Egor Pugin	e95ff1159e	Move sources into src dir. Update build scripts.	2018-04-25 11:02:54 +03:00

23 Commits
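
Two of the commits listed above (631882a346 and 20d5eedd45) deal with -Wsign-compare warnings and the clang-tidy modernize-loop-convert check. The following is a minimal sketch of what such changes typically look like. It is not taken from the tesseract sources, and the function names SumIndexed and SumRanged are hypothetical, chosen only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not from the tesseract sources.
// The index is a std::size_t, so comparing it with values.size()
// no longer mixes signed and unsigned integers (-Wsign-compare).
int SumIndexed(const std::vector<int>& values) {
  int total = 0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    total += values[i];
  }
  return total;
}

// Hypothetical helper, not from the tesseract sources.
// Range-based form, the kind of rewrite that modernize-loop-convert
// suggests when the index is only used to access the element.
int SumRanged(const std::vector<int>& values) {
  int total = 0;
  for (int value : values) {
    total += value;
  }
  return total;
}
```

Both functions return the same result; the range-based form removes the index variable entirely, which also removes the opportunity for a signed/unsigned comparison.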