tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-13 07:59:04 +08:00

Author	SHA1	Message	Date
Stefan Weil	fcfdb7e56f	Remove unused include statements Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 14:48:31 +02:00
Stefan Weil	ba0c55adc5	svutil: Remove SVSync::StartThread and SVSync::ExitThread Both are unused now. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 14:30:51 +02:00
Stefan Weil	85068be405	lstmtester: Replace SVSync::StartThread by std::thread Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 14:30:51 +02:00
Stefan Weil	43a281893f	scrollview: Replace SVSync::StartThread by std::thread Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 14:30:51 +02:00
Stefan Weil	a6d723bf10	Replace SVSync::StartThread by std::thread and use std::this_thread::yield Using yield instead of a sleep makes running imagedata_test much faster. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 14:30:51 +02:00
Stefan Weil	13bb4623b1	Use std::lock_guard to protect a code block This is simpler than using lock() / unlock() explicitly. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 12:01:28 +02:00
Stefan Weil	93427391c1	Replace SVAutoLock by std::lock_guard Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 12:01:28 +02:00
Stefan Weil	c0b8ee3b82	Replace CCUtilMutex by std::mutex Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 12:01:28 +02:00
Stefan Weil	36026e3c35	Replace SVMutex by std::mutex Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 12:01:28 +02:00
zdenop	56d4fdce00	Merge pull request #2554 from noahmetzger/LSTMChoiceRIL Improved lstm_choice_mode	2019-07-15 11:51:52 +02:00
Noah Metzger	2dd5d0d60a	Fixed a bug when first decode iteration stays empty and added some comments. Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-07-15 10:05:22 +02:00
Stefan Weil	61eab60fe3	arch: Reduce number of include files for dot product functions dotproductavx.h and dotproductsse.h declared only two functions. Move those declarations to dotproduct.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-12 23:18:00 +02:00
Stefan Weil	2d5b166876	Add dot product implementation for Intel FMA (double = tessdata_best) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-12 23:18:00 +02:00
Stefan Weil	9259ed8f26	Optimize tprintf implementation It no longer uses a local buffer, so it needs less memory and no mutex. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-10 20:59:07 +02:00
Stefan Weil	2aebd10fb7	FPRow: Add missing initialisation for scalar (CID 1402754) Modernize the code also a little bit. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-10 17:15:55 +02:00
Stefan Weil	bdc7abf518	Fix format strings for size_t arguments (CID 1402762, 1402767) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-10 16:57:19 +02:00
Noah Metzger	11a4cd298b	Added parameters for the LSTM CTC Choice mode Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-07-10 16:34:41 +02:00
Noah Metzger	f2d685a90f	Added CTC-based Symbolchoices. Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-07-10 16:34:41 +02:00
Stefan Weil	ee04347347	Fix format string for 64 bit integer (CID 1402986) Commit `c1264c189e` was not the right fix. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-10 16:20:50 +02:00
Stefan Weil	890b810a9e	tfnetwork: Add missing return statement (CID 1402992) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-10 08:20:52 +02:00
Egor Pugin	3b6f071ee8	Implement CMake+SW build. Currently only Windows is supported. You can try it as follows: mkdir build_sw && cd build_sw && cmake .. -DSW_BUILD=1	2019-07-08 18:50:30 +03:00
Egor Pugin	84ffcc0d38	Merge pull request #2548 from zhuangzhuang/fix_tesstrain_py_error fix tesstrain.py error	2019-07-08 11:25:41 +03:00
zhuangzhuang1988	18c67f4989	fix tesstrain.py error	2019-07-08 14:35:17 +08:00
zhuangzhuang	9eb997fc0b	fix windows stdout messy code (#2546) * fix windows stdout messy code * fix type name error * remove unnecessary codepoint check.	2019-07-08 09:33:53 +03:00
Stefan Weil	d653bb61f3	genericvector: Remove redundant declarations tesseract::FileReader and tesseract::FileWriter are already declared in serialis.h which is included by genericvector.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-05 18:47:15 +02:00
Dmitry Bely	74145f0686	Fix crash in Tesseract::classify_word_and_language() when tessedit_timing_debug is enabled	2019-07-05 12:36:25 +02:00
zdenop	01535706ec	Merge pull request #2539 from stweil/tesscallback Replace tesscallback.h and related proprietary data types by C++-11 functionals	2019-07-05 10:52:06 +02:00
Stefan Weil	134eb39960	Remove tesscallback.h It is no longer used. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	3bae459823	Use C++-11 code instead of TessCallback for WERD_RES::ConditionalBlobMerge Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	e61c828dcd	Use C++-11 code instead of TessCallback for UNICHARSET::load_via_fgets Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	0ea8ada308	Use C++-11 code instead of TessCallback for WidthCallback Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	1c1eb76c36	Use C++-11 code instead of TessCallback for Dawg::iterate_words Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	3fb15b3891	Use C++-11 code instead of TessCallback for ObjectCache::Get Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	56d8210909	Use C++-11 code instead of TessCallback for TruthCallback Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	c33b05be55	Use C++-11 code instead of TessCallback for PointerVector::compact Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	cc0405298b	Use C++-11 code instead of TessCallback for read, write Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	242e1db7fa	Use C++-11 code instead of TessCallback for function set_compare_callback Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	ffd8101986	Use C++-11 code instead of TessCallback for function set_clear_callback Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	ded24d0367	ccmain: Use C++-11 code instead of TessCallback1 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	eeec9c66d4	training: Use C++-11 code for TestCallback This allows removing more code from tesscallback.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	201ba0dd53	Fix handling of single pages from multipage TIFF files (issue #2537) That case now uses Leptonica to deliver the desired image instead of using an inefficient loop in the Tesseract code. See commit `54fafc4e2e` which used similar code in the past. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 15:56:57 +02:00
Stefan Weil	f1c6564cd7	Revert "fix read wrong tiff page." This reverts commit `75d230a7ac`. That commit introduced new problems (memory leak, potential endless loop) and style issues. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 15:44:07 +02:00
Stefan Weil	fd001c3ab9	Fix linker error with disabled legacy engine (issue #2532) Commit `3871caae86` introduced a build regression when the legacy engine was disabled. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 13:47:38 +02:00
zhuangzhuang1988	75d230a7ac	fix read wrong tiff page.	2019-07-04 12:32:18 +08:00
zhuangzhuang1988	4d4c16bce1	fix start ScrollView.jar failed when lstmtraining	2019-07-03 07:27:50 +02:00
zhuangzhuang1988	99cb088708	close log file handle before move it.	2019-07-01 10:53:12 +08:00
zhuangzhuang1988	a3a361f73d	fix logger file encoding error.	2019-06-28 18:29:52 +08:00
Stefan Weil	5895534b5e	Update enum from unicode/uchar.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-25 10:55:33 +02:00
Stefan Weil	c1264c189e	Fix format string for 64 bit integer This fixes also a warning from gcc. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-23 09:31:09 +02:00
Stefan Weil	dfd35d3e27	baseapi: Remove old code The workaround is no longer needed because _splitpath and _MAX_FNAME were removed in commit `cc0d87c5b8`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-23 09:15:32 +02:00
Stefan Weil	dd261e8d42	Replace code using _splitpath_s (win32) That simplifies the code and removes a dependency on "newer" versions of Windows. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-23 09:15:15 +02:00
Stefan Weil	f522b039a5	Remove outdated comment Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 21:03:19 +02:00
Stefan Weil	ea20bf0373	Remove dummy code from LSTMTrainer::InitTensorFlowNetwork Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 21:01:40 +02:00
Stefan Weil	41f91c96c8	cmake: Build training tools also on Linux and macOS This enables CI tests for the code in src/training on Linux and macOS. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 20:27:56 +02:00
Egor Pugin	ab28a03e93	Merge pull request #2514 from stweil/tessresultcallback Move LSTMTrainer from libtesseract to libtesseract_training	2019-06-22 18:34:49 +03:00
Stefan Weil	df98bb7368	Move LSTMTrainer from libtesseract to libtesseract_training LSTMTrainer is only used for training, so the shared library for Tesseract can be made smaller. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 16:23:51 +02:00
Stefan Weil	cb2957b3d2	Replace callback by direct function calls in TessBaseAPI::GetComponentImages The new code avoids dynamic memory allocation, uses faster function calls and allows removing more code from tesscallback.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 14:54:31 +02:00
Stefan Weil	3159f42257	Remove unused GenericVector::dot_product Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 12:59:21 +02:00
Stefan Weil	bef73d9956	Remove unused GenericVector::compact Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 12:59:08 +02:00
Egor Pugin	3c6a04ea1a	Merge pull request #2512 from stweil/tessresultcallback Simplify class LSTMTrainer	2019-06-22 13:41:21 +03:00
Stefan Weil	2a9b2fb32a	Remove wrong description for GenericVector::set_compare_callback and simplify code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 11:22:07 +02:00
Stefan Weil	bd13069fe8	Simplify class LSTMTrainer The function pointers and callbacks file_reader_, file_writer_, checkpoint_reader_ and checkpoint_writer_ are always set to the same values. Replacing them by direct function calls simplifies the code and allows removing more code from tesscallback.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 09:18:13 +02:00
Stefan Weil	3871caae86	Simplify indirect call of LMPainPoints::GeneratePainPoint It needs neither a temporary TessResultCallback2 nor the function LMPainPoints::GenerateForBlamer. This also allows removing more code from tesscallback.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-21 17:09:33 +02:00
zdenop	60b4c68d31	tesstrain_utils.sh: remove redundant code	2019-06-20 18:42:29 +02:00
Stefan Weil	5f23290655	tesscallback: Remove more unused code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-20 08:38:00 +02:00
Stefan Weil	2c78735d97	ocrfeatures: Remove locally used functions from global interface ReadFeature, WriteFeature are only used locally. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-17 15:09:39 +02:00
zdenop	a3593d994b	Merge pull request #2499 from stweil/embedded Remove code for embedded build	2019-06-17 10:24:45 +02:00
Stefan Weil	674d6a90d8	Remove code for embedded build That code is unrelated to Tesseract and can be easily implemented by external projects which require it. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-17 09:55:33 +02:00
zdenop	60aee9f821	create OUTPUT_DIR if it does not exist; fixes #2497	2019-06-16 15:07:16 +02:00
zdenop	fad96db497	Merge pull request #2494 from Shreeshrii/master Allow saving of box/tiff pairs during legacy tesseract training	2019-06-14 20:44:49 +02:00
Shree	6fa4587949	Allow saving of box/tiff pairs during base tesseract training	2019-06-14 09:35:39 +00:00
Shree	45cdf741ae	Allow saving of box/tiff pairs during base tesseract training	2019-06-14 09:32:41 +00:00
Shree	832c6edb97	Allow saving of box/tiff pairs during base tesseract training	2019-06-14 09:25:54 +00:00
James R. Barlow	a9890afd12	Fix text2image compilation on C++17 compilers C++17 drops support for `std::random_shuffle`, breaking the build on C++17 compilers that try to compile text2image.cpp. std::shuffle is valid on C++11 through C++17, so use std::shuffle instead. Due to the use of `std::random_shuffle`, `text2image --render_ngrams` would not give consistent results for different compilers or platforms. With the current change, the same random number generator is used for all platforms and initialized to the same seed, so training output should be consistent.	2019-06-13 16:07:20 -07:00
Stefan Weil	fefd521a49	Add dot product implementation using std::inner_product Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-31 12:07:17 +02:00
Stefan Weil	e0c2f0a782	Fix crash in PreloadRenderers with nullptr outputbase The crash could be triggered by a wrong command line. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-29 07:46:29 +02:00
Stefan Weil	9a4bd041c8	Fix build for unittests Commit `29f2cff203` was the wrong fix for the compiler warnings because it broke the unittest build. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 21:36:34 +02:00
Stefan Weil	2c23e7ead5	scanedg: Add const attributes Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 20:27:21 +02:00
Stefan Weil	4b3bbd908a	Remove EXTERN macro Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 20:27:21 +02:00
Stefan Weil	ac999b2409	Remove unused macros This fixes compiler warnings from clang++ like these ones: src/ccutil/params.cpp:34:9: warning: macro is not used [-Wunused-macros] src/cutil/oldlist.cpp:67:9: warning: macro is not used [-Wunused-macros] src/cutil/oldlist.cpp:68:9: warning: macro is not used [-Wunused-macros] src/cutil/oldlist.cpp:78:9: warning: macro is not used [-Wunused-macros] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 20:27:21 +02:00
Stefan Weil	8c8eb21bc5	Fix compiler errors for old gcc Travis CI with gcc 4.8 failed with errors. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 15:38:40 +02:00
Stefan Weil	a86143a41d	Remove some unused functions, constants and variables This fixes compiler warnings, for example: src/ccutil/strngs.cpp:36:11: warning: unused variable 'kMaxDoubleSize' [-Wunused-const-variable] src/viewer/svutil.cpp:320:13: warning: unused function 'TessFreeAddrInfo' [-Wunused-function] src/ccstruct/werd.cpp:32:19: warning: unused variable 'CANT_SCALE_EDGESTEPS' [-Wunused-const-variable] src/textord/bbgrid.cpp:103:10: warning: unused variable 'old_tright' [-Wunused-variable] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 11:52:31 +02:00
Stefan Weil	29f2cff203	training: Add missing static attributes That fixes several warnings from clang++ like the following one: src/training/combine_lang_model.cpp:36:1: warning: no previous extern declaration for non-static variable 'FLAGS_lang_is_rtl' [-Wmissing-variable-declarations] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 11:33:52 +02:00
Stefan Weil	a139d553a7	training: Move declarations from cpp files to h file That fixes several warnings from clang++ like the following one: src/training/commontraining.cpp:95:1: warning: no previous extern declaration for non-static variable 'FLAGS_D' [-Wmissing-variable-declarations] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 08:53:09 +02:00
Stefan Weil	389285010c	featdefs: Add missing include statement It is needed for PicoFeatureLength. This fixes a compiler warning. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 08:53:09 +02:00
Stefan Weil	4bec4a69a0	Add missing static attributes This fixes lots of compiler warnings like these ones: src/api/baseapi.cpp:113:13: warning: no previous extern declaration for non-static variable 'kInputFile' [-Wmissing-variable-declarations] src/api/baseapi.cpp:117:13: warning: no previous extern declaration for non-static variable 'kOldVarsFile' [-Wmissing-variable-declarations] src/api/baseapi.cpp:97:10: warning: no previous extern declaration for non-static variable 'stream_filelist' [-Wmissing-variable-declarations] src/ccmain/equationdetect.cpp:46:10: warning: no previous extern declaration for non-static variable 'equationdetect_save_bi_image' [-Wmissing-variable-declarations] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 08:53:09 +02:00
Stefan Weil	7e7811ff92	bits16: Modernize code This also fixes warnings like the following one from clang++: src/ccmain/pgedit.cpp:114:15: warning: declaration requires a global constructor [-Wglobal-constructors] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 08:53:08 +02:00
Stefan Weil	334d9b4633	unicodes: Optimize code by using constexpr and removing unused globals Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-25 14:51:28 +02:00
Stefan Weil	23d05a5e1b	featdefs: Optimize code by using constexpr This also fixes some warnings from clang++: src/classify/featdefs.cpp:47:15: warning: declaration requires a global constructor [-Wglobal-constructors] src/classify/featdefs.cpp:57:15: warning: declaration requires a global constructor [-Wglobal-constructors] src/classify/featdefs.cpp:66:15: warning: declaration requires a global constructor [-Wglobal-constructors] src/classify/featdefs.cpp:75:15: warning: declaration requires a global constructor [-Wglobal-constructors] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-25 14:46:36 +02:00
Stefan Weil	7628112273	Fix broken build for Leptonica < 1.77 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-25 14:23:43 +02:00
Stefan Weil	55901a480f	Remove classify/cutoffs.h It only defined CLASS_CUTOFF_ARRAY and some unused definitions. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-25 13:54:44 +02:00
zdenop	82458db630	Merge branch 'master' of https://github.com/tesseract-ocr/tesseract	2019-05-25 11:14:28 +02:00
zdenop	539673b503	fix '--enable-visibility' build	2019-05-25 11:13:33 +02:00
zdenop	8de022ab1c	Merge pull request #2461 from stweil/tensorflow Support build with Tensorflow	2019-05-25 10:52:37 +02:00
Stefan Weil	32dcfd06ba	Replace Tensorflow by TensorFlow The name is written in camel case, see https://www.tensorflow.org/. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-24 17:14:28 +02:00
Stefan Weil	2441e4d8ac	Implement check for Tensorflow header file This looks for one of the header files which are included by Tesseract. It currently uses a hard coded path which works for Debian / Ubuntu. Simplify also the rules for linking Tensorflow. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-24 16:52:14 +02:00
Stefan Weil	9cdf041448	Remove "third_party/" in comments and update path names Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-24 14:12:52 +02:00
Stefan Weil	4382ab1a34	Support build with Tensorflow It expects include files in /usr/include/tensorflow. * Add configure option --with-tensorflow (disabled by default) * Fix data type tensorflow::int64 * Remove "third_party/" in include statements * Add dummy implementations for Backward and DebugWeights in TFNetwork * Add files generated with protoc from tfnetwork.proto (so the Tensorflow sources are not needed for the build) * Update Makefiles Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-24 14:11:31 +02:00
Zdenko Podobný	294f548ac1	fix missing tiff format	2019-05-24 10:39:17 +02:00
Stefan Weil	3f74da5da9	lstmtrainer: Set constant kLearningRateDecay at compile time sqrt(0.5) = 1 / sqrt(2) can be replaced by the macro M_SQRT1_2. This also fixes a compiler warning: src/lstm/lstmtrainer.cpp:51:14: warning: declaration requires a global constructor [-Wglobal-constructors] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-23 15:01:23 +02:00
zdenop	4bab7dd83d	Merge pull request #2451 from Bharat123rox/lgtm Some LGTM alert fixes and potential bugfixes	2019-05-22 12:19:44 +02:00
Egor Pugin	fea1f3970b	Merge pull request #2452 from stweil/tprintf tprintf: Make code reentrant and use less memory	2019-05-22 12:31:34 +03:00
Egor Pugin	8f99880a7a	Merge pull request #2453 from stweil/crashcode Remove SavePixForCrash and related code	2019-05-22 12:30:29 +03:00
Bharat123rox	bc3ea622a6	Fix bug in max_max_dist	2019-05-22 08:21:30 +02:00
Bharat123rox	0bf45e81e7	Fix LGTM and revert bugfix for later PR	2019-05-22 11:23:27 +05:30
Bharat123rox	945ccac85a	Fix syntax error	2019-05-22 10:10:12 +05:30
Stefan Weil	6514479e8f	Remove SavePixForCrash and related code That debugging code uses very much memory and is no longer useful. text data bss dec hex filename 815 0 262144 262959 4032f src/ccutil/globaloc.o Remove also the function err_exit which was only used in ccmain/reject.cpp. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-21 20:25:58 +02:00
Stefan Weil	078a129674	tprintf: Make code reentrant and use less memory Reduce the maximum message size from 64 KiB to 2 KiB which still should be large enough for trace messages. Create the smaller message on the stack instead of using a global array to allow reentrancy and to reduce the memory use of Tesseract. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-21 20:22:58 +02:00
Bharat123rox	7f31a0634d	Some LGTM fixes and potential bugfixes	2019-05-21 23:24:50 +05:30
Stefan Weil	d2ca81e794	Remove local definition of M_PI It is defined for all platforms when math.h or cmath is included after defining the macro _USE_MATH_DEFINES. Define _USE_MATH_DEFINES before any include statement to make sure that M_PI gets defined. It is not necessary to define it conditionally only for Windows. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-20 21:18:52 +02:00
Stefan Weil	64bdceee69	Fix compiler warnings This fixes lots of warnings related to ERRCODE like the following one: src/ccutil/errcode.h:81:15: warning: declaration requires a global constructor [-Wglobal-constructors] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-19 22:10:22 +02:00
Stefan Weil	09edd1a604	Fix out-of-bounds writes in Classify::ReadNewCutoffs The function did not correctly read Chinese unichars into the local Class variable if the locale was set to de_DE.UTF-8 (or other incompatible locales). That resulted in a wrong ClassId which was used to write into the Cutoffs array without checking for valid bounds. On macOS the result was a runtime error in baseapi_test (see GitHub issue #1250): [ RUN ] TesseractTest.InitConfigOnlyTest baseapi_test(21845,0x1134c45c0) malloc: *** error for object 0x927f96c28005e0: pointer being freed was not allocated baseapi_test(21845,0x1134c45c0) malloc: *** set a breakpoint in malloc_error_break to debug Replacing sscanf by std::istringstream fixes that. Add also an assertion to catch future out-of-bounds writes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-18 13:39:55 +02:00
zdenop	7e9d2f4bc4	Merge pull request #2432 from nickjwhite/hocrmoretypes Add different classes to hocr output depending on BlockType	2019-05-16 17:02:48 +02:00
Stefan Weil	331cc84d8d	Remove assertions for unsupported locale settings The latest code passed all unittests with locale de_DE.UTF-8 and has fixed the locale issues which were reported on GitHub. Therefore the assertions can be removed. Any remaining locale issue will be fixed when it is identified. To help find such remaining issues, debug code now uses the user's locale settings instead of the default "C" locale for all executables which use TessBaseAPI. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 13:59:39 +02:00
Stefan Weil	77f9bad3c2	Fix UNICHARSET::save_to_string for locale de_DE.UTF-8 That function writes float values which must always use '.' as the decimal separator, no matter what the current locale setting is. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 11:39:59 +02:00
Stefan Weil	36ed6da349	Fix baseapi_test with locale de_DE.UTF-8 The unittest failed with LANG=de_DE.UTF-8: $ unittest/baseapi_test Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc [==========] Running 12 tests from 2 test suites. [----------] Global test environment set-up. [----------] 10 tests from TesseractTest [ RUN ] TesseractTest.ArraySizeTest [ OK ] TesseractTest.ArraySizeTest (0 ms) [ RUN ] TesseractTest.BasicTesseractTest [ OK ] TesseractTest.BasicTesseractTest (1251 ms) [ RUN ] TesseractTest.IteratesParagraphsEvenIfNotDetected [ OK ] TesseractTest.IteratesParagraphsEvenIfNotDetected (347 ms) [ RUN ] TesseractTest.HOCRWorksWithoutSetInputName [ OK ] TesseractTest.HOCRWorksWithoutSetInputName (403 ms) [ RUN ] TesseractTest.HOCRContainsBaseline [ OK ] TesseractTest.HOCRContainsBaseline (389 ms) [ RUN ] TesseractTest.RickSnyderNotFuckSnyder [ OK ] TesseractTest.RickSnyderNotFuckSnyder (346 ms) [ RUN ] TesseractTest.AdaptToWordStrTest Trying to adapt "136 " to "1 3 6" Trying to adapt "256 " to "2 5 6" Trying to adapt "410 " to "4 1 0" Trying to adapt "432 " to "4 3 2" Trying to adapt "540 " to "5 4 0" Trying to adapt "692 " to "6 9 2" Trying to adapt "779 " to "7 7 9" Trying to adapt "793 " to "7 9 3" Trying to adapt "808 " to "8 0 8" Trying to adapt "815 " to "8 1 5" Trying to adapt "12 " to "1 2" Trying to adapt "12 " to "1 2" [ OK ] TesseractTest.AdaptToWordStrTest (788 ms) [ RUN ] TesseractTest.BasicLSTMTest [ OK ] TesseractTest.BasicLSTMTest (4525 ms) [ RUN ] TesseractTest.LSTMGeometryTest [ OK ] TesseractTest.LSTMGeometryTest (615 ms) [ RUN ] TesseractTest.InitConfigOnlyTest Error: unichar ? in normproto file is not in unichar set. Error: unichar 0.232621 in normproto file is not in unichar set. Error: unichar 0.000400 in normproto file is not in unichar set. Error: unichar 0.231864 in normproto file is not in unichar set. [...] Error: unichar ? in normproto file is not in unichar set. Error: unichar 0.233915 in normproto file is not in unichar set. Error: unichar 0.000400 in normproto file is not in unichar set. Error: unichar 0.221755 in normproto file is not in unichar set. Error: unichar 0.000400 in normproto file is not in unichar set. Error: unichar ? in normproto file is not in unichar set. baseapi_test(21845,0x1134c45c0) malloc: *** error for object 0x927f96c28005e0: pointer being freed was not allocated baseapi_test(21845,0x1134c45c0) malloc: *** set a breakpoint in malloc_error_break to debug [INFO] Lang eng took 327ms in regular init [INFO] Lang chi_tra took 1422ms in regular init Abort trap: 6 TesseractTest.InitConfigOnlyTest is fixed by using std::istringstream instead of sscanf. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 11:05:09 +02:00
Stefan Weil	0dcc889e8d	Fix apiexample_test with locale de_DE.UTF-8 The unittest failed with LANG=de_DE.UTF-8: $ unittest/apiexample_test Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc [==========] Running 4 tests from 2 test suites. [----------] Global test environment set-up. [----------] 1 test from EuroText [ RUN ] EuroText.FastLatinOCR contains_unichar_id(unichar_id):Error:Assert failed:in file ../../../../../src/ccutil/unicharset.h, line 874 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-15 22:43:47 +02:00
Stefan Weil	6b1e709b19	Fix Doxygen comments for void functions Void functions should not use @return. It causes compiler warnings like this one: src/classify/intproto.cpp:326:5: warning: '@return' command used in a comment that is attached to a function returning void [-Wdocumentation] Some non-void functions also were documented with @return none. Fix those comments, too. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-14 21:57:17 +02:00
Stefan Weil	caa04882fd	normmatch: Remove unused private function PrintNormMatch was unused. Remove it and remove also an unused prototype. Make the only remaining private function NormEvidenceOf static. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-14 20:56:04 +02:00
Nick White	068eb4c35d	Add different classes to hocr output depending on BlockType These classes are taken from the hOCR specification, and seem to map well onto the BlockType types. There are probably more that could be added.	2019-05-14 13:25:08 +01:00
Stefan Weil	5d92fbf010	Replace sscanf by std::istringstream Using std::istringstream allows conversion of string to float independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-12 15:04:30 +02:00
Stefan Weil	c76ceafcdf	Fix reading of parameter from traineddata normproto component The NonEssential parameter was wrongly derived from linear_token instead of essential_token and therefore always set to true. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-12 14:43:58 +02:00
Stefan Weil	c07bc4e014	Fix Doxygen comment Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-12 08:55:23 +02:00
Stefan Weil	c8e96e2c02	Fix cast from pointer to integer type Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-12 08:54:46 +02:00
zdenop	7a5b9b8fcd	ScrollView: remove custom implementation of GetAddrInfo	2019-05-04 15:16:41 +02:00
zdenop	5e01f74648	remove unused include	2019-05-04 15:14:54 +02:00
Stefan Weil	aba037329a	tesscallback: Remove more unused code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-04 11:05:50 +02:00
Stefan Weil	57ff92e4bf	tesscallback: Remove unused code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-02 22:14:04 +02:00
zdenop	9192c3afe2	correct tessdata comment in baseapi.h	2019-05-02 08:43:04 +02:00
zdenop	7e48368a5e	Merge pull request #2421 from stweil/includes universalambigs: Add missing include file	2019-05-02 08:36:49 +02:00
zdenop	39d3824c78	Merge pull request #2420 from stweil/locale Fix more locale dependencies	2019-05-02 08:31:41 +02:00
Stefan Weil	cd749be473	universalambigs: Add missing include file This allows fixing two compiler warnings from clang++: src/ccutil/universalambigs.cpp:23:19: warning: no previous extern declaration for non-static variable 'kUniversalAmbigsFile' [-Wmissing-variable-declarations] src/ccutil/universalambigs.cpp:19019:18: warning: no previous extern declaration for non-static variable 'ksizeofUniversalAmbigsFile' [-Wmissing-variable-declarations] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-02 07:36:31 +02:00
Stefan Weil	4fbc0a257b	commandlineflags: Replace strtod by std::stringstream Using std::stringstream allows conversion of string to double independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-02 07:33:46 +02:00
Stefan Weil	d047fa1d1b	paramsd: Replace strtod by std::stringstream Using std::stringstream allows conversion of string to double independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-02 07:33:46 +02:00
Stefan Weil	e3860e45b7	clusttool: Replace strtof by std::stringstream Using std::stringstream allows conversion of string to float independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-02 07:33:45 +02:00
Stefan Weil	ed45656ec8	clusttool: Remove unused code and some global functions * WriteProtoList is unused. Remove it. * ReadNFloats, WriteNFloats and WriteProtoStyle are only used locally, so make them local. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-02 07:33:45 +02:00
Stefan Weil	28a521fec2	Fix some typos (most found and fixed by codespell) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-01 20:30:41 +02:00
zdenop	41f50b19bb	fix crash in case of missing PNG support in Leptonica see #2333	2019-05-01 19:51:54 +02:00
zdenop	90aef80dd7	fix documentation about datapath: ending "/" is not relevant	2019-05-01 11:37:50 +02:00
Jeff Breidenbach	546a9e81eb	fix #1900: intraword spacing for slightly better pdf copy-paste performance	2019-04-29 11:28:30 +02:00
zdenop	137e6de56f	Print info when uzn file is used.	2019-04-28 19:06:38 +02:00
Zdenko Podobný	80e54e401d	fix spelling	2019-04-24 15:35:22 +02:00
Zdenko Podobný	832c257771	remove unused variable	2019-04-24 14:55:35 +02:00
Stefan Weil	b7bc71e987	Fix build for Windows * winsock2.h is case sensitive, lower case is required for cross build. * ws2tcpip.h is required for addrinfo. * FreeAddrInfo conflicts with existing freeaddrinfo, so rename it. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-24 11:24:47 +02:00
zdenop	129fe95390	svutil.cpp: fix windows build	2019-04-23 23:03:28 +02:00
zdenop	7bacc8852b	Merge branch 'master' of https://github.com/tesseract-ocr/tesseract	2019-04-23 22:01:30 +02:00
zdenop	5c6ac61fe2	remove unused includes	2019-04-23 20:59:36 +02:00
zdenop	27f0f2ecea	MSVS support inttypes.h from VS 2015	2019-04-23 20:45:14 +02:00
Stefan Weil	708511adcb	Only include windows.h using host.h host.h sets the macros NOMINMAX and WIN32_LEAN_AND_MEAN which must be set before including windows.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-22 21:51:07 +02:00
Stefan Weil	53f1265362	Clean macros in platform.h * Remove unused macros ultoa, SIGNED. * Move macros NOMINMAX and WIN32_LEAN_AND_MEAN to host.h because they are used when including windows.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-22 21:51:07 +02:00
Stefan Weil	3bd61bfae4	svutil: Clean include file * Remove MIN, MAX macros. They are unused. * Include windows.h indirectly by including host.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-22 21:51:07 +02:00
Stefan Weil	e12b99d49b	Remove host.h from Tesseract API It is not needed by other API header files. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-22 21:51:07 +02:00
Stefan Weil	8a34da027f	Fix typo in description Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-22 21:50:37 +02:00
Shree	f8fba6362b	fix the coordinates for EOL tab	2019-04-22 09:54:20 +00:00
zdenop	3ec7c22a87	fix missing EOL	2019-04-22 08:49:55 +02:00
Stefan Weil	09255ebe44	Don't include windows.h from platform.h This partially reverts commit `c150b9832d`. Now params.cpp includes host.h which also gets the definition for MAX_PATH. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-21 22:20:13 +02:00
zdenop	6781d78211	Merge pull request #2399 from stweil/pgedit pgedit: Remove unused global functions	2019-04-20 19:26:02 +02:00
Stefan Weil	4ac1fad18a	pdfrenderer: Replace snprintf by std::stringstream Using std::stringstream allows conversion of float to string independent of the current locale setting. Some snprintf statements are not needed at all because a constant string can be appended directly. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-20 19:05:29 +02:00
Stefan Weil	07d5365a1f	baseapi: Use std::stringstream to format float values Using std::stringstream allows conversion of float to string independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-20 19:05:29 +02:00
Stefan Weil	743fc2562d	Remove unneeded include statements for pgedit.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-20 19:00:07 +02:00
Stefan Weil	26dd0b82bf	pgedit: Remove unused global functions pgeditor_show_point is unused, so remove it completely. Some more functions are only used locally, so make them static functions. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-20 19:00:07 +02:00
Stefan Weil	217c2530e6	Remove strtofloat Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-19 11:19:04 +02:00
Stefan Weil	7c3f9000cd	Replace sscanf by std::stringstream Using std::stringstream allows working with the C locale, independent of the current locale settings. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-19 11:19:04 +02:00
Stefan Weil	5529a5db11	unittest: Fix and enable params_model_test This needs the latest test submodule. The test uses LoadFromFile which is not used otherwise, so remove that function from class ParamsModel. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-18 17:06:48 +02:00
Stefan Weil	a1ffcd3654	Use std::stringstream for add_str_double Using std::stringstream allows conversion of double to string independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-14 16:16:16 +02:00
Stefan Weil	aa64a63f69	Use std::stringstream to generate PDF output Using std::stringstream simplifies the code and allows conversion of double to string independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-14 16:15:39 +02:00
Stefan Weil	78a957b989	Remove spaces at line endings Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-13 18:54:42 +02:00
Stefan Weil	12ca2513d4	Revert "e" flag for fopen clang-tidy added it in commit `ac0b191f6b`. The "e" flag is an extension for glibc which sets the O_CLOEXEC flag, so the file handle is not leaked to child processes. It is not needed here. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-13 18:53:57 +02:00
Samuel Lee	e32b3360aa	Fix for MSVC: LoadDataFromFile/SaveDataToFile use fopen with file mode 'e', which is unsupported in MSVC.	2019-04-11 02:33:51 +09:00
Stefan Weil	f88a7f28e3	fontinfo: Fix wrong delete Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-07 12:16:04 +02:00
Stefan Weil	3dfe1b8807	classify: Modernize function UniformDensity This should fix an issue reported by Codacy. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-07 12:13:45 +02:00
Stefan Weil	72c874140e	Modernize code by replacing C type casts This was done using clang-tidy. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-07 09:04:51 +02:00
zdenop	95a15a7a82	fix cmake&clang build	2019-04-06 15:31:53 +02:00
zdenop	ab09b09da6	Merge pull request #2294 from bertsky/lstm-with-char-whitelist trying to add tessedit_char_whitelist etc. again:	2019-04-06 14:41:30 +02:00
Robert Schubert	25a42ea42f	fixed failure report for tesstrain commands: - with `set -e` in effect, looking at stdout to detect failure is too late	2019-04-06 08:13:03 +02:00
Robert Schubert	d5584e793e	fixed failure report for tesstrain commands: - with `set -e` in effect, it does not make sense to query `$?` indirectly	2019-04-06 08:13:03 +02:00
zdenop	be617b3722	Merge pull request #2361 from Shreeshrii/truth Change message display for debug_level -1 during lstmtraining	2019-04-05 10:52:21 +02:00
zdenop	2982cb4ff3	Merge pull request #2368 from amitdo/no-legacy-fix disable-legacy build: Do not include unused headers	2019-04-05 09:35:04 +02:00
Stefan Weil	d35a6f2de5	Modernize code (clang-tidy check modernize-deprecated-headers) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-05 08:29:00 +02:00
Stefan Weil	20d5eedd45	Modernize code (clang-tidy check modernize-loop-convert) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-05 08:29:00 +02:00
amitdo	fab9a54981	Remove unneeded 'SUBDIRS=' from 3 Makefile.am files	2019-04-04 19:31:39 +02:00
Shree	6673347986	Change page to line in message	2019-04-04 15:43:29 +00:00
Shree	51c3535310	Always display GROUND TRUTH. BEST OCR and ALIGNED TRUTH only if different for debug_level -1	2019-04-04 15:33:22 +00:00
Shree	84d4cc2e95	Display OCR TEXT and GROUND TRUTH only when different for debug_level = -1	2019-04-04 15:33:22 +00:00
Amit D	2069c057d6	Merge branch 'master' into no-legacy-fix	2019-04-04 18:26:22 +03:00
Egor Pugin	2a1d238bd5	Merge pull request #2366 from stweil/modernize Modernize code with "using"	2019-04-04 15:13:10 +03:00
amitdo	546014aecd	disable-legacy build: Do not include unused headers	2019-04-04 15:09:08 +03:00
Stefan Weil	98346c2cd4	Modernize and format code The code was modernized using clang-tidy with "modernize-use-using". The modified files were then formatted using clang-tidy with "google-readability-braces-around-statements", then clang-format was applied. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-03 21:02:23 +02:00
Shreeshrii	613c2bf6e4	Change pages to lines in message The pages variables refer to the lines in document. This change makes the messages clearer without changing the variable names.	2019-04-03 10:41:14 +05:30
Egor Pugin	af7cc1ce4c	Fix windows build.	2019-04-01 22:38:01 +03:00
Stefan Weil	81fbd878dd	Add more missing include statements for Windows build Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-01 08:10:25 +02:00
Stefan Weil	ab009fae94	Remove macro WINDLLNAME It is now no longer used. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 20:05:41 +02:00
Stefan Weil	77a5f2623e	Remove unused config variable tessedit_module_name It was only defined for Windows builds. Use also false instead of 0 to set the default value of two boolean config variables. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 20:04:00 +02:00
Stefan Weil	c150b9832d	Add missing include statements for Windows build The last commits which removed BOOL8 had broken the Windows build. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 19:02:29 +02:00
Stefan Weil	802f42e821	Remove BOOL8, TRUE, FALSE from host.h Remove unneeded include statements for host.h, add required ones and update the comments for the remaining include statements. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 18:27:20 +02:00
Stefan Weil	be96b7b660	bits16: Format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 18:26:50 +02:00
Stefan Weil	146079f31d	api: Replace BOOL8, TRUE, FALSE by bool, true, false and modernize code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 18:15:53 +02:00
Stefan Weil	4e0c726d6c	ccutil: replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:56:47 +02:00
Stefan Weil	da0c14ae45	cutil: Replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:56:19 +02:00
Stefan Weil	87a973652c	classify: Replace BOOL8, TRUE, FALSE by bool, true, false Simplify also some related code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:55:48 +02:00
Stefan Weil	30ee3afc29	textord: Replace TRUE, FALSE by true, false and use bool instead of BOOL8 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:55:20 +02:00
Stefan Weil	b391ab84d0	wordrec: Replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:54:21 +02:00
Stefan Weil	cbb5e729a1	classify: Use bool and replace TRUE, FALSE Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:53:50 +02:00
Stefan Weil	46fa59aadc	ccstruct: Replace BOOL8, TRUE, FALSE by bool, true, false and modernize code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:53:06 +02:00
Stefan Weil	92b9f9f8de	ccmain: Replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:52:09 +02:00
Stefan Weil	7db25e15c0	Remove unused config variable tessedit_single_match Replace also TRUE, FALSE by true, false. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:38:35 +02:00
Stefan Weil	ca2947a2c0	blobclass: Remove unused macros Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:36:46 +02:00
Stefan Weil	f2bd98e656	PageIterator: Remove useless const Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:35:43 +02:00
Stefan Weil	813b7803e0	pgedit: Replace BOOL8 by bool Replace also TRUE, FALSE by true, false and add some static attributes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:29:15 +02:00
Stefan Weil	664811a869	Replace BOOL8, TRUE, FALSE by bool, true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:28:28 +02:00
Stefan Weil	51a2c2eae8	Format code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:24:02 +02:00
Stefan Weil	95ea778745	capi: Replace FALSE, TRUE and simplify and format code Format code using clang-format and clang-tidy. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:19:04 +02:00
Stefan Weil	89ba48b106	strngs: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:13:38 +02:00
Stefan Weil	127d0e31f0	serialis: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:12:11 +02:00
Stefan Weil	8b663e7620	helpers: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:06:19 +02:00
zdenop	3bb8f9cd49	Merge branch 'master' of https://github.com/tesseract-ocr/tesseract	2019-03-31 16:54:15 +02:00
zdenop	5f06402755	python: optimize imports, reformat code	2019-03-31 16:53:39 +02:00
zdenop	2e9fd69c9e	use 'import pathlib'; fix "TypeError: argument of type 'WindowsPath' is not iterable"	2019-03-31 16:53:33 +02:00
zdenop	a0527b41bd	fix LGTM reports for python	2019-03-31 16:53:25 +02:00
Stefan Weil	1948f0d520	ocrclass: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:39:44 +02:00
Stefan Weil	85957e9673	WERD: Don't print space character after "FALSE" at end of line Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:32:42 +02:00
Stefan Weil	83d4433d3b	Modernize and format unichar.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:30:15 +02:00
Stefan Weil	ac0b191f6b	Modernize and format genericvector.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:21:32 +02:00
Stefan Weil	36ed08636b	Modernize and format tesscallback.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:16:00 +02:00
zdenop	f47c7c92dd	fix uninitialized variables in wordstrboxrenderer and lstmboxrenderer; CID 1399132, 1399134, 1399135, 1399137, 1399140, 1399141, 1399142	2019-03-31 12:26:49 +02:00
Shreeshrii	ea36e94e58	fix Could not parse bool from flag (#2359 )	2019-03-29 14:50:21 +01:00
Stefan Weil	852598eecf	Remove file tessedit.h It only declared the unused global variable global_monitor which is now removed, too. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-27 19:03:42 +01:00
Stefan Weil	6e59abcce2	Remove file cutil.h It only contained three type definitions which fit better in other include files. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-27 19:03:42 +01:00
Stefan Weil	b6bfb20f1d	Improve readability of conditional code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 12:35:56 +01:00
Stefan Weil	36a1a30c22	Remove some old type casts Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 12:35:56 +01:00
Stefan Weil	a44bf41f14	Modernize C++ loops The modifications were done using this command: run-clang-tidy-8.py -header-filter='.' -checks='-,modernize-loop-convert' -fix Then the resulting code was cleaned manually. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 08:38:21 +01:00
Stefan Weil	ed011670c8	Modernize C++ code using bool literals The modifications were done using this command: run-clang-tidy-8.py -header-filter='.' -checks='-,modernize-use-bool-literals' -fix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 07:58:02 +01:00
Stefan Weil	a0fd90583b	Modernize C++ code using auto The modifications were done using this command: run-clang-tidy-8.py -header-filter='.' -checks='-,modernize-use-auto' -fix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 07:55:08 +01:00
Stefan Weil	36f768853a	Modernize C++ code using override The modifications were done using this command: run-clang-tidy-8.py -header-filter='.' -checks='-,modernize-use-override' -fix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 07:37:52 +01:00
Stefan Weil	f877640bc9	Merge pull request #2319 from bertsky/tesstrain-parallel-wait-retval tesstrain: check failure of subjobs	2019-03-25 16:10:09 +01:00
Stefan Weil	d8d2f6f48a	Fix broken shell scripts for training Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-25 15:32:43 +01:00
Stefan Weil	631882a346	Fix compiler warnings (signed / unsigned mismatch) clang warnings: src/ccutil/unicharcompress.cpp:172:27: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] src/lstm/recodebeam.cpp:129:29: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare] src/lstm/recodebeam.cpp:276:48: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare] unittest/imagedata_test.cc:101:21: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] unittest/linlsq_test.cc:33:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] unittest/linlsq_test.cc:44:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] unittest/nthitem_test.cc:27:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare] unittest/nthitem_test.cc:68:21: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare] unittest/stats_test.cc:26:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-25 08:36:07 +01:00
Stefan Weil	ecaad2aca8	ccstruct/werd: Format code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-25 07:57:34 +01:00
Stefan Weil	b1e305f38c	Simplify code which tests for non-empty StringParam Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:35:52 +01:00
Stefan Weil	f9860cda41	Optimize functions ResetFrom The loop can terminate as soon as the parameter name was found. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:21:23 +01:00
Stefan Weil	41da5afe9d	UNICHARSET: Fix compiler warning (signed/unsigned mismatch) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:18:21 +01:00
Stefan Weil	91e2b253c0	Format modified code with clang-format Format the files which were changed in commit `297d7d86ce`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:10:29 +01:00
Stefan Weil	06acbaf99c	IntegerMatcher: Fix division by zero Credit to OSS-Fuzz which reported this issue: intmatcher.cpp:1231:62: runtime error: division by zero #0 0x6119d5 in IntegerMatcher::ApplyCNCorrection(float, int, int, int) tesseract/src/classify/intmatcher.cpp:1231:62 #1 0x5fe9c4 in tesseract::Classify::ComputeCorrectedRating(bool, int, double, double, int, int, int, int, int, unsigned char const) tesseract/src/classify/adaptmatch.cpp:1213:29 #2 0x5fdc22 in tesseract::Classify::ExpandShapesAndApplyCorrections(ADAPT_CLASS_STRUCT, bool, int, int, int, float, int, int, unsigned char const, tesseract::UnicharRating, ADAPT_RESULTS) tesseract/src/classify/adaptmatch.cpp:1184:13 #3 0x5fe421 in tesseract::Classify::MasterMatcher(INT_TEMPLATES_STRUCT, short, INT_FEATURE_STRUCT const, unsigned char const, ADAPT_CLASS_STRUCT, int, int, TBOX const&, GenericVector<CP_RESULT_STRUCT> const&, ADAPT_RESULTS) tesseract/src/classify/adaptmatch.cpp:1119:5 #4 0x6003eb in tesseract::Classify::CharNormTrainingSample(bool, int, tesseract::TrainingSample const&, GenericVector<tesseract::UnicharRating>*) tesseract/src/classify/adaptmatch.cpp:1374:5 See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13712. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 19:39:31 +01:00
Stefan Weil	58423d2f6c	Merge pull request #2328 from bertsky/lstm-with-user-patterns2 Add user words / patterns again	2019-03-24 19:38:40 +01:00
zdenop	0d36d9a9d7	Merge pull request #2341 from Shreeshrii/fix Fix	2019-03-24 18:21:09 +01:00
Stefan Weil	da6305b632	Fix compiler warnings caused by ASSERT_HOST The modified definition avoids warnings caused by redundant semicolons. Now a semicolon is required when using the macro, so a few code locations had to be updated. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 17:47:04 +01:00
Stefan Weil	44a6d9f4d4	intmatcher: Catch more out of bounds reads Credit to OSS-Fuzz which reported this issue: intmatcher.cpp:1121:17: runtime error: index 24 out of bounds for type 'uint8_t [24]' #0 0x61034b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT, unsigned int, short) tesseract/src/classify/intmatcher.cpp:1121:17 #1 0x60f560 in IntegerMatcher::Match(INT_CLASS_STRUCT, unsigned int, unsigned int, short, INT_FEATURE_STRUCT const, tesseract::UnicharRating, int, int, bool) tesseract/src/classify/intmatcher.cpp:514:11 #2 0x5f3a25 in tesseract::Classify::AdaptToChar(TBLOB, int, int, float, ADAPT_TEMPLATES_STRUCT) tesseract/src/classify/adaptmatch.cpp:894:9 #3 0x5f2ccd in tesseract::Classify::LearnPieces(char const, int, int, float, tesseract::CharSegmentationType, char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:430:5 #4 0x5f16ee in tesseract::Classify::LearnWord(char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:293:7 This catches the out of bounds data reads in release builds. Add also assertions for debug builds. See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13818. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 17:27:43 +01:00
Stefan Weil	5fd7228414	intmatcher: Catch out of bounds reads Credit to OSS-Fuzz which reported this issue: intmatcher.cpp:1163:17: runtime error: index 24 out of bounds for type 'uint8_t [24]' #0 0x610d3b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT, unsigned int) tesseract/src/classify/intmatcher.cpp:1163:17 #1 0x60ff4e in IntegerMatcher::Match(INT_CLASS_STRUCT, unsigned int, unsigned int, short, INT_FEATURE_STRUCT const, tesseract::UnicharRating, int, int, bool) tesseract/src/classify/intmatcher.cpp:563:11 #2 0x5f4355 in tesseract::Classify::AdaptToChar(TBLOB, int, int, float, ADAPT_TEMPLATES_STRUCT) tesseract/src/classify/adaptmatch.cpp:894:9 #3 0x5f35fd in tesseract::Classify::LearnPieces(char const, int, int, float, tesseract::CharSegmentationType, char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:430:5 #4 0x5f201e in tesseract::Classify::LearnWord(char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:293:7 This catches the out of bounds data reads, but does not fix the primary reason: ProtoLengths currently gets values which are larger than the allowed index. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 15:44:33 +01:00
Stefan Weil	509ee95023	IntegerMatcher: Fix data type of loop counters ClassTemplate->ProtoLengths[n] is of type uint8_t, so use that for the related loop counters, too. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 15:35:06 +01:00
Stefan Weil	f4f34a87db	WERD_RES: Fix uninitialized member variable Credit to OSS-Fuzz which reported this issue: pageres.cpp:1143:7: runtime error: load of value 249, which is not a valid value for type 'bool' #0 0x6ba560 in WERD_RES::Clear() tesseract/src/ccstruct/pageres.cpp:1143:7 #1 0x6b9fd1 in WERD_RES::operator=(WERD_RES const&) tesseract/src/ccstruct/pageres.cpp:193:3 #2 0x49a9ad in WERD_RES::WERD_RES(WERD_RES const&) tesseract/src/ccstruct/pageres.h:356:11 See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13707. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 14:59:08 +01:00
Stefan Weil	afc099b9f4	intmatcher: Split data_table The old code was a hack to improve the performance. The new code is clearer and results in the same binary when compiling with gcc 8.3.0, so it looks like the old hack is no longer needed with modern compilers. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 08:15:40 +01:00
Shreeshrii	8749f3553e	LINEDATA=false	2019-03-23 19:16:49 +05:30
Shree	bcb7cf9846	sort arguments, use true/false instead of 1/0	2019-03-23 12:28:53 +00:00
Shree	c2db272134	Modify distort_image for Boolean	2019-03-22 17:02:46 +00:00
Shree	259d5af6b1	Add PSM values to the definition	2019-03-22 15:29:02 +00:00
Shree	8eafec0d17	Fix comments with current values of PSM codes	2019-03-22 14:10:49 +00:00
Stefan Weil	e1e56d9d66	Remove local function declarations from intmatcher.h This requires moving the local function HeapSort to the beginning. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-22 11:39:39 +01:00
Stefan Weil	2ba194ca8d	Remove four unused parameters This fixes some compiler warnings: src/classify/intmatcher.cpp:711:63: warning: unused parameter ‘ConfigMask’ [-Wunused-parameter] src/classify/intmatcher.cpp:1007:16: warning: unused parameter ‘ProtoMask’ [-Wunused-parameter] src/classify/intmatcher.cpp:1095:61: warning: unused parameter ‘NumFeatures’ [-Wunused-parameter] src/classify/intmatcher.cpp:1136:59: warning: unused parameter ‘used_features’ [-Wunused-parameter] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-22 11:30:24 +01:00
Stefan Weil	dd79d56e9f	Remove unused parameter BlobLength This fixes two compiler warnings: src/classify/intmatcher.cpp:553:14: warning: unused parameter ‘BlobLength’ [-Wunused-parameter] src/classify/intmatcher.cpp:622:14: warning: unused parameter ‘BlobLength’ [-Wunused-parameter] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-22 11:17:19 +01:00
Shree	9b915d5efb	add --distort_image	2019-03-22 05:39:38 +00:00
Shree	f7ffde99d5	add --distort_image	2019-03-22 05:34:00 +00:00
zdenop	ac7ea4322a	Merge pull request #2335 from Shreeshrii/master Changes to tesstrain.py - max_workers=8, distort_image=false	2019-03-17 15:27:34 +01:00
zdenop	26877ba703	check min. python version; os.uname is not available on windows	2019-03-17 15:25:48 +01:00
Shreeshrii	f8e8521606	Update tesstrain_utils.py	2019-03-17 15:32:35 +05:30
Shree	6fa8e1bb15	Set max_workers=8	2019-03-17 09:58:11 +00:00
Shree	e21499e81e	Set default value for distort_image	2019-03-17 09:54:16 +00:00
Stefan Weil	ee2f9bf7bf	Remove old comments in file headers Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-16 10:55:00 +01:00
Shree	d47b0d588a	Use LATIN_FONTS for kmr	2019-03-15 15:47:56 +00:00
Shree	3eee1d217a	Add kmr and kur_ara, remove kur from training scripts	2019-03-15 15:37:49 +00:00
Robert Schubert	297d7d86ce	trying to add user words/patterns again: - pass in ParamsVectors from Tesseract (carrying values from langdata/config/api) into LSTMRecognizer::Load and LoadDictionary - after LSTMRecognizer's Dict is initialised (with default values), reset the variables user_{words,patterns}_{suffix,file} from the corresponding entries in the passed vector	2019-03-15 16:06:19 +01:00
Shree	b2ebf0195f	Add kmr and kur_ara, remove kur from training scripts	2019-03-15 14:39:39 +00:00
Shree	37befdf6c4	Add option for --distort_image	2019-03-15 13:32:36 +00:00
zdenop	0a36b38169	Merge pull request #2317 from eighttails/master Added missing linker flags for MinGW.	2019-03-15 08:01:21 +01:00
Robert Schubert	14346e56b0	tesstrain: catch+handle SIGINT (to stop waiting on subjobs)	2019-03-15 00:03:16 +01:00
Robert Schubert	6cbad17e30	tesstrain: check all subjobs' retval	2019-03-14 14:38:51 +01:00
Robert Schubert	5316bcbb94	tesstrain: check failure of subjobs	2019-03-14 11:42:01 +01:00
Stefan Weil	4c2bbebecc	Fix compiler warning (-Wunused-value) Warning from clang++: ..\src\ccmain\ltrresultiterator.cpp(454,8): warning: expression result unused [-Wunused-value] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-13 20:56:03 +01:00
Stefan Weil	ed84ba0a44	Fix wrong comparison symbol_steps is a vector, so testing for a nullptr was wrong. clang++ reports: ..\src\ccmain\ltrresultiterator.cpp(440,19): warning: comparison of address of 'this->word_res_->symbol_steps' equal to a null pointer is always false [-Wtautological-pointer-compare] if (&word_res_->symbol_steps == nullptr || !LSTM_mode_) return nullptr; ~~~~~~~~~~~^~~~~~~~~~~~ ~~~~~~~ Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-13 20:38:38 +01:00
Tadahito Yao	bbbd262a8d	Added missing linker flags for MinGW.	2019-03-13 22:10:36 +09:00
jm server2	1206362d30	`accumulated_timesteps` is not a pointer but a vector, and if we use ChoiceIterator without `lstm_choice_mode`, tesseract crashes (or similar) because the check is true and we reference a nonexistent item	2019-03-13 12:55:14 +01:00
Stefan Weil	3baf0d8076	Fix boolean assignments Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 15:34:24 +01:00
Stefan Weil	8ad0489f0f	Remove svpaint.cpp from libtesseract svpaint is a standalone application (it includes a main function) and should not be part of the Tesseract library. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 12:22:53 +01:00
zdenop	7546a01020	Merge pull request #2310 from noahmetzger/LSTMChoiceRIL Lstm choice ril	2019-03-12 10:46:11 +01:00
Stefan Weil	35a999f91a	Fix assertion caused by wrong unicharset Credit to OSS-Fuzz: it found another case which triggered this assertion: contains_unichar_id(unichar_id):Error:Assert failed:in file ../../src/ccutil/unicharset.h, line 502 This is the OSS-Fuzz testcase: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13662 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 09:31:21 +01:00
Stefan Weil	56a39bda77	Fix float division by zero That runtime error is normally not visible because it does not abort the program, but is detected when the code was compiled with sanitizers. It can be triggered with this OSS-Fuzz testcase: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13662 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 09:28:16 +01:00
Noah Metzger	5b3e2fe812	Integrated accumulated Symbol Choice in the Choice Iterator and made the api lstm_choice_mode independent Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-03-12 09:15:10 +01:00
Stefan Weil	4c0b98bd12	Replace undefined shift operations by multiplications Shift operations are undefined for negative numbers, but at least on Intel they return the same value as a multiplication with 2 ^ shift value. This fixes runtime errors reported by sanitizers and OSS-Fuzz: intmatcher.cpp:821:59: runtime error: left shift of negative value -14 intmatcher.cpp:823:75: runtime error: left shift of negative value -512 intmatcher.cpp:820:50: runtime error: left shift of negative value -80 See issue #2297 and https://oss-fuzz.com/testcase-detail/4845195990925312 for details. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 06:56:54 +01:00
Stefan Weil	896698a4f5	Fix runtime error (left shift of negative value) Runtime error: src/training/util.h:37:28: runtime error: left shift of negative value -17 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 06:56:54 +01:00
Stefan Weil	5202208a8c	Remove globals.h It only included other files which are already included where needed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-11 19:01:23 +01:00
Noah Metzger	bc2b919805	Integrated Timesteps per symbol into ChoiceIterator Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-03-11 10:50:56 +01:00
Noah Metzger	754e38d2b4	Added the option to get the timesteps separated by the suggested segmentation Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-03-11 10:50:56 +01:00
zdenop	e817607280	archive_version_details is available from libArchive version 3.2.0	2019-03-10 22:57:48 +01:00
zdenop	5cfe4cc1f0	Merge pull request #2286 from Shreeshrii/lstmbox Rename function to TessBaseAPIGetTsvText to be consistent to Create method	2019-03-10 21:41:52 +01:00
zdenop	02a1ffe87a	Report libArchive support	2019-03-10 20:08:45 +01:00
Stefan Weil	b3aff7d633	Fix Index-out-of-bounds in IntegerMatcher::UpdateTablesForFeature This fixes issue #2299, an issue which was already reported by static code analyzers and now by OSS-Fuzz, see details at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13597. The Tesseract code assigns an address which is out-of-bounds to a pointer variable, but increments that variable later. So this is a false positive. Change the code nevertheless to satisfy OSS-Fuzz. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-10 18:26:40 +01:00
Stefan Weil	91d0a71d51	Fix assertion caused by wrong unicharset (issue #2301 ) Credit to OSS-Fuzz: This fixes an issue which was reported by OSS-Fuzz, see details at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13592. OSS-Fuzz triggered this assertion: contains_unichar_id(unichar_id):Error:Assert failed:in file ../../src/ccutil/unicharset.h, line 502 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-10 16:42:54 +01:00
Stefan Weil	71d4990c6d	Fix Heap-buffer-overflow in GenericVector<int>::size (issue #2298 ) Credit to OSS-Fuzz: This fixes a security issue which was reported by OSS-Fuzz, see details at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13590. Add also some assertions to catch similar bugs. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-10 16:12:30 +01:00
Robert Schubert	3912cb1c33	LSTM char_whitelist/blacklist (`6ac2ff0`): more robust - unicharset can be null too	2019-03-09 10:40:40 +01:00
Robert Schubert	b45999088c	LSTM char_whitelist/blacklist (`6ac2ff0`): multi-code chars - move decision from ComputeTopN to ContinueContext, where it belongs: block context continuations which emit final codes translating to disabled unichar_ids. (The normal logic for fallback from top2 > top2 > rest will apply.) - pass UNICHARSET refs appropriately	2019-03-08 12:30:16 +01:00
Robert Schubert	8012d5e653	LSTM char_whitelist/blacklist (`6ac2ff0`): also sublangs	2019-03-07 18:32:50 +01:00

1215 Commits