tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-14 00:31:47 +08:00

Author	SHA1	Message	Date
Stefan Weil	286d8275c7	Add support for image or image list by URL This allows OCR of images from the internet without downloading them first: tesseract http://IMAGE_URL OUTPUT ... It uses libcurl. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-01 12:10:45 +02:00
Stefan Weil	47d70d7014	Modernize code for LIST (fix some -Wold-style-cast warnings) - Use C++ type casts - Remove unneeded type cast - Simplify code for function pop - Remove macro push_on (it was only used once) This fixes lots of compiler warnings caused by old type casts.	2019-10-01 11:12:00 +02:00
Stefan Weil	672d67859f	mfoutline: Modernize code - Use C++ enums - Use strongly typed C++11 enum for DIRECTION and optimize struct MFEDGEPT - Use float constant for MF_SCALE_FACTOR - Replace macros by inline functions - Fix documentation comment This fixes several warnings from clang. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-30 21:33:15 +02:00
Stefan Weil	7ec5f0ca02	intmatcher: Avoid conversion from double to float and vice versa This fixes some clang warnings: src/classify/intmatcher.cpp:48:49: warning: implicit conversion loses floating-point precision: 'double' to 'const float' [-Wimplicit-float-conversion] src/classify/intmatcher.cpp:405:34: warning: implicit conversion loses floating-point precision: 'double' to 'float' [-Wimplicit-float-conversion] src/classify/intmatcher.cpp:405:64: warning: implicit conversion increases floating-point precision: 'float' to 'double' [-Wdouble-promotion] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-30 18:05:26 +02:00
Stefan Weil	6d259ebe44	Remove unneeded compare statement (-Wtautological-unsigned-enum-zero-compare) This fixes a clang warning: src/ccstruct/polyblk.cpp:412:12: warning: result of comparison of unsigned enum expression >= 0 is always true [-Wtautological-unsigned-enum-zero-compare] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-29 22:13:27 +02:00
Stefan Weil	49e351508c	Re-add strngs.h to public API It is still needed. This partially reverts commit `a730b5c4ff`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-28 10:34:48 +02:00
Stefan Weil	8ad86d6494	Add missing linker flags for TensorFlow Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-28 09:42:37 +02:00
zdenop	d6aa866430	ignore #pragma optimize for clang-cl	2019-09-27 21:19:37 +02:00
Stefan Weil	74d5ce82a6	Remove vecfuncs.cpp and vecfunc.h Replace the macros which were declared in vecfuncs.h by member functions and move a function which was only used in chop.cpp to that file. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-25 21:20:03 +02:00
Stefan Weil	7bddad59d1	Optimize class ChoiceIterator Re-order a class variable to avoid memory holes and remove unused class variables. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-25 09:43:57 +02:00
Noah Metzger	ff4c1d204d	Fixed minor bug with the Choice iterator when lstm_choice_mode is not active. Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-09-24 15:38:28 +02:00
Stefan Weil	994ec697d8	Remove member functions STRING::string and StringParam::string They were redundant because there exist member functions 'c_str' which do the same. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-23 08:33:08 +02:00
Egor Pugin	1fa7324cf7	Merge pull request #2668 from stweil/api Remove STRING from the public Tesseract API	2019-09-23 01:02:26 +03:00
amitdo	0598879a00	Disable legacy build: Disable bitvec.h	2019-09-22 20:37:13 +02:00
Stefan Weil	a730b5c4ff	Remove STRING from the public Tesseract API Removing STRING from genericvector.h allows eliminating the proprietary STRING data type from the public Tesseract API. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-22 20:32:28 +02:00
Stefan Weil	8cb677d6a2	Replace STRING arguments for LoadDataFromFile and SaveDataToFile This is a step to eliminate the proprietary STRING data type from the public Tesseract API. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-22 20:32:28 +02:00
amitdo	1e13d1d4d5	Disable legacy build: Disable more unneeded code	2019-09-22 20:55:24 +03:00
zdenop	39a63c2837	Merge pull request #2663 from bertsky/fix-lstm-user-patterns fix langdata (user words/patterns) file suffixes for LSTMs:	2019-09-20 15:32:54 +02:00
Stefan Weil	0c7cc5a4dd	Fix CID 1405673 part 2 (Uninitialized members) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-19 19:37:05 +02:00
Robert Schubert	5b976bfb55	fix langdata (user words/patterns) file suffixes for LSTMs: - add another constructor for LSTMRecognizer which takes the language_data_path_prefix configured/selected at runtime and passes it to the internal CCUtil - use this in Tesseract::init_tesseract_lang_data when LSTMs are available (this was missing from `297d7d86ce`)	2019-09-19 19:30:54 +02:00
amitdo	479a7b1ca0	Disabled legacy build: Disable more unneeded code	2019-09-19 19:00:13 +03:00
Stefan Weil	3b030b4aeb	Fix CID 1405673 (Uninitialized members) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-17 22:04:08 +02:00
Stefan Weil	85e8529a2e	Fix CID 1164624 (Uninitialized members) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-17 21:59:42 +02:00
Stefan Weil	b2999d8190	Fix comment for Textord::make_prop_words Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 15:03:45 +02:00
Stefan Weil	256701e2e0	Re-order initialisation in constructor of class ViterbiStateEntry This fixes compiler warnings caused by commit `091ce345f6`: src/wordrec/lm_state.h:100:7: warning: field 'cost' will be initialized after field 'curr_b' [-Wreorder] src/wordrec/lm_state.h:104:7: warning: field 'top_choice_flags' will be initialized after field 'dawg_info' [-Wreorder] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 14:33:32 +02:00
Stefan Weil	081521fb9f	Move initial values for class ColPartition from constructor to header file This fixes compiler warnings caused by commit `5b4565b80b`: src/textord/colpartition.cpp:91:24: warning: field 'last_column_' will be initialized after field 'column_set_' [-Wreorder] src/textord/colpartition.cpp:93:37: warning: field 'inside_table_column_' will be initialized after field 'nearest_neighbor_above_' [-Wreorder] src/textord/colpartition.cpp:95:58: warning: field 'space_to_right_' will be initialized after field 'owns_blobs_' [-Wreorder] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 14:33:32 +02:00
Stefan Weil	8f66020821	Re-order initialisation in constructors of classes Dawg and DawgPosition This fixes compiler warnings caused by commit `ecf0f2dee5`: src/dict/dawg.h:202:9: warning: field 'type_' will be initialized after field 'lang_' [-Wreorder] src/dict/dawg.h:355:9: warning: field 'dawg_index' will be initialized after field 'dawg_ref' [-Wreorder] src/dict/dawg.h:356:9: warning: field 'punc_index' will be initialized after field 'punc_ref' [-Wreorder] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 14:31:32 +02:00
Stefan Weil	b466cead8e	Add more initial values for class Classify from constructor to header file This fixes compiler warnings caused by commit `751fcd2b11`: src/classify/classify.cpp:176:7: warning: field 'EnableLearning' will be initialized after field 'il1_adaption_test' [-Wreorder] src/classify/classify.cpp:187:7: warning: field 'dict_' will be initialized after field 'static_classifier_' [-Wreorder] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 14:31:32 +02:00
Stefan Weil	91b3248af3	Fix CID 1164666 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 22:01:25 +02:00
Stefan Weil	fc6899d898	Fix CID 1164664 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 21:52:51 +02:00
Stefan Weil	930e11996c	Fix CID 1375402 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 21:17:12 +02:00
Stefan Weil	408d6e8b72	simd: Check OSXSAVE bit before calling _xgetbv Both checks are needed for AVX, AVX2 and FMA checks. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 19:35:37 +02:00
Stefan Weil	627faa6f9c	Remove UnicharAmbigs for builds without legacy code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 19:11:30 +02:00
amitdo	2134cd7867	Disabled legacy engine build: Disable code related to ambigs.	2019-09-15 19:11:30 +02:00
Stefan Weil	0c960c3cc5	Fix 1164647 (Uninitialized members) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 14:25:48 +02:00
amitdo	994596842e	'Disabled leagcy engine' build: don't include unused header	2019-09-15 12:35:36 +03:00
Egor Pugin	6a9584fbc2	Merge pull request #2650 from stweil/cid Fix several issues reported by Coverity Scan	2019-09-14 21:18:37 +03:00
Stefan Weil	763f4781e8	Fix CID 1164662 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 19:22:56 +02:00
Stefan Weil	6fd58d2897	Fix CID 1164659 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 19:20:14 +02:00
Stefan Weil	c3500e8d95	Fix CID 1164657 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 19:11:02 +02:00
Stefan Weil	1d3ee3b2a7	Fix CID 1164649 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:37:00 +02:00
Stefan Weil	bd1083904d	Fix CID 1164648 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:32:29 +02:00
Stefan Weil	80f367c6f4	Fix CID 1164644 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:26:49 +02:00
Stefan Weil	7caded8e6b	Fix CID 1164643 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:24:26 +02:00
Stefan Weil	3127242bcd	Fix CID 1164638 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:18:15 +02:00
Stefan Weil	06de3075e0	Fix CID 1164636 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:13:06 +02:00
Stefan Weil	052f9ca0bc	Fix CID 1164634, CID 1164635 (Uninitialized pointer field) Remove the unused dummy member variables. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:12:39 +02:00
Stefan Weil	97dda3d535	Fix CID 1386099 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	46f21a4182	Fix CID 1164633 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	9ea579bf1b	Fix CID 1164628 ff (Uninitialized pointer field) and optimize class ParamContent Only one of bIt, dIt, iIt and sIt is used, so put all four in a union. This fixes CID 1164628, CID 1164629, CID 1164630 and CID 1164631. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	74b552fc31	Remove unused FeatureEnabled from FEATURE_DEFS_STRUCT Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	9f709404f9	Fix CID 1164622 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	5b1f0dbd4b	Fix CID 1164620 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	951f442303	Fix CID 1386105 (Logically dead code) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	64fc205e78	Fix CID 1402767 (Invalid type in argument to printf format specifier) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	f62a895f74	Remove unused italic, bold in class BLOCK_RES and class WORD_RES Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 11:53:58 +02:00
Stefan Weil	ceb8af889e	Fix CID 1340276 (Uninitialized scalar field) for class BLOB_CHOICE xgap_before_ and xgap_after_ are never used, so remove them. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 22:15:47 +02:00
Stefan Weil	5fdd32bea8	Fix CID 1366450 (Uninitialized scalar field) for class RecodeBeamSearch secondary_beam_size_ is set but never used, so remove it. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 22:09:03 +02:00
Stefan Weil	737173a84d	Fix CID 1375401 (Uninitialized scalar field) for class Dawg Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 22:03:10 +02:00
Stefan Weil	edba74d64f	Fix CID 1400760 (Uninitialized scalar field) for class BLOCK Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 21:58:05 +02:00
Stefan Weil	8ff321e41a	Fix two issues reported by Coverity Scan and modernize class WERD_RES Report from Coverity Scan: CID 1405560 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR) 2. uninit_member: Non-static class member end is not initialized in this constructor nor in any functions that it calls. CID 1405561 [...] Modernize and optimize class WERD_RES. This not only fixes the issues but also reduces the size and eliminates the functions InitNonPointers and InitPointers. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 21:51:36 +02:00
Stefan Weil	ecf0f2dee5	Optimize classes Trie, Dawg and DawgPosition Reduce size from 368 to 352 bytes for Trie, 72 to 64 bytes for Dawg and 40 to 24 bytes for DawgPosition by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 08:15:01 +02:00
Stefan Weil	efd8dea587	Optimize classes CLIST_ITERATOR, ELIST_ITERATOR, ELIST2_ITERATOR Reduce size from 56 to 48 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 22:03:03 +02:00
Stefan Weil	751fcd2b11	Optimize class Classify Reduce size from 138016 to 13000 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 21:46:55 +02:00
Stefan Weil	0ad08a99b0	Optimize class TFile Reduce size from 24 to 16 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 20:17:05 +02:00
Stefan Weil	5b4565b80b	Optimize class ColPartition Reduce size from 248 to 224 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 20:04:27 +02:00
Stefan Weil	5a12273650	Optimize struct LMConsistencyInfo Reduce size from 104 to 96 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 20:04:27 +02:00
Stefan Weil	091ce345f6	Optimize class ViterbiStateEntry Reduce size from 232 to 216 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 20:04:27 +02:00
Stefan Weil	913cbe6eae	Modernize and optimize BLOBNBOX and remove BLOBNBOX::ConstructionInit The class no longer uses bit fields. Re-ordering the member variables avoids holes and reduces the size of BLOBNBOX from 168 to 152 bytes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 09:07:48 +02:00
Stefan Weil	a922745d9a	tfnetwork: Fix info text Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-11 19:10:25 +02:00
Stefan Weil	5fa09f184f	RecodedCharIDHash: Fix runtime errors detected by UndefinedBehaviorSanitizer Fix this runtime error in recodebeam_test and unicharcompress_test: src/ccutil/unicharcompress.h:84:27: runtime error: left shift of 267 by 28 places cannot be represented in type 'int' code has up to kMaxCodeLen (9) values, so the highest possible value for i is 8, and the shift value can reach 7 * 8 = 56. That requires an uint64_t data type. size_t would fit for 64 bit hosts, but be too small for 32 bit hosts. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-10 15:56:32 +02:00
Stefan Weil	4a2d5a2e8d	OSResults: Fix runtime errors detected by UndefinedBehaviorSanitizer Fix this runtime error in osd_test and textlineprojection_test: src/ccmain/osdetect.cpp:109:14: runtime error: division by zero Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-10 15:56:32 +02:00
Stefan Weil	5c6fade555	BitVector: Fix runtime errors detected by UndefinedBehaviorSanitizer Fix these runtime errors in mastertrainer_test: src/ccutil/bitvector.cpp:119:18: runtime error: null pointer passed as argument 2, which is declared to never be null src/ccutil/bitvector.cpp:124:10: runtime error: null pointer passed as argument 1, which is declared to never be null Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-10 15:56:32 +02:00
zdenop	98c7aaa343	Lstm choice ril (#2635) Lstm choice ril	2019-09-06 19:12:00 +02:00
Stefan Weil	9f32032517	ccutil: Remove old comments There is no CLIST2 in the current code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-05 17:52:42 +02:00
Stefan Weil	b6933a1082	Use type bool for boolean values in class BLOBNBOX Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-03 19:56:59 +02:00
Noah Metzger	c350077b96	Made the lstm_choice mode compatible with the hocr_char_boxes mode Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-09-02 11:09:54 +02:00
Noah Metzger	e8b9c10d07	Clean up lstm_choice_mode and cut it down to 2 modes instead of 4 Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-09-02 11:09:53 +02:00
Stefan Weil	fdf4067296	Fix warnings from LGTM This fixes three LGTM warnings: Multiplication result may overflow 'float' before it is converted to 'double'. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-30 22:04:24 +02:00
Stefan Weil	dc90741f1b	Fix crash when function lookup tables are accessed with NaN Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-30 13:42:09 +02:00
Stefan Weil	7968f50fe6	capi: Add missing PSM_RAW_LINE to TessPageSegMode Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-25 09:08:09 +02:00
zdenop	0ded672067	fix typo	2019-08-18 18:47:32 +02:00
Stefan Weil	00cff79f7f	simd: Check whether the OS supports FMA, AVX, ... Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-16 22:51:17 +02:00
Stefan Weil	43b2e9513b	lstmtrainer: Fix diagnostic message Signed character values must be converted to unsigned integers for %x. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-15 14:31:32 +02:00
Stefan Weil	100d8cd29b	lstmtester: Add missing space in log messages Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-14 14:12:47 +02:00
Stefan Weil	a86251c62b	classify/Makefile: Fix inconsistent style Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-13 21:35:59 +02:00
Egor Pugin	423a188513	Export some classify vars.	2019-08-13 20:12:21 +03:00
Stefan Weil	46e2a0f106	Remove more code for builds with disabled legacy engine Now the Tesseract library no longer includes unused code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-13 17:49:10 +02:00
Egor Pugin	73f713519c	Merge pull request #2614 from stweil/training Move source files which are used for training only to src/training	2019-08-12 19:35:50 +03:00
Stefan Weil	e84cb24def	Move source files which are used for training only to src/training They are moved from src/classify and src/lstm to src/training. This reduces the size of the Tesseract library. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 17:08:08 +02:00
Stefan Weil	ba17bc8204	OpenCL: Add static attribute for kernel_src It is only used in openclwrapper.cpp. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 15:13:45 +02:00
Stefan Weil	970622fbd1	Remove unused functions create_edges_window, draw_raw_edge Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 15:04:10 +02:00
Stefan Weil	23e605911f	Remove unused function truncate_path and related files Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 14:48:56 +02:00
Stefan Weil	bce585286d	Remove global array kPolyBlockNames from Tesseract library It is only used in unittest/layout_test.cc after moving a test from baseapi_test.cc to that file, so it can be made local. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 14:33:55 +02:00
Stefan Weil	beec85e023	Remove UNICHARSET::load_from_inmemory_file and related code The method was only used in unittest where it can be replaced by UNICHARSET::load_from_file which also simplifies the code. This allows removing the class InMemoryFilePointer and fixes a TODO. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 13:07:15 +02:00
Stefan Weil	315dd9df3f	cmake: Don't link pthread on Windows Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-07 15:24:00 +02:00
Stefan Weil	b8079d8ce1	universalambigs: Add hack to fix builds with Microsoft compiler The MS compiler only accepts string constants up to 65535 characters, so shorten the string for that compiler to fix the compilation. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-06 15:46:07 +02:00
Zdenko Podobný	c5a50b93ce	move fileio.cpp and fileio.h to training (this fix android build)	2019-08-04 21:26:39 +02:00
Stefan Weil	6acab45837	universalambigs: Replace octal characters by UTF-8 string This improves readability and reduces the file size. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-04 19:21:59 +02:00
Stefan Weil	8127b4dd27	Clean ambigs.h * Remove unused kUnigramAmbigsBufferSize and kAmbigNgramSeparator * Move some declarations to ambigs.cpp Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-04 19:21:59 +02:00
Stefan Weil	23ef93ac4d	cmake: Add missing pthread library It is needed for C++ threads since commit `85068be405`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-26 07:45:51 +02:00
Stefan Weil	e6ca7f3ec6	hocrrenderer: Add missing escaping of special characters in HTML output This converts special character like '<' or '>' to the correct HTML entities. Optimize also the code a little bit. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-19 13:53:36 +02:00
Stefan Weil	2679cae5d8	Simplify code by using ClipToRange Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-19 13:37:39 +02:00
Stefan Weil	4b2927ae41	LSTMRecognizer: Add non const get functions This allows removing several const casts. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-18 11:26:51 +02:00
Stefan Weil	4cb3f34c09	Improve formatting of hOCR output with character boxes Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-18 11:07:18 +02:00
Stefan Weil	9195a904a7	Use auto data type for results of std::ftell Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-18 10:56:17 +02:00
Stefan Weil	4132194c49	Remove unused filesize_ from class InputBuffer This also simplifies the constructors. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-18 10:48:27 +02:00
Stefan Weil	a2b13b49ff	Simplify shell code (fixes warning from Codacy) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-17 21:33:24 +02:00
Stefan Weil	d4e0ab3014	Use long instead of off_t for result from ftell Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-17 21:14:42 +02:00
Stefan Weil	467f8f4140	Fix training script for macOS (issue #2578) Bash on macOS does not support "|&": tesstrain_utils.sh: line 80: syntax error near unexpected token `&' Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-17 17:18:44 +02:00
Stefan Weil	f92181561c	Fix some compiler warnings (unused local variables) gcc warnings: src/classify/protos.cpp:85:7: warning: unused variable ‘i’ [-Wunused-variable] src/classify/protos.cpp:86:7: warning: unused variable ‘Bit’ [-Wunused-variable] src/classify/protos.cpp:89:14: warning: unused variable ‘Config’ [-Wunused-variable] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-17 07:47:28 +02:00
Stefan Weil	a419f2d78b	Modernize BIT_VECTOR a little bit This removes one more user of Emalloc / Efree. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-16 22:09:08 +02:00
zdenop	c8374cc528	Merge pull request #2576 from noahmetzger/LSTMChoiceRIL Implemented improved character bounding box algorithm	2019-07-16 12:25:17 +02:00
zdenop	f4925077e8	Merge pull request #2574 from stweil/fix classify: Use fixed size bit vector	2019-07-16 12:22:48 +02:00
zdenop	cb5c78be7d	Merge pull request #2572 from adaptech-cz/wordBoundsOn2ndPass Give word's bounds to callback also during second pass	2019-07-16 12:19:31 +02:00
Noah Metzger	3a5e508934	Implemented improved bounding box algorithm Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-07-16 11:38:50 +02:00
Stefan Weil	028fff6edd	classify: Use fixed size bit vector The vector was already limited to MAX_NUM_PROTOS (512) entries or 64 bytes in the old code. Now it uses that size right from the start which avoids reallocating it later when entries are added. The old code which reallocated the vector to expand it was buggy because the realloc function can return a different pointer, but the code still used the original pointer to reset the new bits. Function ExpandBitVector is now unused and therefore removed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-16 10:18:11 +02:00
Robert Pösel	f99fcd7691	Give word's bounds to callback also during second pass	2019-07-16 09:11:06 +02:00
Stefan Weil	5bbb7f59a6	Remove structures.* It only provided the functions new_cell, free_cell which could be replaced by new, delete. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-16 07:03:52 +02:00
Stefan Weil	3621272051	Remove cutil_class.* It is no longer needed since commit `4523ce9f7d`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-16 07:03:52 +02:00
Stefan Weil	ea462b2c03	Remove unused functions reverse16, reverse32 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 21:50:46 +02:00
Stefan Weil	c8cb925813	Remove non portable sleep by std::this_thread::sleep_for Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 16:00:07 +02:00
Stefan Weil	fcfdb7e56f	Remove unused include statements Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 14:48:31 +02:00
Stefan Weil	ba0c55adc5	svutil: Remove SVSync::StartThread and SVSync::ExitThread Both are unused now. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 14:30:51 +02:00
Stefan Weil	85068be405	lstmtester: Replace SVSync::StartThread by std::thread Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 14:30:51 +02:00
Stefan Weil	43a281893f	scrollview: Replace SVSync::StartThread by std::thread Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 14:30:51 +02:00
Stefan Weil	a6d723bf10	Replace SVSync::StartThread by std::thread and use std::this_thread::yield Using yield instead of a sleep makes running imagedata_test much faster. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 14:30:51 +02:00
Stefan Weil	13bb4623b1	Use std::lock_guard to protect a code block This is simpler than using lock() / unlock() explicitly. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 12:01:28 +02:00
Stefan Weil	93427391c1	Replace SVAutoLock by std::lock_guard Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 12:01:28 +02:00
Stefan Weil	c0b8ee3b82	Replace CCUtilMutex by std::mutex Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 12:01:28 +02:00
Stefan Weil	36026e3c35	Replace SVMutex by std::mutex Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 12:01:28 +02:00
zdenop	56d4fdce00	Merge pull request #2554 from noahmetzger/LSTMChoiceRIL Improved lstm_choice_mode	2019-07-15 11:51:52 +02:00
Noah Metzger	2dd5d0d60a	Fixed a bug when first decode iteration stays empty and added some comments. Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-07-15 10:05:22 +02:00
Stefan Weil	61eab60fe3	arch: Reduce number of include files for dot product functions dotproductavx.h and dotproductsse.h declared only two functions. Move those declarations to dotproduct.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-12 23:18:00 +02:00
Stefan Weil	2d5b166876	Add dot product implementation for Intel FMA (double = tessdata_best) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-12 23:18:00 +02:00
Stefan Weil	9259ed8f26	Optimize tprintf implementation It no longer uses a local buffer, so it needs less memory and no mutex. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-10 20:59:07 +02:00
Stefan Weil	2aebd10fb7	FPRow: Add missing initialisation for scalar (CID 1402754) Modernize the code also a little bit. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-10 17:15:55 +02:00
Stefan Weil	bdc7abf518	Fix format strings for size_t arguments (CID 1402762, 1402767) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-10 16:57:19 +02:00
Noah Metzger	11a4cd298b	Added parameters for the LSTM CTC Choice mode Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-07-10 16:34:41 +02:00
Noah Metzger	f2d685a90f	Added CTC-based Symbolchoices. Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-07-10 16:34:41 +02:00
Stefan Weil	ee04347347	Fix format string for 64 bit integer (CID 1402986) Commit `c1264c189e` was not the right fix. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-10 16:20:50 +02:00
Stefan Weil	890b810a9e	tfnetwork: Add missing return statement (CID 1402992) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-10 08:20:52 +02:00
Egor Pugin	3b6f071ee8	Implement CMake+SW build. Currently only Windows is supported. You could try it as following: mkdir build_sw && cd build_sw && cmake .. -DSW_BUILD=1	2019-07-08 18:50:30 +03:00
Egor Pugin	84ffcc0d38	Merge pull request #2548 from zhuangzhuang/fix_tesstrain_py_error fix tesstrain.py error	2019-07-08 11:25:41 +03:00
zhuangzhuang1988	18c67f4989	fix tesstrain.py error	2019-07-08 14:35:17 +08:00
zhuangzhuang	9eb997fc0b	fix windows stdout messy code (#2546) * fix windows stdout messy code * fix type name error * remoe unnecessary codepoint check.	2019-07-08 09:33:53 +03:00
Stefan Weil	d653bb61f3	genericvector: Remove redundant declarations tesseract::FileReader and tesseract::FileWriter are already declared in serialis.h which is included by genericvector.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-05 18:47:15 +02:00
Dmitry Bely	74145f0686	Fix crash in Tesseract::classify_word_and_language() when tessedit_timing_debug is enabled	2019-07-05 12:36:25 +02:00
zdenop	01535706ec	Merge pull request #2539 from stweil/tesscallback Replace tesscallback.h and related proprietary data types by C++-11 functionals	2019-07-05 10:52:06 +02:00
Stefan Weil	134eb39960	Remove tesscallback.h It is no longer used. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	3bae459823	Use C++-11 code instead of TessCallback for WERD_RES::ConditionalBlobMerge Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	e61c828dcd	Use C++-11 code instead of TessCallback for UNICHARSET::load_via_fgets Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	0ea8ada308	Use C++-11 code instead of TessCallback for WidthCallback Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	1c1eb76c36	Use C++-11 code instead of TessCallback for Dawg::iterate_words Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	3fb15b3891	Use C++-11 code instead of TessCallback for ObjectCache::Get Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	56d8210909	Use C++-11 code instead of TessCallback for TruthCallback Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	c33b05be55	Use C++-11 code instead of TessCallback for PointerVector::compact Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	cc0405298b	Use C++-11 code instead of TessCallback for read, write Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	242e1db7fa	Use C++-11 code instead of TessCallback for function set_compare_callback Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	ffd8101986	Use C++-11 code instead of TessCallback for function set_clear_callback Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	ded24d0367	ccmain: Use C++-11 code instead of TessCallback1 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	eeec9c66d4	training: Use C++-11 code for TestCallback This allows removing more code from tesscallback.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	201ba0dd53	Fix handling of single pages from multipage TIFF files (issue #2537) That case now uses Leptonica to deliver the desired image instead of using an inefficient loop in the Tesseract code. See commit `54fafc4e2e` which used similar code in the past. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 15:56:57 +02:00
Stefan Weil	f1c6564cd7	Revert "fix read wrong tiff page." This reverts commit `75d230a7ac`. That commit introduced new problems (memory leak, potential endless loop) and style issues. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 15:44:07 +02:00
Stefan Weil	fd001c3ab9	Fix linker error with disabled legacy engine (issue #2532) Commit `3871caae86` introduced a build regression when the legacy engine was disabled. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 13:47:38 +02:00
zhuangzhuang1988	75d230a7ac	fix read wrong tiff page.	2019-07-04 12:32:18 +08:00
zhuangzhuang1988	4d4c16bce1	fix start ScrollView.jar failed when lstmtraining	2019-07-03 07:27:50 +02:00
zhuangzhuang1988	99cb088708	close log file handle before move it.	2019-07-01 10:53:12 +08:00
zhuangzhuang1988	a3a361f73d	fix logger file encoding error.	2019-06-28 18:29:52 +08:00
Stefan Weil	5895534b5e	Update enum from unicode/uchar.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-25 10:55:33 +02:00
Stefan Weil	c1264c189e	Fix format string for 64 bit integer This fixes also a warning from gcc. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-23 09:31:09 +02:00
Stefan Weil	dfd35d3e27	baseapi: Remove old code The workaround is no longer needed because _splitpath and _MAX_FNAME were removed in commit `cc0d87c5b8`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-23 09:15:32 +02:00
Stefan Weil	dd261e8d42	Replace code using _splitpath_s (win32) That simplifies the code and removes a dependency on "newer" versions of Windows. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-23 09:15:15 +02:00
Stefan Weil	f522b039a5	Remove outdated comment Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 21:03:19 +02:00
Stefan Weil	ea20bf0373	Remove dummy code from LSTMTrainer::InitTensorFlowNetwork Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 21:01:40 +02:00
Stefan Weil	41f91c96c8	cmake: Build training tools also on Linux and macOS This enables CI tests for the code in src/training on Linux and macOS. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 20:27:56 +02:00
Egor Pugin	ab28a03e93	Merge pull request #2514 from stweil/tessresultcallback Move LSTMTrainer from libtesseract to libtesseract_training	2019-06-22 18:34:49 +03:00
Stefan Weil	df98bb7368	Move LSTMTrainer from libtesseract to libtesseract_training LSTMTrainer is only used for training, so the shared library for Tesseract can be made smaller. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 16:23:51 +02:00
Stefan Weil	cb2957b3d2	Replace callback by direct function calls in TessBaseAPI::GetComponentImages The new code avoids dynamic memory allocation, uses faster function calls and allows removing more code from tesscallback.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 14:54:31 +02:00
Stefan Weil	3159f42257	Remove unused GenericVector::dot_product Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 12:59:21 +02:00
Stefan Weil	bef73d9956	Remove unused GenericVector::compact Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 12:59:08 +02:00
Egor Pugin	3c6a04ea1a	Merge pull request #2512 from stweil/tessresultcallback Simplify class LSTMTrainer	2019-06-22 13:41:21 +03:00
Stefan Weil	2a9b2fb32a	Remove wrong description for GenericVector::set_compare_callback and simplify code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 11:22:07 +02:00
Stefan Weil	bd13069fe8	Simplify class LSTMTrainer The function pointers and callbacks file_reader_, file_writer_, checkpointer_reader_ and checkpoint_writer_ are always set to the same values. Replacing them by direct function calls simplifies the code and allows removing more code from tesscallback.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 09:18:13 +02:00
Stefan Weil	3871caae86	Simplify indirect call of LMPainPoints::GeneratePainPoint It does neither need a temporary TessResultCallback2 nor the function LMPainPoints::GenerateForBlamer. This also allows removing more code from tesscallback.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-21 17:09:33 +02:00
zdenop	60b4c68d31	tesstrain_utils.sh: remove redundant code	2019-06-20 18:42:29 +02:00
Stefan Weil	5f23290655	tesscallback: Remove more unused code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-20 08:38:00 +02:00
Stefan Weil	2c78735d97	ocrfeatures: Remove locally used functions from global interface ReadFeature, WriteFeature are only used locally. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-17 15:09:39 +02:00
zdenop	a3593d994b	Merge pull request #2499 from stweil/embedded Remove code for embedded build	2019-06-17 10:24:45 +02:00
Stefan Weil	674d6a90d8	Remove code for embedded build That code is unrelated to Tesseract and can be easily implemented by external projects which require it. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-17 09:55:33 +02:00
zdenop	60aee9f821	create OUTPUT_DIR did not exist; fixes #2497	2019-06-16 15:07:16 +02:00
zdenop	fad96db497	Merge pull request #2494 from Shreeshrii/master Allow saving of box/tiff pairs during legacy tesseract training	2019-06-14 20:44:49 +02:00
Shree	6fa4587949	Allow saving of box/tiff pairs during base tesseract training	2019-06-14 09:35:39 +00:00
Shree	45cdf741ae	Allow saving of box/tiff pairs during base tesseract training	2019-06-14 09:32:41 +00:00
Shree	832c6edb97	Allow saving of box/tiff pairs during base tesseract training	2019-06-14 09:25:54 +00:00
James R. Barlow	a9890afd12	Fix text2image compilation on C++17 compilers C++17 drops support for `std::random_shuffle`, breaking C++17 compilers that run to compile text2image.cpp. std::shuffle is valid on C++11 through C++17, so use std::shuffle instead. Due to the use of `std::random_shuffle`, `text2image --render_ngrams` would not give consistent results for different compilers or platforms. With the current change, the same random number generator is used for all platforms and initialized to the same seed, so training output should be consistent.	2019-06-13 16:07:20 -07:00
Stefan Weil	fefd521a49	Add dot product implementation using std::inner_product Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-31 12:07:17 +02:00
Stefan Weil	e0c2f0a782	Fix crash in PreloadRenderers with nullptr outputbase The crash could be triggered by a wrong command line. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-29 07:46:29 +02:00
Stefan Weil	9a4bd041c8	Fix build for unittests Commit `29f2cff203` was the wrong fix for the compiler warnings because it broke the unittest build. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 21:36:34 +02:00
Stefan Weil	2c23e7ead5	scanedg: Add const attributes Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 20:27:21 +02:00
Stefan Weil	4b3bbd908a	Remove EXTERN macro Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 20:27:21 +02:00
Stefan Weil	ac999b2409	Remove unused macros This fixes compiler warnings from clang++ like these ones: src/ccutil/params.cpp:34:9: warning: macro is not used [-Wunused-macros] src/cutil/oldlist.cpp:67:9: warning: macro is not used [-Wunused-macros] src/cutil/oldlist.cpp:68:9: warning: macro is not used [-Wunused-macros] src/cutil/oldlist.cpp:78:9: warning: macro is not used [-Wunused-macros] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 20:27:21 +02:00
Stefan Weil	8c8eb21bc5	Fix compiler errors for old gcc Travis CI with gcc 4.8 failed with errors. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 15:38:40 +02:00
Stefan Weil	a86143a41d	Remove some unused functions, constants and variables This fixes compiler warnings, for example: src/ccutil/strngs.cpp:36:11: warning: unused variable 'kMaxDoubleSize' [-Wunused-const-variable] src/viewer/svutil.cpp:320:13: warning: unused function 'TessFreeAddrInfo' [-Wunused-function] src/ccstruct/werd.cpp:32:19: warning: unused variable 'CANT_SCALE_EDGESTEPS' [-Wunused-const-variable] src/textord/bbgrid.cpp:103:10: warning: unused variable 'old_tright' [-Wunused-variable] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 11:52:31 +02:00
Stefan Weil	29f2cff203	training: Add missing static attributes That fixes several warnings from clang++ like the following one: src/training/combine_lang_model.cpp:36:1: warning: no previous extern declaration for non-static variable 'FLAGS_lang_is_rtl' [-Wmissing-variable-declarations] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 11:33:52 +02:00
Stefan Weil	a139d553a7	training: Move declarations from cpp files to h file That fixes several warnings from clang++ like the following one: src/training/commontraining.cpp:95:1: warning: no previous extern declaration for non-static variable 'FLAGS_D' [-Wmissing-variable-declarations] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 08:53:09 +02:00
Stefan Weil	389285010c	featdefs: Add missing include statement It is needed for PicoFeatureLength. This fixes a compiler warning. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 08:53:09 +02:00
Stefan Weil	4bec4a69a0	Add missing static attributes This fixes lots of compiler warnings like these ones: src/api/baseapi.cpp:113:13: warning: no previous extern declaration for non-static variable 'kInputFile' [-Wmissing-variable-declarations] src/api/baseapi.cpp:117:13: warning: no previous extern declaration for non-static variable 'kOldVarsFile' [-Wmissing-variable-declarations] src/api/baseapi.cpp:97:10: warning: no previous extern declaration for non-static variable 'stream_filelist' [-Wmissing-variable-declarations] src/ccmain/equationdetect.cpp:46:10: warning: no previous extern declaration for non-static variable 'equationdetect_save_bi_image' [-Wmissing-variable-declarations] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 08:53:09 +02:00
Stefan Weil	7e7811ff92	bits16: Modernize code This also fixes warnings like the following one from clang++: src/ccmain/pgedit.cpp:114:15: warning: declaration requires a global constructor [-Wglobal-constructors] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 08:53:08 +02:00
Stefan Weil	334d9b4633	unicodes: Optimize code by using constexpr and removing unused globals Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-25 14:51:28 +02:00
Stefan Weil	23d05a5e1b	featdefs: Optimize code by using constexpr This also fixes some warnings from clang++: src/classify/featdefs.cpp:47:15: warning: declaration requires a global constructor [-Wglobal-constructors] src/classify/featdefs.cpp:57:15: warning: declaration requires a global constructor [-Wglobal-constructors] src/classify/featdefs.cpp:66:15: warning: declaration requires a global constructor [-Wglobal-constructors] src/classify/featdefs.cpp:75:15: warning: declaration requires a global constructor [-Wglobal-constructors] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-25 14:46:36 +02:00
Stefan Weil	7628112273	Fix broken build for Leptonica < 1.77 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-25 14:23:43 +02:00
Stefan Weil	55901a480f	Remove classify/cutoffs.h It only defined CLASS_CUTOFF_ARRAY and some unused definitions. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-25 13:54:44 +02:00
zdenop	82458db630	Merge branch 'master' of https://github.com/tesseract-ocr/tesseract	2019-05-25 11:14:28 +02:00
zdenop	539673b503	fix '--enable-visibility' build	2019-05-25 11:13:33 +02:00
zdenop	8de022ab1c	Merge pull request #2461 from stweil/tensorflow Support build with Tensorflow	2019-05-25 10:52:37 +02:00
Stefan Weil	32dcfd06ba	Replace Tensorflow by TensorFlow The name is written in camel case, see https://www.tensorflow.org/. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-24 17:14:28 +02:00
Stefan Weil	2441e4d8ac	Implement check for Tensorflow header file This looks for one of the header files which are included by Tesseract. It currently uses a hard coded path which works for Debian / Ubuntu. Simplify also the rules for linking Tensorflow. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-24 16:52:14 +02:00
Stefan Weil	9cdf041448	Remove "third_party/" in comments and update path names Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-24 14:12:52 +02:00
Stefan Weil	4382ab1a34	Support build with Tensorflow It expects include files in /usr/include/tensorflow. * Add configure option --with-tensorflow (disabled by default) * Fix data type tensorflow::int64 * Remove "third_party/" in include statements * Add dummy implementations for Backward and DebugWeights in TFNetwork * Add files generated with protoc from tfnetwork.proto (so the Tensorflow sources are not needed for the build) * Update Makefiles Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-24 14:11:31 +02:00
Zdenko Podobný	294f548ac1	fix missing tiff format	2019-05-24 10:39:17 +02:00
Stefan Weil	3f74da5da9	lstmtrainer: Set constant kLearningRateDecay at compile time sqrt(0.5) = 1 / sqrt(2) can be replaced by the macro M_SQRT1_2. This also fixes a compiler warning: src/lstm/lstmtrainer.cpp:51:14: warning: declaration requires a global constructor [-Wglobal-constructors] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-23 15:01:23 +02:00
zdenop	4bab7dd83d	Merge pull request #2451 from Bharat123rox/lgtm Some LGTM alert fixes and potential bugfixes	2019-05-22 12:19:44 +02:00
Egor Pugin	fea1f3970b	Merge pull request #2452 from stweil/tprintf tprintf: Make code reentrant and use less memory	2019-05-22 12:31:34 +03:00
Egor Pugin	8f99880a7a	Merge pull request #2453 from stweil/crashcode Remove SavePixForCrash and related code	2019-05-22 12:30:29 +03:00
Bharat123rox	bc3ea622a6	Fix bug in max_max_dist	2019-05-22 08:21:30 +02:00
Bharat123rox	0bf45e81e7	Fix LGTM and revert bugfix for later PR	2019-05-22 11:23:27 +05:30
Bharat123rox	945ccac85a	Fix syntax error	2019-05-22 10:10:12 +05:30
Stefan Weil	6514479e8f	Remove SavePixForCrash and related code That debugging code uses very much memory and is no longer useful. text data bss dec hex filename 815 0 262144 262959 4032f src/ccutil/globaloc.o Remove also the function err_exit which was only used in ccmain/reject.cpp. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-21 20:25:58 +02:00
Stefan Weil	078a129674	tprintf: Make code reentrant and use less memory Reduce the maximum message size from 64 KiB to 2 KiB which still should be large enough for trace messages. Create the smaller message on the stack instead of using a global array to allow reentrancy and to reduce the memory use of Tesseract. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-21 20:22:58 +02:00
Bharat123rox	7f31a0634d	Some LGTM fixes and potential bugfixes	2019-05-21 23:24:50 +05:30
Stefan Weil	d2ca81e794	Remove local definition of M_PI It is defined for all platforms when math.h or cmath is included after defining the macro _USE_MATH_DEFINES. Define _USE_MATH_DEFINES before any include statement to make sure that M_PI gets defined. It is not necessary to define it conditionally only for Windows. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-20 21:18:52 +02:00
Stefan Weil	64bdceee69	Fix compiler warnings This fixes lots of warnings related to ERRCODE like the following one: src/ccutil/errcode.h:81:15: warning: declaration requires a global constructor [-Wglobal-constructors] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-19 22:10:22 +02:00
Stefan Weil	09edd1a604	Fix out-of-bounds writes in Classify::ReadNewCutoffs The function did not correctly read Chinese unichars into the local Class variable if the locale was set to de_DE.UTF-8 (or other incompatible locales). That resulted in a wrong ClassId which was used to write into the Cutoffs array without checking for valid bounds. On macOS the result was a runtime error in baseapi_test (see GitHub issue #1250): [ RUN ] TesseractTest.InitConfigOnlyTest baseapi_test(21845,0x1134c45c0) malloc: * error for object 0x927f96c28005e0: pointer being freed was not allocated baseapi_test(21845,0x1134c45c0) malloc: * set a breakpoint in malloc_error_break to debug Replacing sscanf by std::istringstream fixes that. Add also an assertion to catch future out-of-bounds writes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-18 13:39:55 +02:00
zdenop	7e9d2f4bc4	Merge pull request #2432 from nickjwhite/hocrmoretypes Add different classes to hocr output depending on BlockType	2019-05-16 17:02:48 +02:00
Stefan Weil	331cc84d8d	Remove assertions for unsupported locale settings The latest code passed all unittests with locale de_DE.UTF-8 and has fixed the locale issues which were reported on GitHub. Therefore the assertions can be removed. Any remaining locale issue will be fixed when it is identified. To help finding such remaining issues, debug code now uses the user's locale settings instead of the default "C" locale for all executables which use TessBaseAPI. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 13:59:39 +02:00
Stefan Weil	77f9bad3c2	Fix UNICHARSET::save_to_string for locale de_DE.UTF-8 That function writes float values which must always use '.' as the decimal separator, no matter what the current locale setting is. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 11:39:59 +02:00
Stefan Weil	36ed6da349	Fix baseapi_test with locale de_DE.UTF-8 The unittest failed with LANG=de_DE.UTF-8: $ unittest/baseapi_test Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc [==========] Running 12 tests from 2 test suites. [----------] Global test environment set-up. [----------] 10 tests from TesseractTest [ RUN ] TesseractTest.ArraySizeTest [ OK ] TesseractTest.ArraySizeTest (0 ms) [ RUN ] TesseractTest.BasicTesseractTest [ OK ] TesseractTest.BasicTesseractTest (1251 ms) [ RUN ] TesseractTest.IteratesParagraphsEvenIfNotDetected [ OK ] TesseractTest.IteratesParagraphsEvenIfNotDetected (347 ms) [ RUN ] TesseractTest.HOCRWorksWithoutSetInputName [ OK ] TesseractTest.HOCRWorksWithoutSetInputName (403 ms) [ RUN ] TesseractTest.HOCRContainsBaseline [ OK ] TesseractTest.HOCRContainsBaseline (389 ms) [ RUN ] TesseractTest.RickSnyderNotFuckSnyder [ OK ] TesseractTest.RickSnyderNotFuckSnyder (346 ms) [ RUN ] TesseractTest.AdaptToWordStrTest Trying to adapt "136 " to "1 3 6" Trying to adapt "256 " to "2 5 6" Trying to adapt "410 " to "4 1 0" Trying to adapt "432 " to "4 3 2" Trying to adapt "540 " to "5 4 0" Trying to adapt "692 " to "6 9 2" Trying to adapt "779 " to "7 7 9" Trying to adapt "793 " to "7 9 3" Trying to adapt "808 " to "8 0 8" Trying to adapt "815 " to "8 1 5" Trying to adapt "12 " to "1 2" Trying to adapt "12 " to "1 2" [ OK ] TesseractTest.AdaptToWordStrTest (788 ms) [ RUN ] TesseractTest.BasicLSTMTest [ OK ] TesseractTest.BasicLSTMTest (4525 ms) [ RUN ] TesseractTest.LSTMGeometryTest [ OK ] TesseractTest.LSTMGeometryTest (615 ms) [ RUN ] TesseractTest.InitConfigOnlyTest Error: unichar ? in normproto file is not in unichar set. Error: unichar 0.232621 in normproto file is not in unichar set. Error: unichar 0.000400 in normproto file is not in unichar set. Error: unichar 0.231864 in normproto file is not in unichar set. [...] Error: unichar ? in normproto file is not in unichar set. 
Error: unichar 0.233915 in normproto file is not in unichar set. Error: unichar 0.000400 in normproto file is not in unichar set. Error: unichar 0.221755 in normproto file is not in unichar set. Error: unichar 0.000400 in normproto file is not in unichar set. Error: unichar ? in normproto file is not in unichar set. baseapi_test(21845,0x1134c45c0) malloc: * error for object 0x927f96c28005e0: pointer being freed was not allocated baseapi_test(21845,0x1134c45c0) malloc: * set a breakpoint in malloc_error_break to debug [INFO] Lang eng took 327ms in regular init [INFO] Lang chi_tra took 1422ms in regular init Abort trap: 6 TesseractTest.InitConfigOnlyTest is fixed by using std::istringstream instead of sscanf. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 11:05:09 +02:00
Stefan Weil	0dcc889e8d	Fix apiexample_test with locale de_DE.UTF-8 The unittest failed with LANG=de_DE.UTF-8: $ unittest/apiexample_test Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc [==========] Running 4 tests from 2 test suites. [----------] Global test environment set-up. [----------] 1 test from EuroText [ RUN ] EuroText.FastLatinOCR contains_unichar_id(unichar_id):Error:Assert failed:in file ../../../../../src/ccutil/unicharset.h, line 874 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-15 22:43:47 +02:00
Stefan Weil	6b1e709b19	Fix Doxygen comments for void functions Void functions should not use @return. It causes compiler warnings like this one: src/classify/intproto.cpp:326:5: warning: '@return' command used in a comment that is attached to a function returning void [-Wdocumentation] Some non-void functions also were documented with @return none. Fix those comments, too. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-14 21:57:17 +02:00
Stefan Weil	caa04882fd	normmatch: Remove unused private function PrintNormMatch was unused. Remove it and remove also an unused prototype. Make the only remaining private function NormEvidenceOf static. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-14 20:56:04 +02:00
Nick White	068eb4c35d	Add different classes to hocr output depending on BlockType These classes are taken from the hOCR specification, and seem to map well onto the BlockType types. There are probably more that could be added.	2019-05-14 13:25:08 +01:00
Stefan Weil	5d92fbf010	Replace sscanf by std::istringstream Using std::istringstream allows conversion of string to float independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-12 15:04:30 +02:00
Stefan Weil	c76ceafcdf	Fix reading of parameter from traineddata normproto component The NonEssential parameter was wrongly derived from linear_token instead of essential_token and therefore always set to true. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-12 14:43:58 +02:00
Stefan Weil	c07bc4e014	Fix Doxygen comment Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-12 08:55:23 +02:00
Stefan Weil	c8e96e2c02	Fix cast from pointer to integer type Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-12 08:54:46 +02:00
zdenop	7a5b9b8fcd	ScrollView: remove custom implementation of GetAddrInfo	2019-05-04 15:16:41 +02:00
zdenop	5e01f74648	remove unused include	2019-05-04 15:14:54 +02:00
Stefan Weil	aba037329a	tesscallback: Remove more unused code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-04 11:05:50 +02:00
Stefan Weil	57ff92e4bf	tesscallback: Remove unused code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-02 22:14:04 +02:00
zdenop	9192c3afe2	correct tessdata comment in baseapi.h	2019-05-02 08:43:04 +02:00
zdenop	7e48368a5e	Merge pull request #2421 from stweil/includes universalambigs: Add missing include file	2019-05-02 08:36:49 +02:00
zdenop	39d3824c78	Merge pull request #2420 from stweil/locale Fix more locale dependencies	2019-05-02 08:31:41 +02:00
Stefan Weil	cd749be473	universalambigs: Add missing include file This allows fixing two compiler warnings from clang++: src/ccutil/universalambigs.cpp:23:19: warning: no previous extern declaration for non-static variable 'kUniversalAmbigsFile' [-Wmissing-variable-declarations] src/ccutil/universalambigs.cpp:19019:18: warning: no previous extern declaration for non-static variable 'ksizeofUniversalAmbigsFile' [-Wmissing-variable-declarations] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-02 07:36:31 +02:00
Stefan Weil	4fbc0a257b	commandlineflags: Replace strtod by std::stringstream Using std::stringstream allows conversion of double to string independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-02 07:33:46 +02:00
Stefan Weil	d047fa1d1b	paramsd: Replace strtod by std::stringstream Using std::stringstream allows conversion of double to string independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-02 07:33:46 +02:00
Stefan Weil	e3860e45b7	clusttool: Replace strtof by std::stringstream Using std::stringstream allows conversion of float to string independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-02 07:33:45 +02:00
Stefan Weil	ed45656ec8	clusttool: Remove unused code and some global functions * WriteProtoList is unused. Remove it. * ReadNFloats, WriteNFloats and WriteProtoStyle are only used locally, so make them local. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-02 07:33:45 +02:00
Stefan Weil	28a521fec2	Fix some typos (most found and fixed by codespell) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-01 20:30:41 +02:00
zdenop	41f50b19bb	fix crash in case of missing PNG support in Leptonica see #2333	2019-05-01 19:51:54 +02:00
zdenop	90aef80dd7	fix documentation about datapath: ending "/" is not relevant	2019-05-01 11:37:50 +02:00
Jeff Breidenbach	546a9e81eb	fix #1900: intraword spacing for slightly better pdf copy-paste performance	2019-04-29 11:28:30 +02:00
zdenop	137e6de56f	Print info when uzn file is used.	2019-04-28 19:06:38 +02:00
Zdenko Podobný	80e54e401d	fix spelling	2019-04-24 15:35:22 +02:00
Zdenko Podobný	832c257771	remove unused variable	2019-04-24 14:55:35 +02:00
Stefan Weil	b7bc71e987	Fix build for Windows * winsock2.h is case sensitive, lower case is required for cross build. * ws2tcpip.h is required for addrinfo. * FreeAddrInfo conflicts with existing freeaddrinfo, so rename it. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-24 11:24:47 +02:00
zdenop	129fe95390	svutil.cpp: fix windows build	2019-04-23 23:03:28 +02:00
zdenop	7bacc8852b	Merge branch 'master' of https://github.com/tesseract-ocr/tesseract	2019-04-23 22:01:30 +02:00
zdenop	5c6ac61fe2	remove unused includes	2019-04-23 20:59:36 +02:00
zdenop	27f0f2ecea	MSVS support inttypes.h from VS 2015	2019-04-23 20:45:14 +02:00
Stefan Weil	708511adcb	Only include windows.h using host.h host.h sets the macros NOMINMAX and WIN32_LEAN_AND_MEAN which must be set before including windows.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-22 21:51:07 +02:00
Stefan Weil	53f1265362	Clean macros in platform.h * Remove unused macros ultoa, SIGNED. * Move macros NOMINMAX and WIN32_LEAN_AND_MEAN to host.h because they are used when including windows.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-22 21:51:07 +02:00
Stefan Weil	3bd61bfae4	svutil: Clean include file * Remove MIN, MAX macros. They are unused. * Include windows.h indirectly by including host.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-22 21:51:07 +02:00
Stefan Weil	e12b99d49b	Remove host.h from Tesseract API It is not needed by other API header files. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-22 21:51:07 +02:00
Stefan Weil	8a34da027f	Fix typo in description Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-22 21:50:37 +02:00
Shree	f8fba6362b	fix the coordinates for EOL tab	2019-04-22 09:54:20 +00:00
zdenop	3ec7c22a87	fix missing EOL	2019-04-22 08:49:55 +02:00
Stefan Weil	09255ebe44	Don't include windows.h from platform.h This partially reverts commit `c150b9832d`. Now params.cpp includes host.h which also gets the definition for MAX_PATH. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-21 22:20:13 +02:00
zdenop	6781d78211	Merge pull request #2399 from stweil/pgedit pgedit: Remove unused global functions	2019-04-20 19:26:02 +02:00
Stefan Weil	4ac1fad18a	pdfrenderer: Replace snprintf by std::stringstream Using std::stringstream allows conversion of float to string independent of the current locale setting. Some snprintf statements are not needed at all because a constant string can be appended directly. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-20 19:05:29 +02:00
Stefan Weil	07d5365a1f	baseapi: Use std::stringstream to format float values Using std::stringstream allows conversion of float to string independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-20 19:05:29 +02:00
Stefan Weil	743fc2562d	Remove unneeded include statements for pgedit.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-20 19:00:07 +02:00
Stefan Weil	26dd0b82bf	pgedit: Remove unused global functions pgeditor_show_point is unused, so remove it completely. Some more functions are only used locally, so make them static functions. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-20 19:00:07 +02:00
Stefan Weil	217c2530e6	Remove strtofloat Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-19 11:19:04 +02:00
Stefan Weil	7c3f9000cd	Replace sscanf by std::stringstream Using std::stringstream allows working with the C locale, independent of the current locale settings. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-19 11:19:04 +02:00
Stefan Weil	5529a5db11	unittest: Fix and enable params_model_test This needs the latest test submodule. The test uses LoadFromFile which is not used otherwise, so remove that function from class ParamsModel. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-18 17:06:48 +02:00
Stefan Weil	a1ffcd3654	Use std::stringstream for add_str_double Using std::stringstream allows conversion of double to string independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-14 16:16:16 +02:00
Stefan Weil	aa64a63f69	Use std::stringstream to generate PDF output Using std::stringstream simplifies the code and allows conversion of double to string independent of the current locale setting. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-14 16:15:39 +02:00
Stefan Weil	78a957b989	Remove spaces at line endings Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-13 18:54:42 +02:00
Stefan Weil	12ca2513d4	Revert "e" flag for fopen clang-tidy added it in commit `ac0b191f6b`. The "e" flag is an extension for glibc which sets the O_CLOEXEC flag, so the file handle is not leaked to child processes. It is not needed here. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-13 18:53:57 +02:00
Samuel Lee	e32b3360aa	Fix for MSVC LoadDataFromFile/SaveDataToFile use fopen with unsupported file mode 'e' in MSVC.	2019-04-11 02:33:51 +09:00
Stefan Weil	f88a7f28e3	fontinfo: Fix wrong delete Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-07 12:16:04 +02:00
Stefan Weil	3dfe1b8807	classify: Modernize function UniformDensity This should fix an issue reported by Codacy. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-07 12:13:45 +02:00
Stefan Weil	72c874140e	Modernize code by replacing C type casts This was done using clang-tidy. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-07 09:04:51 +02:00
zdenop	95a15a7a82	fix cmake&clang build	2019-04-06 15:31:53 +02:00
zdenop	ab09b09da6	Merge pull request #2294 from bertsky/lstm-with-char-whitelist trying to add tessedit_char_whitelist etc. again:	2019-04-06 14:41:30 +02:00
Robert Schubert	25a42ea42f	fixed failure report for tesstrain commands: - with `set -e` in effect, looking at stdout to detect failure is too late	2019-04-06 08:13:03 +02:00
Robert Schubert	d5584e793e	fixed failure report for tesstrain commands: - with `set -e` in effect, it does not make sense to query `$?` indirectly	2019-04-06 08:13:03 +02:00
zdenop	be617b3722	Merge pull request #2361 from Shreeshrii/truth Change message display for debug_level -1 during lstmtraining	2019-04-05 10:52:21 +02:00
zdenop	2982cb4ff3	Merge pull request #2368 from amitdo/no-legacy-fix disable-legacy build: Do not include unused headers	2019-04-05 09:35:04 +02:00
Stefan Weil	d35a6f2de5	Modernize code (clang-tidy check modernize-deprecated-headers) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-05 08:29:00 +02:00
Stefan Weil	20d5eedd45	Modernize code (clang-tidy check modernize-loop-convert) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-05 08:29:00 +02:00
amitdo	fab9a54981	Remove unneeded 'SUBDIRS=' from 3 Makefile.am files	2019-04-04 19:31:39 +02:00
Shree	6673347986	Change page to line in message	2019-04-04 15:43:29 +00:00
Shree	51c3535310	Always display GROUND TRUTH. BEST OCR and ALIGNED TRUTH only if different for debug_level -1	2019-04-04 15:33:22 +00:00
Shree	84d4cc2e95	Display OCR TEXT and GROUND TRUTH only when different for debug_level = -1	2019-04-04 15:33:22 +00:00
Amit D	2069c057d6	Merge branch 'master' into no-legacy-fix	2019-04-04 18:26:22 +03:00
Egor Pugin	2a1d238bd5	Merge pull request #2366 from stweil/modernize Modernize code with "using"	2019-04-04 15:13:10 +03:00
amitdo	546014aecd	disable-legacy build: Do not include unused headers	2019-04-04 15:09:08 +03:00
Stefan Weil	98346c2cd4	Modernize and format code The code was modernized using clang-tidy with "modernize-use-using". The modified files were then formatted using clang-tidy with "google-readability-braces-around-statements", then clang-format was applied. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-03 21:02:23 +02:00
Shreeshrii	613c2bf6e4	Change pages to lines in message The pages variables refer to the lines in document. This change makes the messages clearer without changing the variable names.	2019-04-03 10:41:14 +05:30
Egor Pugin	af7cc1ce4c	Fix windows build.	2019-04-01 22:38:01 +03:00
Stefan Weil	81fbd878dd	Add more missing include statements for Windows build Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-04-01 08:10:25 +02:00
Stefan Weil	ab009fae94	Remove macro WINDLLNAME It is now no longer used. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 20:05:41 +02:00
Stefan Weil	77a5f2623e	Remove unused config variable tessedit_module_name It was only defined for Windows builds. Use also false instead of 0 to set the default value of two boolean config variables. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 20:04:00 +02:00
Stefan Weil	c150b9832d	Add missing include statements for Windows build The last commits which removed BOOL8 had broken the Windows build. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 19:02:29 +02:00
Stefan Weil	802f42e821	Remove BOOL8, TRUE, FALSE from host.h Remove unneeded include statements for host.h, add required ones and update the comments for the remaining include statements. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 18:27:20 +02:00
Stefan Weil	be96b7b660	bits16: Format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 18:26:50 +02:00
Stefan Weil	146079f31d	api: Replace BOOL8, TRUE, FALSE by bool, true, false and modernize code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 18:15:53 +02:00
Stefan Weil	4e0c726d6c	ccutil: replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:56:47 +02:00
Stefan Weil	da0c14ae45	cutil: Replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:56:19 +02:00
Stefan Weil	87a973652c	classify: Replace BOOL8, TRUE, FALSE by bool, true, false Simplify also some related code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:55:48 +02:00
Stefan Weil	30ee3afc29	textord: Replace TRUE, FALSE by true, false and use bool instead of BOOL8 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:55:20 +02:00
Stefan Weil	b391ab84d0	wordrec: Replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:54:21 +02:00
Stefan Weil	cbb5e729a1	classify: Use bool and replace TRUE, FALSE Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:53:50 +02:00
Stefan Weil	46fa59aadc	ccstruct: Replace BOOL8, TRUE, FALSE by bool, true, false and modernize code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:53:06 +02:00
Stefan Weil	92b9f9f8de	ccmain: Replace TRUE, FALSE by true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:52:09 +02:00
Stefan Weil	7db25e15c0	Remove unused config variable tessedit_single_match Replace also TRUE, FALSE by true, false. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:38:35 +02:00
Stefan Weil	ca2947a2c0	blobclass: Remove unused macros Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:36:46 +02:00
Stefan Weil	f2bd98e656	PageIterator: Remove useless const Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:35:43 +02:00
Stefan Weil	813b7803e0	pgedit: Replace BOOL8 by bool Replace also TRUE, FALSE by true, false and add some static attributes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:29:15 +02:00
Stefan Weil	664811a869	Replace BOOL8, TRUE, FALSE by bool, true, false Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:28:28 +02:00
Stefan Weil	51a2c2eae8	Format code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:24:02 +02:00
Stefan Weil	95ea778745	capi: Replace FALSE, TRUE and simplify and format code Format code using clang-format and clang-tidy. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:19:04 +02:00
Stefan Weil	89ba48b106	strngs: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:13:38 +02:00
Stefan Weil	127d0e31f0	serialis: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:12:11 +02:00
Stefan Weil	8b663e7620	helpers: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 17:06:19 +02:00
zdenop	3bb8f9cd49	Merge branch 'master' of https://github.com/tesseract-ocr/tesseract	2019-03-31 16:54:15 +02:00
zdenop	5f06402755	python: optimize imports, reformat code	2019-03-31 16:53:39 +02:00
zdenop	2e9fd69c9e	use 'import pathlib'; fix "TypeError: argument of type 'WindowsPath' is not iterable"	2019-03-31 16:53:33 +02:00
zdenop	a0527b41bd	fix LGTM reports for python	2019-03-31 16:53:25 +02:00
Stefan Weil	1948f0d520	ocrclass: Modernize and format code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:39:44 +02:00
Stefan Weil	85957e9673	WERD: Don't print space character after "FALSE" at end of line Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:32:42 +02:00
Stefan Weil	83d4433d3b	Modernize and format unichar.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:30:15 +02:00
Stefan Weil	ac0b191f6b	Modernize and format genericvector.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:21:32 +02:00
Stefan Weil	36ed08636b	Modernize and format tesscallback.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-31 16:16:00 +02:00
zdenop	f47c7c92dd	fix uninitialized variables in wordstrboxrenderer and lstmboxrenderer; CID 1399132, 1399134, 1399135, 1399137, 1399140, 1399141, 1399142	2019-03-31 12:26:49 +02:00
Shreeshrii	ea36e94e58	fix Could not parse bool from flag (#2359)	2019-03-29 14:50:21 +01:00
Stefan Weil	852598eecf	Remove file tessedit.h It only declared the unused global variable global_monitor which is now removed, too. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-27 19:03:42 +01:00
Stefan Weil	6e59abcce2	Remove file cutil.h It only contained three type definitions which fit better in other include files. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-27 19:03:42 +01:00
Stefan Weil	b6bfb20f1d	Improve readability of conditional code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 12:35:56 +01:00
Stefan Weil	36a1a30c22	Remove some old type casts Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 12:35:56 +01:00
Stefan Weil	a44bf41f14	Modernize C++ loops The modifications were done using this command: run-clang-tidy-8.py -header-filter='.' -checks='-,modernize-loop-convert' -fix Then the resulting code was cleaned manually. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 08:38:21 +01:00
Stefan Weil	ed011670c8	Modernize C++ code using bool literals The modifications were done using this command: run-clang-tidy-8.py -header-filter='.' -checks='-,modernize-use-bool-literals' -fix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 07:58:02 +01:00
Stefan Weil	a0fd90583b	Modernize C++ code using auto The modifications were done using this command: run-clang-tidy-8.py -header-filter='.' -checks='-,modernize-use-auto' -fix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 07:55:08 +01:00
Stefan Weil	36f768853a	Modernize C++ code using override The modifications were done using this command: run-clang-tidy-8.py -header-filter='.' -checks='-,modernize-use-override' -fix Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-26 07:37:52 +01:00
Stefan Weil	f877640bc9	Merge pull request #2319 from bertsky/tesstrain-parallel-wait-retval tesstrain: check failure of subjobs	2019-03-25 16:10:09 +01:00
Stefan Weil	d8d2f6f48a	Fix broken shell scripts for training Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-25 15:32:43 +01:00
Stefan Weil	631882a346	Fix compiler warnings (signed / unsigned mismatch) clang warnings: src/ccutil/unicharcompress.cpp:172:27: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] src/lstm/recodebeam.cpp:129:29: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare] src/lstm/recodebeam.cpp:276:48: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare] unittest/imagedata_test.cc:101:21: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] unittest/linlsq_test.cc:33:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] unittest/linlsq_test.cc:44:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare] unittest/nthitem_test.cc:27:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare] unittest/nthitem_test.cc:68:21: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare] unittest/stats_test.cc:26:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-25 08:36:07 +01:00
Stefan Weil	ecaad2aca8	ccstruct/werd: Format code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-25 07:57:34 +01:00
Stefan Weil	b1e305f38c	Simplify code which tests for non-empty StringParam Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:35:52 +01:00
Stefan Weil	f9860cda41	Optimize functions ResetFrom The loop can terminate as soon as the parameter name was found. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:21:23 +01:00
Stefan Weil	41da5afe9d	UNICHARSET: Fix compiler warning (signed/unsigned mismatch) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:18:21 +01:00
Stefan Weil	91e2b253c0	Format modified code with clang-format Format the files which were changed in commit `297d7d86ce`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 21:10:29 +01:00
Stefan Weil	06acbaf99c	IntegerMatcher: Fix division by zero Credit to OSS-Fuzz which reported this issue: intmatcher.cpp:1231:62: runtime error: division by zero #0 0x6119d5 in IntegerMatcher::ApplyCNCorrection(float, int, int, int) tesseract/src/classify/intmatcher.cpp:1231:62 #1 0x5fe9c4 in tesseract::Classify::ComputeCorrectedRating(bool, int, double, double, int, int, int, int, int, unsigned char const) tesseract/src/classify/adaptmatch.cpp:1213:29 #2 0x5fdc22 in tesseract::Classify::ExpandShapesAndApplyCorrections(ADAPT_CLASS_STRUCT, bool, int, int, int, float, int, int, unsigned char const, tesseract::UnicharRating, ADAPT_RESULTS) tesseract/src/classify/adaptmatch.cpp:1184:13 #3 0x5fe421 in tesseract::Classify::MasterMatcher(INT_TEMPLATES_STRUCT, short, INT_FEATURE_STRUCT const, unsigned char const, ADAPT_CLASS_STRUCT, int, int, TBOX const&, GenericVector<CP_RESULT_STRUCT> const&, ADAPT_RESULTS) tesseract/src/classify/adaptmatch.cpp:1119:5 #4 0x6003eb in tesseract::Classify::CharNormTrainingSample(bool, int, tesseract::TrainingSample const&, GenericVector<tesseract::UnicharRating>*) tesseract/src/classify/adaptmatch.cpp:1374:5 See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13712. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 19:39:31 +01:00
Stefan Weil	58423d2f6c	Merge pull request #2328 from bertsky/lstm-with-user-patterns2 Add user words / patterns again	2019-03-24 19:38:40 +01:00
zdenop	0d36d9a9d7	Merge pull request #2341 from Shreeshrii/fix Fix	2019-03-24 18:21:09 +01:00
Stefan Weil	da6305b632	Fix compiler warnings caused by ASSERT_HOST The modified definition avoids warnings caused by redundant semicolons. Now a semicolon is required when using the macro, so a few code locations had to be updated. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 17:47:04 +01:00
Stefan Weil	44a6d9f4d4	intmatcher: Catch more out of bounds reads Credit to OSS-Fuzz which reported this issue: intmatcher.cpp:1121:17: runtime error: index 24 out of bounds for type 'uint8_t [24]' #0 0x61034b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT, unsigned int, short) tesseract/src/classify/intmatcher.cpp:1121:17 #1 0x60f560 in IntegerMatcher::Match(INT_CLASS_STRUCT, unsigned int, unsigned int, short, INT_FEATURE_STRUCT const, tesseract::UnicharRating, int, int, bool) tesseract/src/classify/intmatcher.cpp:514:11 #2 0x5f3a25 in tesseract::Classify::AdaptToChar(TBLOB, int, int, float, ADAPT_TEMPLATES_STRUCT) tesseract/src/classify/adaptmatch.cpp:894:9 #3 0x5f2ccd in tesseract::Classify::LearnPieces(char const, int, int, float, tesseract::CharSegmentationType, char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:430:5 #4 0x5f16ee in tesseract::Classify::LearnWord(char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:293:7 This catches the out of bounds data reads in release builds. Add also assertions for debug builds. See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13818. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 17:27:43 +01:00
Stefan Weil	5fd7228414	intmatcher: Catch out of bounds reads Credit to OSS-Fuzz which reported this issue: intmatcher.cpp:1163:17: runtime error: index 24 out of bounds for type 'uint8_t [24]' #0 0x610d3b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT, unsigned int) tesseract/src/classify/intmatcher.cpp:1163:17 #1 0x60ff4e in IntegerMatcher::Match(INT_CLASS_STRUCT, unsigned int, unsigned int, short, INT_FEATURE_STRUCT const, tesseract::UnicharRating, int, int, bool) tesseract/src/classify/intmatcher.cpp:563:11 #2 0x5f4355 in tesseract::Classify::AdaptToChar(TBLOB, int, int, float, ADAPT_TEMPLATES_STRUCT) tesseract/src/classify/adaptmatch.cpp:894:9 #3 0x5f35fd in tesseract::Classify::LearnPieces(char const, int, int, float, tesseract::CharSegmentationType, char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:430:5 #4 0x5f201e in tesseract::Classify::LearnWord(char const, WERD_RES) tesseract/src/classify/adaptmatch.cpp:293:7 This catches the out of bounds data reads, but does not fix the primary reason: ProtoLengths currently gets values which are larger than the allowed index. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 15:44:33 +01:00
Stefan Weil	509ee95023	IntegerMatcher: Fix data type of loop counters ClassTemplate->ProtoLengths[n] is of type uint8_t, so use that for the related loop counters, too. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 15:35:06 +01:00
Stefan Weil	f4f34a87db	WERD_RES: Fix uninitialized member variable Credit to OSS-Fuzz which reported this issue: pageres.cpp:1143:7: runtime error: load of value 249, which is not a valid value for type 'bool' #0 0x6ba560 in WERD_RES::Clear() tesseract/src/ccstruct/pageres.cpp:1143:7 #1 0x6b9fd1 in WERD_RES::operator=(WERD_RES const&) tesseract/src/ccstruct/pageres.cpp:193:3 #2 0x49a9ad in WERD_RES::WERD_RES(WERD_RES const&) tesseract/src/ccstruct/pageres.h:356:11 See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13707. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 14:59:08 +01:00
Stefan Weil	afc099b9f4	intmatcher: Split data_table The old code was a hack to improve the performance. The new code is clearer and results in the same binary when compiling with gcc 8.3.0, so it looks like the old hack is no longer needed with modern compilers. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-24 08:15:40 +01:00
Shreeshrii	8749f3553e	LINEDATA=false	2019-03-23 19:16:49 +05:30
Shree	bcb7cf9846	sort arguments, use true/false instead of 1/0	2019-03-23 12:28:53 +00:00
Shree	c2db272134	Modify distort_image for Boolean	2019-03-22 17:02:46 +00:00
Shree	259d5af6b1	Add PSM values to the definition	2019-03-22 15:29:02 +00:00
Shree	8eafec0d17	Fix comments with current values of PSM codes	2019-03-22 14:10:49 +00:00
Stefan Weil	e1e56d9d66	Remove local function declarations from intmatcher.h This requires moving the local function HeapSort to the beginning. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-22 11:39:39 +01:00
Stefan Weil	2ba194ca8d	Remove four unused parameters This fixes some compiler warnings: src/classify/intmatcher.cpp:711:63: warning: unused parameter ‘ConfigMask’ [-Wunused-parameter] src/classify/intmatcher.cpp:1007:16: warning: unused parameter ‘ProtoMask’ [-Wunused-parameter] src/classify/intmatcher.cpp:1095:61: warning: unused parameter ‘NumFeatures’ [-Wunused-parameter] src/classify/intmatcher.cpp:1136:59: warning: unused parameter ‘used_features’ [-Wunused-parameter] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-22 11:30:24 +01:00
Stefan Weil	dd79d56e9f	Remove unused parameter BlobLength This fixes two compiler warnings: src/classify/intmatcher.cpp:553:14: warning: unused parameter ‘BlobLength’ [-Wunused-parameter] src/classify/intmatcher.cpp:622:14: warning: unused parameter ‘BlobLength’ [-Wunused-parameter] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-22 11:17:19 +01:00
Shree	9b915d5efb	add --distort_image	2019-03-22 05:39:38 +00:00
Shree	f7ffde99d5	add --distort_image	2019-03-22 05:34:00 +00:00
zdenop	ac7ea4322a	Merge pull request #2335 from Shreeshrii/master Changes to tesstrain.py - max_workers=8, distort_image=false	2019-03-17 15:27:34 +01:00
zdenop	26877ba703	check min. python version; os.uname is not available on windows	2019-03-17 15:25:48 +01:00
Shreeshrii	f8e8521606	Update tesstrain_utils.py	2019-03-17 15:32:35 +05:30
Shree	6fa8e1bb15	Set max_workers=8	2019-03-17 09:58:11 +00:00
Shree	e21499e81e	Set default value for distort_image	2019-03-17 09:54:16 +00:00
Stefan Weil	ee2f9bf7bf	Remove old comments in file headers Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-16 10:55:00 +01:00
Shree	d47b0d588a	Use LATIN_FONTS for kmr	2019-03-15 15:47:56 +00:00
Shree	3eee1d217a	Add kmr and kur_ara, remove kur from training scripts	2019-03-15 15:37:49 +00:00
Robert Schubert	297d7d86ce	trying to add user words/patterns again: - pass in ParamsVectors from Tesseract (carrying values from langdata/config/api) into LSTMRecognizer::Load and LoadDictionary - after LSTMRecognizer's Dict is initialised (with default values), reset the variables user_{words,patterns}_{suffix,file} from the corresponding entries in the passed vector	2019-03-15 16:06:19 +01:00
Shree	b2ebf0195f	Add kmr and kur_ara, remove kur from training scripts	2019-03-15 14:39:39 +00:00
Shree	37befdf6c4	Add option for --distort_image	2019-03-15 13:32:36 +00:00
zdenop	0a36b38169	Merge pull request #2317 from eighttails/master Added missing linker flags for MinGW.	2019-03-15 08:01:21 +01:00
Robert Schubert	14346e56b0	tesstrain: catch+handle SIGINT (to stop waiting on subjobs)	2019-03-15 00:03:16 +01:00
Robert Schubert	6cbad17e30	tesstrain: check all subjobs' retval	2019-03-14 14:38:51 +01:00
Robert Schubert	5316bcbb94	tesstrain: check failure of subjobs	2019-03-14 11:42:01 +01:00
Stefan Weil	4c2bbebecc	Fix compiler warning (-Wunused-value) Warning from clang++: ..\src\ccmain\ltrresultiterator.cpp(454,8): warning: expression result unused [-Wunused-value] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-13 20:56:03 +01:00
Stefan Weil	ed84ba0a44	Fix wrong comparison symbol_steps is a vector, so testing for a nullptr was wrong. clang++ reports: ..\src\ccmain\ltrresultiterator.cpp(440,19): warning: comparison of address of 'this->word_res_->symbol_steps' equal to a null pointer is always false [-Wtautological-pointer-compare] if (&word_res_->symbol_steps == nullptr || !LSTM_mode_) return nullptr; ~~~~~~~~~~~^~~~~~~~~~~~ ~~~~~~~ Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-13 20:38:38 +01:00
Tadahito Yao	bbbd262a8d	Added missing linker flags for MinGW.	2019-03-13 22:10:36 +09:00
jm server2	1206362d30	`accumulated_timesteps` is not a pointer but a vector, so if ChoiceIterator is used without `lstm_choice_mode`, tesseract crashes (or similar) because the check is true and a nonexistent item is referenced	2019-03-13 12:55:14 +01:00
Stefan Weil	3baf0d8076	Fix boolean assignments Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 15:34:24 +01:00
Stefan Weil	8ad0489f0f	Remove svpaint.cpp from libtesseract svpaint is a standalone application (it includes a main function) and should not be part of the Tesseract library. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 12:22:53 +01:00
zdenop	7546a01020	Merge pull request #2310 from noahmetzger/LSTMChoiceRIL Lstm choice ril	2019-03-12 10:46:11 +01:00
Stefan Weil	35a999f91a	Fix assertion caused by wrong unicharset Credit to OSS-Fuzz: it found another case which triggered this assertion: contains_unichar_id(unichar_id):Error:Assert failed:in file ../../src/ccutil/unicharset.h, line 502 This is the OSS-Fuzz testcase: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13662 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 09:31:21 +01:00
Stefan Weil	56a39bda77	Fix float division by zero That runtime error is normally not visible because it does not abort the program, but is detected when the code was compiled with sanitizers. It can be triggered with this OSS-Fuzz testcase: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13662 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 09:28:16 +01:00
Noah Metzger	5b3e2fe812	Integrated accumulated Symbol Choice in the Choice Iterator and made the api lstm_choice_mode independent Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-03-12 09:15:10 +01:00
Stefan Weil	4c0b98bd12	Replace undefined shift operations by multiplications Shift operations are undefined for negative numbers, but at least on Intel they return the same value as a multiplication with 2 ^ shift value. This fixes runtime errors reported by sanitizers and OSS-Fuzz: intmatcher.cpp:821:59: runtime error: left shift of negative value -14 intmatcher.cpp:823:75: runtime error: left shift of negative value -512 intmatcher.cpp:820:50: runtime error: left shift of negative value -80 See issue #2297 and https://oss-fuzz.com/testcase-detail/4845195990925312 for details. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 06:56:54 +01:00
Stefan Weil	896698a4f5	Fix runtime error (left shift of negative value) Runtime error: src/training/util.h:37:28: runtime error: left shift of negative value -17 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 06:56:54 +01:00
Stefan Weil	5202208a8c	Remove globals.h It only included other files which are already included where needed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-11 19:01:23 +01:00
Noah Metzger	bc2b919805	Integrated Timesteps per symbol into ChoiceIterator Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-03-11 10:50:56 +01:00
Noah Metzger	754e38d2b4	Added the option to get the timesteps separated by the suggested segmentation Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-03-11 10:50:56 +01:00
zdenop	e817607280	archive_version_details is available from libArchive version 3.2.0	2019-03-10 22:57:48 +01:00
zdenop	5cfe4cc1f0	Merge pull request #2286 from Shreeshrii/lstmbox Rename function to TessBaseAPIGetTsvText to be consistent to Create method	2019-03-10 21:41:52 +01:00
zdenop	02a1ffe87a	Report libArchive support	2019-03-10 20:08:45 +01:00
Stefan Weil	b3aff7d633	Fix Index-out-of-bounds in IntegerMatcher::UpdateTablesForFeature This fixes issue #2299, an issue which was already reported by static code analyzers and now by OSS-Fuzz, see details at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13597. The Tesseract code assigns an address which is out-of-bounds to a pointer variable, but increments that variable later. So this is a false positive. Change the code nevertheless to satisfy OSS-Fuzz. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-10 18:26:40 +01:00
Stefan Weil	91d0a71d51	Fix assertion caused by wrong unicharset (issue #2301) Credit to OSS-Fuzz: This fixes an issue which was reported by OSS-Fuzz, see details at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13592. OSS-Fuzz triggered this assertion: contains_unichar_id(unichar_id):Error:Assert failed:in file ../../src/ccutil/unicharset.h, line 502 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-10 16:42:54 +01:00
Stefan Weil	71d4990c6d	Fix Heap-buffer-overflow in GenericVector<int>::size (issue #2298) Credit to OSS-Fuzz: This fixes a security issue which was reported by OSS-Fuzz, see details at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13590. Add also some assertions to catch similar bugs. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-10 16:12:30 +01:00
Robert Schubert	3912cb1c33	LSTM char_whitelist/blacklist (`6ac2ff0`): more robust - unicharset can be null too	2019-03-09 10:40:40 +01:00
Robert Schubert	b45999088c	LSTM char_whitelist/blacklist (`6ac2ff0`): multi-code chars - move decision from ComputeTopN to ContinueContext, where it belongs: block context continuations which emit final codes translating to disabled unichar_ids. (The normal logic for fallback from top2 > top2 > rest will apply.) - pass UNICHARSET refs appropriately	2019-03-08 12:30:16 +01:00
Robert Schubert	8012d5e653	LSTM char_whitelist/blacklist (`6ac2ff0`): also sublangs	2019-03-07 18:32:50 +01:00
Robert Schubert	6ac2ff083e	trying to add tessedit_char_whitelist etc. again: - ignore matrix outputs in ComputeTopN if they belong to a disabled unichar_id - pass UNICHARSET refs to check that - in SetBlackAndWhitelist, also update the unicharset of the lstm_recognizer_ instance, if any	2019-03-07 01:37:23 +01:00
zdenop	f80085c0bf	Merge pull request #2289 from Armyke/master Added an additional optional --tmp_dir parameter to specify the tempo…	2019-03-06 15:03:14 +01:00
Stefan Weil	1c7e00611b	Add initial support for traineddata files in standard archive formats This requires libarchive-dev. Tesseract can now load traineddata files in any of the archive formats which are supported by libarchive. Example of a zipped BagIt archive: $ unzip -l /usr/local/share/tessdata/zip.traineddata Archive: /usr/local/share/tessdata/zip.traineddata Length Date Time Name --------- ---------- ----- ---- 55 2019-03-05 15:27 bagit.txt 0 2019-03-05 15:25 data/ 1557 2019-03-05 15:28 manifest-sha256.txt 1082890 2019-03-05 15:25 data/eng.word-dawg 1487588 2019-03-05 15:25 data/eng.lstm 7477 2019-03-05 15:25 data/eng.unicharset 63346 2019-03-05 15:25 data/eng.shapetable 976552 2019-03-05 15:25 data/eng.inttemp 13408 2019-03-05 15:25 data/eng.normproto 4322 2019-03-05 15:25 data/eng.punc-dawg 4738 2019-03-05 15:25 data/eng.lstm-number-dawg 1410 2019-03-05 15:25 data/eng.freq-dawg 844 2019-03-05 15:25 data/eng.pffmtable 6360 2019-03-05 15:25 data/eng.lstm-unicharset 1012 2019-03-05 15:25 data/eng.lstm-recoder 1047 2019-03-05 15:25 data/eng.unicharambigs 4322 2019-03-05 15:25 data/eng.lstm-punc-dawg 16109842 2019-03-05 15:25 data/eng.bigram-dawg 80 2019-03-05 15:25 data/eng.version 6426 2019-03-05 15:25 data/eng.number-dawg 3694794 2019-03-05 15:25 data/eng.lstm-word-dawg --------- ------- 23468070 21 files `combine_tessdata -d` and `combine_tessdata -u` also work. The traineddata files in the new format can be generated with standard tools like zip or tar. More work is needed for other training tools and big endian support. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-05 17:18:48 +01:00
Armyke	56b04d4ea7	Added the same --tmp_dir flag to tesstrain_utils.sh	2019-03-04 14:05:25 +00:00
Armyke	25fa392887	Added an additional optional --tmp_dir parameter to specify the temporary directory in which tesstrain.py creates the temporary training files. The main reason is slow R/W on HDDs: anyone who wants to speed up this process can use a directory on an SSD as tmp_dir.	2019-03-04 13:26:53 +00:00
Stefan Weil	7fbde96a04	Format new code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 20:26:07 +01:00
Stefan Weil	38fac625cd	Format new code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 20:01:48 +01:00
Shree	a0202bac70	Rename function to TessBaseAPIGetTsvText to be consistent to the Create method	2019-03-02 16:29:53 +00:00
zdenop	5de2a21b3f	Merge pull request #2283 from Shreeshrii/lstmbox Add missing renderers to C-API	2019-03-02 15:15:34 +01:00
Stefan Weil	9c90894ff0	PAGE_RES_IT: Optimize compare operators by using inline code Avoiding a function call will make both == and != operator faster. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:57:16 +01:00
Stefan Weil	295996ed05	commandlineflags: Fix compiler warnings (signed/unsigned) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:21:04 +01:00
Stefan Weil	eb14726aac	ICOORD: Fix old type casts This fixes compiler warnings and avoids unnecessary conversions between float and double. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:04:54 +01:00
Stefan Weil	fb0f1bcf66	BoxChar: Fix compiler warnings (signed/unsigned) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:04:54 +01:00
Stefan Weil	0e1a1fc3cf	Validator: Fix compiler warnings (signed/unsigned) This also fixes a regression in validate_grapheme_test introduced by commit `32e9d7c8f5`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 13:05:03 +01:00
Shree	c7e8131efc	Add TSV option to C-API	2019-03-02 09:50:54 +00:00
Shree	22c099348b	rename LSTMBOX to LSTMBox	2019-03-02 09:11:47 +00:00
zdenop	2ba8e0061a	Merge branch 'master' into mya	2019-03-01 18:37:24 +01:00
Shree	c33f03e33e	Add lstmbox and wordstrbox to capi.h	2019-03-01 17:16:59 +00:00
Shree	76ec21df3d	Add lstmbox and wordstrbox to C-API	2019-03-01 16:40:41 +00:00
zdenop	646b043d2c	use space instead of tab	2019-03-01 14:36:09 +01:00
Shree	5ee1deaea2	correct handling of 0BF0-0BFA Tamil numbers and symbols	2019-03-01 13:21:49 +00:00
zdenop	d7ddc4c5b7	Merge pull request #2270 from Shreeshrii/U_ARABIC_NUMBER Treat U_ARABIC_NUMBER as LTR	2019-02-28 09:27:54 +01:00
zdenop	12c1225a5f	Merge pull request #2271 from stweil/refactor Refactor class Network	2019-02-27 07:43:13 +01:00
Michal Čihař	14c4494f42	Allow UTF-8 variant of C locale It behaves same in scanf, but it allows proper handling of unicode chars.	2019-02-26 21:37:33 +01:00
Stefan Weil	98dd3b6351	Refactor class Network That class is an abstract class with several pure virtual functions. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-02-26 16:55:31 +01:00
Shree	25b02bf1f2	Treat U_ARABIC_NUMBER as LTR	2019-02-26 09:51:21 +00:00
Shreeshrii	2f71fe280c	Use alternative way to comment a block of code (using the c preprocessor). https://github.com/tesseract-ocr/tesseract/pull/2268#pullrequestreview-207605382 Thanks @amitdo	2019-02-26 15:05:51 +05:30
Shree	449f1cd4ba	Remove test for Word started with a combiner	2019-02-25 18:47:42 +00:00

1487 Commits