tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-13 07:59:04 +08:00

Author	SHA1	Message	Date
amitdo	dfede8ac01	Move all public headers to include/tesseract	2019-10-28 18:50:31 +02:00
zdenop	cede5b34e7	Add pageseg_apply_music_mask option to allow disabling the music mask (#2732)	2019-10-27 17:02:05 +01:00
zdenop	4a37cde0d9	fix inverting (Bilevel BW png) in pdf; fixes #2059	2019-10-27 14:15:12 +01:00
Nat	52bc15acd9	Add pageseg_apply_music_mask option to allow disabling the music mask	2019-10-24 11:44:05 -05:00
Egor Pugin	c727b556f0	Remove unneeded TESS_API from source file.	2019-10-23 13:26:46 +03:00
Egor Pugin	e2688c39e9	Remove TESS_CALL.	2019-10-23 13:21:59 +03:00
wshwang	4ee95a615a	src/ccutil/bits16.h remove warnings (#2726)	2019-10-23 11:46:24 +02:00
wshwang	71e291bae5	Remove warning C4312	2019-10-22 13:06:44 +02:00
zdenop	fc629eae3b	Subject: training: show error description for open/delete file	2019-10-21 16:31:57 +02:00
Stefan Weil	90bcff3732	Delete copy constructor and assignment operator for TessBaseAPI (fix issue #874) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-21 13:12:36 +02:00
Stefan Weil	a209a6b4b5	Copy resolution of source image (fix issue #1702) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-20 20:45:35 +02:00
zdenop	36dc2ccf75	fix memory leak at PangoFontInfo::CanRenderString	2019-10-20 16:43:04 +02:00
zdenop	1ec34378d9	test for synthesized font faces.	2019-10-19 15:05:28 +02:00
zdenop	cbbe45d94b	cmake: add minimum required version for pango and icu based on autotools	2019-10-19 15:00:49 +02:00
zdenop	37c7a5dd82	text2image: show pango version	2019-10-19 14:52:06 +02:00
Stefan Weil	73a38b39d5	quadlsq: Fix warnings from LGTM Fix two occurrences of this LGTM warning: Multiplication result may overflow 'double' before it is converted to 'long double'. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-18 12:07:54 +02:00
Stefan Weil	22cf0f854d	Use "C" locale for PDF output This fixes wrong output of integers with locale de_DE.UTF-8: - /Width 2.481 - /Height 3.508 + /Width 2481 + /Height 3508 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-18 11:30:42 +02:00
Stefan Weil	914a8e40d6	Use "C" locale for ALTO output This fixes wrong output of integers with locale de_DE.UTF-8: - <Page WIDTH="2.481" HEIGHT="3.508" PHYSICAL_IMG_NR="0" ID="page_0"> + <Page WIDTH="2481" HEIGHT="3508" PHYSICAL_IMG_NR="0" ID="page_0"> Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-18 11:18:27 +02:00
Stefan Weil	3e8cc203f4	Fix build error (undefined local variable) The latest commit `96025c7923` was incomplete. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-18 11:05:31 +02:00
Stefan Weil	96025c7923	Remove unimplemented +/- for parameter files Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-17 17:14:43 +02:00
zdenop	a3cfd66f37	do not exit if not existing parameter is used. fixes #1334	2019-10-15 07:56:22 +02:00
zdenop	0150fc57cc	Report when tesseract legacy engine not present. (fix issue #2053)	2019-10-14 22:55:47 +02:00
Stefan Weil	a1e3150bd7	Add new parameter "document_title" to set the title in OCR output files The title can be set for hOCR and PDF output. Currently it is also used for ALTO, so setting the title can be used as a workaround for issue #2700. The constant unknown_title_ is no longer needed and therefore removed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-10 15:42:52 +02:00
Stefan Weil	7a7704bc94	Extend function BoxFileName to handle more common image names The function derives the file name for the .box file from an image name. For training from existing line images, it is useful to directly support the image names which are commonly used. While generated images for Tesseract training typically use the name pattern NAME.tif, other ground truth sets use NAME.bin.png for binarized or NAME.nrm.png for grayscale images. BoxFileName is also now a local function as it is only used locally. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-05 15:59:56 +02:00
jm	fb150265ef	speed optimisation - add the option to disable automatic inverting of line images	2019-10-04 10:09:52 +02:00
Stefan Weil	6b35d6ff6e	Fix comment which referred to unused Tesseract parameter This completes commit `aa2ab68e29`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-03 09:23:25 +02:00
Johannes Künsebeck	aa2ab68e29	Removed unused parameters The following parameters are not used anywhere anymore: * use_definite_ambigs_for_classifier * max_viterbi_list_size * word_to_debug_lengths * fragments_debug * tessedit_redo_xheight * debug_acceptable_wds * tessedit_matcher_log * tessedit_test_adaption_mode * docqual_excuse_outline_errs * crunch_pot_garbage * suspect_space_level * tessedit_consistent_reps * wordrec_display_all_words * wordrec_no_block * wordrec_worst_state * fragments_guide_chopper * segment_adjust_debug * classify_adapt_feature_thresh (classify_adapt_feature_threshold still exists) * classify_adapt_proto_thresh (classify_adapt_proto_threshold still exists) * classify_min_norm_scale_x * classify_max_norm_scale_x * classify_min_norm_scale_y * classify_max_norm_scale_y * il1_adaption_test * textord_blob_size_bigile * textord_blob_size_smallile * editor_debug_config_file * textord_tabfind_show_color_fit The list was generated by a python script and each parameter occurrence checked manually.	2019-10-03 09:18:29 +02:00
Stefan Weil	1e84a6f225	Don't create OCR result files when training data is created The configuration file lstm.train causes Tesseract to generate training data for training of an LSTM line recognizer. In this mode, no other files with OCR results should be written. Without this patch, Tesseract writes a small text file. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-02 19:29:27 +02:00
Stefan Weil	286d8275c7	Add support for image or image list by URL This allows OCR of images from the internet without downloading them first: tesseract http://IMAGE_URL OUTPUT ... It uses libcurl. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-01 12:10:45 +02:00
Stefan Weil	47d70d7014	Modernize code for LIST (fix some -Wold-style-cast warnings) - Use C++ type casts - Remove unneeded type cast - Simplify code for function pop - Remove macro push_on (it was only used once) This fixes lots of compiler warnings caused by old type casts.	2019-10-01 11:12:00 +02:00
Stefan Weil	672d67859f	mfoutline: Modernize code - Use C++ enums - Use strongly typed C++11 enum for DIRECTION and optimize struct MFEDGEPT - Use float constant for MF_SCALE_FACTOR - Replace macros by inline functions - Fix documentation comment This fixes several warnings from clang. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-30 21:33:15 +02:00
Stefan Weil	7ec5f0ca02	intmatcher: Avoid conversion from double to float and vice versa This fixes some clang warnings: src/classify/intmatcher.cpp:48:49: warning: implicit conversion loses floating-point precision: 'double' to 'const float' [-Wimplicit-float-conversion] src/classify/intmatcher.cpp:405:34: warning: implicit conversion loses floating-point precision: 'double' to 'float' [-Wimplicit-float-conversion] src/classify/intmatcher.cpp:405:64: warning: implicit conversion increases floating-point precision: 'float' to 'double' [-Wdouble-promotion] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-30 18:05:26 +02:00
Stefan Weil	6d259ebe44	Remove unneeded compare statement (-Wtautological-unsigned-enum-zero-compare) This fixes a clang warning: src/ccstruct/polyblk.cpp:412:12: warning: result of comparison of unsigned enum expression >= 0 is always true [-Wtautological-unsigned-enum-zero-compare] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-29 22:13:27 +02:00
Stefan Weil	49e351508c	Re-add strngs.h to public API It is still needed. This partially reverts commit `a730b5c4ff`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-28 10:34:48 +02:00
Stefan Weil	8ad86d6494	Add missing linker flags for TensorFlow Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-28 09:42:37 +02:00
zdenop	d6aa866430	ignore #pragma optimize for clang-cl	2019-09-27 21:19:37 +02:00
Stefan Weil	74d5ce82a6	Remove vecfuncs.cpp and vecfunc.h Replace the macros which were declared in vecfuncs.h by member functions and move a function which was only used in chop.cpp to that file. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-25 21:20:03 +02:00
Stefan Weil	7bddad59d1	Optimize class ChoiceIterator Re-order a class variable to avoid memory holes and remove unused class variables. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-25 09:43:57 +02:00
Noah Metzger	ff4c1d204d	Fixed minor bug with the Choice iterator when lstm_choice_mode is not active. Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-09-24 15:38:28 +02:00
Stefan Weil	994ec697d8	Remove member functions STRING::string and StringParam::string They were redundant because there exist member functions 'c_str' which do the same. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-23 08:33:08 +02:00
Egor Pugin	1fa7324cf7	Merge pull request #2668 from stweil/api Remove STRING from the public Tesseract API	2019-09-23 01:02:26 +03:00
amitdo	0598879a00	Disable legacy build: Disable bitvec.h	2019-09-22 20:37:13 +02:00
Stefan Weil	a730b5c4ff	Remove STRING from the public Tesseract API Removing STRING from genericvector.h allows eliminating the proprietary STRING data type from the public Tesseract API. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-22 20:32:28 +02:00
Stefan Weil	8cb677d6a2	Replace STRING arguments for LoadDataFromFile and SaveDataToFile This is a step to eliminate the proprietary STRING data type from the public Tesseract API. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-22 20:32:28 +02:00
amitdo	1e13d1d4d5	Disable legacy build: Disable more unneeded code	2019-09-22 20:55:24 +03:00
zdenop	39a63c2837	Merge pull request #2663 from bertsky/fix-lstm-user-patterns fix langdata (user words/patterns) file suffixes for LSTMs:	2019-09-20 15:32:54 +02:00
Stefan Weil	0c7cc5a4dd	Fix CID 1405673 part 2 (Uninitialized members) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-19 19:37:05 +02:00
Robert Schubert	5b976bfb55	fix langdata (user words/patterns) file suffixes for LSTMs: - add another constructor for LSTMRecognizer which takes the language_data_path_prefix configured/selected at runtime and passes it to the internal CCUtil - use this in Tesseract::init_tesseract_lang_data when LSTMs are available (this was missing from `297d7d86ce`)	2019-09-19 19:30:54 +02:00
amitdo	479a7b1ca0	Disabled legacy build: Disable more unneeded code	2019-09-19 19:00:13 +03:00
Stefan Weil	3b030b4aeb	Fix CID 1405673 (Uninitialized members) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-17 22:04:08 +02:00
Stefan Weil	85e8529a2e	Fix CID 1164624 (Uninitialized members) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-17 21:59:42 +02:00
Stefan Weil	b2999d8190	Fix comment for Textord::make_prop_words Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 15:03:45 +02:00
Stefan Weil	256701e2e0	Re-order initialisation in constructor of class ViterbiStateEntry This fixes compiler warnings caused by commit `091ce345f6`: src/wordrec/lm_state.h:100:7: warning: field 'cost' will be initialized after field 'curr_b' [-Wreorder] src/wordrec/lm_state.h:104:7: warning: field 'top_choice_flags' will be initialized after field 'dawg_info' [-Wreorder] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 14:33:32 +02:00
Stefan Weil	081521fb9f	Move initial values for class ColPartition from constructor to header file This fixes compiler warnings caused by commit `5b4565b80b`: src/textord/colpartition.cpp:91:24: warning: field 'last_column_' will be initialized after field 'column_set_' [-Wreorder] src/textord/colpartition.cpp:93:37: warning: field 'inside_table_column_' will be initialized after field 'nearest_neighbor_above_' [-Wreorder] src/textord/colpartition.cpp:95:58: warning: field 'space_to_right_' will be initialized after field 'owns_blobs_' [-Wreorder] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 14:33:32 +02:00
Stefan Weil	8f66020821	Re-order initialisation in constructors of classes Dawg and DawgPosition This fixes compiler warnings caused by commit `ecf0f2dee5`: src/dict/dawg.h:202:9: warning: field 'type_' will be initialized after field 'lang_' [-Wreorder] src/dict/dawg.h:355:9: warning: field 'dawg_index' will be initialized after field 'dawg_ref' [-Wreorder] src/dict/dawg.h:356:9: warning: field 'punc_index' will be initialized after field 'punc_ref' [-Wreorder] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 14:31:32 +02:00
Stefan Weil	b466cead8e	Add more initial values for class Classify from constructor to header file This fixes compiler warnings caused by commit `751fcd2b11`: src/classify/classify.cpp:176:7: warning: field 'EnableLearning' will be initialized after field 'il1_adaption_test' [-Wreorder] src/classify/classify.cpp:187:7: warning: field 'dict_' will be initialized after field 'static_classifier_' [-Wreorder] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 14:31:32 +02:00
Stefan Weil	91b3248af3	Fix CID 1164666 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 22:01:25 +02:00
Stefan Weil	fc6899d898	Fix CID 1164664 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 21:52:51 +02:00
Stefan Weil	930e11996c	Fix CID 1375402 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 21:17:12 +02:00
Stefan Weil	408d6e8b72	simd: Check OSXSAVE bit before calling _xgetbv Both checks are needed for AVX, AVX2 and FMA checks. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 19:35:37 +02:00
Stefan Weil	627faa6f9c	Remove UnicharAmbigs for builds without legacy code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 19:11:30 +02:00
amitdo	2134cd7867	Disabled legacy engine build: Disable code related to ambigs.	2019-09-15 19:11:30 +02:00
Stefan Weil	0c960c3cc5	Fix 1164647 (Uninitialized members) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 14:25:48 +02:00
amitdo	994596842e	'Disabled legacy engine' build: don't include unused header	2019-09-15 12:35:36 +03:00
Egor Pugin	6a9584fbc2	Merge pull request #2650 from stweil/cid Fix several issues reported by Coverity Scan	2019-09-14 21:18:37 +03:00
Stefan Weil	763f4781e8	Fix CID 1164662 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 19:22:56 +02:00
Stefan Weil	6fd58d2897	Fix CID 1164659 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 19:20:14 +02:00
Stefan Weil	c3500e8d95	Fix CID 1164657 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 19:11:02 +02:00
Stefan Weil	1d3ee3b2a7	Fix CID 1164649 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:37:00 +02:00
Stefan Weil	bd1083904d	Fix CID 1164648 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:32:29 +02:00
Stefan Weil	80f367c6f4	Fix CID 1164644 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:26:49 +02:00
Stefan Weil	7caded8e6b	Fix CID 1164643 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:24:26 +02:00
Stefan Weil	3127242bcd	Fix CID 1164638 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:18:15 +02:00
Stefan Weil	06de3075e0	Fix CID 1164636 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:13:06 +02:00
Stefan Weil	052f9ca0bc	Fix CID 1164634, CID 1164635 (Uninitialized pointer field) Remove the unused dummy member variables. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:12:39 +02:00
Stefan Weil	97dda3d535	Fix CID 1386099 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	46f21a4182	Fix CID 1164633 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	9ea579bf1b	Fix CID 1164628 ff (Uninitialized pointer field) and optimize class ParamContent Only one of bIt, dIt, iIt and sIt is used, so put all four in a union. This fixes CID 1164628, CID 1164629, CID 1164630 and CID 1164631. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	74b552fc31	Remove unused FeatureEnabled from FEATURE_DEFS_STRUCT Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	9f709404f9	Fix CID 1164622 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	5b1f0dbd4b	Fix CID 1164620 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	951f442303	Fix CID 1386105 (Logically dead code) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	64fc205e78	Fix CID 1402767 (Invalid type in argument to printf format specifier) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	f62a895f74	Remove unused italic, bold in class BLOCK_RES and class WORD_RES Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 11:53:58 +02:00
Stefan Weil	ceb8af889e	Fix CID 1340276 (Uninitialized scalar field) for class BLOB_CHOICE xgap_before_ and xgap_after_ are never used, so remove them. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 22:15:47 +02:00
Stefan Weil	5fdd32bea8	Fix CID 1366450 (Uninitialized scalar field) for class RecodeBeamSearch secondary_beam_size_ is set but never used, so remove it. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 22:09:03 +02:00
Stefan Weil	737173a84d	Fix CID 1375401 (Uninitialized scalar field) for class Dawg Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 22:03:10 +02:00
Stefan Weil	edba74d64f	Fix CID 1400760 (Uninitialized scalar field) for class BLOCK Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 21:58:05 +02:00
Stefan Weil	8ff321e41a	Fix two issues reported by Coverity Scan and modernize class WERD_RES Report from Coverity Scan: CID 1405560 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR) 2. uninit_member: Non-static class member end is not initialized in this constructor nor in any functions that it calls. CID 1405561 [...] Modernize and optimize class WERD_RES. This not only fixes the issues but also reduces the size and eliminates the functions InitNonPointers and InitPointers. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 21:51:36 +02:00
Stefan Weil	ecf0f2dee5	Optimize classes Trie, Dawg and DawgPosition Reduce size from 368 to 352 bytes for Trie, 72 to 64 bytes for Dawg and 40 to 24 bytes for DawgPosition by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 08:15:01 +02:00
Stefan Weil	efd8dea587	Optimize classes CLIST_ITERATOR, ELIST_ITERATOR, ELIST2_ITERATOR Reduce size from 56 to 48 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 22:03:03 +02:00
Stefan Weil	751fcd2b11	Optimize class Classify Reduce size from 138016 to 13000 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 21:46:55 +02:00
Stefan Weil	0ad08a99b0	Optimize class TFile Reduce size from 24 to 16 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 20:17:05 +02:00
Stefan Weil	5b4565b80b	Optimize class ColPartition Reduce size from 248 to 224 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 20:04:27 +02:00
Stefan Weil	5a12273650	Optimize struct LMConsistencyInfo Reduce size from 104 to 96 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 20:04:27 +02:00
Stefan Weil	091ce345f6	Optimize class ViterbiStateEntry Reduce size from 232 to 216 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 20:04:27 +02:00
Stefan Weil	913cbe6eae	Modernize and optimize BLOBNBOX and remove BLOBNBOX::ConstructionInit The class no longer uses bit fields. Re-ordering the member variables avoids holes and reduces the size of BLOBNBOX from 168 to 152 bytes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 09:07:48 +02:00
Stefan Weil	a922745d9a	tfnetwork: Fix info text Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-11 19:10:25 +02:00
Stefan Weil	5fa09f184f	RecodedCharIDHash: Fix runtime errors detected by UndefinedBehaviorSanitizer Fix this runtime error in recodebeam_test and unicharcompress_test: src/ccutil/unicharcompress.h:84:27: runtime error: left shift of 267 by 28 places cannot be represented in type 'int' code has up to kMaxCodeLen (9) values, so the highest possible value for i is 8, and the shift value can reach 7 * 8 = 56. That requires an uint64_t data type. size_t would fit for 64 bit hosts, but be too small for 32 bit hosts. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-10 15:56:32 +02:00
Stefan Weil	4a2d5a2e8d	OSResults: Fix runtime errors detected by UndefinedBehaviorSanitizer Fix this runtime error in osd_test and textlineprojection_test: src/ccmain/osdetect.cpp:109:14: runtime error: division by zero Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-10 15:56:32 +02:00
Stefan Weil	5c6fade555	BitVector: Fix runtime errors detected by UndefinedBehaviorSanitizer Fix these runtime errors in mastertrainer_test: src/ccutil/bitvector.cpp:119:18: runtime error: null pointer passed as argument 2, which is declared to never be null src/ccutil/bitvector.cpp:124:10: runtime error: null pointer passed as argument 1, which is declared to never be null Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-10 15:56:32 +02:00
zdenop	98c7aaa343	Lstm choice ril (#2635)	2019-09-06 19:12:00 +02:00
Stefan Weil	9f32032517	ccutil: Remove old comments There is no CLIST2 in the current code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-05 17:52:42 +02:00
Stefan Weil	b6933a1082	Use type bool for boolean values in class BLOBNBOX Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-03 19:56:59 +02:00
Noah Metzger	c350077b96	Made the lstm_choice mode compatible with the hocr_char_boxes mode Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-09-02 11:09:54 +02:00
Noah Metzger	e8b9c10d07	Clean up lstm_choice_mode and cut it down to 2 modes instead of 4 Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-09-02 11:09:53 +02:00
Stefan Weil	fdf4067296	Fix warnings from LGTM This fixes three LGTM warnings: Multiplication result may overflow 'float' before it is converted to 'double'. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-30 22:04:24 +02:00
Stefan Weil	dc90741f1b	Fix crash when function lookup tables are accessed with NaN Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-30 13:42:09 +02:00
Stefan Weil	7968f50fe6	capi: Add missing PSM_RAW_LINE to TessPageSegMode Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-25 09:08:09 +02:00
zdenop	0ded672067	fix typo	2019-08-18 18:47:32 +02:00
Stefan Weil	00cff79f7f	simd: Check whether the OS supports FMA, AVX, ... Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-16 22:51:17 +02:00
Stefan Weil	43b2e9513b	lstmtrainer: Fix diagnostic message Signed character values must be converted to unsigned integers for %x. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-15 14:31:32 +02:00
Stefan Weil	100d8cd29b	lstmtester: Add missing space in log messages Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-14 14:12:47 +02:00
Stefan Weil	a86251c62b	classify/Makefile: Fix inconsistent style Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-13 21:35:59 +02:00
Egor Pugin	423a188513	Export some classify vars.	2019-08-13 20:12:21 +03:00
Stefan Weil	46e2a0f106	Remove more code for builds with disabled legacy engine Now the Tesseract library no longer includes unused code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-13 17:49:10 +02:00
Egor Pugin	73f713519c	Merge pull request #2614 from stweil/training Move source files which are used for training only to src/training	2019-08-12 19:35:50 +03:00
Stefan Weil	e84cb24def	Move source files which are used for training only to src/training They are moved from src/classify and src/lstm to src/training. This reduces the size of the Tesseract library. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 17:08:08 +02:00
Stefan Weil	ba17bc8204	OpenCL: Add static attribute for kernel_src It is only used in openclwrapper.cpp. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 15:13:45 +02:00
Stefan Weil	970622fbd1	Remove unused functions create_edges_window, draw_raw_edge Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 15:04:10 +02:00
Stefan Weil	23e605911f	Remove unused function truncate_path and related files Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 14:48:56 +02:00
Stefan Weil	bce585286d	Remove global array kPolyBlockNames from Tesseract library It is only used in unittest/layout_test.cc after moving a test from baseapi_test.cc to that file, so it can be made local. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 14:33:55 +02:00
Stefan Weil	beec85e023	Remove UNICHARSET::load_from_inmemory_file and related code The method was only used in unittest where it can be replaced by UNICHARSET::load_from_file which also simplifies the code. This allows removing the class InMemoryFilePointer and fixes a TODO. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 13:07:15 +02:00
Stefan Weil	315dd9df3f	cmake: Don't link pthread on Windows Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-07 15:24:00 +02:00
Stefan Weil	b8079d8ce1	universalambigs: Add hack to fix builds with Microsoft compiler The MS compiler only accepts string constants up to 65535 characters, so shorten the string for that compiler to fix the compilation. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-06 15:46:07 +02:00
Zdenko Podobný	c5a50b93ce	move fileio.cpp and fileio.h to training (this fix android build)	2019-08-04 21:26:39 +02:00
Stefan Weil	6acab45837	universalambigs: Replace octal characters by UTF-8 string This improves readability and reduces the file size. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-04 19:21:59 +02:00
Stefan Weil	8127b4dd27	Clean ambigs.h * Remove unused kUnigramAmbigsBufferSize and kAmbigNgramSeparator * Move some declarations to ambigs.cpp Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-04 19:21:59 +02:00
Stefan Weil	23ef93ac4d	cmake: Add missing pthread library It is needed for C++ threads since commit `85068be405`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-26 07:45:51 +02:00
Stefan Weil	e6ca7f3ec6	hocrrenderer: Add missing escaping of special characters in HTML output This converts special character like '<' or '>' to the correct HTML entities. Optimize also the code a little bit. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-19 13:53:36 +02:00
Stefan Weil	2679cae5d8	Simplify code by using ClipToRange Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-19 13:37:39 +02:00
Stefan Weil	4b2927ae41	LSTMRecognizer: Add non const get functions This allows removing several const casts. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-18 11:26:51 +02:00
Stefan Weil	4cb3f34c09	Improve formatting of hOCR output with character boxes Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-18 11:07:18 +02:00
Stefan Weil	9195a904a7	Use auto data type for results of std::ftell Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-18 10:56:17 +02:00
Stefan Weil	4132194c49	Remove unused filesize_ from class InputBuffer This also simplifies the constructors. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-18 10:48:27 +02:00
Stefan Weil	a2b13b49ff	Simplify shell code (fixes warning from Codacy) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-17 21:33:24 +02:00
Stefan Weil	d4e0ab3014	Use long instead of off_t for result from ftell Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-17 21:14:42 +02:00
Stefan Weil	467f8f4140	Fix training script for macOS (issue #2578) Bash on macOS does not support "|&": tesstrain_utils.sh: line 80: syntax error near unexpected token `&' Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-17 17:18:44 +02:00
Stefan Weil	f92181561c	Fix some compiler warnings (unused local variables) gcc warnings: src/classify/protos.cpp:85:7: warning: unused variable ‘i’ [-Wunused-variable] src/classify/protos.cpp:86:7: warning: unused variable ‘Bit’ [-Wunused-variable] src/classify/protos.cpp:89:14: warning: unused variable ‘Config’ [-Wunused-variable] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-17 07:47:28 +02:00
Stefan Weil	a419f2d78b	Modernize BIT_VECTOR a little bit This removes one more user of Emalloc / Efree. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-16 22:09:08 +02:00
zdenop	c8374cc528	Merge pull request #2576 from noahmetzger/LSTMChoiceRIL Implemented improved character bounding box algorithm	2019-07-16 12:25:17 +02:00
zdenop	f4925077e8	Merge pull request #2574 from stweil/fix classify: Use fixed size bit vector	2019-07-16 12:22:48 +02:00
zdenop	cb5c78be7d	Merge pull request #2572 from adaptech-cz/wordBoundsOn2ndPass Give word's bounds to callback also during second pass	2019-07-16 12:19:31 +02:00
Noah Metzger	3a5e508934	Implemented improved bounding box algorithm Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-07-16 11:38:50 +02:00
Stefan Weil	028fff6edd	classify: Use fixed size bit vector The vector was already limited to MAX_NUM_PROTOS (512) entries or 64 bytes in the old code. Now it uses that size right from the start which avoids reallocating it later when entries are added. The old code which reallocated the vector to expand it was buggy because the realloc function can return a different pointer, but the code still used the original pointer to reset the new bits. Function ExpandBitVector is now unused and therefore removed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-16 10:18:11 +02:00
Robert Pösel	f99fcd7691	Give word's bounds to callback also during second pass	2019-07-16 09:11:06 +02:00
Stefan Weil	5bbb7f59a6	Remove structures.* It only provided the functions new_cell, free_cell which could be replaced by new, delete. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-16 07:03:52 +02:00
Stefan Weil	3621272051	Remove cutil_class.* It is no longer needed since commit `4523ce9f7d`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-16 07:03:52 +02:00
Stefan Weil	ea462b2c03	Remove unused functions reverse16, reverse32 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 21:50:46 +02:00
Stefan Weil	c8cb925813	Remove non portable sleep by std::this_thread::sleep_for Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 16:00:07 +02:00

1215 Commits