tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-04 18:29:06 +08:00

Author	SHA1	Message	Date
Stefan Weil	0cd2bdbd2b	automake: Flat build for src/textord Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	558462358a	automake: Flat build for src/opencl Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	6eeb486b77	automake: Flat build for src/lstm Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	7ebcc77e3b	automake: Flat build for src/dict Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	6181acf367	automake: Flat build for src/cutil Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	159160518b	automake: Flat build for src/classify Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	9730c7e167	automake: Flat build for src/ccutil Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	b1d449315e	automake: Flat build for src/ccstruct Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	9745a9d111	automake: Flat build for src/ccmain Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	a166efaad6	automake: Flat build for src/arch Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	cafb1bbfd7	automake: Flat build for src/api Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Martin Malmsten	9ed3887432	Added ComposedBlock level to Alto output	2019-11-17 21:06:12 +01:00
zdenop	2d6f38eebf	fix using bilevel tiff in pdf output	2019-11-10 16:11:52 +01:00
Shreeshrii	99dfa8a680	Add separator and training_iteration to checkpoint name (#2752) * Add separator and training_iteration to checkpoint name * specify modelname_N.NN_NN_NN.checkpoint for intermediate checkpoint	2019-11-09 12:22:40 +01:00
Stefan Weil	ac46b286a4	Fix issue #2748 Commit `94d0f77f56` tried to fix issue #2741 but created a new problem. This commit should fix both old and new issue. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-08 17:12:20 +01:00
Stefan Weil	0406f7706d	Use BRT_UNKNOWN instead of BRT_NOISE to initialize ColPartition::blob_type_ Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-08 07:40:06 +01:00
Stefan Weil	9b46a67efa	Use "C" locale for printing parameters This fixes a test for the Python wrapper `tesserocr` (python setup.py test). Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-04 19:21:20 +01:00
Egor Pugin	ab836dbb31	Merge pull request #2743 from DavidMaung/master Exposed the text2image option --ptsize to tesstrain.sh.	2019-11-02 17:09:51 +03:00
Stefan Weil	a306cd7370	Fail if no valid lstmf file was written (fix issue #2741) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-01 21:52:45 +01:00
Stefan Weil	94d0f77f56	Don't create an empty lstmf file If Tesseract cannot find text in the input image, it should not write an empty lstmf file. This problem was reported in issue #2741. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-01 21:43:26 +01:00
maungd@battelle.org	3d7afb69ea	Exposed the text2image option --ptsize to tesstrain.sh. Text2image has the option --ptsize which defaults to 12. This option is not exposed through tesstrain.sh; thus, you cannot use tesstrain.sh to explore training with different font sizes. I made a small modification to expose the --ptsize option to tesstrain.sh. It defaults to 12 if not specified.	2019-11-01 15:10:58 -04:00
Stefan Weil	b5498c70fa	Use pre-calculated lookup tables for all C++ compilers Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-31 20:26:01 +01:00
Egor Pugin	2bcc9d8093	Remove cppan build.	2019-10-30 21:37:38 +03:00
Stefan Weil	ca87b06d59	Fix build for Intel Compiler (issue #2736) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-30 10:09:44 +01:00
Stefan Weil	20a50e9bcb	Fix typo in comment Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-30 10:06:31 +01:00
Egor Pugin	2a37f5dd62	Update includes to use <>.	2019-10-29 14:50:11 +03:00
Egor Pugin	9e324938ab	Update includes to use <>.	2019-10-29 14:31:38 +03:00
Stefan Weil	629b05d978	Update README.md and other documentation for new include file structure Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-29 12:26:41 +01:00
amitdo	2f8884a64e	Fix autotools build	2019-10-28 21:23:58 +02:00
amitdo	e1bae15547	Fix #include path of public headers	2019-10-28 19:10:30 +02:00
amitdo	dfede8ac01	Move all public headers to include/tesseract	2019-10-28 18:50:31 +02:00
zdenop	cede5b34e7	Add pageseg_apply_music_mask option to allow disabling the musi… (#2732) Add pageseg_apply_music_mask option to allow disabling the music mask	2019-10-27 17:02:05 +01:00
zdenop	4a37cde0d9	fix inverting (Bilevel BW png) in pdf; fixes #2059	2019-10-27 14:15:12 +01:00
Nat	52bc15acd9	Add pageseg_apply_music_mask option to allow disabling the music mask	2019-10-24 11:44:05 -05:00
Egor Pugin	c727b556f0	Remove unneeded TESS_API from source file.	2019-10-23 13:26:46 +03:00
Egor Pugin	e2688c39e9	Remove TESS_CALL.	2019-10-23 13:21:59 +03:00
wshwang	4ee95a615a	src/ccutil/bits16.h remove warnings (#2726)	2019-10-23 11:46:24 +02:00
wshwang	71e291bae5	Remove warning C4312	2019-10-22 13:06:44 +02:00
zdenop	fc629eae3b	Subject: training: show error description for open/delete file	2019-10-21 16:31:57 +02:00
Stefan Weil	90bcff3732	Delete copy constructor and assignment operator for TessBaseAPI (fix issue #874) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-21 13:12:36 +02:00
Stefan Weil	a209a6b4b5	Copy resolution of source image (fix issue #1702) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-20 20:45:35 +02:00
zdenop	36dc2ccf75	fix memory leak at PangoFontInfo::CanRenderString	2019-10-20 16:43:04 +02:00
zdenop	1ec34378d9	test for synthesized font faces.	2019-10-19 15:05:28 +02:00
zdenop	cbbe45d94b	cmake: add minimum required version for pango and icu based on autotools	2019-10-19 15:00:49 +02:00
zdenop	37c7a5dd82	text2image: show pango version	2019-10-19 14:52:06 +02:00
Stefan Weil	73a38b39d5	quadlsq: Fix warnings from LGTM Fix two occurrences of this LGTM warning: Multiplication result may overflow 'double' before it is converted to 'long double'. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-18 12:07:54 +02:00
Stefan Weil	22cf0f854d	Use "C" locale for PDF output This fixes wrong output of integers with locale de_DE.UTF-8: - /Width 2.481 - /Height 3.508 + /Width 2481 + /Height 3508 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-18 11:30:42 +02:00
Stefan Weil	914a8e40d6	Use "C" locale for ALTO output This fixes wrong output of integers with locale de_DE.UTF-8: - <Page WIDTH="2.481" HEIGHT="3.508" PHYSICAL_IMG_NR="0" ID="page_0"> + <Page WIDTH="2481" HEIGHT="3508" PHYSICAL_IMG_NR="0" ID="page_0"> Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-18 11:18:27 +02:00
Stefan Weil	3e8cc203f4	Fix build error (undefined local variable) The latest commit `96025c7923` was incomplete. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-18 11:05:31 +02:00
Stefan Weil	96025c7923	Remove unimplemented +/- for parameter files Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-17 17:14:43 +02:00
zdenop	a3cfd66f37	do not exit if not existing parameter is used. fixes #1334	2019-10-15 07:56:22 +02:00
zdenop	0150fc57cc	Report when tesseract legacy engine not present. (fix issue #2053)	2019-10-14 22:55:47 +02:00
Stefan Weil	a1e3150bd7	Add new parameter "document_title" to set the title in OCR output files The title can be set for hOCR and PDF output. Currently it is also used for ALTO, so setting the title can be used as a workaround for issue #2700. The constant unknown_title_ is no longer needed and therefore removed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-10 15:42:52 +02:00
Stefan Weil	7a7704bc94	Extend function BoxFileName to handle more common image names The function derives the file name for the .box file from an image name. For training from existing line images, it is useful to directly support the image names which are commonly used. While generated images for Tesseract training typically use the name pattern NAME.tif, other ground truth sets use NAME.bin.png for binarized or NAME.nrm.png for grayscale images. BoxFileName is also now a local function as it is only used locally. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-05 15:59:56 +02:00
jm	fb150265ef	speed optimisation - add the option to disable automatic inverting of line images	2019-10-04 10:09:52 +02:00
Stefan Weil	6b35d6ff6e	Fix comment which referred to unused Tesseract parameter This completes commit `aa2ab68e29`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-03 09:23:25 +02:00
Johannes Künsebeck	aa2ab68e29	Removed unused parameters The following parameters are not used anywhere anymore: * use_definite_ambigs_for_classifier * max_viterbi_list_size * word_to_debug_lengths * fragments_debug * tessedit_redo_xheight * debug_acceptable_wds * tessedit_matcher_log * tessedit_test_adaption_mode * docqual_excuse_outline_errs * crunch_pot_garbage * suspect_space_level * tessedit_consistent_reps * wordrec_display_all_words * wordrec_no_block * wordrec_worst_state * fragments_guide_chopper * segment_adjust_debug * classify_adapt_feature_thresh (classify_adapt_feature_threshold still exists) * classify_adapt_proto_thresh (classify_adapt_proto_threshold still exists) * classify_min_norm_scale_x * classify_max_norm_scale_x * classify_min_norm_scale_y * classify_max_norm_scale_y * il1_adaption_test * textord_blob_size_bigile * textord_blob_size_smallile * editor_debug_config_file * textord_tabfind_show_color_fit The list was generated by a python script and each parameter occurrence checked manually.	2019-10-03 09:18:29 +02:00
Stefan Weil	1e84a6f225	Don't create OCR result files when training data is created The configuration file lstm.train causes Tesseract to generate training data for training of an LSTM line recognizer. In this mode, no other files with OCR results should be written. Without this patch, Tesseract writes a small text file. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-02 19:29:27 +02:00
Stefan Weil	286d8275c7	Add support for image or image list by URL This allows OCR of images from the internet without downloading them first: tesseract http://IMAGE_URL OUTPUT ... It uses libcurl. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-01 12:10:45 +02:00
Stefan Weil	47d70d7014	Modernize code for LIST (fix some -Wold-style-cast warnings) - Use C++ type casts - Remove unneeded type cast - Simplify code for function pop - Remove macro push_on (it was only used once) This fixes lots of compiler warnings caused by old type casts.	2019-10-01 11:12:00 +02:00
Stefan Weil	672d67859f	mfoutline: Modernize code - Use C++ enums - Use strongly typed C++11 enum for DIRECTION and optimize struct MFEDGEPT - Use float constant for MF_SCALE_FACTOR - Replace macros by inline functions - Fix documentation comment This fixes several warnings from clang. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-30 21:33:15 +02:00
Stefan Weil	7ec5f0ca02	intmatcher: Avoid conversion from double to float and vice versa This fixes some clang warnings: src/classify/intmatcher.cpp:48:49: warning: implicit conversion loses floating-point precision: 'double' to 'const float' [-Wimplicit-float-conversion] src/classify/intmatcher.cpp:405:34: warning: implicit conversion loses floating-point precision: 'double' to 'float' [-Wimplicit-float-conversion] src/classify/intmatcher.cpp:405:64: warning: implicit conversion increases floating-point precision: 'float' to 'double' [-Wdouble-promotion] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-30 18:05:26 +02:00
Stefan Weil	6d259ebe44	Remove unneeded compare statement (-Wtautological-unsigned-enum-zero-compare) This fixes a clang warning: src/ccstruct/polyblk.cpp:412:12: warning: result of comparison of unsigned enum expression >= 0 is always true [-Wtautological-unsigned-enum-zero-compare] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-29 22:13:27 +02:00
Stefan Weil	49e351508c	Re-add strngs.h to public API It is still needed. This partially reverts commit `a730b5c4ff`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-28 10:34:48 +02:00
Stefan Weil	8ad86d6494	Add missing linker flags for TensorFlow Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-28 09:42:37 +02:00
zdenop	d6aa866430	ignore #pragma optimize for clang-cl	2019-09-27 21:19:37 +02:00
Stefan Weil	74d5ce82a6	Remove vecfuncs.cpp and vecfunc.h Replace the macros which were declared in vecfuncs.h by member functions and move a function which was only used in chop.cpp to that file. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-25 21:20:03 +02:00
Stefan Weil	7bddad59d1	Optimize class ChoiceIterator Re-order a class variable to avoid memory holes and remove unused class variables. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-25 09:43:57 +02:00
Noah Metzger	ff4c1d204d	Fixed minor bug with the Choice iterator when lstm_choice_mode is not active. Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-09-24 15:38:28 +02:00
Stefan Weil	994ec697d8	Remove member functions STRING::string and StringParam::string They were redundant because there exist member functions 'c_str' which do the same. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-23 08:33:08 +02:00
Egor Pugin	1fa7324cf7	Merge pull request #2668 from stweil/api Remove STRING from the public Tesseract API	2019-09-23 01:02:26 +03:00
amitdo	0598879a00	Disable legacy build: Disable bitvec.h	2019-09-22 20:37:13 +02:00
Stefan Weil	a730b5c4ff	Remove STRING from the public Tesseract API Removing STRING from genericvector.h allows eliminating the proprietary STRING data type from the public Tesseract API. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-22 20:32:28 +02:00
Stefan Weil	8cb677d6a2	Replace STRING arguments for LoadDataFromFile and SaveDataToFile This is a step to eliminate the proprietary STRING data type from the public Tesseract API. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-22 20:32:28 +02:00
amitdo	1e13d1d4d5	Disable legacy build: Disable more unneeded code	2019-09-22 20:55:24 +03:00
zdenop	39a63c2837	Merge pull request #2663 from bertsky/fix-lstm-user-patterns fix langdata (user words/patterns) file suffixes for LSTMs:	2019-09-20 15:32:54 +02:00
Stefan Weil	0c7cc5a4dd	Fix CID 1405673 part 2 (Uninitialized members) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-19 19:37:05 +02:00
Robert Schubert	5b976bfb55	fix langdata (user words/patterns) file suffixes for LSTMs: - add another constructor for LSTMRecognizer which takes the language_data_path_prefix configured/selected at runtime and passes it to the internal CCUtil - use this in Tesseract::init_tesseract_lang_data when LSTMs are available (this was missing from `297d7d86ce`)	2019-09-19 19:30:54 +02:00
amitdo	479a7b1ca0	Disabled legacy build: Disable more unneeded code	2019-09-19 19:00:13 +03:00
Stefan Weil	3b030b4aeb	Fix CID 1405673 (Uninitialized members) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-17 22:04:08 +02:00
Stefan Weil	85e8529a2e	Fix CID 1164624 (Uninitialized members) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-17 21:59:42 +02:00
Stefan Weil	b2999d8190	Fix comment for Textord::make_prop_words Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 15:03:45 +02:00
Stefan Weil	256701e2e0	Re-order initialisation in constructor of class ViterbiStateEntry This fixes compiler warnings caused by commit `091ce345f6`: src/wordrec/lm_state.h:100:7: warning: field 'cost' will be initialized after field 'curr_b' [-Wreorder] src/wordrec/lm_state.h:104:7: warning: field 'top_choice_flags' will be initialized after field 'dawg_info' [-Wreorder] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 14:33:32 +02:00
Stefan Weil	081521fb9f	Move initial values for class ColPartition from constructor to header file This fixes compiler warnings caused by commit `5b4565b80b`: src/textord/colpartition.cpp:91:24: warning: field 'last_column_' will be initialized after field 'column_set_' [-Wreorder] src/textord/colpartition.cpp:93:37: warning: field 'inside_table_column_' will be initialized after field 'nearest_neighbor_above_' [-Wreorder] src/textord/colpartition.cpp:95:58: warning: field 'space_to_right_' will be initialized after field 'owns_blobs_' [-Wreorder] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 14:33:32 +02:00
Stefan Weil	8f66020821	Re-order initialisation in constructors of classes Dawg and DawgPosition This fixes compiler warnings caused by commit `ecf0f2dee5`: src/dict/dawg.h:202:9: warning: field 'type_' will be initialized after field 'lang_' [-Wreorder] src/dict/dawg.h:355:9: warning: field 'dawg_index' will be initialized after field 'dawg_ref' [-Wreorder] src/dict/dawg.h:356:9: warning: field 'punc_index' will be initialized after field 'punc_ref' [-Wreorder] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 14:31:32 +02:00
Stefan Weil	b466cead8e	Add more initial values for class Classify from constructor to header file This fixes compiler warnings caused by commit `751fcd2b11`: src/classify/classify.cpp:176:7: warning: field 'EnableLearning' will be initialized after field 'il1_adaption_test' [-Wreorder] src/classify/classify.cpp:187:7: warning: field 'dict_' will be initialized after field 'static_classifier_' [-Wreorder] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 14:31:32 +02:00
Stefan Weil	91b3248af3	Fix CID 1164666 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 22:01:25 +02:00
Stefan Weil	fc6899d898	Fix CID 1164664 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 21:52:51 +02:00
Stefan Weil	930e11996c	Fix CID 1375402 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 21:17:12 +02:00
Stefan Weil	408d6e8b72	simd: Check OSXSAVE bit before calling _xgetbv Both checks are needed for AVX, AVX2 and FMA checks. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 19:35:37 +02:00
Stefan Weil	627faa6f9c	Remove UnicharAmbigs for builds without legacy code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 19:11:30 +02:00
amitdo	2134cd7867	Disabled legacy engine build: Disable code related to ambigs.	2019-09-15 19:11:30 +02:00
Stefan Weil	0c960c3cc5	Fix 1164647 (Uninitialized members) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 14:25:48 +02:00
amitdo	994596842e	'Disabled legacy engine' build: don't include unused header	2019-09-15 12:35:36 +03:00
Egor Pugin	6a9584fbc2	Merge pull request #2650 from stweil/cid Fix several issues reported by Coverity Scan	2019-09-14 21:18:37 +03:00
Stefan Weil	763f4781e8	Fix CID 1164662 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 19:22:56 +02:00
Stefan Weil	6fd58d2897	Fix CID 1164659 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 19:20:14 +02:00
Stefan Weil	c3500e8d95	Fix CID 1164657 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 19:11:02 +02:00
Stefan Weil	1d3ee3b2a7	Fix CID 1164649 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:37:00 +02:00
Stefan Weil	bd1083904d	Fix CID 1164648 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:32:29 +02:00
Stefan Weil	80f367c6f4	Fix CID 1164644 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:26:49 +02:00
Stefan Weil	7caded8e6b	Fix CID 1164643 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:24:26 +02:00
Stefan Weil	3127242bcd	Fix CID 1164638 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:18:15 +02:00
Stefan Weil	06de3075e0	Fix CID 1164636 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:13:06 +02:00
Stefan Weil	052f9ca0bc	Fix CID 1164634, CID 1164635 (Uninitialized pointer field) Remove the unused dummy member variables. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:12:39 +02:00
Stefan Weil	97dda3d535	Fix CID 1386099 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	46f21a4182	Fix CID 1164633 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	9ea579bf1b	Fix CID 1164628 ff (Uninitialized pointer field) and optimize class ParamContent Only one of bIt, dIt, iIt and sIt is used, so put all four in a union. This fixes CID 1164628, CID 1164629, CID 1164630 and CID 1164631. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	74b552fc31	Remove unused FeatureEnabled from FEATURE_DEFS_STRUCT Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	9f709404f9	Fix CID 1164622 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	5b1f0dbd4b	Fix CID 1164620 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	951f442303	Fix CID 1386105 (Logically dead code) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	64fc205e78	Fix CID 1402767 (Invalid type in argument to printf format specifier) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	f62a895f74	Remove unused italic, bold in class BLOCK_RES and class WORD_RES Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 11:53:58 +02:00
Stefan Weil	ceb8af889e	Fix CID 1340276 (Uninitialized scalar field) for class BLOB_CHOICE xgap_before_ and xgap_after_ are never used, so remove them. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 22:15:47 +02:00
Stefan Weil	5fdd32bea8	Fix CID 1366450 (Uninitialized scalar field) for class RecodeBeamSearch secondary_beam_size_ is set but never used, so remove it. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 22:09:03 +02:00
Stefan Weil	737173a84d	Fix CID 1375401 (Uninitialized scalar field) for class Dawg Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 22:03:10 +02:00
Stefan Weil	edba74d64f	Fix CID 1400760 (Uninitialized scalar field) for class BLOCK Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 21:58:05 +02:00
Stefan Weil	8ff321e41a	Fix two issues reported by Coverity Scan and modernize class WERD_RES Report from Coverity Scan: CID 1405560 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR) 2. uninit_member: Non-static class member end is not initialized in this constructor nor in any functions that it calls. CID 1405561 [...] Modernize and optimize class WERD_RES. This not only fixes the issues but also reduces the size and eliminates the functions InitNonPointers and InitPointers. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 21:51:36 +02:00
Stefan Weil	ecf0f2dee5	Optimize classes Trie, Dawg and DawgPosition Reduce size from 368 to 352 bytes for Trie, 72 to 64 bytes for Dawg and 40 to 24 bytes for DawgPosition by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 08:15:01 +02:00
Stefan Weil	efd8dea587	Optimize classes CLIST_ITERATOR, ELIST_ITERATOR, ELIST2_ITERATOR Reduce size from 56 to 48 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 22:03:03 +02:00
Stefan Weil	751fcd2b11	Optimize class Classify Reduce size from 138016 to 13000 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 21:46:55 +02:00
Stefan Weil	0ad08a99b0	Optimize class TFile Reduce size from 24 to 16 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 20:17:05 +02:00
Stefan Weil	5b4565b80b	Optimize class ColPartition Reduce size from 248 to 224 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 20:04:27 +02:00
Stefan Weil	5a12273650	Optimize struct LMConsistencyInfo Reduce size from 104 to 96 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 20:04:27 +02:00
Stefan Weil	091ce345f6	Optimize class ViterbiStateEntry Reduce size from 232 to 216 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 20:04:27 +02:00
Stefan Weil	913cbe6eae	Modernize and optimize BLOBNBOX and remove BLOBNBOX::ConstructionInit The class no longer uses bit fields. Re-ordering the member variables avoids holes and reduces the size of BLOBNBOX from 168 to 152 bytes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 09:07:48 +02:00
Stefan Weil	a922745d9a	tfnetwork: Fix info text Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-11 19:10:25 +02:00
Stefan Weil	5fa09f184f	RecodedCharIDHash: Fix runtime errors detected by UndefinedBehaviorSanitizer Fix this runtime error in recodebeam_test and unicharcompress_test: src/ccutil/unicharcompress.h:84:27: runtime error: left shift of 267 by 28 places cannot be represented in type 'int' code has up to kMaxCodeLen (9) values, so the highest possible value for i is 8, and the shift value can reach 7 * 8 = 56. That requires an uint64_t data type. size_t would fit for 64 bit hosts, but be too small for 32 bit hosts. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-10 15:56:32 +02:00
Stefan Weil	4a2d5a2e8d	OSResults: Fix runtime errors detected by UndefinedBehaviorSanitizer Fix this runtime error in osd_test and textlineprojection_test: src/ccmain/osdetect.cpp:109:14: runtime error: division by zero Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-10 15:56:32 +02:00
Stefan Weil	5c6fade555	BitVector: Fix runtime errors detected by UndefinedBehaviorSanitizer Fix these runtime errors in mastertrainer_test: src/ccutil/bitvector.cpp:119:18: runtime error: null pointer passed as argument 2, which is declared to never be null src/ccutil/bitvector.cpp:124:10: runtime error: null pointer passed as argument 1, which is declared to never be null Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-10 15:56:32 +02:00
zdenop	98c7aaa343	Lstm choice ril (#2635) Lstm choice ril	2019-09-06 19:12:00 +02:00
Stefan Weil	9f32032517	ccutil: Remove old comments There is no CLIST2 in the current code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-05 17:52:42 +02:00
Stefan Weil	b6933a1082	Use type bool for boolean values in class BLOBNBOX Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-03 19:56:59 +02:00
Noah Metzger	c350077b96	Made the lstm_choice mode compatible with the hocr_char_boxes mode Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-09-02 11:09:54 +02:00
Noah Metzger	e8b9c10d07	Clean up lstm_choice_mode and cut it down to 2 modes instead of 4 Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-09-02 11:09:53 +02:00
Stefan Weil	fdf4067296	Fix warnings from LGTM This fixes three LGTM warnings: Multiplication result may overflow 'float' before it is converted to 'double'. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-30 22:04:24 +02:00
Stefan Weil	dc90741f1b	Fix crash when function lookup tables are accessed with NaN Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-30 13:42:09 +02:00
Stefan Weil	7968f50fe6	capi: Add missing PSM_RAW_LINE to TessPageSegMode Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-25 09:08:09 +02:00
zdenop	0ded672067	fix typo	2019-08-18 18:47:32 +02:00
Stefan Weil	00cff79f7f	simd: Check whether the OS supports FMA, AVX, ... Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-16 22:51:17 +02:00
Stefan Weil	43b2e9513b	lstmtrainer: Fix diagnostic message Signed character values must be converted to unsigned integers for %x. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-15 14:31:32 +02:00
Stefan Weil	100d8cd29b	lstmtester: Add missing space in log messages Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-14 14:12:47 +02:00
Stefan Weil	a86251c62b	classify/Makefile: Fix inconsistent style Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-13 21:35:59 +02:00
Egor Pugin	423a188513	Export some classify vars.	2019-08-13 20:12:21 +03:00
Stefan Weil	46e2a0f106	Remove more code for builds with disabled legacy engine Now the Tesseract library no longer includes unused code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-13 17:49:10 +02:00
Egor Pugin	73f713519c	Merge pull request #2614 from stweil/training Move source files which are used for training only to src/training	2019-08-12 19:35:50 +03:00
Stefan Weil	e84cb24def	Move source files which are used for training only to src/training They are moved from src/classify and src/lstm to src/training. This reduces the size of the Tesseract library. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 17:08:08 +02:00
Stefan Weil	ba17bc8204	OpenCL: Add static attribute for kernel_src It is only used in openclwrapper.cpp. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 15:13:45 +02:00
Stefan Weil	970622fbd1	Remove unused functions create_edges_window, draw_raw_edge Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 15:04:10 +02:00
Stefan Weil	23e605911f	Remove unused function truncate_path and related files Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 14:48:56 +02:00
Stefan Weil	bce585286d	Remove global array kPolyBlockNames from Tesseract library It is only used in unittest/layout_test.cc after moving a test from baseapi_test.cc to that file, so it can be made local. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 14:33:55 +02:00
Stefan Weil	beec85e023	Remove UNICHARSET::load_from_inmemory_file and related code The method was only used in unittest where it can be replaced by UNICHARSET::load_from_file which also simplifies the code. This allows removing the class InMemoryFilePointer and fixes a TODO. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 13:07:15 +02:00
Stefan Weil	315dd9df3f	cmake: Don't link pthread on Windows Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-07 15:24:00 +02:00
Stefan Weil	b8079d8ce1	universalambigs: Add hack to fix builds with Microsoft compiler The MS compiler only accepts string constants up to 65535 characters, so shorten the string for that compiler to fix the compilation. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-06 15:46:07 +02:00
Zdenko Podobný	c5a50b93ce	move fileio.cpp and fileio.h to training (this fix android build)	2019-08-04 21:26:39 +02:00
Stefan Weil	6acab45837	universalambigs: Replace octal characters by UTF-8 string This improves readability and reduces the file size. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-04 19:21:59 +02:00
Stefan Weil	8127b4dd27	Clean ambigs.h * Remove unused kUnigramAmbigsBufferSize and kAmbigNgramSeparator * Move some declarations to ambigs.cpp Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-04 19:21:59 +02:00
Stefan Weil	23ef93ac4d	cmake: Add missing pthread library It is needed for C++ threads since commit `85068be405`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-26 07:45:51 +02:00
Stefan Weil	e6ca7f3ec6	hocrrenderer: Add missing escaping of special characters in HTML output This converts special character like '<' or '>' to the correct HTML entities. Optimize also the code a little bit. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-19 13:53:36 +02:00
Stefan Weil	2679cae5d8	Simplify code by using ClipToRange Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-19 13:37:39 +02:00
Stefan Weil	4b2927ae41	LSTMRecognizer: Add non const get functions This allows removing several const casts. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-18 11:26:51 +02:00
Stefan Weil	4cb3f34c09	Improve formatting of hOCR output with character boxes Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-18 11:07:18 +02:00
Stefan Weil	9195a904a7	Use auto data type for results of std::ftell Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-18 10:56:17 +02:00
Stefan Weil	4132194c49	Remove unused filesize_ from class InputBuffer This also simplifies the constructors. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-18 10:48:27 +02:00
Stefan Weil	a2b13b49ff	Simplify shell code (fixes warning from Codacy) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-17 21:33:24 +02:00
Stefan Weil	d4e0ab3014	Use long instead of off_t for result from ftell Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-17 21:14:42 +02:00
Stefan Weil	467f8f4140	Fix training script for macOS (issue #2578) Bash on macOS does not support "|&": tesstrain_utils.sh: line 80: syntax error near unexpected token `&' Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-17 17:18:44 +02:00
Stefan Weil	f92181561c	Fix some compiler warnings (unused local variables) gcc warnings: src/classify/protos.cpp:85:7: warning: unused variable ‘i’ [-Wunused-variable] src/classify/protos.cpp:86:7: warning: unused variable ‘Bit’ [-Wunused-variable] src/classify/protos.cpp:89:14: warning: unused variable ‘Config’ [-Wunused-variable] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-17 07:47:28 +02:00
Stefan Weil	a419f2d78b	Modernize BIT_VECTOR a little bit This removes one more user of Emalloc / Efree. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-16 22:09:08 +02:00
zdenop	c8374cc528	Merge pull request #2576 from noahmetzger/LSTMChoiceRIL Implemented improved character bounding box algorithm	2019-07-16 12:25:17 +02:00
zdenop	f4925077e8	Merge pull request #2574 from stweil/fix classify: Use fixed size bit vector	2019-07-16 12:22:48 +02:00
zdenop	cb5c78be7d	Merge pull request #2572 from adaptech-cz/wordBoundsOn2ndPass Give word's bounds to callback also during second pass	2019-07-16 12:19:31 +02:00
Noah Metzger	3a5e508934	Implemented improved bounding box algorithm Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-07-16 11:38:50 +02:00
Stefan Weil	028fff6edd	classify: Use fixed size bit vector The vector was already limited to MAX_NUM_PROTOS (512) entries or 64 bytes in the old code. Now it uses that size right from the start which avoids reallocating it later when entries are added. The old code which reallocated the vector to expand it was buggy because the realloc function can return a different pointer, but the code still used the original pointer to reset the new bits. Function ExpandBitVector is now unused and therefore removed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-16 10:18:11 +02:00
Robert Pösel	f99fcd7691	Give word's bounds to callback also during second pass	2019-07-16 09:11:06 +02:00
Stefan Weil	5bbb7f59a6	Remove structures.* It only provided the functions new_cell, free_cell which could be replaced by new, delete. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-16 07:03:52 +02:00
Stefan Weil	3621272051	Remove cutil_class.* It is no longer needed since commit `4523ce9f7d`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-16 07:03:52 +02:00
Stefan Weil	ea462b2c03	Remove unused functions reverse16, reverse32 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 21:50:46 +02:00
Stefan Weil	c8cb925813	Replace non portable sleep by std::this_thread::sleep_for Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 16:00:07 +02:00
Stefan Weil	fcfdb7e56f	Remove unused include statements Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 14:48:31 +02:00
Stefan Weil	ba0c55adc5	svutil: Remove SVSync::StartThread and SVSync::ExitThread Both are unused now. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 14:30:51 +02:00
Stefan Weil	85068be405	lstmtester: Replace SVSync::StartThread by std::thread Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 14:30:51 +02:00
Stefan Weil	43a281893f	scrollview: Replace SVSync::StartThread by std::thread Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 14:30:51 +02:00
Stefan Weil	a6d723bf10	Replace SVSync::StartThread by std::thread and use std::this_thread::yield Using yield instead of a sleep makes running imagedata_test much faster. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 14:30:51 +02:00
Stefan Weil	13bb4623b1	Use std::lock_guard to protect a code block This is simpler than using lock() / unlock() explicitly. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 12:01:28 +02:00
Stefan Weil	93427391c1	Replace SVAutoLock by std::lock_guard Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 12:01:28 +02:00
Stefan Weil	c0b8ee3b82	Replace CCUtilMutex by std::mutex Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 12:01:28 +02:00
Stefan Weil	36026e3c35	Replace SVMutex by std::mutex Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-15 12:01:28 +02:00
zdenop	56d4fdce00	Merge pull request #2554 from noahmetzger/LSTMChoiceRIL Improved lstm_choice_mode	2019-07-15 11:51:52 +02:00
Noah Metzger	2dd5d0d60a	Fixed a bug when first decode iteration stays empty and added some comments. Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-07-15 10:05:22 +02:00
Stefan Weil	61eab60fe3	arch: Reduce number of include files for dot product functions dotproductavx.h and dotproductsse.h declared only two functions. Move those declarations to dotproduct.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-12 23:18:00 +02:00
Stefan Weil	2d5b166876	Add dot product implementation for Intel FMA (double = tessdata_best) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-12 23:18:00 +02:00
Stefan Weil	9259ed8f26	Optimize tprintf implementation It no longer uses a local buffer, so it needs less memory and no mutex. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-10 20:59:07 +02:00
Stefan Weil	2aebd10fb7	FPRow: Add missing initialisation for scalar (CID 1402754) Modernize the code also a little bit. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-10 17:15:55 +02:00
Stefan Weil	bdc7abf518	Fix format strings for size_t arguments (CID 1402762, 1402767) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-10 16:57:19 +02:00
Noah Metzger	11a4cd298b	Added parameters for the LSTM CTC Choice mode Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-07-10 16:34:41 +02:00
Noah Metzger	f2d685a90f	Added CTC-based Symbolchoices. Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-07-10 16:34:41 +02:00
Stefan Weil	ee04347347	Fix format string for 64 bit integer (CID 1402986) Commit `c1264c189e` was not the right fix. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-10 16:20:50 +02:00
Stefan Weil	890b810a9e	tfnetwork: Add missing return statement (CID 1402992) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-10 08:20:52 +02:00
Egor Pugin	3b6f071ee8	Implement CMake+SW build. Currently only Windows is supported. You could try it as following: mkdir build_sw && cd build_sw && cmake .. -DSW_BUILD=1	2019-07-08 18:50:30 +03:00
Egor Pugin	84ffcc0d38	Merge pull request #2548 from zhuangzhuang/fix_tesstrain_py_error fix tesstrain.py error	2019-07-08 11:25:41 +03:00
zhuangzhuang1988	18c67f4989	fix tesstrain.py error	2019-07-08 14:35:17 +08:00
zhuangzhuang	9eb997fc0b	fix windows stdout messy code (#2546) * fix windows stdout messy code * fix type name error * remove unnecessary codepoint check.	2019-07-08 09:33:53 +03:00
Stefan Weil	d653bb61f3	genericvector: Remove redundant declarations tesseract::FileReader and tesseract::FileWriter are already declared in serialis.h which is included by genericvector.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-05 18:47:15 +02:00
Dmitry Bely	74145f0686	Fix crash in Tesseract::classify_word_and_language() when tessedit_timing_debug is enabled	2019-07-05 12:36:25 +02:00
zdenop	01535706ec	Merge pull request #2539 from stweil/tesscallback Replace tesscallback.h and related proprietary data types by C++-11 functionals	2019-07-05 10:52:06 +02:00
Stefan Weil	134eb39960	Remove tesscallback.h It is no longer used. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	3bae459823	Use C++-11 code instead of TessCallback for WERD_RES::ConditionalBlobMerge Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	e61c828dcd	Use C++-11 code instead of TessCallback for UNICHARSET::load_via_fgets Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	0ea8ada308	Use C++-11 code instead of TessCallback for WidthCallback Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	1c1eb76c36	Use C++-11 code instead of TessCallback for Dawg::iterate_words Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	3fb15b3891	Use C++-11 code instead of TessCallback for ObjectCache::Get Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	56d8210909	Use C++-11 code instead of TessCallback for TruthCallback Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	c33b05be55	Use C++-11 code instead of TessCallback for PointerVector::compact Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	cc0405298b	Use C++-11 code instead of TessCallback for read, write Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	242e1db7fa	Use C++-11 code instead of TessCallback for function set_compare_callback Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	ffd8101986	Use C++-11 code instead of TessCallback for function set_clear_callback Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	ded24d0367	ccmain: Use C++-11 code instead of TessCallback1 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	eeec9c66d4	training: Use C++-11 code for TestCallback This allows removing more code from tesscallback.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 16:03:30 +02:00
Stefan Weil	201ba0dd53	Fix handling of single pages from multipage TIFF files (issue #2537) That case now uses Leptonica to deliver the desired image instead of using an inefficient loop in the Tesseract code. See commit `54fafc4e2e` which used similar code in the past. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 15:56:57 +02:00
Stefan Weil	f1c6564cd7	Revert "fix read wrong tiff page." This reverts commit `75d230a7ac`. That commit introduced new problems (memory leak, potential endless loop) and style issues. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 15:44:07 +02:00
Stefan Weil	fd001c3ab9	Fix linker error with disabled legacy engine (issue #2532) Commit `3871caae86` introduced a build regression when the legacy engine was disabled. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-07-04 13:47:38 +02:00
zhuangzhuang1988	75d230a7ac	fix read wrong tiff page.	2019-07-04 12:32:18 +08:00
zhuangzhuang1988	4d4c16bce1	fix start ScrollView.jar failed when lstmtraining	2019-07-03 07:27:50 +02:00
zhuangzhuang1988	99cb088708	close log file handle before move it.	2019-07-01 10:53:12 +08:00
zhuangzhuang1988	a3a361f73d	fix logger file encoding error.	2019-06-28 18:29:52 +08:00
Stefan Weil	5895534b5e	Update enum from unicode/uchar.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-25 10:55:33 +02:00
Stefan Weil	c1264c189e	Fix format string for 64 bit integer This fixes also a warning from gcc. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-23 09:31:09 +02:00
Stefan Weil	dfd35d3e27	baseapi: Remove old code The workaround is no longer needed because _splitpath and _MAX_FNAME were removed in commit `cc0d87c5b8`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-23 09:15:32 +02:00
Stefan Weil	dd261e8d42	Replace code using _splitpath_s (win32) That simplifies the code and removes a dependency on "newer" versions of Windows. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-23 09:15:15 +02:00
Stefan Weil	f522b039a5	Remove outdated comment Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 21:03:19 +02:00
Stefan Weil	ea20bf0373	Remove dummy code from LSTMTrainer::InitTensorFlowNetwork Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 21:01:40 +02:00
Stefan Weil	41f91c96c8	cmake: Build training tools also on Linux and macOS This enables CI tests for the code in src/training on Linux and macOS. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 20:27:56 +02:00
Egor Pugin	ab28a03e93	Merge pull request #2514 from stweil/tessresultcallback Move LSTMTrainer from libtesseract to libtesseract_training	2019-06-22 18:34:49 +03:00
Stefan Weil	df98bb7368	Move LSTMTrainer from libtesseract to libtesseract_training LSTMTrainer is only used for training, so the shared library for Tesseract can be made smaller. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 16:23:51 +02:00
Stefan Weil	cb2957b3d2	Replace callback by direct function calls in TessBaseAPI::GetComponentImages The new code avoids dynamic memory allocation, uses faster function calls and allows removing more code from tesscallback.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 14:54:31 +02:00
Stefan Weil	3159f42257	Remove unused GenericVector::dot_product Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 12:59:21 +02:00
Stefan Weil	bef73d9956	Remove unused GenericVector::compact Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 12:59:08 +02:00
Egor Pugin	3c6a04ea1a	Merge pull request #2512 from stweil/tessresultcallback Simplify class LSTMTrainer	2019-06-22 13:41:21 +03:00
Stefan Weil	2a9b2fb32a	Remove wrong description for GenericVector::set_compare_callback and simplify code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 11:22:07 +02:00
Stefan Weil	bd13069fe8	Simplify class LSTMTrainer The function pointers and callbacks file_reader_, file_writer_, checkpointer_reader_ and checkpoint_writer_ are always set to the same values. Replacing them by direct function calls simplifies the code and allows removing more code from tesscallback.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-22 09:18:13 +02:00
Stefan Weil	3871caae86	Simplify indirect call of LMPainPoints::GeneratePainPoint It does neither need a temporary TessResultCallback2 nor the function LMPainPoints::GenerateForBlamer. This also allows removing more code from tesscallback.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-21 17:09:33 +02:00
zdenop	60b4c68d31	tesstrain_utils.sh: remove redundant code	2019-06-20 18:42:29 +02:00
Stefan Weil	5f23290655	tesscallback: Remove more unused code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-20 08:38:00 +02:00
Stefan Weil	2c78735d97	ocrfeatures: Remove locally used functions from global interface ReadFeature, WriteFeature are only used locally. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-17 15:09:39 +02:00
zdenop	a3593d994b	Merge pull request #2499 from stweil/embedded Remove code for embedded build	2019-06-17 10:24:45 +02:00
Stefan Weil	674d6a90d8	Remove code for embedded build That code is unrelated to Tesseract and can be easily implemented by external projects which require it. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-06-17 09:55:33 +02:00
zdenop	60aee9f821	create OUTPUT_DIR if it did not exist; fixes #2497	2019-06-16 15:07:16 +02:00
zdenop	fad96db497	Merge pull request #2494 from Shreeshrii/master Allow saving of box/tiff pairs during legacy tesseract training	2019-06-14 20:44:49 +02:00
Shree	6fa4587949	Allow saving of box/tiff pairs during base tesseract training	2019-06-14 09:35:39 +00:00
Shree	45cdf741ae	Allow saving of box/tiff pairs during base tesseract training	2019-06-14 09:32:41 +00:00
Shree	832c6edb97	Allow saving of box/tiff pairs during base tesseract training	2019-06-14 09:25:54 +00:00
James R. Barlow	a9890afd12	Fix text2image compilation on C++17 compilers C++17 drops support for `std::random_shuffle`, breaking C++17 compilers that try to compile text2image.cpp. std::shuffle is valid on C++11 through C++17, so use std::shuffle instead. Due to the use of `std::random_shuffle`, `text2image --render_ngrams` would not give consistent results for different compilers or platforms. With the current change, the same random number generator is used for all platforms and initialized to the same seed, so training output should be consistent.	2019-06-13 16:07:20 -07:00
Stefan Weil	fefd521a49	Add dot product implementation using std::inner_product Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-31 12:07:17 +02:00
Stefan Weil	e0c2f0a782	Fix crash in PreloadRenderers with nullptr outputbase The crash could be triggered by a wrong command line. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-29 07:46:29 +02:00
Stefan Weil	9a4bd041c8	Fix build for unittests Commit `29f2cff203` was the wrong fix for the compiler warnings because it broke the unittest build. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 21:36:34 +02:00
Stefan Weil	2c23e7ead5	scanedg: Add const attributes Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 20:27:21 +02:00
Stefan Weil	4b3bbd908a	Remove EXTERN macro Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 20:27:21 +02:00
Stefan Weil	ac999b2409	Remove unused macros This fixes compiler warnings from clang++ like these ones: src/ccutil/params.cpp:34:9: warning: macro is not used [-Wunused-macros] src/cutil/oldlist.cpp:67:9: warning: macro is not used [-Wunused-macros] src/cutil/oldlist.cpp:68:9: warning: macro is not used [-Wunused-macros] src/cutil/oldlist.cpp:78:9: warning: macro is not used [-Wunused-macros] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 20:27:21 +02:00
Stefan Weil	8c8eb21bc5	Fix compiler errors for old gcc Travis CI with gcc 4.8 failed with errors. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 15:38:40 +02:00
Stefan Weil	a86143a41d	Remove some unused functions, constants and variables This fixes compiler warnings, for example: src/ccutil/strngs.cpp:36:11: warning: unused variable 'kMaxDoubleSize' [-Wunused-const-variable] src/viewer/svutil.cpp:320:13: warning: unused function 'TessFreeAddrInfo' [-Wunused-function] src/ccstruct/werd.cpp:32:19: warning: unused variable 'CANT_SCALE_EDGESTEPS' [-Wunused-const-variable] src/textord/bbgrid.cpp:103:10: warning: unused variable 'old_tright' [-Wunused-variable] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 11:52:31 +02:00
Stefan Weil	29f2cff203	training: Add missing static attributes That fixes several warnings from clang++ like the following one: src/training/combine_lang_model.cpp:36:1: warning: no previous extern declaration for non-static variable 'FLAGS_lang_is_rtl' [-Wmissing-variable-declarations] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 11:33:52 +02:00
Stefan Weil	a139d553a7	training: Move declarations from cpp files to h file That fixes several warnings from clang++ like the following one: src/training/commontraining.cpp:95:1: warning: no previous extern declaration for non-static variable 'FLAGS_D' [-Wmissing-variable-declarations] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 08:53:09 +02:00
Stefan Weil	389285010c	featdefs: Add missing include statement It is needed for PicoFeatureLength. This fixes a compiler warning. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 08:53:09 +02:00
Stefan Weil	4bec4a69a0	Add missing static attributes This fixes lots of compiler warnings like these ones: src/api/baseapi.cpp:113:13: warning: no previous extern declaration for non-static variable 'kInputFile' [-Wmissing-variable-declarations] src/api/baseapi.cpp:117:13: warning: no previous extern declaration for non-static variable 'kOldVarsFile' [-Wmissing-variable-declarations] src/api/baseapi.cpp:97:10: warning: no previous extern declaration for non-static variable 'stream_filelist' [-Wmissing-variable-declarations] src/ccmain/equationdetect.cpp:46:10: warning: no previous extern declaration for non-static variable 'equationdetect_save_bi_image' [-Wmissing-variable-declarations] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 08:53:09 +02:00
Stefan Weil	7e7811ff92	bits16: Modernize code This also fixes warnings like the following one from clang++: src/ccmain/pgedit.cpp:114:15: warning: declaration requires a global constructor [-Wglobal-constructors] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-26 08:53:08 +02:00
Stefan Weil	334d9b4633	unicodes: Optimize code by using constexpr and removing unused globals Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-25 14:51:28 +02:00
Stefan Weil	23d05a5e1b	featdefs: Optimize code by using constexpr This also fixes some warnings from clang++: src/classify/featdefs.cpp:47:15: warning: declaration requires a global constructor [-Wglobal-constructors] src/classify/featdefs.cpp:57:15: warning: declaration requires a global constructor [-Wglobal-constructors] src/classify/featdefs.cpp:66:15: warning: declaration requires a global constructor [-Wglobal-constructors] src/classify/featdefs.cpp:75:15: warning: declaration requires a global constructor [-Wglobal-constructors] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-25 14:46:36 +02:00
Stefan Weil	7628112273	Fix broken build for Leptonica < 1.77 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-25 14:23:43 +02:00
Stefan Weil	55901a480f	Remove classify/cutoffs.h It only defined CLASS_CUTOFF_ARRAY and some unused definitions. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-25 13:54:44 +02:00
zdenop	82458db630	Merge branch 'master' of https://github.com/tesseract-ocr/tesseract	2019-05-25 11:14:28 +02:00
zdenop	539673b503	fix '--enable-visibility' build	2019-05-25 11:13:33 +02:00
zdenop	8de022ab1c	Merge pull request #2461 from stweil/tensorflow Support build with Tensorflow	2019-05-25 10:52:37 +02:00
Stefan Weil	32dcfd06ba	Replace Tensorflow by TensorFlow The name is written in camel case, see https://www.tensorflow.org/. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-24 17:14:28 +02:00
Stefan Weil	2441e4d8ac	Implement check for Tensorflow header file This looks for one of the header files which are included by Tesseract. It currently uses a hard coded path which works for Debian / Ubuntu. Simplify also the rules for linking Tensorflow. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-24 16:52:14 +02:00
Stefan Weil	9cdf041448	Remove "third_party/" in comments and update path names Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-24 14:12:52 +02:00
Stefan Weil	4382ab1a34	Support build with Tensorflow It expects include files in /usr/include/tensorflow. * Add configure option --with-tensorflow (disabled by default) * Fix data type tensorflow::int64 * Remove "third_party/" in include statements * Add dummy implementations for Backward and DebugWeights in TFNetwork * Add files generated with protoc from tfnetwork.proto (so the Tensorflow sources are not needed for the build) * Update Makefiles Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-24 14:11:31 +02:00
Zdenko Podobný	294f548ac1	fix missing tiff format	2019-05-24 10:39:17 +02:00
Stefan Weil	3f74da5da9	lstmtrainer: Set constant kLearningRateDecay at compile time sqrt(0.5) = 1 / sqrt(2) can be replaced by the macro M_SQRT1_2. This also fixes a compiler warning: src/lstm/lstmtrainer.cpp:51:14: warning: declaration requires a global constructor [-Wglobal-constructors] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-23 15:01:23 +02:00
zdenop	4bab7dd83d	Merge pull request #2451 from Bharat123rox/lgtm Some LGTM alert fixes and potential bugfixes	2019-05-22 12:19:44 +02:00
Egor Pugin	fea1f3970b	Merge pull request #2452 from stweil/tprintf tprintf: Make code reentrant and use less memory	2019-05-22 12:31:34 +03:00
Egor Pugin	8f99880a7a	Merge pull request #2453 from stweil/crashcode Remove SavePixForCrash and related code	2019-05-22 12:30:29 +03:00
Bharat123rox	bc3ea622a6	Fix bug in max_max_dist	2019-05-22 08:21:30 +02:00
Bharat123rox	0bf45e81e7	Fix LGTM and revert bugfix for later PR	2019-05-22 11:23:27 +05:30
Bharat123rox	945ccac85a	Fix syntax error	2019-05-22 10:10:12 +05:30
Stefan Weil	6514479e8f	Remove SavePixForCrash and related code That debugging code uses very much memory and is no longer useful. text data bss dec hex filename 815 0 262144 262959 4032f src/ccutil/globaloc.o Remove also the function err_exit which was only used in ccmain/reject.cpp. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-21 20:25:58 +02:00
Stefan Weil	078a129674	tprintf: Make code reentrant and use less memory Reduce the maximum message size from 64 KiB to 2 KiB which still should be large enough for trace messages. Create the smaller message on the stack instead of using a global array to allow reentrancy and to reduce the memory use of Tesseract. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-21 20:22:58 +02:00
Bharat123rox	7f31a0634d	Some LGTM fixes and potential bugfixes	2019-05-21 23:24:50 +05:30
Stefan Weil	d2ca81e794	Remove local definition of M_PI It is defined for all platforms when math.h or cmath is included after defining the macro _USE_MATH_DEFINES. Define _USE_MATH_DEFINES before any include statement to make sure that M_PI gets defined. It is not necessary to define it conditionally only for Windows. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-20 21:18:52 +02:00
Stefan Weil	64bdceee69	Fix compiler warnings This fixes lots of warnings related to ERRCODE like the following one: src/ccutil/errcode.h:81:15: warning: declaration requires a global constructor [-Wglobal-constructors] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-19 22:10:22 +02:00
Stefan Weil	09edd1a604	Fix out-of-bounds writes in Classify::ReadNewCutoffs The function did not correctly read Chinese unichars into the local Class variable if the locale was set to de_DE.UTF-8 (or other incompatible locales). That resulted in a wrong ClassId which was used to write into the Cutoffs array without checking for valid bounds. On macOS the result was a runtime error in baseapi_test (see GitHub issue #1250): [ RUN ] TesseractTest.InitConfigOnlyTest baseapi_test(21845,0x1134c45c0) malloc: *** error for object 0x927f96c28005e0: pointer being freed was not allocated baseapi_test(21845,0x1134c45c0) malloc: *** set a breakpoint in malloc_error_break to debug Replacing sscanf by std::istringstream fixes that. Add also an assertion to catch future out-of-bounds writes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-18 13:39:55 +02:00
zdenop	7e9d2f4bc4	Merge pull request #2432 from nickjwhite/hocrmoretypes Add different classes to hocr output depending on BlockType	2019-05-16 17:02:48 +02:00
Stefan Weil	331cc84d8d	Remove assertions for unsupported locale settings The latest code passed all unittests with locale de_DE.UTF-8 and has fixed the locale issues which were reported on GitHub. Therefore the assertions can be removed. Any remaining locale issue will be fixed when it is identified. To help finding such remaining issues, debug code now uses the user's locale settings instead of the default "C" locale for all executables which use TessBaseAPI. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 13:59:39 +02:00
Stefan Weil	77f9bad3c2	Fix UNICHARSET::save_to_string for locale de_DE.UTF-8 That function writes float values which must always use '.' as the decimal separator, no matter what the current locale setting is. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 11:39:59 +02:00
Stefan Weil	36ed6da349	Fix baseapi_test with locale de_DE.UTF-8 The unittest failed with LANG=de_DE.UTF-8: $ unittest/baseapi_test Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc [==========] Running 12 tests from 2 test suites. [----------] Global test environment set-up. [----------] 10 tests from TesseractTest [ RUN ] TesseractTest.ArraySizeTest [ OK ] TesseractTest.ArraySizeTest (0 ms) [ RUN ] TesseractTest.BasicTesseractTest [ OK ] TesseractTest.BasicTesseractTest (1251 ms) [ RUN ] TesseractTest.IteratesParagraphsEvenIfNotDetected [ OK ] TesseractTest.IteratesParagraphsEvenIfNotDetected (347 ms) [ RUN ] TesseractTest.HOCRWorksWithoutSetInputName [ OK ] TesseractTest.HOCRWorksWithoutSetInputName (403 ms) [ RUN ] TesseractTest.HOCRContainsBaseline [ OK ] TesseractTest.HOCRContainsBaseline (389 ms) [ RUN ] TesseractTest.RickSnyderNotFuckSnyder [ OK ] TesseractTest.RickSnyderNotFuckSnyder (346 ms) [ RUN ] TesseractTest.AdaptToWordStrTest Trying to adapt "136 " to "1 3 6" Trying to adapt "256 " to "2 5 6" Trying to adapt "410 " to "4 1 0" Trying to adapt "432 " to "4 3 2" Trying to adapt "540 " to "5 4 0" Trying to adapt "692 " to "6 9 2" Trying to adapt "779 " to "7 7 9" Trying to adapt "793 " to "7 9 3" Trying to adapt "808 " to "8 0 8" Trying to adapt "815 " to "8 1 5" Trying to adapt "12 " to "1 2" Trying to adapt "12 " to "1 2" [ OK ] TesseractTest.AdaptToWordStrTest (788 ms) [ RUN ] TesseractTest.BasicLSTMTest [ OK ] TesseractTest.BasicLSTMTest (4525 ms) [ RUN ] TesseractTest.LSTMGeometryTest [ OK ] TesseractTest.LSTMGeometryTest (615 ms) [ RUN ] TesseractTest.InitConfigOnlyTest Error: unichar ? in normproto file is not in unichar set. Error: unichar 0.232621 in normproto file is not in unichar set. Error: unichar 0.000400 in normproto file is not in unichar set. Error: unichar 0.231864 in normproto file is not in unichar set. [...] Error: unichar ? in normproto file is not in unichar set. Error: unichar 0.233915 in normproto file is not in unichar set. Error: unichar 0.000400 in normproto file is not in unichar set. Error: unichar 0.221755 in normproto file is not in unichar set. Error: unichar 0.000400 in normproto file is not in unichar set. Error: unichar ? in normproto file is not in unichar set. baseapi_test(21845,0x1134c45c0) malloc: *** error for object 0x927f96c28005e0: pointer being freed was not allocated baseapi_test(21845,0x1134c45c0) malloc: *** set a breakpoint in malloc_error_break to debug [INFO] Lang eng took 327ms in regular init [INFO] Lang chi_tra took 1422ms in regular init Abort trap: 6 TesseractTest.InitConfigOnlyTest is fixed by using std::istringstream instead of sscanf. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-16 11:05:09 +02:00
Stefan Weil	0dcc889e8d	Fix apiexample_test with locale de_DE.UTF-8 The unittest failed with LANG=de_DE.UTF-8: $ unittest/apiexample_test Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc [==========] Running 4 tests from 2 test suites. [----------] Global test environment set-up. [----------] 1 test from EuroText [ RUN ] EuroText.FastLatinOCR contains_unichar_id(unichar_id):Error:Assert failed:in file ../../../../../src/ccutil/unicharset.h, line 874 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-15 22:43:47 +02:00
Stefan Weil	6b1e709b19	Fix Doxygen comments for void functions Void functions should not use @return. It causes compiler warnings like this one: src/classify/intproto.cpp:326:5: warning: '@return' command used in a comment that is attached to a function returning void [-Wdocumentation] Some non-void functions also were documented with @return none. Fix those comments, too. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-14 21:57:17 +02:00
Stefan Weil	caa04882fd	normmatch: Remove unused private function PrintNormMatch was unused. Remove it and remove also an unused prototype. Make the only remaining private function NormEvidenceOf static. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-05-14 20:56:04 +02:00
Nick White	068eb4c35d	Add different classes to hocr output depending on BlockType These classes are taken from the hOCR specification, and seem to map well onto the BlockType types. There are probably more that could be added.	2019-05-14 13:25:08 +01:00

1395 Commits