tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-05 10:49:01 +08:00

Author	SHA1	Message	Date
Robin Watts	3408c36eab	Guard #include "config_auto.h" with HAVE_CONFIG_H. Every other file already does this.	2020-05-15 19:29:03 +01:00
Robin Watts	43437a540b	Fix OEM_DEFAULT in DISABLED_LEGACY_ENGINE builds. If api->Init is called with OEM_DEFAULT in DISABLED_LEGACY_ENGINE build modes, the engine mode is never set, resulting in no words being found.	2020-05-15 14:56:41 +01:00
Julian Gilbey	e7e6999d3b	Move comment about swap meaning for DeSerialize to correct function	2020-05-13 07:02:59 +01:00
Robin Watts	27d513462c	Avoid using PACKAGE_VERSION in favour of TESSERACT_VERSION_STR. This means the sources compile perfectly in the absence of config_auto.h/HAVE_CONFIG_H as they were intended to do. TESSERACT_VERSION_STR is set to be precisely PACKAGE_VERSION by autoconf, so there are no actual changes in compiled code.	2020-05-12 21:45:12 +02:00
Stefan Weil	39f7fb4a1a	Allow line images with larger width (depending on height) Training with normalized line images higher than 36 px also results in larger widths. The limit should therefore depend on the height used for the normalization. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-12 16:59:31 +02:00
Stefan Weil	34bdc8b74e	Allow line images with larger width Line images can be larger than the old limit, especially when training is made with newspaper lines. Image too large to learn!! Size = 2641x36 Image too large to learn!! Size = 2704x36 Image too large to learn!! Size = 2751x36 Image too large to learn!! Size = 3738x36 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-12 16:50:40 +02:00
Julian Gilbey	ca5735efcb	Destroy box before potentially exiting function	2020-05-12 15:25:16 +01:00
Stefan Weil	d3a0768c32	Merge pull request #2975 from robinwatts/pushback5 Tweak architecture specific SIMD files for ease of compilation	2020-05-12 14:55:32 +02:00
Robin Watts	a9b44ee8c2	Tweak architecture specific SIMD files for ease of compilation. This won't affect anything using the supplied build system. For other projects that include tesseract within them, however, this may make their life easier. For example, I have an integration of Tesseract with Ghostscript, in which tesseract is built as part of the Ghostscript build, without using the tesseract build system. The Ghostscript build system is makefile based, and has to work on a range of make systems, including unix make, gnu make and nmake. As such we have to avoid conditionals in the common makefiles. It therefore becomes hard to build one set of files on x86 systems, and another on (say) ARM systems. Accordingly, this commit makes small tweaks to the architecture specific files, so that they compile on EVERY platform; just they only compile to anything useful on the appropriate platform. Thus the makefiles can build all the files on all the systems, and the preprocessor flags mean that the correct functions are actually built.	2020-05-12 13:09:29 +01:00
Egor Pugin	0eaabc42c7	Update CMakeLists.txt	2020-05-12 11:49:15 +03:00
Egor Pugin	e720a26745	[cmake] Set inactivity timeout during icu download to 300 seconds. Fixes #2972.	2020-05-09 18:55:45 +03:00
Robin Watts	80d4af6ecf	Add a mechanism to avoid creating debug fonts. If TESSERACT_DISABLE_DEBUG_FONTS is defined, tesseract doesn't attempt to create any debug fonts. This not only saves memory, but it (combined with the change to optionally use Pix as internal storage for the ImageData) allows us to use an embedded Leptonica library with no format handlers at all.	2020-05-05 00:22:23 +01:00
Robin Watts	6bcb941bcf	Avoid tesseract writing Pix out/reading them back. By default, when we ImageData::SetPix, we write the data out as a PNG, just to read it back in to get a compressed buffer of data. We then use this to generate a new Pix. In builds of Tesseract on systems where we don't have temp files, writing files out is problematic. Not only that, but compressing/uncompressing is slow, and on minimal builds of leptonica, where we've disabled the format writers to reduce memory footprint, we get no compression anyway. In such cases, it'd be far nicer just to keep the original Pix as the internal data. Also, when recovering the pixmap from the ImageData, if we know we're only going to read from the data, we can avoid duplicating it and just use the original. This is exactly the case when GRAPHICS_DISABLED is set. So, introduce a TESSERACT_IMAGEDATA_AS_PIX predefine that we can use to cause the internal data to be a Pix rather than a compressed buffer. Given we don't do compression, and they were writing to memory, this was all just more effort than we needed. Also, if we're using GRAPHICS_DISABLED, we might as well just pixCopy rather than pixClone as only the scaler uses this.	2020-05-04 21:01:22 +01:00
Amit D	acc4c8bff5	Merge pull request #2952 from jannick0/patch-1 [trie.h] pattern definition: fix documentation	2020-04-27 23:44:48 +03:00
Stefan Weil	1188e0a516	Remove old code which was used for Ocropus Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-04-27 16:33:34 +02:00
jannick0	e044163085	[trie.h] pattern definition: fix documentation The fix makes the definition of `\n` consistent with the examples given below the definition. Please note that I did not check this against how it is implemented in the code.	2020-04-19 13:47:42 +02:00
Stefan Weil	4a00b68c63	Fix lambda function for curl code errors Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-03-18 20:46:52 +01:00
Stefan Weil	9f5a3f6ac7	Fix uninitialized local variable in curl code Compiler warning: src/api/baseapi.cpp:1151:27: warning: variable 'curlcode' is uninitialized when used here [-Wuninitialized] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-03-18 19:25:33 +01:00
zdenop	6e307074d8	Merge pull request #2894 from stweil/curl Report errors from curl_easy functions	2020-03-18 14:14:07 +01:00
Stefan Weil	ef4f99a994	Run xgetbv instruction only on machines which support it This fixes a regression for older Intel processors. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-03-08 17:32:10 +01:00
Stefan Weil	eff4dc0603	Use lambda expressions for reporting curl errors Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-02-23 22:44:42 +01:00
Stefan Weil	9972c91127	Report errors from curl_easy functions Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-02-23 22:26:51 +01:00
Stefan Weil	57ff90687d	simd: Check whether the OS supports FMA, AVX, ... The previous check was only for the MS compiler, but not for gcc and clang. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-02-23 16:34:35 +01:00
zdenop	7c3ac569f9	Replace references to the old wiki by new URLs (#2877) Replace references to the old wiki by new URLs	2020-02-03 14:59:18 +01:00
Stefan Weil	16553014e0	Replace references to the old wiki by new URLs Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-02-03 11:37:41 +01:00
Stefan Weil	20bcbc4058	Catch std::runtime_error exception when setting the locale in debug code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-02-03 07:58:43 +01:00
Robert Sachunsky	cdc8e44a20	ChoiceIterator: skip symbol without choices	2020-01-24 09:19:14 +01:00
jkang-eng	60248f59d4	Fix "tesseract.exe not flushing stdout/stderr" (Issue #2859) (#2865) * Issue #2859 - Fix "tesseract.exe not flushing stdout/stderr"	2020-01-21 21:51:08 +01:00
Stefan Weil	6f2f310fdf	Remove redundant method from class GenericVector length() is not needed: it can be replaced by size(). Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-01-18 11:30:14 +01:00
Stefan Weil	3d1f82d0e2	tesstrain.sh: Fix command line flag --help Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-01-05 10:10:55 +01:00
Stefan Weil	cfd39dc2c7	pageres: Fix compiler warnings clang warnings: src/ccstruct/pageres.cpp:903:20: warning: implicit conversion from 'int' to 'float' changes value from 2147483647 to 2147483648 [-Wimplicit-int-float-conversion] src/ccstruct/pageres.cpp:904:23: warning: implicit conversion from 'int' to 'float' changes value from -2147483647 to -2147483648 [-Wimplicit-int-float-conversion] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-01-04 09:46:10 +01:00
Stefan Weil	d2a2292f32	mftraining: Fix compiler warning powerpc64le-linux-gnu-g++ warning: src/training/mftraining.cpp:209:5: warning: ‘%04d’ directive output may be truncated writing between 4 and 10 bytes into a region of size 8 [-Wformat-truncation=] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-01-03 10:13:58 +01:00
zdenop	79f191fe20	Merge pull request #2826 from bertsky/clip-blockpolygon make BlockPolygon usable	2019-12-19 09:14:25 +01:00
Robert Sachunsky	4b0c9f3373	BlockPolygon: clip to image rectangle	2019-12-18 13:29:43 +01:00
Robert Sachunsky	5751a408c9	BlockPolygon: unrotate from internal to image coordinates	2019-12-18 13:29:43 +01:00
amitdo	502ebe8ca9	Autotools: Pango, Cairo and ICU only required by training tools	2019-12-16 17:23:06 +02:00
Stefan Weil	fc84f84b5b	Remove Emacs C modeline in comment line 1 Those files are C++, and the wrong modeline is not needed at all. Remove also some empty descriptions and old history in the comments. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-12-05 13:57:50 +01:00
Stefan Weil	420cbac876	Clean public API for renderers - Remove unused type definitions for TessTextRenderer, ... in capi.h (they were only used in capi.cpp which now no longer needs them) - Fix typo in comment Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-12-03 12:23:58 +01:00
Stefan Weil	56df8e6e19	Fix some typos in comments (most of them found by codespell) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-12-02 14:30:13 +01:00
Stefan Weil	a1a139cbd2	Replace AVX_OPT, ..., AVX macros by HAVE_AVX, ... and clean related code - Replace AVX_OPT, AVX2_OPT, FMA_OPT, SSE41_OPT - Replace AVX, AVX2, FMA, SSE4_1 - Write new HAVE_AVX, HAVE_AVX2, HAVE_FMA, HAVE_SSE4_1 into config_auto.h - Put related conditionals in Makefile.am in one place This makes the code clearer and fixes a log message in IntSimdMatrixTest.AVX2. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-28 17:51:37 +01:00
Stefan Weil	074844ce46	Show libcurl version `tesseract --version` now also shows the version of libcurl and related libraries if it was build with libcurl. The preprocessor macro HAVE_LIBCURL is now defined in config_auto.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-28 16:34:52 +01:00
Stefan Weil	cbd3a21cb2	automake: Flat build for src/viewer and src/wordrec Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	0cd2bdbd2b	automake: Flat build for src/textord Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	558462358a	automake: Flat build for src/opencl Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	6eeb486b77	automake: Flat build for src/lstm Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	7ebcc77e3b	automake: Flat build for src/dict Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	6181acf367	automake: Flat build for src/cutil Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	159160518b	automake: Flat build for src/classify Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	9730c7e167	automake: Flat build for src/ccutil Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	b1d449315e	automake: Flat build for src/ccstruct Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	9745a9d111	automake: Flat build for src/ccmain Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	a166efaad6	automake: Flat build for src/arch Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Stefan Weil	cafb1bbfd7	automake: Flat build for src/api Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
Martin Malmsten	9ed3887432	Added ComposedBlock level to Alto output	2019-11-17 21:06:12 +01:00
zdenop	2d6f38eebf	fix using bilevel tiff in pdf output	2019-11-10 16:11:52 +01:00
Shreeshrii	99dfa8a680	Add separator and training_iteration to checkpoint name (#2752) * Add separator and training_iteration to checkpoint name * specify modelname_N.NN_NN_NN.checkpoint for intermediate checkpoint	2019-11-09 12:22:40 +01:00
Stefan Weil	ac46b286a4	Fix issue #2748 Commit `94d0f77f56` tried to fix issue #2741 but created a new problem. This commit should fix both old and new issue. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-08 17:12:20 +01:00
Stefan Weil	0406f7706d	Use BRT_UNKNOWN instead of BRT_NOISE to initialize ColPartition::blob_type_ Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-08 07:40:06 +01:00
Stefan Weil	9b46a67efa	Use "C" locale for printing parameters This fixes a test for the Python wrapper `tesserocr` (python setup.py test). Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-04 19:21:20 +01:00
Egor Pugin	ab836dbb31	Merge pull request #2743 from DavidMaung/master Exposed the text2image option --ptsize to tesstrain.sh.	2019-11-02 17:09:51 +03:00
Stefan Weil	a306cd7370	Fail if no valid lstmf file was written (fix issue #2741) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-01 21:52:45 +01:00
Stefan Weil	94d0f77f56	Don't create an empty lstmf file If Tesseract cannot find text in the input image, it should not write an empty lstmf file. This problem was reported in issue #2741. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-01 21:43:26 +01:00
maungd@battelle.org	3d7afb69ea	Exposed the text2image option --ptsize to tesstrain.sh. Text2image has the option --ptsize which defaults to 12. This option is not exposed through tesstrain.sh; thus, you cannot use tesstrain.sh to explore training with different font sizes. I made a small modification to expose the --ptsize option to tesstrain.sh. It defaults to 12 if not specified.	2019-11-01 15:10:58 -04:00
Stefan Weil	b5498c70fa	Use pre-calculated lookup tables for all C++ compilers Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-31 20:26:01 +01:00
Egor Pugin	2bcc9d8093	Remove cppan build.	2019-10-30 21:37:38 +03:00
Stefan Weil	ca87b06d59	Fix build for Intel Compiler (issue #2736) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-30 10:09:44 +01:00
Stefan Weil	20a50e9bcb	Fix typo in comment Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-30 10:06:31 +01:00
Egor Pugin	2a37f5dd62	Update includes to use <>.	2019-10-29 14:50:11 +03:00
Egor Pugin	9e324938ab	Update includes to use <>.	2019-10-29 14:31:38 +03:00
Stefan Weil	629b05d978	Update README.md and other documentation for new include file structure Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-29 12:26:41 +01:00
amitdo	2f8884a64e	Fix autotools build	2019-10-28 21:23:58 +02:00
amitdo	e1bae15547	Fix #include path of public headers	2019-10-28 19:10:30 +02:00
amitdo	dfede8ac01	Move all public headers to include/tesseract	2019-10-28 18:50:31 +02:00
zdenop	cede5b34e7	Add pageseg_apply_music_mask option to allow disabling the musi… (#2732) Add pageseg_apply_music_mask option to allow disabling the music mask	2019-10-27 17:02:05 +01:00
zdenop	4a37cde0d9	fix inverting (Bilevel BW png) in pdf; fixes # 2059	2019-10-27 14:15:12 +01:00
Nat	52bc15acd9	Add pageseg_apply_music_mask option to allow disabling the music mask	2019-10-24 11:44:05 -05:00
Egor Pugin	c727b556f0	Remove unneeded TESS_API from source file.	2019-10-23 13:26:46 +03:00
Egor Pugin	e2688c39e9	Remove TESS_CALL.	2019-10-23 13:21:59 +03:00
wshwang	4ee95a615a	src/ccutil/bits16.h remove warnings (#2726)	2019-10-23 11:46:24 +02:00
wshwang	71e291bae5	Remove warning C4312	2019-10-22 13:06:44 +02:00
zdenop	fc629eae3b	Subject: training: show error description for open/delete file	2019-10-21 16:31:57 +02:00
Stefan Weil	90bcff3732	Delete copy constructor and assignment operator for TessBaseAPI (fix issue #874) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-21 13:12:36 +02:00
Stefan Weil	a209a6b4b5	Copy resolution of source image (fix issue #1702) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-20 20:45:35 +02:00
zdenop	36dc2ccf75	fix memory leak at PangoFontInfo::CanRenderString	2019-10-20 16:43:04 +02:00
zdenop	1ec34378d9	test for synthesized font faces.	2019-10-19 15:05:28 +02:00
zdenop	cbbe45d94b	cmake: add minimum required version for pango and icu based on autotools	2019-10-19 15:00:49 +02:00
zdenop	37c7a5dd82	text2image: show pango version	2019-10-19 14:52:06 +02:00
Stefan Weil	73a38b39d5	quadlsq: Fix warnings from LGTM Fix two occurrences of this LGTM warning: Multiplication result may overflow 'double' before it is converted to 'long double'. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-18 12:07:54 +02:00
Stefan Weil	22cf0f854d	Use "C" locale for PDF output This fixes wrong output of integers with locale de_DE.UTF-8: - /Width 2.481 - /Height 3.508 + /Width 2481 + /Height 3508 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-18 11:30:42 +02:00
Stefan Weil	914a8e40d6	Use "C" locale for ALTO output This fixes wrong output of integers with locale de_DE.UTF-8: - <Page WIDTH="2.481" HEIGHT="3.508" PHYSICAL_IMG_NR="0" ID="page_0"> + <Page WIDTH="2481" HEIGHT="3508" PHYSICAL_IMG_NR="0" ID="page_0"> Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-18 11:18:27 +02:00
Stefan Weil	3e8cc203f4	Fix build error (undefined local variable) The latest commit `96025c7923` was incomplete. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-18 11:05:31 +02:00
Stefan Weil	96025c7923	Remove unimplemented +/- for parameter files Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-17 17:14:43 +02:00
zdenop	a3cfd66f37	do not exit if not existing parameter is used. fixes #1334	2019-10-15 07:56:22 +02:00
zdenop	0150fc57cc	Report when tesseract legacy engine not present. (fix issue #2053)	2019-10-14 22:55:47 +02:00
Stefan Weil	a1e3150bd7	Add new parameter "document_title" to set the title in OCR output files The title can be set for hOCR and PDF output. Currently it is also used for ALTO, so setting the title can be used as a workaround for issue #2700. The constant unknown_title_ is no longer needed and therefore removed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-10 15:42:52 +02:00
Stefan Weil	7a7704bc94	Extend function BoxFileName to handle more common image names The function derives the file name for the .box file from an image name. For training from existing line images, it is useful to directly support the image names which are commonly used. While generated images for Tesseract training typically use the name pattern NAME.tif, other ground truth sets use NAME.bin.png for binarized or NAME.nrm.png for grayscale images. BoxFileName is also now a local function as it is only used locally. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-05 15:59:56 +02:00
jm	fb150265ef	speed optimisation - add the option to disable automatic inverting of line images	2019-10-04 10:09:52 +02:00
Stefan Weil	6b35d6ff6e	Fix comment which referred to unused Tesseract parameter This completes commit `aa2ab68e29`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-03 09:23:25 +02:00
Johannes Künsebeck	aa2ab68e29	Removed unused parameters The following parameters are not used anywhere anymore: * use_definite_ambigs_for_classifier * max_viterbi_list_size * word_to_debug_lengths * fragments_debug * tessedit_redo_xheight * debug_acceptable_wds * tessedit_matcher_log * tessedit_test_adaption_mode * docqual_excuse_outline_errs * crunch_pot_garbage * suspect_space_level * tessedit_consistent_reps * wordrec_display_all_words * wordrec_no_block * wordrec_worst_state * fragments_guide_chopper * segment_adjust_debug * classify_adapt_feature_thresh (classify_adapt_feature_threshold still exists) * classify_adapt_proto_thresh (classify_adapt_proto_threshold still exists) * classify_min_norm_scale_x * classify_max_norm_scale_x * classify_min_norm_scale_y * classify_max_norm_scale_y * il1_adaption_test * textord_blob_size_bigile * textord_blob_size_smallile * editor_debug_config_file * textord_tabfind_show_color_fit The list was generated by a python script and each parameter occurrence checked manually.	2019-10-03 09:18:29 +02:00
Stefan Weil	1e84a6f225	Don't create OCR result files when training data is created The configuration file lstm.train causes Tesseract to generate training data for training of an LSTM line recognizer. In this mode, no other files with OCR results should be written. Without this patch, Tesseract writes a small text file. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-02 19:29:27 +02:00
Stefan Weil	286d8275c7	Add support for image or image list by URL This allows OCR of images from the internet without downloading them first: tesseract http://IMAGE_URL OUTPUT ... It uses libcurl. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-10-01 12:10:45 +02:00
Stefan Weil	47d70d7014	Modernize code for LIST (fix some -Wold-style-cast warnings) - Use C++ type casts - Remove unneeded type cast - Simplify code for function pop - Remove macro push_on (it was only used once) This fixes lots of compiler warnings caused by old type casts.	2019-10-01 11:12:00 +02:00
Stefan Weil	672d67859f	mfoutline: Modernize code - Use C++ enums - Use strongly typed C++11 enum for DIRECTION and optimize struct MFEDGEPT - Use float constant for MF_SCALE_FACTOR - Replace macros by inline functions - Fix documentation comment This fixes several warnings from clang. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-30 21:33:15 +02:00
Stefan Weil	7ec5f0ca02	intmatcher: Avoid conversion from double to float and vice versa This fixes some clang warnings: src/classify/intmatcher.cpp:48:49: warning: implicit conversion loses floating-point precision: 'double' to 'const float' [-Wimplicit-float-conversion] src/classify/intmatcher.cpp:405:34: warning: implicit conversion loses floating-point precision: 'double' to 'float' [-Wimplicit-float-conversion] src/classify/intmatcher.cpp:405:64: warning: implicit conversion increases floating-point precision: 'float' to 'double' [-Wdouble-promotion] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-30 18:05:26 +02:00
Stefan Weil	6d259ebe44	Remove unneeded compare statement (-Wtautological-unsigned-enum-zero-compare) This fixes a clang warning: src/ccstruct/polyblk.cpp:412:12: warning: result of comparison of unsigned enum expression >= 0 is always true [-Wtautological-unsigned-enum-zero-compare] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-29 22:13:27 +02:00
Stefan Weil	49e351508c	Re-add strngs.h to public API It is still needed. This partially reverts commit `a730b5c4ff`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-28 10:34:48 +02:00
Stefan Weil	8ad86d6494	Add missing linker flags for TensorFlow Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-28 09:42:37 +02:00
zdenop	d6aa866430	ignore #pragma optimize for clang-cl	2019-09-27 21:19:37 +02:00
Stefan Weil	74d5ce82a6	Remove vecfuncs.cpp and vecfunc.h Replace the macros which were declared in vecfuncs.h by member functions and move a function which was only used in chop.cpp to that file. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-25 21:20:03 +02:00
Stefan Weil	7bddad59d1	Optimize class ChoiceIterator Re-order a class variable to avoid memory holes and remove unused class variables. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-25 09:43:57 +02:00
Noah Metzger	ff4c1d204d	Fixed minor bug with the Choice iterator when lstm_choice_mode is not active. Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-09-24 15:38:28 +02:00
Stefan Weil	994ec697d8	Remove member functions STRING::string and StringParam::string They were redundant because there exist member functions 'c_str' which do the same. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-23 08:33:08 +02:00
Egor Pugin	1fa7324cf7	Merge pull request #2668 from stweil/api Remove STRING from the public Tesseract API	2019-09-23 01:02:26 +03:00
amitdo	0598879a00	Disable legacy build: Disable bitvec.h	2019-09-22 20:37:13 +02:00
Stefan Weil	a730b5c4ff	Remove STRING from the public Tesseract API Removing STRING from genericvector.h allows eliminating the proprietary STRING data type from the public Tesseract API. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-22 20:32:28 +02:00
Stefan Weil	8cb677d6a2	Replace STRING arguments for LoadDataFromFile and SaveDataToFile This is a step to eliminate the proprietary STRING data type from the public Tesseract API. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-22 20:32:28 +02:00
amitdo	1e13d1d4d5	Disable legacy build: Disable more unneeded code	2019-09-22 20:55:24 +03:00
zdenop	39a63c2837	Merge pull request #2663 from bertsky/fix-lstm-user-patterns fix langdata (user words/patterns) file suffixes for LSTMs:	2019-09-20 15:32:54 +02:00
Stefan Weil	0c7cc5a4dd	Fix CID 1405673 part 2 (Uninitialized members) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-19 19:37:05 +02:00
Robert Schubert	5b976bfb55	fix langdata (user words/patterns) file suffixes for LSTMs: - add another constructor for LSTMRecognizer which takes the language_data_path_prefix configured/selected at runtime and passes it to the internal CCUtil - use this in Tesseract::init_tesseract_lang_data when LSTMs are available (this was missing from `297d7d86ce`)	2019-09-19 19:30:54 +02:00
amitdo	479a7b1ca0	Disabled legacy build: Disable more unneeded code	2019-09-19 19:00:13 +03:00
Stefan Weil	3b030b4aeb	Fix CID 1405673 (Uninitialized members) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-17 22:04:08 +02:00
Stefan Weil	85e8529a2e	Fix CID 1164624 (Uninitialized members) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-17 21:59:42 +02:00
Stefan Weil	b2999d8190	Fix comment for Textord::make_prop_words Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 15:03:45 +02:00
Stefan Weil	256701e2e0	Re-order initialisation in constructor of class ViterbiStateEntry This fixes compiler warnings caused by commit `091ce345f6`: src/wordrec/lm_state.h:100:7: warning: field 'cost' will be initialized after field 'curr_b' [-Wreorder] src/wordrec/lm_state.h:104:7: warning: field 'top_choice_flags' will be initialized after field 'dawg_info' [-Wreorder] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 14:33:32 +02:00
Stefan Weil	081521fb9f	Move initial values for class ColPartition from constructor to header file This fixes compiler warnings caused by commit `5b4565b80b`: src/textord/colpartition.cpp:91:24: warning: field 'last_column_' will be initialized after field 'column_set_' [-Wreorder] src/textord/colpartition.cpp:93:37: warning: field 'inside_table_column_' will be initialized after field 'nearest_neighbor_above_' [-Wreorder] src/textord/colpartition.cpp:95:58: warning: field 'space_to_right_' will be initialized after field 'owns_blobs_' [-Wreorder] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 14:33:32 +02:00
Stefan Weil	8f66020821	Re-order initialisation in constructors of classes Dawg and DawgPosition This fixes compiler warnings caused by commit `ecf0f2dee5`: src/dict/dawg.h:202:9: warning: field 'type_' will be initialized after field 'lang_' [-Wreorder] src/dict/dawg.h:355:9: warning: field 'dawg_index' will be initialized after field 'dawg_ref' [-Wreorder] src/dict/dawg.h:356:9: warning: field 'punc_index' will be initialized after field 'punc_ref' [-Wreorder] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 14:31:32 +02:00
Stefan Weil	b466cead8e	Add more initial values for class Classify from constructor to header file This fixes compiler warnings caused by commit `751fcd2b11`: src/classify/classify.cpp:176:7: warning: field 'EnableLearning' will be initialized after field 'il1_adaption_test' [-Wreorder] src/classify/classify.cpp:187:7: warning: field 'dict_' will be initialized after field 'static_classifier_' [-Wreorder] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-16 14:31:32 +02:00
Stefan Weil	91b3248af3	Fix CID 1164666 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 22:01:25 +02:00
Stefan Weil	fc6899d898	Fix CID 1164664 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 21:52:51 +02:00
Stefan Weil	930e11996c	Fix CID 1375402 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 21:17:12 +02:00
Stefan Weil	408d6e8b72	simd: Check OSXSAVE bit before calling _xgetbv Both checks are needed for AVX, AVX2 and FMA checks. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 19:35:37 +02:00
Stefan Weil	627faa6f9c	Remove UnicharAmbigs for builds without legacy code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 19:11:30 +02:00
amitdo	2134cd7867	Disabled legacy engine build: Disable code related to ambigs.	2019-09-15 19:11:30 +02:00
Stefan Weil	0c960c3cc5	Fix 1164647 (Uninitialized members) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-15 14:25:48 +02:00
amitdo	994596842e	'Disabled legacy engine' build: don't include unused header	2019-09-15 12:35:36 +03:00
Egor Pugin	6a9584fbc2	Merge pull request #2650 from stweil/cid Fix several issues reported by Coverity Scan	2019-09-14 21:18:37 +03:00
Stefan Weil	763f4781e8	Fix CID 1164662 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 19:22:56 +02:00
Stefan Weil	6fd58d2897	Fix CID 1164659 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 19:20:14 +02:00
Stefan Weil	c3500e8d95	Fix CID 1164657 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 19:11:02 +02:00
Stefan Weil	1d3ee3b2a7	Fix CID 1164649 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:37:00 +02:00
Stefan Weil	bd1083904d	Fix CID 1164648 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:32:29 +02:00
Stefan Weil	80f367c6f4	Fix CID 1164644 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:26:49 +02:00
Stefan Weil	7caded8e6b	Fix CID 1164643 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:24:26 +02:00
Stefan Weil	3127242bcd	Fix CID 1164638 (Uninitialized scalar field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:18:15 +02:00
Stefan Weil	06de3075e0	Fix CID 1164636 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:13:06 +02:00
Stefan Weil	052f9ca0bc	Fix CID 1164634, CID 1164635 (Uninitialized pointer field) Remove the unused dummy member variables. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 17:12:39 +02:00
Stefan Weil	97dda3d535	Fix CID 1386099 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	46f21a4182	Fix CID 1164633 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	9ea579bf1b	Fix CID 1164628 ff (Uninitialized pointer field) and optimize class ParamContent Only one of bIt, dIt, iIt and sIt is used, so put all four in a union. This fixes CID 1164628, CID 1164629, CID 1164630 and CID 1164631. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	74b552fc31	Remove unused FeatureEnabled from FEATURE_DEFS_STRUCT Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	9f709404f9	Fix CID 1164622 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	5b1f0dbd4b	Fix CID 1164620 (Uninitialized pointer field) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	951f442303	Fix CID 1386105 (Logically dead code) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	64fc205e78	Fix CID 1402767 (Invalid type in argument to printf format specifier) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 15:43:50 +02:00
Stefan Weil	f62a895f74	Remove unused italic, bold in class BLOCK_RES and class WORD_RES Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-14 11:53:58 +02:00
Stefan Weil	ceb8af889e	Fix CID 1340276 (Uninitialized scalar field) for class BLOB_CHOICE xgap_before_ and xgap_after_ are never used, so remove them. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 22:15:47 +02:00
Stefan Weil	5fdd32bea8	Fix CID 1366450 (Uninitialized scalar field) for class RecodeBeamSearch secondary_beam_size_ is set but never used, so remove it. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 22:09:03 +02:00
Stefan Weil	737173a84d	Fix CID 1375401 (Uninitialized scalar field) for class Dawg Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 22:03:10 +02:00
Stefan Weil	edba74d64f	Fix CID 1400760 (Uninitialized scalar field) for class BLOCK Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 21:58:05 +02:00
Stefan Weil	8ff321e41a	Fix two issues reported by Coverity Scan and modernize class WERD_RES Report from Coverity Scan: CID 1405560 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR) 2. uninit_member: Non-static class member end is not initialized in this constructor nor in any functions that it calls. CID 1405561 [...] Modernize and optimize class WERD_RES. This not only fixes the issues but also reduces the size and eliminates the functions InitNonPointers and InitPointers. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 21:51:36 +02:00
Stefan Weil	ecf0f2dee5	Optimize classes Trie, Dawg and DawgPosition Reduce size from 368 to 352 bytes for Trie, 72 to 64 bytes for Dawg and 40 to 24 bytes for DawgPosition by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-13 08:15:01 +02:00
Stefan Weil	efd8dea587	Optimize classes CLIST_ITERATOR, ELIST_ITERATOR, ELIST2_ITERATOR Reduce size from 56 to 48 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 22:03:03 +02:00
Stefan Weil	751fcd2b11	Optimize class Classify Reduce size from 138016 to 13000 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 21:46:55 +02:00
Stefan Weil	0ad08a99b0	Optimize class TFile Reduce size from 24 to 16 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 20:17:05 +02:00
Stefan Weil	5b4565b80b	Optimize class ColPartition Reduce size from 248 to 224 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 20:04:27 +02:00
Stefan Weil	5a12273650	Optimize struct LMConsistencyInfo Reduce size from 104 to 96 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 20:04:27 +02:00
Stefan Weil	091ce345f6	Optimize class ViterbiStateEntry Reduce size from 232 to 216 bytes by avoiding holes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 20:04:27 +02:00
Stefan Weil	913cbe6eae	Modernize and optimize BLOBNBOX and remove BLOBNBOX::ConstructionInit The class no longer uses bit fields. Re-ordering the member variables avoids holes and reduces the size of BLOBNBOX from 168 to 152 bytes. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-12 09:07:48 +02:00
Stefan Weil	a922745d9a	tfnetwork: Fix info text Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-11 19:10:25 +02:00
Stefan Weil	5fa09f184f	RecodedCharIDHash: Fix runtime errors detected by UndefinedBehaviorSanitizer Fix this runtime error in recodebeam_test and unicharcompress_test: src/ccutil/unicharcompress.h:84:27: runtime error: left shift of 267 by 28 places cannot be represented in type 'int' code has up to kMaxCodeLen (9) values, so the highest possible value for i is 8, and the shift value can reach 7 * 8 = 56. That requires an uint64_t data type. size_t would fit for 64 bit hosts, but be too small for 32 bit hosts. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-10 15:56:32 +02:00
Stefan Weil	4a2d5a2e8d	OSResults: Fix runtime errors detected by UndefinedBehaviorSanitizer Fix this runtime error in osd_test and textlineprojection_test: src/ccmain/osdetect.cpp:109:14: runtime error: division by zero Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-10 15:56:32 +02:00
Stefan Weil	5c6fade555	BitVector: Fix runtime errors detected by UndefinedBehaviorSanitizer Fix these runtime errors in mastertrainer_test: src/ccutil/bitvector.cpp:119:18: runtime error: null pointer passed as argument 2, which is declared to never be null src/ccutil/bitvector.cpp:124:10: runtime error: null pointer passed as argument 1, which is declared to never be null Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-10 15:56:32 +02:00
zdenop	98c7aaa343	Lstm choice ril (#2635)	2019-09-06 19:12:00 +02:00
Stefan Weil	9f32032517	ccutil: Remove old comments There is no CLIST2 in the current code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-05 17:52:42 +02:00
Stefan Weil	b6933a1082	Use type bool for boolean values in class BLOBNBOX Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-09-03 19:56:59 +02:00
Noah Metzger	c350077b96	Made the lstm_choice mode compatible with the hocr_char_boxes mode Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-09-02 11:09:54 +02:00
Noah Metzger	e8b9c10d07	Clean up lstm_choice_mode and cut it down to 2 modes instead of 4 Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-09-02 11:09:53 +02:00
Stefan Weil	fdf4067296	Fix warnings from LGTM This fixes three LGTM warnings: Multiplication result may overflow 'float' before it is converted to 'double'. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-30 22:04:24 +02:00
Stefan Weil	dc90741f1b	Fix crash when function lookup tables are accessed with NaN Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-30 13:42:09 +02:00
Stefan Weil	7968f50fe6	capi: Add missing PSM_RAW_LINE to TessPageSegMode Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-25 09:08:09 +02:00
zdenop	0ded672067	fix typo	2019-08-18 18:47:32 +02:00
Stefan Weil	00cff79f7f	simd: Check whether the OS supports FMA, AVX, ... Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-16 22:51:17 +02:00
Stefan Weil	43b2e9513b	lstmtrainer: Fix diagnostic message Signed character values must be converted to unsigned integers for %x. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-15 14:31:32 +02:00
Stefan Weil	100d8cd29b	lstmtester: Add missing space in log messages Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-14 14:12:47 +02:00
Stefan Weil	a86251c62b	classify/Makefile: Fix inconsistent style Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-13 21:35:59 +02:00
Egor Pugin	423a188513	Export some classify vars.	2019-08-13 20:12:21 +03:00
Stefan Weil	46e2a0f106	Remove more code for builds with disabled legacy engine Now the Tesseract library no longer includes unused code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-13 17:49:10 +02:00
Egor Pugin	73f713519c	Merge pull request #2614 from stweil/training Move source files which are used for training only to src/training	2019-08-12 19:35:50 +03:00
Stefan Weil	e84cb24def	Move source files which are used for training only to src/training They are moved from src/classify and src/lstm to src/training. This reduces the size of the Tesseract library. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 17:08:08 +02:00
Stefan Weil	ba17bc8204	OpenCL: Add static attribute for kernel_src It is only used in openclwrapper.cpp. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 15:13:45 +02:00
Stefan Weil	970622fbd1	Remove unused functions create_edges_window, draw_raw_edge Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 15:04:10 +02:00
Stefan Weil	23e605911f	Remove unused function truncate_path and related files Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 14:48:56 +02:00
Stefan Weil	bce585286d	Remove global array kPolyBlockNames from Tesseract library It is only used in unittest/layout_test.cc after moving a test from baseapi_test.cc to that file, so it can be made local. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 14:33:55 +02:00
Stefan Weil	beec85e023	Remove UNICHARSET::load_from_inmemory_file and related code The method was only used in unittest where it can be replaced by UNICHARSET::load_from_file which also simplifies the code. This allows removing the class InMemoryFilePointer and fixes a TODO. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-12 13:07:15 +02:00
Stefan Weil	315dd9df3f	cmake: Don't link pthread on Windows Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-07 15:24:00 +02:00
Stefan Weil	b8079d8ce1	universalambigs: Add hack to fix builds with Microsoft compiler The MS compiler only accepts string constants up to 65535 characters, so shorten the string for that compiler to fix the compilation. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-06 15:46:07 +02:00
Zdenko Podobný	c5a50b93ce	move fileio.cpp and fileio.h to training (this fix android build)	2019-08-04 21:26:39 +02:00
Stefan Weil	6acab45837	universalambigs: Replace octal characters by UTF-8 string This improves readability and reduces the file size. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-04 19:21:59 +02:00
Stefan Weil	8127b4dd27	Clean ambigs.h * Remove unused kUnigramAmbigsBufferSize and kAmbigNgramSeparator * Move some declarations to ambigs.cpp Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-08-04 19:21:59 +02:00


1337 Commits