tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-12 07:29:07 +08:00

Author	SHA1	Message	Date
amitdo	efae270dea	Disabled legacy build: Disable more unused code	2020-06-24 22:02:52 +03:00
Stefan Weil	ca0a6c9d37	Merge pull request #3035 from stweil/overflow. Avoid buffer overflow (issue #444)	2020-06-24 18:46:47 +02:00
Stefan Weil	2cb5bc7690	Improve debug message in ColPartition::ComputeLimits Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-23 22:52:45 +02:00
Stefan Weil	cfabdfe0af	Avoid buffer overflow (issue #444) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-22 22:19:58 +02:00
Stefan Weil	62b085cb8d	ScrollView: Remove C API callcpp.{cpp,h}. Use C++ class ScrollView directly instead of using an intermediate C API. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-22 09:14:26 +02:00
Stefan Weil	b2cc00d97f	Replace cprintf by tprintf and remove cprintf. cprintf was an indirect way to call tprintf. This indirection is not needed, so remove it and use tprintf directly. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-21 19:07:09 +02:00
Stefan Weil	ea1f597fc1	Fix insecure call of tprintf Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-21 19:03:03 +02:00
Stefan Weil	4a10bb68c7	Fix conversion of images with 16 bpp or 24 bpp to grey. The old code used pixConvertRGBToLuminance which only converts 32 bpp images. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-21 09:09:49 +02:00
Stefan Weil	6f6100ff9f	Classify: Run sort only for more than one element. This fixes calls of qsort with a nullptr argument (reported by sanitizers). Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-20 21:43:22 +02:00
Stefan Weil	d4cf77c92b	Don't check for limits.h (now unused) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-20 10:39:13 +02:00
Matej Knopp	e900252c1a	Fix CMake build with DISABLED_LEGACY_ENGINE	2020-06-17 19:42:49 +02:00
Stefan Weil	d6ca7a5298	ScrollView: Fix typo in comment Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-17 16:26:41 +02:00
Stefan Weil	380466e0d3	Allow inlining of function TruncateParam. It is only used locally in intproto.cpp, so defining it before the first use and adding the static attribute allows the compiler to inline it. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 20:17:41 +02:00
Stefan Weil	93cfffeb87	Remove unused argument from function TruncateParam Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 20:17:41 +02:00
Stefan Weil	f08b16a5a0	Remove assertion which is triggered by tests. oss-fuzz issue 15149 triggers this assertion. See test case here: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=15149 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 20:17:26 +02:00
Stefan Weil	18d9983f69	StrokeWidth: Remove unused local variable (fixes compiler warning) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 20:17:09 +02:00
Stefan Weil	bc61038dd4	SPLIT: Make function bounding_box inline for better performance Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 17:21:36 +02:00
Stefan Weil	0e7701bc3c	SEAM: More inline functions for better performance Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 17:20:14 +02:00
Stefan Weil	e45100ebf7	TBOX: Use inline constructor for better performance Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 17:17:55 +02:00
Stefan Weil	c110958ffa	Fix undefined shift with negative value (oss-fuzz issue 14658). This fixes a bug reported by OSS Fuzz: https://oss-fuzz.com/issue/5697280134348800 The old code passed a negative value (-1) as argument to step_dir when destindex was 0. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 13:25:32 +02:00
Stefan Weil	6ee3698958	Remove old unused code from imagedata.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-14 16:02:27 +02:00
Stefan Weil	d8500adcf4	Fix crash caused by missing thread synchronization (issues #757, #1168 and #2191) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-14 15:53:17 +02:00
Robin Watts	6fec69de1a	Fix intsimdmatrixneon.cpp stack corruption. The intsimdmatrix mechanism ensures that inputs would be resized so that we'd only ever get "whole blocks" of data. I'd assumed that that meant the same thing for scales/outputs too, but this appears not to be the case, as we can get called (sometimes) with num_out % 8 == 7. Possibly we could benefit from resizing those matrices so that special cases in this innermost loop are not actually required, but unless and until that is done, let's fix the inner loop.	2020-05-27 13:40:17 +01:00
Stefan Weil	a06d0d8449	Add missing include statements for config_auto.h. They are required to get the macro DISABLED_LEGACY_ENGINE. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-22 16:34:28 +02:00
Stefan Weil	6732eb9eb5	Clean code for NEON support. Include it only for NEON and remove unneeded code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-21 07:03:37 +02:00
Robin Watts	f79e52a7cc	NEON SIMD code. In tests on my pi3b+, a release build of my ghostscript integration takes 2 minutes 27 seconds to render a PDF and OCR it with the vanilla sources. With this NEON code added the time drops to 37 seconds. I have not tested the configure/Makefile changes as I'm not using them.	2020-05-20 18:54:42 +01:00
zdenop	b5d639dcc5	Merge pull request #2965 from robinwatts/pushback1. Thanks.	2020-05-16 20:35:19 +02:00
zdenop	064b4403de	Merge pull request #2966 from robinwatts/pushback2	2020-05-16 20:06:31 +02:00
Robin Watts	3408c36eab	Guard #include "config_auto.h" with HAVE_CONFIG_H. Every other file already does this.	2020-05-15 19:29:03 +01:00
Robin Watts	43437a540b	Fix OEM_DEFAULT in DISABLED_LEGACY_ENGINE builds. If api->Init is called with OEM_DEFAULT in DISABLED_LEGACY_ENGINE build modes, the engine mode is never set, resulting in no words being found.	2020-05-15 14:56:41 +01:00
Julian Gilbey	e7e6999d3b	Move comment about swap meaning for DeSerialize to correct function	2020-05-13 07:02:59 +01:00
Robin Watts	27d513462c	Avoid using PACKAGE_VERSION in favour of TESSERACT_VERSION_STR. This means the sources compile perfectly in the absence of config_auto.h/HAVE_CONFIG_H as they were intended to do. TESSERACT_VERSION_STR is set to be precisely PACKAGE_VERSION by autoconf, so there are no actual changes in compiled code.	2020-05-12 21:45:12 +02:00
Stefan Weil	39f7fb4a1a	Allow line images with larger width (depending on height). Training with normalized line images higher than 36 px also results in larger widths. The limit should therefore depend on the height used for the normalization. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-12 16:59:31 +02:00
Stefan Weil	34bdc8b74e	Allow line images with larger width. Line images can be larger than the old limit, especially when training with newspaper lines: Image too large to learn!! Size = 2641x36 Image too large to learn!! Size = 2704x36 Image too large to learn!! Size = 2751x36 Image too large to learn!! Size = 3738x36 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-12 16:50:40 +02:00
Julian Gilbey	ca5735efcb	Destroy box before potentially exiting function	2020-05-12 15:25:16 +01:00
Stefan Weil	d3a0768c32	Merge pull request #2975 from robinwatts/pushback5. Tweak architecture specific SIMD files for ease of compilation	2020-05-12 14:55:32 +02:00
Robin Watts	a9b44ee8c2	Tweak architecture specific SIMD files for ease of compilation. This won't affect anything using the supplied build system. For other projects that include tesseract within them, however, this may make their life easier. For example, I have an integration of Tesseract with Ghostscript, in which tesseract is built as part of the Ghostscript build, without using the tesseract build system. The Ghostscript build system is makefile based, and has to work on a range of make systems, including unix make, gnu make and nmake. As such we have to avoid conditionals in the common makefiles. It therefore becomes hard to build one set of files on x86 systems, and another on (say) ARM systems. Accordingly, this commit makes small tweaks to the architecture specific files, so that they compile on EVERY platform, but only compile to anything useful on the appropriate platform. Thus the makefiles can build all the files on all the systems, and the preprocessor flags mean that the correct functions are actually built.	2020-05-12 13:09:29 +01:00
Egor Pugin	0eaabc42c7	Update CMakeLists.txt	2020-05-12 11:49:15 +03:00
Egor Pugin	e720a26745	[cmake] Set inactivity timeout during icu download to 300 seconds. Fixes #2972.	2020-05-09 18:55:45 +03:00
Robin Watts	80d4af6ecf	Add a mechanism to avoid creating debug fonts. If TESSERACT_DISABLE_DEBUG_FONTS is defined, tesseract doesn't attempt to create any debug fonts. This not only saves memory, but it (combined with the change to optionally use Pix as internal storage for the ImageData) allows us to use an embedded Leptonica library with no format handlers at all.	2020-05-05 00:22:23 +01:00
Robin Watts	6bcb941bcf	Avoid tesseract writing Pix out/reading them back. By default, when we call ImageData::SetPix, we write the data out as a PNG, just to read it back in to get a compressed buffer of data. We then use this to generate a new Pix. In builds of Tesseract on systems where we don't have temp files, writing files out is problematic. Not only that, but compressing/uncompressing is slow, and on minimal builds of leptonica, where we've disabled the format writers to reduce memory footprint, we get no compression anyway. In such cases, it'd be far nicer just to keep the original Pix as the internal data. Also, when recovering the pixmap from the ImageData, if we know we're only going to read from the data, we can avoid duplicating it and just use the original. This is exactly the case when GRAPHICS_DISABLED is set. So, introduce a TESSERACT_IMAGEDATA_AS_PIX predefine that we can use to cause the internal data to be a Pix rather than a compressed buffer. Given we don't do compression, and they were writing to memory, this was all just more effort than we needed. Also, if we're using GRAPHICS_DISABLED, we might as well just pixCopy rather than pixClone as only the scaler uses this.	2020-05-04 21:01:22 +01:00
Amit D	acc4c8bff5	Merge pull request #2952 from jannick0/patch-1. [trie.h] pattern definition: fix documentation	2020-04-27 23:44:48 +03:00
Stefan Weil	1188e0a516	Remove old code which was used for Ocropus Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-04-27 16:33:34 +02:00
jannick0	e044163085	[trie.h] pattern definition: fix documentation. The fix makes the definition of `\n` consistent with the examples given below the definition. Please note that I did not check this against how it is implemented in the code.	2020-04-19 13:47:42 +02:00
Stefan Weil	4a00b68c63	Fix lambda function for curl code errors Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-03-18 20:46:52 +01:00
Stefan Weil	9f5a3f6ac7	Fix uninitialized local variable in curl code. Compiler warning: src/api/baseapi.cpp:1151:27: warning: variable 'curlcode' is uninitialized when used here [-Wuninitialized] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-03-18 19:25:33 +01:00
zdenop	6e307074d8	Merge pull request #2894 from stweil/curl. Report errors from curl_easy functions	2020-03-18 14:14:07 +01:00
Stefan Weil	ef4f99a994	Run xgetbv instruction only on machines which support it. This fixes a regression for older Intel processors. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-03-08 17:32:10 +01:00
Stefan Weil	eff4dc0603	Use lambda expressions for reporting curl errors Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-02-23 22:44:42 +01:00
Stefan Weil	9972c91127	Report errors from curl_easy functions Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-02-23 22:26:51 +01:00


1215 Commits