tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-12 23:49:06 +08:00

Author	SHA1	Message	Date
Stefan Weil	93cfffeb87	Remove unused argument from function TruncateParam Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 20:17:41 +02:00
Stefan Weil	f08b16a5a0	Remove assertion which is triggered by tests. oss-fuzz issue 15149 triggers this assertion. See test case here: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=15149 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 20:17:26 +02:00
Stefan Weil	18d9983f69	StrokeWidth: Remove unused local variable (fixes compiler warning) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 20:17:09 +02:00
zdenop	7f14f11f80	Merge pull request #3023 from stweil/inline	2020-06-16 20:03:42 +02:00
Stefan Weil	bc61038dd4	SPLIT: Make function bounding_box inline for better performance Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 17:21:36 +02:00
Stefan Weil	0e7701bc3c	SEAM: More inline functions for better performance Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 17:20:14 +02:00
Stefan Weil	e45100ebf7	TBOX: Use inline constructor for better performance Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 17:17:55 +02:00
zdenop	58c60e6c98	Merge pull request #3022 from stweil/fix Fix undefined shift with negative value (oss-fuzz issue 14658)	2020-06-16 13:48:22 +02:00
Stefan Weil	c110958ffa	Fix undefined shift with negative value (oss-fuzz issue 14658) This fixes a bug reported by OSS Fuzz: https://oss-fuzz.com/issue/5697280134348800 The old code passed a negative value (-1) as argument to step_dir when destindex was 0. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 13:25:32 +02:00
Stefan Weil	6ee3698958	Remove old unused code from imagedata.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-14 16:02:27 +02:00
Stefan Weil	d8500adcf4	Fix crash caused by missing thread synchronization (issues #757 , #1168 and #2191 ) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-14 15:53:17 +02:00
Stefan Weil	62eae84fea	Merge pull request #2991 from robinwatts/pushback9 Fix intsimdmatrixneon.cpp stack corruption.	2020-05-27 16:31:23 +02:00
Robin Watts	6fec69de1a	Fix intsimdmatrixneon.cpp stack corruption. The intsimdmatrix mechanism ensures that inputs would be resized so that we'd only ever get "whole blocks" of data. I'd assumed that that meant the same thing for scales/outputs too, but this appears not to be the case, as we can get called (sometimes) with num_out % 8 == 7. Possibly we could benefit from resizing those matrices so that special cases in this innermost loop are not actually required, but unless and until that is done, let's fix the inner loop.	2020-05-27 13:40:17 +01:00
Stefan Weil	ff0a7a38f7	Check compiler options depending on host cpu Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-27 06:52:36 +02:00
Stefan Weil	a06d0d8449	Add missing include statements for config_auto.h They are required to get the macro DISABLED_LEGACY_ENGINE. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-22 16:34:28 +02:00
Stefan Weil	6732eb9eb5	Clean code for NEON support Include it only for NEON and remove unneeded code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-21 07:03:37 +02:00
Stefan Weil	7b0e5b0722	Merge pull request #2978 from robinwatts/pushback6 Add NEON SIMD code	2020-05-21 06:59:01 +02:00
Robin Watts	f79e52a7cc	NEON SIMD code. In tests on my pi3b+, a release build of my ghostscript integration takes 2 minutes 27 seconds to render a PDF and OCR it with the vanilla sources. With this NEON code added the time drops to 37 seconds. I have not tested the configure/Makefile changes as I'm not using them.	2020-05-20 18:54:42 +01:00
zdenop	3a3c41d1ab	try to fix cmake gcc build - make simd configuration (HAVE_?) global (as autotools).	2020-05-19 18:02:16 +02:00
zdenop	32b3ab40f1	fix cmake msvc build	2020-05-19 16:16:38 +02:00
zdenop	90e81ac939	suppress VS warnings in release target C4267 (conversion from 'size_t' to 'type', possible loss of data), C4305 ('context' : truncation from 'type1' to 'type2') and C4267 ('var' : conversion from 'size_t' to 'type', possible loss of data)	2020-05-19 16:06:03 +02:00
zdenop	b5d639dcc5	Merge pull request #2965 from robinwatts/pushback1 thanks.	2020-05-16 20:35:19 +02:00
zdenop	064b4403de	Merge pull request #2966 from robinwatts/pushback2	2020-05-16 20:06:31 +02:00
Stefan Weil	5d9b181d67	Merge pull request #2982 from robinwatts/pushback8 Guard #include "config_auto.h" with HAVE_CONFIG_H.	2020-05-16 15:00:40 +02:00
zdenop	acaa90c971	cmake: don't use vector unit compile definition globally	2020-05-16 12:30:20 +02:00
Robin Watts	3408c36eab	Guard #include "config_auto.h" with HAVE_CONFIG_H. Every other file already does this.	2020-05-15 19:29:03 +01:00
Amit D	b4d3bf616a	Merge pull request #2981 from robinwatts/pushback7 Fix OEM_DEFAULT in DISABLED_LEGACY_ENGINE builds.	2020-05-15 18:09:06 +03:00
Robin Watts	43437a540b	Fix OEM_DEFAULT in DISABLED_LEGACY_ENGINE builds. If api->Init is called with OEM_DEFAULT in DISABLED_LEGACY_ENGINE build modes, the engine mode is never set, resulting in no words being found.	2020-05-15 14:56:41 +01:00
Stefan Weil	84721e9049	Merge pull request #2979 from juliangilbey/correct_swap_comment Trivial code documentation fix: move comment about swap meaning for DeSerialize to correct function	2020-05-13 09:32:55 +02:00
Julian Gilbey	e7e6999d3b	Move comment about swap meaning for DeSerialize to correct function	2020-05-13 07:02:59 +01:00
Robin Watts	27d513462c	Avoid using PACKAGE_VERSION in favour of TESSERACT_VERSION_STR. This means the sources compile perfectly in the absence of config_auto.h/HAVE_CONFIG_H as they were intended to do. TESSERACT_VERSION_STR is set to be precisely PACKAGE_VERSION by autoconf, so there are no actual changes in compiled code.	2020-05-12 21:45:12 +02:00
zdenop	f9f8da1b8c	Merge pull request #2977 from stweil/limit	2020-05-12 19:14:09 +02:00
Stefan Weil	39f7fb4a1a	Allow line images with larger width (depending on height) Training with normalized line images higher than 36 px also results in larger widths. The limit should therefore depend on the height used for the normalization. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-12 16:59:31 +02:00
Stefan Weil	34bdc8b74e	Allow line images with larger width Line images can be larger than the old limit, especially when training is made with newspaper lines. Image too large to learn!! Size = 2641x36 Image too large to learn!! Size = 2704x36 Image too large to learn!! Size = 2751x36 Image too large to learn!! Size = 3738x36 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-12 16:50:40 +02:00
Egor Pugin	43bbcd4ce2	Merge pull request #2976 from juliangilbey/fix_memory_leak_in_linerec Destroy box before potentially exiting function (preventing a memory leak)	2020-05-12 17:33:42 +03:00
Julian Gilbey	ca5735efcb	Destroy box before potentially exiting function	2020-05-12 15:25:16 +01:00
Stefan Weil	d3a0768c32	Merge pull request #2975 from robinwatts/pushback5 Tweak architecture specific SIMD files for ease of compilation	2020-05-12 14:55:32 +02:00
Robin Watts	a9b44ee8c2	Tweak architecture specific SIMD files for ease of compilation. This won't affect anything using the supplied build system. For other projects that include tesseract within them, however, this may make their life easier. For example, I have an integration of Tesseract with Ghostscript, in which tesseract is built as part of the Ghostscript build, without using the tesseract build system. The Ghostscript build system is makefile based, and has to work on a range of make systems, including unix make, gnu make and nmake. As such we have to avoid conditionals in the common makefiles. It therefore becomes hard to build one set of files on x86 systems, and another on (say) ARM systems. Accordingly, this commit makes small tweaks to the architecture specific files, so that they compile on EVERY platform; just they only compile to anything useful on the appropriate platform. Thus the makefiles can build all the files on all the systems, and the preprocessor flags mean that the correct functions are actually built.	2020-05-12 13:09:29 +01:00
Egor Pugin	0eaabc42c7	Update CMakeLists.txt	2020-05-12 11:49:15 +03:00
Egor Pugin	e720a26745	[cmake] Set inactivity timeout during icu download to 300 seconds. Fixes #2972.	2020-05-09 18:55:45 +03:00
Stefan Weil	fe966cc0b1	Add build script for oss-fuzz fuzzers This is a copy of projects/tesseract-ocr/build.sh including its history from https://github.com/google/oss-fuzz.git. It allows maintaining the build rules with the Tesseract source code. The build rules for Leptonica were slightly modified to avoid unneeded compilations. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-08 17:37:37 +02:00
Stefan Weil	016016df77	Build only required Leptonica components Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-08 17:37:37 +02:00
Guido Vranken	6e9a1e97db	Fix build (#3177 ) * [tesseract-ocr] Fix build * [tesseract-ocr] Disable AFL, lower resolution	2020-05-08 17:37:37 +02:00
jonathanmetzman	db5655333e	Migrate projects using -lFuzzingEngine to $LIB_FUZZING_ENGINE (#2325 ) Migrate from -lFuzzingEngine to $LIB_FUZZING_ENGINE where possible and not causing breakage	2020-05-08 17:37:37 +02:00
Guido Vranken	56b94fb783	Add fuzzer that processes 512x512 images (#2279 )	2020-05-08 17:37:37 +02:00
Guido Vranken	b2d1a11016	Use Leptonica master branch (#2224 )	2020-05-08 17:37:37 +02:00
Guido Vranken	1a7f633ab0	Add Tesseract (#2210 ) * Add Tesseract * Use -lz instead of static library path * Disable Tesseract shared build * Minimal repository cloning (--depth 1) * Improve tessdata directory resolution syntax * Don't hardcode TESSDATA_PREFIX into binary * Don't move, but copy $SRC/tessdata to $OUT Move sometimes results in "inter-device move failed"	2020-05-08 17:37:37 +02:00
Robin Watts	80d4af6ecf	Add a mechanism to avoid creating debug fonts. If TESSERACT_DISABLE_DEBUG_FONTS is defined, tesseract doesn't attempt to create any debug fonts. This not only saves memory, but it (combined with the change to optionally use Pix as internal storage for the ImageData) allows us to use an embedded Leptonica library with no format handlers at all.	2020-05-05 00:22:23 +01:00
Robin Watts	6bcb941bcf	Avoid tesseract writing Pix out/reading them back. By default, when we ImageData::SetPix, we write the data out as a PNG, just to read it back in to get a compressed buffer of data. We then use this to generate a new Pix. In builds of Tesseract on systems where we don't have temp files, writing files out is problematic. Not only that, but compressing/uncompressing is slow, and on minimal builds of leptonica, where we've disabled the format writers to reduce memory footprint, we get no compression anyway. In such cases, it'd be far nicer just to keep the original Pix as the internal data. Also, when recovering the pixmap from the ImageData, if we know we're only going to read from the data, we can avoid duplicating it and just use the original. This is exactly the case when GRAPHICS_DISABLED is set. So, introduce a TESSERACT_IMAGEDATA_AS_PIX predefine that we can use to cause the internal data to be a Pix rather than a compressed buffer. Given we don't do compression, and they were writing to memory, this was all just more effort than we needed. Also, if we're using GRAPHICS_DISABLED, we might as well just pixCopy rather than pixClone as only the scaler uses this.	2020-05-04 21:01:22 +01:00
zdenop	79c3ebbbb9	Merge pull request #2962 from stweil/GetPageRes Add TessBaseAPI::GetPageRes again	2020-05-04 15:15:29 +02:00

4950 Commits