tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-22 14:44:07 +08:00

Author	SHA1	Message	Date
zdenop	90e81ac939	suppress VS warnings in release target C4267 (conversion from 'size_t' to 'type', possible loss of data), C4305 ('context' : truncation from 'type1' to 'type2') and C4267 ('var' : conversion from 'size_t' to 'type', possible loss of data)	2020-05-19 16:06:03 +02:00
zdenop	b5d639dcc5	Merge pull request #2965 from robinwatts/pushback1 thanks.	2020-05-16 20:35:19 +02:00
zdenop	064b4403de	Merge pull request #2966 from robinwatts/pushback2	2020-05-16 20:06:31 +02:00
Stefan Weil	5d9b181d67	Merge pull request #2982 from robinwatts/pushback8 Guard #include "config_auto.h" with HAVE_CONFIG_H.	2020-05-16 15:00:40 +02:00
zdenop	acaa90c971	cmake: don't use vector unit compile definition globally	2020-05-16 12:30:20 +02:00
Robin Watts	3408c36eab	Guard #include "config_auto.h" with HAVE_CONFIG_H. Every other file already does this.	2020-05-15 19:29:03 +01:00
Amit D	b4d3bf616a	Merge pull request #2981 from robinwatts/pushback7 Fix OEM_DEFAULT in DISABLED_LEGACY_ENGINE builds.	2020-05-15 18:09:06 +03:00
Robin Watts	43437a540b	Fix OEM_DEFAULT in DISABLED_LEGACY_ENGINE builds. If api->Init is called with OEM_DEFAULT in DISABLED_LEGACY_ENGINE build modes, the engine mode is never set, resulting in no words being found.	2020-05-15 14:56:41 +01:00
Stefan Weil	84721e9049	Merge pull request #2979 from juliangilbey/correct_swap_comment Trivial code documentation fix: move comment about swap meaning for DeSerialize to correct function	2020-05-13 09:32:55 +02:00
Julian Gilbey	e7e6999d3b	Move comment about swap meaning for DeSerialize to correct function	2020-05-13 07:02:59 +01:00
Robin Watts	27d513462c	Avoid using PACKAGE_VERSION in favour of TESSERACT_VERSION_STR. This means the sources compile perfectly in the absence of config_auto.h/HAVE_CONFIG_H as they were intended to do. TESSERACT_VERSION_STR is set to be precisely PACKAGE_VERSION by autoconf, so there are no actual changes in compiled code.	2020-05-12 21:45:12 +02:00
zdenop	f9f8da1b8c	Merge pull request #2977 from stweil/limit	2020-05-12 19:14:09 +02:00
Stefan Weil	39f7fb4a1a	Allow line images with larger width (depending on height) Training with normalized line images higher than 36 px also results in larger widths. The limit should therefore depend on the height used for the normalization. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-12 16:59:31 +02:00
Stefan Weil	34bdc8b74e	Allow line images with larger width Line images can be larger than the old limit, especially when training with newspaper lines: "Image too large to learn!! Size = 2641x36", "Image too large to learn!! Size = 2704x36", "Image too large to learn!! Size = 2751x36", "Image too large to learn!! Size = 3738x36". Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-12 16:50:40 +02:00
Egor Pugin	43bbcd4ce2	Merge pull request #2976 from juliangilbey/fix_memory_leak_in_linerec Destroy box before potentially exiting function (preventing a memory leak)	2020-05-12 17:33:42 +03:00
Julian Gilbey	ca5735efcb	Destroy box before potentially exiting function	2020-05-12 15:25:16 +01:00
Stefan Weil	d3a0768c32	Merge pull request #2975 from robinwatts/pushback5 Tweak architecture specific SIMD files for ease of compilation	2020-05-12 14:55:32 +02:00
Robin Watts	a9b44ee8c2	Tweak architecture specific SIMD files for ease of compilation. This won't affect anything using the supplied build system. For other projects that include tesseract within them, however, this may make their life easier. For example, I have an integration of Tesseract with Ghostscript, in which tesseract is built as part of the Ghostscript build, without using the tesseract build system. The Ghostscript build system is makefile based, and has to work on a range of make systems, including unix make, gnu make and nmake. As such we have to avoid conditionals in the common makefiles. It therefore becomes hard to build one set of files on x86 systems, and another on (say) ARM systems. Accordingly, this commit makes small tweaks to the architecture specific files, so that they compile on EVERY platform; just they only compile to anything useful on the appropriate platform. Thus the makefiles can build all the files on all the systems, and the preprocessor flags mean that the correct functions are actually built.	2020-05-12 13:09:29 +01:00
Egor Pugin	0eaabc42c7	Update CMakeLists.txt	2020-05-12 11:49:15 +03:00
Egor Pugin	e720a26745	[cmake] Set inactivity timeout during icu download to 300 seconds. Fixes #2972.	2020-05-09 18:55:45 +03:00
Stefan Weil	fe966cc0b1	Add build script for oss-fuzz fuzzers This is a copy of projects/tesseract-ocr/build.sh including its history from https://github.com/google/oss-fuzz.git. It allows maintaining the build rules with the Tesseract source code. The build rules for Leptonica were slightly modified to avoid unneeded compilations. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-08 17:37:37 +02:00
Stefan Weil	016016df77	Build only required Leptonica components Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-08 17:37:37 +02:00
Guido Vranken	6e9a1e97db	Fix build (#3177) * [tesseract-ocr] Fix build * [tesseract-ocr] Disable AFL, lower resolution	2020-05-08 17:37:37 +02:00
jonathanmetzman	db5655333e	Migrate projects using -lFuzzingEngine to $LIB_FUZZING_ENGINE (#2325) Migrate from -lFuzzingEngine to $LIB_FUZZING_ENGINE where possible and not causing breakage	2020-05-08 17:37:37 +02:00
Guido Vranken	56b94fb783	Add fuzzer that processes 512x512 images (#2279)	2020-05-08 17:37:37 +02:00
Guido Vranken	b2d1a11016	Use Leptonica master branch (#2224)	2020-05-08 17:37:37 +02:00
Guido Vranken	1a7f633ab0	Add Tesseract (#2210) * Add Tesseract * Use -lz instead of static library path * Disable Tesseract shared build * Minimal repository cloning (--depth 1) * Improve tessdata directory resolution syntax * Don't hardcode TESSDATA_PREFIX into binary * Don't move, but copy $SRC/tessdata to $OUT (move sometimes results in "inter-device move failed")	2020-05-08 17:37:37 +02:00
Robin Watts	80d4af6ecf	Add a mechanism to avoid creating debug fonts. If TESSERACT_DISABLE_DEBUG_FONTS is defined, tesseract doesn't attempt to create any debug fonts. This not only saves memory, but (combined with the change to optionally use Pix as internal storage for the ImageData) also allows us to use an embedded Leptonica library with no format handlers at all.	2020-05-05 00:22:23 +01:00
Robin Watts	6bcb941bcf	Avoid tesseract writing Pix out/reading them back. By default, when we call ImageData::SetPix, we write the data out as a PNG, just to read it back in to get a compressed buffer of data. We then use this to generate a new Pix. In builds of Tesseract on systems where we don't have temp files, writing files out is problematic. Not only that, but compressing/uncompressing is slow, and on minimal builds of leptonica, where we've disabled the format writers to reduce memory footprint, we get no compression anyway. In such cases, it'd be far nicer just to keep the original Pix as the internal data. Also, when recovering the pixmap from the ImageData, if we know we're only going to read from the data, we can avoid duplicating it and just use the original. This is exactly the case when GRAPHICS_DISABLED is set. So, introduce a TESSERACT_IMAGEDATA_AS_PIX predefine that we can use to cause the internal data to be a Pix rather than a compressed buffer. Given we don't do compression, and they were writing to memory, this was all just more effort than we needed. Also, if we're using GRAPHICS_DISABLED, we might as well just pixCopy rather than pixClone as only the scaler uses this.	2020-05-04 21:01:22 +01:00
zdenop	79c3ebbbb9	Merge pull request #2962 from stweil/GetPageRes Add TessBaseAPI::GetPageRes again	2020-05-04 15:15:29 +02:00
Stefan Weil	9173e6e3f7	Add TessBaseAPI::GetPageRes again It is now added unconditionally, so it is always available for the unittest. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-04 14:03:39 +02:00
Amit D	acc4c8bff5	Merge pull request #2952 from jannick0/patch-1 [trie.h] pattern definition: fix documentation	2020-04-27 23:44:48 +03:00
zdenop	23be532f7d	Merge pull request #2957 from stweil/master	2020-04-27 19:56:32 +02:00
Stefan Weil	1188e0a516	Remove old code which was used for Ocropus Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-04-27 16:33:34 +02:00
jannick0	e044163085	[trie.h] pattern definition: fix documentation The fix makes the definition of `\n` consistent with the examples given below the definition. Please note that I did not check this against how it is implemented in the code.	2020-04-19 13:47:42 +02:00
Egor Pugin	cdebe13d81	[ci] Add fail-fast: false strategy.	2020-03-30 01:53:41 +03:00
Stefan Weil	4a00b68c63	Fix lambda function for curl code errors Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-03-18 20:46:52 +01:00
Stefan Weil	9f5a3f6ac7	Fix uninitialized local variable in curl code Compiler warning: src/api/baseapi.cpp:1151:27: warning: variable 'curlcode' is uninitialized when used here [-Wuninitialized] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-03-18 19:25:33 +01:00
zdenop	6e307074d8	Merge pull request #2894 from stweil/curl Report errors from curl_easy functions	2020-03-18 14:14:07 +01:00
Egor Pugin	916875d74a	[sw] Fix mingw build.	2020-03-17 17:57:10 +03:00
Egor Pugin	04a7650b51	Update README.md	2020-03-14 03:23:14 +03:00
Egor Pugin	e1cf69fd9e	[ci] Update.	2020-03-13 23:45:38 +03:00
Egor Pugin	a6c8d4c692	[ci] Merge three configs into one.	2020-03-13 19:37:22 +03:00
Stefan Weil	ef4f99a994	Run xgetbv instruction only on machines which support it This fixes a regression for older Intel processors. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-03-08 17:32:10 +01:00
Stefan Weil	a7c9c566ee	Update submodule googletest to tagged release release-1.10.0 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-03-08 17:29:46 +01:00
Stefan Weil	a350108592	Update submodule abseil to tagged release 20200225 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-03-08 17:29:09 +01:00
Stefan Weil	eff4dc0603	Use lambda expressions for reporting curl errors Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-02-23 22:44:42 +01:00
Stefan Weil	9972c91127	Report errors from curl_easy functions Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-02-23 22:26:51 +01:00
Egor Pugin	90405ad0e3	Merge pull request #2893 from stweil/piccolo Update piccolo2d-core and piccolo2d-extras	2020-02-23 19:20:44 +03:00
Egor Pugin	bbd2c31b91	Merge pull request #2895 from stweil/avx simd: Check whether the OS supports FMA, AVX, ...	2020-02-23 19:20:18 +03:00

5785 Commits