tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-25 00:07:49 +08:00

Author	SHA1	Message	Date
Stefan Weil	4c0b98bd12	Replace undefined shift operations by multiplications Shift operations are undefined for negative numbers, but at least on Intel they return the same value as a multiplication with 2 ^ shift value. This fixes runtime errors reported by sanitizers and OSS-Fuzz: intmatcher.cpp:821:59: runtime error: left shift of negative value -14 intmatcher.cpp:823:75: runtime error: left shift of negative value -512 intmatcher.cpp:820:50: runtime error: left shift of negative value -80 See issue #2297 and https://oss-fuzz.com/testcase-detail/4845195990925312 for details. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 06:56:54 +01:00
Stefan Weil	896698a4f5	Fix runtime error (left shift of negative value) Runtime error: src/training/util.h:37:28: runtime error: left shift of negative value -17 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-12 06:56:54 +01:00
Egor Pugin	59cd716609	Merge pull request #2311 from stweil/global Remove globals.h	2019-03-11 22:33:16 +03:00
Stefan Weil	5202208a8c	Remove globals.h It only included other files which are already included where needed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-11 19:01:23 +01:00
Stefan Weil	e78b5f2af3	Update test submodule Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-11 13:00:17 +01:00
Noah Metzger	bc2b919805	Integrated Timesteps per symbol into ChoiceIterator Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-03-11 10:50:56 +01:00
Noah Metzger	754e38d2b4	Added the option to get the timesteps separated by the suggested segmentation Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>	2019-03-11 10:50:56 +01:00
Egor Pugin	d2c3309df9	Update appveyor.yml	2019-03-11 02:12:13 +03:00
zdenop	e817607280	archive_version_details is available from libArchive version 3.2.0	2019-03-10 22:57:48 +01:00
Egor Pugin	c4dd537206	[cmake] Add visibility to all target_link_libraries calls.	2019-03-11 00:11:25 +03:00
Egor Pugin	b0f61dfd1c	Propagate libarchive to tess users.	2019-03-11 00:06:50 +03:00
Egor Pugin	37b0c36e32	Add libarchive dependency to cppan and sw builds.	2019-03-11 00:03:45 +03:00
zdenop	5cfe4cc1f0	Merge pull request #2286 from Shreeshrii/lstmbox Rename function to TessBaseAPIGetTsvText to be consistent with the Create method	2019-03-10 21:41:52 +01:00
zdenop	02a1ffe87a	Report libArchive support	2019-03-10 20:08:45 +01:00
zdenop	4ed44d70c5	cmake: enable libArchive support for non_cppan build	2019-03-10 20:08:19 +01:00
zdenop	e4bf971ad6	Merge pull request #2306 from stweil/fuzz Fix two issues reported by OSS-Fuzz	2019-03-10 19:24:42 +01:00
Stefan Weil	b3aff7d633	Fix Index-out-of-bounds in IntegerMatcher::UpdateTablesForFeature This fixes issue #2299, an issue which was already reported by static code analyzers and now by OSS-Fuzz, see details at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13597. The Tesseract code assigns an address which is out-of-bounds to a pointer variable, but increments that variable later. So this is a false positive. Change the code nevertheless to satisfy OSS-Fuzz. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-10 18:26:40 +01:00
Stefan Weil	91d0a71d51	Fix assertion caused by wrong unicharset (issue #2301) Credit to OSS-Fuzz: This fixes an issue which was reported by OSS-Fuzz, see details at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13592. OSS-Fuzz triggered this assertion: contains_unichar_id(unichar_id):Error:Assert failed:in file ../../src/ccutil/unicharset.h, line 502 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-10 16:42:54 +01:00
zdenop	0e72733121	Merge pull request #2305 from stweil/fuzz Fix Heap-buffer-overflow in GenericVector<int>::size (issue #2298)	2019-03-10 16:36:26 +01:00
Stefan Weil	71d4990c6d	Fix Heap-buffer-overflow in GenericVector<int>::size (issue #2298) Credit to OSS-Fuzz: This fixes a security issue which was reported by OSS-Fuzz, see details at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13590. Also add some assertions to catch similar bugs. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-10 16:12:30 +01:00
Robert Schubert	3912cb1c33	LSTM char_whitelist/blacklist (`6ac2ff0`): more robust - unicharset can be null too	2019-03-09 10:40:40 +01:00
Stefan Weil	b7279f6d67	unittest: Remove tmp directory from repository and create it during build This fixes out of tree builds. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-08 16:08:16 +01:00
Stefan Weil	bd95c9d2b8	unittest: Add missing libarchive It is needed for the tests if Tesseract was built with libarchive. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-08 15:50:14 +01:00
Stefan Weil	b20f89006e	unittest: Add another file from Abseil It is needed for newer versions of Abseil. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-08 15:46:38 +01:00
Robert Schubert	b45999088c	LSTM char_whitelist/blacklist (`6ac2ff0`): multi-code chars - move decision from ComputeTopN to ContinueContext, where it belongs: block context continuations which emit final codes translating to disabled unichar_ids. (The normal logic for fallback from top2 > top2 > rest will apply.) - pass UNICHARSET refs appropriately	2019-03-08 12:30:16 +01:00
Robert Schubert	8012d5e653	LSTM char_whitelist/blacklist (`6ac2ff0`): also sublangs	2019-03-07 18:32:50 +01:00
Robert Schubert	6ac2ff083e	trying to add tessedit_char_whitelist etc. again: - ignore matrix outputs in ComputeTopN if they belong to a disabled unichar_id - pass UNICHARSET refs to check that - in SetBlackAndWhitelist, also update the unicharset of the lstm_recognizer_ instance, if any	2019-03-07 01:37:23 +01:00
zdenop	f80085c0bf	Merge pull request #2289 from Armyke/master Added an additional optional --tmp_dir parameter to specify the tempo…	2019-03-06 15:03:14 +01:00
zdenop	fe5c82fd24	Merge pull request #2291 from cjmayo/man_configfile Document that configfile can be a file path	2019-03-06 10:19:27 +01:00
Chris Mayo	a9d3efb6e3	Document that configfile can be a file path Useful for custom config or when pointing tessdata to alternate traineddata.	2019-03-05 19:47:54 +00:00
zdenop	868a623f8d	Merge pull request #2290 from stweil/libarchive Add initial support for traineddata files in standard archive formats	2019-03-05 17:42:13 +01:00
Stefan Weil	1c7e00611b	Add initial support for traineddata files in standard archive formats This requires libarchive-dev. Tesseract can now load traineddata files in any of the archive formats which are supported by libarchive. Example of a zipped BagIt archive: $ unzip -l /usr/local/share/tessdata/zip.traineddata Archive: /usr/local/share/tessdata/zip.traineddata Length Date Time Name --------- ---------- ----- ---- 55 2019-03-05 15:27 bagit.txt 0 2019-03-05 15:25 data/ 1557 2019-03-05 15:28 manifest-sha256.txt 1082890 2019-03-05 15:25 data/eng.word-dawg 1487588 2019-03-05 15:25 data/eng.lstm 7477 2019-03-05 15:25 data/eng.unicharset 63346 2019-03-05 15:25 data/eng.shapetable 976552 2019-03-05 15:25 data/eng.inttemp 13408 2019-03-05 15:25 data/eng.normproto 4322 2019-03-05 15:25 data/eng.punc-dawg 4738 2019-03-05 15:25 data/eng.lstm-number-dawg 1410 2019-03-05 15:25 data/eng.freq-dawg 844 2019-03-05 15:25 data/eng.pffmtable 6360 2019-03-05 15:25 data/eng.lstm-unicharset 1012 2019-03-05 15:25 data/eng.lstm-recoder 1047 2019-03-05 15:25 data/eng.unicharambigs 4322 2019-03-05 15:25 data/eng.lstm-punc-dawg 16109842 2019-03-05 15:25 data/eng.bigram-dawg 80 2019-03-05 15:25 data/eng.version 6426 2019-03-05 15:25 data/eng.number-dawg 3694794 2019-03-05 15:25 data/eng.lstm-word-dawg --------- ------- 23468070 21 files `combine_tessdata -d` and `combine_tessdata -u` also work. The traineddata files in the new format can be generated with standard tools like zip or tar. More work is needed for other training tools and big endian support. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-05 17:18:48 +01:00
Armyke	56b04d4ea7	Added the same --tmp_dir flag to tesstrain_utils.sh	2019-03-04 14:05:25 +00:00
Armyke	25fa392887	Added an optional --tmp_dir parameter to specify the directory in which tesstrain.py creates its temporary training files. The main motivation is slow read/write on HDDs: anyone who wants to speed up the process can point tmp_dir at a directory on an SSD.	2019-03-04 13:26:53 +00:00
Stefan Weil	7fbde96a04	Format new code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 20:26:07 +01:00
Stefan Weil	38fac625cd	Format new code with clang-format Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 20:01:48 +01:00
Shree	a0202bac70	Rename function to TessBaseAPIGetTsvText to be consistent with the Create method	2019-03-02 16:29:53 +00:00
zdenop	5de2a21b3f	Merge pull request #2283 from Shreeshrii/lstmbox Add missing renderers to C-API	2019-03-02 15:15:34 +01:00
zdenop	198c90b124	Merge pull request #2285 from stweil/opt PAGE_RES_IT: Optimize compare operators by using inline code	2019-03-02 15:13:14 +01:00
Stefan Weil	9c90894ff0	PAGE_RES_IT: Optimize compare operators by using inline code Avoiding a function call will make both == and != operator faster. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:57:16 +01:00
Egor Pugin	7cc97c25ca	Merge pull request #2284 from stweil/fix Fix some compiler warnings	2019-03-02 16:35:55 +03:00
Stefan Weil	295996ed05	commandlineflags: Fix compiler warnings (signed/unsigned) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:21:04 +01:00
Stefan Weil	eb14726aac	ICOORD: Fix old type casts This fixes compiler warnings and avoids unnecessary conversions between float and double. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:04:54 +01:00
Stefan Weil	fb0f1bcf66	BoxChar: Fix compiler warnings (signed/unsigned) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 14:04:54 +01:00
Stefan Weil	0e1a1fc3cf	Validator: Fix compiler warnings (signed/unsigned) This also fixes a regression in validate_grapheme_test introduced by commit `32e9d7c8f5`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-03-02 13:05:03 +01:00
Shree	c7e8131efc	Add TSV option to C-API	2019-03-02 09:50:54 +00:00
Shree	22c099348b	rename LSTMBOX to LSTMBox	2019-03-02 09:11:47 +00:00
zdenop	f5a7ca26e7	Merge pull request #2244 from Shreeshrii/mya Fix Myanmar validation rules as per Unicode charts	2019-03-01 18:37:36 +01:00
zdenop	2ba8e0061a	Merge branch 'master' into mya	2019-03-01 18:37:24 +01:00
zdenop	0b354f2b84	Merge pull request #2282 from Shreeshrii/configs Add lstmbox and wordstrbox to C-API	2019-03-01 18:33:29 +01:00
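
Commits 4c0b98bd12 and 896698a4f5 above replace left shifts of negative values with multiplications. A minimal sketch of that pattern follows; the helper name `ScaleByPowerOfTwo` is hypothetical and does not appear in the Tesseract sources, and the actual changes in intmatcher.cpp and util.h may look different.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only (not Tesseract code):
// scale a signed value by 2^shift without left-shifting a negative operand.
constexpr int32_t ScaleByPowerOfTwo(int32_t value, unsigned shift) {
  // In C++ standards before C++20, `value << shift` is undefined behaviour
  // when value is negative, which is what the sanitizers reported.
  // Shifting the positive constant 1 and multiplying is well defined,
  // provided shift < 31 and the product fits in int32_t.
  return value * (int32_t{1} << shift);
}
```

For example, `ScaleByPowerOfTwo(-14, 3)` computes -14 * 8, where the expression `-14 << 3` would trigger the sanitizer report quoted in the commit message.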


4862 Commits