tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-27 10:34:12 +08:00

Author	SHA1	Message	Date
Stefan Weil	18f7ab751e	WERD_RES: Remove comparisons which are constant This fixes warnings from LGTM: Comparison is always false because id >= 0. Comparison is always true because mirrored >= 1. Comparison is always false because id >= 0. INVALID_UNICHAR_ID is -1, so the warnings are correct. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-06 20:06:38 +02:00
Stefan Weil	238c872753	GENERIC_2D_ARRAY: Pass parameters by reference This fixes warnings from LGTM: This parameter of type FontClassInfo is 192 bytes - consider passing a pointer/reference instead. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-06 19:48:13 +02:00
Stefan Weil	a7982185c9	genericvector: Pass parameters by reference This fixes warnings like the following one from LGTM: This parameter of type ParamsTrainingHypothesis is 112 bytes - consider passing a pointer/reference instead. Most parameters can also get the const attribute. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-06 19:47:49 +02:00
Stefan Weil	819c43d377	chop: Use more efficient float calculations for sqrt This fixes warnings from LGTM: Multiplication result may overflow 'float' before it is converted to 'double'. While the sqrt function always calculates with double, here the overloaded std::sqrt can be used to handle the float arguments more efficiently. Replace also an old C++ type cast by a static_cast. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-06 18:59:23 +02:00
Stefan Weil	f264464ec6	rect: Use more efficient float calculations for ceil, floor This fixes warnings from LGTM: Multiplication result may overflow 'float' before it is converted to 'double'. While the floor function always calculates with double, here the overloaded std::floor can be used to handle the float arguments more efficiently. Replace also old C++ type casts by static_cast. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-06 18:51:06 +02:00
zdenop	1e4768c1f5	Merge pull request #1957 from stweil/lgtm Fix some warnings from static code analyzer LGTM	2018-10-06 18:42:12 +02:00
zdenop	e78c33cfc3	Merge pull request #1956 from stweil/valgrind Fix constructor for class Dict (uninitialized member variables)	2018-10-06 18:32:39 +02:00
Stefan Weil	b26866bb3b	intproto: Use more efficient float calculations for floor This fixes warnings from LGTM: Multiplication result may overflow 'float' before it is converted to 'double'. While the floor function always calculates with double, here the overloaded std::floor can be used to handle the float arguments more efficiently. Replace also old C++ type casts by static_cast. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-06 18:29:38 +02:00
Stefan Weil	06a8de0b8b	genericvector: Rewrite code to satisfy static code analyzer Warning from LGTM: Resource data_ is acquired by class GenericVector<FontSpacingInfo *> but not released in the destructor. LGTM complains about data_ not being deleted in the destructor. The destructor calls the clear() method, but the delete there was conditional which confuses the static code analyzer. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-06 18:24:13 +02:00
Stefan Weil	c2a8aa00b8	Fix constructor for class Dict (uninitialized member variables) wildcard_unichar_id_, apostrophe_unichar_id_, question_unichar_id_ and slash_unichar_id_ were not initialized in the constructor. slash_unichar_id_ was used later in a conditional. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-06 17:52:52 +02:00
zdenop	9efedc15b2	Merge pull request #1954 from stweil/unicharset Fix use of wrong UNICHARSET	2018-10-06 15:04:31 +02:00
zdenop	76cd80e1d7	Merge pull request #1953 from stweil/fix lstmtraining: Remove dead code for purified model name	2018-10-06 15:02:39 +02:00
Stefan Weil	8dc9e9fd14	Fix use of wrong UNICHARSET Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-06 13:21:09 +02:00
Stefan Weil	0e71e5a754	lstmtraining: Remove dead code for purified model name The purified model name `model_output` was unused, so remove the comment and the unused code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-06 09:34:17 +02:00
Egor Pugin	0e43ae5cf4	Merge pull request #1951 from stweil/checkdir combine_tessdata, lstmtraining: Check for write failures	2018-10-05 23:38:01 +03:00
Stefan Weil	f4e982e041	combine_tessdata: Handle failures when extracting Report an error and terminate if that fails. Use also EXIT_SUCCESS and EXIT_FAILURE for the return values of main() and add missing return at end of main(). Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-05 21:39:18 +02:00
Stefan Weil	7434590b9a	lstmtraining: Check write permission for output model This is done by creating a temporary file. Report an error and terminate if that fails. Use also EXIT_SUCCESS and EXIT_FAILURE for the return values of main(). Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-05 20:38:02 +02:00
zdenop	660dbaa9d5	implement parameter min_characters_to_try for minimum characters to try to skip page entirely. fixes #1729	2018-10-05 19:05:28 +02:00
zdenop	2cb609d202	Merge pull request #1950 from stweil/manpage Merge and enhance documentation on language and script models	2018-10-05 18:09:31 +02:00
Stefan Weil	3315931859	Merge and enhance documentation on language and script models Add also links to the user forum and to the Wiki and update the history text. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-05 16:55:21 +02:00
zdenop	551abb2114	Merge pull request #1949 from stweil/manpage Document some more config options for tesseract	2018-10-05 16:38:06 +02:00
Stefan Weil	383dcf70b5	Document some more config options for tesseract Clarify also the name(s) of the generated OCR result file(s): Tesseract does not create a file named outbase.txt by default. Fix also a sentence in the language section. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-05 16:03:51 +02:00
Egor Pugin	e03ee932d2	Merge pull request #1947 from stweil/doc Update tesseract man page and add Makefile rule to build HTML manpages	2018-10-05 00:25:07 +03:00
Stefan Weil	b70a456788	Add Makefile rule to build HTML manpages They can be built optionally by `make html` (only for automake builds). Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-04 22:36:03 +02:00
Stefan Weil	3e9b0acc5c	Update tesseract man page - move Tesseract 4 release note to other release notes - format command line options in text - add link to release notes (wiki) - add link to contributors (GitHub) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-04 22:10:22 +02:00
zdenop	f2c44a0ba8	Merge pull request #1946 from stweil/psm Don't set page segmentation mode for unlv config	2018-10-04 22:00:40 +02:00
Stefan Weil	c6f759148b	Don't set page segmentation mode for unlv config Setting the page segmentation mode to 6 ("Assume a single uniform block of text") typically improves the layout detection for such texts, but should not be done in the config file. unlvtests/runtestset.sh adds `--psm 6` explicitly, so test results won't change when using that script. This is similar to commit `ecfee53bac`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-04 21:01:18 +02:00
Egor Pugin	a86292b111	Merge pull request #1944 from stweil/psm Allow orientation detection with any traineddata	2018-10-04 18:29:45 +03:00
Stefan Weil	26bfd2b9d3	Allow orientation detection with any traineddata While orientation and script detection (OSD) normally requires osd.traineddata to detect both, it must also be possible to do only orientation detection with eng.traineddata or any other traineddata. Enforce osd.traineddata only if there was no `-l` command line option. Commit `27ce472666` was too restrictive. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-04 17:07:14 +02:00
zdenop	6b9f1f100b	Merge pull request #1943 from stweil/psm Don't set page segmentation mode for hocr, pdf and tsv configs	2018-10-04 16:24:52 +02:00
Stefan Weil	ecfee53bac	Don't set page segmentation mode for hocr, pdf and tsv configs Setting the page segmentation mode in those config files gives unexpected results: the text recognized when no config or only txt is given changes if both txt and any of hocr, pdf or tsv is chosen. In a test set of nearly 200 pages from historical books, using segmentation mode 1 is typically slightly better than the default, but there are also cases where it is much worse. Therefore the user should be able to decide which page segmentation mode is best. Old results for hocr, pdf or tsv now need an explicit `--psm 1` for reproduction. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-04 12:05:49 +02:00
zdenop	b15fbf1d0f	Merge pull request #1941 from Shreeshrii/master Update man page and readme reg two OCR engines in Tesseract 4	2018-10-04 07:49:08 +02:00
Shree Devi Kumar	d160067308	Update README about both OCR engines in tesseract 4	2018-10-04 04:17:49 +00:00
Shree Devi Kumar	0c39d3446b	Update tesseract man page about both OCR engines in tesseract 4	2018-10-04 04:01:26 +00:00
zdenop	1beeeee215	fix version info in VERSION	2018-10-03 23:51:41 +02:00
Zdenko Podobný	dcc50a867f	Merge branch 'master' of https://github.com/tesseract-ocr/tesseract * 'master' of https://github.com/tesseract-ocr/tesseract: Fix CID 1164579 (Explicit null dereferenced) print help for tesstrain.sh; fixes #1469 Fix CID 1395882 (Uninitialized scalar variable) Fix comments Move content of ipoints.h to points.h and remove ipoints.h remove duplicate help from combine_lang_model Fix typo. use tprintf instead of printf to be able disable messages by quiet option (issue #1240) add "sudo ldconfig" to install instruction. fixes #1212 unittest: Replace NULL by nullptr unittest: Format code tesseract app: check if input file exists; fixes #1023 Format code (replace ( xxx ) by (xxx)) Simplify boolean expressions Win32: use the ISO C and C++ conformant name "_putenv" instead of deprecated "putenv"	2018-10-03 19:21:42 +02:00
zdenop	423798722f	Merge pull request #1938 from stweil/coverity Fix two reports from CoverityScan and clean related code	2018-10-02 12:34:08 +02:00
Stefan Weil	04703ca8df	Fix CID 1164579 (Explicit null dereferenced) The report from Coverity Scan is a false positive. Nevertheless the code can be rewritten and optimized a little bit to fix that report. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-02 11:48:28 +02:00
Zdenko Podobný	7dbf5a030f	print help for tesstrain.sh; fixes #1469	2018-10-02 11:35:10 +02:00
Stefan Weil	9a1f14f2aa	Fix CID 1395882 (Uninitialized scalar variable) The implementation for ICOORD only allows division by scale != 0. Do the same for FCOORD by asserting that scale != 0.0f, so undefined program behaviour will be caught. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-02 11:34:14 +02:00
Stefan Weil	ce6ff20939	Fix comments Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-02 11:26:36 +02:00
Stefan Weil	8c56b8f58c	Move content of ipoints.h to points.h and remove ipoints.h Both include files depended on each other, so it did not make sense to separate them. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-10-02 11:21:27 +02:00
zdenop	57a6f1d22e	remove duplicate help from combine_lang_model	2018-10-01 21:22:51 +02:00
Egor Pugin	6ee7f4eac2	Fix typo.	2018-09-29 17:04:25 +03:00
zdenop	14b83d3090	use tprintf instead of printf to be able disable messages by quiet option (issue #1240)	2018-09-29 13:49:08 +02:00
zdenop	d9372662ec	add "sudo ldconfig" to install instruction. fixes #1212	2018-09-29 13:33:36 +02:00
zdenop	d5b6222856	Merge pull request #1935 from stweil/style Format code and fix some style issues	2018-09-29 09:32:56 +02:00
Stefan Weil	4ec9c86226	unittest: Replace NULL by nullptr Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-09-29 09:27:12 +02:00
Stefan Weil	9e66fb918f	unittest: Format code It was formatted with clang-format-7 -i unittest/.{c,h}. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2018-09-29 09:19:13 +02:00
zdenop	1a096441d0	tesseract app: check if input file exists; fixes #1023	2018-09-29 08:51:00 +02:00

4734 Commits