tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-03 00:49:01 +08:00

Author	SHA1	Message	Date
Stefan Weil	bf334e0477	ccmain/paragraphs: Fix memory leak Coverity report: CID 1164737 (#1 of 1): Resource leak (RESOURCE_LEAK) 49. leaked_storage: Variable p going out of scope leaks the storage it points to. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2016-10-24 13:37:03 +02:00
Zdenko Podobný	c943fc1a33	sets justification for ParagraphInfo; fixes #429	2016-09-18 20:31:45 +02:00
Stefan Weil	a5b61e2b35	ccmain: Remove unused constants In osdetect.cpp, a local definition of kMinCredibleResolution was identical to a global one, so the local one could be removed. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2016-09-06 21:49:27 +02:00
Stefan Weil	f9051083d9	Fix order of arguments for tprintf Format string and arguments did not match. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2016-03-17 10:25:12 +01:00
Tom Morris	6700edd8bc	Cleanup TSV renderer Remove all references to hocr, hocr.tsv, etc. Remove dead code for font info, input filename, HTML escapes. Improved comments. Fixed indentation.	2016-03-01 13:41:19 -05:00
Sundar M. Vaidya	738fe4f757	Adds BoolParam tessedit_create_hocrtsv in class Tesseract.	2016-03-01 12:30:39 -05:00
Egor Pugin	f4366c1f5a	Merge pull request #89 from ceisserer/master Initialize output parameters of word_char_quality() to zero before early exit	2016-02-17 22:26:36 +03:00
Zdenko Podobný	1db94823a9	Add info for progress monitor, make it visible in doxygen doc; remove commented code	2016-01-05 17:21:53 +01:00
zdenop	c53add706e	Merge pull request #27 from tesseract-ocr/monitor Monitor	2016-01-05 16:28:42 +01:00
amitdo	c2f5e9b849	If there is no explicit renderer(s), default to TessTextRenderer Revert `fd429c32`, `43834da7`, `05de195e`. See #49, #59. The code in this commit solves the issue in a more elegant way, IMHO. Now you can use: * `tesseract eurotext.tif eurotext txt pdf` * `tesseract eurotext.tif eurotext txt hocr` * `tesseract eurotext.tif eurotext txt hocr pdf` NOTE: With `tesseract eurotext.tif eurotext` or `tesseract eurotext.tif eurotext txt` the psm will be set to '3', but... With `tesseract eurotext.tif eurotext txt pdf` or `tesseract eurotext.tif eurotext txt hocr` the psm will be set to '1'.	2015-12-11 19:06:49 +02:00
Stefan Weil	9f87c36e23	Fix duplicate fclose Coverity bug report: CID 1270401 (#1 of 1): Use after free As the comment (which was also fixed) says, ReadNextBox() already calls fclose(box_file), so don't call it a 2nd time. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2015-11-25 07:32:55 +01:00
Stefan Weil	39de21c91b	ccmain: Remove unused private class member This fixes a warning from clang. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2015-11-10 19:08:47 +01:00
Stefan Weil	edf765b952	Remove unneeded const qualifiers This fixes compiler warnings like this one: api/baseapi.h:739:32: warning: type qualifiers ignored on function return type [-Wignored-qualifiers] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2015-11-05 06:36:42 +01:00
Stefan Weil	c714330d2f	ccmain: Remove unused local variables Signed-off-by: Stefan Weil <sw@weilnetz.de>	2015-11-04 09:44:52 +01:00
Stefan Weil	318b88daa6	ccmain: Fix typos in comments and strings Most of them were found by codespell. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2015-09-14 21:59:16 +02:00
Clemens Eisserer	0fd413405f	Initialize output parameters of word_char_quality() to zero before early exit	2015-09-02 17:05:14 +02:00
James R. Barlow	18ac7ae7ef	Get OpenCL to compile on OS X However, the output of the OpenCL build is garbage....	2015-08-26 02:03:07 -07:00
Zdenko Podobný	bb19f2c16b	Fixes #76 - enable OpenMP support	2015-08-14 21:39:40 +02:00
Zdenko Podobný	66a76a9477	Revert "temporary add config/*, configure and Makefile.in for release" This reverts commits `ec9581d8f2`, `1afe382c4e`, `4b2cfabcc1`	2015-07-31 21:44:43 +02:00
Zdenko Podobný	41478fd5a1	implement build without cube (-DNO_CUBE_BUILD)	2015-07-24 11:51:44 +02:00
Jim O'Regan	524a61452d	Doxygen Squashed commit from https://github.com/tesseract-ocr/tesseract/tree/more-doxygen closes #14 Commits: `6317305` doxygen `9f42f69` doxygen `0fc4d52` doxygen `37b4b55` fix typo `bded8f1` some more doxy `020eb00` slight tweak `524666d` doxygenify `2a36a3e` doxygenify `229d218` doxygenify `7fd28ae` doxygenify `a8c64bc` doxygenify `f5d21b6` fix `5d8ede8` doxygenify `a58a4e0` language_model.cpp `fa85709` lm_pain_points.cpp lm_state.cpp `6418da3` merge `06190ba` Merge branch 'old_doxygen_merge' into more-doxygen `84acf08` Merge branch 'master' into more-doxygen `50fe1ff` pagewalk.cpp cube_reco_context.cpp `2982583` change to relative `192a24a` applybox.cpp, take one `8eeb053` delete docs for obsolete params `52e4c77` modernise classify/ocrfeatures.cpp `2a1cba6` modernise cutil/emalloc.cpp `773e006` silence doxygen warning `aeb1731` silence doxygen warning `f18387f` silence doxygen; new params are unused? `15ad6bd` doxygenify cutil/efio.cpp `c8b5dad` doxygenify cutil/danerror.cpp `784450f` the globals and exceptions parts are obsolete; remove `8bca324` doxygen classify/normfeat.cpp `9bcbe16` doxygen classify/normmatch.cpp `aa9a971` doxygen ccmain/cube_control.cpp `c083ff2` doxygen ccmain/cube_reco_context.cpp `f842850` params changed `5c94f12` doxygen ccmain/cubeclassifier.cpp `15ba750` case sensitive `f5c71d4` case sensitive `f85655b` doxygen classify/intproto.cpp `4bbc7aa` partial doxygen classify/mfx.cpp `dbb6041` partial doxygen classify/intproto.cpp `2aa72db` finish doxygen classify/intproto.cpp `0b8de99` doxygen training/mftraining.cpp `0b5b35c` partial doxygen ccstruct/coutln.cpp `b81c766` partial doxygen ccstruct/coutln.cpp `40fc415` finished? doxygen ccstruct/coutln.cpp `6e4165c` doxygen classify/clusttool.cpp `0267dec` doxygen classify/cutoffs.cpp `7f0c70c` doxygen classify/fpoint.cpp `512f3bd` ignore ~ files `5668a52` doxygen classify/intmatcher.cpp `84788d4` doxygen classify/kdtree.cpp `29f36ca` doxygen classify/mfoutline.cpp `40b94b1` silence doxygen warnings `6c511b9` doxygen classify/mfx.cpp `f9b4080` doxygen classify/outfeat.cpp `aa1df05` doxygen classify/picofeat.cpp `cc5f466` doxygen training/cntraining.cpp `cce044f` doxygen training/commontraining.cpp `167e216` missing param `9498383` renamed params `37eeac2` renamed param `d87b5dd` case `c8ee174` renamed params `b858db8` typo `4c2a838` h2 context? `81a2c0c` fix some param names; add some missing params, no docs `bcf8a4c` add some missing params, no docs `af77f86` add some missing params, no docs; fix some param names `01df24e` fix some params `6161056` fix some params `68508b6` fix some params `285aeb6` doxygen complains here no matter what `529bcfa` rm some missing params, typos `cd21226` rm some missing params, add some new ones `48a4bc2` fix params `c844628` missing param `312ce37` missing param; rename one `ec2fdec` missing param `05e15e0` missing params `d515858` change "<" to < to make doxygen happy `b476a28` wrong place	2015-07-20 18:48:00 +01:00
Zdenko Podobný	ec9581d8f2	temporary add configure and Makefile.in for release	2015-07-11 09:42:43 +02:00
Ray Smith	a303ab9d00	Misc fixes, mostly clang formatting, but some bug fixes in matrix, werd, and tesstrain_utils. Also updates unicharset to match traineddata files.	2015-07-09 14:28:20 -07:00
Ray Smith	b1d99dfe23	Added a backup adaptive classifier to take over from primary when it fills on a large document	2015-06-12 11:10:53 -07:00
Ray Smith	78b5e1a77d	Fixed occurrence of small rotated blocks in loosely spaced text	2015-06-12 11:05:00 -07:00
Ray Smith	ab0f4e2c38	Clang fixes to earlier changes and build compatibility with Google environment	2015-06-12 10:53:21 -07:00
Zdenko Podobný	d8a55d739d	Fix potential null pointer dereference in ccmain/paragraphs.cpp.	2015-05-22 21:17:33 +02:00
Zdenko Podobný	438edd6c7b	added row attributes to hocr output	2015-05-17 22:13:59 +02:00
Zdenko Podobný	917e994caa	extend ETEXT_DESC by progress_callback	2015-05-17 21:56:40 +02:00
Ray Smith	1e3b671298	Fixes to make yesterday's changes compile	2015-05-13 09:58:59 -07:00
Ray Smith	03f3c9dc88	Misc fixes missed from previous commits	2015-05-12 18:13:15 -07:00
Ray Smith	84920b92b3	Font and classifier output structure cleanup. Font recognition was poor, due to forcing a 1st and 2nd choice at a character level, when the total score for the correct font is often correct at the word level, so allowed the propagation of a full set of fonts and scores to the word recognizer, which can now decide word level fonts using the scores instead of simple votes. Change precipitated a cleanup of output data structures for classifier results, eliminating ScoredClass and INT_RESULT_STRUCT, with a few extra elements going in UnicharRating, and using that wherever possible. That added the extra complexity of 1-rating due to a flip between 0 is good and 0 is bad for the internal classifier scores before they are converted to rating and certainty.	2015-05-12 17:24:34 -07:00
Ray Smith	0e868ef377	Major change to improve layout analysis for heavily diacritic languages: Tha, Vie, Kan, Tel etc. There is a new overlap detector that detects when diacritics cause a big increase in textline overlap. In such cases, diacritics from overlap regions are kept separate from layout analysis completely, allowing textline formation to happen without them. The diacritics are then assigned to 0, 1 or 2 close words at the end of layout analysis, using and modifying an old noise detection data path. The stored diacritics are used or not during recognition according to the character classifier's liking for them.	2015-05-12 16:47:02 -07:00
Ray Smith	b6d0184806	Fixed problems with shifted baselines so recognition can recover from layout analysis errors.	2015-05-12 15:53:45 -07:00
Ray Smith	4a3caefd92	Add ability to build under android (without cube or scrollview).	2015-05-12 15:41:15 -07:00
Ray Smith	53fc4456cc	Fixed issue 1252: Refactored LearnBlob and its call hierarchy to make it a member of Classify. Eliminated the flexfx scheme for calling global feature extractor functions through an array of function pointers. Deleted dead code I found as a by-product. This CL does not change BlobToTrainingSample or ExtractFeatures to be full members of Classify (the eventual goal) as that would make it even bigger, since there are a lot of callers to these functions. When ExtractFeatures and BlobToTrainingSample are members of Classify they will be able to access control parameters in Classify, which will greatly simplify developing variations to the feature extraction process.	2015-05-12 15:22:34 -07:00
Ray Smith	25d0968d09	Major refactor to improve speed on difficult images, especially when running a heap checker. SEAM and SPLIT have been begging for a refactor for a LONG time. This change does most of the work of turning them into proper classes: Moved relevant code into SEAM/SPLIT/TBLOB/EDGEPT etc from global helper functions. Made the splits full data members of SEAM in an array instead of 3 separate pointers. This greatly reduces the amount of new/delete happening in the chopper, which is the main goal. Deleted redundant files: olutil., makechop. Brought other code into SEAM in order to keep its data members private with only priority having accessors.	2015-05-12 14:59:14 -07:00
Zdenko Podobný	4c7c960bfd	fix issue 1417	2015-02-07 22:22:20 +01:00
Zdenko Podobný	15d48361b4	fix VS2010 build;	2015-02-05 17:27:18 +01:00
Zdenko Podobný	9bca55c73b	fix space issue in revision `36883b4faf`	2015-01-30 22:24:26 +01:00
Zdenko Podobný	36883b4faf	preserve interword spaces patch - Issue 1409	2015-01-27 22:58:04 +01:00
Ray Smith	f927728169	Fixed issue 1207	2014-10-09 13:28:03 -07:00
Zdenko Podobný	d0cb1071b2	remove parameters tessedit_pdf_jpg_quality, tessedit_pdf_compression (reasons are in i1300 and i1285)	2014-10-07 23:37:34 +02:00
Ray Smith	55d11ad3c2	Moved params from global in page layout to tesseractclass, improved single column layout analysis	2014-10-07 09:31:00 -07:00
Ray Smith	a441993100	Fixed issue 1301	2014-10-07 09:27:25 -07:00
Zdenko Podobný	9e8629d9ef	allow multiple output in tesseract executable (https://groups.google.com/d/msg/tesseract-ocr/Z_WUKmJDVxc/1vc3W0xJZ2oJ )	2014-09-19 23:33:47 +02:00
Ray Smith	2f197cd653	Fixed issues 899/1220/1246 (mixed eng+ara)	2014-09-17 18:27:49 -07:00
Zdenko Podobný	ff87944171	fix typo	2014-09-07 18:23:47 +02:00
Zdenko Podobný	d1aa61c110	fix issue 1285: reimplement option to select pdf compression	2014-09-06 09:32:22 +02:00
Ray Smith	09b439b05a	Fixed issue 1241, but disabled due to making accuracy worse	2014-08-13 13:33:10 -07:00

222 Commits