tesseract

mirror of https://github.com/tesseract-ocr/tesseract.git synced 2024-12-04 01:39:16 +08:00

Author	SHA1	Message	Date
Stefan Weil	f462389673	renderer for TessPDFRenderer Signed-off-by: Stefan Weil <sw@weil.de>	2020-12-28 21:03:29 +01:00
Stefan Weil	d55e5f4803	Replace more GenericVector by std::vector Signed-off-by: Stefan Weil <sw@weil.de>	2020-12-28 21:03:29 +01:00
Stefan Weil	4a28d33c58	Replace GenericVector by std::vector in strngs.h and more places Signed-off-by: Stefan Weil <sw@weil.de>	2020-12-28 21:03:29 +01:00
Stefan Weil	3ddc88cccb	Use std::vector in TessPDFRenderer Signed-off-by: Stefan Weil <sw@weil.de>	2020-12-28 21:03:29 +01:00
Stefan Weil	7c679e777d	Use std::vector for allowed_scripts Signed-off-by: Stefan Weil <sw@weil.de>	2020-12-28 21:03:29 +01:00
Stefan Weil	32d53479ae	Use std::vector for vars_vec, vars_values Signed-off-by: Stefan Weil <sw@weil.de>	2020-12-28 21:03:29 +01:00
Stefan Weil	085f6b2572	Use std::list for paragraph models Signed-off-by: Stefan Weil <sw@weil.de>	2020-12-28 21:03:29 +01:00
Stefan Weil	4ebba72919	Use std::vector for paragraph models Signed-off-by: Stefan Weil <sw@weil.de>	2020-12-28 21:03:29 +01:00
Stefan Weil	524fc67165	Fix tesseract --list-langs Signed-off-by: Stefan Weil <sw@weil.de>	2020-12-28 21:03:29 +01:00
Egor Pugin	986b57dd4e	Export symbol for unit test.	2020-12-28 04:58:26 +03:00
Egor Pugin	3187f2ef08	Move doubleptr.h to unittests as it is used only there.	2020-12-28 02:32:27 +03:00
Egor Pugin	4175679da6	Revert kdpair, genericheap changes.	2020-12-28 02:31:45 +03:00
Stefan Weil	289a34a40a	Add const attribute for pdf_ttf That moves its data into the text segment and reduces the total size slightly (columns: text data bss dec hex filename): 39788 693 0 40481 9e21 old/libtesseract_la-pdfrenderer.o; 40360 88 0 40448 9e00 new/libtesseract_la-pdfrenderer.o Signed-off-by: Stefan Weil <sw@weil.de>	2020-12-26 17:51:56 +01:00
Stefan Weil	7dca63caf1	More fixes for namespace tesseract Signed-off-by: Stefan Weil <sw@weil.de>	2020-12-26 17:41:53 +01:00
Stefan Weil	7188b160ae	Fix build with --disable-graphics Signed-off-by: Stefan Weil <sw@weil.de>	2020-12-26 17:36:24 +01:00
Egor Pugin	aecbf79791	Add missing merge_unicharsets training tool to cmake and sw build.	2020-12-26 15:57:22 +03:00
Stefan Weil	317ef988a0	Add missing namespace prefix for GlobalParams() (fix build for some unit tests) Signed-off-by: Stefan Weil <sw@weil.de>	2020-12-26 13:44:43 +01:00
Stefan Weil	418064f639	Add missing namespace prefix (fix build for merge_unicharsets) Signed-off-by: Stefan Weil <sw@weil.de>	2020-12-26 13:09:39 +01:00
Egor Pugin	c8b8d266d6	Fix some of vector<bool> cases for msvc.	2020-12-26 04:17:13 +03:00
Egor Pugin	6b22972bc2	Fix linux build.	2020-12-26 04:15:42 +03:00
Egor Pugin	c3e04abe1e	Inherit STRING from std::string.	2020-12-26 03:48:35 +03:00
Egor Pugin	4fc467a922	Inherit GenericVector from std::vector. Inherit kdpairs from std::pair. Rewrite some move ctors to modern C++ style.	2020-12-26 03:23:09 +03:00
Egor Pugin	04d3cfcf2f	Merge branch 'master' of github.com-egorpugin:tesseract-ocr/tesseract	2020-12-26 00:55:37 +03:00
Egor Pugin	79a86f2582	Move all tesseract symbols into tesseract namespace. Fix include order in many places.	2020-12-26 00:55:30 +03:00
zdenop	ceadc4ddb8	remove inline declaration	2020-12-25 16:28:00 +01:00
Egor Pugin	14d52a79ba	Remove .rc files. No need to add them into dll/exe.	2020-12-25 18:06:35 +03:00
zdenop	044921267f	embed pdf.ttf to tesseract library #2551	2020-12-25 13:20:36 +01:00
Stefan Weil	cc133aa394	Fix text for fonts_dir parameter Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-12-22 21:32:05 +01:00
Stefan Weil	34abba8698	Add terminating linefeed to fonts.conf Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-12-22 21:32:05 +01:00
Stefan Weil	17a64eef1e	Simplify code for PangoFontInfo::HardInitFontConfig Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-12-22 21:32:05 +01:00
Stefan Weil	707ee70966	Use deprecated pango_fc_font_get_glyph for old Pango versions Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-12-22 12:02:37 +01:00
Stefan Weil	f759142c95	Remove buggy Windows implementation for getting glyph from font Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-12-22 09:07:09 +01:00
Stefan Weil	7669d36a37	Use HarfBuzz instead of deprecated pango_fc_font_get_glyph This fixes the crash on MacOS with M1. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-12-22 09:03:05 +01:00
Stefan Weil	8c859a7329	Fix type cast from PangoFont to PangoFcFont The original code crashes in pango_fc_font_get_glyph on MacOS with M1. Replacing the type cast with the macro made for that conversion gives at least an error message before crashing: (process:12546): GLib-GObject-WARNING **: 08:38:02.472: invalid cast from 'PangoCairoCoreTextFont' to 'PangoFcFont' zsh: segmentation fault ./pango_font_info_test Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-12-22 08:45:11 +01:00
Stefan Weil	3efedabda3	automake: Flat build for src/training Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-12-19 15:25:21 +01:00
Stefan Weil	6fcf8d23bc	Use more compiler and linker flags from pkg-config This fixes some build issues with Homebrew on MacOS. Signed-off-by: Stefan Weil <stefan@Sabines-Mac-mini.fritz.box>	2020-12-13 13:24:46 +01:00
Stefan Weil	490bd3ec8f	Fix build with enabled TensorFlow Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-12-04 10:56:23 +01:00
Stefan Weil	ac116d1b28	Fix regression in Network::Serialize (fix issue #3167) The regression was caused by a wrong string serialization in commit `4613738a5e`. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-12-03 19:36:58 +01:00
zdenop	279b0b2e37	Merge pull request #3160 from stweil/string2 Replace more occurrences of STRING by std::string or char*	2020-11-27 18:24:17 +01:00
Stefan Weil	65b11a1e12	Pack class SVMenuNode Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-11-26 17:17:27 +01:00
Stefan Weil	a1849bc65c	Pack struct CLASS_STRUCT Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-11-26 17:17:27 +01:00
Stefan Weil	0bb46ac2e0	Pack struct BlamerBundle Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-11-26 17:17:27 +01:00
Stefan Weil	bf3774cc91	Use more const char* Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-11-26 17:01:17 +01:00
Stefan Weil	4613738a5e	Use const char* for filename and network_spec parameters This replaces the proprietary STRING data type (764 instead of 838 lines remaining). It also removes STRING from osdetect.h and serialis.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-11-26 17:01:17 +01:00
Stefan Weil	fbc4c809d9	Replace STRING by std::string Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-10-31 14:08:39 +01:00
Stefan Weil	92b6c652f3	Use std::vector for scales_ Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-10-29 08:00:11 +01:00
Stefan Weil	c15dd26b84	Don't pass scales_ to IntSimdMatrix::Init Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-10-28 20:35:53 +01:00
Stefan Weil	fe76142a3d	Remove GenericVector::scale() again Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-10-28 16:24:59 +01:00
Stefan Weil	eaf72ace31	Prefer result from inverted image if the mean confidence is better Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-10-26 20:37:47 +01:00
Stefan Weil	cfb1fb2540	Try OCR on inverted line only if mean confidence is below 50 % The old code looked for the minimum confidence which triggered very often a 2nd OCR without improving the result. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-10-26 09:32:09 +01:00
Robin Watts	436008bd37	Tweak SIMDDetect for ANDROID Neon. cpufeatures.h should be cpu-features.h, with the latest NDK at least. The #if 0'd section is not required because armv8 always includes NEON.	2020-10-19 12:04:29 +01:00
Robin Watts	db10c7b577	intsimdmatrixneon.cpp: Do biasing in SIMD.	2020-10-12 04:30:46 -07:00
Robin Watts	d1e49d6dd2	intsimdmatrixavx2: Do biasing in SIMD. We also move to relying on both scales and output having been padded to accommodate us writing more results than are actually needed here. This was allowed for a few commits back.	2020-10-12 04:30:46 -07:00
Robin Watts	872816897a	Rejig intsimdmatrix to reduce FP ops. Avoid 1) floating point division by 127, 2) conversion of bias to double, 3) FP addition, in favour of 1) integer multiplication by 127, and 2) integer addition. (Also costs extra work in the serialisation/deserialisation of the scale values, and conversion of weights to int formats, but these are all one offs).	2020-10-12 04:30:46 -07:00
Robin Watts	aba1800f69	Round output buffers for intSimdMatrix. In order to allow intSimdMatrix implementations to 'overwrite' their outputs, ensure that the output buffers are always padded to the next block size. This doesn't make any difference yet, but it enables optimisations further down the line, especially when the biasing is pulled into the SIMD.	2020-10-12 11:47:16 +01:00
Robin Watts	9dfdac51c6	Tweak scales array for intSimdMatrix case. Currently, the size of the scales array is not rounded up in the same way as the weights are. This blocks us pushing the scale calculations into the SIMD, as when we "overread" the end of the scale array, we potentially get errors. Here, we adjust the intSimdMatrix stuff to ensure that the scales array reserves enough entries to allow such overreads to work. This doesn't make any difference for now, but opens the way for future optimisations.	2020-10-12 11:47:16 +01:00
amitdo	958f23453e	Improve disabled legacy engine build	2020-10-12 11:47:16 +01:00
Stefan Weil	ac14ab32c6	Remove dummy functions from globaloc.cpp and related code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-10-04 12:24:26 +02:00
Stefan Weil	7c4ef88dab	Remove unused functions FontUtils::GetAllRenderableCharacters They used the function pango_coverage_max which does nothing and which has been deprecated since pango version 1.44. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-10-03 12:04:40 +02:00
Le Duc Nam	eb8f1674bf	Correct "NoImages" in debug pdf file Issue: The debug information for "NoImages" was just the binary image; it did not show the result of photo_mask_pix to the developer. Fix: Subtract photo_mask_pix from the binary image; the result is the "NoImages" binary pix	2020-09-06 23:31:30 +07:00
Robert Sachunsky	640c14e080	AutoPageSeg/FindBlocks/GridRemoveUnderlinePartitions: avoid self-deletion When checking horizontal line partitions for possible interpretation as underline formatting, avoid confusing the hline partition itself with an overlapping neighbour (which would delete it).	2020-08-24 19:13:48 +02:00
Robert Sachunsky	65a077d3e9	FindAndRemoveLines/FindVerticalAlignment: decrease fixed vline min length When detecting vertical separators, the blob aligner is used to glue line segments (often segmented due to artificial cracks). But (unlike LineFinder) it has many parameters that are not relative to pixel density/resolution. This change decreases the minimum absolute length in pixels for vertical separators.	2020-08-24 19:13:36 +02:00
Robert Sachunsky	0228d93684	textord debugging: invert default top/bottom boundaries, improve description	2020-08-24 19:13:25 +02:00
Stefan Weil	d33edbc4b1	Merge pull request #3066 from robinwatts/pushback14 Remove unused char constant that causes a warning.	2020-07-17 15:55:51 +02:00
Robin Watts	578462109b	Remove unused char constant that causes a warning. The kDictWildcard is never actually used, so removing it makes no difference. It causes warnings in MSVC builds as MSVC doesn't know how to pack a unicode value into chars.	2020-07-17 14:22:37 +01:00
Robin Watts	150e2e54fe	Squash some warnings in MSVC build. In particular, "defined but not used" (caused by GRAPHICS_DISABLED), double constants being truncated to floats, and implicit casts.	2020-07-16 10:08:40 +01:00
zdenop	7fa200bfb7	Merge pull request #3064 from robinwatts/pushback12 Fix Memory leak when using TESSERACT_IMAGEDATA_AS_PIX	2020-07-15 19:08:58 +02:00
Robin Watts	7f45b719d1	Fix Memory leak when using TESSERACT_IMAGEDATA_AS_PIX If building with TESSERACT_IMAGEDATA_AS_PIX, then tesseract doesn't compress/decompress images, but rather holds the data as internal Pix structures. Unfortunately, I forgot to make the ImageData destructor free these, so memory leaked during use. Fixed here.	2020-07-15 12:35:35 +01:00
zdenop	135c8a49b5	Merge pull request #3061 from stweil/neon Always use NEON by default for ARMv8	2020-07-11 09:11:54 +02:00
zdenop	875bd48bd5	Merge pull request #3058 from stweil/scrollview Disable more code and data with GRAPHICS_DISABLED	2020-07-11 09:11:27 +02:00
Stefan Weil	548a832b98	Use strtok_s for MSVC in class SVNetwork strtok_s can be used with MSVC as a replacement for strtok_r, so less special handling is needed in the code and class SVNetwork can be made smaller by removing member has_content. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-07-10 17:47:05 +02:00
Stefan Weil	2db2223b39	Always use NEON by default for ARMv8 Signed-off-by: Stefan Weil <stefan.weil@bib.uni-mannheim.de>	2020-07-10 15:27:09 +02:00
Stefan Weil	cb3880fb15	Disable more code and data with GRAPHICS_DISABLED Some runtime parameters which are only relevant with graphics enabled are now removed from builds with graphics disabled. TableFinder::DisplayColSegmentGrid is never used, so remove it completely. Builds with --disable-graphics significantly reduce the code size and avoid some function calls which might be important for certain applications (columns: text data bss dec hex filename): 3219230 41136 13920 3274286 31f62e .libs/libtesseract.so (--disable-graphics, old); 3211347 40976 13600 3265923 31d583 .libs/libtesseract.so (--disable-graphics, new); 3360942 43656 15392 3419990 342f56 .libs/libtesseract.so (default) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-07-09 11:23:33 +02:00
Stefan Weil	22e6c2e5a7	Fix division by 0.0 in BaselineRow::PerpDistanceFromBaseline It was reported by oss-fuzz (issue 23962). Add log output to find real images which trigger that issue. Avoid also some conversions from float to double by always using float. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-07-08 17:59:02 +02:00
Stefan Weil	8137cf35a6	Use const char* for filename parameters This replaces the proprietary STRING data type (801 instead of 838 lines remaining). It also removes STRING from osdetect.h and serialis.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-07-07 14:20:09 +02:00
Stefan Weil	51dff483e7	Fix runtime error caused by too large TBOX Runtime error reported by sanitizer: src/ccstruct/rect.h:191:44: runtime error: 50961 is outside the range of representable values of type 'short' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior src/ccstruct/rect.h:191:44 in Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-30 20:51:52 +02:00
Stefan Weil	2269a500ef	Fix runtime error with null pointer argument Runtime error reported by sanitizer: src/ccstruct/coutln.cpp:1018:19: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:48:14: note: nonnull attribute specified here SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior src/ccstruct/coutln.cpp:1018:19 in Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-29 19:13:39 +02:00
Stefan Weil	411ffa90c6	Fix unsigned integer overflow Runtime errors reported by sanitizer: src/textord/pithsync.cpp:75:31: runtime error: unsigned integer overflow: 2147483648 + 2147483648 cannot be represented in type 'unsigned int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior src/textord/pithsync.cpp:75:31 in src/textord/pithsync.cpp:75:43: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'unsigned int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior src/textord/pithsync.cpp:75:43 in src/textord/pithsync.cpp:125:29: runtime error: unsigned integer overflow: 2147483648 + 2147483648 cannot be represented in type 'unsigned int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior src/textord/pithsync.cpp:125:29 in src/textord/pithsync.cpp:125:41: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'unsigned int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior src/textord/pithsync.cpp:125:41 in Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-29 19:13:39 +02:00
Stefan Weil	7c046c121f	Fix out of bounds array access Runtime error with enabled sanitizer: src/textord/colpartition.cpp:2243:66: runtime error: index -1 out of bounds for type 'tesseract::ColPartition *[6]' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior src/textord/colpartition.cpp:2243:66 in Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-29 16:10:37 +02:00
zdenop	4ef709554b	Update imagedata.cpp stop PreScale if pixScale failed (fixes #3025)	2020-06-25 20:32:51 +02:00
amitdo	efae270dea	Disabled legacy build: Disable more unused code	2020-06-24 22:02:52 +03:00
Stefan Weil	ca0a6c9d37	Merge pull request #3035 from stweil/overflow Avoid buffer overflow (issue #444)	2020-06-24 18:46:47 +02:00
Stefan Weil	2cb5bc7690	Improve debug message in ColPartition::ComputeLimits Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-23 22:52:45 +02:00
Stefan Weil	cfabdfe0af	Avoid buffer overflow (issue #444) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-22 22:19:58 +02:00
Stefan Weil	62b085cb8d	ScrollView: Remove C API callcpp.{cpp,h} Use C++ class ScrollView directly instead of using an intermediate C API. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-22 09:14:26 +02:00
Stefan Weil	b2cc00d97f	Replace cprintf by tprintf and remove cprintf cprintf was an indirect way to call tprintf. This indirection is not needed, so remove it and use tprintf directly. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-21 19:07:09 +02:00
Stefan Weil	ea1f597fc1	Fix insecure call of tprintf Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-21 19:03:03 +02:00
Stefan Weil	4a10bb68c7	Fix conversion of images with 16 bpp or 24 bpp to grey The old code used pixConvertRGBToLuminance which only converts 32 bpp images. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-21 09:09:49 +02:00
Stefan Weil	6f6100ff9f	Classify: Run sort only for more than one element This fixes calls of qsort with a nullptr argument (reported by sanitizers). Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-20 21:43:22 +02:00
Stefan Weil	d4cf77c92b	Don't check for limits.h (now unused) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-20 10:39:13 +02:00
Matej Knopp	e900252c1a	Fix CMake build with DISABLED_LEGACY_ENGINE	2020-06-17 19:42:49 +02:00
Stefan Weil	d6ca7a5298	ScrollView: Fix typo in comment Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-17 16:26:41 +02:00
Stefan Weil	380466e0d3	Allow inlining of function TruncateParam It is only used locally in intproto.cpp, so defining it before the first use and adding the static attribute allows the compiler to inline it. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 20:17:41 +02:00
Stefan Weil	93cfffeb87	Remove unused argument from function TruncateParam Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 20:17:41 +02:00
Stefan Weil	f08b16a5a0	Remove assertion which is triggered by tests oss-fuzz issue 15149 triggers this assertion. See test case here: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=15149 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 20:17:26 +02:00
Stefan Weil	18d9983f69	StrokeWidth: Remove unused local variable (fixes compiler warning) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 20:17:09 +02:00
Stefan Weil	bc61038dd4	SPLIT: Make function bounding_box inline for better performance Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 17:21:36 +02:00
Stefan Weil	0e7701bc3c	SEAM: More inline functions for better performance Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 17:20:14 +02:00
Stefan Weil	e45100ebf7	TBOX: Use inline constructor for better performance Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 17:17:55 +02:00
Stefan Weil	c110958ffa	Fix undefined shift with negative value (oss-fuzz issue 14658) This fixes a bug reported by OSS Fuzz: https://oss-fuzz.com/issue/5697280134348800 The old code passed a negative value (-1) as argument to step_dir when destindex was 0. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-16 13:25:32 +02:00
Stefan Weil	6ee3698958	Remove old unused code from imagedata.h Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-14 16:02:27 +02:00
Stefan Weil	d8500adcf4	Fix crash caused by missing thread synchronization (issues #757, #1168 and #2191) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-06-14 15:53:17 +02:00
Robin Watts	6fec69de1a	Fix intsimdmatrixneon.cpp stack corruption. The intsimdmatrix mechanism ensures that inputs would be resized so that we'd only ever get "whole blocks" of data. I'd assumed that that meant the same thing for scales/outputs too, but this appears not to be the case, as we can get called (sometimes) with num_out % 8 == 7. Possibly we could benefit from resizing those matrices so that special cases in this innermost loop are not actually required, but unless and until that is done, let's fix the inner loop.	2020-05-27 13:40:17 +01:00
Stefan Weil	a06d0d8449	Add missing include statements for config_auto.h They are required to get the macro DISABLED_LEGACY_ENGINE. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-22 16:34:28 +02:00
Stefan Weil	6732eb9eb5	Clean code for NEON support Include it only for NEON and remove unneeded code. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-21 07:03:37 +02:00
Robin Watts	f79e52a7cc	NEON SIMD code. In tests on my pi3b+, a release build of my ghostscript integration takes 2 minutes 27 seconds to render a PDF and OCR it with the vanilla sources. With this NEON code added, the time drops to 37 seconds. I have not tested the configure/Makefile changes as I'm not using them.	2020-05-20 18:54:42 +01:00
zdenop	b5d639dcc5	Merge pull request #2965 from robinwatts/pushback1 thanks.	2020-05-16 20:35:19 +02:00
zdenop	064b4403de	Merge pull request #2966 from robinwatts/pushback2	2020-05-16 20:06:31 +02:00
Robin Watts	3408c36eab	Guard #include "config_auto.h" with HAVE_CONFIG_H. Every other file already does this.	2020-05-15 19:29:03 +01:00
Robin Watts	43437a540b	Fix OEM_DEFAULT in DISABLED_LEGACY_ENGINE builds. If api->Init is called with OEM_DEFAULT in DISABLED_LEGACY_ENGINE build modes, the engine mode is never set, resulting in no words being found.	2020-05-15 14:56:41 +01:00
Julian Gilbey	e7e6999d3b	Move comment about swap meaning for DeSerialize to correct function	2020-05-13 07:02:59 +01:00
Robin Watts	27d513462c	Avoid using PACKAGE_VERSION in favour of TESSERACT_VERSION_STR. This means the sources compile perfectly in the absence of config_auto.h/HAVE_CONFIG_H as they were intended to do. TESSERACT_VERSION_STR is set to be precisely PACKAGE_VERSION by autoconf, so there are no actual changes in compiled code.	2020-05-12 21:45:12 +02:00
Stefan Weil	39f7fb4a1a	Allow line images with larger width (depending on height) Training with normalized line images higher than 36 px also results in larger widths. The limit should therefore depend on the height used for the normalization. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-12 16:59:31 +02:00
Stefan Weil	34bdc8b74e	Allow line images with larger width Line images can be larger than the old limit, especially when training is made with newspaper lines. Image too large to learn!! Size = 2641x36 Image too large to learn!! Size = 2704x36 Image too large to learn!! Size = 2751x36 Image too large to learn!! Size = 3738x36 Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-05-12 16:50:40 +02:00
Julian Gilbey	ca5735efcb	Destroy box before potentially exiting function	2020-05-12 15:25:16 +01:00
Stefan Weil	d3a0768c32	Merge pull request #2975 from robinwatts/pushback5 Tweak architecture specific SIMD files for ease of compilation	2020-05-12 14:55:32 +02:00
Robin Watts	a9b44ee8c2	Tweak architecture specific SIMD files for ease of compilation. This won't affect anything using the supplied build system. For other projects that include tesseract within them, however, this may make their life easier. For example, I have an integration of Tesseract with Ghostscript, in which tesseract is built as part of the Ghostscript build, without using the tesseract build system. The Ghostscript build system is makefile based, and has to work on a range of make systems, including unix make, gnu make and nmake. As such we have to avoid conditionals in the common makefiles. It therefore becomes hard to build one set of files on x86 systems, and another on (say) ARM systems. Accordingly, this commit makes small tweaks to the architecture specific files, so that they compile on EVERY platform; just they only compile to anything useful on the appropriate platform. Thus the makefiles can build all the files on all the systems, and the preprocessor flags mean that the correct functions are actually built.	2020-05-12 13:09:29 +01:00
Egor Pugin	0eaabc42c7	Update CMakeLists.txt	2020-05-12 11:49:15 +03:00
Egor Pugin	e720a26745	[cmake] Set inactivity timeout during icu download to 300 seconds. Fixes #2972.	2020-05-09 18:55:45 +03:00
Robin Watts	80d4af6ecf	Add a mechanism to avoid creating debug fonts. If TESSERACT_DISABLE_DEBUG_FONTS is defined, tesseract doesn't attempt to create any debug fonts. This not only saves memory, but it (combined with the change to optionally use Pix as internal storage for the ImageData) allows us to use an embedded Leptonica library with no format handlers at all.	2020-05-05 00:22:23 +01:00
Robin Watts	6bcb941bcf	Avoid tesseract writing Pix out/reading them back. By default, when we call ImageData::SetPix, we write the data out as a PNG, just to read it back in to get a compressed buffer of data. We then use this to generate a new Pix. In builds of Tesseract on systems where we don't have temp files, writing files out is problematic. Not only that, but compressing/uncompressing is slow, and on minimal builds of leptonica, where we've disabled the format writers to reduce memory footprint, we get no compression anyway. In such cases, it'd be far nicer just to keep the original Pix as the internal data. Also, when recovering the pixmap from the ImageData, if we know we're only going to read from the data, we can avoid duplicating it and just use the original. This is exactly the case when GRAPHICS_DISABLED is set. So, introduce a TESSERACT_IMAGEDATA_AS_PIX predefine that we can use to cause the internal data to be a Pix rather than a compressed buffer. Given we don't do compression, and they were writing to memory, this was all just more effort than we needed. Also, if we're using GRAPHICS_DISABLED, we might as well just pixCopy rather than pixClone as only the scaler uses this.	2020-05-04 21:01:22 +01:00
Amit D	acc4c8bff5	Merge pull request #2952 from jannick0/patch-1 [trie.h] pattern definition: fix documentation	2020-04-27 23:44:48 +03:00
Stefan Weil	1188e0a516	Remove old code which was used for Ocropus Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-04-27 16:33:34 +02:00
jannick0	e044163085	[trie.h] pattern definition: fix documentation The fix makes the definition of `\n` consistent with the examples given below the definition. Please note that I did not check this against how it is implemented in the code.	2020-04-19 13:47:42 +02:00
Stefan Weil	4a00b68c63	Fix lambda function for curl code errors Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-03-18 20:46:52 +01:00
Stefan Weil	9f5a3f6ac7	Fix uninitialized local variable in curl code Compiler warning: src/api/baseapi.cpp:1151:27: warning: variable 'curlcode' is uninitialized when used here [-Wuninitialized] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-03-18 19:25:33 +01:00
zdenop	6e307074d8	Merge pull request #2894 from stweil/curl Report errors from curl_easy functions	2020-03-18 14:14:07 +01:00
Stefan Weil	ef4f99a994	Run xgetbv instruction only on machines which support it This fixes a regression for older Intel processors. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-03-08 17:32:10 +01:00
Stefan Weil	eff4dc0603	Use lambda expressions for reporting curl errors Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-02-23 22:44:42 +01:00
Stefan Weil	9972c91127	Report errors from curl_easy functions Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-02-23 22:26:51 +01:00
Stefan Weil	57ff90687d	simd: Check whether the OS supports FMA, AVX, ... The previous check was only for the MS compiler, but not for gcc and clang. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-02-23 16:34:35 +01:00
zdenop	7c3ac569f9	Replace references to the old wiki by new URLs (#2877) Replace references to the old wiki by new URLs	2020-02-03 14:59:18 +01:00
Stefan Weil	16553014e0	Replace references to the old wiki by new URLs Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-02-03 11:37:41 +01:00
Stefan Weil	20bcbc4058	Catch std::runtime_error exception when setting the locale in debug code Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-02-03 07:58:43 +01:00
Robert Sachunsky	cdc8e44a20	ChoiceIterator: skip symbol without choices	2020-01-24 09:19:14 +01:00
jkang-eng	60248f59d4	Fix "tesseract.exe not flushing stdout/stderr" (Issue #2859) (#2865) * Issue #2859 - Fix "tesseract.exe not flushing stdout/stderr"	2020-01-21 21:51:08 +01:00
Stefan Weil	6f2f310fdf	Remove redundant method from class GenericVector length() is not needed: it can be replaced by size(). Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-01-18 11:30:14 +01:00
Stefan Weil	3d1f82d0e2	tesstrain.sh: Fix command line flag --help Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-01-05 10:10:55 +01:00
Stefan Weil	cfd39dc2c7	pageres: Fix compiler warnings clang warnings: src/ccstruct/pageres.cpp:903:20: warning: implicit conversion from 'int' to 'float' changes value from 2147483647 to 2147483648 [-Wimplicit-int-float-conversion] src/ccstruct/pageres.cpp:904:23: warning: implicit conversion from 'int' to 'float' changes value from -2147483647 to -2147483648 [-Wimplicit-int-float-conversion] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-01-04 09:46:10 +01:00
Stefan Weil	d2a2292f32	mftraining: Fix compiler warning powerpc64le-linux-gnu-g++ warning: src/training/mftraining.cpp:209:5: warning: ‘%04d’ directive output may be truncated writing between 4 and 10 bytes into a region of size 8 [-Wformat-truncation=] Signed-off-by: Stefan Weil <sw@weilnetz.de>	2020-01-03 10:13:58 +01:00
zdenop	79f191fe20	Merge pull request #2826 from bertsky/clip-blockpolygon make BlockPolygon usable	2019-12-19 09:14:25 +01:00
Robert Sachunsky	4b0c9f3373	BlockPolygon: clip to image rectangle	2019-12-18 13:29:43 +01:00
Robert Sachunsky	5751a408c9	BlockPolygon: unrotate from internal to image coordinates	2019-12-18 13:29:43 +01:00
amitdo	502ebe8ca9	Autotools: Pango, Cairo and ICU only required by training tools	2019-12-16 17:23:06 +02:00
Stefan Weil	fc84f84b5b	Remove Emacs C modeline in comment line 1 Those files are C++, and the wrong modeline is not needed at all. Remove also some empty descriptions and old history in the comments. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-12-05 13:57:50 +01:00
Stefan Weil	420cbac876	Clean public API for renderers - Remove unused type definitions for TessTextRenderer, ... in capi.h (they were only used in capi.cpp which now no longer needs them) - Fix typo in comment Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-12-03 12:23:58 +01:00
Stefan Weil	56df8e6e19	Fix some typos in comments (most of them found by codespell) Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-12-02 14:30:13 +01:00
Stefan Weil	a1a139cbd2	Replace AVX_OPT, ..., AVX macros by HAVE_AVX, ... and clean related code - Replace AVX_OPT, AVX2_OPT, FMA_OPT, SSE41_OPT - Replace AVX, AVX2, FMA, SSE4_1 - Write new HAVE_AVX, HAVE_AVX2, HAVE_FMA, HAVE_SSE4_1 into config_auto.h - Put related conditionals in Makefile.am in one place This makes the code clearer and fixes a log message in IntSimdMatrixTest.AVX2. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-28 17:51:37 +01:00
Stefan Weil	074844ce46	Show libcurl version `tesseract --version` now also shows the version of libcurl and related libraries if it was built with libcurl. The preprocessor macro HAVE_LIBCURL is now defined in config_auto.h. Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-28 16:34:52 +01:00
Stefan Weil	cbd3a21cb2	automake: Flat build for src/viewer and src/wordrec Signed-off-by: Stefan Weil <sw@weilnetz.de>	2019-11-26 16:20:46 +01:00
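Commits cfb1fb2540 and eaf72ace31 describe a heuristic for inverted text: run OCR on the inverted line only if the mean confidence is below 50 %, and prefer the inverted result if its mean confidence is better. The sketch below shows only that control flow. LineResult, RecognizeWithInversionFallback and the two callbacks are hypothetical stand-ins; Tesseract's real code works on its own word and line structures.

```cpp
#include <functional>

// Hypothetical result type (not a Tesseract class).
struct LineResult {
  double mean_confidence;  // assumed to be on a 0..100 scale
  bool inverted;           // true if the inverted image won
};

// Recognize a line normally; only when the mean confidence is below the
// threshold, try the inverted image and keep whichever scores higher.
LineResult RecognizeWithInversionFallback(
    const std::function<double()>& ocr_normal,
    const std::function<double()>& ocr_inverted,
    double threshold = 50.0) {
  LineResult best{ocr_normal(), false};
  if (best.mean_confidence < threshold) {
    double inverted_confidence = ocr_inverted();  // the costly second pass
    if (inverted_confidence > best.mean_confidence) {
      best = LineResult{inverted_confidence, true};
    }
  }
  return best;
}
```

Comparing mean confidence rather than the minimum, as the second commit explains, avoids triggering the second OCR pass for lines where a single weak character drags the minimum down.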
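Commit 872816897a trades a floating point division by 127 and a double conversion of the bias for an integer multiplication by 127 and an integer addition. The identity behind this is (t / 127 + b) * s = (t + 127 * b) * (s / 127), so the division can be folded into the scale once. A minimal sketch, assuming a per-output int32 dot product `total`, an int8 bias weight and a per-output scale; all function names are hypothetical and this is not Tesseract's actual IntSimdMatrix code.

```cpp
#include <cmath>
#include <cstdint>

// "Before": per output, convert to double, divide by 127, add the bias as a
// double, then apply the scale.
double OutputBefore(int32_t total, int8_t bias, double scale) {
  return (static_cast<double>(total) / 127.0 + bias) * scale;
}

// One-off step (e.g. when the scales are loaded): fold the 1/127 into the scale.
double PrescaleForInt(double scale) { return scale / 127.0; }

// "After": integer multiply-add for the bias, then a single FP multiply.
double OutputAfter(int32_t total, int8_t bias, double prescaled) {
  int32_t corrected = total + static_cast<int32_t>(bias) * 127;
  return corrected * prescaled;
}

// Helper for comparing the two forms, which agree up to rounding.
bool NearlyEqual(double a, double b, double eps = 1e-9) {
  return std::fabs(a - b) <= eps;
}
```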
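Commits aba1800f69 and 9dfdac51c6 pad the output buffers and the scales array up to the next block size, so that a SIMD kernel may read or write a whole final block even when only part of it is meaningful (commit 6fec69de1a mentions being called with num_out % 8 == 7). A sketch of the rounding, assuming a hypothetical block size of 8; the real block size depends on the IntSimdMatrix implementation, and MakePaddedOutput is an illustrative helper, not a Tesseract function.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical block size; each real SIMD implementation has its own.
constexpr std::size_t kBlockSize = 8;

// Round n up to the next multiple of block (block must be > 0).
std::size_t RoundUpToBlock(std::size_t n, std::size_t block = kBlockSize) {
  return (n + block - 1) / block * block;
}

// Allocate a zero-filled buffer with room for a whole final block, so that
// "over-writes" past num_out land in padding instead of foreign memory.
std::vector<float> MakePaddedOutput(std::size_t num_out) {
  return std::vector<float>(RoundUpToBlock(num_out), 0.0f);
}
```

The same rounding applied to the scales array makes "over-reads" past the last real scale harmless, which is what lets the biasing move into the SIMD code.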
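Commit 548a832b98 relies on MSVC's strtok_s(str, delimiters, context) having the same three-argument form as POSIX strtok_r. (The C11 Annex K strtok_s is different: it takes four arguments.) A minimal sketch of that aliasing pattern; SplitInPlace is a hypothetical helper, not code from class SVNetwork.

```cpp
#include <cstring>
#include <string>
#include <vector>

#ifdef _MSC_VER
// MSVC's strtok_s matches POSIX strtok_r in signature and behavior.
#define strtok_r strtok_s
#endif

// Split a mutable C string in place. Reentrant: the tokenizer state lives in
// `save` instead of a hidden static as with plain strtok.
std::vector<std::string> SplitInPlace(char* text, const char* delim) {
  std::vector<std::string> parts;
  char* save = nullptr;
  for (char* tok = strtok_r(text, delim, &save); tok != nullptr;
       tok = strtok_r(nullptr, delim, &save)) {
    parts.emplace_back(tok);
  }
  return parts;
}
```

Like strtok, both functions skip empty fields, so "a,b,,c" yields three tokens.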
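The sanitizer reports in commit 411ffa90c6 show 2147483648 + 2147483648 and 0 - 1 wrapping around in unsigned int. Unsigned wraparound is well defined in C++, but clang's -fsanitize=unsigned-integer-overflow check reports it because it is usually unintended. A sketch of portable pre-checks, assuming a 32-bit unsigned int; the helper names are hypothetical and this is not the change made in pithsync.cpp.

```cpp
#include <climits>

// True if a + b would wrap around in unsigned int arithmetic.
bool AddWraps(unsigned a, unsigned b) { return a > UINT_MAX - b; }

// True if a - b would wrap around, i.e. the result would be "negative".
bool SubWraps(unsigned a, unsigned b) { return b > a; }
```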
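Commit 6f6100ff9f runs the sort only for more than one element, which avoids calling qsort with a nullptr argument. glibc's header declares qsort's base pointer nonnull, and the data pointer of an empty container may be null, which is what the sanitizer flags. A minimal sketch of the guard; CompareInts and SortIfNeeded are hypothetical names, not code from the Classify module.

```cpp
#include <cstddef>
#include <cstdlib>

// Ascending comparison that avoids the overflow risk of returning x - y.
int CompareInts(const void* a, const void* b) {
  int x = *static_cast<const int*>(a);
  int y = *static_cast<const int*>(b);
  return (x > y) - (x < y);
}

// Zero or one element is already sorted, so skip qsort entirely; this also
// keeps a possibly null `data` pointer away from qsort.
void SortIfNeeded(int* data, std::size_t count) {
  if (count > 1) {
    std::qsort(data, count, sizeof(int), CompareInts);
  }
}
```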
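The clang warning quoted in commit cfd39dc2c7 (-Wimplicit-int-float-conversion) arises because float has a 24-bit significand: 2147483647 (2^31 - 1) is not representable and rounds to 2^31 = 2147483648, and -2147483647 likewise becomes -2147483648. The snippet below only demonstrates why the value changes; it is not the fix that went into pageres.cpp.

```cpp
#include <cstdint>
#include <limits>

// Rounds to 2^31 because float cannot hold 2^31 - 1 exactly. An explicit
// static_cast like this does not trigger the implicit-conversion warning.
constexpr float kMaxAsFloat =
    static_cast<float>(std::numeric_limits<int32_t>::max());

// double has a 53-bit significand, so every int32_t value is exact.
constexpr double kMaxAsDouble =
    static_cast<double>(std::numeric_limits<int32_t>::max());
```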


1395 Commits