Commit Graph

1395 Commits

Author SHA1 Message Date
Egor Pugin
cad8cb31bb Add missing includes. 2020-12-31 17:58:36 +03:00
Egor Pugin
65e230f1a2 Fix linux build. 2020-12-31 17:46:49 +03:00
Egor Pugin
a4daf19dd3 Merge branch 'master' of github.com-egorpugin:tesseract-ocr/tesseract 2020-12-31 17:37:37 +03:00
Stefan Weil
96fbe776ea Partially revert cad0eb4d26 (fix layout_test)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-31 15:36:28 +01:00
Egor Pugin
a32c8b2d93 Remove GenericVector::compare_callback. This fixes several tests after the previous commit. 2020-12-31 17:26:40 +03:00
Egor Pugin
c86325e2f7 Use TESS_API for every public symbol. A public symbol is one that is exported from the library. This also applies to unit test and training symbols. Users will be limited to the public API, but the set of exported symbols will still be wider.
Remove TESS_LOCAL.
Fix several symbol issues that were made visible by these changes.

All build systems must set -fvisibility=hidden for *nix systems.
2020-12-31 16:32:29 +03:00
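The export rule described in this commit can be illustrated with a generic export macro. This is a minimal sketch, assuming GCC or Clang on *nix and MSVC on Windows; `MYLIB_API` and `MYLIB_BUILDING_DLL` are hypothetical names, not Tesseract's actual `TESS_API` definition:

```cpp
// Normally the library's own build passes this (e.g. -DMYLIB_BUILDING_DLL)
// so that its sources export rather than import. Defined here only to keep
// the file self-contained. Hypothetical name.
#define MYLIB_BUILDING_DLL

// Hypothetical export macro in the spirit of TESS_API (not Tesseract's code).
#if defined(_WIN32)
#  if defined(MYLIB_BUILDING_DLL)
#    define MYLIB_API __declspec(dllexport)
#  else
#    define MYLIB_API __declspec(dllimport)
#  endif
#elif defined(__GNUC__)
// GCC and Clang: give marked symbols default visibility, overriding
// -fvisibility=hidden for them only.
#  define MYLIB_API __attribute__((visibility("default")))
#else
#  define MYLIB_API
#endif

// Marked: exported from the shared library.
MYLIB_API int mylib_version() { return 5; }

// Unmarked: hidden when built with -fvisibility=hidden, so code outside
// the library cannot link against it.
int internal_helper() { return mylib_version() * 2; }
```

Building this file with `g++ -shared -fPIC -fvisibility=hidden` and listing the dynamic symbols (for example with `nm -D -C`) should show `mylib_version` but not `internal_helper`.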
Egor Pugin
4d817d09a5 Remove custom string hasher. 2020-12-31 14:26:23 +03:00
Egor Pugin
250fc0023e Misc. 2020-12-31 14:24:52 +03:00
Egor Pugin
3a66282e92 Remove GOOGLE_TESSERACT ifdefs. 2020-12-31 14:23:52 +03:00
Egor Pugin
d0a730e3d0 Misc. 2020-12-31 13:25:10 +03:00
Egor Pugin
c812d9d894 Use template instead of overloads. 2020-12-31 13:20:21 +03:00
Stefan Weil
cad0eb4d26 Replace more GenericVector by std::vector
This fixes two LGTM alerts and might improve the performance:

    This parameter of type GenericVector<STRING> is 80 bytes -
    consider passing a const pointer/reference instead.

    This parameter of type GenericVectorEqEq<const ParagraphMode*> is 80 bytes -
    consider passing a const pointer/reference instead.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-31 09:28:35 +01:00
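The LGTM alerts quoted in this and the following commits flag large objects passed by value. A generic sketch of the usual remedy, passing by const reference; the function names are hypothetical and not taken from the Tesseract sources:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// By value: every call copies the vector and each string it holds.
std::size_t CountNonEmptyByValue(std::vector<std::string> words) {
  std::size_t count = 0;
  for (const auto& w : words) {
    if (!w.empty()) ++count;
  }
  return count;
}

// By const reference: only a reference is passed, nothing is copied,
// and the callee cannot modify the caller's vector.
std::size_t CountNonEmpty(const std::vector<std::string>& words) {
  std::size_t count = 0;
  for (const auto& w : words) {
    if (!w.empty()) ++count;
  }
  return count;
}
```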
Stefan Weil
fc4002dda8 Remove helpers.h from public API
Remove also outdated references to apitypes.h which no longer exists.

Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-31 09:06:16 +01:00
Egor Pugin
dfbd394a72 Export all simd matrices. 2020-12-31 03:27:18 +03:00
Egor Pugin
2c054b531c Fix linux build. 2020-12-31 03:06:39 +03:00
Egor Pugin
4ddc919ed0 Correctly use DEBUG macro. C++ compilers do not define it. Instead they define NDEBUG in optimized compilations. 2020-12-31 02:50:07 +03:00
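The DEBUG/NDEBUG distinction can be sketched as follows. Strictly speaking, compilers do not predefine `NDEBUG` either; build configurations add it (CMake's Release configuration passes `-DNDEBUG`, for example). `CheckedDivide` is a hypothetical function, not Tesseract code:

```cpp
#include <cassert>
#include <cstdio>

// NDEBUG is the macro that <cassert> consults; there is no predefined
// DEBUG macro to test against.
int CheckedDivide(int numerator, int denominator) {
  // Compiled out entirely when NDEBUG is defined.
  assert(denominator != 0 && "denominator must be non-zero");
#ifndef NDEBUG
  // Debug-only trace, keyed on the same switch as assert().
  std::fprintf(stderr, "CheckedDivide(%d, %d)\n", numerator, denominator);
#endif
  return numerator / denominator;
}
```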
Egor Pugin
3af30419db Move MAX_PATH definition out of the public header. 2020-12-31 02:35:28 +03:00
Egor Pugin
a0509b2feb Use std::swap instead of manual function. 2020-12-31 02:17:54 +03:00
Egor Pugin
89273c915d Remove empty DLLSYM macro. 2020-12-31 02:10:46 +03:00
Stefan Weil
4366d811d4 Fix TFile::DeSerialize, TFile::Serialize for empty vectors
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-30 19:15:56 +01:00
Stefan Weil
30eeb7f01a Replace some old-style type casts
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-30 17:56:59 +01:00
Stefan Weil
faf0407dff Remove RecognizeForChopTest from public API
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-30 17:55:40 +01:00
Stefan Weil
588ac3fed2 Remove TessTruthCallback from public API
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-30 15:38:11 +01:00
Stefan Weil
ebafb19a43 Replace GenericVector<ParamsTrainingHypothesis> by std::vector<ParamsTrainingHypothesis>
This fixes an LGTM alert:

    This parameter of type ParamsTrainingHypothesis is 136 bytes -
    consider passing a const pointer/reference instead.

It might also improve the performance.

Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-30 13:26:44 +01:00
Stefan Weil
688ef20f62 Replace GenericVector<RowInfo> by std::vector<RowInfo>
This fixes an LGTM alert:

    This parameter of type RowInfo is 144 bytes -
    consider passing a const pointer/reference instead.

It might also improve the performance.

Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-30 12:14:43 +01:00
Stefan Weil
536a676250 Replace GenericVector<WordData> by std::vector<WordData>
This fixes an LGTM alert:

    This parameter of type WordData is 112 bytes -
    consider passing a const pointer/reference instead.

It might also improve the performance.

Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-30 12:14:43 +01:00
Stefan Weil
fbc807ce99 Remove unused local function CharCoverageMapToBitmap
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-30 12:14:43 +01:00
Stefan Weil
83d97ffc80 Remove redundant comparison
This fixes an LGTM alert:

    Comparison is always true because i >= 2.

Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-30 12:14:43 +01:00
Stefan Weil
f3acab507d Fix arguments for tprintf
This fixes two LGTM alerts:

    This argument should be of type 'int' but is of type '_Bit_reference'

Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-30 12:14:43 +01:00
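The `_Bit_reference` alert arises because `operator[]` on a non-const `std::vector<bool>` returns a proxy object rather than a `bool`. A minimal sketch of the fix, assuming `std::snprintf` as a stand-in for Tesseract's printf-like `tprintf`; `FormatFlag` is a hypothetical helper. With a const vector, `operator[]` returns a plain `bool` and the problem does not arise.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Writes "index=value" into buf and returns the number of characters written.
int FormatFlag(char* buf, std::size_t size, std::vector<bool>& flags,
               std::size_t i) {
  // For a non-const vector<bool>, flags[i] is std::vector<bool>::reference
  // (std::_Bit_reference in libstdc++), not a bool. Passing it directly to a
  // variadic function for "%d" passes the proxy object itself; converting it
  // to int first passes what the format string promises.
  return std::snprintf(buf, size, "%zu=%d", i, static_cast<int>(flags[i]));
}
```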
Stefan Weil
53503b34be Fix declaration for C_BLOB
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-30 11:33:29 +01:00
Stefan Weil
7866677a0c avx2: Remove unused local variables
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-30 11:33:29 +01:00
Stefan Weil
96e3b52936 Remove unused function CompareSTRING
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-30 11:33:29 +01:00
Stefan Weil
2cf70d6164 Replace more GenericVector by std::vector
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-30 10:51:12 +01:00
Stefan Weil
3a34f17037 Order and clean include statements
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-30 10:50:39 +01:00
Stefan Weil
3603c740e7 Fix ShapeTable::AddUnicharToResults (fix mastertrainer_test)
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-30 07:10:29 +01:00
Stefan Weil
4c94d09047 Replace more GenericVector by std::vector
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-30 07:10:29 +01:00
Stefan Weil
deec8ef46f Replace std::list by std::vector
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-30 07:10:29 +01:00
Stefan Weil
4043204c2b Use old genericvector.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-30 07:10:29 +01:00
Egor Pugin
482824c109 Fix trie's word sort comparator. 2020-12-30 02:37:53 +03:00
Egor Pugin
37e760d9c2 [test] Fix unicharset. 21->18 failed tests remaining. 2020-12-30 02:11:58 +03:00
Stefan Weil
f4e380f64a Remove serialis.h from public API
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-29 11:28:50 +01:00
Stefan Weil
e2683e17fc Remove unused DocumentData::SaveToBuffer
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-29 10:43:00 +01:00
Egor Pugin
f190c85682 Update src/api/tesseractmain.cpp
Co-authored-by: Stefan Weil <sw@weilnetz.de>
2020-12-29 00:22:28 +03:00
Stefan Weil
c8be22f313 Fix nullptr assignment in TessBaseAPI
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-28 21:03:29 +01:00
Stefan Weil
90af3e7b5c Remove strngs.h from public API
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Stefan Weil
03884c370c Replace STRING by std::string in ResultIterator
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Stefan Weil
2369aa5604 Use std::vector, std::string in baseapi.h
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Stefan Weil
72663a9a81 Use std::vector, std::string in baseapi.h
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Stefan Weil
fec9c11c8c Use std::vector, std::string in baseapi.h
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Stefan Weil
64e902ddf7 Remove genericvector.h from public API
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Stefan Weil
f462389673 renderer for TessPDFRenderer
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Stefan Weil
d55e5f4803 Replace more GenericVector by std::vector
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Stefan Weil
4a28d33c58 Replace GenericVector by std::vector in strngs.h and more places
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Stefan Weil
3ddc88cccb Use std::vector in TessPDFRenderer
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Stefan Weil
7c679e777d Use std::vector for allowed_scripts
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Stefan Weil
32d53479ae Use std::vector for vars_vec, vars_values
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Stefan Weil
085f6b2572 Use std::list for paragraph models
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Stefan Weil
4ebba72919 Use std::vector for paragraph models
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Stefan Weil
524fc67165 Fix tesseract --list-langs
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Egor Pugin
986b57dd4e Export symbol for unit test. 2020-12-28 04:58:26 +03:00
Egor Pugin
3187f2ef08 Move doubleptr.h to unittests as it is used only there. 2020-12-28 02:32:27 +03:00
Egor Pugin
4175679da6 Revert kdpair, genericheap changes. 2020-12-28 02:31:45 +03:00
Stefan Weil
289a34a40a Add const attribute for pdf_ttf
That moves its data into the text segment and reduces the total size
slightly:

   text	   data	    bss	    dec	    hex	filename
  39788	    693	      0	  40481	   9e21	old/libtesseract_la-pdfrenderer.o
  40360	     88	      0	  40448	   9e00	new/libtesseract_la-pdfrenderer.o

Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-26 17:51:56 +01:00
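The effect of adding `const` to an embedded byte array can be sketched like this. The array below is a hypothetical placeholder, not the real `pdf_ttf` data. Without `const` the array is writable and goes into `.data`. With `const`, compilers typically place it in a read-only section (`.rodata`), which GNU `size` counts under "text", matching the numbers above.

```cpp
#include <cstddef>

// Hypothetical placeholder bytes standing in for embedded font data.
static const unsigned char kEmbeddedFont[] = {0x00, 0x01, 0x00, 0x00, 0x00};

std::size_t EmbeddedFontSize() { return sizeof(kEmbeddedFont); }
const unsigned char* EmbeddedFontData() { return kEmbeddedFont; }
```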
Stefan Weil
7dca63caf1 More fixes for namespace tesseract
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-26 17:41:53 +01:00
Stefan Weil
7188b160ae Fix build with --disable-graphics
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-26 17:36:24 +01:00
Egor Pugin
aecbf79791 Add missing merge_unicharsets training tool to cmake and sw build. 2020-12-26 15:57:22 +03:00
Stefan Weil
317ef988a0 Add missing namespace prefix for GlobalParams() (fix build for some unit tests)
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-26 13:44:43 +01:00
Stefan Weil
418064f639 Add missing namespace prefix (fix build for merge_unicharsets)
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-26 13:09:39 +01:00
Egor Pugin
c8b8d266d6 Fix some vector<bool> cases for MSVC. 2020-12-26 04:17:13 +03:00
Egor Pugin
6b22972bc2 Fix linux build. 2020-12-26 04:15:42 +03:00
Egor Pugin
c3e04abe1e Inherit STRING from std::string. 2020-12-26 03:48:35 +03:00
Egor Pugin
4fc467a922 Inherit GenericVector from std::vector. Inherit kdpairs from std::pair. Rewrite some move ctors to modern C++ style. 2020-12-26 03:23:09 +03:00
Egor Pugin
04d3cfcf2f Merge branch 'master' of github.com-egorpugin:tesseract-ocr/tesseract 2020-12-26 00:55:37 +03:00
Egor Pugin
79a86f2582 Move all tesseract symbols into tesseract namespace. Fix include order in many places. 2020-12-26 00:55:30 +03:00
zdenop
ceadc4ddb8 remove inline declaration 2020-12-25 16:28:00 +01:00
Egor Pugin
14d52a79ba Remove .rc files. No need to add them into dll/exe. 2020-12-25 18:06:35 +03:00
zdenop
044921267f embed pdf.ttf to tesseract library #2551 2020-12-25 13:20:36 +01:00
Stefan Weil
cc133aa394 Fix text for fonts_dir parameter
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 21:32:05 +01:00
Stefan Weil
34abba8698 Add terminating linefeed to fonts.conf
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 21:32:05 +01:00
Stefan Weil
17a64eef1e Simplify code for PangoFontInfo::HardInitFontConfig
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 21:32:05 +01:00
Stefan Weil
707ee70966 Use deprecated pango_fc_font_get_glyph for old Pango versions
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 12:02:37 +01:00
Stefan Weil
f759142c95 Remove buggy Windows implementation for getting glyph from font
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 09:07:09 +01:00
Stefan Weil
7669d36a37 Use HarfBuzz instead of deprecated pango_fc_font_get_glyph
This fixes the crash on MacOS with M1.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 09:03:05 +01:00
Stefan Weil
8c859a7329 Fix type cast from PangoFont to PangoFcFont
The original code crashes in pango_fc_font_get_glyph on MacOS with M1.

Replacing the type cast with the macro made for that conversion
gives at least an error message before crashing:

    (process:12546): GLib-GObject-WARNING **: 08:38:02.472: invalid cast from 'PangoCairoCoreTextFont' to 'PangoFcFont'
    zsh: segmentation fault  ./pango_font_info_test

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 08:45:11 +01:00
Stefan Weil
3efedabda3 automake: Flat build for src/training
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-19 15:25:21 +01:00
Stefan Weil
6fcf8d23bc Use more compiler and linker flags from pkg-config
This fixes some build issues with Homebrew on MacOS.

Signed-off-by: Stefan Weil <stefan@Sabines-Mac-mini.fritz.box>
2020-12-13 13:24:46 +01:00
Stefan Weil
490bd3ec8f Fix build with enabled TensorFlow
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-04 10:56:23 +01:00
Stefan Weil
ac116d1b28 Fix regression in Network::Serialize (fix issue #3167)
The regression was caused by a wrong string serialization in
commit 4613738a5e.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-03 19:36:58 +01:00
zdenop
279b0b2e37 Merge pull request #3160 from stweil/string2
Replace more occurrences of STRING by std::string or char*
2020-11-27 18:24:17 +01:00
Stefan Weil
65b11a1e12 Pack class SVMenuNode
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-11-26 17:17:27 +01:00
Stefan Weil
a1849bc65c Pack struct CLASS_STRUCT
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-11-26 17:17:27 +01:00
Stefan Weil
0bb46ac2e0 Pack struct BlamerBundle
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-11-26 17:17:27 +01:00
Stefan Weil
bf3774cc91 Use more const char*
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-11-26 17:01:17 +01:00
Stefan Weil
4613738a5e Use const char* for filename and network_spec parameters
This replaces the proprietary STRING data type
(764 instead of 838 lines remaining).

It also removes STRING from osdetect.h and serialis.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-11-26 17:01:17 +01:00
Stefan Weil
fbc4c809d9 Replace STRING by std::string
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-10-31 14:08:39 +01:00
Stefan Weil
92b6c652f3 Use std::vector for scales_
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-10-29 08:00:11 +01:00
Stefan Weil
c15dd26b84 Don't pass scales_ to IntSimdMatrix::Init
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-10-28 20:35:53 +01:00
Stefan Weil
fe76142a3d Remove GenericVector::scale() again
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-10-28 16:24:59 +01:00
Stefan Weil
eaf72ace31 Prefer result from inverted image if the mean confidence is better
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-10-26 20:37:47 +01:00
Stefan Weil
cfb1fb2540 Try OCR on inverted line only if mean confidence is below 50 %
The old code used the minimum confidence, which very often triggered
a second OCR pass without improving the result.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-10-26 09:32:09 +01:00
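The two inverted-image heuristics in commits cfb1fb2540 and eaf72ace31 can be summarized in a small sketch. All function names are hypothetical, confidences are assumed to be on a 0 to 100 scale, and only the 50 % threshold comes from the commit message. This is not Tesseract's actual implementation:

```cpp
#include <numeric>
#include <vector>

// Mean of per-word confidences; an empty line counts as 0.
double MeanConfidence(const std::vector<double>& confidences) {
  if (confidences.empty()) return 0.0;
  return std::accumulate(confidences.begin(), confidences.end(), 0.0) /
         static_cast<double>(confidences.size());
}

// Threshold from the commit message ("below 50 %").
constexpr double kInvertThreshold = 50.0;

// Try the inverted line only if the mean (not the minimum) confidence is low.
bool ShouldTryInverted(const std::vector<double>& normal) {
  return MeanConfidence(normal) < kInvertThreshold;
}

// Keep the inverted result only if its mean confidence is better.
bool PreferInverted(const std::vector<double>& normal,
                    const std::vector<double>& inverted) {
  return MeanConfidence(inverted) > MeanConfidence(normal);
}
```

With confidences {90, 30}, the minimum (30) would have triggered a second pass under the old rule, while the mean (60) does not.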