Commit Graph

356 Commits

Author SHA1 Message Date
Shree Devi Kumar
106b3d1ed0 No --psm 6 for lstm.train 2021-01-12 12:42:53 +01:00
Egor Pugin
8cb1c62259 More std::vector. 2021-01-07 15:13:59 +03:00
Egor Pugin
9710bc0465 More std::vector. 2021-01-07 13:57:57 +03:00
Stefan Weil
d000df7e00 Remove remaining parts of tessopt (fix autotools build)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-05 23:06:17 +01:00
Egor Pugin
8e947a98b5 Remove emalloc. Replace it with malloc. To be replaced with new later. 2021-01-06 00:30:52 +03:00
Egor Pugin
6e94564152 [training] More unique ptrs. 2021-01-05 17:03:26 +03:00
Egor Pugin
4415209fd6 Remove tessopt. This fixes mastertrainer test in shared build. 2021-01-05 17:00:27 +03:00
Egor Pugin
c946a5610c Remove unused header. 2021-01-05 16:45:24 +03:00
Egor Pugin
8950e49a5d Remove unused var. 2021-01-05 16:45:07 +03:00
Egor Pugin
fb98b9b2f5 Use unique_ptr. 2021-01-05 16:00:22 +03:00
Egor Pugin
aa80aa5de1 More std::vector. 2021-01-05 15:54:30 +03:00
Egor Pugin
ca514ad91e [test] Return early on error. 2021-01-05 15:37:43 +03:00
Egor Pugin
4ed601956e More std::vector. 2021-01-05 14:46:11 +03:00
Stefan Weil
bb6dbd2cd8 Fix autotoools build with --disable-legacy
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-03 20:28:30 +01:00
Egor Pugin
664a718a63 Rename platform.h to export.h. 2021-01-01 00:18:36 +03:00
Egor Pugin
2c84c4beb2 [cmake] Make pango include dirs public. 2020-12-31 20:47:34 +03:00
Egor Pugin
32cb90f114 [cmake] Make pango deps public. 2020-12-31 20:33:01 +03:00
Egor Pugin
0cdb718835 Remove deleted util.h header. 2020-12-31 20:16:20 +03:00
Egor Pugin
9e1e6305b2 [cmake] Fix build. 2020-12-31 19:56:55 +03:00
Egor Pugin
6306393c91 [cmake] Implement shared builds. 2020-12-31 19:32:03 +03:00
Egor Pugin
07a1533a01 Move training lib sources into their own dirs. 2020-12-31 18:27:03 +03:00
Egor Pugin
1a53ca099a [cmake] tessopt is a static library. 2020-12-31 18:26:33 +03:00
Egor Pugin
a32c8b2d93 Remove GenericVector::compare_callback. This fixes several tests after previous commit. 2020-12-31 17:26:40 +03:00
Egor Pugin
c86325e2f7 Use TESS_API for every public symbol. Public symbol is exported from the library. This also applies to unit test and training symbols. Users will be limited to public api, but set of exported symbols will be wider still.
Remove TESS_LOCAL.
Fix several symbol issues that made visible with these changes.

All build systems must set -fvisibility-hidden for *nix systems.
2020-12-31 16:32:29 +03:00
Egor Pugin
4d817d09a5 Remove custom string hasher. 2020-12-31 14:26:23 +03:00
Egor Pugin
250fc0023e Misc. 2020-12-31 14:24:52 +03:00
Egor Pugin
3a66282e92 Remove GOOGLE_TESSERACT ifdefs. 2020-12-31 14:23:52 +03:00
Stefan Weil
fc4002dda8 Remove helpers.h from public API
Remove also outdated references to apitypes.h which no longer exists.

Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-31 09:06:16 +01:00
Stefan Weil
fbc807ce99 Remove unused local function CharCoverageMapToBitmap
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-30 12:14:43 +01:00
Stefan Weil
2cf70d6164 Replace more GenericVector by std::vector
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-30 10:51:12 +01:00
Stefan Weil
3a34f17037 Order and clean include statements
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-30 10:50:39 +01:00
Stefan Weil
4c94d09047 Replace more GenericVector by std::vector
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-30 07:10:29 +01:00
Stefan Weil
4043204c2b Use old genericvector.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-30 07:10:29 +01:00
Stefan Weil
f4e380f64a Remove serialis.h from public API
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-29 11:28:50 +01:00
Stefan Weil
90af3e7b5c Remove strngs.h from public API
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Stefan Weil
fec9c11c8c Use std::vector, std::string in baseapi.h
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Stefan Weil
64e902ddf7 Remove genericvector.h from public API
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Stefan Weil
d55e5f4803 Replace more GenericVector by std::vector
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Stefan Weil
4a28d33c58 Replace GenericVector by std::vector in strngs.h and more places
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-28 21:03:29 +01:00
Egor Pugin
aecbf79791 Add missing merge_unicharsets training tool to cmake and sw build. 2020-12-26 15:57:22 +03:00
Stefan Weil
418064f639 Add missing namespace prefix (fix build for merge_unicharsets)
Signed-off-by: Stefan Weil <sw@weil.de>
2020-12-26 13:09:39 +01:00
Egor Pugin
79a86f2582 Move all tesseract symbols into tesseract namespace. Fix include order in many places. 2020-12-26 00:55:30 +03:00
Stefan Weil
cc133aa394 Fix text for fonts_dir parameter
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 21:32:05 +01:00
Stefan Weil
34abba8698 Add terminating linefeed to fonts.conf
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 21:32:05 +01:00
Stefan Weil
17a64eef1e Simplify code for PangoFontInfo::HardInitFontConfig
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 21:32:05 +01:00
Stefan Weil
707ee70966 Use deprecated pango_fc_font_get_glyph for old Pango versions
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 12:02:37 +01:00
Stefan Weil
f759142c95 Remove buggy Windows implementation for getting glyph from font
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 09:07:09 +01:00
Stefan Weil
7669d36a37 Use HarfBuzz instead of deprecated pango_fc_font_get_glyph
This fixes the crash on MacOS with M1.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 09:03:05 +01:00
Stefan Weil
8c859a7329 Fix type cast from PangoFont to PangoFcFont
The original code crashes in pango_fc_font_get_glyph on MacOS with M1.

Replacing the type cast with the macro made for that conversion
gives at least an error message before crashing:

    (process:12546): GLib-GObject-WARNING **: 08:38:02.472: invalid cast from 'PangoCairoCoreTextFont' to 'PangoFcFont'
    zsh: segmentation fault  ./pango_font_info_test

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-22 08:45:11 +01:00
Stefan Weil
3efedabda3 automake: Flat build for src/training
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-19 15:25:21 +01:00
Stefan Weil
6fcf8d23bc Use more compiler and linker flags from pkg-config
This fixes some build issues with Homebrew on MacOS.

Signed-off-by: Stefan Weil <stefan@Sabines-Mac-mini.fritz.box>
2020-12-13 13:24:46 +01:00
Stefan Weil
bf3774cc91 Use more const char*
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-11-26 17:01:17 +01:00
Stefan Weil
4613738a5e Use const char* for filename and network_spec parameters
This replaces the proprietary STRING data type
(764 instead of 838 lines remaining).

It also removes STRING from osdetect.h and serialis.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-11-26 17:01:17 +01:00
Stefan Weil
7c4ef88dab Remove unused functions FontUtils::GetAllRenderableCharacters
They used the function pango_coverage_max which does nothing and
which has been deprecated since pango version 1.44.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-10-03 12:04:40 +02:00
Stefan Weil
cb3880fb15 Disable more code and data with GRAPHICS_DISABLED
Some runtime parameters which are only relevant with graphics enabled
were now removed from builds when graphics was disabled.

TableFinder::DisplayColSegmentGrid is never used, so remove it completely.

Builds with --disable-graphics significantly reduce the code size and avoid
some function calls which might be important for certain applications:

   text	   data	    bss	    dec	    hex	filename
3219230	  41136	  13920	3274286	 31f62e	.libs/libtesseract.so (--disable-graphics, old)
3211347	  40976	  13600	3265923	 31d583	.libs/libtesseract.so (--disable-graphics, new)
3360942	  43656	  15392	3419990	 342f56	.libs/libtesseract.so (default)

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-07-09 11:23:33 +02:00
Stefan Weil
8137cf35a6 Use const char* for filename parameters
This replaces the proprietary STRING data type
(801 instead of 838 lines remaining).

It also removes STRING from osdetect.h and serialis.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-07-07 14:20:09 +02:00
Stefan Weil
62b085cb8d ScrollView: Remove C API callcpp.{cpp,h}
Use C++ class ScrollView directly instead of using an intermediate C API.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-06-22 09:14:26 +02:00
Matej Knopp
e900252c1a Fix CMake build with DISABLED_LEGACY_ENGINE 2020-06-17 19:42:49 +02:00
Egor Pugin
0eaabc42c7
Update CMakeLists.txt 2020-05-12 11:49:15 +03:00
Egor Pugin
e720a26745
[cmake] Set inactivity timeout during icu download to 300 seconds.
Fixes #2972.
2020-05-09 18:55:45 +03:00
Stefan Weil
16553014e0 Replace references to the old wiki by new URLs
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-02-03 11:37:41 +01:00
Stefan Weil
3d1f82d0e2 tesstrain.sh: Fix command line flag --help
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-01-05 10:10:55 +01:00
Stefan Weil
d2a2292f32 mftraining: Fix compiler warning
powerpc64le-linux-gnu-g++ warning:

    src/training/mftraining.cpp:209:5: warning:
        ‘%04d’ directive output may be truncated writing between 4 and 10 bytes
        into a region of size 8 [-Wformat-truncation=]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-01-03 10:13:58 +01:00
amitdo
502ebe8ca9 Autotools: Pango, Cairo and ICU only required by training tools 2019-12-16 17:23:06 +02:00
Stefan Weil
6181acf367 automake: Flat build for src/cutil
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-26 16:20:46 +01:00
Stefan Weil
cafb1bbfd7 automake: Flat build for src/api
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-26 16:20:46 +01:00
Shreeshrii
99dfa8a680 Add separator and training_iteration to checkpoint name (#2752)
* Add separator and training_iteration to checkpoint name
* specify modelname_N.NN_NN_NN.checkpoint for intermediate checkpoint
2019-11-09 12:22:40 +01:00
maungd@battelle.org
3d7afb69ea Exposed the text2image option --ptsize to tesstrain.sh. Text2image has the
option --ptsize which defaults to 12.  This option is not exposed through
tesstrain.sh; thus, you cannot use tesstrain.sh to explore training with
different font sizes.  I made a small modification to expose the --ptsize
option to tesstrain.sh.  It defaults to 12 if not specified.
2019-11-01 15:10:58 -04:00
Egor Pugin
2bcc9d8093 Remove cppan build. 2019-10-30 21:37:38 +03:00
Egor Pugin
2a37f5dd62 Update includes to use <>. 2019-10-29 14:50:11 +03:00
amitdo
2f8884a64e Fix autotools build 2019-10-28 21:23:58 +02:00
amitdo
e1bae15547 Fix #include path of public headers 2019-10-28 19:10:30 +02:00
zdenop
fc629eae3b Subject: training: show error description for open/delete file 2019-10-21 16:31:57 +02:00
zdenop
36dc2ccf75 fix memory leak at PangoFontInfo::CanRenderString 2019-10-20 16:43:04 +02:00
zdenop
1ec34378d9 test for synthesized font faces. 2019-10-19 15:05:28 +02:00
zdenop
cbbe45d94b cmake: add minimum required version for pango and icu based on autotools 2019-10-19 15:00:49 +02:00
zdenop
37c7a5dd82 text2image: show pango version 2019-10-19 14:52:06 +02:00
Stefan Weil
994ec697d8 Remove member functions STRING::string and StringParam::string
They were redundant because there exist member functions 'c_str' which do the same.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-23 08:33:08 +02:00
Stefan Weil
a730b5c4ff Remove STRING from the public Tesseract API
Removing STRING from genericvector.h allows eliminating the proprietary
STRING data type from the public Tesseract API.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-22 20:32:28 +02:00
Stefan Weil
8cb677d6a2 Replace STRING arguments for LoadDataFromFile and SaveDataToFile
This is a step to eliminate the proprietary STRING data type
from the public Tesseract API.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-22 20:32:28 +02:00
Stefan Weil
97dda3d535 Fix CID 1386099 (Uninitialized pointer field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 15:43:50 +02:00
Stefan Weil
951f442303 Fix CID 1386105 (Logically dead code)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 15:43:50 +02:00
Stefan Weil
64fc205e78 Fix CID 1402767 (Invalid type in argument to printf format specifier)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 15:43:50 +02:00
Stefan Weil
43b2e9513b lstmtrainer: Fix diagnostic message
Signed character values must be converted to unsigned integers for %x.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-15 14:31:32 +02:00
Stefan Weil
100d8cd29b lstmtester: Add missing space in log messages
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-14 14:12:47 +02:00
Stefan Weil
e84cb24def Move source files which are used for training only to src/training
They are moved from src/classify and src/lstm to src/training.

This reduces the size of the Tesseract library.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-12 17:08:08 +02:00
Stefan Weil
315dd9df3f cmake: Don't link pthread on Windows
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-07 15:24:00 +02:00
Zdenko Podobný
c5a50b93ce move fileio.cpp and fileio.h to training (this fix android build) 2019-08-04 21:26:39 +02:00
Egor Pugin
c58efee4ba Use pangocairo-1.43 for the moment. Remove private pango header. 2019-08-01 11:55:18 +03:00
Egor Pugin
f1a567e814
Try to fix #2599 2019-08-01 11:35:15 +03:00
Stefan Weil
23ef93ac4d cmake: Add missing pthread library
It is needed for C++ threads since commit 85068be405.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-26 07:45:51 +02:00
Stefan Weil
a2b13b49ff Simplify shell code (fixes warning from Codacy)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-17 21:33:24 +02:00
Stefan Weil
467f8f4140 Fix training script for macOS (issue #2578)
Bash on macOS does not support "|&":

    tesstrain_utils.sh: line 80: syntax error near unexpected token `&'

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-17 17:18:44 +02:00
Stefan Weil
fcfdb7e56f Remove unused include statements
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-15 14:48:31 +02:00
Stefan Weil
85068be405 lstmtester: Replace SVSync::StartThread by std::thread
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-15 14:30:51 +02:00
Stefan Weil
93427391c1 Replace SVAutoLock by std::lock_guard
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-15 12:01:28 +02:00
Stefan Weil
36026e3c35 Replace SVMutex by std::mutex
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-15 12:01:28 +02:00
Stefan Weil
bdc7abf518 Fix format strings for size_t arguments (CID 1402762, 1402767)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-10 16:57:19 +02:00
Egor Pugin
3b6f071ee8 Implement CMake+SW build.
Currently only Windows is supported.
You could try it as following:

    mkdir build_sw && cd build_sw && cmake .. -DSW_BUILD=1
2019-07-08 18:50:30 +03:00
zhuangzhuang1988
18c67f4989 fix tesstrain.py error 2019-07-08 14:35:17 +08:00