Commit Graph

222 Commits

Author SHA1 Message Date
Stefan Weil
100d8cd29b lstmtester: Add missing space in log messages
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-14 14:12:47 +02:00
Stefan Weil
e84cb24def Move source files which are used for training only to src/training
They are moved from src/classify and src/lstm to src/training.

This reduces the size of the Tesseract library.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-12 17:08:08 +02:00
Stefan Weil
315dd9df3f cmake: Don't link pthread on Windows
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-07 15:24:00 +02:00
Zdenko Podobný
c5a50b93ce move fileio.cpp and fileio.h to training (this fix android build) 2019-08-04 21:26:39 +02:00
Egor Pugin
c58efee4ba Use pangocairo-1.43 for the moment. Remove private pango header. 2019-08-01 11:55:18 +03:00
Egor Pugin
f1a567e814
Try to fix #2599 2019-08-01 11:35:15 +03:00
Stefan Weil
23ef93ac4d cmake: Add missing pthread library
It is needed for C++ threads since commit 85068be405.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-26 07:45:51 +02:00
Stefan Weil
a2b13b49ff Simplify shell code (fixes warning from Codacy)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-17 21:33:24 +02:00
Stefan Weil
467f8f4140 Fix training script for macOS (issue #2578)
Bash on macOS does not support "|&":

    tesstrain_utils.sh: line 80: syntax error near unexpected token `&'

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-17 17:18:44 +02:00
Stefan Weil
fcfdb7e56f Remove unused include statements
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-15 14:48:31 +02:00
Stefan Weil
85068be405 lstmtester: Replace SVSync::StartThread by std::thread
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-15 14:30:51 +02:00
Stefan Weil
93427391c1 Replace SVAutoLock by std::lock_guard
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-15 12:01:28 +02:00
Stefan Weil
36026e3c35 Replace SVMutex by std::mutex
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-15 12:01:28 +02:00
Stefan Weil
bdc7abf518 Fix format strings for size_t arguments (CID 1402762, 1402767)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-10 16:57:19 +02:00
Egor Pugin
3b6f071ee8 Implement CMake+SW build.
Currently only Windows is supported.
You could try it as following:

    mkdir build_sw && cd build_sw && cmake .. -DSW_BUILD=1
2019-07-08 18:50:30 +03:00
zhuangzhuang1988
18c67f4989 fix tesstrain.py error 2019-07-08 14:35:17 +08:00
Stefan Weil
1c1eb76c36 Use C++-11 code instead of TessCallback for Dawg::iterate_words
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-04 16:03:30 +02:00
Stefan Weil
eeec9c66d4 training: Use C++-11 code for TestCallback
This allows removing more code from tesscallback.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-04 16:03:30 +02:00
zhuangzhuang1988
99cb088708 close log file handle before move it. 2019-07-01 10:53:12 +08:00
zhuangzhuang1988
a3a361f73d fix logger file encoding error. 2019-06-28 18:29:52 +08:00
Stefan Weil
ea20bf0373 Remove dummy code from LSTMTrainer::InitTensorFlowNetwork
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-22 21:01:40 +02:00
Stefan Weil
41f91c96c8 cmake: Build training tools also on Linux and macOS
This enables CI tests for the code in src/training on Linux and macOS.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-22 20:27:56 +02:00
Stefan Weil
df98bb7368 Move LSTMTrainer from libtesseract to libtesseract_training
LSTMTrainer is only used for training, so the shared library for
Tesseract can be made smaller.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-22 16:23:51 +02:00
Stefan Weil
bd13069fe8 Simplify class LSTMTrainer
The function pointers and callbacks file_reader_, file_writer_,
checkpointer_reader_ and checkpoint_writer_ are always set to
the same values. Replacing them by direct function calls
simplifies the code and allows removing more code from tesscallback.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-22 09:18:13 +02:00
zdenop
60b4c68d31 tesstrain_utils.sh: remove redundant code 2019-06-20 18:42:29 +02:00
zdenop
60aee9f821 create OUTPUT_DIR did not exist; fixes #2497 2019-06-16 15:07:16 +02:00
zdenop
fad96db497
Merge pull request #2494 from Shreeshrii/master
Allow saving of box/tiff pairs during legacy tesseract training
2019-06-14 20:44:49 +02:00
Shree
6fa4587949 Allow saving of box/tiff pairs during base tesseract training 2019-06-14 09:35:39 +00:00
Shree
45cdf741ae Allow saving of box/tiff pairs during base tesseract training 2019-06-14 09:32:41 +00:00
Shree
832c6edb97 Allow saving of box/tiff pairs during base tesseract training 2019-06-14 09:25:54 +00:00
James R. Barlow
a9890afd12 Fix text2image compilation on C++17 compilers
C++17 drops support for `std::random_shuffle`, breaking C++17 compilers
that run to compile text2image.cpp. std::shuffle is valid on C++11
through C++17, so use std::shuffle instead.

Due to the use `std::random_shuffle`, `text2image --render_ngrams`
would not give consistent results for different compilers or platforms.
With the current change, the same random number generator is used for
all platforms and initialized to the same seed, so training output
should be consistent.
2019-06-13 16:07:20 -07:00
Stefan Weil
9a4bd041c8 Fix build for unittests
Commit 29f2cff203 was the wrong fix
for the compiler warnings because it broke the unittest build.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-26 21:36:34 +02:00
Stefan Weil
ac999b2409 Remove unused macros
This fixes compiler warnings from clang++ like these ones:

    src/ccutil/params.cpp:34:9: warning: macro is not used [-Wunused-macros]
    src/cutil/oldlist.cpp:67:9: warning: macro is not used [-Wunused-macros]
    src/cutil/oldlist.cpp:68:9: warning: macro is not used [-Wunused-macros]
    src/cutil/oldlist.cpp:78:9: warning: macro is not used [-Wunused-macros]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-26 20:27:21 +02:00
Stefan Weil
29f2cff203 training: Add missing static attributes
That fixes several warnings from clang++ like the following one:

    src/training/combine_lang_model.cpp:36:1: warning: no previous extern declaration for non-static variable 'FLAGS_lang_is_rtl' [-Wmissing-variable-declarations]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-26 11:33:52 +02:00
Stefan Weil
a139d553a7 training: Move declarations from cpp files to h file
That fixes several warnings from clang++ like the following one:

    src/training/commontraining.cpp:95:1: warning: no previous extern declaration for non-static variable 'FLAGS_D' [-Wmissing-variable-declarations]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-26 08:53:09 +02:00
Stefan Weil
4bec4a69a0 Add missing static attributes
This fixes lots of compiler warnings like these ones:

    src/api/baseapi.cpp:113:13: warning: no previous extern declaration for non-static variable 'kInputFile' [-Wmissing-variable-declarations]
    src/api/baseapi.cpp:117:13: warning: no previous extern declaration for non-static variable 'kOldVarsFile' [-Wmissing-variable-declarations]
    src/api/baseapi.cpp:97:10: warning: no previous extern declaration for non-static variable 'stream_filelist' [-Wmissing-variable-declarations]
    src/ccmain/equationdetect.cpp:46:10: warning: no previous extern declaration for non-static variable 'equationdetect_save_bi_image' [-Wmissing-variable-declarations]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-26 08:53:09 +02:00
Stefan Weil
2441e4d8ac Implement check for Tensorflow header file
This looks for one of the header files which are included by Tesseract.
It currently uses a hard coded path which works for Debian / Ubuntu.

Simplify also the rules for linking Tensorflow.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-24 16:52:14 +02:00
Stefan Weil
9cdf041448 Remove "third_party/" in comments and update path names
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-24 14:12:52 +02:00
Stefan Weil
4382ab1a34 Support build with Tensorflow
It expects include files in /usr/include/tensorflow.

* Add configure option --with-tensorflow (disabled by default)
* Fix data type tensorflow::int64
* Remove "third_party/" in include statements
* Add dummy implementations for Backward and DebugWeights in TFNetwork
* Add files generated with protoc from tfnetwork.proto
  (so the Tensorflow sources are not needed for the build)
* Update Makefiles

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-24 14:11:31 +02:00
Bharat123rox
945ccac85a Fix syntax error 2019-05-22 10:10:12 +05:30
Bharat123rox
7f31a0634d Some LGTM fixes and potential bugfixes 2019-05-21 23:24:50 +05:30
Stefan Weil
d2ca81e794 Remove local definition of M_PI
It is defined for all platforms when math.h or cmath is included
after defining the macro _USE_MATH_DEFINES.

Define _USE_MATH_DEFINES before any include statement to make sure
that M_PI gets defined. It is not necessary to define it conditionally
only for Windows.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-20 21:18:52 +02:00
Stefan Weil
6b1e709b19 Fix Doxygen comments for void functions
Void functions should not use @return. It causes compiler warnings
like this one:

    src/classify/intproto.cpp:326:5: warning:
      '@return' command used in a comment that is attached to a function
      returning void [-Wdocumentation]

Some non-void functions also were documented with @return none.
Fix those comments, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-14 21:57:17 +02:00
Stefan Weil
4fbc0a257b commandlineflags: Replace strtod by std::stringstream
Using std::stringstream allows conversion of double to string
independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-02 07:33:46 +02:00
Stefan Weil
78a957b989 Remove spaces a line endings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-13 18:54:42 +02:00
Stefan Weil
72c874140e Modernize code by replacing C type casts
This was done using clang-tidy.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-07 09:04:51 +02:00
zdenop
ab09b09da6
Merge pull request #2294 from bertsky/lstm-with-char-whitelist
trying to add tessedit_char_whitelist etc. again:
2019-04-06 14:41:30 +02:00
Robert Schubert
25a42ea42f fixed failure report for tesstrain commands:
- with `set -e` in effect, looking at stdout
  to detect failure is too late
2019-04-06 08:13:03 +02:00
Robert Schubert
d5584e793e fixed failure report for tesstrain commands:
- with `set -e` in effect, it does not make sense
  to query `$?` indirectly
2019-04-06 08:13:03 +02:00
Stefan Weil
802f42e821 Remove BOOL8, TRUE, FALSE from host.h
Remove unneeded include statements for host.h, add required ones and
update the comments for the remaining include statements.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 18:27:20 +02:00