1487 Commits

Author SHA1 Message Date
Stefan Weil
286d8275c7 Add support for image or image list by URL
This allows OCR of images from the internet without downloading them first:

    tesseract http://IMAGE_URL OUTPUT ...

It uses libcurl.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-01 12:10:45 +02:00
Stefan Weil
47d70d7014 Modernize code for LIST (fix some -Wold-style-cast warnings)
- Use C++ type casts
- Remove unneeded type cast
- Simplify code for function pop
- Remove macro push_on (it was only used once)

This fixes lots of compiler warnings caused by old type casts.
2019-10-01 11:12:00 +02:00
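The cast modernization in commit 47d70d7014 can be illustrated with a minimal sketch. The `Node` type and both helper functions are hypothetical and are not taken from Tesseract's LIST code; they only show how a named C++ cast replaces an old-style cast.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical list node, for illustration only (not Tesseract's LIST type).
struct Node {
  int value;
  Node* next;
};

// Old style would be `((Node*)opaque)->value`. static_cast names the intended
// conversion and, unlike a C cast, cannot silently drop const.
int first_value(void* opaque) {
  assert(opaque != nullptr);
  return static_cast<Node*>(opaque)->value;
}

// Old style would be `(uint8_t)v`. The named cast makes the narrowing explicit.
std::uint8_t low_byte(int v) {
  return static_cast<std::uint8_t>(v & 0xFF);
}
```

Compiling such code with -Wold-style-cast should no longer report the two conversions.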
Stefan Weil
672d67859f mfoutline: Modernize code
- Use C++ enums
- Use strongly typed C++11 enum for DIRECTION and optimize struct MFEDGEPT
- Use float constant for MF_SCALE_FACTOR
- Replace macros by inline functions
- Fix documentation comment

This fixes several warnings from clang.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-30 21:33:15 +02:00
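A rough sketch of the techniques listed in commit 672d67859f: a strongly typed enum with a small underlying type, a float constant, and an inline function in place of a macro. All names and values below are assumptions made for illustration and do not reproduce the real mfoutline code.

```cpp
#include <cassert>
#include <cstdint>

// Strongly typed enum with a one-byte underlying type (names are hypothetical).
enum class Direction : std::uint8_t { north, east, south, west };

// A float constant avoids double arithmetic in float expressions.
constexpr float kScaleFactor = 0.5f / 128.0f;  // hypothetical value

// Inline function instead of a macro such as `#define SCALE(x) ((x) * FACTOR)`.
inline float scale(float x) { return x * kScaleFactor; }

// Checked conversion from an integer index to the enum.
inline Direction from_index(int i) {
  assert(i >= 0 && i < 4);
  return static_cast<Direction>(i);
}

// The one-byte members sit together after the floats, so they share padding.
struct EdgePoint {
  float x;
  float y;
  Direction direction;
  bool hidden;
};

static_assert(sizeof(Direction) == 1, "enum class uses its declared underlying type");
```

Unlike a macro, `scale` has a typed parameter and evaluates its argument exactly once.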
Stefan Weil
7ec5f0ca02 intmatcher: Avoid conversion from double to float and vice versa
This fixes some clang warnings:

    src/classify/intmatcher.cpp:48:49: warning:
      implicit conversion loses floating-point precision:
      'double' to 'const float' [-Wimplicit-float-conversion]
    src/classify/intmatcher.cpp:405:34: warning:
      implicit conversion loses floating-point precision:
      'double' to 'float' [-Wimplicit-float-conversion]
    src/classify/intmatcher.cpp:405:64: warning:
      implicit conversion increases floating-point precision:
      'float' to 'double' [-Wdouble-promotion]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-30 18:05:26 +02:00
Stefan Weil
6d259ebe44 Remove unneeded compare statement (-Wtautological-unsigned-enum-zero-compare)
This fixes a clang warning:

    src/ccstruct/polyblk.cpp:412:12: warning: result of comparison of
      unsigned enum expression >= 0 is always true
      [-Wtautological-unsigned-enum-zero-compare]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-29 22:13:27 +02:00
Stefan Weil
49e351508c Re-add strngs.h to public API
It is still needed.
This partially reverts commit a730b5c4ff.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-28 10:34:48 +02:00
Stefan Weil
8ad86d6494 Add missing linker flags for TensorFlow
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-28 09:42:37 +02:00
zdenop
d6aa866430 ignore #pragma optimize for clang-cl 2019-09-27 21:19:37 +02:00
Stefan Weil
74d5ce82a6 Remove vecfuncs.cpp and vecfunc.h
Replace the macros which were declared in vecfuncs.h by member functions
and move a function which was only used in chop.cpp to that file.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-25 21:20:03 +02:00
Stefan Weil
7bddad59d1 Optimize class ChoiceIterator
Re-order a class variable to avoid memory holes and
remove unused class variables.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-25 09:43:57 +02:00
Noah Metzger
ff4c1d204d Fixed minor bug with the Choice iterator when lstm_choice_mode is not active.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-09-24 15:38:28 +02:00
Stefan Weil
994ec697d8 Remove member functions STRING::string and StringParam::string
They were redundant because there exist member functions 'c_str' which do the same.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-23 08:33:08 +02:00
Egor Pugin
1fa7324cf7
Merge pull request #2668 from stweil/api
Remove STRING from the public Tesseract API
2019-09-23 01:02:26 +03:00
amitdo
0598879a00 Disable legacy build: Disable bitvec.h 2019-09-22 20:37:13 +02:00
Stefan Weil
a730b5c4ff Remove STRING from the public Tesseract API
Removing STRING from genericvector.h allows eliminating the proprietary
STRING data type from the public Tesseract API.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-22 20:32:28 +02:00
Stefan Weil
8cb677d6a2 Replace STRING arguments for LoadDataFromFile and SaveDataToFile
This is a step to eliminate the proprietary STRING data type
from the public Tesseract API.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-22 20:32:28 +02:00
amitdo
1e13d1d4d5 Disable legacy build: Disable more unneeded code 2019-09-22 20:55:24 +03:00
zdenop
39a63c2837
Merge pull request #2663 from bertsky/fix-lstm-user-patterns
fix langdata (user words/patterns) file suffixes for LSTMs:
2019-09-20 15:32:54 +02:00
Stefan Weil
0c7cc5a4dd Fix CID 1405673 part 2 (Uninitialized members)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-19 19:37:05 +02:00
Robert Schubert
5b976bfb55 fix langdata (user words/patterns) file suffixes for LSTMs:
- add another constructor for LSTMRecognizer
  which takes the language_data_path_prefix configured/selected
  at runtime and passes it to the internal CCUtil
- use this in Tesseract::init_tesseract_lang_data when LSTMs
  are available

(this was missing from 297d7d86ce)
2019-09-19 19:30:54 +02:00
amitdo
479a7b1ca0 Disabled legacy build: Disable more unneeded code 2019-09-19 19:00:13 +03:00
Stefan Weil
3b030b4aeb Fix CID 1405673 (Uninitialized members)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-17 22:04:08 +02:00
Stefan Weil
85e8529a2e Fix CID 1164624 (Uninitialized members)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-17 21:59:42 +02:00
Stefan Weil
b2999d8190 Fix comment for Textord::make_prop_words
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-16 15:03:45 +02:00
Stefan Weil
256701e2e0 Re-order initialisation in constructor of class ViterbiStateEntry
This fixes compiler warnings caused by
commit 091ce345f6:

    src/wordrec/lm_state.h:100:7: warning: field 'cost'
      will be initialized after field 'curr_b' [-Wreorder]
    src/wordrec/lm_state.h:104:7: warning: field 'top_choice_flags'
      will be initialized after field 'dawg_info' [-Wreorder]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-16 14:33:32 +02:00
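The -Wreorder warnings quoted in commit 256701e2e0 stem from the rule that members are initialized in declaration order, not in the order written in the initializer list. A minimal sketch with hypothetical types (not the real ViterbiStateEntry):

```cpp
#include <cassert>

// Members are always initialized in declaration order, regardless of the
// order written in the constructor's initializer list.
struct Span {
  int begin;   // declared first, so initialized first
  int length;  // may safely depend on `begin`

  // The initializer list follows the declaration order, so -Wreorder stays quiet.
  Span(int b, int e) : begin(b), length(e - begin) { assert(e >= b); }
};

// Alternative when defaults do not depend on constructor arguments:
// default member initializers in the class definition.
struct Flags {
  bool owns_data = false;
  int count = 0;
};
```

If `length` were declared before `begin`, the same initializer list would read an uninitialized `begin`, which is the kind of mistake the warning is meant to surface.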
Stefan Weil
081521fb9f Move initial values for class ColPartition from constructor to header file
This fixes compiler warnings caused by
commit 5b4565b80b:

    src/textord/colpartition.cpp:91:24: warning: field 'last_column_'
      will be initialized after field 'column_set_' [-Wreorder]
    src/textord/colpartition.cpp:93:37: warning: field 'inside_table_column_'
      will be initialized after field 'nearest_neighbor_above_' [-Wreorder]
    src/textord/colpartition.cpp:95:58: warning: field 'space_to_right_'
      will be initialized after field 'owns_blobs_' [-Wreorder]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-16 14:33:32 +02:00
Stefan Weil
8f66020821 Re-order initialisation in constructors of classes Dawg and DawgPosition
This fixes compiler warnings caused by
commit ecf0f2dee5:

    src/dict/dawg.h:202:9: warning: field 'type_' will be initialized
      after field 'lang_' [-Wreorder]
    src/dict/dawg.h:355:9: warning: field 'dawg_index' will be initialized
      after field 'dawg_ref' [-Wreorder]
    src/dict/dawg.h:356:9: warning: field 'punc_index' will be initialized
      after field 'punc_ref' [-Wreorder]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-16 14:31:32 +02:00
Stefan Weil
b466cead8e Move more initial values for class Classify from constructor to header file
This fixes compiler warnings caused by
commit 751fcd2b11:

    src/classify/classify.cpp:176:7: warning:
      field 'EnableLearning' will be initialized after
      field 'il1_adaption_test' [-Wreorder]
    src/classify/classify.cpp:187:7: warning:
      field 'dict_' will be initialized after
      field 'static_classifier_' [-Wreorder]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-16 14:31:32 +02:00
Stefan Weil
91b3248af3 Fix CID 1164666 (Uninitialized scalar field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-15 22:01:25 +02:00
Stefan Weil
fc6899d898 Fix CID 1164664 (Uninitialized scalar field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-15 21:52:51 +02:00
Stefan Weil
930e11996c Fix CID 1375402 (Uninitialized pointer field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-15 21:17:12 +02:00
Stefan Weil
408d6e8b72 simd: Check OSXSAVE bit before calling _xgetbv
Both checks are needed for AVX, AVX2 and FMA checks.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-15 19:35:37 +02:00
Stefan Weil
627faa6f9c Remove UnicharAmbigs for builds without legacy code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-15 19:11:30 +02:00
amitdo
2134cd7867 Disabled legacy engine build: Disable code related to ambigs. 2019-09-15 19:11:30 +02:00
Stefan Weil
0c960c3cc5 Fix CID 1164647 (Uninitialized members)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-15 14:25:48 +02:00
amitdo
994596842e 'Disabled legacy engine' build: don't include unused header 2019-09-15 12:35:36 +03:00
Egor Pugin
6a9584fbc2
Merge pull request #2650 from stweil/cid
Fix several issues reported by Coverity Scan
2019-09-14 21:18:37 +03:00
Stefan Weil
763f4781e8 Fix CID 1164662 (Uninitialized scalar field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 19:22:56 +02:00
Stefan Weil
6fd58d2897 Fix CID 1164659 (Uninitialized scalar field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 19:20:14 +02:00
Stefan Weil
c3500e8d95 Fix CID 1164657 (Uninitialized scalar field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 19:11:02 +02:00
Stefan Weil
1d3ee3b2a7 Fix CID 1164649 (Uninitialized scalar field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 17:37:00 +02:00
Stefan Weil
bd1083904d Fix CID 1164648 (Uninitialized scalar field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 17:32:29 +02:00
Stefan Weil
80f367c6f4 Fix CID 1164644 (Uninitialized scalar field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 17:26:49 +02:00
Stefan Weil
7caded8e6b Fix CID 1164643 (Uninitialized scalar field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 17:24:26 +02:00
Stefan Weil
3127242bcd Fix CID 1164638 (Uninitialized scalar field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 17:18:15 +02:00
Stefan Weil
06de3075e0 Fix CID 1164636 (Uninitialized pointer field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 17:13:06 +02:00
Stefan Weil
052f9ca0bc Fix CID 1164634, CID 1164635 (Uninitialized pointer field)
Remove the unused dummy member variables.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 17:12:39 +02:00
Stefan Weil
97dda3d535 Fix CID 1386099 (Uninitialized pointer field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 15:43:50 +02:00
Stefan Weil
46f21a4182 Fix CID 1164633 (Uninitialized pointer field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 15:43:50 +02:00
Stefan Weil
9ea579bf1b Fix CID 1164628 ff (Uninitialized pointer field) and optimize class ParamContent
Only one of bIt, dIt, iIt and sIt is used, so put all four in a union.
This fixes CID 1164628, CID 1164629, CID 1164630 and CID 1164631.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 15:43:50 +02:00
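The idea in commit 9ea579bf1b, where only one of four pointers is ever in use, can be sketched with an anonymous union plus a tag. The type and member names below are hypothetical and only loosely modeled on the message; this is not the actual ParamContent class.

```cpp
#include <cassert>

// Hypothetical parameter kinds; exactly one pointer is valid per object.
enum class ParamKind { boolean, real, integer, text };

struct ParamRef {
  ParamKind kind;
  union {  // the four pointers share one slot instead of occupying four
    bool* b;
    double* d;
    int* i;
    const char** s;
  };

  explicit ParamRef(bool* p) : kind(ParamKind::boolean), b(p) {}
  explicit ParamRef(double* p) : kind(ParamKind::real), d(p) {}
  explicit ParamRef(int* p) : kind(ParamKind::integer), i(p) {}
  explicit ParamRef(const char** p) : kind(ParamKind::text), s(p) {}

  int as_int() const {
    assert(kind == ParamKind::integer);  // only the active member may be read
    return *i;
  }
};
```

Each constructor initializes exactly one union member, so no pointer field is left uninitialized.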
Stefan Weil
74b552fc31 Remove unused FeatureEnabled from FEATURE_DEFS_STRUCT
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 15:43:50 +02:00
Stefan Weil
9f709404f9 Fix CID 1164622 (Uninitialized pointer field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 15:43:50 +02:00
Stefan Weil
5b1f0dbd4b Fix CID 1164620 (Uninitialized pointer field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 15:43:50 +02:00
Stefan Weil
951f442303 Fix CID 1386105 (Logically dead code)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 15:43:50 +02:00
Stefan Weil
64fc205e78 Fix CID 1402767 (Invalid type in argument to printf format specifier)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 15:43:50 +02:00
Stefan Weil
f62a895f74 Remove unused italic, bold in class BLOCK_RES and class WORD_RES
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-14 11:53:58 +02:00
Stefan Weil
ceb8af889e Fix CID 1340276 (Uninitialized scalar field) for class BLOB_CHOICE
xgap_before_ and xgap_after_ are never used, so remove them.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-13 22:15:47 +02:00
Stefan Weil
5fdd32bea8 Fix CID 1366450 (Uninitialized scalar field) for class RecodeBeamSearch
secondary_beam_size_ is set but never used, so remove it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-13 22:09:03 +02:00
Stefan Weil
737173a84d Fix CID 1375401 (Uninitialized scalar field) for class Dawg
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-13 22:03:10 +02:00
Stefan Weil
edba74d64f Fix CID 1400760 (Uninitialized scalar field) for class BLOCK
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-13 21:58:05 +02:00
Stefan Weil
8ff321e41a Fix two issues reported by Coverity Scan and modernize class WERD_RES
Report from Coverity Scan:

    CID 1405560 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
    2. uninit_member: Non-static class member end is not initialized in
    this constructor nor in any functions that it calls.

    CID 1405561 [...]

Modernize and optimize class WERD_RES. This not only fixes the issues
but also reduces the size and eliminates the functions InitNonPointers
and InitPointers.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-13 21:51:36 +02:00
Stefan Weil
ecf0f2dee5 Optimize classes Trie, Dawg and DawgPosition
Reduce size from 368 to 352 bytes for Trie, 72 to 64 bytes for Dawg
and 40 to 24 bytes for DawgPosition by avoiding holes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-13 08:15:01 +02:00
Stefan Weil
efd8dea587 Optimize classes CLIST_ITERATOR, ELIST_ITERATOR, ELIST2_ITERATOR
Reduce size from 56 to 48 bytes by avoiding holes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-12 22:03:03 +02:00
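"Avoiding holes" in these size optimizations means ordering members so that the compiler inserts less alignment padding. A small sketch with two hypothetical structs that hold the same members in different order:

```cpp
// Hypothetical iterator-like structs with the same members in different order.
struct WithHoles {
  bool at_start;  // followed by padding up to pointer alignment
  void* current;
  bool at_end;    // followed by padding before the next pointer
  void* next;
};

struct Reordered {
  void* current;
  void* next;
  bool at_start;  // the small members now share one padded slot
  bool at_end;
};

static_assert(sizeof(Reordered) <= sizeof(WithHoles),
              "grouping the small members does not grow the struct");
```

On common 64-bit ABIs with 8-byte pointers the sizes are typically 32 and 24 bytes; the exact numbers depend on the platform.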
Stefan Weil
751fcd2b11 Optimize class Classify
Reduce size from 138016 to 13000 bytes by avoiding holes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-12 21:46:55 +02:00
Stefan Weil
0ad08a99b0 Optimize class TFile
Reduce size from 24 to 16 bytes by avoiding holes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-12 20:17:05 +02:00
Stefan Weil
5b4565b80b Optimize class ColPartition
Reduce size from 248 to 224 bytes by avoiding holes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-12 20:04:27 +02:00
Stefan Weil
5a12273650 Optimize struct LMConsistencyInfo
Reduce size from 104 to 96 bytes by avoiding holes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-12 20:04:27 +02:00
Stefan Weil
091ce345f6 Optimize class ViterbiStateEntry
Reduce size from 232 to 216 bytes by avoiding holes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-12 20:04:27 +02:00
Stefan Weil
913cbe6eae Modernize and optimize BLOBNBOX and remove BLOBNBOX::ConstructionInit
The class no longer uses bit fields. Re-ordering the member variables
avoids holes and reduces the size of BLOBNBOX from 168 to 152 bytes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-12 09:07:48 +02:00
Stefan Weil
a922745d9a tfnetwork: Fix info text
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-11 19:10:25 +02:00
Stefan Weil
5fa09f184f RecodedCharIDHash: Fix runtime errors detected by UndefinedBehaviorSanitizer
Fix this runtime error in recodebeam_test and unicharcompress_test:

    src/ccutil/unicharcompress.h:84:27: runtime error:
      left shift of 267 by 28 places cannot be represented in type 'int'

code has up to kMaxCodeLen (9) values, so the highest possible value for
i is 8, and the shift value can reach 7 * 8 = 56.

That requires an uint64_t data type.
size_t would fit for 64 bit hosts, but be too small for 32 bit hosts.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-10 15:56:32 +02:00
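The shift overflow described in commit 5fa09f184f can be reproduced and avoided with a sketch like the following. The function is an assumption modeled on the numbers in the message (7 bits per position, up to 9 codes) and is not the actual RecodedCharIDHash implementation.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kMaxCodeLen = 9;  // value quoted in the commit message

// Sketch of a shift-and-combine hash (not the actual Tesseract function).
// Widening to uint64_t before shifting keeps shifts of up to 56 bits defined.
std::uint64_t combine_codes(const int* code, int length) {
  assert(length >= 0 && length <= kMaxCodeLen);
  std::uint64_t result = 0;
  for (int i = 0; i < length; ++i) {
    result ^= static_cast<std::uint64_t>(code[i]) << (7 * i);
  }
  return result;
}
```

With a plain `int` accumulator, shifting 267 left by 28 bits would exceed the range of a 32-bit `int`, which is the error UndefinedBehaviorSanitizer reported.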
Stefan Weil
4a2d5a2e8d OSResults: Fix runtime errors detected by UndefinedBehaviorSanitizer
Fix this runtime error in osd_test and textlineprojection_test:

    src/ccmain/osdetect.cpp:109:14: runtime error: division by zero

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-10 15:56:32 +02:00
Stefan Weil
5c6fade555 BitVector: Fix runtime errors detected by UndefinedBehaviorSanitizer
Fix these runtime errors in mastertrainer_test:

    src/ccutil/bitvector.cpp:119:18: runtime error:
      null pointer passed as argument 2, which is declared to never be null
    src/ccutil/bitvector.cpp:124:10: runtime error:
      null pointer passed as argument 1, which is declared to never be null

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-10 15:56:32 +02:00
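The null pointer errors in commit 5c6fade555 come from passing null to a function such as memcpy, whose pointer arguments must be valid even when the byte count is zero. A minimal guard, assuming a hypothetical helper `copy_words` that is not part of the BitVector class:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// memcpy's pointer arguments must not be null even when the byte count is 0,
// so skip the call for empty vectors.
void copy_words(std::uint32_t* dst, const std::uint32_t* src, std::size_t count) {
  if (count == 0) return;
  assert(dst != nullptr && src != nullptr);
  std::memcpy(dst, src, count * sizeof(*dst));
}
```

An empty vector can then be copied safely even if its storage pointer is null.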
zdenop
98c7aaa343
Lstm choice ril (#2635)
Lstm choice ril
2019-09-06 19:12:00 +02:00
Stefan Weil
9f32032517 ccutil: Remove old comments
There is no CLIST2 in the current code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-05 17:52:42 +02:00
Stefan Weil
b6933a1082 Use type bool for boolean values in class BLOBNBOX
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-03 19:56:59 +02:00
Noah Metzger
c350077b96 Made the lstm_choice mode compatible with the hocr_char_boxes mode
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-09-02 11:09:54 +02:00
Noah Metzger
e8b9c10d07 Clean up lstm_choice_mode and cut it down to 2 modes instead of 4
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-09-02 11:09:53 +02:00
Stefan Weil
fdf4067296 Fix warnings from LGTM
This fixes three LGTM warnings:

    Multiplication result may overflow 'float' before it is converted to 'double'.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-30 22:04:24 +02:00
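The LGTM warning in commit fdf4067296 concerns products that are computed in float and only afterwards widened to double. A small sketch with a hypothetical function `area`:

```cpp
// Converting one operand first makes the multiplication happen in double.
// `double r = width * height;` would multiply in float and only then widen
// the (possibly overflowed) result.
double area(float width, float height) {
  return static_cast<double>(width) * height;
}
```

For two factors near 1e30 the float product overflows, while the double product of about 1e60 is still representable.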
Stefan Weil
dc90741f1b Fix crash when function lookup tables are accessed with NaN
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-30 13:42:09 +02:00
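Commit dc90741f1b does not say how the crash was fixed. One common approach, shown here purely as an assumption with a hypothetical table, is to validate the argument before converting it to an index, because converting NaN to an integer is undefined behavior.

```cpp
#include <cmath>

constexpr int kTableSize = 5;
// Hypothetical table: x * x sampled at x = 0, 1, 2, 3, 4.
constexpr double kTable[kTableSize] = {0.0, 1.0, 4.0, 9.0, 16.0};

// The range test is written so that NaN fails it: every comparison with NaN
// is false, so `!(x >= 0 && x < N)` is true for NaN.
double lookup(double x) {
  if (!(x >= 0.0 && x < kTableSize)) {
    if (std::isnan(x)) return x;                   // propagate NaN
    return kTable[x < 0.0 ? 0 : kTableSize - 1];   // clamp to the table ends
  }
  return kTable[static_cast<int>(x)];
}
```

Whether NaN should propagate or map to a fixed value is a design choice; the sketch propagates it so that callers can still detect the bad input.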
Stefan Weil
7968f50fe6 capi: Add missing PSM_RAW_LINE to TessPageSegMode
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-25 09:08:09 +02:00
zdenop
0ded672067 fix typo 2019-08-18 18:47:32 +02:00
Stefan Weil
00cff79f7f simd: Check whether the OS supports FMA, AVX, ...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-16 22:51:17 +02:00
Stefan Weil
43b2e9513b lstmtrainer: Fix diagnostic message
Signed character values must be converted to unsigned integers for %x.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-15 14:31:32 +02:00
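The conversion mentioned in commit 43b2e9513b matters because plain `char` may be signed, so a byte above 0x7F would be sign-extended when passed to `%x`. A minimal sketch with a hypothetical helper `hex_byte`:

```cpp
#include <cstdio>
#include <string>

// %x expects an unsigned int. Going through unsigned char first prevents a
// byte such as 0xE4 from being sign-extended to 0xFFFFFFE4.
std::string hex_byte(char c) {
  char buffer[3];
  std::snprintf(buffer, sizeof(buffer), "%02x",
                static_cast<unsigned>(static_cast<unsigned char>(c)));
  return buffer;
}
```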
Stefan Weil
100d8cd29b lstmtester: Add missing space in log messages
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-14 14:12:47 +02:00
Stefan Weil
a86251c62b classify/Makefile: Fix inconsistent style
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-13 21:35:59 +02:00
Egor Pugin
423a188513 Export some classify vars. 2019-08-13 20:12:21 +03:00
Stefan Weil
46e2a0f106 Remove more code for builds with disabled legacy engine
Now the Tesseract library no longer includes unused code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-13 17:49:10 +02:00
Egor Pugin
73f713519c
Merge pull request #2614 from stweil/training
Move source files which are used for training only to src/training
2019-08-12 19:35:50 +03:00
Stefan Weil
e84cb24def Move source files which are used for training only to src/training
They are moved from src/classify and src/lstm to src/training.

This reduces the size of the Tesseract library.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-12 17:08:08 +02:00
Stefan Weil
ba17bc8204 OpenCL: Add static attribute for kernel_src
It is only used in openclwrapper.cpp.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-12 15:13:45 +02:00
Stefan Weil
970622fbd1 Remove unused functions create_edges_window, draw_raw_edge
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-12 15:04:10 +02:00
Stefan Weil
23e605911f Remove unused function truncate_path and related files
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-12 14:48:56 +02:00
Stefan Weil
bce585286d Remove global array kPolyBlockNames from Tesseract library
It is only used in unittest/layout_test.cc after moving a test from
baseapi_test.cc to that file, so it can be made local.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-12 14:33:55 +02:00
Stefan Weil
beec85e023 Remove UNICHARSET::load_from_inmemory_file and related code
The method was only used in unittest where it can be replaced by
UNICHARSET::load_from_file which also simplifies the code.

This allows removing the class InMemoryFilePointer and fixes a TODO.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-12 13:07:15 +02:00
Stefan Weil
315dd9df3f cmake: Don't link pthread on Windows
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-07 15:24:00 +02:00
Stefan Weil
b8079d8ce1 universalambigs: Add hack to fix builds with Microsoft compiler
The MS compiler only accepts string constants up to 65535 characters,
so shorten the string for that compiler to fix the compilation.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-06 15:46:07 +02:00
Zdenko Podobný
c5a50b93ce move fileio.cpp and fileio.h to training (this fixes the Android build) 2019-08-04 21:26:39 +02:00
Stefan Weil
6acab45837 universalambigs: Replace octal characters by UTF-8 string
This improves readability and reduces the file size.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-04 19:21:59 +02:00
Stefan Weil
8127b4dd27 Clean ambigs.h
* Remove unused kUnigramAmbigsBufferSize and kAmbigNgramSeparator
* Move some declarations to ambigs.cpp

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-08-04 19:21:59 +02:00
Stefan Weil
23ef93ac4d cmake: Add missing pthread library
It is needed for C++ threads since commit 85068be405.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-26 07:45:51 +02:00
Stefan Weil
e6ca7f3ec6 hocrrenderer: Add missing escaping of special characters in HTML output
This converts special characters like '<' or '>' to the
correct HTML entities.

Also optimize the code a little bit.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-19 13:53:36 +02:00
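The escaping added in commit e6ca7f3ec6 can be sketched as follows. The function name `html_escape` and the exact set of escaped characters are assumptions; the real hOCR renderer may differ.

```cpp
#include <string>

// Minimal escaping sketch; the name html_escape is hypothetical.
std::string html_escape(const std::string& text) {
  std::string out;
  out.reserve(text.size());
  for (char c : text) {
    switch (c) {
      case '<': out += "&lt;"; break;
      case '>': out += "&gt;"; break;
      case '&': out += "&amp;"; break;
      case '"': out += "&quot;"; break;
      case '\'': out += "&#39;"; break;
      default: out += c; break;
    }
  }
  return out;
}
```

Without such escaping, recognized text containing '<' could be parsed as markup in the generated hOCR file.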
Stefan Weil
2679cae5d8 Simplify code by using ClipToRange
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-19 13:37:39 +02:00
Stefan Weil
4b2927ae41 LSTMRecognizer: Add non const get functions
This allows removing several const casts.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-18 11:26:51 +02:00
Stefan Weil
4cb3f34c09 Improve formatting of hOCR output with character boxes
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-18 11:07:18 +02:00
Stefan Weil
9195a904a7 Use auto data type for results of std::ftell
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-18 10:56:17 +02:00
Stefan Weil
4132194c49 Remove unused filesize_ from class InputBuffer
This also simplifies the constructors.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-18 10:48:27 +02:00
Stefan Weil
a2b13b49ff Simplify shell code (fixes warning from Codacy)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-17 21:33:24 +02:00
Stefan Weil
d4e0ab3014 Use long instead of off_t for result from ftell
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-17 21:14:42 +02:00
Stefan Weil
467f8f4140 Fix training script for macOS (issue #2578)
Bash on macOS does not support "|&":

    tesstrain_utils.sh: line 80: syntax error near unexpected token `&'

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-17 17:18:44 +02:00
Stefan Weil
f92181561c Fix some compiler warnings (unused local variables)
gcc warnings:

    src/classify/protos.cpp:85:7: warning: unused variable ‘i’ [-Wunused-variable]
    src/classify/protos.cpp:86:7: warning: unused variable ‘Bit’ [-Wunused-variable]
    src/classify/protos.cpp:89:14: warning: unused variable ‘Config’ [-Wunused-variable]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-17 07:47:28 +02:00
Stefan Weil
a419f2d78b Modernize BIT_VECTOR a little bit
This removes one more user of Emalloc / Efree.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-16 22:09:08 +02:00
zdenop
c8374cc528
Merge pull request #2576 from noahmetzger/LSTMChoiceRIL
Implemented improved character bounding box algorithm
2019-07-16 12:25:17 +02:00
zdenop
f4925077e8
Merge pull request #2574 from stweil/fix
classify: Use fixed size bit vector
2019-07-16 12:22:48 +02:00
zdenop
cb5c78be7d
Merge pull request #2572 from adaptech-cz/wordBoundsOn2ndPass
Give word's bounds to callback also during second pass
2019-07-16 12:19:31 +02:00
Noah Metzger
3a5e508934 Implemented improved bounding box algorithm
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-07-16 11:38:50 +02:00
Stefan Weil
028fff6edd classify: Use fixed size bit vector
The vector was already limited to MAX_NUM_PROTOS (512) entries or 64 bytes
in the old code. Now it uses that size right from the start which avoids
reallocating it later when entries are added.

The old code which reallocated the vector to expand it was buggy because
the realloc function can return a different pointer, but the code still
used the original pointer to reset the new bits.

Function ExpandBitVector is now unused and therefore removed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-16 10:18:11 +02:00
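A fixed-size bit vector as described in commit 028fff6edd can be sketched with std::array. The class below is hypothetical and not the actual BIT_VECTOR code; only the limit of 512 entries (64 bytes) is taken from the message.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kMaxNumProtos = 512;  // limit quoted in the commit message

// 512 bits stored as 16 words of 32 bits = 64 bytes, allocated once.
class ProtoBits {
 public:
  void set(int bit) {
    assert(bit >= 0 && bit < kMaxNumProtos);
    words_[bit / 32] |= std::uint32_t{1} << (bit % 32);
  }
  bool test(int bit) const {
    assert(bit >= 0 && bit < kMaxNumProtos);
    return (words_[bit / 32] >> (bit % 32)) & 1u;
  }

 private:
  std::array<std::uint32_t, kMaxNumProtos / 32> words_{};  // zero-initialized
};

static_assert(sizeof(ProtoBits) == 64, "512 bits occupy 64 bytes");
```

Because the storage never moves, no pointer can become stale the way it could after a realloc that returned a different address.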
Robert Pösel
f99fcd7691 Give word's bounds to callback also during second pass 2019-07-16 09:11:06 +02:00
Stefan Weil
5bbb7f59a6 Remove structures.*
It only provided the functions new_cell, free_cell which could be replaced by new, delete.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-16 07:03:52 +02:00
Stefan Weil
3621272051 Remove cutil_class.*
It is no longer needed since commit 4523ce9f7d.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-16 07:03:52 +02:00
Stefan Weil
ea462b2c03 Remove unused functions reverse16, reverse32
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-15 21:50:46 +02:00
Stefan Weil
c8cb925813 Replace non portable sleep by std::this_thread::sleep_for
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-15 16:00:07 +02:00
Stefan Weil
fcfdb7e56f Remove unused include statements
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-15 14:48:31 +02:00
Stefan Weil
ba0c55adc5 svutil: Remove SVSync::StartThread and SVSync::ExitThread
Both are unused now.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-15 14:30:51 +02:00
Stefan Weil
85068be405 lstmtester: Replace SVSync::StartThread by std::thread
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-15 14:30:51 +02:00
Stefan Weil
43a281893f scrollview: Replace SVSync::StartThread by std::thread
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-15 14:30:51 +02:00
Stefan Weil
a6d723bf10 Replace SVSync::StartThread by std::thread and use std::this_thread::yield
Using yield instead of a sleep makes running imagedata_test much faster.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-15 14:30:51 +02:00
Stefan Weil
13bb4623b1 Use std::lock_guard to protect a code block
This is simpler than using lock() / unlock() explicitly.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-15 12:01:28 +02:00
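Why std::lock_guard is simpler than explicit lock() / unlock() calls, as stated in commit 13bb4623b1, shows up as soon as a function has more than one exit. A minimal sketch with a hypothetical `Counter` class:

```cpp
#include <mutex>

class Counter {
 public:
  // The guard locks in its constructor and unlocks in its destructor, so every
  // return path (and any exception) releases the mutex.
  bool increment_below(int limit) {
    std::lock_guard<std::mutex> guard(mutex_);
    if (value_ >= limit) return false;  // no explicit unlock() needed here
    ++value_;
    return true;
  }

  int value() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return value_;
  }

 private:
  mutable std::mutex mutex_;
  int value_ = 0;
};
```

With manual locking, the early return would need its own unlock() call, which is easy to forget.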
Stefan Weil
93427391c1 Replace SVAutoLock by std::lock_guard
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-15 12:01:28 +02:00
Stefan Weil
c0b8ee3b82 Replace CCUtilMutex by std::mutex
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-15 12:01:28 +02:00
Stefan Weil
36026e3c35 Replace SVMutex by std::mutex
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-15 12:01:28 +02:00
zdenop
56d4fdce00
Merge pull request #2554 from noahmetzger/LSTMChoiceRIL
Improved lstm_choice_mode
2019-07-15 11:51:52 +02:00
Noah Metzger
2dd5d0d60a Fixed a bug when first decode iteration stays empty and added some comments.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-07-15 10:05:22 +02:00
Stefan Weil
61eab60fe3 arch: Reduce number of include files for dot product functions
dotproductavx.h and dotproductsse.h declared only two functions.
Move those declarations to dotproduct.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-12 23:18:00 +02:00
Stefan Weil
2d5b166876 Add dot product implementation for Intel FMA (double = tessdata_best)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-12 23:18:00 +02:00
Stefan Weil
9259ed8f26 Optimize tprintf implementation
It no longer uses a local buffer, so it needs less memory
and no mutex.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-10 20:59:07 +02:00
Stefan Weil
2aebd10fb7 FPRow: Add missing initialisation for scalar (CID 1402754)
Modernize the code also a little bit.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-10 17:15:55 +02:00
Stefan Weil
bdc7abf518 Fix format strings for size_t arguments (CID 1402762, 1402767)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-10 16:57:19 +02:00
Noah Metzger
11a4cd298b Added parameters for the LSTM CTC Choice mode
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-07-10 16:34:41 +02:00
Noah Metzger
f2d685a90f Added CTC-based Symbolchoices.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-07-10 16:34:41 +02:00
Stefan Weil
ee04347347 Fix format string for 64 bit integer (CID 1402986)
Commit c1264c189e was not the right fix.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-10 16:20:50 +02:00
Stefan Weil
890b810a9e tfnetwork: Add missing return statement (CID 1402992)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-10 08:20:52 +02:00
Egor Pugin
3b6f071ee8 Implement CMake+SW build.
Currently only Windows is supported.
You can try it as follows:

    mkdir build_sw && cd build_sw && cmake .. -DSW_BUILD=1
2019-07-08 18:50:30 +03:00
Egor Pugin
84ffcc0d38
Merge pull request #2548 from zhuangzhuang/fix_tesstrain_py_error
fix tesstrain.py error
2019-07-08 11:25:41 +03:00
zhuangzhuang1988
18c67f4989 fix tesstrain.py error 2019-07-08 14:35:17 +08:00
zhuangzhuang
9eb997fc0b fix windows stdout messy code (#2546)
* fix windows stdout messy code

* fix type name error

* remove unnecessary codepoint check.
2019-07-08 09:33:53 +03:00
Stefan Weil
d653bb61f3 genericvector: Remove redundant declarations
tesseract::FileReader and tesseract::FileWriter are already declared
in serialis.h which is included by genericvector.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-05 18:47:15 +02:00
Dmitry Bely
74145f0686 Fix crash in Tesseract::classify_word_and_language() when tessedit_timing_debug is enabled 2019-07-05 12:36:25 +02:00
zdenop
01535706ec
Merge pull request #2539 from stweil/tesscallback
Replace tesscallback.h and related proprietary data types by C++-11 functionals
2019-07-05 10:52:06 +02:00
Stefan Weil
134eb39960 Remove tesscallback.h
It is no longer used.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-04 16:03:30 +02:00
Stefan Weil
3bae459823 Use C++-11 code instead of TessCallback for WERD_RES::ConditionalBlobMerge
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-04 16:03:30 +02:00
Stefan Weil
e61c828dcd Use C++-11 code instead of TessCallback for UNICHARSET::load_via_fgets
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-04 16:03:30 +02:00
Stefan Weil
0ea8ada308 Use C++-11 code instead of TessCallback for WidthCallback
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-04 16:03:30 +02:00
Stefan Weil
1c1eb76c36 Use C++-11 code instead of TessCallback for Dawg::iterate_words
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-04 16:03:30 +02:00
Stefan Weil
3fb15b3891 Use C++-11 code instead of TessCallback for ObjectCache::Get
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-04 16:03:30 +02:00
Stefan Weil
56d8210909 Use C++-11 code instead of TessCallback for TruthCallback
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-04 16:03:30 +02:00
Stefan Weil
c33b05be55 Use C++-11 code instead of TessCallback for PointerVector::compact
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-04 16:03:30 +02:00
Stefan Weil
cc0405298b Use C++-11 code instead of TessCallback for read, write
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-04 16:03:30 +02:00
Stefan Weil
242e1db7fa Use C++-11 code instead of TessCallback for function set_compare_callback
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-04 16:03:30 +02:00
Stefan Weil
ffd8101986 Use C++-11 code instead of TessCallback for function set_clear_callback
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-04 16:03:30 +02:00
Stefan Weil
ded24d0367 ccmain: Use C++-11 code instead of TessCallback1
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-04 16:03:30 +02:00
Stefan Weil
eeec9c66d4 training: Use C++-11 code for TestCallback
This allows removing more code from tesscallback.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-04 16:03:30 +02:00
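The series of commits above replaces TessCallback with C++11 facilities. The general pattern can be sketched with std::function and a lambda. The container below is hypothetical; only the method name `set_clear_callback` is borrowed from the commit messages, and its signature here is an assumption.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical container; only the callback mechanism is the point here.
class IntVector {
 public:
  // std::function accepts lambdas, function pointers and bound member
  // functions, so no hand-written callback class hierarchy is needed.
  void set_clear_callback(std::function<void(int)> callback) {
    clear_callback_ = std::move(callback);
  }

  void push_back(int value) { data_.push_back(value); }

  // Calls the callback (if any) for each element before discarding it.
  void clear() {
    if (clear_callback_) {
      for (int value : data_) clear_callback_(value);
    }
    data_.clear();
  }

  std::size_t size() const { return data_.size(); }

 private:
  std::vector<int> data_;
  std::function<void(int)> clear_callback_;
};
```

A caller can pass a capturing lambda, for example `v.set_clear_callback([&sum](int x) { sum += x; });`, and the std::function object manages the lambda's lifetime itself.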
Stefan Weil
201ba0dd53 Fix handling of single pages from multipage TIFF files (issue #2537)
That case now uses Leptonica to deliver the desired image instead of
using an inefficient loop in the Tesseract code.

See commit 54fafc4e2e which used similar
code in the past.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-04 15:56:57 +02:00
Stefan Weil
f1c6564cd7 Revert "fix read wrong tiff page."
This reverts commit 75d230a7ac.

That commit introduced new problems (memory leak, potential endless loop)
and style issues.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-04 15:44:07 +02:00
Stefan Weil
fd001c3ab9 Fix linker error with disabled legacy engine (issue #2532)
Commit 3871caae86 introduced a build
regression when the legacy engine was disabled.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-04 13:47:38 +02:00
zhuangzhuang1988
75d230a7ac fix read wrong tiff page. 2019-07-04 12:32:18 +08:00
zhuangzhuang1988
4d4c16bce1 fix start ScrollView.jar failed when lstmtraining 2019-07-03 07:27:50 +02:00
zhuangzhuang1988
99cb088708 close log file handle before move it. 2019-07-01 10:53:12 +08:00
zhuangzhuang1988
a3a361f73d fix logger file encoding error. 2019-06-28 18:29:52 +08:00
Stefan Weil
5895534b5e Update enum from unicode/uchar.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-25 10:55:33 +02:00
Stefan Weil
c1264c189e Fix format string for 64 bit integer
This also fixes a warning from gcc.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-23 09:31:09 +02:00
Stefan Weil
dfd35d3e27 baseapi: Remove old code
The workaround is no longer needed because _splitpath and _MAX_FNAME
were removed in commit cc0d87c5b8.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-23 09:15:32 +02:00
Stefan Weil
dd261e8d42 Replace code using _splitpath_s (win32)
That simplifies the code and removes a dependency on "newer"
versions of Windows.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-23 09:15:15 +02:00
Stefan Weil
f522b039a5 Remove outdated comment
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-22 21:03:19 +02:00
Stefan Weil
ea20bf0373 Remove dummy code from LSTMTrainer::InitTensorFlowNetwork
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-22 21:01:40 +02:00
Stefan Weil
41f91c96c8 cmake: Build training tools also on Linux and macOS
This enables CI tests for the code in src/training on Linux and macOS.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-22 20:27:56 +02:00
Egor Pugin
ab28a03e93
Merge pull request #2514 from stweil/tessresultcallback
Move LSTMTrainer from libtesseract to libtesseract_training
2019-06-22 18:34:49 +03:00
Stefan Weil
df98bb7368 Move LSTMTrainer from libtesseract to libtesseract_training
LSTMTrainer is only used for training, so the shared library for
Tesseract can be made smaller.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-22 16:23:51 +02:00
Stefan Weil
cb2957b3d2 Replace callback by direct function calls in TessBaseAPI::GetComponentImages
The new code avoids dynamic memory allocation, uses faster function calls
and allows removing more code from tesscallback.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-22 14:54:31 +02:00
Stefan Weil
3159f42257 Remove unused GenericVector::dot_product
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-22 12:59:21 +02:00
Stefan Weil
bef73d9956 Remove unused GenericVector::compact
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-22 12:59:08 +02:00
Egor Pugin
3c6a04ea1a
Merge pull request #2512 from stweil/tessresultcallback
Simplify class LSTMTrainer
2019-06-22 13:41:21 +03:00
Stefan Weil
2a9b2fb32a Remove wrong description for GenericVector::set_compare_callback and simplify code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-22 11:22:07 +02:00
Stefan Weil
bd13069fe8 Simplify class LSTMTrainer
The function pointers and callbacks file_reader_, file_writer_,
checkpointer_reader_ and checkpoint_writer_ are always set to
the same values. Replacing them by direct function calls
simplifies the code and allows removing more code from tesscallback.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-22 09:18:13 +02:00
Stefan Weil
3871caae86 Simplify indirect call of LMPainPoints::GeneratePainPoint
It needs neither a temporary TessResultCallback2 nor the function
LMPainPoints::GenerateForBlamer.

This also allows removing more code from tesscallback.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-21 17:09:33 +02:00
zdenop
60b4c68d31 tesstrain_utils.sh: remove redundant code 2019-06-20 18:42:29 +02:00
Stefan Weil
5f23290655 tesscallback: Remove more unused code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-20 08:38:00 +02:00
Stefan Weil
2c78735d97 ocrfeatures: Remove locally used functions from global interface
ReadFeature, WriteFeature are only used locally.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-17 15:09:39 +02:00
zdenop
a3593d994b
Merge pull request #2499 from stweil/embedded
Remove code for embedded build
2019-06-17 10:24:45 +02:00
Stefan Weil
674d6a90d8 Remove code for embedded build
That code is unrelated to Tesseract and can be easily implemented
by external projects which require it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-17 09:55:33 +02:00
zdenop
60aee9f821 create OUTPUT_DIR if it does not exist; fixes #2497 2019-06-16 15:07:16 +02:00
zdenop
fad96db497
Merge pull request #2494 from Shreeshrii/master
Allow saving of box/tiff pairs during legacy tesseract training
2019-06-14 20:44:49 +02:00
Shree
6fa4587949 Allow saving of box/tiff pairs during base tesseract training 2019-06-14 09:35:39 +00:00
Shree
45cdf741ae Allow saving of box/tiff pairs during base tesseract training 2019-06-14 09:32:41 +00:00
Shree
832c6edb97 Allow saving of box/tiff pairs during base tesseract training 2019-06-14 09:25:54 +00:00
James R. Barlow
a9890afd12 Fix text2image compilation on C++17 compilers
C++17 drops support for `std::random_shuffle`, which breaks compilation
of text2image.cpp with C++17 compilers. std::shuffle is valid in C++11
through C++17, so use std::shuffle instead.

Due to the use of `std::random_shuffle`, `text2image --render_ngrams`
would not give consistent results for different compilers or platforms.
With the current change, the same random number generator is used for
all platforms and initialized to the same seed, so training output
should be consistent.
2019-06-13 16:07:20 -07:00
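Note on a9890afd12: the switch from `std::random_shuffle` to `std::shuffle` with an explicitly seeded engine can be sketched as below. The helper name `ShuffleNgrams`, the choice of `std::mt19937`, and the default seed are assumptions for illustration, not taken from text2image.cpp:

```cpp
#include <algorithm>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: shuffles a copy of the input with a fixed-seed
// engine. std::shuffle takes the random source as an argument, whereas
// the two-argument std::random_shuffle used an unspecified one.
std::vector<std::string> ShuffleNgrams(std::vector<std::string> ngrams,
                                       unsigned seed = 0) {
  std::mt19937 engine(seed);  // engine type and seed are assumptions
  std::shuffle(ngrams.begin(), ngrams.end(), engine);
  return ngrams;
}
```

Calling `ShuffleNgrams` twice with the same seed yields the same order within one build. The output sequence of `std::mt19937` is fixed by the standard, but how `std::shuffle` consumes it is left to the library, so identical orders across different standard library implementations may not be guaranteed.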
Stefan Weil
fefd521a49 Add dot product implementation using std::inner_product
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-31 12:07:17 +02:00
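Note on fefd521a49: a dot product built on `std::inner_product` can be sketched as follows. The function name and signature are assumptions; the commit message does not show the actual interface:

```cpp
#include <numeric>

// Hypothetical signature: sums u[i] * v[i] for i in [0, n).
// The 0.0 initial value makes the accumulation happen in double.
double DotProduct(const double* u, const double* v, int n) {
  return std::inner_product(u, u + n, v, 0.0);
}
```

For example, `DotProduct` applied to {1, 2, 3} and {4, 5, 6} computes 1*4 + 2*5 + 3*6.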
Stefan Weil
e0c2f0a782 Fix crash in PreloadRenderers with nullptr outputbase
The crash could be triggered by a wrong command line.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-29 07:46:29 +02:00
Stefan Weil
9a4bd041c8 Fix build for unittests
Commit 29f2cff203 was the wrong fix
for the compiler warnings because it broke the unittest build.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-26 21:36:34 +02:00
Stefan Weil
2c23e7ead5 scanedg: Add const attributes
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-26 20:27:21 +02:00
Stefan Weil
4b3bbd908a Remove EXTERN macro
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-26 20:27:21 +02:00
Stefan Weil
ac999b2409 Remove unused macros
This fixes compiler warnings from clang++ like these ones:

    src/ccutil/params.cpp:34:9: warning: macro is not used [-Wunused-macros]
    src/cutil/oldlist.cpp:67:9: warning: macro is not used [-Wunused-macros]
    src/cutil/oldlist.cpp:68:9: warning: macro is not used [-Wunused-macros]
    src/cutil/oldlist.cpp:78:9: warning: macro is not used [-Wunused-macros]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-26 20:27:21 +02:00
Stefan Weil
8c8eb21bc5 Fix compiler errors for old gcc
Travis CI with gcc 4.8 failed with errors.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-26 15:38:40 +02:00
Stefan Weil
a86143a41d Remove some unused functions, constants and variables
This fixes compiler warnings, for example:

    src/ccutil/strngs.cpp:36:11: warning: unused variable 'kMaxDoubleSize' [-Wunused-const-variable]
    src/viewer/svutil.cpp:320:13: warning: unused function 'TessFreeAddrInfo' [-Wunused-function]
    src/ccstruct/werd.cpp:32:19: warning: unused variable 'CANT_SCALE_EDGESTEPS' [-Wunused-const-variable]
    src/textord/bbgrid.cpp:103:10: warning: unused variable 'old_tright' [-Wunused-variable]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-26 11:52:31 +02:00
Stefan Weil
29f2cff203 training: Add missing static attributes
That fixes several warnings from clang++ like the following one:

    src/training/combine_lang_model.cpp:36:1: warning: no previous extern declaration for non-static variable 'FLAGS_lang_is_rtl' [-Wmissing-variable-declarations]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-26 11:33:52 +02:00
Stefan Weil
a139d553a7 training: Move declarations from cpp files to h file
That fixes several warnings from clang++ like the following one:

    src/training/commontraining.cpp:95:1: warning: no previous extern declaration for non-static variable 'FLAGS_D' [-Wmissing-variable-declarations]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-26 08:53:09 +02:00
Stefan Weil
389285010c featdefs: Add missing include statement
It is needed for PicoFeatureLength. This fixes a compiler warning.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-26 08:53:09 +02:00
Stefan Weil
4bec4a69a0 Add missing static attributes
This fixes lots of compiler warnings like these ones:

    src/api/baseapi.cpp:113:13: warning: no previous extern declaration for non-static variable 'kInputFile' [-Wmissing-variable-declarations]
    src/api/baseapi.cpp:117:13: warning: no previous extern declaration for non-static variable 'kOldVarsFile' [-Wmissing-variable-declarations]
    src/api/baseapi.cpp:97:10: warning: no previous extern declaration for non-static variable 'stream_filelist' [-Wmissing-variable-declarations]
    src/ccmain/equationdetect.cpp:46:10: warning: no previous extern declaration for non-static variable 'equationdetect_save_bi_image' [-Wmissing-variable-declarations]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-26 08:53:09 +02:00
Stefan Weil
7e7811ff92 bits16: Modernize code
This also fixes warnings like the following one from clang++:

    src/ccmain/pgedit.cpp:114:15: warning: declaration requires a global constructor [-Wglobal-constructors]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-26 08:53:08 +02:00
Stefan Weil
334d9b4633 unicodes: Optimize code by using constexpr and removing unused globals
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-25 14:51:28 +02:00
Stefan Weil
23d05a5e1b featdefs: Optimize code by using constexpr
This also fixes some warnings from clang++:

    src/classify/featdefs.cpp:47:15: warning: declaration requires a global constructor [-Wglobal-constructors]
    src/classify/featdefs.cpp:57:15: warning: declaration requires a global constructor [-Wglobal-constructors]
    src/classify/featdefs.cpp:66:15: warning: declaration requires a global constructor [-Wglobal-constructors]
    src/classify/featdefs.cpp:75:15: warning: declaration requires a global constructor [-Wglobal-constructors]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-25 14:46:36 +02:00
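Note on 23d05a5e1b: clang reports -Wglobal-constructors when a namespace-scope object needs code to run at program start. A minimal sketch of how `constexpr` can avoid that, using a hypothetical `FeatureParam` type that is not the actual featdefs.cpp structure:

```cpp
// Hypothetical parameter description; a constexpr constructor makes it a
// literal type, so globals of this type can be constant-initialized.
struct FeatureParam {
  const char* name;
  float min;
  float max;
  constexpr FeatureParam(const char* n, float lo, float hi)
      : name(n), min(lo), max(hi) {}
  constexpr float Range() const { return max - min; }
};

// Initialized at compile time: no constructor call is emitted at startup.
constexpr FeatureParam kXPosition("x", -0.5f, 0.5f);

static_assert(kXPosition.Range() == 1.0f, "evaluated at compile time");
```

The `static_assert` only compiles because the object and its member function are usable in constant expressions, which is the same property that removes the need for a global constructor.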
Stefan Weil
7628112273 Fix broken build for Leptonica < 1.77
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-25 14:23:43 +02:00
Stefan Weil
55901a480f Remove classify/cutoffs.h
It only defined CLASS_CUTOFF_ARRAY and some unused definitions.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-25 13:54:44 +02:00
zdenop
82458db630 Merge branch 'master' of https://github.com/tesseract-ocr/tesseract 2019-05-25 11:14:28 +02:00
zdenop
539673b503 fix '--enable-visibility' build 2019-05-25 11:13:33 +02:00
zdenop
8de022ab1c
Merge pull request #2461 from stweil/tensorflow
Support build with Tensorflow
2019-05-25 10:52:37 +02:00
Stefan Weil
32dcfd06ba Replace Tensorflow by TensorFlow
The name is written in camel case, see https://www.tensorflow.org/.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-24 17:14:28 +02:00
Stefan Weil
2441e4d8ac Implement check for Tensorflow header file
This looks for one of the header files which are included by Tesseract.
It currently uses a hard coded path which works for Debian / Ubuntu.

Also simplify the rules for linking Tensorflow.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-24 16:52:14 +02:00
Stefan Weil
9cdf041448 Remove "third_party/" in comments and update path names
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-24 14:12:52 +02:00
Stefan Weil
4382ab1a34 Support build with Tensorflow
It expects include files in /usr/include/tensorflow.

* Add configure option --with-tensorflow (disabled by default)
* Fix data type tensorflow::int64
* Remove "third_party/" in include statements
* Add dummy implementations for Backward and DebugWeights in TFNetwork
* Add files generated with protoc from tfnetwork.proto
  (so the Tensorflow sources are not needed for the build)
* Update Makefiles

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-24 14:11:31 +02:00
Zdenko Podobný
294f548ac1 fix missing tiff format 2019-05-24 10:39:17 +02:00
Stefan Weil
3f74da5da9 lstmtrainer: Set constant kLearningRateDecay at compile time
sqrt(0.5) = 1 / sqrt(2) can be replaced by the macro M_SQRT1_2.

This also fixes a compiler warning:

    src/lstm/lstmtrainer.cpp:51:14: warning: declaration requires a global constructor [-Wglobal-constructors]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-23 15:01:23 +02:00
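Note on 3f74da5da9: the replacement of a runtime `sqrt(0.5)` by the macro `M_SQRT1_2` can be sketched as below. The constant's type (`double`) and the use of `constexpr` are assumptions; the commit message only names the constant and the macro:

```cpp
#define _USE_MATH_DEFINES  // needed by MSVC; must precede the first include
#include <cmath>

// M_SQRT1_2 is a literal for 1 / sqrt(2), so the constant is fixed at
// compile time and no initialization code runs at program start.
constexpr double kLearningRateDecay = M_SQRT1_2;
```

Because the initializer is a literal rather than a function call, the compiler no longer has to emit a global constructor for it.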
zdenop
4bab7dd83d
Merge pull request #2451 from Bharat123rox/lgtm
Some LGTM alert fixes and potential bugfixes
2019-05-22 12:19:44 +02:00
Egor Pugin
fea1f3970b
Merge pull request #2452 from stweil/tprintf
tprintf: Make code reentrant and use less memory
2019-05-22 12:31:34 +03:00
Egor Pugin
8f99880a7a
Merge pull request #2453 from stweil/crashcode
Remove SavePixForCrash and related code
2019-05-22 12:30:29 +03:00
Bharat123rox
bc3ea622a6 Fix bug in max_max_dist 2019-05-22 08:21:30 +02:00
Bharat123rox
0bf45e81e7 Fix LGTM and revert bugfix for later PR 2019-05-22 11:23:27 +05:30
Bharat123rox
945ccac85a Fix syntax error 2019-05-22 10:10:12 +05:30
Stefan Weil
6514479e8f Remove SavePixForCrash and related code
That debugging code uses a lot of memory and is no longer useful.

    text	   data	    bss	    dec	    hex	filename
     815	      0	 262144	 262959	  4032f	src/ccutil/globaloc.o

Also remove the function err_exit which was only used in ccmain/reject.cpp.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-21 20:25:58 +02:00
Stefan Weil
078a129674 tprintf: Make code reentrant and use less memory
Reduce the maximum message size from 64 KiB to 2 KiB which should still
be large enough for trace messages.

Create the smaller message on the stack instead of using a global
array to allow reentrancy and to reduce the memory use of Tesseract.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-21 20:22:58 +02:00
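Note on 078a129674: formatting into a buffer on the stack instead of a global array can be sketched as follows. `FormatTraceMessage` and `kMaxMessageSize` are hypothetical names; only the 2 KiB limit comes from the commit message, and returning a `std::string` is a simplification for illustration:

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

constexpr int kMaxMessageSize = 2048;  // 2 KiB, as stated in the commit

// Hypothetical helper: each call gets its own buffer on the stack, so
// concurrent or nested calls cannot overwrite each other's message.
std::string FormatTraceMessage(const char* format, ...) {
  char message[kMaxMessageSize];
  va_list args;
  va_start(args, format);
  // vsnprintf never writes more than sizeof(message) bytes and always
  // null-terminates; longer messages are truncated.
  int length = std::vsnprintf(message, sizeof(message), format, args);
  va_end(args);
  if (length < 0) {
    return std::string();  // formatting error
  }
  return std::string(message);
}
```

A message longer than the buffer is cut to 2047 characters here rather than overflowing, which is the trade-off accepted by shrinking the limit.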
Bharat123rox
7f31a0634d Some LGTM fixes and potential bugfixes 2019-05-21 23:24:50 +05:30
Stefan Weil
d2ca81e794 Remove local definition of M_PI
It is defined for all platforms when math.h or cmath is included
after defining the macro _USE_MATH_DEFINES.

Define _USE_MATH_DEFINES before any include statement to make sure
that M_PI gets defined. It is not necessary to define it conditionally
only for Windows.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-20 21:18:52 +02:00
Stefan Weil
64bdceee69 Fix compiler warnings
This fixes lots of warnings related to ERRCODE like the following one:

    src/ccutil/errcode.h:81:15: warning:
      declaration requires a global constructor [-Wglobal-constructors]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-19 22:10:22 +02:00
Stefan Weil
09edd1a604 Fix out-of-bounds writes in Classify::ReadNewCutoffs
The function did not correctly read Chinese unichars into the local
Class variable if the locale was set to de_DE.UTF-8 (or other
incompatible locales). That resulted in a wrong ClassId which was
used to write into the Cutoffs array without checking for valid bounds.

On macOS the result was a runtime error in baseapi_test (see GitHub
issue #1250):

    [ RUN      ] TesseractTest.InitConfigOnlyTest
    baseapi_test(21845,0x1134c45c0) malloc: *** error for object 0x927f96c28005e0: pointer being freed was not allocated
    baseapi_test(21845,0x1134c45c0) malloc: *** set a breakpoint in malloc_error_break to debug

Replacing sscanf by std::istringstream fixes that.
Also add an assertion to catch future out-of-bounds writes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-18 13:39:55 +02:00
zdenop
7e9d2f4bc4
Merge pull request #2432 from nickjwhite/hocrmoretypes
Add different classes to hocr output depending on BlockType
2019-05-16 17:02:48 +02:00
Stefan Weil
331cc84d8d Remove assertions for unsupported locale settings
The latest code passed all unittests with locale de_DE.UTF-8
and has fixed the locale issues which were reported on GitHub.
Therefore the assertions can be removed.

Any remaining locale issue will be fixed when it is identified.
To help find such remaining issues, debug code now uses the
user's locale settings instead of the default "C" locale for all
executables which use TessBaseAPI.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 13:59:39 +02:00
Stefan Weil
77f9bad3c2 Fix UNICHARSET::save_to_string for locale de_DE.UTF-8
That function writes float values which must always use '.' as the
decimal separator, no matter what the current locale setting is.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 11:39:59 +02:00
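Note on 77f9bad3c2: writing a floating point value with '.' as the decimal separator regardless of the user's locale can be sketched as below. The helper name is hypothetical, and explicitly imbuing the classic locale is an assumption about how to make the guarantee robust, not a description of the actual change:

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: formats with the classic "C" locale, so the
// decimal separator is '.' even if the global locale is de_DE.UTF-8.
std::string FormatDoubleClassic(double value) {
  std::ostringstream stream;
  stream.imbue(std::locale::classic());
  stream << value;
  return stream.str();
}
```

Functions from the printf family follow the C locale set by `setlocale`, which is why they can emit "0,5" under a German locale, while a stream imbued this way does not.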
Stefan Weil
36ed6da349 Fix baseapi_test with locale de_DE.UTF-8
The unittest failed with LANG=de_DE.UTF-8:

    $ unittest/baseapi_test
    Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc
    [==========] Running 12 tests from 2 test suites.
    [----------] Global test environment set-up.
    [----------] 10 tests from TesseractTest
    [ RUN      ] TesseractTest.ArraySizeTest
    [       OK ] TesseractTest.ArraySizeTest (0 ms)
    [ RUN      ] TesseractTest.BasicTesseractTest
    [       OK ] TesseractTest.BasicTesseractTest (1251 ms)
    [ RUN      ] TesseractTest.IteratesParagraphsEvenIfNotDetected
    [       OK ] TesseractTest.IteratesParagraphsEvenIfNotDetected (347 ms)
    [ RUN      ] TesseractTest.HOCRWorksWithoutSetInputName
    [       OK ] TesseractTest.HOCRWorksWithoutSetInputName (403 ms)
    [ RUN      ] TesseractTest.HOCRContainsBaseline
    [       OK ] TesseractTest.HOCRContainsBaseline (389 ms)
    [ RUN      ] TesseractTest.RickSnyderNotFuckSnyder
    [       OK ] TesseractTest.RickSnyderNotFuckSnyder (346 ms)
    [ RUN      ] TesseractTest.AdaptToWordStrTest
    Trying to adapt "136
    " to "1 3 6"
    Trying to adapt "256
    " to "2 5 6"
    Trying to adapt "410
    " to "4 1 0"
    Trying to adapt "432
    " to "4 3 2"
    Trying to adapt "540
    " to "5 4 0"
    Trying to adapt "692
    " to "6 9 2"
    Trying to adapt "779
    " to "7 7 9"
    Trying to adapt "793
    " to "7 9 3"
    Trying to adapt "808
    " to "8 0 8"
    Trying to adapt "815
    " to "8 1 5"
    Trying to adapt "12
    " to "1 2"
    Trying to adapt "12
    " to "1 2"
    [       OK ] TesseractTest.AdaptToWordStrTest (788 ms)
    [ RUN      ] TesseractTest.BasicLSTMTest
    [       OK ] TesseractTest.BasicLSTMTest (4525 ms)
    [ RUN      ] TesseractTest.LSTMGeometryTest
    [       OK ] TesseractTest.LSTMGeometryTest (615 ms)
    [ RUN      ] TesseractTest.InitConfigOnlyTest
    Error: unichar ? in normproto file is not in unichar set.
    Error: unichar 0.232621 in normproto file is not in unichar set.
    Error: unichar 0.000400 in normproto file is not in unichar set.
    Error: unichar 0.231864 in normproto file is not in unichar set.
    [...]
    Error: unichar ? in normproto file is not in unichar set.
    Error: unichar 0.233915 in normproto file is not in unichar set.
    Error: unichar 0.000400 in normproto file is not in unichar set.
    Error: unichar 0.221755 in normproto file is not in unichar set.
    Error: unichar 0.000400 in normproto file is not in unichar set.
    Error: unichar ? in normproto file is not in unichar set.
    baseapi_test(21845,0x1134c45c0) malloc: *** error for object 0x927f96c28005e0: pointer being freed was not allocated
    baseapi_test(21845,0x1134c45c0) malloc: *** set a breakpoint in malloc_error_break to debug
    [INFO]  Lang eng took 327ms in regular init
    [INFO]  Lang chi_tra took 1422ms in regular init
    Abort trap: 6

TesseractTest.InitConfigOnlyTest is fixed by using std::istringstream
instead of sscanf.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-16 11:05:09 +02:00
Stefan Weil
0dcc889e8d Fix apiexample_test with locale de_DE.UTF-8
The unittest failed with LANG=de_DE.UTF-8:

    $ unittest/apiexample_test
    Running main() from ../../../../unittest/../googletest/googletest/src/gtest_main.cc
    [==========] Running 4 tests from 2 test suites.
    [----------] Global test environment set-up.
    [----------] 1 test from EuroText
    [ RUN      ] EuroText.FastLatinOCR
    contains_unichar_id(unichar_id):Error:Assert failed:in file ../../../../../src/ccutil/unicharset.h, line 874

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-15 22:43:47 +02:00
Stefan Weil
6b1e709b19 Fix Doxygen comments for void functions
Void functions should not use @return. It causes compiler warnings
like this one:

    src/classify/intproto.cpp:326:5: warning:
      '@return' command used in a comment that is attached to a function
      returning void [-Wdocumentation]

Some non-void functions also were documented with @return none.
Fix those comments, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-14 21:57:17 +02:00
Stefan Weil
caa04882fd normmatch: Remove unused private function
PrintNormMatch was unused. Remove it and remove also an unused prototype.
Make the only remaining private function NormEvidenceOf static.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-14 20:56:04 +02:00
Nick White
068eb4c35d Add different classes to hocr output depending on BlockType
These classes are taken from the hOCR specification, and seem
to map well onto the BlockType types. There are probably more that
could be added.
2019-05-14 13:25:08 +01:00
Stefan Weil
5d92fbf010 Replace sscanf by std::istringstream
Using std::istringstream allows conversion of string to float
independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-12 15:04:30 +02:00
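Note on 5d92fbf010: parsing a float independently of the current locale can be sketched as follows. `ParseFloatClassic` is a hypothetical helper, and the explicit `imbue` call is an assumption; the commit message only states that std::istringstream is used:

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: reads one float using the classic "C" locale, so
// "0.232621" parses the same way under LANG=de_DE.UTF-8 as under LANG=C.
// Returns false if no number could be read.
bool ParseFloatClassic(const std::string& text, float* value) {
  std::istringstream stream(text);
  stream.imbue(std::locale::classic());
  stream >> *value;
  return !stream.fail();
}
```

By contrast, `sscanf` with "%f" honors the C locale's decimal separator, which is a plausible reason values such as 0.232621 from the normproto file were misread in the failing tests quoted above.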
Stefan Weil
c76ceafcdf Fix reading of parameter from traineddata normproto component
The NonEssential parameter was wrongly derived from linear_token instead
of essential_token and therefore always set to true.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-12 14:43:58 +02:00
Stefan Weil
c07bc4e014 Fix Doxygen comment
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-12 08:55:23 +02:00
Stefan Weil
c8e96e2c02 Fix cast from pointer to integer type
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-12 08:54:46 +02:00
zdenop
7a5b9b8fcd ScrollView: remove custom implementation of GetAddrInfo 2019-05-04 15:16:41 +02:00
zdenop
5e01f74648 remove unused include 2019-05-04 15:14:54 +02:00
Stefan Weil
aba037329a tesscallback: Remove more unused code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-04 11:05:50 +02:00
Stefan Weil
57ff92e4bf tesscallback: Remove unused code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-02 22:14:04 +02:00
zdenop
9192c3afe2 correct tessdata comment in baseapi.h 2019-05-02 08:43:04 +02:00
zdenop
7e48368a5e
Merge pull request #2421 from stweil/includes
universalambigs: Add missing include file
2019-05-02 08:36:49 +02:00
zdenop
39d3824c78
Merge pull request #2420 from stweil/locale
Fix more locale dependencies
2019-05-02 08:31:41 +02:00
Stefan Weil
cd749be473 universalambigs: Add missing include file
This allows fixing two compiler warnings from clang++:

    src/ccutil/universalambigs.cpp:23:19: warning: no previous extern declaration for non-static variable 'kUniversalAmbigsFile' [-Wmissing-variable-declarations]
    src/ccutil/universalambigs.cpp:19019:18: warning: no previous extern declaration for non-static variable 'ksizeofUniversalAmbigsFile' [-Wmissing-variable-declarations]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-02 07:36:31 +02:00
Stefan Weil
4fbc0a257b commandlineflags: Replace strtod by std::stringstream
Using std::stringstream allows conversion of string to double
independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-02 07:33:46 +02:00
Stefan Weil
d047fa1d1b paramsd: Replace strtod by std::stringstream
Using std::stringstream allows conversion of string to double
independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-02 07:33:46 +02:00
Stefan Weil
e3860e45b7 clusttool: Replace strtof by std::stringstream
Using std::stringstream allows conversion of string to float
independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-02 07:33:45 +02:00
Stefan Weil
ed45656ec8 clusttool: Remove unused code and some global functions
* WriteProtoList is unused. Remove it.

* ReadNFloats, WriteNFloats and WriteProtoStyle are only used locally,
  so make them local.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-02 07:33:45 +02:00
Stefan Weil
28a521fec2 Fix some typos (most found and fixed by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-05-01 20:30:41 +02:00
zdenop
41f50b19bb fix crash in case of missing PNG support in Leptonica see #2333 2019-05-01 19:51:54 +02:00
zdenop
90aef80dd7 fix documentation about datapath: ending "/" is not relevant 2019-05-01 11:37:50 +02:00
Jeff Breidenbach
546a9e81eb fix #1900: intraword spacing for slightly better pdf copy-paste performance 2019-04-29 11:28:30 +02:00
zdenop
137e6de56f Print info when uzn file is used. 2019-04-28 19:06:38 +02:00
Zdenko Podobný
80e54e401d fix spelling 2019-04-24 15:35:22 +02:00
Zdenko Podobný
832c257771 remove unused variable 2019-04-24 14:55:35 +02:00
Stefan Weil
b7bc71e987 Fix build for Windows
* winsock2.h is case sensitive, lower case is required for cross build.
* ws2tcpip.h is required for addrinfo.
* FreeAddrInfo conflicts with existing freeaddrinfo, so rename it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-24 11:24:47 +02:00
zdenop
129fe95390 svutil.cpp: fix windows build 2019-04-23 23:03:28 +02:00
zdenop
7bacc8852b Merge branch 'master' of https://github.com/tesseract-ocr/tesseract 2019-04-23 22:01:30 +02:00
zdenop
5c6ac61fe2 remove unused includes 2019-04-23 20:59:36 +02:00
zdenop
27f0f2ecea MSVS support inttypes.h from VS 2015 2019-04-23 20:45:14 +02:00
Stefan Weil
708511adcb Only include windows.h using host.h
host.h sets the macros NOMINMAX and WIN32_LEAN_AND_MEAN which must be
set before including windows.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-22 21:51:07 +02:00
Stefan Weil
53f1265362 Clean macros in platform.h
* Remove unused macros ultoa, SIGNED.
* Move macros NOMINMAX and WIN32_LEAN_AND_MEAN to host.h
  because they are used when including windows.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-22 21:51:07 +02:00
Stefan Weil
3bd61bfae4 svutil: Clean include file
* Remove MIN, MAX macros. They are unused.
* Include windows.h indirectly by including host.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-22 21:51:07 +02:00
Stefan Weil
e12b99d49b Remove host.h from Tesseract API
It is not needed by other API header files.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-22 21:51:07 +02:00
Stefan Weil
8a34da027f Fix typo in description
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-22 21:50:37 +02:00
Shree
f8fba6362b fix the coordinates for EOL tab 2019-04-22 09:54:20 +00:00
zdenop
3ec7c22a87 fix missing EOL 2019-04-22 08:49:55 +02:00
Stefan Weil
09255ebe44 Don't include windows.h from platform.h
This partially reverts commit c150b9832d.
Now params.cpp includes host.h which also gets the definition for MAX_PATH.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-21 22:20:13 +02:00
zdenop
6781d78211
Merge pull request #2399 from stweil/pgedit
pgedit: Remove unused global functions
2019-04-20 19:26:02 +02:00
Stefan Weil
4ac1fad18a pdfrenderer: Replace snprintf by std::stringstream
Using std::stringstream allows conversion of float to string
independent of the current locale setting.

Some snprintf statements are not needed at all because a constant string
can be appended directly.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-20 19:05:29 +02:00
Stefan Weil
07d5365a1f baseapi: Use std::stringstream to format float values
Using std::stringstream allows conversion of float to string
independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-20 19:05:29 +02:00
Stefan Weil
743fc2562d Remove unneeded include statements for pgedit.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-20 19:00:07 +02:00
Stefan Weil
26dd0b82bf pgedit: Remove unused global functions
pgeditor_show_point is unused, so remove it completely.
Some more functions are only used locally, so make them static functions.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-20 19:00:07 +02:00
Stefan Weil
217c2530e6 Remove strtofloat
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-19 11:19:04 +02:00
Stefan Weil
7c3f9000cd Replace sscanf by std::stringstream
Using std::stringstream allows working with the C locale, independent
of the current locale settings.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-19 11:19:04 +02:00
Stefan Weil
5529a5db11 unittest: Fix and enable params_model_test
This needs the latest test submodule.

The test uses LoadFromFile which is not used otherwise, so remove that
function from class ParamsModel.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-18 17:06:48 +02:00
Stefan Weil
a1ffcd3654 Use std::stringstream for add_str_double
Using std::stringstream allows conversion of double to string
independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-14 16:16:16 +02:00
Stefan Weil
aa64a63f69 Use std::stringstream to generate PDF output
Using std::stringstream simplifies the code and allows conversion of
double to string independent of the current locale setting.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-14 16:15:39 +02:00
Stefan Weil
78a957b989 Remove spaces at line endings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-13 18:54:42 +02:00
Stefan Weil
12ca2513d4 Revert "e" flag for fopen
clang-tidy added it in commit ac0b191f6b.

The "e" flag is an extension for glibc which sets the O_CLOEXEC flag,
so the file handle is not leaked to child processes. It is not needed
here.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-13 18:53:57 +02:00
Samuel Lee
e32b3360aa
Fix for MSVC
LoadDataFromFile/SaveDataToFile use fopen with the file mode 'e', which is unsupported in MSVC.
2019-04-11 02:33:51 +09:00
Stefan Weil
f88a7f28e3 fontinfo: Fix wrong delete
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-07 12:16:04 +02:00
Stefan Weil
3dfe1b8807 classify: Modernize function UniformDensity
This should fix an issue reported by Codacy.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-07 12:13:45 +02:00
Stefan Weil
72c874140e Modernize code by replacing C type casts
This was done using clang-tidy.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-07 09:04:51 +02:00
zdenop
95a15a7a82 fix cmake&clang build 2019-04-06 15:31:53 +02:00
zdenop
ab09b09da6
Merge pull request #2294 from bertsky/lstm-with-char-whitelist
trying to add tessedit_char_whitelist etc. again:
2019-04-06 14:41:30 +02:00
Robert Schubert
25a42ea42f fixed failure report for tesstrain commands:
- with `set -e` in effect, looking at stdout
  to detect failure is too late
2019-04-06 08:13:03 +02:00
Robert Schubert
d5584e793e fixed failure report for tesstrain commands:
- with `set -e` in effect, it does not make sense
  to query `$?` indirectly
2019-04-06 08:13:03 +02:00
zdenop
be617b3722
Merge pull request #2361 from Shreeshrii/truth
Change message display for debug_level -1 during lstmtraining
2019-04-05 10:52:21 +02:00
zdenop
2982cb4ff3
Merge pull request #2368 from amitdo/no-legacy-fix
disable-legacy build: Do not include unused headers
2019-04-05 09:35:04 +02:00
Stefan Weil
d35a6f2de5 Modernize code (clang-tidy check modernize-deprecated-headers)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-05 08:29:00 +02:00
Stefan Weil
20d5eedd45 Modernize code (clang-tidy check modernize-loop-convert)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-05 08:29:00 +02:00
amitdo
fab9a54981 Remove unneeded 'SUBDIRS=' from 3 Makefile.am files 2019-04-04 19:31:39 +02:00
Shree
6673347986 Change page to line in message 2019-04-04 15:43:29 +00:00
Shree
51c3535310 Always display GROUND TRUTH. BEST OCR and ALIGNED TRUTH only if different for debug_level -1 2019-04-04 15:33:22 +00:00
Shree
84d4cc2e95 Display OCR TEXT and GROUND TRUTH only when different for debug_level = -1 2019-04-04 15:33:22 +00:00
Amit D
2069c057d6
Merge branch 'master' into no-legacy-fix 2019-04-04 18:26:22 +03:00
Egor Pugin
2a1d238bd5
Merge pull request #2366 from stweil/modernize
Modernize code with "using"
2019-04-04 15:13:10 +03:00
amitdo
546014aecd disable-legacy build: Do not include unused headers 2019-04-04 15:09:08 +03:00
Stefan Weil
98346c2cd4 Modernize and format code
The code was modernized using clang-tidy with "modernize-use-using".

The modified files were then formatted using clang-tidy with
"google-readability-braces-around-statements", then clang-format
was applied.
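
As an illustration of what the "modernize-use-using" check does (a sketch with hypothetical names, not code from this commit), typedef declarations become alias declarations:

```cpp
#include <vector>

// Before: typedef std::vector<int> IntList;
using IntList = std::vector<int>;

// Before: typedef int (*BinaryOp)(int, int);
using BinaryOp = int (*)(int, int);

int Add(int a, int b) { return a + b; }

// Folds all values with the given operation, starting from zero.
int Apply(BinaryOp op, const IntList& values) {
  int result = 0;
  for (int value : values) {
    result = op(result, value);
  }
  return result;
}
```

The alias form reads left to right, which helps most with function pointer types.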

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-03 21:02:23 +02:00
Shreeshrii
613c2bf6e4
Change pages to lines in message
The pages variables refer to the lines in document. This change makes the messages clearer without changing the variable names.
2019-04-03 10:41:14 +05:30
Egor Pugin
af7cc1ce4c Fix windows build. 2019-04-01 22:38:01 +03:00
Stefan Weil
81fbd878dd Add more missing include statements for Windows build
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-04-01 08:10:25 +02:00
Stefan Weil
ab009fae94 Remove macro WINDLLNAME
It is now no longer used.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 20:05:41 +02:00
Stefan Weil
77a5f2623e Remove unused config variable tessedit_module_name
It was only defined for Windows builds.

Also use false instead of 0 to set the default value of
two boolean config variables.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 20:04:00 +02:00
Stefan Weil
c150b9832d Add missing include statements for Windows build
The last commits which removed BOOL8 had broken the Windows build.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 19:02:29 +02:00
Stefan Weil
802f42e821 Remove BOOL8, TRUE, FALSE from host.h
Remove unneeded include statements for host.h, add required ones and
update the comments for the remaining include statements.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 18:27:20 +02:00
Stefan Weil
be96b7b660 bits16: Format code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 18:26:50 +02:00
Stefan Weil
146079f31d api: Replace BOOL8, TRUE, FALSE by bool, true, false and modernize code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 18:15:53 +02:00
Stefan Weil
4e0c726d6c ccutil: replace TRUE, FALSE by true, false
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:56:47 +02:00
Stefan Weil
da0c14ae45 cutil: Replace TRUE, FALSE by true, false
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:56:19 +02:00
Stefan Weil
87a973652c classify: Replace BOOL8, TRUE, FALSE by bool, true, false
Simplify also some related code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:55:48 +02:00
Stefan Weil
30ee3afc29 textord: Replace TRUE, FALSE by true, false and use bool instead of BOOL8
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:55:20 +02:00
Stefan Weil
b391ab84d0 wordrec: Replace TRUE, FALSE by true, false
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:54:21 +02:00
Stefan Weil
cbb5e729a1 classify: Use bool and replace TRUE, FALSE
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:53:50 +02:00
Stefan Weil
46fa59aadc ccstruct: Replace BOOL8, TRUE, FALSE by bool, true, false and modernize code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:53:06 +02:00
Stefan Weil
92b9f9f8de ccmain: Replace TRUE, FALSE by true, false
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:52:09 +02:00
Stefan Weil
7db25e15c0 Remove unused config variable tessedit_single_match
Replace also TRUE, FALSE by true, false.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:38:35 +02:00
Stefan Weil
ca2947a2c0 blobclass: Remove unused macros
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:36:46 +02:00
Stefan Weil
f2bd98e656 PageIterator: Remove useless const
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:35:43 +02:00
Stefan Weil
813b7803e0 pgedit: Replace BOOL8 by bool
Replace also TRUE, FALSE by true, false and add some static attributes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:29:15 +02:00
Stefan Weil
664811a869 Replace BOOL8, TRUE, FALSE by bool, true, false
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:28:28 +02:00
Stefan Weil
51a2c2eae8 Format code with clang-format
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:24:02 +02:00
Stefan Weil
95ea778745 capi: Replace FALSE, TRUE and simplify and format code
Format code using clang-format and clang-tidy.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:19:04 +02:00
Stefan Weil
89ba48b106 strngs: Modernize and format code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:13:38 +02:00
Stefan Weil
127d0e31f0 serialis: Modernize and format code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:12:11 +02:00
Stefan Weil
8b663e7620 helpers: Modernize and format code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 17:06:19 +02:00
zdenop
3bb8f9cd49 Merge branch 'master' of https://github.com/tesseract-ocr/tesseract 2019-03-31 16:54:15 +02:00
zdenop
5f06402755 python: optimize imports, reformat code 2019-03-31 16:53:39 +02:00
zdenop
2e9fd69c9e use 'import pathlib'; fix "TypeError: argument of type 'WindowsPath' is not iterable" 2019-03-31 16:53:33 +02:00
zdenop
a0527b41bd fix LGTM reports for python 2019-03-31 16:53:25 +02:00
Stefan Weil
1948f0d520 ocrclass: Modernize and format code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 16:39:44 +02:00
Stefan Weil
85957e9673 WERD: Don't print space character after "FALSE" at end of line
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 16:32:42 +02:00
Stefan Weil
83d4433d3b Modernize and format unichar.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 16:30:15 +02:00
Stefan Weil
ac0b191f6b Modernize and format genericvector.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 16:21:32 +02:00
Stefan Weil
36ed08636b Modernize and format tesscallback.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-31 16:16:00 +02:00
zdenop
f47c7c92dd fix uninitialized variables in wordstrboxrenderer and lstmboxrenderer;
CID 1399132, 1399134, 1399135, 1399137, 1399140, 1399141, 1399142
2019-03-31 12:26:49 +02:00
Shreeshrii
ea36e94e58 fix Could not parse bool from flag (#2359) 2019-03-29 14:50:21 +01:00
Stefan Weil
852598eecf Remove file tessedit.h
It only declared the unused global variable global_monitor
which is now removed, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-27 19:03:42 +01:00
Stefan Weil
6e59abcce2 Remove file cutil.h
It only contained three type definitions which fit better in other
include files.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-27 19:03:42 +01:00
Stefan Weil
b6bfb20f1d Improve readability of conditional code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 12:35:56 +01:00
Stefan Weil
36a1a30c22 Remove some old type casts
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 12:35:56 +01:00
Stefan Weil
a44bf41f14 Modernize C++ loops
The modifications were done using this command:

    run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-loop-convert' -fix

Then the resulting code was cleaned manually.
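
For readers unfamiliar with the check, this sketch (hypothetical functions, not taken from the changed files) shows the kind of rewrite that modernize-loop-convert performs:

```cpp
#include <vector>

// Index-based loop, the form that the check rewrites.
int SumIndexed(const std::vector<int>& values) {
  int sum = 0;
  for (std::vector<int>::size_type i = 0; i < values.size(); ++i) {
    sum += values[i];
  }
  return sum;
}

// Range-based equivalent produced by the rewrite.
int SumRangeBased(const std::vector<int>& values) {
  int sum = 0;
  for (int value : values) {
    sum += value;
  }
  return sum;
}
```

Both functions return the same result; the second form has no index variable to get wrong.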

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 08:38:21 +01:00
Stefan Weil
ed011670c8 Modernize C++ code using bool literals
The modifications were done using this command:

    run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-bool-literals' -fix

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 07:58:02 +01:00
Stefan Weil
a0fd90583b Modernize C++ code using auto
The modifications were done using this command:

    run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-auto' -fix

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 07:55:08 +01:00
Stefan Weil
36f768853a Modernize C++ code using override
The modifications were done using this command:

    run-clang-tidy-8.py -header-filter='.*' -checks='-*,modernize-use-override' -fix

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-26 07:37:52 +01:00
Stefan Weil
f877640bc9
Merge pull request #2319 from bertsky/tesstrain-parallel-wait-retval
tesstrain: check failure of subjobs
2019-03-25 16:10:09 +01:00
Stefan Weil
d8d2f6f48a Fix broken shell scripts for training
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-25 15:32:43 +01:00
Stefan Weil
631882a346 Fix compiler warnings (signed / unsigned mismatch)
clang warnings:

    src/ccutil/unicharcompress.cpp:172:27: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    src/lstm/recodebeam.cpp:129:29: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare]
    src/lstm/recodebeam.cpp:276:48: warning: comparison of integers of different signs: 'std::__cxx1998::vector::size_type' (aka 'unsigned long') and 'int' [-Wsign-compare]
    unittest/imagedata_test.cc:101:21: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    unittest/linlsq_test.cc:33:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    unittest/linlsq_test.cc:44:23: warning: comparison of integers of different signs: 'int' and 'std::__cxx1998::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
    unittest/nthitem_test.cc:27:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]
    unittest/nthitem_test.cc:68:21: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]
    unittest/stats_test.cc:26:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]
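
How each warning was resolved is not shown here. One common fix, sketched with a hypothetical function, is to give the loop counter the unsigned type that `size()` returns:

```cpp
#include <cstddef>
#include <vector>

// Counts the strictly positive entries of a vector.
int CountPositive(const std::vector<int>& values) {
  int count = 0;
  // std::size_t matches the unsigned result of size(), so the comparison
  // no longer mixes signed and unsigned operands.
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i] > 0) {
      ++count;
    }
  }
  return count;
}
```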

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-25 08:36:07 +01:00
Stefan Weil
ecaad2aca8 ccstruct/werd: Format code with clang-format
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-25 07:57:34 +01:00
Stefan Weil
b1e305f38c Simplify code which tests for non-empty StringParam
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 21:35:52 +01:00
Stefan Weil
f9860cda41 Optimize functions ResetFrom
The loop can terminate as soon as the parameter name was found.
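
A minimal sketch of this early exit, assuming parameter names are unique; `ParamSketch` and `ResetFromSketch` are hypothetical and do not mirror Tesseract's parameter classes:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a named parameter.
struct ParamSketch {
  std::string name;
  int value;
};

// Copies the value of the entry with the same name into target.
// Returns true if a matching entry was found.
bool ResetFromSketch(const std::vector<ParamSketch>& source,
                     ParamSketch* target) {
  for (const ParamSketch& candidate : source) {
    if (candidate.name == target->name) {
      target->value = candidate.value;
      return true;  // Stop here: no later entry can match a unique name.
    }
  }
  return false;
}
```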

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 21:21:23 +01:00
Stefan Weil
41da5afe9d UNICHARSET: Fix compiler warning (signed/unsigned mismatch)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 21:18:21 +01:00
Stefan Weil
91e2b253c0 Format modified code with clang-format
Format the files which were changed in
commit 297d7d86ce.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 21:10:29 +01:00
Stefan Weil
06acbaf99c IntegerMatcher: Fix division by zero
Credit to OSS-Fuzz which reported this issue:

    intmatcher.cpp:1231:62: runtime error: division by zero
	    #0 0x6119d5 in IntegerMatcher::ApplyCNCorrection(float, int, int, int) tesseract/src/classify/intmatcher.cpp:1231:62
	    #1 0x5fe9c4 in tesseract::Classify::ComputeCorrectedRating(bool, int, double, double, int, int, int, int, int, unsigned char const*) tesseract/src/classify/adaptmatch.cpp:1213:29
	    #2 0x5fdc22 in tesseract::Classify::ExpandShapesAndApplyCorrections(ADAPT_CLASS_STRUCT**, bool, int, int, int, float, int, int, unsigned char const*, tesseract::UnicharRating*, ADAPT_RESULTS*) tesseract/src/classify/adaptmatch.cpp:1184:13
	    #3 0x5fe421 in tesseract::Classify::MasterMatcher(INT_TEMPLATES_STRUCT*, short, INT_FEATURE_STRUCT const*, unsigned char const*, ADAPT_CLASS_STRUCT**, int, int, TBOX const&, GenericVector<CP_RESULT_STRUCT> const&, ADAPT_RESULTS*) tesseract/src/classify/adaptmatch.cpp:1119:5
	    #4 0x6003eb in tesseract::Classify::CharNormTrainingSample(bool, int, tesseract::TrainingSample const&, GenericVector<tesseract::UnicharRating>*) tesseract/src/classify/adaptmatch.cpp:1374:5

See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13712.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 19:39:31 +01:00
Stefan Weil
58423d2f6c
Merge pull request #2328 from bertsky/lstm-with-user-patterns2
Add user words / patterns again
2019-03-24 19:38:40 +01:00
zdenop
0d36d9a9d7
Merge pull request #2341 from Shreeshrii/fix
Fix
2019-03-24 18:21:09 +01:00
Stefan Weil
da6305b632 Fix compiler warnings caused by ASSERT_HOST
The modified definition avoids warnings caused by redundant semicolons.
Now a semicolon is required when using the macro, so a few code locations
had to be updated.
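
The new definition of ASSERT_HOST is not quoted here. As a sketch of one common way to write an assertion macro that requires a trailing semicolon, the hypothetical `SKETCH_ASSERT` below uses the `do { ... } while (false)` idiom:

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical macro, not the ASSERT_HOST definition from this commit.
// The do/while wrapper turns the expansion into a single statement that
// must be terminated by a semicolon at the call site.
#define SKETCH_ASSERT(x)                                  \
  do {                                                    \
    if (!(x)) {                                           \
      std::fprintf(stderr, "Assert failed: %s\n", #x);    \
      std::abort();                                       \
    }                                                     \
  } while (false)

// The macro behaves like an ordinary statement inside if/else.
int SafeQuotient(int a, int b) {
  if (b != 0)
    SKETCH_ASSERT(a >= 0);
  else
    return 0;
  return a / b;
}
```

With a macro that expands to a bare `if (...) { ... }`, the semicolon after the call becomes an extra empty statement, which is what triggers warnings about redundant semicolons.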

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 17:47:04 +01:00
Stefan Weil
44a6d9f4d4 intmatcher: Catch more out of bounds reads
Credit to OSS-Fuzz which reported this issue:

    intmatcher.cpp:1121:17: runtime error: index 24 out of bounds for type 'uint8_t [24]'
	    #0 0x61034b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT*, unsigned int*, short) tesseract/src/classify/intmatcher.cpp:1121:17
	    #1 0x60f560 in IntegerMatcher::Match(INT_CLASS_STRUCT*, unsigned int*, unsigned int*, short, INT_FEATURE_STRUCT const*, tesseract::UnicharRating*, int, int, bool) tesseract/src/classify/intmatcher.cpp:514:11
	    #2 0x5f3a25 in tesseract::Classify::AdaptToChar(TBLOB*, int, int, float, ADAPT_TEMPLATES_STRUCT*) tesseract/src/classify/adaptmatch.cpp:894:9
	    #3 0x5f2ccd in tesseract::Classify::LearnPieces(char const*, int, int, float, tesseract::CharSegmentationType, char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:430:5
	    #4 0x5f16ee in tesseract::Classify::LearnWord(char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:293:7

This catches the out of bounds data reads in release builds.
Also add assertions for debug builds.

See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13818.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 17:27:43 +01:00
Stefan Weil
5fd7228414 intmatcher: Catch out of bounds reads
Credit to OSS-Fuzz which reported this issue:

    intmatcher.cpp:1163:17: runtime error: index 24 out of bounds for type 'uint8_t [24]'
	    #0 0x610d3b in ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS_STRUCT*, unsigned int*) tesseract/src/classify/intmatcher.cpp:1163:17
	    #1 0x60ff4e in IntegerMatcher::Match(INT_CLASS_STRUCT*, unsigned int*, unsigned int*, short, INT_FEATURE_STRUCT const*, tesseract::UnicharRating*, int, int, bool) tesseract/src/classify/intmatcher.cpp:563:11
	    #2 0x5f4355 in tesseract::Classify::AdaptToChar(TBLOB*, int, int, float, ADAPT_TEMPLATES_STRUCT*) tesseract/src/classify/adaptmatch.cpp:894:9
	    #3 0x5f35fd in tesseract::Classify::LearnPieces(char const*, int, int, float, tesseract::CharSegmentationType, char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:430:5
	    #4 0x5f201e in tesseract::Classify::LearnWord(char const*, WERD_RES*) tesseract/src/classify/adaptmatch.cpp:293:7

This catches the out of bounds data reads, but does not fix the primary
reason: ProtoLengths currently gets values which are larger than the
allowed index.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 15:44:33 +01:00
Stefan Weil
509ee95023 IntegerMatcher: Fix data type of loop counters
ClassTemplate->ProtoLengths[n] is of type uint8_t, so use that for
the related loop counters, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 15:35:06 +01:00
Stefan Weil
f4f34a87db WERD_RES: Fix uninitialized member variable
Credit to OSS-Fuzz which reported this issue:

    pageres.cpp:1143:7: runtime error: load of value 249, which is not a valid value for type 'bool'
	    #0 0x6ba560 in WERD_RES::Clear() tesseract/src/ccstruct/pageres.cpp:1143:7
	    #1 0x6b9fd1 in WERD_RES::operator=(WERD_RES const&) tesseract/src/ccstruct/pageres.cpp:193:3
	    #2 0x49a9ad in WERD_RES::WERD_RES(WERD_RES const&) tesseract/src/ccstruct/pageres.h:356:11

See https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13707.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 14:59:08 +01:00
Stefan Weil
afc099b9f4 intmatcher: Split data_table
The old code was a hack to improve the performance.

The new code is clearer and results in the same binary when compiling
with gcc 8.3.0, so it looks like the old hack is no longer needed with
modern compilers.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-24 08:15:40 +01:00
Shreeshrii
8749f3553e
LINEDATA=false 2019-03-23 19:16:49 +05:30
Shree
bcb7cf9846 sort arguments, use true/false instead of 1/0 2019-03-23 12:28:53 +00:00
Shree
c2db272134 Modify distort_image for Boolean 2019-03-22 17:02:46 +00:00
Shree
259d5af6b1 Add PSM values to the definition 2019-03-22 15:29:02 +00:00
Shree
8eafec0d17 Fix comments with current values of PSM codes 2019-03-22 14:10:49 +00:00
Stefan Weil
e1e56d9d66 Remove local function declarations from intmatcher.h
This requires moving the local function HeapSort to the beginning.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-22 11:39:39 +01:00
Stefan Weil
2ba194ca8d Remove four unused parameters
This fixes some compiler warnings:

    src/classify/intmatcher.cpp:711:63: warning: unused parameter ‘ConfigMask’ [-Wunused-parameter]
    src/classify/intmatcher.cpp:1007:16: warning: unused parameter ‘ProtoMask’ [-Wunused-parameter]
    src/classify/intmatcher.cpp:1095:61: warning: unused parameter ‘NumFeatures’ [-Wunused-parameter]
    src/classify/intmatcher.cpp:1136:59: warning: unused parameter ‘used_features’ [-Wunused-parameter]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-22 11:30:24 +01:00
Stefan Weil
dd79d56e9f Remove unused parameter BlobLength
This fixes two compiler warnings:

    src/classify/intmatcher.cpp:553:14: warning: unused parameter ‘BlobLength’ [-Wunused-parameter]
    src/classify/intmatcher.cpp:622:14: warning: unused parameter ‘BlobLength’ [-Wunused-parameter]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-22 11:17:19 +01:00
Shree
9b915d5efb add --distort_image 2019-03-22 05:39:38 +00:00
Shree
f7ffde99d5 add --distort_image 2019-03-22 05:34:00 +00:00
zdenop
ac7ea4322a
Merge pull request #2335 from Shreeshrii/master
Changes to tesstrain.py - max_workers=8, distort_image=false
2019-03-17 15:27:34 +01:00
zdenop
26877ba703 check min. python version; os.uname is not available on windows 2019-03-17 15:25:48 +01:00
Shreeshrii
f8e8521606
Update tesstrain_utils.py 2019-03-17 15:32:35 +05:30
Shree
6fa8e1bb15 Set max_workers=8 2019-03-17 09:58:11 +00:00
Shree
e21499e81e Set default value for distort_image 2019-03-17 09:54:16 +00:00
Stefan Weil
ee2f9bf7bf Remove old comments in file headers
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-16 10:55:00 +01:00
Shree
d47b0d588a Use LATIN_FONTS for kmr 2019-03-15 15:47:56 +00:00
Shree
3eee1d217a Add kmr and kur_ara, remove kur from training scripts 2019-03-15 15:37:49 +00:00
Robert Schubert
297d7d86ce trying to add user words/patterns again:
- pass in ParamsVectors from Tesseract
  (carrying values from langdata/config/api)
  into LSTMRecognizer::Load and LoadDictionary
- after LSTMRecognizer's Dict is initialised
  (with default values), reset the variables
  user_{words,patterns}_{suffix,file} from the
  corresponding entries in the passed vector
2019-03-15 16:06:19 +01:00
Shree
b2ebf0195f Add kmr and kur_ara, remove kur from training scripts 2019-03-15 14:39:39 +00:00
Shree
37befdf6c4 Add option for --distort_image 2019-03-15 13:32:36 +00:00
zdenop
0a36b38169
Merge pull request #2317 from eighttails/master
Added missing linker flags for MinGW.
2019-03-15 08:01:21 +01:00
Robert Schubert
14346e56b0 tesstrain: catch+handle SIGINT (to stop waiting on subjobs) 2019-03-15 00:03:16 +01:00
Robert Schubert
6cbad17e30 tesstrain: check all subjobs' retval 2019-03-14 14:38:51 +01:00
Robert Schubert
5316bcbb94 tesstrain: check failure of subjobs 2019-03-14 11:42:01 +01:00
Stefan Weil
4c2bbebecc Fix compiler warning (-Wunused-value)
Warning from clang++:

    ..\src\ccmain\ltrresultiterator.cpp(454,8):  warning: expression result unused [-Wunused-value]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-13 20:56:03 +01:00
Stefan Weil
ed84ba0a44 Fix wrong comparison
symbol_steps is a vector, so testing for a nullptr was wrong.

clang++ reports:

    ..\src\ccmain\ltrresultiterator.cpp(440,19):  warning: comparison of address of 'this->word_res_->symbol_steps' equal to a null pointer is always false [-Wtautological-pointer-compare]
      if (&word_res_->symbol_steps == nullptr || !LSTM_mode_) return nullptr;
           ~~~~~~~~~~~^~~~~~~~~~~~    ~~~~~~~
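
The corrected condition is not shown in this message. As a sketch with a hypothetical struct, a vector member is tested for content with `empty()`, because its address can never be null:

```cpp
#include <vector>

// Hypothetical stand-in for a result object that owns a vector member.
struct WordResultSketch {
  std::vector<int> symbol_steps;
};

// Meaningful test: does the vector hold any elements?
bool HasSymbolSteps(const WordResultSketch& word) {
  return !word.symbol_steps.empty();
}
```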

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-13 20:38:38 +01:00
Tadahito Yao
bbbd262a8d Added missing linker flags for MinGW. 2019-03-13 22:10:36 +09:00
jm server2
1206362d30 accumulated_timesteps is not a pointer but a vector. If ChoiceIterator is used without lstm_choice_mode, Tesseract crashes (or similar) because the check is true and a non-existent item is referenced 2019-03-13 12:55:14 +01:00
Stefan Weil
3baf0d8076 Fix boolean assignments
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-12 15:34:24 +01:00
Stefan Weil
8ad0489f0f Remove svpaint.cpp from libtesseract
svpaint is a standalone application (it includes a main function)
and should not be part of the Tesseract library.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-12 12:22:53 +01:00
zdenop
7546a01020
Merge pull request #2310 from noahmetzger/LSTMChoiceRIL
Lstm choice ril
2019-03-12 10:46:11 +01:00
Stefan Weil
35a999f91a Fix assertion caused by wrong unicharset
Credit to OSS-Fuzz: it found another case which triggered this assertion:

    contains_unichar_id(unichar_id):Error:Assert failed:in file ../../src/ccutil/unicharset.h, line 502

This is the OSS-Fuzz testcase:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13662

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-12 09:31:21 +01:00
Stefan Weil
56a39bda77 Fix float division by zero
That runtime error is normally not visible because it does not abort
the program, but it is detected when the code is compiled with sanitizers.

It can be triggered with this OSS-Fuzz testcase:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13662
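
The actual fix is not quoted here. A minimal sketch of one way to avoid such a division, assuming that a result of 0 is acceptable for a zero denominator (`SafeRatio` is a hypothetical name):

```cpp
// Returns numerator / denominator, or 0 when the denominator is zero.
// The fallback value is an assumption made for this sketch.
float SafeRatio(float numerator, float denominator) {
  if (denominator == 0.0f) {
    return 0.0f;
  }
  return numerator / denominator;
}
```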

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-12 09:28:16 +01:00
Noah Metzger
5b3e2fe812 Integrated accumulated Symbol Choice in the Choice Iterator and made the api lstm_choice_mode independent
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-03-12 09:15:10 +01:00
Stefan Weil
4c0b98bd12 Replace undefined shift operations by multiplications
Left shift operations are undefined for negative numbers, but at least on
Intel they return the same value as a multiplication by 2 ^ shift value.

This fixes runtime errors reported by sanitizers and OSS-Fuzz:

    intmatcher.cpp:821:59: runtime error: left shift of negative value -14
    intmatcher.cpp:823:75: runtime error: left shift of negative value -512
    intmatcher.cpp:820:50: runtime error: left shift of negative value -80

See issue #2297 and
https://oss-fuzz.com/testcase-detail/4845195990925312 for details.
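
The replacement can be sketched as follows; `ScaleByPowerOfTwo` is a hypothetical helper, and the sketch assumes that the shift is below 31 and that the product does not overflow:

```cpp
#include <cstdint>

// Computes value * 2^shift without shifting a possibly negative value.
// Only the positive constant 1 is shifted, which is well defined.
constexpr std::int32_t ScaleByPowerOfTwo(std::int32_t value, unsigned shift) {
  return value * (std::int32_t{1} << shift);
}
```

For example, `ScaleByPowerOfTwo(-14, 3)` gives -112, the value that `-14 << 3` typically produces on hardware where the shift happens to work.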

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-12 06:56:54 +01:00
Stefan Weil
896698a4f5 Fix runtime error (left shift of negative value)
Runtime error:

    src/training/util.h:37:28: runtime error: left shift of negative value -17

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-12 06:56:54 +01:00
Stefan Weil
5202208a8c Remove globals.h
It only included other files which are already included where needed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-11 19:01:23 +01:00
Noah Metzger
bc2b919805 Integrated Timesteps per symbol into ChoiceIterator
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-03-11 10:50:56 +01:00
Noah Metzger
754e38d2b4 Added the option to get the timesteps separated by the suggested segmentation
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-03-11 10:50:56 +01:00
zdenop
e817607280 archive_version_details is available from libArchive version 3.2.0 2019-03-10 22:57:48 +01:00
zdenop
5cfe4cc1f0
Merge pull request #2286 from Shreeshrii/lstmbox
Rename function to TessBaseAPIGetTsvText to be consistent with the Create method
2019-03-10 21:41:52 +01:00
zdenop
02a1ffe87a Report libArchive support 2019-03-10 20:08:45 +01:00
Stefan Weil
b3aff7d633 Fix Index-out-of-bounds in IntegerMatcher::UpdateTablesForFeature
This fixes issue #2299, an issue which was already reported by
static code analyzers and now by OSS-Fuzz, see details at
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13597.

The Tesseract code assigns an address which is out-of-bounds to a pointer
variable, but increments that variable later. So this is a false positive.

Change the code nevertheless to satisfy OSS-Fuzz.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-10 18:26:40 +01:00
Stefan Weil
91d0a71d51 Fix assertion caused by wrong unicharset (issue #2301)
Credit to OSS-Fuzz:
This fixes an issue which was reported by OSS-Fuzz, see details at
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13592.

OSS-Fuzz triggered this assertion:

    contains_unichar_id(unichar_id):Error:Assert failed:in file ../../src/ccutil/unicharset.h, line 502

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-10 16:42:54 +01:00
Stefan Weil
71d4990c6d Fix Heap-buffer-overflow in GenericVector<int>::size (issue #2298)
Credit to OSS-Fuzz:
This fixes a security issue which was reported by OSS-Fuzz, see details at
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13590.

Also add some assertions to catch similar bugs.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-10 16:12:30 +01:00
Robert Schubert
3912cb1c33 LSTM char_whitelist/blacklist (6ac2ff0): more robust
- unicharset can be null too
2019-03-09 10:40:40 +01:00
Robert Schubert
b45999088c LSTM char_whitelist/blacklist (6ac2ff0): multi-code chars
- move decision from ComputeTopN to ContinueContext, where
  it belongs: block context continuations which emit final
  codes translating to disabled unichar_ids.
  (The normal logic for fallback from top2 > top2 > rest
   will apply.)
- pass UNICHARSET refs appropriately
2019-03-08 12:30:16 +01:00
Robert Schubert
8012d5e653 LSTM char_whitelist/blacklist (6ac2ff0): also sublangs 2019-03-07 18:32:50 +01:00
Robert Schubert
6ac2ff083e trying to add tessedit_char_whitelist etc. again:
- ignore matrix outputs in ComputeTopN if they
  belong to a disabled unichar_id
- pass UNICHARSET refs to check that
- in SetBlackAndWhitelist, also update the unicharset
  of the lstm_recognizer_ instance, if any
2019-03-07 01:37:23 +01:00
zdenop
f80085c0bf
Merge pull request #2289 from Armyke/master
Added an additional optional --tmp_dir parameter to specify the tempo…
2019-03-06 15:03:14 +01:00
Stefan Weil
1c7e00611b Add initial support for traineddata files in standard archive formats
This requires libarchive-dev.

Tesseract can now load traineddata files in any of the archive formats
which are supported by libarchive. Example of a zipped BagIt archive:

    $ unzip -l /usr/local/share/tessdata/zip.traineddata
    Archive:  /usr/local/share/tessdata/zip.traineddata
      Length      Date    Time    Name
    ---------  ---------- -----   ----
           55  2019-03-05 15:27   bagit.txt
            0  2019-03-05 15:25   data/
         1557  2019-03-05 15:28   manifest-sha256.txt
      1082890  2019-03-05 15:25   data/eng.word-dawg
      1487588  2019-03-05 15:25   data/eng.lstm
         7477  2019-03-05 15:25   data/eng.unicharset
        63346  2019-03-05 15:25   data/eng.shapetable
       976552  2019-03-05 15:25   data/eng.inttemp
        13408  2019-03-05 15:25   data/eng.normproto
         4322  2019-03-05 15:25   data/eng.punc-dawg
         4738  2019-03-05 15:25   data/eng.lstm-number-dawg
         1410  2019-03-05 15:25   data/eng.freq-dawg
          844  2019-03-05 15:25   data/eng.pffmtable
         6360  2019-03-05 15:25   data/eng.lstm-unicharset
         1012  2019-03-05 15:25   data/eng.lstm-recoder
         1047  2019-03-05 15:25   data/eng.unicharambigs
         4322  2019-03-05 15:25   data/eng.lstm-punc-dawg
     16109842  2019-03-05 15:25   data/eng.bigram-dawg
           80  2019-03-05 15:25   data/eng.version
         6426  2019-03-05 15:25   data/eng.number-dawg
      3694794  2019-03-05 15:25   data/eng.lstm-word-dawg
    ---------                     -------
     23468070                     21 files

`combine_tessdata -d` and `combine_tessdata -u` also work.

The traineddata files in the new format can be generated with
standard tools like zip or tar.

More work is needed for other training tools and big endian support.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-05 17:18:48 +01:00
Armyke
56b04d4ea7 Added the same --tmp_dir flag to tesstrain_utils.sh 2019-03-04 14:05:25 +00:00
Armyke
25fa392887 Added an optional --tmp_dir parameter to specify the directory in which tesstrain.py creates its temporary training files. The main reason is the slow R/W on HDDs: anyone who wants to speed up this process can use a directory on an SSD as tmp_dir 2019-03-04 13:26:53 +00:00
Stefan Weil
7fbde96a04 Format new code with clang-format
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-02 20:26:07 +01:00
Stefan Weil
38fac625cd Format new code with clang-format
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-02 20:01:48 +01:00
Shree
a0202bac70 Rename function to TessBaseAPIGetTsvText to be consistent with the Create method 2019-03-02 16:29:53 +00:00
zdenop
5de2a21b3f
Merge pull request #2283 from Shreeshrii/lstmbox
Add missing renderers to C-API
2019-03-02 15:15:34 +01:00
Stefan Weil
9c90894ff0 PAGE_RES_IT: Optimize compare operators by using inline code
Avoiding a function call will make both == and != operator faster.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-02 14:57:16 +01:00
Stefan Weil
295996ed05 commandlineflags: Fix compiler warnings (signed/unsigned)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-02 14:21:04 +01:00
Stefan Weil
eb14726aac ICOORD: Fix old type casts
This fixes compiler warnings and avoids unnecessary conversions
between float and double.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-02 14:04:54 +01:00
Stefan Weil
fb0f1bcf66 BoxChar: Fix compiler warnings (signed/unsigned)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-02 14:04:54 +01:00
Stefan Weil
0e1a1fc3cf Validator: Fix compiler warnings (signed/unsigned)
This also fixes a regression in validate_grapheme_test introduced
by commit 32e9d7c8f5.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-02 13:05:03 +01:00
Shree
c7e8131efc Add TSV option to C-API 2019-03-02 09:50:54 +00:00
Shree
22c099348b rename LSTMBOX to LSTMBox 2019-03-02 09:11:47 +00:00
zdenop
2ba8e0061a
Merge branch 'master' into mya 2019-03-01 18:37:24 +01:00
Shree
c33f03e33e Add lstmbox and wordstrbox to capi.h 2019-03-01 17:16:59 +00:00
Shree
76ec21df3d Add lstmbox and wordstrbox to C-API 2019-03-01 16:40:41 +00:00
zdenop
646b043d2c
use space instead of tab 2019-03-01 14:36:09 +01:00
Shree
5ee1deaea2 correct handling of 0BF0-0BFA Tamil numbers and symbols 2019-03-01 13:21:49 +00:00
zdenop
d7ddc4c5b7
Merge pull request #2270 from Shreeshrii/U_ARABIC_NUMBER
Treat U_ARABIC_NUMBER as LTR
2019-02-28 09:27:54 +01:00
zdenop
12c1225a5f
Merge pull request #2271 from stweil/refactor
Refactor class Network
2019-02-27 07:43:13 +01:00
Michal Čihař
14c4494f42 Allow UTF-8 variant of C locale
It behaves the same in scanf, but it allows proper handling of Unicode
characters.
2019-02-26 21:37:33 +01:00
Stefan Weil
98dd3b6351 Refactor class Network
That class is an abstract class with several pure virtual functions.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-26 16:55:31 +01:00
Shree
25b02bf1f2 Treat U_ARABIC_NUMBER as LTR 2019-02-26 09:51:21 +00:00
Shreeshrii
2f71fe280c
Use alternative way to comment a block of code (using the c preprocessor).
https://github.com/tesseract-ocr/tesseract/pull/2268#pullrequestreview-207605382
Thanks @amitdo
2019-02-26 15:05:51 +05:30
Shree
449f1cd4ba Remove test for Word started with a combiner 2019-02-25 18:47:42 +00:00