Commit Graph

3231 Commits

Author SHA1 Message Date
Stefan Weil
f93fb9de74 unittest: Add lang_model_test (only works partially)
The test currently has subtests which fail because of missing files.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-12 16:23:13 +02:00
Stefan Weil
de6a759744 unittest: Add paragraphs_test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-12 16:23:10 +02:00
Stefan Weil
53f0e7658f unittest: Add imagedata_test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-12 15:15:02 +02:00
zdenop
30081c517b Merge pull request #1981 from stweil/clean
Clean code
2018-10-12 15:06:46 +02:00
Stefan Weil
d86d520fd0 Remove tab character in source files
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-12 11:31:10 +02:00
Stefan Weil
d59f14c70a Remove gradechop.h
It only defines the macro partial_split_priority which is only used in
findseam.cpp, so move it to that file.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-12 11:31:10 +02:00
zdenop
2633ba28d0 Merge pull request #1980 from stweil/unittest
unittest: Add baseapi_test, fileio_test and qrsequence_test
2018-10-12 09:10:37 +02:00
Stefan Weil
420a0286fd unittest: Add fileio_test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-12 08:13:39 +02:00
Stefan Weil
d3cf423748 unittest: Add qrsequence_test
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-12 08:01:58 +02:00
Stefan Weil
11f82f5c1e unittest: Add baseapi_test
* Add Abseil sources to build process.

* Add copyright comment.

* InitConfigOnlyTest no longer tests
  hin.traineddata because it is LSTM only.

* Fix std::string.

* Deactivate tests with missing test data.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-11 22:18:58 +02:00
Stefan Weil
db16fea6b1 Add a basic implementation of class CycleTimer
It is used by baseapi_test.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-11 22:18:58 +02:00
Stefan Weil
27bfaccf73 Enhance LOG emulation
It is needed for baseapi_test and other unit tests.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-11 22:18:58 +02:00
Stefan Weil
db07a69b56 Add more hacks for use with Google unittests
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-11 22:18:58 +02:00
Stefan Weil
b65b4afe43 Update test submodule
The latest version is needed for the baseapi_test.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-11 22:18:58 +02:00
Stefan Weil
3318c9aadd Add Abseil as a submodule (needed for some of the new unit tests)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-11 22:18:51 +02:00
Quan Nguyen
9d84968d71 fix building of ScrollView.jar with modern java version; fixes #1289
2018-10-10 10:42:39 +02:00
Zdenko Podobný
5fac51173b Merge branch 'master' of https://github.com/tesseract-ocr/tesseract
* 'master' of https://github.com/tesseract-ocr/tesseract:
  remove insight.io badge
  Use env variable in AppVeyor configuration
  Fix integer overflow in overlap calculation
  hocr: add ocrp_wconf to unconditional ocr-capabilities; fixes #1470
  fix uninitialized variable, remove unused variable
  Remove virtual specifiers
2018-10-10 00:38:24 +02:00
Zdenko Podobný
6e75924352 remove not existing directory from autotools distribution
2018-10-10 00:36:53 +02:00
zdenop
b7098b3a40 remove insight.io badge
2018-10-10 00:29:11 +02:00
zdenop
53ce4cf8ca Merge pull request #1972 from stweil/appveyor
Use env variable in AppVeyor configuration
2018-10-09 20:25:08 +02:00
Stefan Weil
488cc49aa8 Use env variable in AppVeyor configuration
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-09 19:23:40 +02:00
Egor Pugin
d93094b397 Merge pull request #1971 from stweil/fix
Fix integer overflow in overlap calculation
2018-10-09 19:59:09 +03:00
Stefan Weil
7f911ac5e0 Fix integer overflow in overlap calculation
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-09 16:43:31 +02:00
zdenop
ca5d285a28 hocr: add ocrp_wconf to unconditional ocr-capabilities; fixes #1470
2018-10-09 16:34:50 +02:00
zdenop
956525f5a4 fix uninitialized variable, remove unused variable
2018-10-09 15:47:20 +02:00
zdenop
a6e716659e Merge pull request #1970 from stweil/virtual
Remove virtual specifiers
2018-10-09 15:40:47 +02:00
Zdenko Podobný
67b6b02e2d Merge branch 'master' of https://github.com/tesseract-ocr/tesseract
* 'master' of https://github.com/tesseract-ocr/tesseract:
  Remove code for _MSC_VER < 1900
  keep API compatibility with #1265
  Update googletest submodule to release v1.8.1
  Update test submodule
  Always use isascii() with isspace()
  Avoid crash with --psm 0 and LSTM traineddata
  SVPaint: Remove empty block
  Classify: Don't hide debug parameter
  UNICHARMAP: Remove comparison which is always false
  svpaint: Change a variable from global to local
  pgedit: remove unused declaration of display_bln_lines
  Plumbing: Remove comparison which is always false
  Release candidate 2
  use pdf L_FLATE_ENCODE only for png input; fixes #1961
2018-10-09 15:37:40 +02:00
Stefan Weil
128422e75c Remove virtual specifiers
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-09 15:23:59 +02:00
zdenop
a9a411613a Merge pull request #1968 from stweil/msvc
Remove code for _MSC_VER < 1900
2018-10-09 14:42:37 +02:00
Stefan Weil
f94b3fd9fc Remove code for _MSC_VER < 1900
Tesseract does not support Visual C++ older than Visual Studio 2015.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-09 14:05:21 +02:00
zdenop
c375f4fbf7 keep API compatibility with #1265
2018-10-09 11:22:15 +02:00
zdenop
7be5f74df8 Merge pull request #1966 from stweil/tests
Update submodules for testing
2018-10-08 20:57:28 +02:00
Stefan Weil
af02ac6474 Update googletest submodule to release v1.8.1
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-08 19:54:56 +02:00
Stefan Weil
eba1c81d52 Update test submodule
The latest version includes more files for testing.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-08 19:44:08 +02:00
zdenop
272ebf995f Merge pull request #1965 from stweil/isspace
Always use isascii() with isspace()
2018-10-08 18:47:39 +02:00
zdenop
ab39adbcab Merge pull request #1964 from stweil/fix
Avoid crash with --psm 0 and LSTM traineddata
2018-10-08 18:45:37 +02:00
Stefan Weil
dcd0377bf0 Always use isascii() with isspace()
isspace() must only be used with an unsigned char or EOF argument,
and even then its result can depend on the current locale settings.

While this is not a problem for C/C++ executables which use the default
"C" locale, it becomes a problem when the Tesseract API is called from
languages like Python or Java which don't use the "C" locale.

By calling isascii() before calling isspace() this uncertainty can be
avoided, because any locale will hopefully give identical results for
the basic ASCII character set.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-08 17:25:09 +02:00
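
The guard described in the commit above can be sketched as follows. This is a minimal illustration, not the code from the commit: the names IsAsciiSpace and CountLeadingSpaces are hypothetical, and the range test uc < 0x80 is used here as a portable stand-in for the POSIX isascii() call, which is not part of standard C++.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Tesseract): accepts a character as
// whitespace only if it lies in the 7-bit ASCII range, so bytes >= 0x80
// never reach isspace() and cannot be classified by a non-"C" locale.
inline bool IsAsciiSpace(char c) {
  // Convert to unsigned char first: passing a negative char to isspace()
  // is undefined behavior.
  const unsigned char uc = static_cast<unsigned char>(c);
  // uc < 0x80 is assumed equivalent to the POSIX isascii() check.
  return uc < 0x80 && std::isspace(uc) != 0;
}

// Hypothetical usage: count leading whitespace bytes in a string that may
// contain UTF-8 sequences.
inline std::size_t CountLeadingSpaces(const std::string& text) {
  std::size_t n = 0;
  while (n < text.size() && IsAsciiSpace(text[n])) ++n;
  return n;
}
```

With this guard, a byte such as 0xA0 (part of a UTF-8 encoded no-break space) is rejected before isspace() is consulted, which matters when the API is called from a host process that has changed the locale.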
Stefan Weil
32e92def49 Avoid crash with --psm 0 and LSTM traineddata
Orientation and script detection only worked with legacy models
and crashed with LSTM models.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-08 16:03:54 +02:00
zdenop
59ebd58fcc Merge pull request #1963 from stweil/fix
Fix some warnings from static code analyzer LGTM
2018-10-08 15:09:59 +02:00
Stefan Weil
1eeca175f7 SVPaint: Remove empty block
This fixes a warning from LGTM:

    Empty block without comment

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-08 14:25:05 +02:00
Stefan Weil
9c857ab962 Classify: Don't hide debug parameter
Fix a warning from LGTM:

    Local variable 'debug' hides a parameter of the same name.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-08 14:22:31 +02:00
Stefan Weil
30b75cfc05 UNICHARMAP: Remove comparison which is always false
Warning from LGTM:

    Comparison is always false because index <= 0 and 1 <= length.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-08 14:15:17 +02:00
Stefan Weil
3ae765ecca svpaint: Change a variable from global to local
This fixes a warning from LGTM:

    Poor global variable name 'rgb'. Prefer longer, descriptive
    names for globals (eg. kMyGlobalConstant, not foo).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-08 13:53:09 +02:00
Stefan Weil
7b5955920d pgedit: remove unused declaration of display_bln_lines
This fixes a warning from LGTM:

    This parameter of type ScrollView is 144 bytes
    - consider passing a pointer/reference instead.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-08 13:49:59 +02:00
Stefan Weil
ae93b65b1f Plumbing: Remove comparison which is always false
Warning from LGTM:

    Comparison is always false because index >= 0.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-08 13:47:16 +02:00
zdenop
944816ae3d Release candidate 2
2018-10-07 21:10:50 +02:00
zdenop
f794571195 use pdf L_FLATE_ENCODE only for png input; fixes #1961
2018-10-07 20:57:19 +02:00
Zdenko Podobný
8598731daf Merge branch 'master' of https://github.com/tesseract-ocr/tesseract
* 'master' of https://github.com/tesseract-ocr/tesseract: (27 commits)
  Rework check for readable input file
  fix "mktemp -d --tmpdir" on Mac OS; see #1453
  pgedit: Change some variables from global to local ones
  improve description of min_characters_to_try variable
  WERD_RES: Remove comparisons which are constant
  GENERIC_2D_ARRAY: Pass parameters by reference
  genericvector: Pass parameters by reference
  chop: Use more efficient float calculations for sqrt
  rect: Use more efficient float calculations for ceil, floor
  intproto: Use more efficient float calculations for floor
  genericvector: Rewrite code to satisfy static code analyzer
  Fix constructor for class Dict (uninitialized member variables)
  Fix use of wrong UNICHARSET
  lstmtraining: Remove dead code for purified model name
  combine_tessdata: Handle failures when extracting
  lstmtraining: Check write permission for output model
  implement parameter min_characters_to_try for minimum characters to try to skip page entirely. fixes #1729
  Merge and enhance documentation on language and script models
  Document some more config options for tesseract
  Add Makefile rule to build HTML manpages
  ...
2018-10-07 15:39:02 +02:00
Egor Pugin
5cf5c80ba1 Merge pull request #1960 from stweil/errhandling
Rework check for readable input file
2018-10-07 12:23:31 +03:00
Stefan Weil
67bf9062df Rework check for readable input file
This reverts commit 1a096441d0 and
implements an alternate check which allows input from stdin.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-06 22:33:02 +02:00