Commit Graph

4192 Commits

Author SHA1 Message Date
Stefan Weil
4bbfabaa67 Delete copy constructor and assignment operator for TessBaseAPI (fix issue #874)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:07:59 +01:00
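Note on the commit above: deleting the copy operations is the usual way to stop a resource-owning class from being duplicated by accident. A minimal sketch follows, assuming a hypothetical `Engine` class that owns a raw buffer; it is not the actual TessBaseAPI code.

```cpp
#include <type_traits>

// Hypothetical stand-in for a class that owns a raw resource.
class Engine {
 public:
  Engine() : buffer_(new int[16]) {}
  ~Engine() { delete[] buffer_; }

  // A copy would leave two objects deleting the same buffer, so forbid it.
  Engine(const Engine&) = delete;
  Engine& operator=(const Engine&) = delete;

 private:
  int* buffer_;
};

// The compiler now rejects any attempt to copy an Engine.
static_assert(!std::is_copy_constructible<Engine>::value,
              "copy construction is disabled");
static_assert(!std::is_copy_assignable<Engine>::value,
              "copy assignment is disabled");
```

With these declarations, code such as `Engine b = a;` fails at compile time instead of causing a double free at run time.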
Stefan Weil
e3441f0ced Copy resolution of source image (fix issue #1702)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:07:20 +01:00
Stefan Weil
090d3c4b4c Fix typo in README.md (found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:07:14 +01:00
zdenop
7488c85ef1 fix memory leak at PangoFontInfo::CanRenderString 2019-11-01 15:07:06 +01:00
Hyeonguk Ryu
080f83a17c Change from HTTP to HTTPS 2019-11-01 15:06:57 +01:00
zdenop
4a5ec186f6 test for synthesized font faces. 2019-11-01 15:06:48 +01:00
zdenop
f9d1bda7e3 cmake: add minimum required version for pango and icu based on autotools 2019-11-01 15:06:41 +01:00
zdenop
99645e3c40 text2image: show pango version 2019-11-01 15:06:35 +01:00
Stefan Weil
077616fd36 quadlsq: Fix warnings from LGTM
Fix two occurrences of this LGTM warning:

    Multiplication result may overflow 'double'
      before it is converted to 'long double'.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:06:30 +01:00
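Note on the commit above: the warning concerns a product that is computed in `double` and only widened afterwards. A minimal sketch of the pattern and one common fix follows; `NarrowProduct` and `WideProduct` are hypothetical helpers, not code from quadlsq.

```cpp
// Pattern the warning describes: the multiplication runs in double
// (where it may overflow) and only the result is widened.
constexpr long double NarrowProduct(double a, double b) {
  return a * b;
}

// One common fix: widen an operand first, so the multiplication
// itself is carried out in long double.
constexpr long double WideProduct(double a, double b) {
  return static_cast<long double>(a) * b;
}
```

On platforms where `long double` has the same range as `double`, both versions behave the same; the cast only helps where `long double` is actually wider.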
Stefan Weil
02d916a64f Use "C" locale for PDF output
This fixes wrong output of integers with locale de_DE.UTF-8:

    -  /Width 2.481
    -  /Height 3.508
    +  /Width 2481
    +  /Height 3508

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:06:23 +01:00
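Note on the commit above (the ALTO commit below describes the same problem): a stream picks up the global locale, and some locales insert a grouping separator into integers. A minimal sketch of pinning a stream to the classic "C" locale follows; `FormatDimension` is a hypothetical helper, and the actual renderer code may apply the locale differently.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: format an integer for a file format that
// expects plain digits regardless of the user's locale.
std::string FormatDimension(int value) {
  std::ostringstream out;
  // A new stream copies the global locale; the classic "C" locale
  // writes integers without any digit grouping.
  out.imbue(std::locale::classic());
  out << value;
  return out.str();
}
```

With the stream imbued this way, `FormatDimension(2481)` yields `2481` even when the global locale groups digits.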
Stefan Weil
efc2b7601a Use "C" locale for ALTO output
This fixes wrong output of integers with locale de_DE.UTF-8:

    - <Page WIDTH="2.481" HEIGHT="3.508" PHYSICAL_IMG_NR="0" ID="page_0">
    + <Page WIDTH="2481" HEIGHT="3508" PHYSICAL_IMG_NR="0" ID="page_0">

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:06:18 +01:00
Stefan Weil
83af58c2aa Fix build error (undefined local variable)
The latest commit 96025c7923 was incomplete.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:06:11 +01:00
Stefan Weil
6a8be20b4c Remove unimplemented +/- for parameter files
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:06:02 +01:00
zdenop
c7e9f31b0a Do not exit if a nonexistent parameter is used (fixes #1334)
# Conflicts:
#	src/ccmain/tessedit.cpp
2019-11-01 15:05:47 +01:00
zdenop
03a0586931 Report when tesseract legacy engine not present. (fix issue #2053) 2019-11-01 15:02:58 +01:00
Egor Pugin
17828fc7e8 Fix isolated build. 2019-11-01 15:02:49 +01:00
Stefan Weil
9c52eb0cba Add new parameter "document_title" to set the title in OCR output files
The title can be set for hOCR and PDF output.

Currently it is also used for ALTO, so setting the title can be used
as a workaround for issue #2700.

The constant unknown_title_ is no longer needed and therefore removed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:02:42 +01:00
Stefan Weil
e43eb9104c sw.cpp: Sync list of public headers with Autotools build
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:02:32 +01:00
zdenop
fc45fc51b5 CMake: Sync list of public headers with Autotools build 2019-11-01 15:02:20 +01:00
jm
a5670421a7 speed optimisation - add the option to disable automatic inverting of line images 2019-11-01 14:58:59 +01:00
Stefan Weil
ebff4dae35 Fix comment which referred to unused Tesseract parameter
This completes commit aa2ab68e29.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:58:50 +01:00
zdenop
b244bd1c13 Removed unused parameters
The following parameters are not used anywhere anymore:

 * use_definite_ambigs_for_classifier
 * max_viterbi_list_size
 * word_to_debug_lengths
 * fragments_debug
 * tessedit_redo_xheight
 * debug_acceptable_wds
 * tessedit_matcher_log
 * tessedit_test_adaption_mode
 * docqual_excuse_outline_errs
 * crunch_pot_garbage
 * suspect_space_level
 * tessedit_consistent_reps
 * wordrec_display_all_words
 * wordrec_no_block
 * wordrec_worst_state
 * fragments_guide_chopper
 * segment_adjust_debug
 * classify_adapt_feature_thresh (classify_adapt_feature_threshold still exists)
 * classify_adapt_proto_thresh (classify_adapt_proto_threshold still exists)
 * classify_min_norm_scale_x
 * classify_max_norm_scale_x
 * classify_min_norm_scale_y
 * classify_max_norm_scale_y
 * il1_adaption_test
 * textord_blob_size_bigile
 * textord_blob_size_smallile
 * editor_debug_config_file
 * textord_tabfind_show_color_fit

The list was generated by a Python script and each parameter occurrence was checked
manually.

# Conflicts:
#	src/classify/classify.cpp
2019-11-01 14:58:36 +01:00
Stefan Weil
58122ea313 Don't create OCR result files when training data is created
The configuration file lstm.train causes Tesseract to generate
training data for training of an LSTM line recognizer.

In this mode, no other files with OCR results should be written.
Without this patch, Tesseract writes a small text file.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:53:43 +01:00
Stefan Weil
3dfd72721b Simplify configure.ac
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:53:34 +01:00
Stefan Weil
ca172592da Add support for image or image list by URL
This allows OCR of images from the internet without downloading them first:

    tesseract http://IMAGE_URL OUTPUT ...

It uses libcurl.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:53:28 +01:00
Stefan Weil
190536bbd7 Modernize code for LIST (fix some -Wold-style-cast warnings)
- Use C++ type casts
- Remove unneeded type cast
- Simplify code for function pop
- Remove macro push_on (it was only used once)

This fixes lots of compiler warnings caused by old type casts.
2019-11-01 14:53:14 +01:00
Stefan Weil
49659dbc1d mfoutline: Modernize code
- Use C++ enums
- Use strongly typed C++11 enum for DIRECTION and optimize struct MFEDGEPT
- Use float constant for MF_SCALE_FACTOR
- Replace macros by inline functions
- Fix documentation comment

This fixes several warnings from clang.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:53:08 +01:00
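Note on the commit above: a strongly typed enum with a one-byte underlying type, a typed `float` constant, and an inline function in place of a macro can be sketched as below. All names (`Heading`, `EdgePoint`, `kScale`, `Scaled`) are hypothetical; the real DIRECTION enumerators and the MFEDGEPT layout are not shown in this log.

```cpp
#include <cstdint>

// Scoped enum with an explicit one-byte underlying type.
enum class Heading : std::uint8_t { kNorth, kEast, kSouth, kWest };

struct EdgePoint {
  float x;
  float y;
  Heading heading;  // one byte instead of sizeof(int)
};

// Typed float constant instead of a double-valued macro.
constexpr float kScale = 0.5f;

// Function instead of a function-like macro: arguments are
// type-checked and evaluated exactly once.
constexpr float Scaled(float v) { return v * kScale; }

static_assert(sizeof(Heading) == 1, "Heading fits in one byte");
```

A scoped enum also refuses implicit conversion to `int`, which is one reason such changes tend to silence compiler warnings.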
Stefan Weil
4f815797bc intmatcher: Avoid conversion from double to float and vice versa
This fixes some clang warnings:

    src/classify/intmatcher.cpp:48:49: warning:
      implicit conversion loses floating-point precision:
      'double' to 'const float' [-Wimplicit-float-conversion]
    src/classify/intmatcher.cpp:405:34: warning:
      implicit conversion loses floating-point precision:
      'double' to 'float' [-Wimplicit-float-conversion]
    src/classify/intmatcher.cpp:405:64: warning:
      implicit conversion increases floating-point precision:
      'float' to 'double' [-Wdouble-promotion]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:53:02 +01:00
Stefan Weil
8602568b52 Remove unneeded compare statement (-Wtautological-unsigned-enum-zero-compare)
This fixes a clang warning:

    src/ccstruct/polyblk.cpp:412:12: warning: result of comparison of
      unsigned enum expression >= 0 is always true
      [-Wtautological-unsigned-enum-zero-compare]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:52:56 +01:00
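Note on the commit above: when an enum has an unsigned underlying type, a `>= 0` check can never fail, so only the upper bound needs testing. A minimal sketch follows, using a hypothetical `Kind` enum rather than the type used in polyblk.cpp.

```cpp
// Hypothetical enum with an unsigned underlying type.
enum Kind : unsigned { kText, kImage, kCount };

constexpr bool IsValidKind(Kind k) {
  // Writing `k >= 0 && k < kCount` would trigger the warning:
  // the first comparison is always true for an unsigned value.
  return k < kCount;
}
```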
Stefan Weil
58557299c5 cmake: Set default build type to Release
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:52:49 +01:00
zdenop
b913bedfc7 cmake: AUTO_OPTIMIZE: allow turning off the auto-optimize macros 2019-11-01 14:52:36 +01:00
zdenop
be68642e19 cmake: arch files: fix duplicate build and follow autotools logic 2019-11-01 14:52:30 +01:00
zdenop
0af21dcd07 Re-add strngs.h to public API
It is still needed.
This partially reverts commit a730b5c4ff.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:52:16 +01:00
Stefan Weil
9fb7aa6936 Add missing linker flags for TensorFlow
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:50:54 +01:00
zdenop
ad97b521bf cmake: auto optimize macros must be run before setting CMAKE_CXX_FLAGS* 2019-11-01 14:50:47 +01:00
zdenop
f8d95bb478 cmake: fix build type flags 2019-11-01 14:50:41 +01:00
zdenop
51a8c08b3e ignore #pragma optimize for clang-cl 2019-11-01 14:50:35 +01:00
zdenop
3d5b1b626a cmake: remove moved (training) header from installation 2019-11-01 14:50:26 +01:00
Egor Pugin
c1de84e431 Update sw build. 2019-11-01 14:50:14 +01:00
Egor Pugin
ec212754cb Update sw build. 2019-11-01 14:50:07 +01:00
Stefan Weil
4c3c38573b Remove vecfuncs.cpp and vecfunc.h
Replace the macros which were declared in vecfuncs.h by member functions
and move a function which was only used in chop.cpp to that file.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:49:52 +01:00
zdenop
8993319cd7 Disable legacy build: Disable bitvec.h
# Conflicts:
#	src/cutil/Makefile.am
2019-11-01 14:41:01 +01:00
zdenop
ebe136c08d Remove STRING from the public Tesseract API
Removing STRING from genericvector.h allows eliminating the proprietary
STRING data type from the public Tesseract API.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

# Conflicts:
#	src/ccutil/Makefile.am
#	src/ccutil/genericvector.h
#	src/training/lstmtester.cpp
2019-11-01 14:34:44 +01:00
zdenop
601ee34276 Add more initial values for class Classify from constructor to header file
This fixes compiler warnings caused by
commit 751fcd2b11:

    src/classify/classify.cpp:176:7: warning:
      field 'EnableLearning' will be initialized after
      field 'il1_adaption_test' [-Wreorder]
    src/classify/classify.cpp:187:7: warning:
      field 'dict_' will be initialized after
      field 'static_classifier_' [-Wreorder]

Signed-off-by: Stefan Weil <sw@weilnetz.de>

# Conflicts:
#	src/classify/classify.cpp
2019-11-01 14:30:41 +01:00
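Note on the commit above: members are always initialized in declaration order, and `-Wreorder` fires when a constructor's initializer list names them in a different order. Moving the initial values next to the declarations avoids keeping a separate list in sync. A minimal sketch follows, with a hypothetical `Settings` struct rather than the real `Classify` members.

```cpp
// Hypothetical struct; the member names are illustrative only.
struct Settings {
  // Default member initializers sit beside the declarations, so no
  // constructor initializer list can disagree with their order.
  bool enable_learning = true;
  int adaption_test = 0;
  const char* dict = nullptr;
};
```

A constructor that needs a different value for one member can still list just that member; the others keep their in-class defaults.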
amitdo
224f1c01f0 Disable legacy build: Disable more unneeded code 2019-11-01 14:29:00 +01:00
Stefan Weil
2d4b355485 Fix CID 1405673 part 2 (Uninitialized members)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:28:35 +01:00
Robert Schubert
dd8bfa0d40 fix langdata (user words/patterns) file suffixes for LSTMs:
- add another constructor for LSTMRecognizer
  which takes the language_data_path_prefix configured/selected
  at runtime and passes it to the internal CCUtil
- use this in Tesseract::init_tesseract_lang_data when LSTMs
  are available

(this was missing from 297d7d86ce)
2019-11-01 14:28:22 +01:00
amitdo
357177c169 Disabled legacy build: Disable more unneeded code 2019-11-01 14:28:16 +01:00
Stefan Weil
a3e1463ebe Fix CID 1405673 (Uninitialized members)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:28:08 +01:00
Stefan Weil
44a226088c Fix CID 1164624 (Uninitialized members)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:28:03 +01:00