Commit Graph

4176 Commits

Author SHA1 Message Date
Stefan Weil
9c52eb0cba Add new parameter "document_title" to set the title in OCR output files
The title can be set for hOCR and PDF output.

Currently it is also used for ALTO, so setting the title can be used
as a workaround for issue #2700.

The constant unknown_title_ is no longer needed and therefore removed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:02:42 +01:00
Stefan Weil
e43eb9104c sw.cpp: Sync list of public headers with Autotools build
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:02:32 +01:00
zdenop
fc45fc51b5 CMake: Sync list of public headers with Autotools build 2019-11-01 15:02:20 +01:00
jm
a5670421a7 speed optimisation - add the option to disable automatic inverting of line images 2019-11-01 14:58:59 +01:00
Stefan Weil
ebff4dae35 Fix comment which referred to unused Tesseract parameter
This completes commit aa2ab68e29.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:58:50 +01:00
zdenop
b244bd1c13 Removed unused parameters
The following parameters are not used anywhere anymore:

 * use_definite_ambigs_for_classifier
 * max_viterbi_list_size
 * word_to_debug_lengths
 * fragments_debug
 * tessedit_redo_xheight
 * debug_acceptable_wds
 * tessedit_matcher_log
 * tessedit_test_adaption_mode
 * docqual_excuse_outline_errs
 * crunch_pot_garbage
 * suspect_space_level
 * tessedit_consistent_reps
 * wordrec_display_all_words
 * wordrec_no_block
 * wordrec_worst_state
 * fragments_guide_chopper
 * segment_adjust_debug
 * classify_adapt_feature_thresh (classify_adapt_feature_threshold still exists)
 * classify_adapt_proto_thresh (classify_adapt_proto_threshold still exists)
 * classify_min_norm_scale_x
 * classify_max_norm_scale_x
 * classify_min_norm_scale_y
 * classify_max_norm_scale_y
 * il1_adaption_test
 * textord_blob_size_bigile
 * textord_blob_size_smallile
 * editor_debug_config_file
 * textord_tabfind_show_color_fit

The list was generated by a Python script, and each parameter occurrence was
checked manually.

# Conflicts:
#	src/classify/classify.cpp
2019-11-01 14:58:36 +01:00
Stefan Weil
58122ea313 Don't create OCR result files when training data is created
The configuration file lstm.train causes Tesseract to generate
training data for training of an LSTM line recognizer.

In this mode, no other files with OCR results should be written.
Without this patch, Tesseract writes a small text file.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:53:43 +01:00
Stefan Weil
3dfd72721b Simplify configure.ac
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:53:34 +01:00
Stefan Weil
ca172592da Add support for image or image list by URL
This allows OCR of images from the internet without downloading them first:

    tesseract http://IMAGE_URL OUTPUT ...

It uses libcurl.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:53:28 +01:00
Stefan Weil
190536bbd7 Modernize code for LIST (fix some -Wold-style-cast warnings)
- Use C++ type casts
- Remove unneeded type cast
- Simplify code for function pop
- Remove macro push_on (it was only used once)

This fixes lots of compiler warnings caused by old type casts.
2019-11-01 14:53:14 +01:00
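The kind of change behind -Wold-style-cast fixes can be sketched as follows. The types Link and IntLink and the function ValueOf are hypothetical names chosen for illustration; they are not the actual LIST code from the repository.

```cpp
#include <cassert>

// Hypothetical intrusive list node; not Tesseract's real LIST classes.
struct Link {
  Link* next = nullptr;
};

// Hypothetical element type that derives from the generic node.
struct IntLink : Link {
  explicit IntLink(int v) : value(v) {}
  int value;
};

// Before (triggers -Wold-style-cast):
//   return ((IntLink*)link)->value;
// After: a named C++ cast states the intended conversion explicitly.
inline int ValueOf(Link* link) {
  assert(link != nullptr);
  return static_cast<IntLink*>(link)->value;
}
```

A static_cast is enough here because the caller is assumed to know that the node really is an IntLink; it does not add a runtime check.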
Stefan Weil
49659dbc1d mfoutline: Modernize code
- Use C++ enums
- Use strongly typed C++11 enum for DIRECTION and optimize struct MFEDGEPT
- Use float constant for MF_SCALE_FACTOR
- Replace macros by inline functions
- Fix documentation comment

This fixes several warnings from clang.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:53:08 +01:00
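A minimal sketch of the three techniques named in this message: a strongly typed enum with a small underlying type, a float constant, and an inline function in place of a macro. All names and values below (Direction, EdgePoint, kScaleFactor, Scaled) are assumptions made for illustration and do not reproduce the declarations in mfoutline.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical strongly typed enum; one byte instead of a plain int.
enum class Direction : std::uint8_t { North, East, South, West };

// Hypothetical edge point; the small enum leaves room for compact packing.
struct EdgePoint {
  float x = 0.0f;
  float y = 0.0f;
  Direction direction = Direction::North;
  bool hidden = false;
};

// A float constant avoids double arithmetic in float expressions.
// The value is illustrative only.
constexpr float kScaleFactor = 1.0f / 256.0f;

// Before (macro, no type checking):
//   #define SCALED(v) ((v) * SCALE_FACTOR)
// After: an inline function with a checked parameter type.
inline float Scaled(float v) { return v * kScaleFactor; }

static_assert(sizeof(Direction) == 1, "enum uses a one-byte underlying type");
```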
Stefan Weil
4f815797bc intmatcher: Avoid conversion from double to float and vice versa
This fixes some clang warnings:

    src/classify/intmatcher.cpp:48:49: warning:
      implicit conversion loses floating-point precision:
      'double' to 'const float' [-Wimplicit-float-conversion]
    src/classify/intmatcher.cpp:405:34: warning:
      implicit conversion loses floating-point precision:
      'double' to 'float' [-Wimplicit-float-conversion]
    src/classify/intmatcher.cpp:405:64: warning:
      implicit conversion increases floating-point precision:
      'float' to 'double' [-Wdouble-promotion]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:53:02 +01:00
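The two warning classes quoted above usually come from double literals and double math functions mixed into float code. The sketch below shows one way to keep an expression entirely in float; kScale and Similarity are hypothetical names and are not taken from intmatcher.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Before (-Wimplicit-float-conversion): const float kScale = 1.0 / 3;
// After: float literals, so no double value is narrowed.
constexpr float kScale = 1.0f / 3.0f;

// Before (-Wdouble-promotion): std::exp(-x * 0.5) promotes x to double.
// After: the float literal selects the float overload of std::exp.
inline float Similarity(float x) {
  return std::exp(-x * 0.5f) * kScale;
}

// The float overload of std::exp returns float, so nothing is narrowed.
static_assert(std::is_same<decltype(std::exp(1.0f)), float>::value,
              "std::exp(float) stays in float");
```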
Stefan Weil
8602568b52 Remove unneeded compare statement (-Wtautological-unsigned-enum-zero-compare)
This fixes a clang warning:

    src/ccstruct/polyblk.cpp:412:12: warning: result of comparison of
      unsigned enum expression >= 0 is always true
      [-Wtautological-unsigned-enum-zero-compare]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:52:56 +01:00
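The warning fires when a value of an enum type with an unsigned underlying type is compared against zero. A minimal sketch with a hypothetical enum BlockKind and helper IsValid (neither is the code in polyblk.cpp):

```cpp
#include <cassert>

// Hypothetical unscoped enum with an unsigned underlying type.
enum BlockKind : unsigned { Text, Image, Table, Count };

// Before (warns): return kind >= 0 && kind < Count;
// The first comparison is always true for an unsigned type,
// so only the upper bound needs to be checked.
constexpr bool IsValid(BlockKind kind) { return kind < Count; }

static_assert(IsValid(Table), "Table is within range");
static_assert(!IsValid(static_cast<BlockKind>(200u)), "200 is out of range");
```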
Stefan Weil
58557299c5 cmake: Set default build type to Release
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:52:49 +01:00
zdenop
b913bedfc7 cmake: AUTO_OPTIMIZE: enable to turn-off auto optimize macros 2019-11-01 14:52:36 +01:00
zdenop
be68642e19 cmake: arch files: fix duplicate build and follow autotools logic 2019-11-01 14:52:30 +01:00
zdenop
0af21dcd07 Re-add strngs.h to public API
It is still needed.
This partially reverts commit a730b5c4ff.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:52:16 +01:00
Stefan Weil
9fb7aa6936 Add missing linker flags for TensorFlow
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:50:54 +01:00
zdenop
ad97b521bf cmake: auto optimize macros must be run before setting CMAKE_CXX_FLAGS* 2019-11-01 14:50:47 +01:00
zdenop
f8d95bb478 cmake: fix build type flags 2019-11-01 14:50:41 +01:00
zdenop
51a8c08b3e ignore #pragma optimize for clang-cl 2019-11-01 14:50:35 +01:00
zdenop
3d5b1b626a cmake: remove moved (training) header from installation 2019-11-01 14:50:26 +01:00
Egor Pugin
c1de84e431 Update sw build. 2019-11-01 14:50:14 +01:00
Egor Pugin
ec212754cb Update sw build. 2019-11-01 14:50:07 +01:00
Stefan Weil
4c3c38573b Remove vecfuncs.cpp and vecfunc.h
Replace the macros which were declared in vecfuncs.h by member functions
and move a function which was only used in chop.cpp to that file.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:49:52 +01:00
zdenop
8993319cd7 Disable legacy build: Disable bitvec.h
# Conflicts:
#	src/cutil/Makefile.am
2019-11-01 14:41:01 +01:00
zdenop
ebe136c08d Remove STRING from the public Tesseract API
Removing STRING from genericvector.h allows eliminating the proprietary
STRING data type from the public Tesseract API.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

# Conflicts:
#	src/ccutil/Makefile.am
#	src/ccutil/genericvector.h
#	src/training/lstmtester.cpp
2019-11-01 14:34:44 +01:00
zdenop
601ee34276 Add more initial values for class Classify from constructor to header file
This fixes compiler warnings caused by
commit 751fcd2b11:

    src/classify/classify.cpp:176:7: warning:
      field 'EnableLearning' will be initialized after
      field 'il1_adaption_test' [-Wreorder]
    src/classify/classify.cpp:187:7: warning:
      field 'dict_' will be initialized after
      field 'static_classifier_' [-Wreorder]

Signed-off-by: Stefan Weil <sw@weilnetz.de>

# Conflicts:
#	src/classify/classify.cpp
2019-11-01 14:30:41 +01:00
amitdo
224f1c01f0 Disable legacy build: Disable more unneeded code 2019-11-01 14:29:00 +01:00
Stefan Weil
2d4b355485 Fix CID 1405673 part 2 (Uninitialized members)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:28:35 +01:00
Robert Schubert
dd8bfa0d40 fix langdata (user words/patterns) file suffixes for LSTMs:
- add another constructor for LSTMRecognizer
  which takes the language_data_path_prefix configured/selected
  at runtime and passes it to the internal CCUtil
- use this in Tesseract::init_tesseract_lang_data when LSTMs
  are available

(this was missing from 297d7d86ce)
2019-11-01 14:28:22 +01:00
amitdo
357177c169 Disabled legacy build: Disable more unneeded code 2019-11-01 14:28:16 +01:00
Stefan Weil
a3e1463ebe Fix CID 1405673 (Uninitialized members)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:28:08 +01:00
Stefan Weil
44a226088c Fix CID 1164624 (Uninitialized members)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:28:03 +01:00
Stefan Weil
4c987baa58 Fix comment for Textord::make_prop_words
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:27:57 +01:00
Stefan Weil
deae22ac3d Re-order initialisation in constructor of class ViterbiStateEntry
This fixes compiler warnings caused by
commit 091ce345f6:

    src/wordrec/lm_state.h:100:7: warning: field 'cost'
      will be initialized after field 'curr_b' [-Wreorder]
    src/wordrec/lm_state.h:104:7: warning: field 'top_choice_flags'
      will be initialized after field 'dawg_info' [-Wreorder]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:27:50 +01:00
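-Wreorder warns because C++ initializes members in declaration order, regardless of the order written in the constructor's initializer list. The sketch below uses a hypothetical struct Buffer (not ViterbiStateEntry) to show why matching the two orders matters.

```cpp
#include <cassert>

// Hypothetical type for illustration.
struct Buffer {
  // Members are initialized in this order: size first, then half.
  int size;
  int half;

  // Before (warns with -Wreorder, list order differs from declaration order):
  //   explicit Buffer(int n) : half(n / 2), size(n) {}
  // After: the list follows the declaration order, so using size in the
  // initializer of half is safe.
  explicit Buffer(int n) : size(n), half(size / 2) {}
};
```

With the list in declaration order, a later initializer may rely on an earlier member without reading an uninitialized value.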
Stefan Weil
45bd039ec3 Move initial values for class ColPartition from constructor to header file
This fixes compiler warnings caused by
commit 5b4565b80b:

    src/textord/colpartition.cpp:91:24: warning: field 'last_column_'
      will be initialized after field 'column_set_' [-Wreorder]
    src/textord/colpartition.cpp:93:37: warning: field 'inside_table_column_'
      will be initialized after field 'nearest_neighbor_above_' [-Wreorder]
    src/textord/colpartition.cpp:95:58: warning: field 'space_to_right_'
      will be initialized after field 'owns_blobs_' [-Wreorder]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:27:44 +01:00
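Moving initial values into the header means using default member initializers, which removes the ordering problem and covers every constructor at once. A minimal sketch with a hypothetical class Partition (the member names are illustrative and are not those of ColPartition):

```cpp
#include <cassert>

// Hypothetical class for illustration.
class Partition {
 public:
  // Every member already has an initial value, so the default
  // constructor needs no initializer list.
  Partition() = default;

  // Other constructors override only what differs from the defaults.
  explicit Partition(int left_margin) : left_margin_(left_margin) {}

  int left_margin() const { return left_margin_; }
  bool owns_blobs() const { return owns_blobs_; }

 private:
  // Default member initializers, written next to the declarations.
  int left_margin_ = 0;
  bool owns_blobs_ = true;
};
```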
Stefan Weil
24bc9d4979 Re-order initialisation in constructors of classes Dawg and DawgPosition
This fixes compiler warnings caused by
commit ecf0f2dee5:

    src/dict/dawg.h:202:9: warning: field 'type_' will be initialized
      after field 'lang_' [-Wreorder]
    src/dict/dawg.h:355:9: warning: field 'dawg_index' will be initialized
      after field 'dawg_ref' [-Wreorder]
    src/dict/dawg.h:356:9: warning: field 'punc_index' will be initialized
      after field 'punc_ref' [-Wreorder]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:27:38 +01:00
Stefan Weil
f6cc2bebf4 Fix CID 1164666 (Uninitialized scalar field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:26:05 +01:00
Stefan Weil
20b2da6886 Fix CID 1164664 (Uninitialized scalar field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:25:59 +01:00
Stefan Weil
5138b3e576 Fix CID 1375402 (Uninitialized pointer field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:25:53 +01:00
Stefan Weil
b2ab64ef88 simd: Check OSXSAVE bit before calling _xgetbv
Both checks are needed for AVX, AVX2 and FMA checks.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:25:47 +01:00
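The order of the two checks can be sketched as below. The function AvxUsable and its callback parameter are assumptions made for illustration, not the repository's code; the register read is passed in as a callable so that the logic can be exercised on any machine. The bit positions follow the x86 CPUID and XCR0 layout.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kOsxsaveBit = 1u << 27;  // CPUID leaf 1, ECX bit 27
constexpr std::uint32_t kAvxBit = 1u << 28;      // CPUID leaf 1, ECX bit 28
constexpr std::uint64_t kSseAvxState = 0x6;      // XCR0 bits 1 (SSE), 2 (AVX)

// Hypothetical helper: read_xcr0 stands in for a call such as _xgetbv(0).
// It is invoked only when OSXSAVE is set, because XGETBV faults if the
// operating system has not enabled XSAVE.
template <typename ReadXcr0>
bool AvxUsable(std::uint32_t cpuid1_ecx, ReadXcr0 read_xcr0) {
  if ((cpuid1_ecx & kOsxsaveBit) == 0) return false;  // do not touch XCR0
  if ((cpuid1_ecx & kAvxBit) == 0) return false;      // CPU lacks AVX
  // The OS must save and restore both XMM and YMM registers.
  return (read_xcr0() & kSseAvxState) == kSseAvxState;
}
```

In a real build, cpuid1_ecx would come from the compiler's CPUID intrinsic and read_xcr0 would wrap the XGETBV intrinsic; both are left out here to keep the sketch portable.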
Stefan Weil
96e60a52e0 Remove UnicharAmbigs for builds without legacy code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:25:41 +01:00
amitdo
0eacee03b3 Disabled legacy engine build: Disable code related to ambigs. 2019-11-01 14:25:31 +01:00
Stefan Weil
80c36095fa Fix 1164647 (Uninitialized members)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:25:22 +01:00
zdenop
567fc17377 Fix CID 1366450 (Uninitialized scalar field) for class RecodeBeamSearch
secondary_beam_size_ is set but never used, so remove it.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

# Conflicts:
#	src/lstm/recodebeam.cpp
2019-11-01 14:25:00 +01:00
Stefan Weil
b1965ad0fe Fix CID 1164662 (Uninitialized scalar field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:16:15 +01:00
Stefan Weil
73de2d99f5 Fix CID 1164659 (Uninitialized scalar field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:16:09 +01:00
Stefan Weil
90936b98c6 Fix CID 1164657 (Uninitialized scalar field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:16:03 +01:00
Stefan Weil
f9dd65a246 Fix CID 1164649 (Uninitialized scalar field)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:15:57 +01:00