Commit Graph

5083 Commits

Author SHA1 Message Date
Stefan Weil
22cf0f854d Use "C" locale for PDF output
This fixes incorrect integer output (a '.' thousands separator was inserted) with the locale de_DE.UTF-8:

    -  /Width 2.481
    -  /Height 3.508
    +  /Width 2481
    +  /Height 3508

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-18 11:30:42 +02:00
Stefan Weil
914a8e40d6 Use "C" locale for ALTO output
This fixes incorrect integer output (a '.' thousands separator was inserted) with the locale de_DE.UTF-8:

    - <Page WIDTH="2.481" HEIGHT="3.508" PHYSICAL_IMG_NR="0" ID="page_0">
    + <Page WIDTH="2481" HEIGHT="3508" PHYSICAL_IMG_NR="0" ID="page_0">

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-18 11:18:27 +02:00
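Both commits above address the same problem: under a locale such as de_DE.UTF-8, C++ streams group digits with '.' and turn 2481 into 2.481. The following is a minimal sketch of the technique, assuming the output text is built with a std::ostringstream. The names FormatPageSize and DotGroupingNumpunct are hypothetical and are not part of the Tesseract API. DotGroupingNumpunct only simulates a German-style locale, so the effect can be reproduced without that locale being installed.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical stand-in for a locale such as de_DE.UTF-8:
// '.' as thousands separator, digits grouped in threes.
struct DotGroupingNumpunct : std::numpunct<char> {
 protected:
  char do_thousands_sep() const override { return '.'; }
  std::string do_grouping() const override { return "\3"; }
};

// Hypothetical helper: formats page dimensions for a PDF page dictionary.
// The stream is imbued with the classic "C" locale, so integers are written
// without thousands separators even if std::locale::global() was changed.
std::string FormatPageSize(int width, int height) {
  assert(width >= 0 && height >= 0);
  std::ostringstream stream;
  stream.imbue(std::locale::classic());
  stream << "/Width " << width << " /Height " << height;
  return stream.str();
}
```

After `std::locale::global(std::locale(std::locale::classic(), new DotGroupingNumpunct))`, a freshly created stream prints 2481 as 2.481. FormatPageSize still returns "/Width 2481 /Height 3508" because it overrides the global locale for its own stream only.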
Stefan Weil
3e8cc203f4 Fix build error (undefined local variable)
The latest commit 96025c7923 was incomplete.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-18 11:05:31 +02:00
Egor Pugin
d24c16f767 Merge pull request #2715 from stweil/params
Remove unimplemented +/- for parameter files
2019-10-17 22:18:04 +03:00
Stefan Weil
96025c7923 Remove unimplemented +/- for parameter files
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-17 17:14:43 +02:00
zdenop
a3cfd66f37 Do not exit if a nonexistent parameter is used (fixes #1334)
2019-10-15 07:56:22 +02:00
zdenop
0150fc57cc Report when the Tesseract legacy engine is not present (fixes issue #2053)
2019-10-14 22:55:47 +02:00
Egor Pugin
247cd0edc4 Merge pull request #2705 from stweil/title
Add new parameter "document_title" to set the title in OCR output files
2019-10-11 02:23:36 +03:00
Egor Pugin
fb52f43822 Fix isolated build.
2019-10-11 00:46:23 +03:00
Stefan Weil
a1e3150bd7 Add new parameter "document_title" to set the title in OCR output files
The title can be set for hOCR and PDF output.

Currently it is also used for ALTO, so setting the title can be used
as a workaround for issue #2700.

The constant unknown_title_ is no longer needed and therefore removed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-10 15:42:52 +02:00
Egor Pugin
8b694b8a71 Merge pull request #2694 from stweil/master
sw.cpp: Sync list of public headers with Autotools build
2019-10-07 00:16:12 +03:00
Stefan Weil
1012252004 sw.cpp: Sync list of public headers with Autotools build
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-06 22:30:36 +02:00
amitdo
227501c580 CMake: Sync list of public headers with Autotools build
2019-10-06 13:59:18 +02:00
zdenop
6d171b889c Merge pull request #2686 from stweil/boxfilename
Extend function BoxFileName to handle more common image names
2019-10-05 16:53:34 +02:00
Stefan Weil
7a7704bc94 Extend function BoxFileName to handle more common image names
The function derives the file name for the .box file from an image name.

For training from existing line images, it is useful to directly support
the image names which are commonly used.

While generated images for Tesseract training typically use the name
pattern NAME.tif, other ground truth sets use NAME.bin.png for binarized
or NAME.nrm.png for grayscale images.

BoxFileName is also now a local function as it is only used locally.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-05 15:59:56 +02:00
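The name mapping described in this commit can be sketched as follows. This is a hypothetical helper (BoxFileNameSketch), not the code of Tesseract's BoxFileName. It assumes that NAME.bin.png and NAME.nrm.png both map to NAME.box and that any other image name simply has its last extension replaced by .box.

```cpp
#include <string>

// Returns true if `name` ends with `suffix`.
static bool EndsWith(const std::string& name, const std::string& suffix) {
  return name.size() >= suffix.size() &&
         name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical sketch: derive the .box file name from an image file name.
static std::string BoxFileNameSketch(const std::string& image_name) {
  std::string base = image_name;
  // Double extensions used by ground truth sets (binarized / grayscale).
  const std::string kGroundTruthSuffixes[] = {".bin.png", ".nrm.png"};
  for (const std::string& suffix : kGroundTruthSuffixes) {
    if (EndsWith(base, suffix)) {
      base.erase(base.size() - suffix.size());
      return base + ".box";
    }
  }
  // Generic case such as NAME.tif: drop the last extension, but only if the
  // dot belongs to the file name and not to a directory component.
  const auto dot = base.find_last_of('.');
  const auto slash = base.find_last_of('/');
  if (dot != std::string::npos && (slash == std::string::npos || dot > slash)) {
    base.erase(dot);
  }
  return base + ".box";
}
```

With this sketch, line1.tif and gt/line1.bin.png map to line1.box and gt/line1.box respectively. Declaring the helper `static` mirrors the commit's note that the function only needs local (file) scope.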
zdenop
84c410a8e3 Merge pull request #2690 from vidiecan/master
Optional speed optimisation
2019-10-04 13:02:51 +02:00
jm
fb150265ef Speed optimisation: add an option to disable automatic inversion of line images
2019-10-04 10:09:52 +02:00
Stefan Weil
6b35d6ff6e Fix comment which referred to unused Tesseract parameter
This completes commit aa2ab68e29.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-03 09:23:25 +02:00
Johannes Künsebeck
aa2ab68e29 Removed unused parameters
The following parameters are not used anywhere anymore:

 * use_definite_ambigs_for_classifier
 * max_viterbi_list_size
 * word_to_debug_lengths
 * fragments_debug
 * tessedit_redo_xheight
 * debug_acceptable_wds
 * tessedit_matcher_log
 * tessedit_test_adaption_mode
 * docqual_excuse_outline_errs
 * crunch_pot_garbage
 * suspect_space_level
 * tessedit_consistent_reps
 * wordrec_display_all_words
 * wordrec_no_block
 * wordrec_worst_state
 * fragments_guide_chopper
 * segment_adjust_debug
 * classify_adapt_feature_thresh (classify_adapt_feature_threshold still exists)
 * classify_adapt_proto_thresh (classify_adapt_proto_threshold still exists)
 * classify_min_norm_scale_x
 * classify_max_norm_scale_x
 * classify_min_norm_scale_y
 * classify_max_norm_scale_y
 * il1_adaption_test
 * textord_blob_size_bigile
 * textord_blob_size_smallile
 * editor_debug_config_file
 * textord_tabfind_show_color_fit

The list was generated by a Python script, and each parameter occurrence was
checked manually.
2019-10-03 09:18:29 +02:00
Egor Pugin
8095e6c1c3 Merge pull request #2685 from stweil/lstm.train
Don't create OCR result files when training data is created
2019-10-02 22:00:41 +03:00
Stefan Weil
1e84a6f225 Don't create OCR result files when training data is created
The configuration file lstm.train causes Tesseract to generate
training data for training of an LSTM line recognizer.

In this mode, no other files with OCR results should be written.
Without this patch, Tesseract writes a small text file.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-02 19:29:27 +02:00
Egor Pugin
445d06375d Merge pull request #2134 from stweil/curl
RFC: Add support for image or image list by URL
2019-10-01 16:29:43 +03:00
Stefan Weil
94651e65ce Simplify configure.ac
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-01 12:32:08 +02:00
Stefan Weil
286d8275c7 Add support for image or image list by URL
This allows OCR of images from the internet without downloading them first:

    tesseract http://IMAGE_URL OUTPUT ...

It uses libcurl.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-01 12:10:45 +02:00
Egor Pugin
da0fa73e77 Merge pull request #2678 from stweil/warnings
Fix some clang compiler warnings
2019-10-01 12:50:55 +03:00
Stefan Weil
47d70d7014 Modernize code for LIST (fix some -Wold-style-cast warnings)
- Use C++ type casts
- Remove unneeded type cast
- Simplify code for function pop
- Remove macro push_on (it was only used once)

This fixes lots of compiler warnings caused by old type casts.
2019-10-01 11:12:00 +02:00
Stefan Weil
672d67859f mfoutline: Modernize code
- Use C++ enums
- Use strongly typed C++11 enum for DIRECTION and optimize struct MFEDGEPT
- Use float constant for MF_SCALE_FACTOR
- Replace macros by inline functions
- Fix documentation comment

This fixes several warnings from clang.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-30 21:33:15 +02:00
Stefan Weil
7ec5f0ca02 intmatcher: Avoid conversion from double to float and vice versa
This fixes some clang warnings:

    src/classify/intmatcher.cpp:48:49: warning:
      implicit conversion loses floating-point precision:
      'double' to 'const float' [-Wimplicit-float-conversion]
    src/classify/intmatcher.cpp:405:34: warning:
      implicit conversion loses floating-point precision:
      'double' to 'float' [-Wimplicit-float-conversion]
    src/classify/intmatcher.cpp:405:64: warning:
      implicit conversion increases floating-point precision:
      'float' to 'double' [-Wdouble-promotion]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-30 18:05:26 +02:00
Stefan Weil
6d259ebe44 Remove unneeded compare statement (-Wtautological-unsigned-enum-zero-compare)
This fixes a clang warning:

    src/ccstruct/polyblk.cpp:412:12: warning: result of comparison of
      unsigned enum expression >= 0 is always true
      [-Wtautological-unsigned-enum-zero-compare]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-29 22:13:27 +02:00
Stefan Weil
b3b740eb22 cmake: Set default build type to Release
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-29 13:53:44 +02:00
zdenop
021f4d553b Merge branch 'master' of https://github.com/tesseract-ocr/tesseract
2019-09-28 14:38:36 +02:00
zdenop
e8e77957ae cmake: AUTO_OPTIMIZE: allow turning off the auto-optimize macros
2019-09-28 14:37:05 +02:00
zdenop
573dc31adb cmake: arch files: fix duplicate build and follow autotools logic
2019-09-28 14:35:44 +02:00
Stefan Weil
49e351508c Re-add strngs.h to public API
It is still needed.
This partially reverts commit a730b5c4ff.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-28 10:34:48 +02:00
Stefan Weil
8ad86d6494 Add missing linker flags for TensorFlow
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-28 09:42:37 +02:00
zdenop
8a62d49914 cmake: auto optimize macros must be run before setting CMAKE_CXX_FLAGS*
2019-09-28 08:27:43 +02:00
zdenop
21680fa75b cmake: fix build type flags
2019-09-27 23:56:36 +02:00
zdenop
d6aa866430 ignore #pragma optimize for clang-cl
2019-09-27 21:19:37 +02:00
zdenop
b1f7047a5f cmake: remove moved (training) header from installation
2019-09-27 21:08:17 +02:00
Egor Pugin
52cf4615dc Update sw build.
2019-09-26 00:34:36 +03:00
Egor Pugin
9217aa5c95 Update sw build.
2019-09-26 00:22:07 +03:00
Egor Pugin
ac0190bfaa Merge pull request #2677 from stweil/vecfuncs
Remove vecfuncs.cpp and vecfunc.h
2019-09-25 23:33:01 +03:00
Stefan Weil
74d5ce82a6 Remove vecfuncs.cpp and vecfunc.h
Replace the macros which were declared in vecfuncs.h by member functions
and move a function which was only used in chop.cpp to that file.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-25 21:20:03 +02:00
Stefan Weil
eec9c96767 Remove member functions STRING::string and StringParam::string (continued)
Commit 994ec697d8 did not update unittest.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-25 10:07:51 +02:00
Stefan Weil
7bddad59d1 Optimize class ChoiceIterator
Re-order a class variable to avoid memory holes and
remove unused class variables.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-25 09:43:57 +02:00
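Re-ordering members avoids memory holes because the compiler pads each member up to its alignment. The following is a minimal sketch with hypothetical members, not the real fields of ChoiceIterator. On a typical 64-bit ABI, the interleaved layout occupies 32 bytes and the reordered one 24, although exact sizes are ABI-dependent.

```cpp
// Hypothetical layouts for illustration only.
// Interleaving 1-byte and 8-byte members leaves padding "holes".
struct InterleavedMembers {
  bool first_flag;    // 1 byte, then padding up to the next 8-byte boundary
  double rating;      // 8 bytes
  bool second_flag;   // 1 byte, then padding again
  const void* word;   // 8 bytes on LP64 / LLP64
};

// Same members, largest first: both bools share one padded tail slot.
struct ReorderedMembers {
  double rating;
  const void* word;
  bool first_flag;
  bool second_flag;
};

static_assert(sizeof(ReorderedMembers) <= sizeof(InterleavedMembers),
              "ordering members by decreasing alignment should not grow the struct");
```

Sorting members by decreasing alignment is a simple rule of thumb. It changes only the layout, not the behavior of the class.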
zdenop
b5c1fcc9bf Merge pull request #2673 from noahmetzger/LSTMChoiceRIL
Fixed minor bug in ChoiceIterator when lstm_choice_mode isn't active.
2019-09-24 15:48:27 +02:00
Noah Metzger
ff4c1d204d Fixed minor bug with the Choice iterator when lstm_choice_mode is not active.
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-09-24 15:38:28 +02:00
Egor Pugin
cb0c024a6f Merge pull request #2672 from stweil/api
Remove member functions STRING::string and StringParam::string
2019-09-24 01:31:18 +03:00
Stefan Weil
994ec697d8 Remove member functions STRING::string and StringParam::string
They were redundant because the existing member functions 'c_str' do the same.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-23 08:33:08 +02:00
Egor Pugin
1fa7324cf7 Merge pull request #2668 from stweil/api
Remove STRING from the public Tesseract API
2019-09-23 01:02:26 +03:00