
4220 Commits

Author SHA1 Message Date
zdenop
21c83b8036 Use "C" locale for printing parameters
This fixes a test for the Python wrapper `tesserocr` (python setup.py test).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-11 09:05:56 +01:00
zdenop
6d8ef9a168 fix using bilevel tiff in pdf output 2019-11-11 09:02:35 +01:00
Stefan Weil
eaf1f69679 Fix issue #2748
Commit 94d0f77f56 tried to fix issue #2741
but created a new problem.

This commit should fix both the old and the new issue.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-11 09:01:00 +01:00
Stefan Weil
1ce3cd2805 Use BRT_UNKNOWN instead of BRT_NOISE to initialize ColPartition::blob_type_
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-11 09:00:46 +01:00
Stefan Weil
d20db0f258 Add missing libraries in configuration for pkg-config
This fixes linker errors in third-party software like tesserocr for builds
which use any of these libraries.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-11 08:58:28 +01:00
maungd@battelle.org
41ec627199 Exposed the text2image option --ptsize to tesstrain.sh. Text2image has the
option --ptsize which defaults to 12.  This option is not exposed through
tesstrain.sh; thus, you cannot use tesstrain.sh to explore training with
different font sizes.  I made a small modification to expose the --ptsize
option to tesstrain.sh.  It defaults to 12 if not specified.
2019-11-11 08:58:16 +01:00
zdenop
975c626dc3 Fail if no valid lstmf file was written (fix issue #2741)
Signed-off-by: Stefan Weil <sw@weilnetz.de>

# Conflicts:
#	src/ccmain/linerec.cpp
2019-11-11 08:58:03 +01:00
Stefan Weil
185d237c2e Don't create an empty lstmf file
If Tesseract cannot find text in the input image, it should not write
an empty lstmf file. This problem was reported in issue #2741.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-11 08:52:42 +01:00
Stefan Weil
2f011aece6 Use pre-calculated lookup tables for all C++ compilers
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-11 08:52:11 +01:00
Stefan Weil
13237d8566 Fix build for Intel Compiler (issue #2736)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-11 08:50:35 +01:00
zdenop
87291fff62 Improve ABI compatibility with version 4.1.0 2019-11-10 20:27:40 +01:00
zdenop
960583ca56 partly Revert "Remove global array kPolyBlockNames from Tesseract library" to improve backwards API compatibility
This reverts commit c3d4742af6.

# Conflicts:
#	src/ccstruct/Makefile.am
2019-11-10 19:47:36 +01:00
zdenop
c423ad4215 partly Revert "Add more initial values for class Classify from constructor to header file" to improve backwards API compatibility
This reverts commit 601ee34276.
2019-11-10 19:27:16 +01:00
zdenop
b36cd63922 Revert "Delete copy constructor and assignment operator for TessBaseAPI (fix issue #874)"
This reverts commit 4bbfabaa67.
2019-11-05 08:39:17 +01:00
zdenop
d423fb16b2 add cppan deprecation info 2019-11-02 00:27:10 +01:00
zdenop
979b17f20e fix sw build (add missing part from Move LSTMTrainer from libtesseract to libtesseract_training) 2019-11-02 00:25:10 +01:00
zdenop
b808997958 fix cmake training build 2019-11-01 22:47:03 +01:00
zdenop
a2444bc55e fix string conversion in lstmtrainer 2019-11-01 21:16:33 +01:00
zdenop
cca107d5f6 fix autotools build of tesseract library 2019-11-01 17:36:13 +01:00
zdenop
2b4212c206 add untracked src/training/lstmtrainer.h 2019-11-01 16:23:26 +01:00
zdenop
0cbd8297e4 4.1.1 Release Candidate 1 2019-11-01 15:35:29 +01:00
zdenop
dadf1329e6 cmake: fix clang OpenMP build on Windows 2019-11-01 15:28:53 +01:00
zdenop
710fa82fb0 fix inverting (bilevel BW PNG) in PDF; fixes #2059 2019-11-01 15:18:09 +01:00
zdenop
f27ca3e348 Add pageseg_apply_music_mask option to allow disabling the music mask
# Conflicts:
#	src/ccmain/tesseractclass.cpp
#	src/ccmain/tesseractclass.h
2019-11-01 15:16:29 +01:00
Shree
6a671e39df remove legacy parameter disable_character_fragments from lstm.train 2019-11-01 15:08:52 +01:00
wshwang
cdd2a887f4 src/ccutil/bits16.h remove warnings (#2726) 2019-11-01 15:08:20 +01:00
wshwang
23669398f9 Remove warning C4312 2019-11-01 15:08:14 +01:00
zdenop
9295381d0f training: show error description for open/delete file 2019-11-01 15:08:05 +01:00
Stefan Weil
4bbfabaa67 Delete copy constructor and assignment operator for TessBaseAPI (fix issue #874)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:07:59 +01:00
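Deleting the copy constructor and copy assignment operator turns accidental copies of a resource-owning object into compile-time errors. A minimal sketch of the technique, using a hypothetical `EngineHandle` class (not the actual `TessBaseAPI` declaration):

```cpp
#include <type_traits>

// Hypothetical stand-in for a class that owns internal engine state.
class EngineHandle {
 public:
  EngineHandle() = default;
  // Copying would duplicate ownership of internal state, so forbid it.
  EngineHandle(const EngineHandle&) = delete;
  EngineHandle& operator=(const EngineHandle&) = delete;
};

// Checked at compile time: both copy operations are unavailable.
static_assert(!std::is_copy_constructible<EngineHandle>::value,
              "copy constructor is deleted");
static_assert(!std::is_copy_assignable<EngineHandle>::value,
              "copy assignment is deleted");
```

Because no move constructor is declared, the class is not movable either. Existing caller code that copied such an object stops compiling, which is an API change; the log above shows this commit was later reverted (b36cd63922).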
Stefan Weil
e3441f0ced Copy resolution of source image (fix issue #1702)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:07:20 +01:00
Stefan Weil
090d3c4b4c Fix typo in README.md (found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:07:14 +01:00
zdenop
7488c85ef1 fix memory leak at PangoFontInfo::CanRenderString 2019-11-01 15:07:06 +01:00
Hyeonguk Ryu
080f83a17c Change from HTTP to HTTPS 2019-11-01 15:06:57 +01:00
zdenop
4a5ec186f6 test for synthesized font faces. 2019-11-01 15:06:48 +01:00
zdenop
f9d1bda7e3 cmake: add minimum required version for pango and icu based on autotools 2019-11-01 15:06:41 +01:00
zdenop
99645e3c40 text2image: show pango version 2019-11-01 15:06:35 +01:00
Stefan Weil
077616fd36 quadlsq: Fix warnings from LGTM
Fix two occurrences of this LGTM warning:

    Multiplication result may overflow 'double'
      before it is converted to 'long double'.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:06:30 +01:00
Stefan Weil
02d916a64f Use "C" locale for PDF output
This fixes wrong output of integers with locale de_DE.UTF-8:

    -  /Width 2.481
    -  /Height 3.508
    +  /Width 2481
    +  /Height 3508

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:06:23 +01:00
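The integer formatting problem above can be reproduced with C++ streams. This is an illustrative sketch under assumptions, not Tesseract's code: it assumes the output is built with `std::ostringstream` and that the global C++ locale uses '.' as a digit group separator. `GermanLikeNumpunct`, `PdfWidthEntry`, `PdfWidthEntryC` and `UseGermanLikeGlobalLocale` are hypothetical names, and a custom `numpunct` facet stands in for de_DE.UTF-8 so the example does not depend on installed locales.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet mimicking de_DE digit grouping: '.' as the
// thousands separator, groups of three digits.
struct GermanLikeNumpunct : std::numpunct<char> {
 protected:
  char do_thousands_sep() const override { return '.'; }
  std::string do_grouping() const override { return "\3"; }
};

// Locale-sensitive: the stream copies the global C++ locale that is
// active when it is constructed.
std::string PdfWidthEntry(int width) {
  std::ostringstream out;
  out << "/Width " << width;
  return out.str();
}

// Locale-independent: imbue the classic "C" locale so integers are
// written without grouping, whatever the global locale is.
std::string PdfWidthEntryC(int width) {
  std::ostringstream out;
  out.imbue(std::locale::classic());
  out << "/Width " << width;
  return out.str();
}

// Installs the German-like grouping as the global C++ locale, as a host
// application might do.
void UseGermanLikeGlobalLocale() {
  std::locale::global(
      std::locale(std::locale::classic(), new GermanLikeNumpunct));
}
```

After `UseGermanLikeGlobalLocale()`, `PdfWidthEntry(2481)` yields `/Width 2.481` while `PdfWidthEntryC(2481)` yields `/Width 2481`. Imbuing the classic locale per stream fixes the output without changing the global locale that the host application may rely on.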
Stefan Weil
efc2b7601a Use "C" locale for ALTO output
This fixes wrong output of integers with locale de_DE.UTF-8:

    - <Page WIDTH="2.481" HEIGHT="3.508" PHYSICAL_IMG_NR="0" ID="page_0">
    + <Page WIDTH="2481" HEIGHT="3508" PHYSICAL_IMG_NR="0" ID="page_0">

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:06:18 +01:00
Stefan Weil
83af58c2aa Fix build error (undefined local variable)
The latest commit 96025c7923 was incomplete.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:06:11 +01:00
Stefan Weil
6a8be20b4c Remove unimplemented +/- for parameter files
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:06:02 +01:00
zdenop
c7e9f31b0a do not exit if a nonexistent parameter is used (fixes #1334)
# Conflicts:
#	src/ccmain/tessedit.cpp
2019-11-01 15:05:47 +01:00
zdenop
03a0586931 Report when tesseract legacy engine not present. (fix issue #2053) 2019-11-01 15:02:58 +01:00
Egor Pugin
17828fc7e8 Fix isolated build. 2019-11-01 15:02:49 +01:00
Stefan Weil
9c52eb0cba Add new parameter "document_title" to set the title in OCR output files
The title can be set for hOCR and PDF output.

Currently it is also used for ALTO, so setting the title can be used
as a workaround for issue #2700.

The constant unknown_title_ is no longer needed and therefore removed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:02:42 +01:00
Stefan Weil
e43eb9104c sw.cpp: Sync list of public headers with Autotools build
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 15:02:32 +01:00
zdenop
fc45fc51b5 CMake: Sync list of public headers with Autotools build 2019-11-01 15:02:20 +01:00
jm
a5670421a7 speed optimisation - add the option to disable automatic inverting of line images 2019-11-01 14:58:59 +01:00
Stefan Weil
ebff4dae35 Fix comment which referred to unused Tesseract parameter
This completes commit aa2ab68e29.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:58:50 +01:00
zdenop
b244bd1c13 Removed unused parameters
The following parameters are not used anywhere anymore:

 * use_definite_ambigs_for_classifier
 * max_viterbi_list_size
 * word_to_debug_lengths
 * fragments_debug
 * tessedit_redo_xheight
 * debug_acceptable_wds
 * tessedit_matcher_log
 * tessedit_test_adaption_mode
 * docqual_excuse_outline_errs
 * crunch_pot_garbage
 * suspect_space_level
 * tessedit_consistent_reps
 * wordrec_display_all_words
 * wordrec_no_block
 * wordrec_worst_state
 * fragments_guide_chopper
 * segment_adjust_debug
 * classify_adapt_feature_thresh (classify_adapt_feature_threshold still exists)
 * classify_adapt_proto_thresh (classify_adapt_proto_threshold still exists)
 * classify_min_norm_scale_x
 * classify_max_norm_scale_x
 * classify_min_norm_scale_y
 * classify_max_norm_scale_y
 * il1_adaption_test
 * textord_blob_size_bigile
 * textord_blob_size_smallile
 * editor_debug_config_file
 * textord_tabfind_show_color_fit

The list was generated by a Python script, and each parameter occurrence was checked
manually.

# Conflicts:
#	src/classify/classify.cpp
2019-11-01 14:58:36 +01:00