amitdo
2f8884a64e
Fix autotools build
2019-10-28 21:23:58 +02:00
amitdo
e1bae15547
Fix #include path of public headers
2019-10-28 19:10:30 +02:00
amitdo
dfede8ac01
Move all public headers to include/tesseract
2019-10-28 18:50:31 +02:00
zdenop
cede5b34e7
Add pageseg_apply_music_mask option to allow disabling the music mask (#2732)
2019-10-27 17:02:05 +01:00
zdenop
4a37cde0d9
fix inverting (Bilevel BW png) in pdf; fixes #2059
2019-10-27 14:15:12 +01:00
Nat
52bc15acd9
Add pageseg_apply_music_mask option to allow disabling the music mask
2019-10-24 11:44:05 -05:00
Egor Pugin
048f729785
Merge branch 'master' of github.com-egorpugin:tesseract-ocr/tesseract
2019-10-23 23:30:12 +03:00
Egor Pugin
401e60c54c
Merge pull request #2728 from egorpugin/master
...
Remove TESS_CALL.
2019-10-23 23:29:38 +03:00
Shree
df6b1ce452
remove legacy parameter disable_character_fragments from lstm.train
2019-10-23 13:15:16 +02:00
Egor Pugin
c727b556f0
Remove unneeded TESS_API from source file.
2019-10-23 13:26:46 +03:00
Egor Pugin
e2688c39e9
Remove TESS_CALL.
2019-10-23 13:21:59 +03:00
wshwang
4ee95a615a
src/ccutil/bits16.h remove warnings (#2726)
2019-10-23 11:46:24 +02:00
wshwang
71e291bae5
Remove warning C4312
2019-10-22 13:06:44 +02:00
zdenop
fc629eae3b
training: show error description for open/delete file
2019-10-21 16:31:57 +02:00
Stefan Weil
90bcff3732
Delete copy constructor and assignment operator for TessBaseAPI (fix issue #874)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-21 13:12:36 +02:00
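The following is only an illustrative sketch of the technique named in the commit subject above, not the actual TessBaseAPI change. The class name `Engine` is a hypothetical stand-in for any class that owns internal state and must not be duplicated.

```cpp
#include <type_traits>

// Hypothetical stand-in for a resource-owning class (not TessBaseAPI itself).
class Engine {
 public:
  Engine() = default;
  // Deleting both copy operations turns accidental copies into compile errors.
  Engine(const Engine&) = delete;
  Engine& operator=(const Engine&) = delete;
};

// Compile-time checks that the class can no longer be copied.
static_assert(!std::is_copy_constructible<Engine>::value,
              "copy construction must be disabled");
static_assert(!std::is_copy_assignable<Engine>::value,
              "copy assignment must be disabled");
```

With this in place, code such as `Engine b = a;` is rejected by the compiler instead of producing two objects that share the same internal state.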
zdenop
c2775832de
Merge branch 'master' of https://github.com/tesseract-ocr/tesseract
2019-10-20 22:17:23 +02:00
zdenop
3762feb200
Provide more details for successful running of unittests.
2019-10-20 22:15:21 +02:00
Stefan Weil
a209a6b4b5
Copy resolution of source image (fix issue #1702)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-20 20:45:35 +02:00
Stefan Weil
8be2346c4c
Fix typo in README.md (found by codespell)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-20 18:13:22 +02:00
zdenop
3a5c107345
Merge branch 'master' of https://github.com/tesseract-ocr/tesseract
2019-10-20 16:43:42 +02:00
zdenop
36dc2ccf75
fix memory leak at PangoFontInfo::CanRenderString
2019-10-20 16:43:04 +02:00
Egor Pugin
59e16daf93
Merge pull request #2718 from hyeongukryu/patch-1
...
Change from HTTP to HTTPS
2019-10-19 16:55:57 +03:00
Hyeonguk Ryu
508a965b32
Change from HTTP to HTTPS
2019-10-19 22:41:29 +09:00
zdenop
1ec34378d9
test for synthesized font faces.
2019-10-19 15:05:28 +02:00
zdenop
cbbe45d94b
cmake: add minimum required version for pango and icu based on autotools
2019-10-19 15:00:49 +02:00
zdenop
37c7a5dd82
text2image: show pango version
2019-10-19 14:52:06 +02:00
Stefan Weil
73a38b39d5
quadlsq: Fix warnings from LGTM
...
Fix two occurrences of this LGTM warning:
Multiplication result may overflow 'double'
before it is converted to 'long double'.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-18 12:07:54 +02:00
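A minimal sketch of the kind of fix this LGTM warning usually calls for, assuming the product of two `double` values is stored in a `long double`. The helper name `widened_product` is hypothetical and is not taken from quadlsq.

```cpp
// Hypothetical helper illustrating the warning; not code from quadlsq.
//
// Writing `long double r = a * b;` multiplies in double precision first and
// only then converts the (possibly overflowed) result to long double.
long double widened_product(double a, double b) {
  // Converting one operand first makes the multiplication itself run in
  // long double, so the wider range is available where the type provides it.
  return static_cast<long double>(a) * b;
}
```

On platforms where `long double` has the same representation as `double`, the cast changes nothing numerically, but it still states the intent and silences the warning.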
Stefan Weil
22cf0f854d
Use "C" locale for PDF output
...
This fixes wrong output of integers with locale de_DE.UTF-8:
- /Width 2.481
- /Height 3.508
+ /Width 2481
+ /Height 3508
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-18 11:30:42 +02:00
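The effect described in this commit can be reproduced with a small sketch. It assumes the numbers are written through a C++ output stream; the facet `DotGrouping` and the helper functions are hypothetical and only imitate a locale with German-style digit grouping, so the example does not depend on de_DE.UTF-8 being installed.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet imitating a locale that groups digits as "2.481".
struct DotGrouping : std::numpunct<char> {
  char do_thousands_sep() const override { return '.'; }
  std::string do_grouping() const override { return "\3"; }
};

// Formats a PDF-style "/Width N" entry using the given stream locale.
std::string format_width(int width, const std::locale& loc) {
  std::ostringstream out;
  out.imbue(loc);
  out << "/Width " << width;
  return out.str();
}

// Output as it would look under a grouping locale (the bug).
std::string width_with_grouping(int width) {
  return format_width(width, std::locale(std::locale::classic(), new DotGrouping));
}

// Output with the classic "C" locale imbued on the stream (the fix).
std::string width_with_c_locale(int width) {
  return format_width(width, std::locale::classic());
}
```

Calling `width_with_grouping(2481)` yields `/Width 2.481`, while `width_with_c_locale(2481)` yields `/Width 2481`, matching the before and after lines quoted in the commit message.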
Stefan Weil
914a8e40d6
Use "C" locale for ALTO output
...
This fixes wrong output of integers with locale de_DE.UTF-8:
- <Page WIDTH="2.481" HEIGHT="3.508" PHYSICAL_IMG_NR="0" ID="page_0">
+ <Page WIDTH="2481" HEIGHT="3508" PHYSICAL_IMG_NR="0" ID="page_0">
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-18 11:18:27 +02:00
Stefan Weil
3e8cc203f4
Fix build error (undefined local variable)
...
The latest commit 96025c7923 was incomplete.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-18 11:05:31 +02:00
Egor Pugin
d24c16f767
Merge pull request #2715 from stweil/params
...
Remove unimplemented +/- for parameter files
2019-10-17 22:18:04 +03:00
Stefan Weil
96025c7923
Remove unimplemented +/- for parameter files
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-17 17:14:43 +02:00
zdenop
a3cfd66f37
Do not exit if a nonexistent parameter is used. Fixes #1334
2019-10-15 07:56:22 +02:00
zdenop
0150fc57cc
Report when tesseract legacy engine not present. (fix issue #2053)
2019-10-14 22:55:47 +02:00
Egor Pugin
247cd0edc4
Merge pull request #2705 from stweil/title
...
Add new parameter "document_title" to set the title in OCR output files
2019-10-11 02:23:36 +03:00
Egor Pugin
fb52f43822
Fix isolated build.
2019-10-11 00:46:23 +03:00
Stefan Weil
a1e3150bd7
Add new parameter "document_title" to set the title in OCR output files
...
The title can be set for hOCR and PDF output.
Currently it is also used for ALTO, so setting the title can be used
as a workaround for issue #2700.
The constant unknown_title_ is no longer needed and therefore removed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-10 15:42:52 +02:00
Egor Pugin
8b694b8a71
Merge pull request #2694 from stweil/master
...
sw.cpp: Sync list of public headers with Autotools build
2019-10-07 00:16:12 +03:00
Stefan Weil
1012252004
sw.cpp: Sync list of public headers with Autotools build
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-06 22:30:36 +02:00
amitdo
227501c580
CMake: Sync list of public headers with Autotools build
2019-10-06 13:59:18 +02:00
zdenop
6d171b889c
Merge pull request #2686 from stweil/boxfilename
...
Extend function BoxFileName to handle more common image names
2019-10-05 16:53:34 +02:00
Stefan Weil
7a7704bc94
Extend function BoxFileName to handle more common image names
...
The function derives the file name for the .box file from an image name.
For training from existing line images, it is useful to directly support
the image names which are commonly used.
While generated images for Tesseract training typically use the name
pattern NAME.tif, other ground truth sets use NAME.bin.png for binarized
or NAME.nrm.png for grayscale images.
BoxFileName is also now a local function as it is only used locally.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-05 15:59:56 +02:00
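The name mapping described in this commit can be sketched as follows. This is not the actual BoxFileName code: the function name `box_file_name` is hypothetical, and it is an assumption that both NAME.bin.png and NAME.nrm.png map to NAME.box.

```cpp
#include <string>

// Hypothetical sketch of deriving a .box file name from an image name.
std::string box_file_name(const std::string& image) {
  std::string base = image;
  // Double extensions used by some ground truth sets (assumed to be stripped whole).
  const std::string suffixes[] = {".bin.png", ".nrm.png"};
  for (const std::string& s : suffixes) {
    if (base.size() > s.size() &&
        base.compare(base.size() - s.size(), s.size(), s) == 0) {
      base.resize(base.size() - s.size());
      return base + ".box";
    }
  }
  // Otherwise drop only the last extension, e.g. NAME.tif -> NAME.
  const std::string::size_type dot = base.find_last_of('.');
  if (dot != std::string::npos) base.resize(dot);
  return base + ".box";
}
```

For example, `box_file_name("line.tif")`, `box_file_name("line.bin.png")` and `box_file_name("line.nrm.png")` all return `line.box`. The sketch does not treat directory components specially, so a dot in a directory name combined with an extensionless file name would be mishandled.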
zdenop
84c410a8e3
Merge pull request #2690 from vidiecan/master
...
Optional speed optimisation
2019-10-04 13:02:51 +02:00
jm
fb150265ef
speed optimisation - add the option to disable automatic inverting of line images
2019-10-04 10:09:52 +02:00
Stefan Weil
6b35d6ff6e
Fix comment which referred to unused Tesseract parameter
...
This completes commit aa2ab68e29.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-03 09:23:25 +02:00
Johannes Künsebeck
aa2ab68e29
Removed unused parameters
...
The following parameters are not used anywhere anymore:
* use_definite_ambigs_for_classifier
* max_viterbi_list_size
* word_to_debug_lengths
* fragments_debug
* tessedit_redo_xheight
* debug_acceptable_wds
* tessedit_matcher_log
* tessedit_test_adaption_mode
* docqual_excuse_outline_errs
* crunch_pot_garbage
* suspect_space_level
* tessedit_consistent_reps
* wordrec_display_all_words
* wordrec_no_block
* wordrec_worst_state
* fragments_guide_chopper
* segment_adjust_debug
* classify_adapt_feature_thresh (classify_adapt_feature_threshold still exists)
* classify_adapt_proto_thresh (classify_adapt_proto_threshold still exists)
* classify_min_norm_scale_x
* classify_max_norm_scale_x
* classify_min_norm_scale_y
* classify_max_norm_scale_y
* il1_adaption_test
* textord_blob_size_bigile
* textord_blob_size_smallile
* editor_debug_config_file
* textord_tabfind_show_color_fit
The list was generated by a Python script and each parameter occurrence checked
manually.
2019-10-03 09:18:29 +02:00
Egor Pugin
8095e6c1c3
Merge pull request #2685 from stweil/lstm.train
...
Don't create OCR result files when training data is created
2019-10-02 22:00:41 +03:00
Stefan Weil
1e84a6f225
Don't create OCR result files when training data is created
...
The configuration file lstm.train causes Tesseract to generate
training data for training of an LSTM line recognizer.
In this mode, no other files with OCR results should be written.
Without this patch, Tesseract writes a small text file.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-02 19:29:27 +02:00
Egor Pugin
445d06375d
Merge pull request #2134 from stweil/curl
...
RFC: Add support for image or image list by URL
2019-10-01 16:29:43 +03:00
Stefan Weil
94651e65ce
Simplify configure.ac
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-01 12:32:08 +02:00