Stefan Weil
7a7704bc94
Extend function BoxFileName to handle more common image names
...
The function derives the file name for the .box file from an image name.
For training from existing line images, it is useful to directly support
the image names which are commonly used.
While generated images for Tesseract training typically use the name
pattern NAME.tif, other ground truth sets use NAME.bin.png for binarized
or NAME.nrm.png for grayscale images.
BoxFileName is also now a local function as it is only used locally.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-05 15:59:56 +02:00
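The name derivation described in commit 7a7704bc94 can be sketched roughly as follows. This is only an illustration, not the code from the commit: the function name BoxFileNameSketch, the suffix table, and the fallback for unknown suffixes are assumptions; only the three name patterns come from the commit message.

```cpp
#include <string>

// Hypothetical helper (not the Tesseract implementation): derive the name
// of the .box file from an image name by stripping one known image suffix.
inline std::string BoxFileNameSketch(const std::string& image_name) {
  // Only the suffixes named in the commit message are handled here.
  static const char* const kSuffixes[] = {".bin.png", ".nrm.png", ".tif"};
  std::string base = image_name;
  for (const char* suffix : kSuffixes) {
    const std::string s(suffix);
    // Check whether the name ends with this suffix.
    if (base.size() >= s.size() &&
        base.compare(base.size() - s.size(), s.size(), s) == 0) {
      base.erase(base.size() - s.size());
      break;
    }
  }
  // Assumption: names with an unknown suffix simply get ".box" appended.
  return base + ".box";
}
```

With this sketch, both "line1.tif" and "line1.bin.png" map to "line1.box".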
zdenop
84c410a8e3
Merge pull request #2690 from vidiecan/master
...
Optional speed optimisation
2019-10-04 13:02:51 +02:00
jm
fb150265ef
speed optimisation - add the option to disable automatic inverting of line images
2019-10-04 10:09:52 +02:00
Stefan Weil
6b35d6ff6e
Fix comment which referred to unused Tesseract parameter
...
This completes commit aa2ab68e29.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-03 09:23:25 +02:00
Johannes Künsebeck
aa2ab68e29
Removed unused parameters
...
The following parameters are not used anywhere anymore:
* use_definite_ambigs_for_classifier
* max_viterbi_list_size
* word_to_debug_lengths
* fragments_debug
* tessedit_redo_xheight
* debug_acceptable_wds
* tessedit_matcher_log
* tessedit_test_adaption_mode
* docqual_excuse_outline_errs
* crunch_pot_garbage
* suspect_space_level
* tessedit_consistent_reps
* wordrec_display_all_words
* wordrec_no_block
* wordrec_worst_state
* fragments_guide_chopper
* segment_adjust_debug
* classify_adapt_feature_thresh (classify_adapt_feature_threshold still exists)
* classify_adapt_proto_thresh (classify_adapt_proto_threshold still exists)
* classify_min_norm_scale_x
* classify_max_norm_scale_x
* classify_min_norm_scale_y
* classify_max_norm_scale_y
* il1_adaption_test
* textord_blob_size_bigile
* textord_blob_size_smallile
* editor_debug_config_file
* textord_tabfind_show_color_fit
The list was generated by a Python script and each parameter occurrence was
checked manually.
2019-10-03 09:18:29 +02:00
Egor Pugin
8095e6c1c3
Merge pull request #2685 from stweil/lstm.train
...
Don't create OCR result files when training data is created
2019-10-02 22:00:41 +03:00
Stefan Weil
1e84a6f225
Don't create OCR result files when training data is created
...
The configuration file lstm.train causes Tesseract to generate
training data for training of an LSTM line recognizer.
In this mode, no other files with OCR results should be written.
Without this patch, Tesseract writes a small text file.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-02 19:29:27 +02:00
Egor Pugin
445d06375d
Merge pull request #2134 from stweil/curl
...
RFC: Add support for image or image list by URL
2019-10-01 16:29:43 +03:00
Stefan Weil
94651e65ce
Simplify configure.ac
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-01 12:32:08 +02:00
Stefan Weil
286d8275c7
Add support for image or image list by URL
...
This allows OCR of images from the internet without downloading them first:
tesseract http://IMAGE_URL OUTPUT ...
It uses libcurl.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-10-01 12:10:45 +02:00
Egor Pugin
da0fa73e77
Merge pull request #2678 from stweil/warnings
...
Fix some clang compiler warnings
2019-10-01 12:50:55 +03:00
Stefan Weil
47d70d7014
Modernize code for LIST (fix some -Wold-style-cast warnings)
...
- Use C++ type casts
- Remove unneeded type cast
- Simplify code for function pop
- Remove macro push_on (it was only used once)
This fixes lots of compiler warnings caused by old type casts.
2019-10-01 11:12:00 +02:00
Stefan Weil
672d67859f
mfoutline: Modernize code
...
- Use C++ enums
- Use strongly typed C++11 enum for DIRECTION and optimize struct MFEDGEPT
- Use float constant for MF_SCALE_FACTOR
- Replace macros by inline functions
- Fix documentation comment
This fixes several warnings from clang.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-30 21:33:15 +02:00
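The kind of modernization listed in commit 672d67859f might look like the sketch below. All names and values (Direction, EdgePointWide, EdgePointCompact, kScaleFactor, Scale) are hypothetical and are not the definitions in mfoutline.h; the sketch only shows how a strongly typed enum with a small underlying type can shrink a struct, and how a float constant and an inline function can replace macros.

```cpp
#include <cstdint>

// Hypothetical strongly typed enum with an explicit 1-byte underlying type.
enum class Direction : std::uint8_t { North, East, South, West };

// Layout with a plain int used for the direction.
struct EdgePointWide {
  float x, y;
  int direction;
  bool hidden;
};

// Same fields, but the direction now needs only one byte.
struct EdgePointCompact {
  float x, y;
  Direction direction;
  bool hidden;
};

static_assert(sizeof(Direction) == 1, "enum uses a 1-byte underlying type");
static_assert(sizeof(EdgePointCompact) <= sizeof(EdgePointWide),
              "the compact layout is not larger than the wide one");

// Float constant instead of a double-valued macro (the value is made up).
constexpr float kScaleFactor = 0.5f;

// Inline function instead of a function-like macro.
inline float Scale(float value) { return value * kScaleFactor; }
```

The exact size saving depends on the platform's alignment rules, so the sketch only asserts that the compact struct is not larger.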
Stefan Weil
7ec5f0ca02
intmatcher: Avoid conversion from double to float and vice versa
...
This fixes some clang warnings:
src/classify/intmatcher.cpp:48:49: warning:
implicit conversion loses floating-point precision:
'double' to 'const float' [-Wimplicit-float-conversion]
src/classify/intmatcher.cpp:405:34: warning:
implicit conversion loses floating-point precision:
'double' to 'float' [-Wimplicit-float-conversion]
src/classify/intmatcher.cpp:405:64: warning:
implicit conversion increases floating-point precision:
'float' to 'double' [-Wdouble-promotion]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-30 18:05:26 +02:00
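The two warning classes quoted in commit 7ec5f0ca02 typically arise when float variables are mixed with double literals. The sketch below uses hypothetical names and values, not the code from intmatcher.cpp, and shows one common way to stay in single precision.

```cpp
// Hypothetical constant: the 'f' suffix keeps the literal a float, so no
// double-to-float conversion takes place in the initialization.
constexpr float kHalf = 0.5f;

// Hypothetical function: every operand is a float, so the expression is
// not promoted to double and not narrowed back to float on return.
inline float HalveFloat(float value) { return value * kHalf; }
```

Writing `0.5` instead of `0.5f` in such code would make the literal a double, which is the kind of pattern -Wimplicit-float-conversion and -Wdouble-promotion report.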
Stefan Weil
6d259ebe44
Remove unneeded compare statement (-Wtautological-unsigned-enum-zero-compare)
...
This fixes a clang warning:
src/ccstruct/polyblk.cpp:412:12: warning: result of comparison of
unsigned enum expression >= 0 is always true
[-Wtautological-unsigned-enum-zero-compare]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-29 22:13:27 +02:00
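The warning fixed in commit 6d259ebe44 can be reproduced with a small sketch. The enum Color and the function IsValidColor are hypothetical and unrelated to polyblk.cpp; they only show why a `>= 0` test on an unsigned enum is redundant.

```cpp
// Hypothetical enum with an unsigned underlying type: no value of this
// type can be negative.
enum Color : unsigned { kRed, kGreen, kBlue, kColorCount };

inline bool IsValidColor(Color c) {
  // Before the fix, code of this kind would read:
  //   return c >= 0 && c < kColorCount;
  // The first comparison is always true, so only the upper bound remains.
  return c < kColorCount;
}
```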
Stefan Weil
b3b740eb22
cmake: Set default build type to Release
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-29 13:53:44 +02:00
zdenop
021f4d553b
Merge branch 'master' of https://github.com/tesseract-ocr/tesseract
2019-09-28 14:38:36 +02:00
zdenop
e8e77957ae
cmake: AUTO_OPTIMIZE: enable to turn-off auto optimize macros
2019-09-28 14:37:05 +02:00
zdenop
573dc31adb
cmake: arch files: fix duplicate build and follow autotools logic
2019-09-28 14:35:44 +02:00
Stefan Weil
49e351508c
Re-add strngs.h to public API
...
It is still needed.
This partially reverts commit a730b5c4ff.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-28 10:34:48 +02:00
Stefan Weil
8ad86d6494
Add missing linker flags for TensorFlow
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-28 09:42:37 +02:00
zdenop
8a62d49914
cmake: auto optimize macros must be run before setting CMAKE_CXX_FLAGS*
2019-09-28 08:27:43 +02:00
zdenop
21680fa75b
cmake: fix build type flags
2019-09-27 23:56:36 +02:00
zdenop
d6aa866430
ignore #pragma optimize for clang-cl
2019-09-27 21:19:37 +02:00
zdenop
b1f7047a5f
cmake: remove moved (training) header from installation
2019-09-27 21:08:17 +02:00
Egor Pugin
52cf4615dc
Update sw build.
2019-09-26 00:34:36 +03:00
Egor Pugin
9217aa5c95
Update sw build.
2019-09-26 00:22:07 +03:00
Egor Pugin
ac0190bfaa
Merge pull request #2677 from stweil/vecfuncs
...
Remove vecfuncs.cpp and vecfunc.h
2019-09-25 23:33:01 +03:00
Stefan Weil
74d5ce82a6
Remove vecfuncs.cpp and vecfunc.h
...
Replace the macros which were declared in vecfuncs.h by member functions
and move a function which was only used in chop.cpp to that file.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-25 21:20:03 +02:00
Stefan Weil
eec9c96767
Remove member functions STRING::string and StringParam::string (continued)
...
Commit 994ec697d8 did not update unittest.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-25 10:07:51 +02:00
Stefan Weil
7bddad59d1
Optimize class ChoiceIterator
...
Re-order a class variable to avoid memory holes and
remove unused class variables.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-25 09:43:57 +02:00
zdenop
b5c1fcc9bf
Merge pull request #2673 from noahmetzger/LSTMChoiceRIL
...
Fixed minor bug in ChoiceIterator when lstm_choice_mode isn't active.
2019-09-24 15:48:27 +02:00
Noah Metzger
ff4c1d204d
Fixed minor bug with the Choice iterator when lstm_choice_mode is not active.
...
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>
2019-09-24 15:38:28 +02:00
Egor Pugin
cb0c024a6f
Merge pull request #2672 from stweil/api
...
Remove member functions STRING::string and StringParam::string
2019-09-24 01:31:18 +03:00
Stefan Weil
994ec697d8
Remove member functions STRING::string and StringParam::string
...
They were redundant because there exist member functions 'c_str' which do the same.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-23 08:33:08 +02:00
Egor Pugin
1fa7324cf7
Merge pull request #2668 from stweil/api
...
Remove STRING from the public Tesseract API
2019-09-23 01:02:26 +03:00
amitdo
0598879a00
Disable legacy build: Disable bitvec.h
2019-09-22 20:37:13 +02:00
Stefan Weil
a730b5c4ff
Remove STRING from the public Tesseract API
...
Removing STRING from genericvector.h allows eliminating the proprietary
STRING data type from the public Tesseract API.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-22 20:32:28 +02:00
Stefan Weil
8cb677d6a2
Replace STRING arguments for LoadDataFromFile and SaveDataToFile
...
This is a step to eliminate the proprietary STRING data type
from the public Tesseract API.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-22 20:32:28 +02:00
zdenop
b99d6e8c5b
Merge pull request #2666 from amitdo/legacy-clean
...
Disable legacy build: Disable more unneeded code
2019-09-22 20:06:44 +02:00
amitdo
1e13d1d4d5
Disable legacy build: Disable more unneeded code
2019-09-22 20:55:24 +03:00
zdenop
39a63c2837
Merge pull request #2663 from bertsky/fix-lstm-user-patterns
...
fix langdata (user words/patterns) file suffixes for LSTMs:
2019-09-20 15:32:54 +02:00
Stefan Weil
0c7cc5a4dd
Fix CID 1405673 part 2 (Uninitialized members)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-19 19:37:05 +02:00
zdenop
86fe8bd39f
Disabled legacy build: Disable more unneeded code (#2662)
...
Disabled legacy build: Disable more unneeded code
2019-09-19 19:35:20 +02:00
Robert Schubert
5b976bfb55
fix langdata (user words/patterns) file suffixes for LSTMs:
...
- add another constructor for LSTMRecognizer
which takes the language_data_path_prefix configured/selected
at runtime and passes it to the internal CCUtil
- use this in Tesseract::init_tesseract_lang_data when LSTMs
are available
(this was missing from 297d7d86ce)
2019-09-19 19:30:54 +02:00
amitdo
479a7b1ca0
Disabled legacy build: Disable more unneeded code
2019-09-19 19:00:13 +03:00
Stefan Weil
3b030b4aeb
Fix CID 1405673 (Uninitialized members)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-17 22:04:08 +02:00
Stefan Weil
85e8529a2e
Fix CID 1164624 (Uninitialized members)
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-17 21:59:42 +02:00
Stefan Weil
b2999d8190
Fix comment for Textord::make_prop_words
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-16 15:03:45 +02:00
Stefan Weil
256701e2e0
Re-order initialisation in constructor of class ViterbiStateEntry
...
This fixes compiler warnings caused by
commit 091ce345f6:
src/wordrec/lm_state.h:100:7: warning: field 'cost'
will be initialized after field 'curr_b' [-Wreorder]
src/wordrec/lm_state.h:104:7: warning: field 'top_choice_flags'
will be initialized after field 'dawg_info' [-Wreorder]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-09-16 14:33:32 +02:00
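The -Wreorder warnings addressed in commit 256701e2e0 occur when a constructor's initializer list names members in a different order than they are declared, because C++ always initializes members in declaration order. The struct below is a hypothetical stand-in, not ViterbiStateEntry; only the member names curr_b and cost are borrowed from the warning text, and their types are assumptions.

```cpp
// Hypothetical struct for illustration only.
struct EntrySketch {
  int curr_b;   // declared first, therefore initialized first
  float cost;   // declared second, therefore initialized second

  // The initializer list follows the declaration order. Writing
  // ": cost(c), curr_b(b)" would compile, but the members would still be
  // initialized as curr_b, then cost, and clang would warn with -Wreorder.
  EntrySketch(int b, float c) : curr_b(b), cost(c) {}
};
```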