clang warnings:
src/ccstruct/pageres.cpp:903:20: warning:
implicit conversion from 'int' to 'float' changes value from
2147483647 to 2147483648 [-Wimplicit-int-float-conversion]
src/ccstruct/pageres.cpp:904:23:
warning: implicit conversion from 'int' to 'float' changes value from
-2147483647 to -2147483648 [-Wimplicit-int-float-conversion]
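The value changes because a float has only a 24-bit significand, so 2147483647 is not representable and rounds to 2147483648. A minimal sketch of the effect follows; the helper names are hypothetical and this is not the code from pageres.cpp.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: the conversion clang warns about, written explicitly.
// INT32_MAX (2^31 - 1) is not representable as float and rounds up to 2^31.
inline float IntMaxAsFloat() {
  return static_cast<float>(std::numeric_limits<int32_t>::max());
}

// Hypothetical helper: true if the value survives an int -> float conversion.
// The comparison is done in double, which represents every int32_t exactly.
inline bool RoundTripsExactly(int32_t value) {
  return static_cast<double>(static_cast<float>(value)) ==
         static_cast<double>(value);
}
```

Making the cast explicit, or doing the arithmetic in double, is one way to silence the warning while keeping the rounding visible to the reader.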
Signed-off-by: Stefan Weil <sw@weilnetz.de>
powerpc64le-linux-gnu-g++ warning:
src/training/mftraining.cpp:209:5: warning:
‘%04d’ directive output may be truncated writing between 4 and 10 bytes
into a region of size 8 [-Wformat-truncation=]
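The warning means that the buffer is smaller than the longest text the directive can produce. A minimal sketch of how truncation can be detected at runtime through the return value of snprintf; the helper name is hypothetical and this is not the code from mftraining.cpp.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: formats value with "%04d" into the given buffer.
// snprintf returns the number of characters the full output would need
// (without the terminating NUL), so a result >= size means truncation.
inline bool FormatPadded(char* buffer, std::size_t size, int value) {
  const int needed = std::snprintf(buffer, size, "%04d", value);
  return needed >= 0 && static_cast<std::size_t>(needed) < size;
}
```

A 32-bit int can need up to 11 characters plus the terminating NUL, so a 12-byte buffer is enough for any value.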
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Those files are C++, and the wrong modeline is not needed at all.
Also remove some empty descriptions and old history from the comments.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
- Remove unused type definitions for TessTextRenderer, ... in capi.h
(they were only used in capi.cpp which now no longer needs them)
- Fix typo in comment
Signed-off-by: Stefan Weil <sw@weilnetz.de>
- Replace AVX_OPT, AVX2_OPT, FMA_OPT, SSE41_OPT
- Replace AVX, AVX2, FMA, SSE4_1
- Write new HAVE_AVX, HAVE_AVX2, HAVE_FMA, HAVE_SSE4_1 into config_auto.h
- Put related conditionals in Makefile.am in one place
This makes the code clearer and fixes a log message in
IntSimdMatrixTest.AVX2.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
`tesseract --version` now also shows the version of libcurl and related
libraries if it was built with libcurl.
The preprocessor macro HAVE_LIBCURL is now defined in config_auto.h.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Commit 94d0f77f56 tried to fix issue #2741
but created a new problem.
This commit should fix both the old and the new issue.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
If Tesseract cannot find text in the input image, it should not write
an empty lstmf file. This problem was reported in issue #2741.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
option --ptsize which defaults to 12. This option is not exposed through
tesstrain.sh; thus, you cannot use tesstrain.sh to explore training with
different font sizes. I made a small modification to expose the --ptsize
option to tesstrain.sh. It defaults to 12 if not specified.
Fix two occurrences of this LGTM warning:
Multiplication result may overflow 'double'
before it is converted to 'long double'.
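The pattern behind this warning can be sketched as follows. The function names are hypothetical and the snippet is not the affected Tesseract code; it only illustrates that widening one operand before the multiplication avoids the intermediate double result.

```cpp
#include <limits>

// True on platforms where long double has a larger exponent range than
// double (for example x86-64 Linux); on others both variants behave alike.
constexpr bool kLongDoubleIsWider =
    std::numeric_limits<long double>::max_exponent >
    std::numeric_limits<double>::max_exponent;

// Pattern that triggers the warning: the product is computed in double
// and only the (possibly overflowed) result is widened.
inline long double ProductNarrow(double a, double b) { return a * b; }

// Possible fix: widen one operand first, so the multiplication itself
// is carried out in long double.
inline long double ProductWide(double a, double b) {
  return static_cast<long double>(a) * b;
}
```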
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes wrong output of integers with locale de_DE.UTF-8:
- /Width 2.481
- /Height 3.508
+ /Width 2481
+ /Height 3508
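The effect can be reproduced without an installed de_DE.UTF-8 locale. The sketch below uses a hypothetical numpunct facet as a stand-in for that locale and shows one common way to get locale-independent integers, namely imbuing the stream with the classic "C" locale. It is an illustration, not necessarily the exact change made in the PDF renderer.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical stand-in for a locale such as de_DE.UTF-8: groups digits
// in threes and uses '.' as the thousands separator.
struct DotGrouping : std::numpunct<char> {
  char do_thousands_sep() const override { return '.'; }
  std::string do_grouping() const override { return "\3"; }
};

// Formats with the locale the stream inherits from the global locale.
inline std::string FormatWithGlobalLocale(int value) {
  std::ostringstream stream;
  stream << value;
  return stream.str();
}

// Formats with the classic "C" locale, independent of the global locale.
inline std::string FormatWithClassicLocale(int value) {
  std::ostringstream stream;
  stream.imbue(std::locale::classic());
  stream << value;
  return stream.str();
}
```

With a global locale built as `std::locale(std::locale::classic(), new DotGrouping)`, the first function inserts the separator while the second does not.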
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes wrong output of integers with locale de_DE.UTF-8:
- <Page WIDTH="2.481" HEIGHT="3.508" PHYSICAL_IMG_NR="0" ID="page_0">
+ <Page WIDTH="2481" HEIGHT="3508" PHYSICAL_IMG_NR="0" ID="page_0">
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The title can be set for hOCR and PDF output.
Currently it is also used for ALTO, so setting the title can be used
as a workaround for issue #2700.
The constant unknown_title_ is no longer needed and therefore removed.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The function derives the file name for the .box file from an image name.
For training from existing line images, it is useful to directly support
the image names which are commonly used.
While generated images for Tesseract training typically use the name
pattern NAME.tif, other ground truth sets use NAME.bin.png for binarized
or NAME.nrm.png for grayscale images.
BoxFileName is also now a local function as it is only used locally.
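The name mapping described above can be sketched like this. The function is deliberately given a different, hypothetical name (BoxNameFromImageName) because it is an illustration under the stated naming patterns and not a copy of the BoxFileName implementation.

```cpp
#include <string>

// Hypothetical helper: true if text ends with suffix.
inline bool EndsWith(const std::string& text, const std::string& suffix) {
  return text.size() >= suffix.size() &&
         text.compare(text.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical sketch: NAME.tif, NAME.bin.png and NAME.nrm.png all map
// to NAME.box. Directory components containing dots are not handled.
inline std::string BoxNameFromImageName(const std::string& image_name) {
  std::string base = image_name;
  if (EndsWith(base, ".bin.png") || EndsWith(base, ".nrm.png")) {
    base.resize(base.size() - 8);  // both suffixes are 8 characters long
  } else {
    const std::string::size_type last_dot = base.find_last_of('.');
    if (last_dot != std::string::npos) {
      base.resize(last_dot);  // strip a single extension such as ".tif"
    }
  }
  return base + ".box";
}
```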
Signed-off-by: Stefan Weil <sw@weilnetz.de>
The configuration file lstm.train causes Tesseract to generate
training data for training of an LSTM line recognizer.
In this mode, no other files with OCR results should be written.
Without this patch, Tesseract writes a small text file.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This allows OCR of images from the internet without downloading them first:
tesseract http://IMAGE_URL OUTPUT ...
It uses libcurl.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
- Use C++ type casts
- Remove unneeded type cast
- Simplify code for function pop
- Remove macro push_on (it was only used once)
This fixes lots of compiler warnings caused by old type casts.
- Use C++ enums
- Use strongly typed C++11 enum for DIRECTION and optimize struct MFEDGEPT
- Use float constant for MF_SCALE_FACTOR
- Replace macros by inline functions
- Fix documentation comment
This fixes several warnings from clang.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This fixes a clang warning:
src/ccstruct/polyblk.cpp:412:12: warning: result of comparison of
unsigned enum expression >= 0 is always true
[-Wtautological-unsigned-enum-zero-compare]
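A minimal illustration of why the comparison is redundant, using a hypothetical enum rather than the type from polyblk.cpp: with an unsigned underlying type, only the upper bound is a real check.

```cpp
#include <cstdint>

// Hypothetical enum with an unsigned underlying type.
enum class Kind : uint8_t { kFirst, kSecond, kCount };

// A lower-bound test such as `value >= 0` is always true for an unsigned
// underlying type, so the range check needs only the upper bound.
inline bool IsValidKind(Kind kind) {
  return static_cast<unsigned>(kind) < static_cast<unsigned>(Kind::kCount);
}
```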
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Replace the macros which were declared in vecfuncs.h by member functions
and move a function which was only used in chop.cpp to that file.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Removing STRING from genericvector.h allows eliminating the proprietary
STRING data type from the public Tesseract API.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
- add another constructor for LSTMRecognizer
which takes the language_data_path_prefix configured/selected
at runtime and passes it to the internal CCUtil
- use this in Tesseract::init_tesseract_lang_data when LSTMs
are available
(this was missing from 297d7d86ce)