Commit Graph

963 Commits

Author SHA1 Message Date
zdenop
f15e2cc174 fix typo 2019-11-01 14:00:22 +01:00
Stefan Weil
7e980df016 simd: Check whether the OS supports FMA, AVX, ...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 14:00:00 +01:00
Stefan Weil
e413b9318b classify/Makefile: Fix inconsistent style
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 13:59:33 +01:00
Egor Pugin
55b4099ad1 Export some classify vars. 2019-11-01 13:59:14 +01:00
zdenop
0d8be252cc Remove more code for builds with disabled legacy engine
Now the Tesseract library no longer includes unused code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

# Conflicts:
#	src/cutil/Makefile.am
#	unittest/Makefile.am
2019-11-01 13:58:37 +01:00
zdenop
c9ecab8854 Move source files which are used for training only to src/training 2019-11-01 13:50:26 +01:00
Stefan Weil
b80acd81ba OpenCL: Add static attribute for kernel_src
It is only used in openclwrapper.cpp.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 13:36:22 +01:00
Stefan Weil
14665dfa2c Remove unused functions create_edges_window, draw_raw_edge
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 13:36:15 +01:00
Stefan Weil
91f0de94bc Remove unused function truncate_path and related files
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 13:36:07 +01:00
Stefan Weil
c3d4742af6 Remove global array kPolyBlockNames from Tesseract library
It is only used in unittest/layout_test.cc after moving a test from
baseapi_test.cc to that file, so it can be made local.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 13:35:55 +01:00
Stefan Weil
92b460010e cmake: Don't link pthread on Windows
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 13:00:03 +01:00
Stefan Weil
5d2265478f universalambigs: Add hack to fix builds with Microsoft compiler
The MS compiler only accepts string constants up to 65535 characters,
so shorten the string for that compiler to fix the compilation.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:59:44 +01:00
Zdenko Podobný
9dd392d8b2 move fileio.cpp and fileio.h to training (this fixes the Android build) 2019-11-01 12:59:31 +01:00
Stefan Weil
ea34763fea universalambigs: Replace octal characters by UTF-8 string
This improves readability and reduces the file size.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:59:20 +01:00
Stefan Weil
a473283482 Clean ambigs.h
* Remove unused kUnigramAmbigsBufferSize and kAmbigNgramSeparator
* Move some declarations to ambigs.cpp

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:59:12 +01:00
Egor Pugin
8ebcea2926 Use pangocairo-1.43 for the moment. Remove private pango header. 2019-11-01 12:59:04 +01:00
Egor Pugin
49ce908e4b Try to fix #2599 2019-11-01 12:58:57 +01:00
Stefan Weil
7fcad19286 cmake: Add missing pthread library
It is needed for C++ threads since commit 85068be405.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:58:42 +01:00
Stefan Weil
b21779d699 Improve formatting of hOCR output with character boxes
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:55:49 +01:00
Stefan Weil
d338681758 Use auto data type for results of std::ftell
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:53:44 +01:00
Stefan Weil
47c8710ac2 Remove unused filesize_ from class InputBuffer
This also simplifies the constructors.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:53:36 +01:00
Stefan Weil
e34acfeb46 Simplify shell code (fixes warning from Codacy)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:53:28 +01:00
Stefan Weil
8baf817192 Use long instead of off_t for result from ftell
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:53:21 +01:00
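
The type choice in this commit can be illustrated with a minimal sketch: `std::ftell` is declared to return `long`, so storing its result in `long` (or `auto`) avoids a dependency on the POSIX-only `off_t`. The helper name `FileSize` is hypothetical and is not taken from the Tesseract sources.

```cpp
#include <cstdio>

// Hypothetical helper (not Tesseract code): returns the size of an open
// file in bytes, or -1 if seeking or telling fails.
long FileSize(std::FILE* fp) {
  if (std::fseek(fp, 0, SEEK_END) != 0) {
    return -1;
  }
  // std::ftell returns long, so no off_t is needed here.
  long size = std::ftell(fp);
  // Restore the read position to the start of the file.
  std::rewind(fp);
  return size;
}
```

Note that `long` is 32 bits wide on some platforms (for example 64-bit Windows), so this pattern is only suitable where files are known to be smaller than 2 GiB.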
Stefan Weil
055f32d422 Fix training script for macOS (issue #2578)
Bash on macOS does not support "|&":

    tesstrain_utils.sh: line 80: syntax error near unexpected token `&'

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:53:14 +01:00
Stefan Weil
a469224ec1 Fix some compiler warnings (unused local variables)
gcc warnings:

    src/classify/protos.cpp:85:7: warning: unused variable ‘i’ [-Wunused-variable]
    src/classify/protos.cpp:86:7: warning: unused variable ‘Bit’ [-Wunused-variable]
    src/classify/protos.cpp:89:14: warning: unused variable ‘Config’ [-Wunused-variable]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:53:06 +01:00
zdenop
5775cf0535 Implemented improved bounding box algorithm
Signed-off-by: Noah Metzger <noah.metzger@bib.uni-mannheim.de>

# Conflicts:
#	src/lstm/recodebeam.cpp
2019-11-01 12:52:47 +01:00
Stefan Weil
25b1a4b951 classify: Use fixed size bit vector
The vector was already limited to MAX_NUM_PROTOS (512) entries or 64 bytes
in the old code. Now it uses that size right from the start which avoids
reallocating it later when entries are added.

The old code which reallocated the vector to expand it was buggy because
the realloc function can return a different pointer, but the code still
used the original pointer to reset the new bits.

Function ExpandBitVector is now unused and therefore removed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:46:44 +01:00
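
The realloc pitfall described in this commit message can be sketched as follows. This is an illustration only: `kMaxNumProtos`, `ExpandWords` and `FixedBitVector` are hypothetical names, and the code is not the Tesseract implementation. The first function shows a correct way to grow a word array (always continue with the pointer returned by `realloc`); the type alias shows the fixed-size alternative that makes growing unnecessary.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical constant mirroring the limit of 512 entries named above.
constexpr std::size_t kMaxNumProtos = 512;

// Buggy pattern (shown as a comment only): realloc may move the block,
// so clearing the new words through the old pointer is undefined behavior.
//   grown = realloc(vec, new_bytes);
//   memset(vec + old_words, 0, added_bytes);  // wrong: vec may be stale

// Correct growth: use the pointer returned by realloc for every later
// access. Requires new_words >= old_words. Returns nullptr on failure,
// in which case the original block is still valid and owned by the caller.
std::uint32_t* ExpandWords(std::uint32_t* vec, std::size_t old_words,
                           std::size_t new_words) {
  auto* grown = static_cast<std::uint32_t*>(
      std::realloc(vec, new_words * sizeof(std::uint32_t)));
  if (grown == nullptr) {
    return nullptr;
  }
  // Zero only the newly added words, addressed through the new pointer.
  std::memset(grown + old_words, 0,
              (new_words - old_words) * sizeof(std::uint32_t));
  return grown;
}

// Fixed-size alternative: 512 bits (64 bytes of payload) allocated once,
// so no reallocation is ever needed.
using FixedBitVector = std::bitset<kMaxNumProtos>;
```

With a fixed-size vector the expansion function has no remaining callers, which matches the removal of ExpandBitVector mentioned in the commit message.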
Robert Pösel
c01d230c10 Give word's bounds to callback also during second pass 2019-11-01 12:46:37 +01:00
Stefan Weil
59659ddc6e Remove structures.*
It only provided the functions new_cell, free_cell which could be replaced by new, delete.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:46:02 +01:00
Stefan Weil
40b69539ff Remove unused functions reverse16, reverse32
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:44:29 +01:00
Stefan Weil
ae6eddcc12 Replace non-portable sleep with std::this_thread::sleep_for
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:44:22 +01:00
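
A minimal sketch of the portable call named in this commit, assuming a C++11 compiler. The wrapper name `SleepMilliseconds` is hypothetical and not taken from the Tesseract sources.

```cpp
#include <chrono>
#include <thread>

// Hypothetical wrapper (not Tesseract code): blocks the calling thread
// for at least the given number of milliseconds using only the C++
// standard library, so no platform-specific sleep function is needed.
void SleepMilliseconds(int milliseconds) {
  std::this_thread::sleep_for(std::chrono::milliseconds(milliseconds));
}
```

The same source then builds unchanged on Windows and on POSIX systems, without conditional compilation around the sleep call.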
Stefan Weil
25a6fe7ba9 arch: Reduce number of include files for dot product functions
dotproductavx.h and dotproductsse.h declared only two functions.
Move those declarations to dotproduct.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:29:51 +01:00
Stefan Weil
2e1cd1d448 Add dot product implementation for Intel FMA (double = tessdata_best)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:29:39 +01:00
Stefan Weil
ba8e870f85 Optimize tprintf implementation
It no longer uses a local buffer, so it needs less memory
and no mutex.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:28:19 +01:00
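
The idea of printing without a local buffer can be sketched as follows: the variadic arguments are forwarded directly to `std::vfprintf`, so no intermediate character array has to be allocated or protected. `PrintTo` is a hypothetical stand-in, and the actual tprintf implementation in Tesseract may differ.

```cpp
#include <cstdarg>
#include <cstdio>

// Hypothetical stand-in for a tprintf-like function (not Tesseract code).
// The formatted output goes straight to the stream; there is no shared
// local buffer. Returns the number of characters written, or a negative
// value on error.
int PrintTo(std::FILE* stream, const char* format, ...) {
  va_list args;
  va_start(args, format);
  int written = std::vfprintf(stream, format, args);
  va_end(args);
  return written;
}
```

A call such as `PrintTo(stderr, "page %d\n", 3)` then behaves like `std::fprintf` with the same arguments.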
Stefan Weil
75a9926f01 FPRow: Add missing initialisation for scalar (CID 1402754)
Also modernize the code a little bit.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:28:11 +01:00
Stefan Weil
cad3433dc8 Fix format strings for size_t arguments (CID 1402762, 1402767)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:28:03 +01:00
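
For context, the portable conversion specifier for `size_t` arguments is `%zu`. The sketch below shows it in a hypothetical helper, `FormatCount`, which is not taken from the Tesseract sources; the format strings actually changed by this commit are not shown on this page.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not Tesseract code): formats a size_t value.
// "%zu" matches size_t on every platform, whereas "%d" or "%lu" only
// match where size_t happens to have the same width as int or long.
std::string FormatCount(std::size_t count) {
  char buffer[32];
  std::snprintf(buffer, sizeof(buffer), "%zu", count);
  return buffer;
}
```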
Stefan Weil
c2839ecfd6 Fix format string for 64 bit integer (CID 1402986)
Commit c1264c189e was not the right fix.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:26:28 +01:00
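
One portable way to print a 64 bit integer is the `PRId64` macro from `<cinttypes>`, sketched below. Whether this commit uses that macro is an assumption; the helper name `FormatInt64` is hypothetical.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not Tesseract code): formats a 64 bit signed
// integer. PRId64 expands to the correct length modifier for int64_t
// on the current platform (for example "ld" or "lld").
std::string FormatInt64(std::int64_t value) {
  char buffer[32];
  std::snprintf(buffer, sizeof(buffer), "%" PRId64, value);
  return buffer;
}
```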
Stefan Weil
595e263ceb tfnetwork: Add missing return statement (CID 1402992)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-11-01 12:26:21 +01:00
Egor Pugin
3afc185ad4 Implement CMake+SW build.
Currently only Windows is supported.
You can try it as follows:

    mkdir build_sw && cd build_sw && cmake .. -DSW_BUILD=1
2019-11-01 12:26:09 +01:00
zhuangzhuang1988
4b4e1f1e8d fix tesstrain.py error 2019-11-01 12:25:57 +01:00
zhuangzhuang
b8014ee1c1 fix garbled stdout output on Windows (#2546)
* fix garbled stdout output on Windows

* fix type name error

* remove unnecessary codepoint check.
2019-11-01 12:25:48 +01:00
Stefan Weil
22fb70cb85 Fix handling of single pages from multipage TIFF files (issue #2537)
That case now uses Leptonica to deliver the desired image instead of
using an inefficient loop in the Tesseract code.

See commit 54fafc4e2e which used similar
code in the past.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-06 10:00:46 +02:00
Stefan Weil
08ca7b8416 Fix linker error with disabled legacy engine (issue #2532)
Commit 3871caae86 introduced a build
regression when the legacy engine was disabled.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-06 10:00:46 +02:00
Stefan Weil
e53e10503a genericvector: Remove redundant declarations
tesseract::FileReader and tesseract::FileWriter are already declared
in serialis.h which is included by genericvector.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-06 09:53:00 +02:00
Stefan Weil
f4698154b3 Revert "Replace callback by direct function calls in TessBaseAPI::GetComponentImages"
This reverts commit 1a44ce3178.
It removed global symbols, so the binary API was incompatible.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-06 07:54:15 +02:00
Stefan Weil
792b39d5c8 Revert "Move LSTMTrainer from libtesseract to libtesseract_training"
This reverts commit a30d433356.

That commit removed LSTMTrainer also from libtesseract.so which breaks
the ABI compatibility.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-06 07:41:22 +02:00
Dmitry Bely
c310fef8f0 Fix crash in Tesseract::classify_word_and_language() when tessedit_timing_debug is enabled 2019-07-05 10:00:48 +03:00
Stefan Weil
d8494f3215 Revert "Simplify indirect call of LMPainPoints::GeneratePainPoint"
This reverts commit 6a0fc4f89f.
It removed global symbols, so the binary API was incompatible.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-02 06:39:53 +02:00
Stefan Weil
1d5a320d4a Revert "Simplify class LSTMTrainer"
This reverts commit 563a1717d4.
It removed global symbols, so the binary API was incompatible.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-07-02 06:38:19 +02:00
Stefan Weil
4535e4605b Update enum from unicode/uchar.h
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-06-25 14:55:03 +02:00