Commit Graph

192 Commits

Author SHA1 Message Date
Stefan Weil
023e1b340e Use POSIX data types and macros (#878)
* api: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* cutil: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* training: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract data types by POSIX data types

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract data types by POSIX data types

Now all Tesseract data types which are no longer needed can be removed
from ccutil/host.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccmain: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccstruct: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* classify: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* dict: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* lstm: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* textord: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* wordrec: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* ccutil: Replace Tesseract's MIN_*INT, MAX_*INT* by POSIX *INT*_MIN, *INT*_MAX

Remove the macros which are now unused from ccutil/host.h.
Remove also the obsolete history comments.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix build error caused by ambiguous ClipToRange

Error message vom Appveyor CI:

    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2672: 'ClipToRange': no matching overloaded function found [C:\projects\tesseract\build\libtesseract.vcxproj]
    C:\projects\tesseract\ccstruct\coutln.cpp(818): error C2782: 'T ClipToRange(const T &,const T &,const T &)': template parameter 'T' is ambiguous [C:\projects\tesseract\build\libtesseract.vcxproj]
      c:\projects\tesseract\ccutil\helpers.h(122): note: see declaration of 'ClipToRange'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: could be 'char'
      C:\projects\tesseract\ccstruct\coutln.cpp(818): note: or       'int'

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* unittest: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* arch: Replace Tesseract's MAX_INT8 by POSIX INT8_MAX

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-13 21:36:30 +01:00
Stefan Weil
7972b13e3a Remove macro USE_STD_NAMESPACE (#1360)
The related code in training/util.h now uses the GOOGLE_TESSERACT macro
to enable Google specific code to disable heap checking.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-04 14:43:28 +01:00
Stefan Weil
068d43d3d8 Remove old code for string class (no longer needed) (#1354)
* Remove old code for string class (no longer needed)

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Add std namespace to string class

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-03 14:36:28 +01:00
Zdenko Podobný
173ad2bd00 mark& block legacy OCR Engine untill it will be removed. 2018-02-18 19:31:09 +01:00
Stefan Weil
c9169e5ac6 Remove unused cube OCR engine modes (#1281)
Since commit cdc35338c5 Tesseract checks
the value passed for `--oem NUM`.

That only works as expected when the old (now unused) engine mode values
for cube are removed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-01-20 09:16:39 +01:00
Stefan Weil
c4d8f27019 Fix compiler warning (-Wchar-subscript) (#1259)
ccstruct/seam.cpp:66:26: warning:
 array subscript has type 'char' [-Wchar-subscripts]

Fix it by using an unsigned index and use the same type for related values.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-01-08 21:26:25 +01:00
Josh Reid
cdc35338c5 Added check if input PSM value is outside of range (#1236)
Wrote a function to throw an error if PSM is outside 0-13 or OEM is outside 0-5.
fixes #1234
2017-12-14 11:37:44 +01:00
zdenop
e62e8f5f80 Merge pull request #1109 from mingodad/mingodad-fix-interword-spaces
Fix to preserve_interword_spaces work again
2017-09-15 08:45:21 +02:00
amitdo
a905548ed6 Autotools build: Remove the option 'USING_MULTIPLELIBS'
Libtool's convenience libraries should never be installed. Fixes #985.
2017-09-11 15:03:53 +03:00
Ray Smith
fc6a390c6c Added intsimdmatrix as a generic integer matrixdotvector function with AVX2 and SSE specializations 2017-09-08 15:06:19 +01:00
Ray Smith
ad74e8a69c Fixed integer overflow error 2017-09-08 12:46:48 +01:00
Ray Smith
a912967cc3 Rewrote unicharset_extractor to use the new string normalizer and read plain text as well as box files. 2017-09-08 11:49:57 +01:00
Ray Smith
a18620cfea Improved results on images with no resolution. Estimates resolution
from the size of the connected components, based on average text size.
2017-09-08 09:37:03 +01:00
Domingo Alvarez Duarte
ea557a3e66 Fix to preserve_interword_spaces work again 2017-09-06 11:18:42 +02:00
Ray Smith
4e9665debf Added ADAM optimizer, unless git screwed it up, cos there is no diff 2017-08-02 14:03:50 -07:00
Ray Smith
b0ead95d64 Changed the way unicharsets are handled to allow support for the ™ character. Can find the issue where it was requested. 2017-07-24 11:45:57 -07:00
Ray Smith
4efc539f51 clang tidy on previous pull 2017-07-19 17:04:49 -07:00
Stefan Weil
ba95a686aa Use lept_free to free memory allocated by Leptonica
This fixes problems on Windows when Tesseract and Leptonica use different
C runtime libraries.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-16 08:34:18 +02:00
Stefan Weil
9929587f36 Remove extra semicolons
This fixes these compiler warnings:

    ccmain/equationdetect.cpp:1519:2: warning: extra ‘;’ [-Wpedantic]
    ccstruct/blobs.cpp:65:17: warning: extra ‘;’ [-Wpedantic]
    ccstruct/blobs.h:178:18: warning: extra ‘;’ [-Wpedantic]
    ccstruct/ratngs.cpp:36:22: warning: extra ‘;’ [-Wpedantic]
    ccstruct/ratngs.cpp:37:22: warning: extra ‘;’ [-Wpedantic]
    ccutil/ambigs.cpp:46:20: warning: extra ‘;’ [-Wpedantic]
    ccutil/ambigs.h:137:21: warning: extra ‘;’ [-Wpedantic]
    cutil/structures.cpp:36:45: warning: extra ‘;’ [-Wpedantic]
    textord/equationdetectbase.cpp:65:2: warning: extra ‘;’ [-Wpedantic]
    textord/equationdetectbase.h:57:2: warning: extra ‘;’ [-Wpedantic]
    wordrec/lm_state.cpp:25:28: warning: extra ‘;’ [-Wpedantic]
    wordrec/lm_state.h:190:29: warning: extra ‘;’ [-Wpedantic]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-15 12:40:34 +02:00
Stefan Weil
f527715933 Fix type of bit values (fixes compiler warning)
A single bit cannot represent a signed integer value, so it must be an
unsigned integer. This fixes a compiler warning:

    In file included from ../ccutil/clst.h:24:0,
                     from ../ccstruct/blobbox.h:23,
                     from workingpartset.h:24,
                     from workingpartset.cpp:21:
    ../ccstruct/blobbox.h: In member function ‘void BLOBNBOX::set_reduced_box(TBOX)’:
    ../ccutil/host.h:79:25: warning: overflow in conversion from ‘int’ to ‘signed char:1’ changes value from ‘1’ to ‘-1’ [-Woverflow]
     #define TRUE            1
                             ^
    ../ccstruct/blobbox.h:236:17: note: in expansion of macro ‘TRUE’
           reduced = TRUE;
                     ^~~~

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-07-15 09:30:31 +02:00
Ray Smith
aee910a7bf Fixed build broken by previous commits that added use of string in low-level code 2017-07-14 10:33:55 -07:00
Ray Smith
da03e4e910 Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion 2017-07-14 09:30:14 -07:00
Justin Hotchkiss Palermo
f057938069 fix filenames in comments 2017-07-02 17:35:47 -04:00
Stefan Weil
3a67ff930e Optimize code by replacing init_to_size with resize_no_init
There is no need to initialize memory with a fixed value which is
overwritten in the next step.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-12 14:34:55 +02:00
Stefan Weil
8c75d26657 Remove unneeded type casts when using Leptonica macro GET_DATA_BYTE
The first parameter is casted to an unsigned byte by Leptonica,
so we don't need additional type casts in Tesseract code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 20:08:27 +02:00
Stefan Weil
9abbc4c6f3 ccstruct: Fix wrong format string
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 19:32:51 +02:00
Stefan Weil
ef1d9600b1 Use standard macros for format strings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-11 19:32:51 +02:00
zdenop
64994a2707 Merge pull request #900 from rfschtkt/cast
Reviewed uses of reinterpret_cast
2017-05-11 16:08:12 +02:00
Raf Schietekat
190584fec7 RAII: PB_LINE_IT::get_line(): was leaked inside POLY_BLOCK::fill() 2017-05-11 02:02:37 +02:00
Raf Schietekat
3983d2f76a Reviewed uses of reinterpret_cast 2017-05-11 01:58:40 +02:00
Ray Smith
8e79297dce Final part of endian improvement. Adds big-endian support to lstm and fixes issue 518 2017-05-03 16:09:44 -07:00
Stefan Weil
85afda65f5 blobs: Remove unused macro
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-01 17:12:43 +02:00
Stefan Weil
bb75793539 ccstruct: Remove unneeded NULL checks
It's also not necessary to nullify class variables in the destructor.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-04-30 19:23:24 +02:00
Stefan Weil
f8fba59804 Replace alloc_struct, free_struct
Both functions simply call malloc, free.

Remove also unneeded null pointer checks and use calloc where possible.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-04-30 09:25:04 +02:00
Ray Smith
7a116ce8bb More formatting fixes from clang tidy 2017-04-28 13:38:32 -07:00
Ray Smith
1cc511188d Added extra Init that takes a memory buffer or a filereader function pointer to enable read of traineddata from memory or foreign file systems. Updated existing readers to use TFile API instead of FILE. This does not yet add big-endian capability to LSTM, but it is very easy from here. 2017-04-27 15:48:23 -07:00
Stefan Weil
b5abfb53d7 ccstruct: Fix Leptonica data type
L_Bmf works for C++ code, but the common form is L_BMF, so use that.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-04-27 07:51:50 +02:00
Jim Regan
7231e67682 typo in comment 2017-04-14 15:54:10 +01:00
Ray Smith
f566a45b30 clang-tidy changes from sync 2017-01-25 16:20:19 -08:00
Ray Smith
a1c22fb0d0 Fixed issue #557 2017-01-25 16:05:59 -08:00
Stefan Weil
134a253758 ccstruct: Remove unused local variable
This fixes a compiler warning.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-01-21 12:32:06 +01:00
Stefan Weil
23a7330c85 Fix page range in log message
The internal range is 0...(n-1), but for users a page range 1...n is
more natural. Showing a range 0...n is wrong because it would imply
n+1 pages.

Change printed text from

    Loaded 72/72 pages (0-72) of document ...

to

    Loaded 72/72 pages (1-72) of document ...

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-15 11:05:22 +01:00
zdenop
da4c064c2e Merge pull request #531 from stweil/guards
Fix header file guards and replace reserved identifiers
2016-12-15 08:29:32 +01:00
Ray Smith
9f5ba9105f Removed dependency on cube from the code 2016-12-14 10:55:15 -08:00
Stefan Weil
a87d073bae Fix two typos in comments
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-11 09:20:17 +01:00
Ray Smith
5deebe6c27 Fixed multilang for LSTM, pushed cube to one side without actually deleting it 2016-12-05 14:41:43 -08:00
Stefan Weil
4897796d57 Replace reserved identifiers used in #define guards header files
Use macro names as suggested by the Google C++ Style Guide
(https://google.github.io/styleguide/cppguide.html#The__define_Guard).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 15:43:03 +01:00
Stefan Weil
cefc420ddb Remove extra semicolons after member function definitions
clang++ report:
api/baseapi.h:852:4: warning:
 extra ';' after member function definition [-Wextra-semi]
[...]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-04 14:54:52 +01:00
Stefan Weil
a389b7c8e9 opencl: Remove unneeded and potentially bad type casts
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-02 00:02:35 +01:00
Ray Smith
ce76d1c569 Fixes to training process to allow incremental training from a recognition model 2016-11-30 15:51:17 -08:00