Commit Graph

2227 Commits

Author SHA1 Message Date
Shreeshrii
86275c2187 Clarify message to indicate additional LSTM training required for 4.0.0 2018-04-22 22:26:07 +05:30
Shreeshrii
0f3d33f699 Change max_pages to zero
Fixes https://github.com/tesseract-ocr/tesseract/issues/1149 and https://github.com/tesseract-ocr/tesseract/issues/1508
2018-04-22 22:19:06 +05:30
Stefan Weil
068d43d3d8 Remove old code for string class (no longer needed) (#1354)
* Remove old code for string class (no longer needed)

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Add std namespace to string class

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-03 14:36:28 +01:00
Stefan Weil
9035217acd Remove parameter m_data_sub_dir (#1356)
This further simplifies the finding of the tessdata directory.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-03 14:34:24 +01:00
Hintz
cf66bc84c8 Fix syntax error. (#1355) 2018-03-03 14:33:39 +01:00
Stefan Weil
b9b08c7e50 Replace log2(n) by local functions (#1353)
* Replace log2(n) by faster local function

This also adds support for environments without a log2 function (Android).

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Provide local log2 function on platforms without log2 function

The existing implementation in wordrec/language_model.cpp is modified
to use a local inline function in the tesseract namespace and copied
to lstm/weightmatrix.cpp, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-03-02 06:54:43 +01:00
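For illustration, a local base-2 logarithm of the kind this commit describes could look like the sketch below. The namespace `sketch` and the name `log2_fallback` are hypothetical, and the change-of-base formula is an assumption; the commit message does not show the actual implementation.

```cpp
#include <cmath>

namespace sketch {

// Hypothetical fallback for C libraries without log2 (assumed approach):
// uses the identity log2(n) = ln(n) / ln(2), which needs only std::log.
inline double log2_fallback(double n) {
  return std::log(n) / std::log(2.0);
}

}  // namespace sketch
```

Keeping the helper inline in a namespace avoids clashing with a `log2` that the platform may or may not provide.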
Jeroen Ooms
c6e8916065 fixes for C++11 (#1164) 2018-02-28 15:42:33 +01:00
fifothekid
ad6f3b412a Fixed unqualified class "string" (#1082) 2018-02-28 15:16:23 +01:00
Shreeshrii
40f43111e0 Add list of scripts to manpage for tesseract (#1347) 2018-02-24 09:37:25 +01:00
Shreeshrii
bb89dc3594 Add info regarding LSTM components and options (#1346) 2018-02-23 21:59:50 +01:00
zdenop
44588a3c7c add commas to language list 2018-02-23 11:27:55 +01:00
Zdenko Podobný
035325dfd0 Update language list based on tessdata_fast; fix #1343 2018-02-23 11:19:18 +01:00
Egor Pugin
6f80c35b3f Update appveyor.yml 2018-02-22 23:41:44 +03:00
Jim Regan
0d1365a5ee gle_uncial (#1342) 2018-02-22 17:20:31 +01:00
Egor Pugin
ce638c4b35 Update appveyor.yml 2018-02-21 21:11:24 +03:00
Stefan Weil
9f888f044a Fix typo in documentation (#1330)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:39:47 +01:00
Stefan Weil
8130b8d346 Fix some typos in comments (found by codespell) (#1331)
Fix also a grammar issue.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:39:07 +01:00
Stefan Weil
638b025884 Fix CID 1164569 (Dereference after null check) (#1332)
If equ_detect_ can be NULL, we must catch that case and show a warning
instead of crashing in method SetEquationDetect.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:38:38 +01:00
Stefan Weil
eb8a6a5cf2 Fix CID 1164570 (Dereference after null check) (#1333)
Show a warning if datapath_ is NULL instead of crashing.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-21 08:37:53 +01:00
Egor Pugin
6a58b2e682 Remove whitespace. 2018-02-21 08:08:55 +03:00
Amit D
766b7bd620 Don't drop words with low certainty (#1264)
Fix #681.
2018-02-20 17:19:10 +01:00
Stefan Weil
af6994efd9 Don't try alternate path for tessdata (#1328)
This simplifies the code and the user interface.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-20 17:18:14 +01:00
Stefan Weil
2ca7d9451a Remove files generated by libtool (#1329)
It looks like those files were added accidentally
in commit fc6a390c6c.
Add them to .gitignore to avoid that from now on.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-20 17:17:49 +01:00
Stefan Weil
a50ff5277d Improve help text for OCR engine mode (#1326)
The new text was suggested by Amit Dovev, see
https://github.com/tesseract-ocr/tesseract/pull/1325#issuecomment-366613553.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-19 09:43:58 +01:00
Stefan Weil
349de8b739 Support different help texts for normal and advanced users and restore legacy mode (#1325)
* Restore support for the legacy engine

It is still needed to get text attributes which are unsupported by the
LSTM engine, and it also has better recognition rates for some texts.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* tesseractmain: Add missing 'static' attributes

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Support different help texts for normal and advanced users

The old option --help now shows a very basic help text.
The new option --help-extra shows the full help information.
It now also includes a hint that Tesseract supports lists of images.

Fix also the indentation in the PSM help and
use a more neutral text in the OEM help.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Add missing line feed in error message

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-19 07:30:38 +01:00
Zdenko Podobný
173ad2bd00 Mark and block legacy OCR engine until it is removed. 2018-02-18 19:31:09 +01:00
Stefan Weil
01f9a7f3c2 Clean use of double / float (#1323)
The variable 'diff' gets a double value and is compared with a double value,
so it should be double, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-18 15:34:04 +01:00
Stefan Weil
43f34f5c3e Clean Makefile.am (#1322)
Replace the doc-dummy hack with .PHONY.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-18 15:25:31 +01:00
Stefan Weil
20b3ff8796 Fix some minor issues reported by Coverity Scan (#1321)
* Dereference pointer after NULL check (CID 1385638)

Move the statement which dereferences the pointer variable "current"
after the NULL check.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Dereference pointer after NULL check (CID 1385635)

Move the statement which dereferences the pointer variable "current"
after the NULL check.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Dereference pointer after NULL check (CID 1385634)

Move the statement which dereferences the pointer variable "current"
after the NULL check.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

* Fix CID 1164527 'Constant' variable guards dead code

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-02-18 15:22:59 +01:00
Egor Pugin
ce7ee87fa4 Merge pull request #1290 from amitdo/patch-1
Update CMakeLists.txt
2018-01-24 21:31:11 +03:00
Amit D
d377281f73 Update CMakeLists.txt 2018-01-24 19:21:37 +02:00
Amit D
82ba423537 Update CMakeLists.txt 2018-01-24 19:07:17 +02:00
Egor Pugin
2da95d63bc Add more avx2, sse4.1 flags. Add MSVC's AVX2 ICE workaround. 2018-01-24 18:45:15 +03:00
Egor Pugin
4b6fefb2ac Add openmp support for Visual Studio builds. 2018-01-23 21:57:52 +03:00
Stefan Weil
c9169e5ac6 Remove unused cube OCR engine modes (#1281)
Since commit cdc35338c5 Tesseract checks
the value passed for `--oem NUM`.

That only works as expected when the old (now unused) engine mode values
for cube are removed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-01-20 09:16:39 +01:00
Stefan Weil
10a8a67ca2 Remove execute permission from config file (#1263)
This fixes the only configuration file which had such permissions.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-01-10 16:43:02 +01:00
Stefan Weil
c4d8f27019 Fix compiler warning (-Wchar-subscripts) (#1259)
ccstruct/seam.cpp:66:26: warning:
 array subscript has type 'char' [-Wchar-subscripts]

Fix it by using an unsigned index and use the same type for related values.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-01-08 21:26:25 +01:00
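As a general illustration of the warning class named in this commit (not the code from ccstruct/seam.cpp), the sketch below indexes a table by character value. The function name `CountByte` is hypothetical. A plain `char` may be signed, so bytes of 0x80 and above could become negative indices; converting through `unsigned char` avoids that.

```cpp
#include <array>
#include <climits>

// Hypothetical example: counts how often the byte `wanted` occurs in `text`.
int CountByte(const char* text, char wanted) {
  // One counter per possible unsigned char value, zero-initialized.
  std::array<int, UCHAR_MAX + 1> counts{};
  for (const char* p = text; *p != '\0'; ++p) {
    // The cast keeps the index non-negative even where char is signed.
    counts[static_cast<unsigned char>(*p)]++;
  }
  return counts[static_cast<unsigned char>(wanted)];
}
```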
Egor Pugin
000d027a9f Rename tesseract library. 2018-01-05 18:51:35 +03:00
Amit D
bc668da042 Update README.md (#1239) 2017-12-20 08:14:18 +01:00
Josh Reid
cdc35338c5 Added check if input PSM value is outside of range (#1236)
Wrote a function to throw an error if PSM is outside 0-13 or OEM is outside 0-5.
fixes #1234
2017-12-14 11:37:44 +01:00
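A range check like the one this commit describes might be sketched as follows. The limits 0-13 and 0-5 come from the commit message; the names `kMaxPsm`, `kMaxOem`, and `CheckModeRanges` are hypothetical, and returning a message string instead of raising an error is a simplification made for this sketch.

```cpp
#include <string>

// Upper bounds as stated in the commit message; the lower bound is 0 for both.
constexpr int kMaxPsm = 13;
constexpr int kMaxOem = 5;

// Hypothetical helper: returns an empty string when both values are in
// range, otherwise a short description of the first problem found.
std::string CheckModeRanges(int psm, int oem) {
  if (psm < 0 || psm > kMaxPsm) {
    return "PSM " + std::to_string(psm) + " is outside 0-" +
           std::to_string(kMaxPsm);
  }
  if (oem < 0 || oem > kMaxOem) {
    return "OEM " + std::to_string(oem) + " is outside 0-" +
           std::to_string(kMaxOem);
  }
  return std::string();
}
```

A caller would print the returned text and exit when it is non-empty. The later commit c9169e5ac6 in this list removes the unused cube engine modes, so the valid OEM range depends on the revision.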
Egor Pugin
eba0ae3b88 Merge pull request #1218 from hsen-dev/master
fixed missing include for std::back_inserter.
2017-11-24 17:31:27 +03:00
Ria
d751305804 fixed missing include for std::back_inserter.
with Visual Studio 2015 RTM:

Error C2039: 'back_inserter': is not a member of 'std'
Error C3861: 'back_inserter': identifier not found

need "iterator" with Visual Studio 2015 (vc14).

#include <iterator>
2017-11-23 11:37:35 +03:30
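A minimal example of the pattern behind this fix is sketched below. The function name `CopyAll` is hypothetical. `std::back_inserter` is declared in `<iterator>`, so the header should be included directly instead of relying on another standard header to pull it in.

```cpp
#include <algorithm>
#include <iterator>  // declares std::back_inserter
#include <vector>

// Hypothetical example: appends every element of src to a new vector.
std::vector<int> CopyAll(const std::vector<int>& src) {
  std::vector<int> dst;
  std::copy(src.begin(), src.end(), std::back_inserter(dst));
  return dst;
}
```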
Stefan Weil
ebbfc3ae8d Improve robustness of function LoadDataFromFile (#1207)
ftell returns a long value which can be negative when an error occurred.
It returns LONG_MAX for directories.

Both cases were not handled by the old code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-11-10 15:46:38 +01:00
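The two cases named in this commit message can be handled as in the sketch below. The names `IsUsableSize` and `ReadWholeFile` are hypothetical; this is not Tesseract's `LoadDataFromFile`, only an illustration that assumes the `ftell` behaviour the message describes.

```cpp
#include <climits>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: true when an ftell result can be used as a buffer
// size. Negative values signal an error; LONG_MAX is the value the commit
// message reports for directories.
bool IsUsableSize(long size) {
  return size >= 0 && size != LONG_MAX;
}

// Hypothetical loader: reads the whole file at `path` into `data`.
// Returns false if the file cannot be opened, sized, or read completely.
bool ReadWholeFile(const char* path, std::vector<char>* data) {
  std::FILE* fp = std::fopen(path, "rb");
  if (fp == nullptr) return false;
  bool ok = false;
  if (std::fseek(fp, 0, SEEK_END) == 0) {
    const long size = std::ftell(fp);
    if (IsUsableSize(size) && std::fseek(fp, 0, SEEK_SET) == 0) {
      data->resize(static_cast<std::size_t>(size));
      // An empty file is a valid result; otherwise require a full read.
      ok = data->empty() ||
           std::fread(data->data(), 1, data->size(), fp) == data->size();
    }
  }
  std::fclose(fp);
  return ok;
}
```

Checking the size before resizing the buffer keeps an error value from being converted into a huge unsigned allocation request.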
Stefan Weil
f3c4b894dc Fix help message for unicharset_extractor (#1206)
If unicharset_extractor was called without any argument,
a help message was printed by tesseract::ParseCommandLineFlags.

Replace that by the local help message which is better.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-11-10 15:45:35 +01:00
Amit D
ad5ee18415 Make font size estimation work with the lstm engine (#1173)
**Partial** fix for issue #1074
2017-10-20 10:07:16 +02:00
ivanzz1001
fb359fc981 Update unicharset_extractor.cpp (#1153)
* change IsWhitespace to IsUTF8Whitespace

To solve "Phase UP: Generating unicharset and unichar properties files" ERROR #1147

please reference: [#1147](https://github.com/tesseract-ocr/tesseract/issues/1147)

* Update unicharset_extractor.cpp

fix the "Phase UP: Generating unicharset and unichar properties files" ERROR

* Update unicharset_extractor.cpp

fix "Phase UP: Generating unicharset and unichar properties files" ERROR #1147

* Update unicharset_extractor.cpp

fix the encoding invalid problem and fix the comment
2017-10-13 11:46:42 +02:00
Egor Pugin
1b0379c257 Merge pull request #1163 from cysp/bugfix/leptonica-pkgconfig
Add Leptonica's pkg-config-found library directory to the search path
2017-10-03 16:05:51 +03:00
Scott Talbot
a538cd126b Add Leptonica's pkg-config-found library directory to the search path 2017-10-03 21:15:44 +11:00
Egor Pugin
1b4fb3a762 Update appveyor.yml 2017-09-26 17:01:52 +03:00
zdenop
2cc531e6bf Merge pull request #1140 from stweil/pagebreak
Remove Tesseract parameter "include_page_breaks" and use FF by default
2017-09-19 08:41:08 +02:00