126 Commits

Author SHA1 Message Date
Josh Reid
cdc35338c5 Added check if input PSM value is outside of range (#1236)
Wrote a function to throw an error if PSM is outside 0-13 or OEM is outside 0-5.
fixes #1234
2017-12-14 11:37:44 +01:00
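A minimal sketch of the kind of range check this commit describes. It is not the function the commit added: `CheckArgRange`, `CheckPsm`, and `CheckOem` are hypothetical names, and the bounds are taken from the commit message above.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper, not the function added by the commit: throws
// std::out_of_range when `value` lies outside the inclusive range
// [min_value, max_value].
void CheckArgRange(const std::string& name, int value, int min_value,
                   int max_value) {
  assert(min_value <= max_value);  // precondition on the caller
  if (value < min_value || value > max_value) {
    throw std::out_of_range(name + " value " + std::to_string(value) +
                            " is outside the range " +
                            std::to_string(min_value) + "-" +
                            std::to_string(max_value));
  }
}

// Bounds as stated in the commit message (PSM 0-13, OEM 0-5).
void CheckPsm(int psm) { CheckArgRange("PSM", psm, 0, 13); }
void CheckOem(int oem) { CheckArgRange("OEM", oem, 0, 5); }
```

A caller would typically wrap `CheckPsm` and `CheckOem` in a `try` block around argument parsing and report `e.what()` before returning a non-zero status.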
chrismamo1
5fd3e22f74 move code around so that list-langs will work without an English traineddata file 2017-08-12 17:15:27 -05:00
Ray Smith
2ef1aeaeb4 Added AVX2 and AVX512 detector 2017-08-02 14:15:50 -07:00
Ray Smith
da03e4e910 Fixes from pull of cleanups: clang tidied, reviewed, fixed new bugs, undeleted needed code. Probably breaks the build, due to some inclusion of changes in utf8/32 conversion 2017-07-14 09:30:14 -07:00
Justin Hotchkiss Palermo
f057938069 fix filenames in comments 2017-07-02 17:35:47 -04:00
Raf Schietekat
b4cf46697f Issue #529: inside main() use return rather than exit 2017-05-13 18:02:00 +02:00
Stefan Weil
78142593d2 Fix order of destructor calls for DawgCache and TessBaseAPI
TessBaseAPI must release its cache use before DawgCache is destroyed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-13 11:35:30 +02:00
Stefan Weil
f37f858c99 main: Fix two memory leaks
When Tesseract terminates by calling the exit function,
the destructor of any local auto variable is not called.

Fix two cases by using static variables.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-05-12 21:15:52 +02:00
Egor Pugin
0afd5939b1 Use NDEBUG macro instead of DEBUG. 2017-05-08 13:01:22 +03:00
Chris Mayo
b231aee212 tidy tesseract(1) adding missing options
Together with:
- fix "C\++"
- align executable --print-parameters message
2017-03-23 20:02:50 +00:00
Ray Smith
f566a45b30 clang-tidy changes from sync 2017-01-25 16:20:19 -08:00
Ray Smith
a1c22fb0d0 Fixed issue #557 2017-01-25 16:05:59 -08:00
Zdenko Podobný
effa5741e6 Implement invisible text only for PDF 2017-01-20 21:26:34 +01:00
Stefan Weil
534a237015 Move AVX / SSE messages to function PrintVersionInfo (crash fix)
This information is not needed for normal runs, so it is sufficient
to show it on request (like versions and OpenCL information).

This also fixes a crash caused by undefined order of global constructors:

When the global variable SIMDDetect::detector is initialized before the
global variable debug_file, the first tprintf call in simddetect.cpp
crashes because of a NULL pointer in debug_file. This was only seen when
running with a shared library (libtesseract.so).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-28 13:35:44 +01:00
Stefan Weil
b262136b45 opencl: Show up to four OpenCL platforms
The old code only allowed one platform.
Add also strict error handling.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-23 10:25:56 +01:00
Stefan Weil
217a4dda43 tesseract: Disable Leptonica messages
Disable debugging and informational messages from Leptonica
for release builds.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-12-07 21:37:23 +01:00
Ray Smith
5deebe6c27 Fixed multilang for LSTM, pushed cube to one side without actually deleting it 2016-12-05 14:41:43 -08:00
Stefan Weil
92d981b93a Change tesseract parameter -psm to --psm
For compatibility reasons the old variant is still supported.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-30 22:23:46 +01:00
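The compatibility behaviour described in this commit could look roughly like the sketch below, which accepts both spellings of the option. `IsPsmOption` and `FindPsm` are hypothetical helpers written for illustration and are not taken from tesseractmain.cpp.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns true for the new spelling `--psm` and for
// the legacy spelling `-psm`, which is kept for compatibility.
bool IsPsmOption(const char* arg) {
  return std::strcmp(arg, "--psm") == 0 || std::strcmp(arg, "-psm") == 0;
}

// Scans the argument list and stores the value that follows the option in
// *psm. Returns false if the option is absent or has no value after it.
bool FindPsm(int argc, const char* const argv[], int* psm) {
  assert(psm != nullptr);  // caller must supply storage for the result
  for (int i = 1; i + 1 < argc; ++i) {
    if (IsPsmOption(argv[i])) {
      *psm = std::atoi(argv[i + 1]);
      return true;
    }
  }
  return false;
}
```

With this sketch, `tesseract in.tif out --psm 6` and `tesseract in.tif out -psm 6` would both yield a page segmentation mode of 6.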
Stefan Weil
d2f9264383 Change tesseract parameter -oem to --oem
It was introduced recently in commit f24ef67d, so there is no need
to support the old variant for compatibility reasons.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-11-30 22:23:46 +01:00
Egor Pugin
644469595c Fix windows build. 2016-11-24 17:32:23 +03:00
Ray Smith
f24ef67df4 Limited max height to 48 even in variable height input, enabled neural nets via ocr engine mode 2016-11-08 14:01:04 -08:00
Ray Smith
2c837dffc3 Result of clang tidy on recent merge 2016-11-07 10:46:33 -08:00
Stefan Weil
f1d3a3b7c3 api/tesseractmain: Fix memory leak caused by exit()
When exit() is called from ParseArgs(), no destructors are executed
for the auto variables vars_vec and vars_values.

Making both variables static fixes the memory leaks, because now the
destructors are always executed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-10-24 20:20:24 +02:00
Stefan Weil
75fdc086ec win32: Check whether tiffio.h is available
The previous commit added a dependency on tiffio.h, so enable the new
code only if that file is available.

The code which conditionally defines HAVE_TIFFIO_H was already there
although that macro was unused up to now.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-07-17 12:07:58 +02:00
Stefan Weil
896e80d9a7 win32: Show TIFF warnings on console
Showing them in a window (default) is not acceptable for a console
application like Tesseract which must be able to work in batch mode.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-07-16 10:45:18 +02:00
Stefan Weil
e59be55bcc Print list of languages to stdout instead of stderr
It is common practice for command line programs to print
user requested information on stdout.

This seems to be reasonable for Tesseract, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-05-16 17:59:48 +02:00
Stefan Weil
7e98c33432 Print help text to stdout instead of stderr
It is common practice for command line programs to show help text
on stdout. This seems to be reasonable for Tesseract, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-05-16 17:59:48 +02:00
Stefan Weil
076f21c1f2 Print version to stdout instead of stderr
Most command line programs print the version to stdout.
This seems to be reasonable for Tesseract, too.

Now a shell statement like "VERSION=$(tesseract --version)" works
without I/O redirection.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-03-16 12:10:27 +01:00
amitdo
bf5345f6a1 Don't display tesseract's banner when quiet mode is active 2016-03-07 19:25:09 +02:00
Tom Morris
6700edd8bc Cleanup TSV renderer
Remove all references to hocr, hocr.tsv, etc. Remove dead code for font
info, input filename, HTML escapes. Improved comments. Fixed
indentation.
2016-03-01 13:41:19 -05:00
Sundar M. Vaidya
59d593d796 Calls TessHOcrTsvRenderer if tessedit_create_hocrtsv is true. 2016-03-01 12:23:12 -05:00
amitdo
6be9d7a5f8 Fix #64. Make box training work
This commit is better than 06fc0533c. Hopefully, this is the last fix to the box training issue.
2016-01-29 03:37:34 +02:00
amitdo
06fc0533c8 Fix #184. Training should work now 2016-01-17 14:27:35 +02:00
amitdo
a20156fc67 Add missing ')' to make the code compile 2015-12-11 19:42:16 +02:00
amitdo
c2f5e9b849 If there is no explicit renderer(s), default to TessTextRenderer
Revert fd429c32, 43834da7, 05de195e.

See #49, #59.

The code in this commit solves the issue in a more elegant way, IMHO.

Now you can use:
  * `tesseract eurotext.tif eurotext txt pdf`
  * `tesseract eurotext.tif eurotext txt hocr`
  * `tesseract eurotext.tif eurotext txt hocr pdf`

NOTE:
  With `tesseract eurotext.tif eurotext`
  or `tesseract eurotext.tif eurotext txt`
  the psm will be set to '3', but...
  With `tesseract eurotext.tif eurotext txt pdf`
  or `tesseract eurotext.tif eurotext txt hocr`
  the psm will be set to '1'.
2015-12-11 19:06:49 +02:00
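The fallback rule in this commit (no explicit renderer means plain text output) can be sketched as follows. This is a simplified illustration, not the code from the commit: `SelectOutputs` is a hypothetical function, and the list of known formats is limited to the three named in the examples above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: picks the output formats named on the command line
// and falls back to plain text when none is given explicitly.
std::vector<std::string> SelectOutputs(
    const std::vector<std::string>& configs) {
  // Assumption: only the formats used in the examples above are recognized.
  const std::vector<std::string> known = {"txt", "hocr", "pdf"};
  std::vector<std::string> outputs;
  for (const std::string& config : configs) {
    for (const std::string& name : known) {
      if (config == name) outputs.push_back(config);
    }
  }
  if (outputs.empty()) outputs.push_back("txt");  // default renderer
  assert(!outputs.empty());  // at least one output is always selected
  return outputs;
}
```

Under these assumptions, an argument list with no recognized format yields only `txt`, while `txt hocr pdf` yields all three in the order given.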
Stefan Weil
71c9e028f7 tesseractmain: Prettify help message
Commit 99110df757 improved the help text
in several aspects, but also introduced new inconsistencies which this
patch tries to fix.

* Align columns (this needed replacing tabs by spaces).
* Start explaining text with uppercase.
* Replace "the stdout" by "stdout".
* Small changes in help text for page segmentation modes.
* Split options in OCR options and single options
  (partially revert commit 99110df757).

In addition, whitespace characters at end of lines were removed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-29 10:26:40 +01:00
amitdo
99110df757 tesseractmain.cpp: Split huge main() to sub functions
Add these functions to api/tesseractmain.cpp:
PrintVersionInfo()
PrintUsage()
PrintHelpForPSM()
PrintHelpMessage()
SetVariablesFromCLArgs()
PrintLangsList()
FixPageSegMode()
ParseArgs()
PreloadRenderers()
2015-11-26 11:36:16 +02:00
Stefan Weil
03f37c0cdc tesseractmain: Fix unterminated string
Coverity bug report: CID 1270421 "Buffer not null terminated".

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-24 17:17:17 +01:00
amitdo
6bbcb50dd9 Added osd renderer for psm 0.
Works for single page and multi-page.
2015-10-30 20:09:00 +02:00
amitdo
dcfdd5c035 OSD: Print script name instead of meaningless script id 2015-10-28 09:50:28 +02:00
Stefan Weil
11b2a4d9af api: Fix typos in comments (all found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-09-14 21:54:27 +02:00
Zdenko Podobný
628de5ba3f enable pdfrender with NO_CUBE_BUILD 2015-08-07 23:20:22 +02:00
Zdenko Podobný
41478fd5a1 implement build without cube (-DNO_CUBE_BUILD) 2015-07-24 11:51:44 +02:00
Ray Smith
242b14ae7f Reduced size of multi-renderer implementation from code review 2014-10-09 13:29:46 -07:00
Zdenko Podobný
9e8629d9ef allow multiple output in tesseract executable (https://groups.google.com/d/msg/tesseract-ocr/Z_WUKmJDVxc/1vc3W0xJZ2oJ) 2014-09-19 23:33:47 +02:00
theraysmith@gmail.com
b64ad05096 Improved efficiency of image processing for PDF
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1141 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-11 23:15:25 +00:00
zdenop
1156098567 Add font info to hocr output - fix issue 1219
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1132 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-03 16:22:12 +00:00
zdenop@gmail.com
84cdcb32cc fixed windows build
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1110 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-26 06:48:58 +00:00
theraysmith@gmail.com
25a8c7b720 Enabled streaming input and output of multi-page documents
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1105 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-21 15:46:21 +00:00
zdenop
2e520f2fac fix hocr/pdf output when image is provided from stdin - issue 1196
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1099 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-05-11 15:59:47 +00:00