Thijs Leegwater
f061503a14
Added JPEG quality option parameter (-c jpg_quality=n)
2018-01-11 09:11:30 +01:00
Stefan Weil
aa6eb6bd46
Remove Tesseract parameter "include_page_breaks" and use FF by default
Now Tesseract adds a page break (normally form feed) by default.
It is still possible to suppress page breaks by setting an empty
page_separator.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-09-19 07:34:32 +02:00
jm
2a77d5ad69
returns the correct dictionary if lstm only used
2017-09-14 13:03:22 +02:00
Ray Smith
0382222d85
More clang-tidy fixes from sync
2017-09-08 10:22:32 +01:00
Stefan Weil
b016c48d06
Add missing spaces in help text
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2017-08-23 19:12:41 +02:00
Ray Smith
1cc511188d
Added extra Init that takes a memory buffer or a filereader function pointer to enable read of traineddata from memory or foreign file systems. Updated existing readers to use TFile API instead of FILE. This does not yet add big-endian capability to LSTM, but it is very easy from here.
2017-04-27 15:48:23 -07:00
Ray Smith
f566a45b30
clang-tidy changes from sync
2017-01-25 16:20:19 -08:00
Ray Smith
b453f74e01
Fixed issue #633 (multi-language mode
2017-01-25 15:58:39 -08:00
zdenop
c768b5867d
Merge pull request #668 from Wikinaut/chg-textonly-pdf-parameter-description
Improve textonly_pdf parameter description
2017-01-21 16:29:06 +01:00
Wikinaut
c03299e2b4
Improve textonly_pdf parameter description
2017-01-21 16:18:53 +01:00
Wikinaut
98df78ca8a
fix typo in parameter description
2017-01-21 10:48:25 +01:00
Zdenko Podobný
effa5741e6
Implement invisible text only for PDF
2017-01-20 21:26:34 +01:00
Wikinaut
39274d8000
typo correction "specific"
2017-01-13 04:17:32 +01:00
Simon Strandgaard
d38cffc332
Fixed typo
2016-12-15 14:58:53 +00:00
Ray Smith
9f5ba9105f
Removed dependency on cube from the code
2016-12-14 10:55:15 -08:00
Ray Smith
13e46ae1c4
Made LSTM the default engine, pushed cube out
2016-12-13 14:37:40 -08:00
Ray Smith
5deebe6c27
Fixed multilang for LSTM, pushed cube to one side without actually deleting it
2016-12-05 14:41:43 -08:00
Ray Smith
c1c1e426b3
Added new LSTM-based neural network line recognizer
2016-11-07 15:38:07 -08:00
Ray Smith
2c837dffc3
Result of clang tidy on recent merge
2016-11-07 10:46:33 -08:00
Tom Morris
6700edd8bc
Cleanup TSV renderer
Remove all references to hocr, hocr.tsv, etc. Remove dead code for font
info, input filename, HTML escapes. Improved comments. Fixed
indentation.
2016-03-01 13:41:19 -05:00
Sundar M. Vaidya
738fe4f757
Adds BoolParam tessedit_create_hocrtsv in class Tesseract.
2016-03-01 12:30:39 -05:00
amitdo
c2f5e9b849
If there is no explicit renderer(s), default to TessTextRenderer
Revert fd429c32, 43834da7, 05de195e.
See #49, #59.
The code in this commit solves the issue in a more elegant way, IMHO.
Now you can use:
* `tesseract eurotext.tif eurotext txt pdf`
* `tesseract eurotext.tif eurotext txt hocr`
* `tesseract eurotext.tif eurotext txt hocr pdf`
NOTE:
With `tesseract eurotext.tif eurotext`
or `tesseract eurotext.tif eurotext txt`
the psm will be set to '3', but...
With `tesseract eurotext.tif eurotext txt pdf`
or `tesseract eurotext.tif eurotext txt hocr`
the psm will be set to '1'.
2015-12-11 19:06:49 +02:00
Stefan Weil
318b88daa6
ccmain: Fix typos in comments and strings
Most of them were found by codespell.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-09-14 21:59:16 +02:00
Zdenko Podobný
41478fd5a1
implement build without cube (-DNO_CUBE_BUILD)
2015-07-24 11:51:44 +02:00
Ray Smith
0e868ef377
Major change to improve layout analysis for heavily diacritic languages:
Tha, Vie, Kan, Tel etc.
There is a new overlap detector that detects when diacritics
cause a big increase in textline overlap. In such cases, diacritics from
overlap regions are kept separate from layout analysis completely, allowing
textline formation to happen without them. The diacritics are then assigned
to 0, 1 or 2 close words at the end of layout analysis, using and modifying
an old noise detection data path.
The stored diacritics are used or not during recognition according to the
character classifier's liking for them.
2015-05-12 16:47:02 -07:00
Ray Smith
4a3caefd92
Add ability to build under android (without cube or scrollview).
2015-05-12 15:41:15 -07:00
Zdenko Podobný
4c7c960bfd
fix issue 1417
2015-02-07 22:22:20 +01:00
Zdenko Podobný
36883b4faf
preserve interword spaces patch - Issue 1409
2015-01-27 22:58:04 +01:00
Ray Smith
f927728169
Fixed issue 1207
2014-10-09 13:28:03 -07:00
Zdenko Podobný
d0cb1071b2
remove parameters tessedit_pdf_jpg_quality, tessedit_pdf_compression (reasons are in i1300 and i1285)
2014-10-07 23:37:34 +02:00
Ray Smith
55d11ad3c2
Moved params from global in page layout to tesseractclass, improved single column layout analysis
2014-10-07 09:31:00 -07:00
Zdenko Podobný
9e8629d9ef
allow multiple output in tesseract executable ( https://groups.google.com/d/msg/tesseract-ocr/Z_WUKmJDVxc/1vc3W0xJZ2oJ )
2014-09-19 23:33:47 +02:00
Zdenko Podobný
ff87944171
fix typo
2014-09-07 18:23:47 +02:00
Zdenko Podobný
d1aa61c110
fix issue 1285: reimplement option to select pdf compression
2014-09-06 09:32:22 +02:00
Ray Smith
09b439b05a
Fixed issue 1241, but disabled due to making accuracy worse
2014-08-13 13:33:10 -07:00
theraysmith@gmail.com
dbf6197471
Major refactor of control.cpp to enable line recognition
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1147 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-11 23:23:06 +00:00
zdenop
6941bffbd2
fix typo
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1135 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-09 17:53:57 +00:00
zdenop
bce2cd5f33
enable to select pdf compression type and jpeg quality (fix issue 1263)
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1134 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-08 21:18:44 +00:00
zdenop
1156098567
Add font info to hocr output - fix issue 1219
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1132 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-08-03 16:22:12 +00:00
theraysmith@gmail.com
d2ad450502
Added PDF renderer
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@957 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2014-01-09 17:47:34 +00:00
theraysmith@gmail.com
7ec4fd7a56
Refactored control functions to enable parallel blob classification
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@904 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-11-08 20:30:56 +00:00
theraysmith@gmail.com
2aafc9df24
Improved sub/superscript treatment
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@872 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2013-09-20 19:49:47 +00:00
theraysmith@gmail.com
3a998fe7ac
Added Right-to-left/Bidi capability in the output iterators for Hebrew/Arabic, Added paragraph detection in layout analysis/post OCR, Fixed inconsistent xheight during training and over-chopping, Added simultaneous multi-language capability, Refactored top-level word recognition module, Fixed problems with internally scaled images
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@651 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2012-02-02 02:59:49 +00:00
zdenop@gmail.com
da41b96f7f
removed check for libtiff - leptonica is required; cleanup #ifdef/#ifndef HAVE_LIBLEPT
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@624 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-08-30 06:34:41 +00:00
theraysmith
3e8c0bc228
Various fixes, including memory leak in fixspace, font labels on output, removed some annoying debug output, fixes to initialization of parameters, general cleanup, and added Hindi
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@567 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2011-03-21 21:44:05 +00:00
theraysmith
c8465252e4
Rewrite of DENORM
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@538 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-30 01:05:48 +00:00
zdenop@gmail.com
4523ce9f7d
3.01 code from http://github.com/jimregan/tesseract-ocr with adaptations related to Linux and Windows (VC2008) compile process
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@526 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2010-11-23 18:34:14 +00:00
theraysmith
96e8b51feb
More changes to ccmain for 3.00
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@287 d0cd1f9f-072b-0410-8dd7-cf729c803f20
2009-07-11 02:07:25 +00:00