119 Commits

Author SHA1 Message Date
Jan Kamlah
577e8a8b93 Add PAGE XML renderer / export (#4214)
Add PAGE XML export and documentation.
To generate PAGE XML output just add 'page' to the tesseract command.

The output is outputname + '.page.xml' to avoid conflicts with ALTO export.

The output can be customized with the flags:
tessedit_create_page_polygon and tessedit_create_page_wordlevel.

Co-authored-by: Stefan Weil <sw@weilnetz.de>
2024-04-19 21:12:39 +02:00
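
The invocation described in this commit can be sketched as follows. This is only an illustration and not part of the commit: the helper name `page_xml_command`, the file names, and the use of `1` as the value for the flags are assumptions. Only the `page` config name, the `.page.xml` suffix, and the two flag names are taken from the commit message.

```python
import shlex


def page_xml_command(image, outputname, polygon=None, wordlevel=None):
    """Build a tesseract command line that requests PAGE XML output.

    Hypothetical helper for illustration; returns (argv, expected_output_path).
    """
    argv = ["tesseract", image, outputname]
    # Parameters are passed on the command line as -c NAME=VALUE.
    if polygon is not None:
        argv += ["-c", f"tessedit_create_page_polygon={int(polygon)}"]
    if wordlevel is not None:
        argv += ["-c", f"tessedit_create_page_wordlevel={int(wordlevel)}"]
    # Adding 'page' to the command selects the PAGE XML export.
    argv.append("page")
    # The suffix '.page.xml' avoids conflicts with the ALTO export.
    return argv, outputname + ".page.xml"


argv, expected = page_xml_command("scan.png", "result", wordlevel=True)
print(shlex.join(argv))
print(expected)
```

The helper only assembles the argument list; running it would require an installed tesseract build that contains this commit.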
Stefan Weil
bcc1a3b45b Rename frk -> deu_latf (ISO 639-3, ISO 15924)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-03-09 11:25:28 +01:00
Stefan Weil
7c7498c327 Rename BibTex file to please GitHub
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-08-08 16:51:58 +02:00
Stefan Weil
25cdca6492 combine_tessdata: Print "Version:" instead of "Version string:"
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-14 16:38:52 +01:00
Stefan Weil
386dd8a0c0 Update (master branch was renamed to main)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-13 07:42:46 +02:00
Stefan Weil
7fc9a34f79 Rename processed TIFF output file and add page number if needed (fixes issue #3544)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-01 14:16:05 +02:00
Stefan Weil
b7e8134dea Update URLs for Google groups
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-11 10:43:28 +02:00
Stefan Weil
c676d5bcff STRING is no longer required for Visual Studio
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-15 09:15:25 +01:00
Stefan Weil
ea446b1eae Remove blanks at line endings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-02-26 14:05:36 +01:00
Stefan Weil
b6787749e3 Remove remnants of vs2010
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-20 13:36:59 +01:00
Stefan Weil
3195c8f75f Add new option -l for combine_tessdata to list the network string
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-15 18:49:51 +01:00
Stefan Weil
73ffcabfe9 lstmtraining: Interpret negative value for --max_iterations as epochs
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-14 19:51:58 +01:00
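
One possible reading of this convention is sketched below. It assumes that an epoch is one pass over all training samples, so a negative value is converted into `epochs * samples` iterations. The function name and the sample count are hypothetical, and the actual lstmtraining code may compute the limit differently.

```python
def effective_max_iterations(max_iterations, num_training_samples):
    """Translate a --max_iterations value into an iteration count (illustrative)."""
    if max_iterations < 0:
        # Negative value: its magnitude is interpreted as a number of epochs.
        return -max_iterations * num_training_samples
    # Non-negative value: returned unchanged as an iteration count.
    return max_iterations


print(effective_max_iterations(-3, 1000))
print(effective_max_iterations(500, 1000))
```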
Stefan Weil
e1b9f1b446 automake: Flat build for doc
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-11 21:58:45 +01:00
Stefan Weil
57efa41d47 Add XML_CATALOG_FILES for MacOS with Homebrew (#3188)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-21 16:39:32 +01:00
Stefan Weil
3f2892bc04 Update description for fry language to match Wikipedia 2020-12-08 05:59:17 +01:00
Merlijn Wajer
5ff273675c tesseract.1.asc: sync with languages available in tessdata-fast
cos, div, fao, fry, gla, hye are available in Ubuntu's 'tesseract-ocr-*'
packages but not mentioned in the manpage.
2020-12-04 18:16:45 +01:00
Merlijn Wajer
58f7a72f00 Remove references to "kur" and "tgl", add "fil" to man page
"kur" no longer exists. It might now be named "kur_ara" (the old "kur_ara"
is now "kmr", which is actually Latin), but "kur" is present neither in
tessdata_fast nor in tessdata_best. [1] [2]

"tgl" (Tagalog) is now named "fil" (Filipino) [3]

[1] https://github.com/tesseract-ocr/langdata/issues/124
[2] https://github.com/tesseract-ocr/tessdata_best/issues/23
[3] https://github.com/tesseract-ocr/langdata/issues/84
2020-12-01 23:43:50 +01:00
amitdo
4b6db07462 Improve disabled legacy engine build 2020-10-10 01:33:27 +03:00
Stefan Weil
16553014e0 Replace references to the old wiki by new URLs
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-02-03 11:37:41 +01:00
Stefan Weil
a1a177f582 Doxyfile: Add missing source directories (include, unittest)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-01-30 14:35:24 +01:00
Stefan Weil
cc05d19495 Doxyfile: Update to version 1.8.16
The update was done using `doxygen -u`.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-01-30 14:20:05 +01:00
Shreeshrii
99dfa8a680 Add separator and training_iteration to checkpoint name (#2752)
* Add separator and training_iteration to checkpoint name
* specify modelname_N.NN_NN_NN.checkpoint for intermediate checkpoint
2019-11-09 12:22:40 +01:00
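
A small sketch of how a name of the form `modelname_N.NN_NN_NN.checkpoint` could be taken apart. The commit message only names `training_iteration`; reading the first number as an error value and the second as an iteration count is an assumption, as are the function and field names.

```python
import re

# Matches modelname_N.NN_NN_NN.checkpoint; the field meanings are assumed.
CHECKPOINT_RE = re.compile(
    r"^(?P<model>.+)_(?P<error>\d+\.\d+)"
    r"_(?P<iteration>\d+)_(?P<training_iteration>\d+)\.checkpoint$"
)


def parse_checkpoint_name(filename):
    """Split an intermediate checkpoint name into its parts, or return None."""
    match = CHECKPOINT_RE.match(filename)
    if match is None:
        return None
    return {
        "model": match["model"],
        "error": float(match["error"]),
        "iteration": int(match["iteration"]),
        "training_iteration": int(match["training_iteration"]),
    }


print(parse_checkpoint_name("my_model_1.25_300_400.checkpoint"))
```

The model name may itself contain underscores, which is why the pattern anchors on the three numeric fields at the end of the name.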
zhuangzhuang1988
25acd28e1e add debugger view for params 2019-07-04 07:17:28 +02:00
supermouse
3f3d11a580 move natvis file path
For using natvis with Visual Studio, see https://docs.microsoft.com/en-us/visualstudio/debugger/create-custom-views-of-native-objects?view=vs-2019
2019-07-04 07:17:28 +02:00
Shree
00abf57d02 Update documentation for unicharset_extractor 2019-05-31 08:20:19 +00:00
Stefan Weil
5f76a8495b Sort options alphabetically in tesseract man page
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-16 10:19:00 +01:00
Stefan Weil
b55984fb88 Add description for new --dpi option in tesseract man page
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-16 09:33:41 +01:00
Stefan Weil
26b4457b86 Add description for new --psm values in tesseract man page
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-16 09:24:40 +01:00
Stefan Weil
a6981ae548 Improve man page for tesseract
Format it like the example
https://github.com/asciidoc/asciidoc/blob/master/doc/asciidoc.1.txt.

Replace tab characters by blanks.

Also add a chapter on environment variables.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-16 08:54:28 +01:00
Stefan Weil
6b3c81c909 Add rule for PDF documentation
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-15 21:53:34 +01:00
Stefan Weil
e14797563b Update documentation for supported languages
kur_ara.traineddata was renamed to kmr.traineddata.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-15 11:07:54 +01:00
Stefan Weil
85d7feebf7 Add missing documentation for --help-extra
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-15 09:36:10 +01:00
Chris Mayo
a9d3efb6e3 Document that configfile can be a file path
Useful for custom config or when pointing tessdata to alternate
traineddata.
2019-03-05 19:47:54 +00:00
russiaayya
c6cc54aa76 Change option -l to --lang 2019-02-27 12:55:34 -05:00
zdenop
48be357688 Merge pull request #2220 from cjmayo/man_config
Man page description of configs and parameters
2019-02-16 13:53:47 +01:00
Stefan Weil
6e37389fcd doc: Don't fail if manpages fail to build
Even with installed asciidoc and xsltproc the build will fail if
xsltproc cannot find the required stylesheet
http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl.

Ignore such errors until there is a better check in configure.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-15 22:19:52 +01:00
Felix Yan
d35f119c68 Fix a typo in Doxyfile 2019-02-15 04:07:35 +08:00
Raphael Graf
86b14c32a9 Avoid gmake-specific pattern substitution in Makefile.am.
Resolves #2226
2019-02-08 19:39:45 +01:00
Chris Mayo
c3b18cfd27 Improve description of configs and parameters in tesseract(1)
Try to make the relationship between configs, -c and --print-parameters
clearer by always using parameter and not variable.

Include the filenames created by each config.
2019-02-06 20:03:51 +00:00
Chris Mayo
da279e4216 Tidy tesseract(1)
A typo and missing full stops.
2019-02-05 19:58:40 +00:00
Stefan Weil
39ed30ad83 Fix build rule for manpages
This is similar to commit 2106cba0a9
which fixed doc/generate_manpages.sh.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-01 19:47:46 +01:00
Chris Mayo
2106cba0a9 Use universal location for docbook.xsl
xsltproc will use the system catalog to find the local path.
Pass --nonet option to ensure the Internet is not used.
2019-02-01 17:55:59 +01:00
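
The kind of xsltproc call this commit describes could look roughly like the sketch below. It assumes that the "universal location" is the docbook.xsl URL quoted in another commit message in this log. The function name and the input file name are hypothetical.

```python
# Stylesheet URL as quoted elsewhere in this log; assumed to be the
# "universal location" meant by this commit.
DOCBOOK_XSL = (
    "http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl"
)


def manpage_command(xml_file):
    """Build an xsltproc argument list for one DocBook XML file (illustrative)."""
    # --nonet ensures the Internet is not used, so the URL has to be
    # mapped to a local path by the system catalog.
    return ["xsltproc", "--nonet", DOCBOOK_XSL, xml_file]


print(" ".join(manpage_command("tesseract.1.xml")))
```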
Stefan Weil
a0e6586e63 Fix documentation for page segmentation mode 2
It never worked, so add a comment that the implementation is missing.
Also add a to-do comment.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-09 13:51:44 +01:00
Jake Sebright
e398601bf5 Include ALTO in list of supported output formats 2018-12-15 10:41:24 +01:00
zdenop
aefcbac840 add info about unicharambigs file v2; fixes #165 2018-10-21 20:18:48 +02:00
Zdenko Podobný
b0b5bd62f3 build doc only for tesseract engine 2018-10-12 19:01:17 +02:00
Stefan Weil
3315931859 Merge and enhance documentation on language and script models
Also add links to the user forum and to the Wiki and update the
history text.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-05 16:55:21 +02:00
Stefan Weil
383dcf70b5 Document some more config options for tesseract
Also clarify the name(s) of the generated OCR result file(s):
Tesseract does not create a file named outbase.txt by default.

Also fix a sentence in the language section.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-05 16:03:51 +02:00
Stefan Weil
b70a456788 Add Makefile rule to build HTML manpages
They can be built optionally by `make html` (only for automake builds).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-04 22:36:03 +02:00
Stefan Weil
3e9b0acc5c Update tesseract man page
- move Tesseract 4 release note to other release notes
- format command line options in text
- add link to release notes (wiki)
- add link to contributors (GitHub)

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-04 22:10:22 +02:00