Jan Kamlah
577e8a8b93
Add PAGE XML renderer / export (#4214)
Add PAGE XML export and documentation.
To generate PAGE XML output just add 'page' to the tesseract command.
The output is outputname + '.page.xml' to avoid conflicts with ALTO export.
The output can be customized with the flags:
tessedit_create_page_polygon and tessedit_create_page_wordlevel.
Co-authored-by: Stefan Weil <sw@weilnetz.de>
2024-04-19 21:12:39 +02:00
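The naming rule above can be sketched as a small command builder. `page_xml_command` is a hypothetical helper that only assembles the argument list and the expected output path; it does not run Tesseract. The parameter name used in the example is copied from this commit message and may differ in released versions. The remark that ALTO writes a plain `.xml` file is inferred from the stated goal of avoiding a conflict.

```python
from typing import Dict, List, Optional, Tuple


def page_xml_command(
    image: str,
    outputbase: str,
    params: Optional[Dict[str, str]] = None,
) -> Tuple[List[str], str]:
    """Return (argv, expected output path) for a PAGE XML run.

    Hypothetical helper: it only assembles arguments and never runs tesseract.
    """
    argv = ["tesseract", image, outputbase]
    # Each '-c name=value' pair sets one parameter for this run only.
    for name, value in (params or {}).items():
        argv += ["-c", f"{name}={value}"]
    # Adding the 'page' config enables the PAGE XML renderer.
    argv.append("page")
    # '.page.xml' keeps the result apart from the ALTO output file.
    return argv, outputbase + ".page.xml"


argv, out = page_xml_command("scan.png", "result", {"tessedit_create_page_polygon": "1"})
print(" ".join(argv))  # tesseract scan.png result -c tessedit_create_page_polygon=1 page
print(out)             # result.page.xml
```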
Stefan Weil
bcc1a3b45b
Rename frk -> deu_latf (ISO 639-3, ISO 15924)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-03-09 11:25:28 +01:00
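Scripts that still pass an old language code to `-l` need updating after renames like this one. A minimal sketch, assuming a hypothetical helper `migrate_lang_spec`; the mapping only contains renames recorded elsewhere in this log (frk, kur_ara, tgl) and is not an exhaustive list.

```python
# Old -> new language codes, collected only from renames recorded in this log.
LANGUAGE_RENAMES = {
    "frk": "deu_latf",  # ISO 639-3 "deu" + ISO 15924 "Latf" (Fraktur)
    "kur_ara": "kmr",
    "tgl": "fil",
}


def migrate_lang_spec(spec: str) -> str:
    """Rewrite a '+'-joined -l/--lang value, keeping unknown codes unchanged."""
    return "+".join(LANGUAGE_RENAMES.get(code, code) for code in spec.split("+"))


print(migrate_lang_spec("eng+frk"))  # eng+deu_latf
```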
Stefan Weil
7c7498c327
Rename BibTex file to please GitHub
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-08-08 16:51:58 +02:00
Stefan Weil
25cdca6492
combine_tessdata: Print "Version:" instead of "Version string:"
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-11-14 16:38:52 +01:00
Stefan Weil
386dd8a0c0
Update (master branch was renamed to main)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-13 07:42:46 +02:00
Stefan Weil
7fc9a34f79
Rename processed TIFF output file and add page number if needed (fixes issue #3544)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-09-01 14:16:05 +02:00
Stefan Weil
b7e8134dea
Update URLs for Google groups
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-04-11 10:43:28 +02:00
Stefan Weil
c676d5bcff
STRING is no longer required for Visual Studio
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-03-15 09:15:25 +01:00
Stefan Weil
ea446b1eae
Remove blanks at line endings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-02-26 14:05:36 +01:00
Stefan Weil
b6787749e3
Remove remnants of vs2010
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-20 13:36:59 +01:00
Stefan Weil
3195c8f75f
Add new option -l for combine_tessdata to list the network string
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-15 18:49:51 +01:00
Stefan Weil
73ffcabfe9
lstmtraining: Interpret negative value for --max_iterations as epochs
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-14 19:51:58 +01:00
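The rule for `--max_iterations` can be sketched as follows. `effective_max_iterations` is a hypothetical helper, and it assumes that one iteration consumes one training sample, so one epoch equals `samples_per_epoch` iterations; that equivalence is an assumption, not taken from the lstmtraining source.

```python
def effective_max_iterations(max_iterations: int, samples_per_epoch: int) -> int:
    """Sketch: a negative --max_iterations value counts epochs.

    Assumption: one iteration consumes one training sample, so one epoch
    is samples_per_epoch iterations.
    """
    if max_iterations < 0:
        # e.g. -3 means "train for 3 epochs"
        return -max_iterations * samples_per_epoch
    # Non-negative values are plain iteration counts.
    return max_iterations


print(effective_max_iterations(-3, 500))     # 1500
print(effective_max_iterations(10000, 500))  # 10000
```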
Stefan Weil
e1b9f1b446
automake: Flat build for doc
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-01-11 21:58:45 +01:00
Stefan Weil
57efa41d47
Add XML_CATALOG_FILES for MacOS with Homebrew (#3188)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-12-21 16:39:32 +01:00
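XML_CATALOG_FILES tells libxml2-based tools such as xsltproc which catalog to use for mapping stylesheet URLs to local files. A minimal sketch of preparing such an environment, assuming a hypothetical helper `xml_catalog_env` and assuming Homebrew keeps its catalog under `<prefix>/etc/xml/catalog` (prefix `/usr/local` on Intel Macs, `/opt/homebrew` on Apple Silicon); the exact path used by the build may differ.

```python
from typing import Dict, Mapping


# Assumption: Homebrew's XML catalog lives at <prefix>/etc/xml/catalog.
def xml_catalog_env(base: Mapping[str, str], homebrew_prefix: str = "/usr/local") -> Dict[str, str]:
    """Return a copy of `base` with XML_CATALOG_FILES set if it was missing."""
    env = dict(base)
    # Do not override a catalog path the user has already exported.
    if "XML_CATALOG_FILES" not in env:
        env["XML_CATALOG_FILES"] = homebrew_prefix.rstrip("/") + "/etc/xml/catalog"
    return env


print(xml_catalog_env({})["XML_CATALOG_FILES"])  # /usr/local/etc/xml/catalog
```

The returned mapping can be passed as `env=` to `subprocess.run` when invoking xsltproc, for example `subprocess.run(argv, env=xml_catalog_env(os.environ), check=True)`.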
Stefan Weil
3f2892bc04
Update description for fry language to match Wikipedia
2020-12-08 05:59:17 +01:00
Merlijn Wajer
5ff273675c
tesseract.1.asc: sync with languages available in tessdata-fast
cos, div, fao, fyr, gla, hye are available in Ubuntu's 'tesseract-ocr-*'
packages but not mentioned in the manpage.
2020-12-04 18:16:45 +01:00
Merlijn Wajer
58f7a72f00
Remove references to "kur" and "tgl", add "fil" to man page
"kur" no longer exists. It might now be named "kur_ara" (the old "kur_ara" is
now "kmr", which actually uses Latin script), but "kur" is present in neither
tessdata_fast nor tessdata_best. [1] [2]
"tgl" (Tagalog) is now named "fil" (Filipino) [3]
[1] https://github.com/tesseract-ocr/langdata/issues/124
[2] https://github.com/tesseract-ocr/tessdata_best/issues/23
[3] https://github.com/tesseract-ocr/langdata/issues/84
2020-12-01 23:43:50 +01:00
amitdo
4b6db07462
Improve disabled legacy engine build
2020-10-10 01:33:27 +03:00
Stefan Weil
16553014e0
Replace references to the old wiki by new URLs
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-02-03 11:37:41 +01:00
Stefan Weil
a1a177f582
Doxyfile: Add missing source directories (include, unittest)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-01-30 14:35:24 +01:00
Stefan Weil
cc05d19495
Doxyfile: Update to version 1.8.16
The update was done using `doxygen -u`.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2020-01-30 14:20:05 +01:00
Shreeshrii
99dfa8a680
Add separator and training_iteration to checkpoint name (#2752)
* Add separator and training_iteration to checkpoint name
* specify modelname_N.NN_NN_NN.checkpoint for intermediate checkpoint
2019-11-09 12:22:40 +01:00
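The `modelname_N.NN_NN_NN.checkpoint` pattern can be parsed and rebuilt as sketched below. The helpers are hypothetical, and the meaning of the three numeric fields (error rate, best iteration, training iteration) is an assumption based on this commit's title; `decimals=2` follows the `N.NN` pattern given here, and other versions may print a different number of digits.

```python
from typing import NamedTuple


class CheckpointName(NamedTuple):
    model: str
    error_rate: float        # assumed: error rate of the checkpoint
    best_iteration: int      # assumed: iteration of the best model so far
    training_iteration: int  # training iteration when it was written


def parse_checkpoint_name(filename: str) -> CheckpointName:
    suffix = ".checkpoint"
    if not filename.endswith(suffix):
        raise ValueError(f"not a checkpoint file: {filename!r}")
    stem = filename[: -len(suffix)]
    # Split from the right so model names containing '_' stay intact.
    model, error, best, iteration = stem.rsplit("_", 3)
    return CheckpointName(model, float(error), int(best), int(iteration))


def format_checkpoint_name(cp: CheckpointName, decimals: int = 2) -> str:
    return (f"{cp.model}_{cp.error_rate:.{decimals}f}"
            f"_{cp.best_iteration}_{cp.training_iteration}.checkpoint")


cp = parse_checkpoint_name("deu_latf_1.23_400_1000.checkpoint")
print(cp.model, cp.training_iteration)  # deu_latf 1000
```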
zhuangzhuang1988
25acd28e1e
add debugger view for params
2019-07-04 07:17:28 +02:00
supermouse
3f3d11a580
move natvis file path
When using natvis with Visual Studio, see https://docs.microsoft.com/en-us/visualstudio/debugger/create-custom-views-of-native-objects?view=vs-2019
2019-07-04 07:17:28 +02:00
Shree
00abf57d02
Update documentation for unicharset_extractor
2019-05-31 08:20:19 +00:00
Stefan Weil
5f76a8495b
Sort options alphabetically in tesseract man page
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-16 10:19:00 +01:00
Stefan Weil
b55984fb88
Add description for new --dpi option in tesseract man page
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-16 09:33:41 +01:00
Stefan Weil
26b4457b86
Add description for new --psm values in tesseract man page
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-16 09:24:40 +01:00
Stefan Weil
a6981ae548
Improve man page for tesseract
Format it like the example
https://github.com/asciidoc/asciidoc/blob/master/doc/asciidoc.1.txt .
Replace tab characters by blanks.
Also add a chapter on environment variables.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-16 08:54:28 +01:00
Stefan Weil
6b3c81c909
Add rule for PDF documentation
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-15 21:53:34 +01:00
Stefan Weil
e14797563b
Update documentation for supported languages
kur_ara.traineddata was renamed to kmr.traineddata.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-15 11:07:54 +01:00
Stefan Weil
85d7feebf7
Add missing documentation for --help-extra
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-03-15 09:36:10 +01:00
Chris Mayo
a9d3efb6e3
Document that configfile can be a file path
Useful for custom config or when pointing tessdata to alternate
traineddata.
2019-03-05 19:47:54 +00:00
russiaayya
c6cc54aa76
Change option -l to --lang
2019-02-27 12:55:34 -05:00
zdenop
48be357688
Merge pull request #2220 from cjmayo/man_config
Man page description of configs and parameters
2019-02-16 13:53:47 +01:00
Stefan Weil
6e37389fcd
doc: Don't fail if manpages fail to build
Even with installed asciidoc and xsltproc the build will fail if
xsltproc cannot find the required stylesheet
http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl .
Ignore such errors until there is a better check in configure.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-15 22:19:52 +01:00
Felix Yan
d35f119c68
Fix a typo in Doxyfile
2019-02-15 04:07:35 +08:00
Raphael Graf
86b14c32a9
Avoid gmake-specific pattern substitution in Makefile.am.
Resolves #2226
2019-02-08 19:39:45 +01:00
Chris Mayo
c3b18cfd27
Improve description of configs and parameters in tesseract(1)
Try to make the relationship between configs, -c and --print-parameters
clearer by always using the term "parameter" rather than "variable".
Include the filenames created by each config.
2019-02-06 20:03:51 +00:00
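The relationship between configs and `-c` can be illustrated by rendering the same parameters both ways. The helpers are hypothetical, and the config-file layout (one `name value` pair per line) is assumed here; parameter names can be checked against the output of `--print-parameters`. Since a config name may also be a file path, the generated text can be saved anywhere and passed by path.

```python
from typing import Dict, List


def as_c_options(params: Dict[str, str]) -> List[str]:
    """Command-line form: one '-c name=value' pair per parameter."""
    args: List[str] = []
    for name, value in params.items():
        args += ["-c", f"{name}={value}"]
    return args


def as_config_text(params: Dict[str, str]) -> str:
    """Config-file form (assumed layout): one 'name value' line per parameter."""
    return "".join(f"{name} {value}\n" for name, value in params.items())


params = {"tessedit_create_hocr": "1", "tessedit_create_tsv": "1"}
print(as_c_options(params))
# ['-c', 'tessedit_create_hocr=1', '-c', 'tessedit_create_tsv=1']
print(as_config_text(params), end="")
# tessedit_create_hocr 1
# tessedit_create_tsv 1
```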
Chris Mayo
da279e4216
Tidy tesseract(1)
A typo and missing full stops.
2019-02-05 19:58:40 +00:00
Stefan Weil
39ed30ad83
Fix build rule for manpages
This is similar to commit 2106cba0a9
which fixed doc/generate_manpages.sh.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-02-01 19:47:46 +01:00
Chris Mayo
2106cba0a9
Use universal location for docbook.xsl
xsltproc will use the system catalog to find the local path.
Pass --nonet option to ensure the Internet is not used.
2019-02-01 17:55:59 +01:00
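A minimal sketch of the resulting invocation: the stylesheet is referenced by its canonical URL (taken from this log), xsltproc resolves it through the system XML catalog, and `--nonet` forbids any network access. `manpage_xsltproc_argv` and the input file name are hypothetical; the real build rule may pass further options.

```python
from typing import List

# Canonical ("universal") URL of the DocBook manpage stylesheet; a configured
# XML catalog maps it to a locally installed copy.
DOCBOOK_MANPAGES_XSL = (
    "http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl"
)


def manpage_xsltproc_argv(docbook_xml: str) -> List[str]:
    # --nonet must precede the stylesheet: with no local catalog entry the
    # run fails instead of silently downloading the stylesheet.
    return ["xsltproc", "--nonet", DOCBOOK_MANPAGES_XSL, docbook_xml]


print(" ".join(manpage_xsltproc_argv("tesseract.1.xml")))
```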
Stefan Weil
a0e6586e63
Fix documentation for page segmentation mode 2
It never worked, so add a comment that the implementation is missing.
Also add a to-do comment.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2019-01-09 13:51:44 +01:00
Jake Sebright
e398601bf5
Include ALTO in list of supported output formats
2018-12-15 10:41:24 +01:00
zdenop
aefcbac840
add info about unicharambigs file v2; fixes #165
2018-10-21 20:18:48 +02:00
Zdenko Podobný
b0b5bd62f3
build doc only for tesseract engine
2018-10-12 19:01:17 +02:00
Stefan Weil
3315931859
Merge and enhance documentation on language and script models
Also add links to the user forum and the Wiki, and update the
history text.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-05 16:55:21 +02:00
Stefan Weil
383dcf70b5
Document some more config options for tesseract
Also clarify the name(s) of the generated OCR result file(s):
Tesseract does not create a file named outbase.txt by default.
Also fix a sentence in the language section.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-05 16:03:51 +02:00
Stefan Weil
b70a456788
Add Makefile rule to build HTML manpages
They can be built optionally by `make html` (only for automake builds).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-04 22:36:03 +02:00
Stefan Weil
3e9b0acc5c
Update tesseract man page
- move Tesseract 4 release note to other release notes
- format command line options in text
- add link to release notes (wiki)
- add link to contributors (GitHub)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2018-10-04 22:10:22 +02:00