The unicharambigs file (a component of traineddata, see combine_tessdata(1)) is used by Tesseract to represent possible ambiguities between characters or groups of characters\&.
contains either 1 or 0\&. 1 denotes a mandatory replacement, 0 denotes an optional replacement\&.
T}
.TE
.sp 1
.sp
Characters appearing in fields two and four should appear in unicharset\&. The numbers in fields one and three refer to the number of unichars (not bytes)\&.
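.sp
Because fields one and three give unichar counts, a line can be split by reading each count and then taking that many unichars\&. The following Python sketch illustrates this; it is not part of Tesseract\&. The name \fIparse_ambig_line\fR and the optional \fIunicharset\fR argument (a set of unichar strings) are hypothetical, and it assumes the unichars within a field are separated by whitespace\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
def parse_ambig_line(line, unicharset=None):
    """Return (source unichars, target unichars, mandatory) for one line.

    Illustrative only: assumes whitespace-separated unichars.
    """
    tokens = line.split()
    try:
        n_src = int(tokens[0])                      # field one
        src = tokens[1:1 + n_src]                   # field two
        n_dst = int(tokens[1 + n_src])              # field three
        dst = tokens[2 + n_src:2 + n_src + n_dst]   # field four
        kind = tokens[2 + n_src + n_dst:]           # field five
    except (IndexError, ValueError):
        raise ValueError("malformed line: " + repr(line))
    # The counts must match the unichars found, and the type must be 0 or 1.
    if len(src) != n_src or len(dst) != n_dst or kind not in (["0"], ["1"]):
        raise ValueError("malformed line: " + repr(line))
    # Optionally check that every unichar is known to the caller's unicharset.
    if unicharset is not None:
        missing = [u for u in src + dst if u not in unicharset]
        if missing:
            raise ValueError("not in unicharset: " + " ".join(missing))
    return src, dst, kind == ["1"]
```
.fi
.if n \{\
.RE
.\}
.sp
Counting unichars rather than bytes means a multi\-byte UTF\-8 character still counts as one unit, which is why the sketch works on whole tokens and never on byte offsets\&.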
.SH "EXAMPLE"
.sp
.if n \{\
.RS 4
.\}
.nf
2 \*(Aq \*(Aq 1 " 1
1 m 2 r n 0
3 i i i 1 m 0
.fi
.if n \{\
.RE
.\}
.sp
In this example, all instances of the \fI2\fR character sequence \fI\*(Aq\*(Aq\fR will \fBalways\fR be replaced by the \fI1\fR character sequence \fI"\fR; a \fI1\fR character sequence \fIm\fR \fBmay\fR be replaced by the \fI2\fR character sequence \fIrn\fR, and the \fI3\fR character sequence \fIiii\fR \fBmay\fR be replaced by the \fI1\fR character sequence \fIm\fR\&.
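.sp
To make the difference between mandatory and optional rules concrete, the Python sketch below applies only the mandatory rule from the example to a list of unichars\&. It illustrates the rule semantics only and makes no claim about how Tesseract applies ambiguities during recognition; the names \fIRULES\fR and \fIapply_mandatory\fR are hypothetical\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
# Rules from the example above as (source unichars, target unichars, mandatory).
RULES = [
    (["'", "'"], ['"'], True),
    (["m"], ["r", "n"], False),
    (["i", "i", "i"], ["m"], False),
]


def apply_mandatory(unichars, rules=RULES):
    """Replace every occurrence of a mandatory rule's source sequence.

    Optional rules are skipped: they describe alternatives, not rewrites.
    """
    out = []
    i = 0
    while i < len(unichars):
        for src, dst, mandatory in rules:
            if mandatory and unichars[i:i + len(src)] == src:
                out.extend(dst)
                i += len(src)
                break
        else:
            # No mandatory rule matched at this position; keep the unichar.
            out.append(unichars[i])
            i += 1
    return out
```
.fi
.if n \{\
.RE
.\}
.sp
Given the unichars of \fIa\*(Aq\*(Aqm\fR, the sketch rewrites the two apostrophes to a single double quote and leaves \fIm\fR untouched, since the \fIm\fR rule is optional\&.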
.SH "HISTORY"
.sp
The unicharambigs file first appeared in Tesseract 3\&.00\&. Prior to that, a similar format called DangAmbigs (\fIdangerous ambiguities\fR) was used\&. That format was almost identical, except that only mandatory replacements could be specified and field 5 was absent\&.
.SH "BUGS"
.sp
This is a documentation "bug": it\(cqs not currently clear what should be done in the case of ligatures (such as \fIfi\fR) which may also appear as regular letters in the unicharset\&.