When translating XML to legacy ASCII-based formats with poor Unicode support, such as man pages and Texinfo, the Unicode characters in the source document must also be translated somehow.
A straightforward character set conversion from Unicode does not
suffice, because the target character set, usually US-ASCII or ISO
Latin-1, does not contain common characters such as dashes and
directional quotation marks that are widely used in XML documents.
But document formatters (man and Texinfo) allow such characters to
be entered by a markup escape: for example, \(lq for the left directional quote “.
And if a markup-level escape is not
available, an ASCII transliteration might be used: for example,
using the ASCII less-than sign < for the angle quotation mark 〈.
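The difference between a plain character set conversion and escape-based translation can be sketched in Python. This is only an illustration, not how docbook2X is implemented: the `ESCAPES` table and both function names are hypothetical, and the entries are limited to the two examples mentioned above (both U+2329 and U+3008 are listed, since either may appear as the angle quotation mark).

```python
# Hypothetical table for illustration; not taken from docbook2X's character maps.
ESCAPES = {
    "\u201c": r"\(lq",  # left directional quote -> roff markup escape
    "\u2329": "<",      # left-pointing angle bracket -> ASCII transliteration
    "\u3008": "<",      # CJK left angle bracket -> ASCII transliteration
}


def direct_ascii(text: str) -> bytes:
    """Naive conversion: raises UnicodeEncodeError for characters outside ASCII."""
    return text.encode("ascii")


def escape_for_roff(text: str) -> str:
    """Replace each listed character with its escape or transliteration."""
    return text.translate(str.maketrans(ESCAPES))


if __name__ == "__main__":
    sample = "\u201cquoted"
    try:
        direct_ascii(sample)
    except UnicodeEncodeError as err:
        print("direct conversion failed:", err.reason)
    print(escape_for_roff(sample))
```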
So the Unicode character problem can be solved in two steps:
First, utf8trans, a program included in docbook2X, maps Unicode characters to markup-level escapes or transliterations.
Because there is not necessarily a fixed, official mapping of Unicode characters, utf8trans, unlike most character set converters, can read user-modifiable character mappings from text files and apply them.
charmaps/man/roff.charmap and charmaps/man/texi.charmap are
character maps that may be used for man-page and Texinfo
conversion. The programs db2x_manxml and db2x_texixml will apply these
character maps, or another character map specified by the user,
automatically.
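To make the idea of a text-file character map concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical file layout (one mapping per line: a hexadecimal code point, whitespace, then the replacement text, with `#` starting a comment line); the actual format read by utf8trans and the contents of the shipped character maps may differ. The names `parse_charmap`, `apply_charmap`, and `SAMPLE` are invented for this example.

```python
# Hypothetical sample in an assumed layout: hex code point, whitespace, replacement.
SAMPLE = r"""
# directional quotes and the em dash as roff escapes
201C \(lq
201D \(rq
2014 \(em
"""


def parse_charmap(text: str) -> dict[int, str]:
    """Build a code point -> replacement table from the assumed layout."""
    mapping: dict[int, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        replacement = parts[1] if len(parts) > 1 else ""
        mapping[int(parts[0], 16)] = replacement
    return mapping


def apply_charmap(text: str, mapping: dict[int, str]) -> str:
    """Substitute every mapped character; unmapped characters pass through."""
    return text.translate(mapping)


if __name__ == "__main__":
    table = parse_charmap(SAMPLE)
    print(apply_charmap("\u201cone\u2014two\u201d", table))
```

Because the table lives in a text file rather than in code, a user could add or override an entry without touching the converter, which is the flexibility the text attributes to utf8trans.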
Second, the rest of the Unicode text is converted to some other
character set (encoding). For example, a French document with
accented characters (such as é) might be converted to ISO Latin-1.
This step is applied after utf8trans character mapping, using the iconv encoding conversion tool. Both db2x_manxml and db2x_texixml can call iconv automatically when producing their output.
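The ordering of the two steps can be sketched as follows. This is an assumption-laden illustration: it uses Python's built-in codecs in place of iconv, and the `CHARMAP` table and `map_then_encode` function are hypothetical stand-ins for utf8trans and its character maps.

```python
# Hypothetical mapping standing in for a utf8trans character map.
CHARMAP = {
    0x201C: r"\(lq",  # left directional quote
    0x201D: r"\(rq",  # right directional quote
}


def map_then_encode(text: str, charmap: dict[int, str],
                    encoding: str = "iso-8859-1") -> bytes:
    """Step 1: character mapping. Step 2: encoding conversion."""
    mapped = text.translate(charmap)  # escapes replace unencodable characters
    return mapped.encode(encoding)    # remaining text must fit the target set


if __name__ == "__main__":
    # The quotes become escapes; the accented letter survives as a Latin-1 byte.
    print(map_then_encode("\u201c\u00e9t\u00e9\u201d", CHARMAP))
```

Running the steps in the opposite order would fail in this sketch, because the directional quotes have no ISO Latin-1 representation until they have been replaced by escapes.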