5 Charset considerations

XML uses Unicode as its character set, so most XML tools use the UTF-8 encoding, which can represent every Unicode character. The non-XML world, on the other hand, uses various other charsets[1], and neither man nor Texinfo supports UTF-8 very well. db2x_manxml and db2x_texixml therefore have to transcode their output.
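
As a rough illustration of what transcoding means in general, the Python sketch below decodes UTF-8 input and re-encodes it in a legacy charset. It is not taken from db2x_manxml or db2x_texixml: the function name `transcode`, the choice of ISO-8859-1 as the target, and the policy of replacing unrepresentable characters with `?` are all assumptions made for the example, and the real tools may handle such characters differently.

```python
# Illustrative sketch only; not the implementation used by db2x_manxml
# or db2x_texixml. The name `transcode` and the '?' fallback are assumptions.

def transcode(text: str, charset: str = "iso-8859-1") -> bytes:
    """Encode Unicode text in a legacy charset.

    Characters the target charset cannot represent are replaced with '?'
    (Python's built-in "replace" error handler).
    """
    return text.encode(charset, errors="replace")


# UTF-8 bytes as they might arrive from an XML tool.
utf8_input = "caf\u00e9 \u20ac5".encode("utf-8")

# Step 1: decode the UTF-8 bytes back into Unicode characters.
text = utf8_input.decode("utf-8")

# Step 2: re-encode in the target charset. The e-acute exists in
# ISO-8859-1 and survives; the euro sign does not and becomes '?'.
legacy_output = transcode(text)
print(legacy_output)
```

The euro sign in the sample shows why transcoding is lossy unless something more deliberate is done with characters outside the target charset.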

`Transcoding' can be separated into three components:


Footnotes

[1] `charset' is used very loosely here to mean any set of byte sequences used to represent characters. Other specifications typically do not distinguish between encoding and character set as finely as the Unicode and XML standards do. In the text above, the term refers specifically to non-Unicode charsets.