Performance analysis

The performance of docbook2X, and of most other DocBook tools[2], can be summed up in a short phrase: they are slow.

When producing only a few man pages at a time on a modern computer, with the right software (namely, libxslt as the XSLT processor), the DocBook tools are fast enough. Their slowness becomes a hindrance, however, when generating hundreds or even thousands of man pages at a time.

The author of docbook2X encounters this problem whenever he runs automated tests of the docbook2X package. Presented below are some actual benchmarks and possible approaches to efficient DocBook-to-man-page conversion.

Table 1. docbook2X running times on 2157 refentry documents

Step                      Time for all pages   Avg. time per page
DocBook to Man-XML                  519.61 s               0.24 s
Man-XML to man-pages                383.04 s               0.18 s
roff character mapping                6.72 s             0.0031 s
Total                               909.37 s               0.42 s
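
The per-page averages in Table 1 follow directly from the totals, and the same arithmetic shows how the time is divided between the stages. The short Python sketch below redoes that calculation. The names PAGES, TOTALS and summarize are illustrative only and are not part of docbook2X. The figures are copied from the table.

```python
# Totals copied from Table 1 (seconds, for 2157 refentry documents).
PAGES = 2157
TOTALS = {
    "DocBook to Man-XML": 519.61,
    "Man-XML to man-pages": 383.04,
    "roff character mapping": 6.72,
}


def summarize(totals, pages):
    """Return (grand_total, rows); each row is
    (step, seconds per page, percentage share of the grand total)."""
    grand_total = sum(totals.values())
    rows = []
    for step, secs in totals.items():
        rows.append((step, secs / pages, 100.0 * secs / grand_total))
    return grand_total, rows


if __name__ == "__main__":
    total, rows = summarize(TOTALS, PAGES)
    for step, per_page, share in rows:
        print(f"{step:24s} {per_page:8.4f} s/page {share:5.1f} %")
    print(f"{'Total':24s} {total / PAGES:8.4f} s/page")
```

By this calculation the two conversion stages account for roughly 57% and 42% of the total running time, and character mapping for less than 1%.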

The above benchmark was run on 2157 documents produced by the doclifter man-page-to-DocBook conversion tool. The man pages are the section 1 man pages installed on the author's Linux system. The XML files total 44.484 MiB and average 20.6 KiB each.

The results were obtained using the test script in test/mass/, with the default man-page conversion options. The test script employs the obvious optimizations, such as loading the XSLT processor, the man-pages stylesheet, db2x_manxml and utf8trans only once.
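
The "load once" optimization mentioned above can be illustrated with a minimal Python sketch. This is not the code in test/mass/, and it does not call utf8trans. It is a stand-in for the character mapping stage in which a small, hypothetical table (CHAR_MAP) is compiled once and then reused for every page. The class and function names are likewise assumptions made for the example.

```python
# Illustrative stand-in for the "load once, convert many" pattern.
# CHAR_MAP is a hypothetical three-entry table, not utf8trans's real one.
CHAR_MAP = {
    "\u2014": r"\(em",  # em dash      -> roff \(em
    "\u00a9": r"\(co",  # copyright    -> roff \(co
    "\u2022": r"\(bu",  # bullet       -> roff \(bu
}


class RoffCharMapper:
    """Builds the translation table once and reuses it for every page."""

    def __init__(self, mapping):
        # The setup cost is paid here, a single time.
        self.table = str.maketrans(mapping)

    def convert(self, text):
        # The per-page work only applies the prebuilt table.
        return text.translate(self.table)


def convert_all(pages):
    """Convert every page with one shared mapper instance."""
    mapper = RoffCharMapper(CHAR_MAP)
    return [mapper.convert(page) for page in pages]
```

Calling convert_all on a list of page texts builds the table a single time, however many pages are passed in. A long-running converter process that keeps the XSLT processor and stylesheet in memory would apply the same idea to the more expensive stages.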

Unfortunately, there do not seem to be any obvious ways to improve the performance, short of re-implementing the transformation program in a tight programming language such as C.

Some notes on possible bottlenecks:

[2] With the notable exception of the docbook-to-man tool, which is based on the instant stream processor (although this tool has many correctness problems).

[3] From preliminary estimates, the pure-XSLT solution takes only slightly longer at this stage: 0.22 s per page.

[4] Of course, conceptually, DocBook processing is more complicated. So these timings also give us an estimate of the cost of DocBook’s complexity: twice the cost over a simpler document type, which is actually not too bad.