[xsl] An observation on the performance of fn:transform

Subject: [xsl] An observation on the performance of fn:transform
From: "Norman Tovey-Walsh ndw@xxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 3 Jul 2020 08:44:49 -0000
Hello world,

This isn't a complaint, or explicitly a request for advice (though I'm
always happy for helpful suggestions), just an observation. The workflow
for processing DocBook documents is roughly this pipeline:

1. Fix up the logical structure of the document (expand entities and
   replace entityref attributes with the corresponding fileref
   attributes).
2. Perform XInclude
3. Convert DocBook 4.x markup to 5.x markup if the source document
   appears to be DocBook 4.x (i.e., if its root element is in no
   namespace)
4. Perform transclusion[1]
5. Profile
6. Resolve annotations
7. Resolve XLinks (including external link bases)

These are all relatively small stylesheets and they're currently run
with fn:transform. (This will, as I've said before, all be driven by
XProc in the medium term, but I have short term requirements.)

The last two or three steps are: transform the result of step 7 from
DocBook to HTML and then do a little cleanup on that output and, if
"chunking" has been requested, break it into chunks.

Doing a little post-conversion cleanup improves the output and greatly
simplifies the chunking tasks.

Because I'm old school, and because I initially had an "I can't do this
as a pipeline because I don't have XProc" mindset, I wrote up the
conversion to HTML, the cleanup, and the chunking as modes in the same
stylesheet.

Then this morning I thought, hang on, I could use fn:transform for those
steps too and get all the benefits of pipelines there (easier to
maintain, separately testable, etc.)

So I coded that up. I now have an *eight* stage pipeline where the last
stage does the transformation to HTML, cleanup of that HTML, and
possible chunking. It's all still in one stylesheet with modes because I
haven't teased it apart yet; it's just being run with fn:transform
instead of with a mode in the same stylesheet.

The performance difference is interesting.

Running 1,426 tests through the 8 stage pipeline: 4m19s.
Running 1,542 tests through the original 7 stages: 50s.

There are fewer tests in the former case because some of my XSpec tests
just can't work against the new driver; I'll have to run two sets of
tests, which is kind of a drag, but I should be running separate tests
for all the stages anyway so I guess that's just the way it is.

The performance difference is presumably because it takes ~0.15s to
compile the main stylesheet each time. Which is, you know, pretty damned
fast, but adds up if you're going to do it thousands of times in a row.
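
As a rough check on that estimate, the two timings above can be turned
into a per-test cost. This is only a back-of-the-envelope sketch: it
assumes the two test sets are comparable even though their sizes differ,
and that the extra time is spread evenly across tests. The function and
variable names are made up for illustration.

```python
# Timings and test counts are the ones quoted in the message above.
# Assumption: the two test sets are similar enough to compare per test.

def seconds_per_test(total_seconds: float, tests: int) -> float:
    """Average wall-clock seconds spent on one test."""
    return total_seconds / tests

# 8-stage pipeline: final stage run with fn:transform (4m19s, 1,426 tests)
eight_stage = seconds_per_test(4 * 60 + 19, 1426)

# Original 7 stages: final stage run as modes (50s, 1,542 tests)
seven_stage = seconds_per_test(50, 1542)

# Extra cost per test when the final stage goes through fn:transform
overhead = eight_stage - seven_stage

print(f"8-stage: {eight_stage:.3f}s per test")
print(f"7-stage: {seven_stage:.3f}s per test")
print(f"extra:   {overhead:.3f}s per test")
```

That works out to roughly 0.18s per test against roughly 0.03s, so
about 0.15s extra per test, which is in line with the compile time
guessed at above.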

I don't expect this to be an issue in real world use cases for the
stylesheets, but I thought it was interesting. I'm not surprised, but it
wasn't a consequence that had occurred to me before I started.

                                        Be seeing you,
                                          norm

[1] https://docbook.org/docs/transclusion/transclusion.html

--
Norman Tovey-Walsh <ndw@xxxxxxxxxx>
https://nwalsh.com/

> I think it's much more interesting to live not knowing than have
> answers which might be wrong.--Richard Feynman
