[xsl] performance of very very bad wtf-quality xslt

Subject: [xsl] performance of very very bad wtf-quality xslt
From: bryan rasmussen <rasmussen.bryan@xxxxxxxxx>
Date: Fri, 30 Sep 2005 11:50:30 +0200
I have access to an xslt that does an xml-to-xml transformation. I did
not write this xslt. It is the second worse xslt I have ever seen. The
worst xslt I have ever seen was written by the same company and had a
strategy of the following - match the root node, run through 6 if
statements to check what namespace the document element is in, output
one of two document elements dependent on these input namespaces,
within the if statements do 25 for-each statements equivalent to the
25 allowed children of the document element, on each element matched
by a for-each statement do something like 7 for-each statements for
the 7 allowed attributes, in a couple of these attributes that you
want to compare with another attribute save the value of the attribute
in a variable, go up to the parent do another 7 for-each statements
through the attributes to find the one you want to match, do your
match, after which we are in the elements, match elements by for-each
statements, rinse, wash repeat, it is for-each statements all the way
down!


now I have not tested this absolute worst quality xslt on anything,
according to the people who wrote it they performed tests and it
worked fine, but my life is too short and it would probably be shorter
if I had to deal too much with that.

The problem is that my employers decided to put it out on a web site
as a wonderful example of translating between these formats. Well now
the second worst xslt ever has come into my possession. It is somewhat
better. I doubt that there is any nesting of for-each any lower than
six levels. and there are some templates that have been defined. This
one is to do the transformation back from format 2 to format 1.

here is an example fragment of this xslt, somewhat fixed to keep the
knowledge as to formats and purposes out of the discussion (it might
be politically inexpedient to discuss this with full knowledge on the
net):



<xsl:template match="/n1:doc">
		<doc>
			<xsl:attribute name="xsi:schemaLocation">http://blah.com
http://blah.com/blah.xsd</xsl:attribute>
			<xsl:variable name="doc" select="."/>
			<xsl:for-each select="n1:ID">
				<n2:ID>
					<xsl:value-of select="."/>
				</n2:ID>
			</xsl:for-each>
			<xsl:for-each select="n1:Date">
				<n2:Date>
					<xsl:value-of select="."/>
				</n2:IssueDate>
			</xsl:for-each>
			<n2:Code>
				<xsl:value-of select="$CodeValue"/>
			</n2:Code>
			<xsl:variable name="itexists">
				<xsl:variable name="results">
					<xsl:for-each select="n1:Currency">
						<xsl:value-of select="true()"/>
					</xsl:for-each>
				</xsl:variable>

there is not as much for-each going on, but trust me these variables
they've defined. each one is about 7 lines of code too long.

Anyway, they say they've tested as well.

That's great. One problem is we don't have a good test set of
documents, some possible documents could be very big, I am worried
about performance problems applying to the following areas:

1. for-each as a whole
2. nested for-each
3. for-each over attributes, for example if one just wants to
transform a commonly occuring set of attributes I can't see how the
AST that comes out of an xslt could be optimized for doing that with
for-eaches, whereas it would be relatively easy with transforms.
4. performance hits on various processors for if statements and
choose-when statements.
5.Any other performance parameters that you can think of in this
context (one obvious one is that these stylesheets are approximately
8-10 times larger than they need to be.)

Thanks
Bryan Rasmussen

Current Thread