RE: [xsl] performance of very very bad wtf-quality xslt

Subject: RE: [xsl] performance of very very bad wtf-quality xslt
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 30 Sep 2005 11:09:44 +0100
This is a tricky situation. I'm a great believer in rewriting bad code if
you become responsible for its maintenance. Apart from anything else, it's a
great way of learning about the code. But it can be hard to persuade a
sceptical project manager that it's a useful thing to do. One reason PMs are
sceptical is that no programmer ever likes the code written by another, they
always want to rewrite it (the NIH syndrome); and doing so takes time away
from other activities, and introduces bugs, especially if there's no
regression test suite. 

Sometimes it's best to do the rewrite incrementally without asking for
permission, as part of other changes you are asked to make to the code. In
general, though, doing things surreptitiously is not very professional: you
should convice the PM of your case.

If you can justify the rewrite on performance grounds then by all means do
so: but only if you have concrete evidence that the code is causing
performance problems. It's not enough to know that you could improve the
performance, you also have to know that the performance needs to be
improved. From what you've shown us, however, the big problem with this code
is that it's a maintenance nightmare, not that it performs badly, and you
should try to justify the rewrite on those grounds.

Michael Kay 

> -----Original Message-----
> From: bryan rasmussen [mailto:rasmussen.bryan@xxxxxxxxx] 
> Sent: 30 September 2005 10:51
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] performance of very very bad wtf-quality xslt
> I have access to an xslt that does an xml-to-xml transformation. I did
> not write this xslt. It is the second worse xslt I have ever seen. The
> worst xslt I have ever seen was written by the same company and had a
> strategy of the following - match the root node, run through 6 if
> statements to check what namespace the document element is in, output
> one of two document elements dependent on these input namespaces,
> within the if statements do 25 for-each statements equivalent to the
> 25 allowed children of the document element, on each element matched
> by a for-each statement do something like 7 for-each statements for
> the 7 allowed attributes, in a couple of these attributes that you
> want to compare with another attribute save the value of the attribute
> in a variable, go up to the parent do another 7 for-each statements
> through the attributes to find the one you want to match, do your
> match, after which we are in the elements, match elements by for-each
> statements, rinse, wash repeat, it is for-each statements all the way
> down!
> now I have not tested this absolute worst quality xslt on anything,
> according to the people who wrote it they performed tests and it
> worked fine, but my life is too short and it would probably be shorter
> if I had to deal too much with that.
> The problem is that my employers decided to put it out on a web site
> as a wonderful example of translating between these formats. Well now
> the second worst xslt ever has come into my possession. It is somewhat
> better. I doubt that there is any nesting of for-each any lower than
> six levels. and there are some templates that have been defined. This
> one is to do the transformation back from format 2 to format 1.
> here is an example fragment of this xslt, somewhat fixed to keep the
> knowledge as to formats and purposes out of the discussion (it might
> be politically inexpedient to discuss this with full knowledge on the
> net):
> <xsl:template match="/n1:doc">
> 		<doc>
> 			<xsl:attribute 
> name="xsi:schemaLocation">
> 			<xsl:variable name="doc" select="."/>
> 			<xsl:for-each select="n1:ID">
> 				<n2:ID>
> 					<xsl:value-of select="."/>
> 				</n2:ID>
> 			</xsl:for-each>
> 			<xsl:for-each select="n1:Date">
> 				<n2:Date>
> 					<xsl:value-of select="."/>
> 				</n2:IssueDate>
> 			</xsl:for-each>
> 			<n2:Code>
> 				<xsl:value-of select="$CodeValue"/>
> 			</n2:Code>
> 			<xsl:variable name="itexists">
> 				<xsl:variable name="results">
> 					<xsl:for-each 
> select="n1:Currency">
> 						<xsl:value-of 
> select="true()"/>
> 					</xsl:for-each>
> 				</xsl:variable>
> there is not as much for-each going on, but trust me these variables
> they've defined. each one is about 7 lines of code too long.
> Anyway, they say they've tested as well.
> That's great. One problem is we don't have a good test set of
> documents, some possible documents could be very big, I am worried
> about performance problems applying to the following areas:
> 1. for-each as a whole
> 2. nested for-each
> 3. for-each over attributes, for example if one just wants to
> transform a commonly occuring set of attributes I can't see how the
> AST that comes out of an xslt could be optimized for doing that with
> for-eaches, whereas it would be relatively easy with transforms.
> 4. performance hits on various processors for if statements and
> choose-when statements.
> 5.Any other performance parameters that you can think of in this
> context (one obvious one is that these stylesheets are approximately
> 8-10 times larger than they need to be.)
> Thanks
> Bryan Rasmussen

Current Thread