RE: [xsl] <br/> to <p> and optimization

Subject: RE: [xsl] <br/> to <p> and optimization
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Wed, 2 Jul 2003 21:41:40 +0100
Yes, your analysis of this is correct. It's the kind of expression that
some optimizers are likely to handle much better than others, so it may
be worth trying a couple of different XSLT processors; but in general
it's quite likely to have O(n^2) performance with respect to the number
of sibling nodes.

The other approach to this problem is the recursive walk through the
siblings. This is more likely to have linear performance, but with a
large number of siblings it can overflow the stack. The more recent
releases of Saxon use tail call optimization on call-template and
apply-templates which can eliminate this problem. (Some other processors
use it too, but not all.)
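
A minimal XSLT 1.0 sketch of such a sibling walk is given here, under some
assumptions that are not in the original post: the <br/> elements are
children of a container called "div" (a hypothetical name), and the
template name "group" and the mode "walk" are likewise made up for
illustration. Each step moves forward by one sibling with
following-sibling::node()[1], so nodes before the previous <br/> are not
tested again. Whether the recursion really runs in constant stack space,
and whether following-sibling::br[1] stops at the first match, depends on
the processor.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Drop whitespace-only text nodes between the container's children,
       so that indentation does not produce paragraphs of its own. -->
  <xsl:strip-space elements="div"/>

  <!-- "div" is a hypothetical container name: substitute the real
       parent of the <br/> elements. -->
  <xsl:template match="div">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:call-template name="group">
        <xsl:with-param name="first" select="node()[1]"/>
      </xsl:call-template>
    </xsl:copy>
  </xsl:template>

  <!-- Emit one <p> for the run of siblings starting at $first, then
       recurse on the node that follows the next <br/>. -->
  <xsl:template name="group">
    <xsl:param name="first" select="/.."/>
    <xsl:if test="$first">
      <!-- A run that starts with a <br/> is empty: emit nothing. -->
      <xsl:if test="not($first/self::br)">
        <p>
          <xsl:apply-templates select="$first" mode="walk"/>
        </p>
      </xsl:if>
      <!-- The <br/> that ends this run: $first itself, or the first
           <br/> after it. Empty if the run reaches the last sibling. -->
      <xsl:variable name="br"
          select="($first/self::br | $first/following-sibling::br)[1]"/>
      <!-- The recursive call is the last instruction, so a processor
           with tail call optimization can avoid growing the stack. -->
      <xsl:call-template name="group">
        <xsl:with-param name="first"
            select="$br/following-sibling::node()[1]"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <!-- Copy the current node, then step to the next sibling unless it
       is a <br/>, which ends the run. -->
  <xsl:template match="node()" mode="walk">
    <xsl:copy-of select="."/>
    <xsl:apply-templates
        select="following-sibling::node()[1][not(self::br)]" mode="walk"/>
  </xsl:template>

</xsl:stylesheet>
```

With the children a, <br/>, b, c, <br/>, d this should give three
paragraphs containing a, then b and c, then d. Unlike some other
formulations, consecutive <br/> elements produce no empty <p> here; that
is a choice made in this sketch, not a requirement of the technique.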

If you need to do this repeatedly it may be worth looking at non-XSLT
solutions. This kind of problem is often fairly easy to tackle with a
SAX filter, because it's purely linear and doesn't require much analysis
of the context.  You could also look at the new STX tools for doing
serial transformations.

Alternatively you could try an XSLT 2.0 solution. Essentially the code
is very simple:

<xsl:for-each-group select="*" group-starting-at="br">
  <p>
  <xsl:copy-of select="current-group()[not(self::br)]"/>
  </p>
</xsl:for-each-group>

But I don't know how it will perform in Saxon - the grouping facilities
haven't received very much attention from a performance perspective yet.
If you get any data on this, I would love to know.

Michael Kay

> 
> Hello,
> 
> On the archives of this list I have found a solution to the 
> problem of putting all elements between two <br/> elements 
> into a <p> element: 
> http://www.biglist.com/lists/xsl-list/archives/200101/msg00865.html

However, this process takes a very long time (up to two minutes) for
"big" files (over 100k) that contain many <br/> elements, and I am
looking for a way to optimize it.

In fact my problem is I'm not sure I correctly understand the following
line:
	<xsl:variable name="content"
		select="preceding-sibling::node()
			[not($br-before) or
 			generate-id(preceding-sibling::br[1]) =
			generate-id($br-before)]" />

	$br-before is the preceding <br/>:
		<xsl:variable name="br-before"
			select="preceding-sibling::br[1]" />

So, when setting $content, do we test _all_ nodes before the current
<br/>, and for each of them check that it is not itself the preceding
<br/> (not($br-before)) and that it actually comes after the same <br/>
as the one located by $br-before?

In that case we obviously test the same nodes many times: for every new
<br/>, we want only the nodes that are before the current <br/> and after
the preceding one, but we again test all the nodes before that preceding
<br/>, back to the start of the containing element. So what we need is a
way to "stop" the selection once the node being tested is in fact
$br-before?

Is this correct?

Regards,
Emmanuel Bégué



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

