Re: [xsl] Matching on keys

Subject: Re: [xsl] Matching on keys
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 6 Jan 2017 21:10:16 -0000
> Question: I can define a key, as in
>
> <xsl:key name="elements-by-class" match="*[matches(@class,'\S')]"
> use="tokenize(@class,'\s+')"/>
>
> then
>
> <xsl:template match="key('elements-by-class','foo')">
> ...
> </xsl:template>
>

A further thought on this.

Often, as a rule of thumb, with document-oriented XML, a transformation
applies-templates to each node in the source document approximately once.

With Saxon, if you apply-templates to an element named E, the maximum number
of rules matched will be the number of rules that can only match elements
named E, plus the number of rules that can match any element regardless of its
name. The expected number of matching attempts will be half that, unless
you've been very smart about ordering the rules.

So if you've got a stylesheet with 50 rules of the form match="*[some
predicate]", and you're matching 10K element nodes, then you're going to be
performing about 250K predicate evaluations.

Without John Lumley's indexation help (which, as I say, is not really
productized at present), the only effective ways of reducing the number of
matching attempts are:

* split the rules into multiple modes

* think about ordering the rules

* make as many rules as possible match a specific element name

* have fewer rules

In pathological cases you could consider a manually constructed decision
tree:

<xsl:template match="*[starts-with(@class, 'a')]">
  <xsl:apply-templates select="." mode="class-a"/>
</xsl:template>

<xsl:template match="*[@class='aardvark']" mode="class-a">
  ...
</xsl:template>

<xsl:template match="*[@class='abingdon']" mode="class-a">
  ...
</xsl:template>

etc...

<xsl:template match="*[starts-with(@class, 'b')]">
  <xsl:apply-templates select="." mode="class-b"/>
</xsl:template>

<xsl:template match="*[@class='banana']" mode="class-b">
  ...
</xsl:template>

The other approach is to not worry about the number of rule matches, but to
concentrate on making the matches faster. This is what using keys can achieve.
But the keys need to be carefully chosen. If you do

<xsl:template match="key('k', 'foo')"/>

and there are 10K elements with the key value 'foo', then the test for whether
an element matches this rule consists of seeing whether the element is present
in a set of 10K elements, and in Saxon that's a serial search of 10K entries
(it could be done better, but it isn't - indexes are not optimized for this
use case). So the cost of the match probably is going to take longer than
doing

<xsl:template match="*[@class = 'foo']"/>

Michael Kay
Saxonica

Current Thread