Re: [xsl] Matching on keys

Subject: Re: [xsl] Matching on keys
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 6 Jan 2017 21:29:19 -0000
Hi Mike Kay, Graydon, and XSL-List,

All your remarks are very interesting and helpful.

In the main case I am looking at, I can hard-wire the order of the
rules by setting @priority ... which I suppose will help that problem.
(Except: then I use next-match to set up a cascade, ha! ... so there
goes that I suppose.)

However, I also have the advantage that the documents are not large,
typically 100-200KB of XHTML at a time. (Which is part of the reason I
was afraid I wasn't really seeing a performance hit.)

Thanks --
Wendell




On Fri, Jan 6, 2017 at 4:10 PM, Michael Kay mike@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>>
>> Question: I can define a key, as in
>>
>> <xsl:key name="elements-by-class" match="*[matches(@class,'\S')]"
>> use="tokenize(@class,'\s+')"/>
>>
>> then
>>
>> <xsl:template match="key('elements-by-class','foo')">
>> ...
>> </xsl:template>
>>
>
> A further thought on this.
>
> Often, as a rule of thumb, with document-oriented XML, a transformation
applies-templates to each node in the source document approximately once.
>
> With Saxon, if you apply-templates to an element named E, the maximum number
of rules matched will be the number of rules that can only match elements
named E, plus the number of rules that can match any element regardless of its
name. The expected number of matching attempts will be half that, unless
you've been very smart about ordering the rules.
>
> So if you've got a stylesheet with 50 rules of the form match="*[some
predicate]", and you're matching 10K element nodes, then you're going to be
performing about 250K predicate evaluations.
>
> Without John Lumley's indexation help (which, as I say, is not really
productized at present), the only effective ways of reducing the number of
matching attempts are:
>
> * split the rules into multiple modes
>
> * think about ordering the rules
>
> * make as many rules as possible match a specific element name
>
> * have fewer rules
>
> In pathological cases you could consider a manually constructed decision
tree:
>
> <xsl:template match="*[starts-with(@class, 'a')]">
>   <xsl:apply-templates select="." mode="class-a"/>
> </xsl:template>
>
> <xsl:template match="*[@class='aardvark']" mode="class-a">
>   ...
> </xsl:template>
>
> <xsl:template match="*[@class='abingdon']" mode="class-a">
>   ...
> </xsl:template>
>
> etc...
>
> <xsl:template match="*[starts-with(@class, 'b')]">
>   <xsl:apply-templates select="." mode="class-b"/>
> </xsl:template>
>
> <xsl:template match="*[@class='banana']" mode="class-b">
>   ...
> </xsl:template>
>
> The other approach is to not worry about the number of rule matches, but to
concentrate on making the matches faster. This is what using keys can achieve.
But the keys need to be carefully chosen. If you do
>
> <xsl:template match="key('k', 'foo')"/>
>
> and there are 10K elements with the key value 'foo', then the test for
whether an element matches this rule consists of seeing whether the element is
present in a set of 10K elements, and in Saxon that's a serial search of 10K
entries (it could be done better, but it isn't - indexes are not optimized for
this use case). So the cost of the match probably is going to take longer than
doing
>
> <xsl:template match="*[@class = 'foo']"/>
>
> Michael Kay
> Saxonica
>



--
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^

Current Thread