Re: [xsl] Performance of predicate-based patterns

From: "Graydon graydon@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 23 Jan 2015 14:19:30 -0000

On Fri, Jan 23, 2015 at 11:28:31AM -0000, Michael Kay mike@xxxxxxxxxxxx
scripsit:
> We've started doing some performance work in Saxon on the DITA
> stylesheets, which use large numbers of match patterns in the form
> 
> <xsl:template match="*[contains(@class, ' token ')]">

If anybody ever starts using XSLT 2.0 for DITA processing, there are
going to be things like

<xsl:template match="*[(tokenize(@class,'\p{Zs}+')[normalize-space()])[2] eq 'topic/li']">

showing up.  ("some $x in tokenize(@class,...."  seems pretty likely,
too.)
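
A minimal sketch of how that quantified form might look when written
out in full, assuming the same '\p{Zs}+' separator as the positional
match above.  Unlike the positional version, it matches the token
wherever it appears in @class.  The <item> output element is
hypothetical and only there to make the template complete.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Matches any element whose @class holds the token 'topic/li'
       at any position.  An element with no @class never matches,
       since tokenize() of an empty sequence returns an empty sequence. -->
  <xsl:template
      match="*[some $x in tokenize(@class, '\p{Zs}+')
               satisfies $x eq 'topic/li']">
    <!-- hypothetical output element, for illustration only -->
    <item>
      <xsl:apply-templates/>
    </item>
  </xsl:template>

</xsl:stylesheet>
```

The general comparison tokenize(@class, '\p{Zs}+') = 'topic/li' should
select the same elements with less typing.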

> Currently these require a very inefficient sequential search to find
> the matching rule for each element.
>
> Does anyone know of any other commonly-used stylesheets (or even,
> uncommonly used ones) which show similar characteristics, that is,
> large numbers of match patterns using predicate matching only, with no
> explicit element names? We'd like any optimizations we implement to be
> as general-purpose as possible.

I've done some conversion work on legal documents where the goal was to
get everything back on a single schema after a couple decades of
evolution in the element names of various DTDs.  Matches of the form

<xsl:template match="*[name() = ('P','NP','PARA')]">

showed up a fair bit to match on the abstract "that's a paragraph"
across the range of evolved element names.
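
Roughly what one of those looked like as a complete template; this is a
sketch, and the <para> target element is a stand-in rather than the
name from the actual schema.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- General comparison against a sequence: true when name() equals
       any one of the three strings. -->
  <xsl:template match="*[name() = ('P','NP','PARA')]">
    <!-- hypothetical unified element name, for illustration only -->
    <para>
      <xsl:apply-templates/>
    </para>
  </xsl:template>

</xsl:stylesheet>
```

For elements in no namespace this selects the same nodes as
match="P | NP | PARA", though the default priorities differ (0.5 for
the predicate form, 0 for the plain names).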

There was also a fair bit of 

<xsl:template match="*[not(name() = ('PARA','LIST','TABLE'))]">

used as general "we don't think there's anything but those in the data
but let's not make rash assumptions" surprise handler templates.
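
A sketch of what such a surprise handler might do.  The body here
(report the element name, then copy it through) is an assumption for
illustration, not what the original templates necessarily did.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Fires for any element whose name is not one of the expected three. -->
  <xsl:template match="*[not(name() = ('PARA','LIST','TABLE'))]">
    <!-- assumed behaviour: flag the surprise on the message stream -->
    <xsl:message>
      <xsl:text>Unexpected element: </xsl:text>
      <xsl:value-of select="name()"/>
    </xsl:message>
    <!-- assumed behaviour: keep the element and carry on with its content -->
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```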

-- Graydon
