Re: [xsl] Performance of predicate-based patterns

From: "Eliot Kimber ekimber@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 23 Jan 2015 19:52:46 -0000
I don't think anyone at all familiar with normal DITA XSLT practice would
use anything other than [contains(@class, ' foo/bar ')] or the DITA
Community df:class() function:

<xsl:function name="df:class" as="xs:boolean">
  <xsl:param name="elem" as="element()"/>
  <xsl:param name="classSpec" as="xs:string"/>

  <!-- '$' in the regex is a workaround for a bug in MarkLogic 3.x
       and for a common user error, where the trailing space in the
       class= attribute is dropped.
    -->
  <xsl:variable name="normalizedClassSpec" as="xs:string"
                select="normalize-space($classSpec)"/>
  <xsl:variable name="result" as="xs:boolean"
                select="matches($elem/@class,
                                concat(' ', $normalizedClassSpec, ' | ',
                                       $normalizedClassSpec, '$'))"/>

  <xsl:sequence select="$result"/>
</xsl:function>

The df:class() function also handles the case where the @class value is
missing its required trailing space (a problem that MarkLogic used to
cause, but that was fixed in ML 4, I think).
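To make the difference concrete, here is a small Python model of the two tests (a sketch only; the function names are hypothetical and Python's `re` stands in for XPath `matches()`). The plain `contains()` test needs a space on both sides of the token, while the df:class() regex `' spec | spec$'` also accepts the token at the very end of the value.

```python
import re

def normalize_space(s: str) -> str:
    # Approximates XPath normalize-space(): trim and collapse whitespace runs.
    # Python's str.split() treats a few more characters as whitespace than XPath does.
    return " ".join(s.split())

def contains_class(class_attr: str, class_spec: str) -> bool:
    # Models *[contains(@class, ' foo/bar ')]: a plain substring test that
    # needs a space on BOTH sides of the token.
    return f" {class_spec} " in class_attr

def df_class(class_attr: str, class_spec: str) -> bool:
    # Models the df:class() regex ' spec | spec$': the token followed by a
    # space, or the token at the very end of the value (no trailing space).
    # As in the XSLT original, the spec is not regex-escaped.
    spec = normalize_space(class_spec)
    return re.search(f" {spec} | {spec}$", class_attr) is not None

# A well-formed DITA @class value, and the same value with its trailing space lost.
well_formed = "- topic/li "
truncated = "- topic/li"

print(contains_class(well_formed, "topic/li"), contains_class(truncated, "topic/li"))  # True False
print(df_class(well_formed, "topic/li"), df_class(truncated, "topic/li"))              # True True
```

Because the regex still requires a leading space and either a trailing space or end-of-string, a longer token such as `topic/lines` does not match `topic/li`.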


If there's a more efficient way to match values in the @class attribute,
I'd certainly like to know about it.
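One conceivable direction, sketched in Python under loud assumptions (this is not a description of what Saxon does, and the rule list and function names are hypothetical): instead of testing every `contains(@class, ' token ')` rule in turn, tokenize @class once and look each token up in a table keyed by class token.

```python
from typing import Optional

# Hypothetical rule list: (class token, template name), assumed already sorted
# so that the first matching entry is the rule that should fire.
Rule = tuple[str, str]

def find_rule_sequential(class_attr: str, rules: list[Rule]) -> Optional[str]:
    # One substring test per rule, in order: cost grows with the rule count.
    for token, name in rules:
        if f" {token} " in class_attr:
            return name
    return None

def build_index(rules: list[Rule]) -> dict[str, int]:
    # Map each class token to the position of the first rule that tests it.
    index: dict[str, int] = {}
    for pos, (token, _name) in enumerate(rules):
        index.setdefault(token, pos)
    return index

def find_rule_indexed(class_attr: str, rules: list[Rule],
                      index: dict[str, int]) -> Optional[str]:
    # Tokenize @class once and look each token up: cost grows with the number
    # of tokens in the value, not with the number of rules.
    positions = [index[t] for t in class_attr.split() if t in index]
    return rules[min(positions)][1] if positions else None

rules = [("task/step", "step"), ("topic/li", "li"), ("topic/ul", "ul")]
index = build_index(rules)
for cls in ("- topic/li task/step ", "- topic/li ", "- topic/p "):
    print(find_rule_sequential(cls, rules), find_rule_indexed(cls, rules, index))
```

The two lookups agree only for well-formed @class values and for rules that each test a single token; the indexed version also finds a final token that has lost its trailing space, which the plain `contains()` form would miss.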

Cheers,

E.
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com




On 1/23/15, 8:19 AM, "Graydon graydon@xxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

>On Fri, Jan 23, 2015 at 11:28:31AM -0000, Michael Kay mike@xxxxxxxxxxxx
>scripsit:
>> We've started doing some performance work in Saxon on the DITA
>> stylesheets, which use large numbers of match patterns in the form
>>
>> <xsl:template match="*[contains(@class, ' token ')]">
>
>If anybody ever starts using XSLT 2.0 for DITA processing, there are
>going to be things like
>
><xsl:template match="*[(tokenize(@class,'\p{Zs}+')[normalize-space()])[2] eq 'topic/li']">
>
>showing up.  ("some $x in tokenize(@class,...."  seems pretty likely,
>too.)
>
>> Currently these require a very inefficient sequential search to find
>> the matching rule for each element.
>>
>> Does anyone know of any other commonly-used stylesheets (or even,
>> uncommonly used ones) which show similar characteristics, that is,
>> large numbers of match patterns using predicate matching only, with no
>> explicit element names? We'd like any optimizations we implement to be
>> as general-purpose as possible.
>
>I've done some conversion work on legal documents where the goal was to
>get everything back on a single schema after a couple decades of
>evolution in the element names of various DTDs.  Matches of the form
>
><xsl:template match="*[name() = ('P','NP','PARA')]">
>
>showed up a fair bit to match on the abstract "that's a paragraph"
>across the range of evolved element names.
>
>There was also a fair bit of
>
><xsl:template match="*[not(name() = ('PARA','LIST','TABLE'))]">
>
>used as general "we don't think there's anything but those in the data
>but let's not make rash assumptions" surprise handler templates.
>
>-- Graydon
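The predicate patterns quoted above can be modeled in Python to show what they actually test (a sketch; the helper names are hypothetical). `name() = ('P','NP','PARA')` is an XPath general comparison, which is true if any item on the left equals any item on the right, and the tokenize() pattern checks only the second non-empty token of @class.

```python
import re
from typing import Optional

def general_eq(left: list[str], right: tuple[str, ...]) -> bool:
    # XPath general comparison '=': true if ANY left item equals ANY right item.
    return any(a == b for a in left for b in right)

def is_paragraph(name: str) -> bool:
    # Models *[name() = ('P','NP','PARA')]
    return general_eq([name], ("P", "NP", "PARA"))

def is_surprise(name: str) -> bool:
    # Models *[not(name() = ('PARA','LIST','TABLE'))]
    return not general_eq([name], ("PARA", "LIST", "TABLE"))

def second_class_token(class_attr: str) -> Optional[str]:
    # Models (tokenize(@class,'\p{Zs}+')[normalize-space()])[2].
    # Python's re has no \p{Zs}, so \s+ stands in for it here.
    tokens = [t for t in re.split(r"\s+", class_attr) if t.strip()]
    return tokens[1] if len(tokens) > 1 else None

print(is_paragraph("NP"), is_surprise("PARA"), is_surprise("FOOTNOTE"))  # True False True
print(second_class_token("- topic/li task/step "))                        # topic/li
```

Note that an element named P satisfies both the paragraph pattern and the catch-all, so a stylesheet using the two together presumably relies on explicit priorities (or import precedence) to pick one.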
