RE: [xsl] Counting nodes efficiently

Subject: RE: [xsl] Counting nodes efficiently
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Thu, 19 Feb 2004 02:19:32 -0800 (PST)
> >  Greetings.
> >  
> >  I've been using Jeni's method from the XSLT FAQ to assign 
> >  unique id's to 
> >  nodes. In order to speed things up, can anyone think of a way 
> >  that I could 
> >  store the running totals for the different nodes, rather than 
> >  having to 
> >  call the count() function repeatedly? A generalized method 
> >  would obviously 
> >  be the best, so that it could be applied to any arbitrary set 
> >  of nodes, 
> >  but I don't know if this is even possible.
> >  
> >  <xsl:template match="*">
> >     <xsl:variable name="name" select="name()" />
> >     <xsl:element name="{name()}">
> >       <xsl:attribute name="id">
> >         <xsl:value-of select="concat($name, '-',
> >         count(preceding::*[name()= $name]) +
> >         count(ancestor::*[name()= $name]))" />
> >       </xsl:attribute>
> >       <xsl:apply-templates />
> >     </xsl:element>
> >  </xsl:template>
> 
> 3 ways:
> 
> 1.  Create a node-set by selecting all the elements you wish to count
> and numbering them using position().  You can then query into this
> node-set using the generate-id() function to get the correct number for
> the element you're processing.  This only requies one pass of the data
> so its quite efficient.
> 
> 2.  Write a SAX Filter in java that numbers the elements on their way
> into the transform.  You can then select this number as if it was
> already in the data.
> 
> 3.  If you are using saxon, you can substring the value returned from
> generate-id() after the 'e', as the generated id's take form 'dxxeyy'
> where d is the document number and e is the element number.

As pointed out earlier, just using generate-id() probably solves the
original problem.

I will show here one solution to the problem, because it is interesting,
simple and uses and important xslt design pattern.

In this and similar problems one can re-use the identity transformation,
which processes strictly one node at a time (would a native English
speaker, please, suggest a good name? The best I could arrive at was "one
node at a time identity transformation"):

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform";>
 <xsl:output omit-xml-declaration="yes"/>
 
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="node()[1]"/>
    </xsl:copy>
    <xsl:apply-templates select="following-sibling::node()[1]"/>
  </xsl:template>

</xsl:stylesheet>

This transformation produces the same results as the more well-known
identity rule.

The difference is that we now have the finest possible grain-level of
controll as every xsl:apply-templates instruction above always selects at
most one node.

It is trivial to add parameters and to override the one-at-a-time identity
for elements. Thus we finally have:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform";>
 <xsl:output omit-xml-declaration="yes"/>
 
  <xsl:template match="@* | node()">
    <xsl:param name="pnAncestors" select="0"/>
    <xsl:param name="pnPreceding" select="0"/>
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="node()[1]">
        <xsl:with-param name="pnAncestors" 
                        select="$pnAncestors+1"/>
        <xsl:with-param name="pnPreceding"
                        select="$pnPreceding"/>
      </xsl:apply-templates>
    </xsl:copy>
    <xsl:apply-templates select="following-sibling::node()[1]">
      <xsl:with-param name="pnAncestors" 
                      select="$pnAncestors"/>
      <xsl:with-param name="pnPreceding"
                      select="$pnPreceding+1"/>
          
    </xsl:apply-templates>
  </xsl:template>
  
  <xsl:template match="*">
    <xsl:param name="pnAncestors" select="0"/>
    <xsl:param name="pnPreceding" select="0"/>
    
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:attribute name="_id">
        <xsl:value-of select=
                    "concat(name(), '_',
                            $pnAncestors, '_', 
                            $pnPreceding 
                            )"/>
      </xsl:attribute>
    </xsl:copy>
    <xsl:apply-templates select="node()[1]">
      <xsl:with-param name="pnAncestors" 
                      select="$pnAncestors+1"/>
      <xsl:with-param name="pnPreceding"
                      select="$pnPreceding"/>
    </xsl:apply-templates>
    <xsl:apply-templates select="following-sibling::node()[1]">
      <xsl:with-param name="pnAncestors" 
                      select="$pnAncestors"/>
      <xsl:with-param name="pnPreceding"
                      select="$pnPreceding+1"/>
          
    </xsl:apply-templates>
    
  </xsl:template>

</xsl:stylesheet>

This transformation is not long, it can be written almost mechanically, it
makes exactly one pass over the tree and it has a linear time complexity
(checked with inputs of different size).


Cheers,

Dimitre Novatchev 
FXSL developer,

http://fxsl.sourceforge.net/ -- the home of FXSL
Resume: http://fxsl.sf.net/DNovatchev/Resume/Res.html




__________________________________
Do you Yahoo!?
Yahoo! Mail SpamGuard - Read only the mail you want.
http://antispam.yahoo.com/tools

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


Current Thread