RE: [xsl] Finding unique nodes in a non-sibling nodeset

Subject: RE: [xsl] Finding unique nodes in a non-sibling nodeset
From: "Michael Kay" <michael.h.kay@xxxxxxxxxxxx>
Date: Sun, 30 Jun 2002 20:14:45 +0100
> In a code generation transform that I am working on, I 
> frequently encounter situations where I need to eliminate 
> duplicate expressions or event calls. The nodes with the 
> commonality to be detected are often scattered around 
> different parts of a large (preprocessed) reference document 
> that is loaded with a document call.
> 
> Previously, I had eliminated duplicates with something of the 
> form  $list[not(@key1=preceding-sibling::*/@key1)]
> or
>  $list[not(@key1=preceding::*/@key1)]
> ... If I wanted to look back through the whole document.
> 
> In this situation however, the nodes to be duplicate-trimmed are
> 
> [A] Selected out of the reference document in very specific contextual
>     ways (e.g. deep inside xsl:template / xsl:for-each usages)
> [B] Not all sibling nodes
> [C] The preceding axis can't be used since it looks at the whole
>     preceding area of the document, not just my carefully selected nodes.
> [D] The definition of duplication requires use of multiple node
>     attributes, i.e. needs a composite key.
> 
> Even if [D] were not true, the "preceding-sibling" axis 
> approach would not work because of [B] and the "preceding" 
> axis approach would not work because of [C].

Muenchian grouping should be able to cope with this, provided (a) all
the nodes are in the same document, and (b) you can code the rules for
"carefully selecting" the nodes in a match pattern. You can handle
composite keys using concatenation.
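As a minimal sketch of what that might look like in XSLT 1.0: the element
name "call" and the wrapper element "unique" are hypothetical stand-ins,
with "call" playing the part of whatever match pattern expresses your
selection rules. The '/' separator is assumed never to occur inside the
attribute values themselves.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical match pattern: "call" stands in for the pattern that
       captures the "carefully selected" nodes. The composite key is the
       concatenation of the two attributes. -->
  <xsl:key name="by-composite" match="call"
           use="concat(@key1, '/', @key2)"/>

  <xsl:template match="/">
    <unique>
      <!-- Keep a node only if it is the first node, in document order,
           indexed under its own composite key value. -->
      <xsl:for-each select="//call[generate-id() =
          generate-id(key('by-composite', concat(@key1, '/', @key2))[1])]">
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </unique>
  </xsl:template>

</xsl:stylesheet>
```

Note that key() only searches the document containing the context node, so
for a document loaded with document() you would first need to make a node
of that document the context node (for example with xsl:for-each).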

Where these conditions aren't true, the usual approach is to build a
temporary tree containing copies of the selected nodes. You can then use
Muenchian grouping on this tree, accessing it using the xx:node-set()
extension function.
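A sketch of the temporary-tree approach, assuming the processor supports
the EXSLT exsl:node-set() function (substitute your processor's own
node-set extension if not). The selection expression and the element names
"item" and "unique" are hypothetical.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <!-- The key indexes the copies in the temporary tree, not the originals. -->
  <xsl:key name="by-composite" match="item"
           use="concat(@key1, '/', @key2)"/>

  <xsl:template match="/">
    <!-- Hypothetical selection; in practice this is whatever contextual
         logic produced $list. -->
    <xsl:variable name="list" select="//section//call"/>

    <!-- Result tree fragment holding one "item" copy per selected node. -->
    <xsl:variable name="copies">
      <xsl:for-each select="$list">
        <item key1="{@key1}" key2="{@key2}"/>
      </xsl:for-each>
    </xsl:variable>

    <unique>
      <!-- key() searches the document of the context node, so move the
           context into the temporary tree before applying the
           Muenchian test. -->
      <xsl:for-each select="exsl:node-set($copies)">
        <xsl:copy-of select="item[generate-id() =
            generate-id(key('by-composite', concat(@key1, '/', @key2))[1])]"/>
      </xsl:for-each>
    </unique>
  </xsl:template>

</xsl:stylesheet>
```

Because the copies are all siblings in the temporary tree, the nodes no
longer need to be in one source document or reachable by a single match
pattern.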

> 
> I eventually hit on a way to solve this (since I use Saxon) 
> using saxon:tokenize. But I always wondered if there was a 
> non-extension way to do it.
> 
> What I did was build an aggregate string with delimiters from 
> the nodes in the set in question (in a variable called 
> "$list"), like so ...
> 
>   <xsl:variable name="aggregate">
>     <xsl:for-each select="$list">
>       <xsl:value-of select="concat(@key1,'/',@key2)" />
>       <xsl:if test="not(position()=last())"><xsl:text>#</xsl:text></xsl:if>
>     </xsl:for-each>
>   </xsl:variable>
> 
> Then use tokenize to get a node set ...
> 
>  <xsl:variable name="list4" select="saxon:tokenize($aggregate,'#')"/>
> 
> And eliminate the duplicates the standard (?) way with
> 
>  <xsl:variable name="list4NoDups"
>      select="$list4[not(.=preceding-sibling::*)]"/>

Innovative, but as you say, if you're going to use extensions,
saxon:distinct() does the job more directly.

> There are features in Saxon 7.1 that we are very interested 
> in, so I needed to try to find a different technique.
> 
XPath 2.0 offers a distinct-values() function, but it's not yet
available in Saxon. What you can use, however, is <xsl:for-each-group>.
I think this should solve your problem fairly directly.

<xsl:for-each-group select="$list" group-by="concat(@key1, '/', @key2)">
  ...
</xsl:for-each-group>

This will iterate once for each distinct value of the group-by key, with
the context node being the first node in $list that has that key value.
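For illustration, a complete stylesheet along these lines might look as
follows. This is a sketch written against XSLT 2.0 syntax, which may differ
in detail from the working draft a given Saxon 7.x release implements; the
selection expression and the "unique" wrapper element are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- Hypothetical selection standing in for the real $list. -->
    <xsl:variable name="list" select="//call"/>

    <unique>
      <!-- One iteration per distinct composite key value. -->
      <xsl:for-each-group select="$list"
                          group-by="concat(@key1, '/', @key2)">
        <!-- The context node is the first node in the group, so copying
             "." emits one representative per distinct key. -->
        <xsl:copy-of select="."/>
      </xsl:for-each-group>
    </unique>
  </xsl:template>

</xsl:stylesheet>
```

No key declaration or temporary tree is needed here, since the grouping
works on whatever sequence of nodes $list holds.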

Michael Kay
Software AG
home: Michael.H.Kay@xxxxxxxxxxxx
work: Michael.Kay@xxxxxxxxxxxxxx 


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

