Re: [xsl] Muenchian grouping help - removing 'duplicates' from a nodeset

Subject: Re: [xsl] Muenchian grouping help - removing 'duplicates' from a nodeset
From: "W. Eliot Kimber" <eliot@xxxxxxxxxx>
Date: Thu, 09 Oct 2003 09:45:32 -0500
Laura@xxxxxxx wrote:

I think the way to do this is via Muenchian grouping. I know what I need to
do: group all the <text> elements by their text() content; and select only
the first one in each group. But I've followed the guidelines on Jeni
Tennison's XSLT pages and I can't seem to get my head around how keys
actually work.

The way to do this is with what I call the "union trick". It took me a long time to figure out what was going on, until I realized that my barrier had been not fully understanding that the "|" operator is a set union, not a logical OR. [I was trying to understand the code Jeni Tennison had written to do back-of-the-book index processing for DocBook.]


What you do is take the current node and the first node of the current node's entry in the key table, and then construct a set from them using the union operator ("|"). If the result is a set of size one, then the two nodes must be the same node, because if they were different nodes you'd get a set of size two. The key point is that a node set, by definition, never contains more than one copy of any node.
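
A minimal sketch of that identity test in isolation, assuming an input whose document element has at least one child element. The variable names are made up for illustration and are not part of the original question:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- $a and $b are two different paths to the same node (the document element) -->
    <xsl:variable name="a" select="/*"/>
    <xsl:variable name="b" select="/*[1]"/>
    <!-- $c is a different node: the first child element of the document element
         (assumed to exist in the input) -->
    <xsl:variable name="c" select="/*/*[1]"/>

    <!-- Same node twice: the union holds one node -->
    <xsl:value-of select="count($a | $b)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Two distinct nodes: the union holds two nodes -->
    <xsl:value-of select="count($a | $c)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Under that assumption the first count is 1 and the second is 2, which is exactly the distinction the grouping test relies on.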

So, given this group spec:

<xsl:key name="text-by-content" match="text" use="normalize-space(.)"/>

You would do something like this:

<xsl:variable name="text-items"
  select="//text[count(. | key('text-by-content',
                               normalize-space(.))[1]) = 1]"/>

Follow this from the inside out:

1. key('text-by-content',
       normalize-space(.))[1]

For each <text> element tested by the predicate, this looks up its key table entry (all the <text> elements with the same normalized content) and then selects the first item in that list, that is, the first instance of a given text value in document order.

2. ".|key(...)[1]"

This creates a set from the current node and the first node of the key table entry that contains the current node.

3. count(.|key(...)[1])

This gets the length of the set.

4. count(...) = 1

This returns true if the size of the set is 1, meaning that the current <text> node is the first node in its containing key table entry. This node will be selected and added to the result node list.

You can test the result by doing this:

<xsl:for-each select="$text-items">
  <xsl:message>[<xsl:value-of select="position()"/>] = '<xsl:value-of select="."/>'</xsl:message>
</xsl:for-each>
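
Putting the pieces together, here is a sketch of a complete XSLT 1.0 stylesheet. The sample input in the comment is hypothetical (the original question's document is not shown here), and the sketch writes to the output rather than using xsl:message, so the result is easy to inspect:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed sample input (hypothetical, not from the original post):
     <doc>
       <text>apple</text>
       <text>pear</text>
       <text> apple </text>
       <text>fig</text>
       <text>pear</text>
     </doc>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every <text> element by its whitespace-normalized content -->
  <xsl:key name="text-by-content" match="text" use="normalize-space(.)"/>

  <!-- Keep only the <text> elements that are first in their key table entry -->
  <xsl:variable name="text-items"
    select="//text[count(. | key('text-by-content',
                                 normalize-space(.))[1]) = 1]"/>

  <xsl:template match="/">
    <!-- One line per distinct value, in document order -->
    <xsl:for-each select="$text-items">
      <xsl:value-of select="concat('[', position(), '] = ',
                                   normalize-space(.), '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With the assumed input this should print three lines: "[1] = apple", "[2] = pear" and "[3] = fig". The second "apple" is dropped even though it has extra whitespace, because the key uses normalize-space().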


When doing this type of grouping work, I find it really useful to create a "debug" template that just constructs all the different groups and then reports them. That makes it easier to work out the details of the key specs and lookups. If you're doing sorting, it also makes it easy to test your collation rules.
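
One possible shape for such a debug template, as a sketch only: the template name "debug-groups" is invented for illustration, and the xsl:sort uses the processor's default string collation. It reuses the text-by-content key from above and reports each group's value and size through xsl:message:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:key name="text-by-content" match="text" use="normalize-space(.)"/>

  <xsl:template match="/">
    <xsl:call-template name="debug-groups"/>
  </xsl:template>

  <!-- Hypothetical helper: report every group and how many members it has -->
  <xsl:template name="debug-groups">
    <!-- Visit the first member of each group -->
    <xsl:for-each select="//text[count(. | key('text-by-content',
                                               normalize-space(.))[1]) = 1]">
      <!-- Sorting here makes it easy to eyeball the collation order -->
      <xsl:sort select="normalize-space(.)"/>
      <!-- All members of the current group -->
      <xsl:variable name="group"
                    select="key('text-by-content', normalize-space(.))"/>
      <xsl:message>
        <xsl:text>group '</xsl:text>
        <xsl:value-of select="normalize-space(.)"/>
        <xsl:text>': </xsl:text>
        <xsl:value-of select="count($group)"/>
        <xsl:text> member(s)</xsl:text>
      </xsl:message>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Where the messages end up (typically standard error) depends on the processor.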

Cheers,

Eliot


-- W. Eliot Kimber ISOGEN International, LLC eliot@xxxxxxxxxx www.isogen.com


XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list


