Re: [xsl] document() merge DISTINCT

Subject: Re: [xsl] document() merge DISTINCT
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Wed, 19 Dec 2001 14:55:19 +0000

Hi Alex,

> In each file all /person/@id are unique, but different files might
> contain the same @id . Now I want to produce a list of all <person>
> so that /person/@id is unique.

This is certainly more difficult than trying to find distinct values
within a single document, because key(), preceding-sibling:: and so on
all work within a single document.

One method, if you don't mind using an extension node-set() function,
is to generate a single result tree fragment containing all the person
elements, convert that to a node set, and then work with that new
'document' getting distinct values in the same way as you would
normally (e.g. with the Muenchian method).

That method has two disadvantages: it relies on an extension function,
and the intermediate node set that you're constructing could be quite
large, taking up memory and slowing the transformation down. If these
don't turn out to be issues, though, it's the method that I'd choose
because it's easy.
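
As a rough sketch of that first method: this assumes a processor that
supports the EXSLT exsl:node-set() function (other processors offer an
equivalent in their own namespace), and it borrows the sample1.xml and
sample2.xml file names and the /project/person path from the call shown
further down. The <people> wrapper element and the key name
"people-by-id" are placeholders for illustration only:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <!-- Index person elements by id. key() searches the document that
       holds the context node, which here will be the converted fragment. -->
  <xsl:key name="people-by-id" match="person" use="@id" />

  <xsl:template match="/">
    <!-- Copy the person elements from every file into one
         result tree fragment. -->
    <xsl:variable name="all-rtf">
      <xsl:copy-of select="document('sample1.xml')/project/person" />
      <xsl:copy-of select="document('sample2.xml')/project/person" />
    </xsl:variable>

    <!-- Convert the fragment to a node set (extension function;
         assumes EXSLT support). -->
    <xsl:variable name="all" select="exsl:node-set($all-rtf)" />

    <!-- <people> is a hypothetical wrapper element. -->
    <people>
      <!-- Muenchian test: keep a person only if it is the first
           node that the key returns for its id. -->
      <xsl:for-each
        select="$all/person[generate-id() =
                            generate-id(key('people-by-id', @id)[1])]">
        <xsl:copy-of select="." />
      </xsl:for-each>
    </people>
  </xsl:template>

</xsl:stylesheet>
```

The key lookup works here because the predicate is evaluated with each
copied person as the context node, so key() looks inside the new
'document' rather than the original source files.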

The alternative is to use a recursive method. I'd write a template
that takes two arguments: a node set of unique people and a node set
of remaining people:

<xsl:template name="distinct">
  <xsl:param name="unique" select="/.." />
  <xsl:param name="remaining" select="/.." />
  ...
</xsl:template>

Then work through the remaining people one by one by recursion. If
there are people remaining, look at the first one to see whether it
should be added to the unique list (its id isn't the same as an
existing unique person) or not, and call the template with the new
sets:

<xsl:template name="distinct">
  <xsl:param name="unique" select="/.." />
  <xsl:param name="remaining" select="/.." />
  <xsl:choose>
    <xsl:when test="$remaining">
      <xsl:call-template name="distinct">
        <xsl:with-param name="unique"
          select="$unique | $remaining[1][not(@id = $unique/@id)]" />
        <xsl:with-param name="remaining"
                        select="$remaining[position() > 1]" />
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      ...
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

If there are no people remaining, then you need to do whatever you
want to do to the unique people - apply templates to them for example:

<xsl:template name="distinct">
  <xsl:param name="unique" select="/.." />
  <xsl:param name="remaining" select="/.." />
  <xsl:choose>
    <xsl:when test="$remaining">
      <xsl:call-template name="distinct">
        <xsl:with-param name="unique"
          select="$unique | $remaining[1][not(@id = $unique/@id)]" />
        <xsl:with-param name="remaining"
                        select="$remaining[position() > 1]" />
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:apply-templates select="$unique" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

When you call the template, I'd start off with $unique being set to
all the person elements in your first file, since you know that they
all have unique IDs. That will save some work. Something like:

  <xsl:call-template name="distinct">
    <xsl:with-param name="unique"
      select="document('sample1.xml')/project/person" />
    <xsl:with-param name="remaining"
      select="document('sample2.xml')/project/person" />
  </xsl:call-template>

The expression for $remaining could include more documents, naturally.
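
Putting the pieces together, a complete stylesheet might look like the
sketch below. sample3.xml is a hypothetical third file, and the
<people> wrapper and the person template that simply copies each
element are assumptions for illustration:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- <people> is a hypothetical wrapper element. -->
    <people>
      <xsl:call-template name="distinct">
        <!-- Everyone in the first file is already known to be unique. -->
        <xsl:with-param name="unique"
          select="document('sample1.xml')/project/person" />
        <!-- Union of the person elements from all the other files;
             sample3.xml is a hypothetical third file. -->
        <xsl:with-param name="remaining"
          select="document('sample2.xml')/project/person |
                  document('sample3.xml')/project/person" />
      </xsl:call-template>
    </people>
  </xsl:template>

  <xsl:template name="distinct">
    <xsl:param name="unique" select="/.." />
    <xsl:param name="remaining" select="/.." />
    <xsl:choose>
      <xsl:when test="$remaining">
        <!-- Add the first remaining person only if no unique person
             shares its id, then recurse on the rest. -->
        <xsl:call-template name="distinct">
          <xsl:with-param name="unique"
            select="$unique | $remaining[1][not(@id = $unique/@id)]" />
          <xsl:with-param name="remaining"
                          select="$remaining[position() > 1]" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- Nothing left to check: process the distinct people. -->
        <xsl:apply-templates select="$unique" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Assumed output: copy each distinct person unchanged. -->
  <xsl:template match="person">
    <xsl:copy-of select="." />
  </xsl:template>

</xsl:stylesheet>
```

Since document() is given plain strings here, the relative file names
are resolved against the location of the stylesheet, so the sample
files would need to sit alongside it.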
  
I hope that helps,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

