RE: [xsl] Removing duplicates where duplicates are determined by the concatenation of two elements

From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 18 Dec 2007 15:28:12 -0000
> My question is:
> Can anyone tell me what is wrong with my XSLT (see below)?

A great deal. First of all, the obvious way to tackle this is using
<xsl:for-each-group>. Apart from that:


> xmlns:fn="http://www.w3.org/2005/xpath-functions">

You never need to declare this namespace. It's the default namespace for
functions; when you call standard functions in XSLT you don't need to use
any prefix. I know Altova creates this namespace declaration automatically,
but you should get rid of it; it's unwanted noise.
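
As a minimal sketch of what that looks like in practice (the <name> output
element is purely illustrative, and the first_name/surname children are taken
from your stylesheet), standard functions can be called with no prefix and no
xmlns:fn declaration at all:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- No xmlns:fn declaration: upper-case() and concat() are found in the
       default function namespace without any prefix. -->
  <xsl:template match="person">
    <!-- <name> is a hypothetical output element, used only for this demo -->
    <name>
      <xsl:value-of select="concat(first_name, ' ', upper-case(surname))"/>
    </name>
  </xsl:template>

</xsl:stylesheet>
```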

>                                <xsl:template match="/ | 
> node() | @* | comment() | processing-instruction()">

You don't need comment() or processing-instruction(); those nodes are
already matched by virtue of node().
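
For illustration, here is the trimmed pattern. Your message doesn't show the
body of the template, so the identity-copy body below is only an assumption.
Note that "/" and "@*" do have to stay: as a pattern, node() matches only
nodes that are children of something, so it covers elements, text, comments
and processing instructions but not the document node or attributes.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- node() already covers comment() and processing-instruction(),
       so neither needs to be listed separately. -->
  <xsl:template match="/ | node() | @*">
    <!-- Assumed body: a plain identity copy of the matched node -->
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```
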

In fact you don't need this template rule, since you are never invoking it.

>   <xsl:variable name="persons">
>      <xsl:for-each select="//person">
>          <xsl:copy-of select="."/>

Why are you going to such efforts to copy the data when you could work with
the original? You're also using //person when you could write
/persons/person which would almost certainly be more efficient. You just
want

<xsl:variable name="persons" select="/persons/person"/>

(though the variable in this case doesn't really add much value).

>  <xsl:for-each select="$persons/person">

With that change, $persons is a sequence of person elements. Person elements
don't have children called person, so the select will select nothing. Change
it to select="$persons".

>     <xsl:variable name="pos" select="position( )"/>
>     <xsl:if test="$pos = 1 or concat(./first_name,./surname)
> !=concat(./first_name[$pos - 1],./surname[$pos - 1])">

Apart from the fact that you are hand-coding <xsl:for-each-group>, and
assuming the correction above:

(a) A person only has one surname, so ./surname[$pos - 1] selects nothing.
What you should be comparing with is $persons[$pos - 1]/surname

(b) Do you really want to compare the concatenation, that is, to treat ANN
EWING as a duplicate of ANNE WING? Why not compare the surname and first name
independently?

(c) The $pos=1 test is redundant. If $pos is 1, the test for equality of
names will automatically be false.
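
Putting (a) to (c) together, a corrected version of your hand-coded loop
might look like the sketch below. It assumes the input is /persons/person
with first_name and surname children, and that duplicates are adjacent in the
input; the $prev variable is a name introduced here for readability. The test
is written as not(... = ...) rather than with != because a != comparison
against an empty sequence is also false, which would wrongly drop the first
person.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Work with the original nodes; no copying needed -->
  <xsl:variable name="persons" select="/persons/person"/>

  <xsl:template match="/">
    <xsl:for-each select="$persons">
      <xsl:variable name="pos" select="position()"/>
      <!-- $prev is empty when $pos is 1, so no special case is required -->
      <xsl:variable name="prev" select="$persons[$pos - 1]"/>
      <!-- Compare the two name parts independently, not their concatenation -->
      <xsl:if test="not(first_name = $prev/first_name
                        and surname = $prev/surname)">
        <xsl:copy-of select="."/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```
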

>                 <xsl:copy-of select="."/>

Even without xsl:for-each-group, your logic could be simplified to

<xsl:template match="/">
<xsl:copy-of select="/persons/person[not(first_name =
preceding-sibling::person[1]/first_name and surname =
preceding-sibling::person[1]/surname)]"/>
</xsl:template>

That's essentially the whole stylesheet...
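
For comparison, the xsl:for-each-group approach mentioned at the top could be
sketched as follows. This is one possible shape, not the only one. The
<persons> wrapper element and the '|' separator are assumptions: the
separator keeps ANN EWING and ANNE WING in different groups, provided '|'
never occurs in a name. Unlike the preceding-sibling version, group-by does
not require duplicates to be adjacent in the input.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- Assumed wrapper element so the result has a single root -->
    <persons>
      <!-- One group per distinct (first_name, surname) pair -->
      <xsl:for-each-group select="/persons/person"
          group-by="concat(first_name, '|', surname)">
        <!-- Inside the loop the context item is the first person of the
             group, so copying it keeps one representative per name -->
        <xsl:copy-of select="."/>
      </xsl:for-each-group>
    </persons>
  </xsl:template>

</xsl:stylesheet>
```

If the input is known to be sorted by name, group-adjacent with the same key
expression would be the closer equivalent of the preceding-sibling test.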

Michael Kay
http://www.saxonica.com/
