Re: [xsl] Duplicate Elimination

Subject: Re: [xsl] Duplicate Elimination
From: Ihe Onwuka <ihe.onwuka@xxxxxxxxx>
Date: Thu, 13 Mar 2014 10:59:42 +0000
On Thu, Mar 13, 2014 at 10:14 AM, David Carlisle <davidc@xxxxxxxxx> wrote:
> On 12/03/2014 23:42, Ihe Onwuka wrote:
>> As suspected it was possible to avoid grouping. See the predicate
>> tacked on to B/Date.
>> Thanks all.
>> <xsl:apply-templates select="A/Date | B/Date[not(A/Date/text() = text())]">
>>     <xsl:sort select="." order="ascending"/>
>> </xsl:apply-templates>
> It seems unlikely that that predicate is ever going to be true (unless you
> have a structure like

You are almost certainly right there.

> I suspect you intended
> <xsl:apply-templates select="A/Date | B/Date[not(current()/A/Date/text() =
> text())]">

I was inclined to post a description of the solution rather than code,
but I posted abbreviated code for illustrative purposes.

The actual implementation was something like

<xsl:variable name="aDate" select="A/Date"/>
<xsl:apply-templates select="$aDate | B/Date[not($aDate/text() = text())]"/>

but showing the variable caching of A/Date doesn't add to what I was
trying to illustrate (which ironically was the simplicity of the
alternative I opted for). In effect I abused the fact that the mailing
list is neither compiler nor interpreter and posted pseudo-code.

I think this is equivalent to what you have above. It worked anyway.
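
For anyone who wants to run it, here is a minimal sketch of that
approach as a complete XSLT 1.0 stylesheet. The Root wrapper element and
the text output are assumptions made for illustration; only A/Date,
B/Date and the aDate variable come from the real code. Note that it
removes only B/Dates that repeat an A/Date; duplicates within A, or
within B, would still come through.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Root is a hypothetical element with A and B children -->
  <xsl:template match="/Root">
    <!-- Cache A/Date relative to the current context node -->
    <xsl:variable name="aDate" select="A/Date"/>
    <!-- Every A/Date, plus each B/Date whose text equals no A/Date text -->
    <xsl:apply-templates
        select="$aDate | B/Date[not($aDate/text() = text())]">
      <xsl:sort select="." order="ascending"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Emit one date per line -->
  <xsl:template match="Date">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Because the variable is bound outside the predicate, $aDate still refers
to the A/Date children of the Root element when the predicate is
evaluated for each B/Date, which is the same effect current() gives in
the version above.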

> But unlike Muenchian grouping or xsl-for-each (both of which actually have a
> simpler syntax than this)

The for-each syntax is more verbose.

Muenchian grouping is not something I burden my short term memory with
because I hardly ever use it, and it's a phrase that is
meaningless beyond a very select cognoscenti. What I posted can
literally be described to a layman: add all the B/Dates that aren't
in the set of A/Dates. Additionally, it translates literally into

A union (B diff A)

That's why I prefer it.

> this will (unless you have a very aggressively
> optimising XSLT engine) be quadratic in performance as the full A list is
> going to be searched for every B.

Good point. If and when the volumes warrant performance tuning I'll
know where to start.
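
One possible starting point, sketched under assumptions rather than
tested on real volumes, would be an xsl:key so that each B/Date is
looked up in an index instead of being compared against the full A
list. The key name a-date and the Root wrapper are made up for this
sketch. Whether the lookup is actually faster depends on how the
processor implements keys.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index A/Date elements by string value ("a-date" is a hypothetical name) -->
  <xsl:key name="a-date" match="A/Date" use="."/>

  <!-- Root is a hypothetical element with A and B children -->
  <xsl:template match="/Root">
    <!-- Keep a B/Date only if the key finds no A/Date with the same value -->
    <xsl:apply-templates select="A/Date | B/Date[not(key('a-date', .))]">
      <xsl:sort select="." order="ascending"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Emit one date per line -->
  <xsl:template match="Date">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

One caveat: a key indexes the whole document, so this matches the
variable version only when all the A/Dates of interest belong to the
same comparison. If there were several independent A/B pairs in one
document, the key's use expression would need to include something that
identifies the pair.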

> Also of course using text() rather than
> <xsl:apply-templates select="A/Date | B/Date[not(current()/A/Date = .)]>
> means the code is very fragile and will break if comments split up the text
> nodes.

I have been doing too much XQuery recently, but is . robust against
changes to the content model?
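
To make the comment case concrete, here is a small sketch that counts
how many B/Dates survive each form of the predicate. It assumes a
hypothetical input with an A/Date written as
<Date>2014-03-<!-- day -->05</Date> and a B/Date written as
<Date>2014-03-05</Date>, both under a Root wrapper that is also made up
for this example.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Root is a hypothetical element with A and B children -->
  <xsl:template match="/Root">
    <xsl:variable name="aDate" select="A/Date"/>

    <!-- text(): the A/Date has two text nodes, "2014-03-" and "05";
         neither equals "2014-03-05", so the B/Date is not filtered out -->
    <xsl:text>text(): </xsl:text>
    <xsl:value-of select="count(B/Date[not($aDate/text() = text())])"/>
    <xsl:text>&#10;</xsl:text>

    <!-- dot: the string value of the A/Date skips the comment and is
         "2014-03-05", so the B/Date is recognised as a duplicate -->
    <xsl:text>dot: </xsl:text>
    <xsl:value-of select="count(B/Date[not($aDate = .)])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

On that input the text() form should report 1 surviving B/Date and the
dot form 0. As for the content model: the string value of an element is
the concatenation of all its descendant text nodes, so if Date ever
gained child elements, . would compare the concatenated text of the
whole subtree. Whether that counts as robust depends on what the
children would hold.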
